Git

From Mpich

Revision as of 23:42, 8 January 2013

Until January 7, 2013, MPICH used Subversion (SVN) for its version control system (VCS). Now we use git. This wiki page documents important information about the use of git within MPICH.

Important URLs

writeable clone URL: git@git.mpich.org:mpich.git
read-only clone URL (via git protocol): git://git.mpich.org/mpich.git
read-only clone URL (via HTTP): http://git.mpich.org/mpich.git

The HTTP URL is also a git-web instance, so you can browse the repository, its contents, and its history in a web browser. Links there are also a good alternative to Trac links in email, documentation, and (sometimes) commit messages.

IF YOU CANNOT ACCESS THE git@git.mpich.org:mpich.git URL BUT THINK YOU SHOULD BE ABLE TO, CONTACT dev@mpich.org SO THAT WE CAN ADD YOUR SSH PUBLIC KEY TO THE DATABASE

Quick Start

If you do not have git installed, you can get it from the git website or from your preferred package manager (brew install git, apt-get install git, etc.).

The next step is to add the following to your ~/.gitconfig file, substituting your own name and email:

[user]
        name = Joe Developer
        email = joe@example.org

[color]
        diff = auto
        status = auto
        branch = auto
        ui = auto

# optional, but helps to distinguish between changed and untracked files
[color "status"]
        added = green
        changed = red
        untracked = magenta
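If you prefer not to edit ~/.gitconfig by hand, you can write the same settings with git config. The sketch below writes them to a scratch file instead of your real config, so it is safe to try. The CFG variable is an assumption made for that purpose. Replacing --file "$CFG" with --global writes to ~/.gitconfig instead.

```shell
# Sketch: write the ~/.gitconfig settings above with `git config`.
# Assumption: CFG is a scratch file so this is safe to try; use
# `git config --global ...` instead to update your real ~/.gitconfig.
CFG="$(mktemp)"

# [user]: substitute your own name and email
git config --file "$CFG" user.name "Joe Developer"
git config --file "$CFG" user.email "joe@example.org"

# [color]
git config --file "$CFG" color.diff auto
git config --file "$CFG" color.status auto
git config --file "$CFG" color.branch auto
git config --file "$CFG" color.ui auto

# [color "status"] (optional): a three-part key creates the subsection
git config --file "$CFG" color.status.added green
git config --file "$CFG" color.status.changed red
git config --file "$CFG" color.status.untracked magenta

# Print the resulting file
cat "$CFG"
```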

Quick start for authorized committers (core MPICH developers):

% git clone git@git.mpich.org:mpich.git

This will create an mpich directory in your $PWD that contains a completely functional repository with full project history.

If you do not have access to the writeable repository but think you should (because you are a core MPICH developer or collaborator), contact dev@mpich.org for access. The system works based on SSH keys, so we will need your SSH public key.

Everyone else who wishes to clone the repository must use one of the read-only URLs. Clones made from those URLs can still contribute changes back to MPICH as patches generated with git format-patch.
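Here is a minimal sketch of the format-patch route. A throwaway local bare repository stands in for the read-only clone URL so that everything runs offline. The name upstream.git is hypothetical, and the author identity and commit message are placeholders.

```shell
# Sketch only: a local bare repo (upstream.git, hypothetical) stands in
# for git://git.mpich.org/mpich.git so this runs offline.
cd "$(mktemp -d)" || exit 1
export GIT_AUTHOR_NAME="Joe Developer"        # placeholder identity
export GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="Joe Developer"
export GIT_COMMITTER_EMAIL="joe@example.org"

# Build the stand-in upstream with a single commit.
git init -q -b master seed
echo "MPICH" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git

# Clone it (as a read-only user would) and make a local commit.
git clone -q upstream.git mpich
cd mpich
echo "A small note." >> README
git commit -q -a -m "Add a note to README"

# Write one patch file per commit that origin/master does not have yet.
git format-patch -o outgoing origin/master
```

You can then mail the generated .patch files (for example with git send-email) or attach them to a message. A committer applies them with git am, which keeps the original author and commit message.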

Important Dos and Don'ts

Do:

  • Use git pull --rebase instead of git pull when your local commits on the current branch have never been pushed to the outside world. Otherwise you will end up creating unnecessary merge commits, which make the development history harder to understand. It is sometimes even clearer to split git pull --rebase into a git fetch followed by an explicit git rebase, so that you can see what each step does.
  • Use git-style commit messages: a short (~50 character) subject line, a blank line, and then a separate, more descriptive body.
  • Prefer making smaller logical commits rather than single large commits containing loosely related or unrelated features.
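The commit-message convention can be sketched in a scratch repository with a placeholder identity. Passing -m twice is one way to get the subject/body split: git joins the two values as separate paragraphs, which puts the blank line after the subject. The file contents and message text are made up for illustration.

```shell
# Sketch in a scratch repository with a placeholder identity.
cd "$(mktemp -d)" || exit 1
export GIT_AUTHOR_NAME="Joe Developer"
export GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="Joe Developer"
export GIT_COMMITTER_EMAIL="joe@example.org"

git init -q -b master demo
cd demo
echo "int foo(void) { return 0; }" > foo.c
git add foo.c

# First -m: short subject (~50 chars). Second -m: the descriptive body.
# git separates the two paragraphs with a blank line.
git commit -q -m "Add a stub foo() to foo.c" \
    -m "bar.c will call foo(), so provide a stub that returns 0 until
the real implementation is ready."

# Show the full message, then check the subject length.
git log -1 --format=%B
subject="$(git log -1 --format=%s)"
echo "subject length: ${#subject}"
```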

Don't:

  • cherry-pick excessively. Cherry-picking should be a rare activity, not a frequent one. Talk to Dave if you want to understand this better.
  • merge just for the sake of merging. If you have a long-running, published topic branch, don't merge from master (for example) just because "it's been a while". Instead, merge only to pick up specific features or fixes that are not suitable for cherry-picking, and name that feature or fix in the merge commit message.
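Here is a sketch of a purposeful merge in a scratch repository. The topic branch name, file contents, and messages are all hypothetical. Master gains one specific fix, and the topic branch merges master only to pick up that fix, naming it in the merge message.

```shell
cd "$(mktemp -d)" || exit 1
export GIT_AUTHOR_NAME="Joe Developer"        # placeholder identity
export GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="Joe Developer"
export GIT_COMMITTER_EMAIL="joe@example.org"

git init -q -b master repo
cd repo
echo "int foo(void) { return 0; }" > foo.c
git add foo.c
git commit -q -m "Initial foo.c"

# A long-running topic branch (hypothetical name "topic").
git checkout -q -b topic
echo "int topic_feature(void);" > topic.c
git add topic.c
git commit -q -m "Start topic work"

# Meanwhile, a specific fix lands on master.
git checkout -q master
echo "int foo(void) { return 1; }" > foo.c
git commit -q -a -m "Fix return value of foo()"

# The topic needs that fix, so merge master and name the fix.
git checkout -q topic
git merge -q -m "Merge master to pick up the foo() return value fix" master
git log --oneline --graph
```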

Background Reading and References

  • Scott Chacon's "Pro Git" book is free on the web, and covers everything from installing git to concepts that are extremely advanced (like git replace).
  • Git for Computer Scientists provides background about how to think about git's object model. This is very useful for understanding the git man pages and reasoning about the effect of various commands.

Proposed Git Workflow

TBD. Right now, let's use it as a sort of much, much better SVN. One (somewhat heavyweight) option is to adopt git-flow. The upside is that it is a clearly documented process that is used by other groups. The downside is that it assumes a certain level of comfort with git that might be difficult to teach to students and others in a short time frame.

What follows is a comparison of one of the most common activities, committing a bug fix or small feature, shown in the old SVN approach and the recommended new git approach:

Old SVN Approach

From a fresh (or up-to-date and clean) svn working copy:

% vim foo.c   # edit an existing tracked file
...
% vim bar.c   # edit a new (not tracked by SVN) file
...
% svn status
M       foo.c
?       bar.c
% svn add bar.c
% svn status
M       foo.c
A       bar.c

Now let's say that you go home for the day without committing these changes. When you get in the next morning, you try to commit the change:

% svn update
...  # receive any updates made by others to the repository
% svn commit -m 'fixed bug in foo.c, using new bar.c to do so'
...   # progress output elided
Committed revision XYZ.

Now if there had been a conflict in foo.c at the svn update step, then we would have needed to do something like:

% vim foo.c
...   # search for conflict markers, resolve conflict
# then EITHER (A), the old-style SVN command:
% svn resolved foo.c
# OR (B), the newer form:
% svn resolve --accept working foo.c
% svn commit -m 'fixed bug in foo.c, using new bar.c to do so'
...

New Git Approach

Here is the same simple case in git, with extra git status commands thrown in for pedagogical purposes. Conflict resolution is covered more thoroughly in the "Basic Merge Conflicts" section of the Pro Git book, although in the equivalent context of merging rather than rebasing.

This example assumes that you have the master branch checked out, that your working tree and index are both clean (unmodified), and that the master branch is set up to track the origin/master "remote tracking branch":

% vim foo.c   # edit an existing tracked file
...
% vim bar.c   # edit a new (not tracked by git) file
...
% git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   foo.c
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       bar.c
no changes added to commit (use "git add" and/or "git commit -a")
% git status -s
 M foo.c
?? bar.c

Now let's say that you go home for the day without committing these changes. When you get in the next morning, you try to commit the change:

% git add bar.c
% git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   bar.c
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   foo.c
#
% git add foo.c
% git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   bar.c
#       modified:   foo.c
#
% git commit -m 'fixed bug in foo.c, using new bar.c to do so'
[master f36baae] fixed bug in foo.c, using new bar.c to do so
 2 files changed, 2 insertions(+)
 create mode 100644 bar.c

At this stage, you have created a new commit that is present only in your local repository. To share this commit with others via the remote repository (named origin by default), we need to push it as well. But first we need to pull down any changes made by others in the remote repository:

% git fetch origin
...
% git rebase   # b/c of tracking setup, "origin/master" is the implied argument
...
(the above two commands could be replaced by the equivalent "git pull --rebase")
% git push origin master
...
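You can try the fetch/rebase/push sequence without touching the real server. The sketch below uses a local bare repository (origin.git, hypothetical) as a stand-in for git@git.mpich.org:mpich.git, and two clones (alice and bob, also hypothetical) as two developers. Bob pushes first, so Alice has to fetch and rebase before her push can be accepted.

```shell
# Sketch only: origin.git, alice and bob are hypothetical local stand-ins.
cd "$(mktemp -d)" || exit 1
export GIT_AUTHOR_NAME="Joe Developer"        # placeholder identity
export GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="Joe Developer"
export GIT_COMMITTER_EMAIL="joe@example.org"

# Build a bare "origin" with one commit.
git init -q -b master seed
echo "int foo(void);" > seed/foo.c
git -C seed add foo.c
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git

# Two clones; in each, master tracks origin/master.
git clone -q origin.git alice
git clone -q origin.git bob

# Bob pushes an unrelated change first.
cd bob
echo "MPICH" > README
git add README
git commit -q -m "Add README"
git push -q origin master
cd ..

# Alice commits locally, then fetches, rebases, and pushes.
cd alice
echo "/* bug fixed */" >> foo.c
echo "int bar(void);" > bar.c
git add foo.c bar.c
git commit -q -m "fixed bug in foo.c, using new bar.c to do so"
git fetch -q origin
git rebase -q      # upstream is origin/master via the tracking setup
git push -q origin master
git log --oneline --graph
```

The resulting history is linear. Alice's commit sits on top of Bob's, with no merge commit in between.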

Now let's assume the same conflict in foo.c occurs as in our SVN example. The conflict will manifest itself at the rebase step. It will look something like this:

% git rebase   # b/c of tracking setup, "origin/master" is the implied argument
First, rewinding head to replay your work on top of it...
Applying: fixed bug in foo.c, using new bar.c to do so
Using index info to reconstruct a base tree...
M       foo.c
Falling back to patching base and 3-way merge...
Auto-merging foo.c
CONFLICT (content): Merge conflict in foo.c
Failed to merge in the changes.
Patch failed at 0001 fixed bug in foo.c, using new bar.c to do so
The copy of the patch that failed is found in:
   /Users/goodell/scratch/git-wiki-example/.git/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

Now we need to resolve the conflict. We can do this in one of two ways:

% git mergetool
Merging:
foo.c

Normal merge conflict for 'foo.c':
  {local}: modified file
  {remote}: modified file
Hit return to start merge resolution tool (vimdiff): [I PRESSED ENTER]
... (search for conflict markers, fix conflict)

OR:

% vim foo.c
... (search for conflict markers, fix conflict)
% git add foo.c

The first option runs your configured merge tool. If none is configured, it runs a default that git picks based on $EDITOR (or similar settings). Either way, the goal is a view suited to resolving the conflict. In my case, this is a vimdiff window showing the left, right, base, and working copy versions of the conflicted file. When you exit the merge tool, git mergetool automatically runs git add on the file, provided the tool reports (or you confirm) that the merge succeeded. The second option is a slightly more manual version of the first that looks more like the SVN approach to the problem.

Once all conflicts have been resolved, we simply continue the rebase operation:

% git rebase --continue
Applying: fixed bug in foo.c, using new bar.c to do so
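You can reproduce the conflict case offline. In this sketch, a local bare repository (origin.git) stands in for the server, and two clones (alice and bob) play two developers; all three names are hypothetical. Both developers change the same line of foo.c, so the rebase stops. The conflict is "resolved" by keeping Alice's line, a stand-in for a real hand edit, and then the rebase continues. Setting GIT_EDITOR=true for git rebase --continue is an added assumption so the sketch never waits on an editor, which some git versions may open at that step.

```shell
cd "$(mktemp -d)" || exit 1
export GIT_AUTHOR_NAME="Joe Developer"        # placeholder identity
export GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="Joe Developer"
export GIT_COMMITTER_EMAIL="joe@example.org"

git init -q -b master seed
echo "int foo(void) { return 0; }" > seed/foo.c
git -C seed add foo.c
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed origin.git
git clone -q origin.git alice
git clone -q origin.git bob

# Bob changes the line in foo.c and pushes first.
cd bob
echo "int foo(void) { return 1; }" > foo.c
git commit -q -a -m "Make foo() return 1"
git push -q origin master
cd ..

# Alice changes the same line locally, then fetches.
cd alice
echo "int foo(void) { return 2; }" > foo.c
git commit -q -a -m "fixed bug in foo.c"
git fetch -q origin

# The rebase stops on the conflict and exits non-zero.
if ! git rebase; then
    echo "Unmerged files:"
    git diff --name-only --diff-filter=U
fi

# Stand-in for a hand edit: keep Alice's version, then mark it resolved.
echo "int foo(void) { return 2; }" > foo.c
git add foo.c

# GIT_EDITOR=true keeps the original commit message without opening an editor.
GIT_EDITOR=true git rebase --continue
git push -q origin master
```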

SVN History Migration

What has been imported?

Much of the history from our previous SVN repository has been migrated over to git. This includes:

  • All trunk history, with commit messages prefixed by "[svn-rXXXX] ". This history lives in the master branch, which is the git convention corresponding to SVN's trunk. The oldest history that was present in SVN was 1.0.6, so that's as far back as the git history goes.
  • All release tags (which are branch-like in SVN), with their history squashed down into a single commit. These commit messages have the format "[svn-synthetic] tags/release/mpich2-1.4.1p1". These commits were then tagged with annotated git tags with names like "v1.4.1p1".
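Imported trunk commits carry the "[svn-rXXXX] " prefix, so you can map an old SVN revision number to a git commit with git log --grep. The scratch repository below only imitates the message and tag formats described above. The revision numbers r1233 and r1234 and the tag message are made up, and the real history is not shaped like this.

```shell
cd "$(mktemp -d)" || exit 1
export GIT_AUTHOR_NAME="Joe Developer"        # placeholder identity
export GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="Joe Developer"
export GIT_COMMITTER_EMAIL="joe@example.org"

git init -q -b master history
cd history

# Made-up commits that imitate the imported "[svn-rXXXX] " prefix.
echo "one" > file.txt
git add file.txt
git commit -q -m "[svn-r1233] add file.txt"
echo "two" >> file.txt
git commit -q -a -m "[svn-r1234] update file.txt"

# A synthetic release commit plus an annotated tag (placeholder tag message).
git commit -q --allow-empty -m "[svn-synthetic] tags/release/mpich2-1.4.1p1"
git tag -a v1.4.1p1 -m "MPICH 1.4.1p1"

# Which git commit was SVN revision r1234?
git log --format='%h %s' --grep='^\[svn-r1234] '

# List the release tags matching a pattern.
git tag -l 'v1.4*'
```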

Import Process and Caveats

The history was imported by a custom script because the MPICH SVN repository was more complicated than git-svn could handle. Specifically, our use of svn:externals caused two problems. Problem #1 is that git-svn cannot handle any form of SVN external natively. Problem #2 is that our past use of relative SVN externals (e.g., for confdb) was unversioned. This means that neither svn export -r XYZ $SVN_PATH nor the pinned-revision variant (@XYZ) would actually reproduce the correct working copy at revision XYZ if the confdb directory had changed since XYZ. So the script jumped through a number of hoops in order to produce the expected result from svn export. Branch points were computed by hand rather than by teaching the script to find them.

Why Git?

SVN has numerous well-documented deficiencies:

  • Branching and merging are nightmares.
  • Inspecting history is much more difficult than it is in git.
  • Working offline in SVN is limited.
  • Performance is slow.

SVN had only three things in its favor: inertia (we already had it installed, with other infrastructure built around it), support for fine-grained permissions via the MCS "authz" web page, and the fact that everyone basically knew how to use it. Eventually our SVN pain began to exceed the inertia benefit, and MCS Systems provided gitosis so that we could self-administer permissions with finer granularity. The education issue is unfortunate, but it must be overcome every time a VCS becomes obsolete (it also arose during the CVS-to-SVN migration).

Why was git chosen over another system (Mercurial, bzr, something else...)? Git arguably has the greatest slice of the distributed VCS market right now, so more of the world will know how to use it to interact with our project. Several existing team members already used git regularly, through a limited git-svn clone of the MPICH SVN repository.

Dealing With Development Branches/Repositories

In SVN we had branches in https://svn.mcs.anl.gov/repos/mpi/mpich2/branches/dev, which we typically referred to as dev/FOO. Many of these branches had restricted permissions, especially when used to collaborate on a research paper or with a vendor. Because of git's distributed nature, it is difficult or impossible to restrict read permissions for a specific branch within a repository. So each of these development branches has been put into its own new repository. Not all development branches were migrated, only those actively in use.

The basic pattern is that an SVN branch named dev/FOO has been placed into a repository named dev/FOO, containing a sole branch also named FOO. These repositories are not listed via git-web or the git daemon, so you must use the SSH form of the clone URL when cloning or adding a remote for these development repositories. The basic procedure for adding one of these dev branches to your local repository is:

% git remote add dev/FOO --fetch git@git.mpich.org:dev/FOO.git
Updating dev/FOO
remote: Counting objects: 5858, done.
remote: Compressing objects: 100% (2354/2354), done.
remote: Total 3596 (delta 1777), reused 3008 (delta 1228)
Receiving objects: 100% (3596/3596), 2.90 MiB, done.
Resolving deltas: 100% (1777/1777), completed with 777 local objects.
From git.mpich.org:dev/FOO
 * [new branch]      FOO -> dev/FOO/FOO

That is, we are actually doing two things:

  • adding a new git "remote" by the name of dev/FOO;
  • fetching its content, especially the FOO branch. The "remote tracking branch" in our local repository is then named dev/FOO/FOO (mildly confusing, unfortunately).

You will probably then want to create a local branch to track the remote branch:

% git branch FOO dev/FOO/FOO
% git checkout FOO

You can now start hacking away on your local FOO branch.
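You can rehearse the same steps offline. In this sketch, a local bare repository (FOO.git, hypothetical) stands in for git@git.mpich.org:dev/FOO.git, and a scratch repository stands in for your mpich clone. git checkout -b FOO dev/FOO/FOO is shown as a one-step alternative to the git branch plus git checkout pair above. Because the start point is a remote-tracking branch, git sets FOO to track dev/FOO/FOO by default.

```shell
cd "$(mktemp -d)" || exit 1
export GIT_AUTHOR_NAME="Joe Developer"        # placeholder identity
export GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="Joe Developer"
export GIT_COMMITTER_EMAIL="joe@example.org"

# Stand-in for your mpich clone.
git init -q -b master mpich
git -C mpich commit -q --allow-empty -m "Initial commit"

# Stand-in for git@git.mpich.org:dev/FOO.git (hypothetical): a bare repo
# whose sole branch, FOO, adds one commit on top of mpich's history.
git clone -q mpich foo-work
git -C foo-work checkout -q -b FOO
echo "dev work" > foo-work/notes.txt
git -C foo-work add notes.txt
git -C foo-work commit -q -m "Work on FOO"
git init -q --bare FOO.git
git -C foo-work push -q ../FOO.git FOO

# Add the remote and fetch it (creates the remote-tracking branch dev/FOO/FOO).
cd mpich
git remote add --fetch dev/FOO ../FOO.git
# Create a local FOO that tracks dev/FOO/FOO and switch to it in one step.
git checkout -q -b FOO dev/FOO/FOO
git rev-parse --abbrev-ref 'FOO@{upstream}'
```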

Currently there is no easy way to list the available dev/ branches unless you have permission to view the gitolite-admin.git repository. If you think you should have access to a particular development branch, contact dev@mpich.org or the specific MPICH core developer with whom you are working.

Managing Access Controls

To be written. In the interim, contact goodell@mcs.anl.gov if you need to know how to deal with this.