Difference between revisions of "Git"

From Mpich
(Quick Start)
(Added information about how to create a new repository from the origin/master.)

Revision as of 00:05, 4 May 2013

Until January 7, 2013, MPICH used Subversion (SVN) for its version control system (VCS). Now we use git. This wiki page documents important information about the use of git within MPICH. Historical information about the SVN repository can be found here.

Important URLs

writeable clone URL git@git.mpich.org:mpich.git
read-only clone URL (via git protocol) git://git.mpich.org/mpich.git
read-only clone URL (via http) http://git.mpich.org/mpich.git

The HTTP URL is also a git-web instance, so you can view information about the repository and its contents/history in your browser if you navigate there. This is also a good alternative to using trac links in email, documentation, and (sometimes) commit messages.

IF YOU CANNOT ACCESS THE git@git.mpich.org:mpich.git URL BUT THINK YOU SHOULD BE ABLE TO, CONTACT devel@mpich.org SO THAT WE CAN ADD YOUR SSH PUBLIC KEY TO THE DATABASE

Quick Start

If you do not have git installed, you can get it from the git website or your preferred software package management system (brew install git, apt-get install git-core, etc.).

The next step is to add the following (substituting your name and email) into your ~/.gitconfig file:

[user]
        name = Joe Developer
        email = joe@example.org

[color]
        diff = auto
        status = auto
        branch = auto
        ui = auto

# optional, but helps to distinguish between changed and untracked files
[color "status"]
        added = green
        changed = red
        untracked = magenta

# optional, but makes git produce 8-character abbreviated hashes, which are "trac-compatible" for automatic link generation in comments.
[core]
        abbrev = 8
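
If you would rather not edit the file by hand, the same settings can be written with git config. The following is a minimal sketch that writes to a scratch file (the cfg variable is only for illustration); for real use, replace --file "$cfg" with --global so that the values land in ~/.gitconfig.

```shell
set -eu

# Scratch file standing in for ~/.gitconfig (hypothetical; use --global for real use).
cfg=$(mktemp)

# Identity: substitute your own name and email.
git config --file "$cfg" user.name "Joe Developer"
git config --file "$cfg" user.email "joe@example.org"

# Colored output.
git config --file "$cfg" color.ui auto
git config --file "$cfg" color.status.untracked magenta

# 8-character abbreviated hashes.
git config --file "$cfg" core.abbrev 8

# Print everything that was written.
git config --file "$cfg" --list
```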

Quick start for authorized committers (core MPICH developers):

% git clone git@git.mpich.org:mpich.git

This will create an mpich directory in your $PWD that contains a completely functional repository with full project history.

If you do not have access to the writeable repository but think you should (because you are a core MPICH developer or collaborator), contact devel@mpich.org for access. The system works based on SSH keys, so we will need your SSH public key.

Everyone else who wishes to clone the repository must use one of the other URLs. These will still allow you to contribute back to MPICH with git format-patch.
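
As a rough illustration of the format-patch route, the sketch below builds a throwaway repository (standing in for a read-only clone; the paths, file contents, and commit messages are made up), commits a change, and writes that commit out as a patch file that could be mailed to the developers.

```shell
set -eu
# Identity used only for this scratch session.
export GIT_AUTHOR_NAME="Joe Developer" GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Throwaway repository standing in for a read-only clone of mpich.git.
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo 'int foo(void) { return 0; }' > foo.c
git add foo.c
git commit -q -m 'initial import'

# Make the change you want to contribute as an ordinary local commit.
echo 'int bar(void) { return 1; }' >> foo.c
git commit -q -a -m 'add bar() helper to foo.c'

# Write the most recent commit out as a patch file in the current directory.
patch_file=$(git format-patch -1 HEAD)
echo "created $patch_file"
```

In a real clone, git format-patch origin/master would instead emit one patch file per local commit that is not yet in origin/master.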

Important Dos and Don'ts

Do:

  • Use git pull --rebase instead of git pull when the commits on your current branch have never been pushed to the outside world. Otherwise you will create unnecessary merge commits, which make the development history harder to follow. Better yet, it is sometimes easier to see what is happening if you split git pull into a git fetch followed by an explicit git rebase.
  • Use git-style commit messages, with a short (~50 char) subject line and then a separate, more descriptive body.
  • Prefer making smaller logical commits rather than single large commits containing loosely or unrelated features.

Don't:

  • cherry-pick excessively. Cherry-picking should be a rare activity, not a frequent one. Talk to Dave if you want to understand this better.
  • merge just for the sake of merging. If you have a long-running, published topic branch, then don't merge from master (for example), just because "it's been a while". Instead, only merge to pick up specific features/fixes that are not suitable for cherry-picking. Name this feature/fix in the merge commit message.
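
To make the commit message advice in the "Do" list concrete, here is a small sketch in a throwaway repository. The subject and body text are invented for illustration; the point is only the shape: a short subject, a blank line, then a descriptive body.

```shell
set -eu
# Identity used only for this scratch session.
export GIT_AUTHOR_NAME="Joe Developer" GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git init -q .
echo 'char buf[16];' > foo.c
git add foo.c

# Read the message from stdin: subject line, blank line, then the body.
git commit -q -F - <<'EOF'
fix off-by-one in foo buffer sizing

The buffer did not leave room for the terminating NUL byte.
Allocate one extra byte so that maximum-length names fit.
EOF

# Report the subject length so it can be compared against the ~50 char guideline.
subject=$(git log -1 --format=%s)
echo "${#subject} chars: $subject"
```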

Background Reading and References

  • Scott Chacon's "Pro Git" book is free on the web, and covers everything from installing git to concepts that are extremely advanced (like git replace).
  • Git for Computer Scientists provides background about how to think about git's object model. This is very useful for understanding the git man pages and reasoning about the effect of various commands.

Proposed Git Workflow

TBD. Right now, let's use it as a sort of much, much better SVN. One (somewhat heavyweight) option is to adopt git-flow. The upside is that it is a clearly documented process that is used by other groups. The downside is that it assumes a certain level of comfort with git that might be difficult to teach to students and others in a short time frame.

What follows is a comparison of one of the most common activities, committing a bug fix or small feature, shown in the old SVN approach and the recommended new git approach:

Old SVN Approach

From a fresh (or up-to-date and clean) svn working copy:

% vim foo.c   # edit an existing tracked file
...
% vim bar.c   # edit a new (not tracked by SVN) file
...
% svn status
M       foo.c
?       bar.c
% svn add bar.c
% svn status
M       foo.c
A       bar.c

Now let's say that you go home for the day without committing these changes. When you get in the next morning, you try to commit the change:

% svn update
...  # receive any updates made by others to the repository
% svn commit -m 'fixed bug in foo.c, using new bar.c to do so'
transmitting...
new revision rXYZ created  # (paraphrasing here, don't have output in front of me)

Now if there had been a conflict in foo.c at the svn update step, then we would have needed to do something like:

% vim foo.c
...   # search for conflict markers, resolve conflict
!!! ONE OF: (A) !!!
% svn resolved foo.c  # old-style SVN command
!!! OR (B) !!!
% svn resolve --accept working foo.c
% svn commit -m 'fixed bug in foo.c, using new bar.c to do so'
...

New Git Approach

And the same simple case in git (with extra git status commands thrown in for pedagogical purposes). This topic is covered more thoroughly in the Basic Merge Conflicts section of The Pro Git book, although in the equivalent context of merging instead of rebasing.

This example assumes that you have the master branch currently checked out, that your working tree and index are both clean (unmodified), and that the master branch is set up to track the origin/master "remote tracking branch":

% vim foo.c   # edit an existing tracked file
...
% vim bar.c   # edit a new (not tracked by git) file
...
% git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   foo.c
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       bar.c
no changes added to commit (use "git add" and/or "git commit -a")
% git status -s
 M foo.c
?? bar.c

Now let's say that you go home for the day without committing these changes. When you get in the next morning, you try to commit the change:

% git add bar.c
% git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   bar.c
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   foo.c
#
% git add foo.c
% git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   bar.c
#       modified:   foo.c
#
% git commit -m 'fixed bug in foo.c, using new bar.c to do so'
[master f36baae] fixed bug in foo.c, using new bar.c to do so
 1 file changed, 1 insertion(+)
 create mode 100644 bar.c

At this stage, you have created a new commit that is present only in your local repository. To share this commit with others through the remote repository (named origin by default), we need to push it. But first we need to pull down any changes made by others in the remote repository:

% git fetch origin
...
% git rebase   # b/c of tracking setup, "origin/master" is the implied argument
...
(the above two commands could be replaced by the equivalent "git pull --rebase")
% git push origin master
...

Now let's assume the same conflict in foo.c occurs as in our SVN example. The conflict will manifest itself at the rebase step. It will look something like this:

% git rebase   # b/c of tracking setup, "origin/master" is the implied argument
First, rewinding head to replay your work on top of it...
Applying: fixed bug in foo.c, using new bar.c to do so
Using index info to reconstruct a base tree...
M       foo.c
Falling back to patching base and 3-way merge...
Auto-merging foo.c
CONFLICT (content): Merge conflict in foo.c
Failed to merge in the changes.
Patch failed at 0001 fixed bug in foo.c, using new bar.c to do so
The copy of the patch that failed is found in:
   /Users/goodell/scratch/git-wiki-example/.git/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

Now we need to resolve the conflict. We can do this in one of two ways:

% git mergetool
Merging:
foo.c

Normal merge conflict for 'foo.c':
  {local}: modified file
  {remote}: modified file
Hit return to start merge resolution tool (vimdiff): [I PRESSED ENTER]
... (search for conflict markers, fix conflict)

OR:

% vim foo.c
... (search for conflict markers, fix conflict)
% git add foo.c

The first option picks a merge tool based on your merge.tool configuration or, if that is unset, on the value of $EDITOR (or similar) in your environment, and attempts to give you a useful mode for resolving the conflict. In my case, this is a vimdiff window showing the left, right, base, and working copy versions of the conflicted file. git mergetool will then automatically git add the file if all conflict markers have been removed from it when you exit the mergetool editor. The second option is a slightly more manual version of the first that looks more like the SVN approach to the problem.

Once all conflicts have been resolved, we simply continue the rebase operation:

% git rebase --continue
Applying: fixed bug in foo.c, using new bar.c to do so
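
The whole sequence (commit, fetch, rebase, resolve, continue, push) can be rehearsed safely against throwaway repositories. The sketch below is one way to do that; the directory names (origin.git, seed, alice, bob) and file contents are invented for illustration, and a local bare repository stands in for the real origin. GIT_EDITOR=true is set only so that the script never waits for an editor.

```shell
set -eu
# Identity used only for this scratch session.
export GIT_AUTHOR_NAME="Joe Developer" GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

top=$(mktemp -d)

# Scratch "origin" holding one commit on master.
git init -q --bare "$top/origin.git"
git --git-dir="$top/origin.git" symbolic-ref HEAD refs/heads/master
git init -q "$top/seed"
cd "$top/seed"
git symbolic-ref HEAD refs/heads/master
echo 'int foo(void) { return 0; }' > foo.c
git add foo.c
git commit -q -m 'initial foo.c'
git push -q "$top/origin.git" master

# Two developers clone the same origin.
git clone -q "$top/origin.git" "$top/alice"
git clone -q "$top/origin.git" "$top/bob"

# The first developer changes foo.c and pushes.
cd "$top/alice"
echo 'int foo(void) { return 1; }' > foo.c
git commit -q -a -m 'foo returns 1'
git push -q origin master

# The second developer commits a conflicting change, then fetches and rebases.
cd "$top/bob"
echo 'int foo(void) { return 2; }' > foo.c
git commit -q -a -m 'fixed bug in foo.c'
git fetch -q origin
git rebase origin/master >/dev/null 2>&1 || echo "rebase stopped on a conflict in foo.c"

# Resolve by hand, stage the result, and continue the rebase.
echo 'int foo(void) { return 3; }' > foo.c
git add foo.c
GIT_EDITOR=true git rebase --continue >/dev/null
git push -q origin master

# History is linear: no merge commit was created.
git log --oneline
```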

SVN History Migration

What has been imported?

Much of the history from our previous SVN repository has been migrated over to git. This includes:

  • All trunk history, with commit messages prefixed by "[svn-rXXXX] ". This history lives in the master branch, which is the git convention corresponding to SVN's trunk. The oldest history that was present in SVN was 1.0.6, so that's as far back as the git history goes.
  • All release tags (which are branch-like in SVN), with their history squashed down into a single commit. These commit messages have the format "[svn-synthetic] tags/release/mpich2-1.4.1p1". These commits were then tagged with annotated git tags with names like "v1.4.1p1".
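
These conventions make the imported history easy to query. The sketch below fakes a tiny history in a throwaway repository (the revision number r1234 and the commit contents are hypothetical, and the synthetic tag commit is placed directly on master for simplicity) and then runs the kind of queries that should also work in a real clone.

```shell
set -eu
# Identity used only for this scratch session.
export GIT_AUTHOR_NAME="Joe Developer" GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Stand-ins for imported commits, using the message formats described above.
git commit -q --allow-empty -m '[svn-r1234] hypothetical trunk commit'
git commit -q --allow-empty -m '[svn-synthetic] tags/release/mpich2-1.4.1p1'
git tag -a -m 'release 1.4.1p1' v1.4.1p1

# Queries that apply equally to a real clone:
git log --oneline --grep='^\[svn-r'   # commits imported from SVN trunk
git tag -l 'v*'                       # imported release tags
git cat-file -t v1.4.1p1              # object type of the tag ("tag" when annotated)
```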

Import Process and Caveats

The history was imported by a custom script because the MPICH SVN repository was more complicated than git-svn could handle. Specifically, the use of svn:externals caused two problems. Problem #1 is that git-svn cannot handle any form of SVN external natively. Problem #2 is that our past use of relative SVN externals (e.g., for confdb) was unversioned. This means that neither svn export -r XYZ $SVN_PATH nor the pinned-revision variant (@XYZ) would actually reproduce the correct working copy at revision XYZ if the confdb directory had been changed since XYZ. So the script jumped through a number of hoops in order to provide the expected result from svn export. Branch points were computed by hand, rather than attempting to teach the script to do this.

Why Git?

SVN has numerous, well documented deficiencies:

  • Branching and merging are nightmares.
  • Inspecting history is much more difficult than it is in git.
  • Working offline in SVN is limited.
  • Performance is slow.

SVN had only three things in its favor: inertia (we already had it installed, with other infrastructure built around it), support for fine-grained permissions via the MCS "authz" web page, and the fact that basically everyone knows how to use it at this point. Eventually our SVN pain began to exceed the inertia benefit, and MCS Systems provided gitolite so that we could self-administer permissions with finer granularity. The education issue is unfortunate, but it must be overcome every time a VCS becomes obsolete (the same thing happened for the CVS-->SVN migration).

Why was git chosen over another system (Mercurial, bzr, something else...)? Git arguably has the greatest slice of the distributed VCS market right now, so more of the world will know how to use it to interact with our project. Several existing team members already used git regularly, through a limited git-svn clone of the MPICH SVN repository.

Dealing With Development Branches/Repositories

In SVN we had branches in https://svn.mcs.anl.gov/repos/mpi/mpich2/branches/dev, which we would typically refer to as dev/FOO. Many of these branches had restricted permissions, especially when used to collaborate on a research paper or with a vendor. Because of git's distributed nature, it is difficult/impossible to restrict read permissions for a specific branch within a repository. So these development branches have each been put into their own new repositories. Not all development branches were migrated (only ones actively in use).

The basic pattern is that SVN branches named dev/FOO have been placed into a repository named dev/FOO, containing a sole branch also named FOO. These repositories are not listed via git-web or the git daemon, so you must use the SSH form of the clone URL when cloning or adding a remote for these development repositories. The basic procedure for adding these dev branches to a local, already cloned copy of "origin" is:

% git remote add dev/FOO --fetch git@git.mpich.org:dev/FOO.git
Updating dev/FOO
remote: Counting objects: 5858, done.
remote: Compressing objects: 100% (2354/2354), done.
remote: Total 3596 (delta 1777), reused 3008 (delta 1228)
Receiving objects: 100% (3596/3596), 2.90 MiB, done.
Resolving deltas: 100% (1777/1777), completed with 777 local objects.
From git.mpich.org:dev/FOO
 * [new branch]      FOO -> dev/FOO/FOO

That is, we are actually doing two things:

  • adding a new git "remote" by the name of dev/FOO;
  • fetching its content, especially the FOO branch. The "remote tracking branch" in our local repository is then named dev/FOO/FOO (mildly confusing, unfortunately).

You will probably then want to create a local branch to track the remote branch:

% git branch FOO dev/FOO/FOO
% git checkout FOO

You can now start hacking away on your local FOO branch.
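
To see how the remote name, the remote tracking branch, and the local branch relate, the procedure can be tried against a scratch repository. In this sketch a local bare repository (dev-FOO.git, a made-up path) stands in for git@git.mpich.org:dev/FOO.git, and git checkout -b is used as a one-step equivalent of the git branch and git checkout pair shown above.

```shell
set -eu
# Identity used only for this scratch session.
export GIT_AUTHOR_NAME="Joe Developer" GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

top=$(mktemp -d)

# Scratch stand-in for the dev/FOO repository, holding a single branch FOO.
git init -q --bare "$top/dev-FOO.git"
git init -q "$top/seed"
cd "$top/seed"
git symbolic-ref HEAD refs/heads/FOO
git commit -q --allow-empty -m 'work on FOO'
git push -q "$top/dev-FOO.git" FOO

# Scratch stand-in for your existing local clone.
git init -q "$top/mpich"
cd "$top/mpich"

# Add the remote under the name dev/FOO and fetch it right away.
git remote add --fetch dev/FOO "$top/dev-FOO.git" >/dev/null 2>&1

# Create the local branch from the remote tracking branch and switch to it.
git checkout -q -b FOO dev/FOO/FOO

# Ask git which remote tracking branch the local FOO follows.
upstream=$(git rev-parse --abbrev-ref 'FOO@{upstream}')
echo "FOO tracks $upstream"
```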

In the unlikely case you just want to check out a dev branch and don't want to bother with the "origin" repo too, you can do this instead:

% git clone --origin dev/FOO --branch BRANCH git@git.mpich.org:dev/FOO mpich-FOO.git

(where BRANCH is probably FOO, but could be something else).

At the moment we do not have any easy way to list all of the available dev/ branches unless you have permissions to view the gitolite-admin.git repository. If you think you should have access to a particular development branch, contact devel@mpich.org or the specific MPICH core developer with whom you are working. You can list the dev branches to which you already have access by running:

% ssh git@git.mpich.org info
hello goodell, this is git@caveat running gitolite3 v3.2-13-gf89408a on git 1.7.0.4

 R W    dev/FOO
 R W    gitolite-admin
 R W    mpich

The Pro Git book has more information about working with remotes online.

Advanced Topics

Managing Access Controls

The git repositories on git.mpich.org are hosted at MCS using gitolite. Gitolite has a very informative manual, which I recommend reading if you have questions about the overall setup or detailed permissions issues.

Access to these repositories is controlled through a git repository that is also hosted on the same gitolite server. To access it, clone git@git.mpich.org:gitolite-admin.git. This repository contains two primary parts: a configuration file (conf/gitolite.conf) and a directory full of public SSH keys. The configuration file specifies which repositories are valid on the server and which users have particular permissions to access those repositories. At the top of the configuration file is a nice big comment that explains the basic format and permissions rules. If at all in doubt, consult the manual and/or goodell@ before making a change. The rules are not hard, but, just like making firewall rule changes, small mistakes can lead to real problems. The keydir contains files with the format USERNAME.pub or USERNAME@NUM.pub. These files should each contain a valid public SSH key for the user given by USERNAME.

Permission changes and repository creation are triggered by pushes to the gitolite-admin.git repository. So once you make your changes to this repository, commit them (git commit ...) and then push them back up to git@git.mpich.org:gitolite-admin.git.

MPICH-core committers can also create repositories by pushing to any repository that has a name of the format u/USERNAME/REPONAME or papers/REPONAME. Permissions for these repositories are managed by the creator by running ssh git@git.mpich.org perms .... The gitolite manual has a section explaining this command. This eliminates the need to fiddle with the gitolite-admin repository, but you will then need to use this alternative ssh-based method if permissions need to be changed.

Updating the OpenPA Subtree

To be written.

Review Repository/Branches

(see also Jenkins#.22mpich-review.22)

In addition to the primary repository (mpich.git or origin), there is a repository to facilitate peer code review and testing. This repository is accessible through this git URL: git@git.mpich.org:review.git. If you have regular RW push access to origin, then you should have RW+ access to review.git. That is, you can push, rewind, and delete branches on review.git. So if a colleague reviews your code and spots a bug that needs to be fixed, or the continuous integration system catches a problem, then you can choose to rewrite the commits on the given topic branch and force-push the new branch to the review repository.

Any branch which is pushed to this repository will automatically be tested by our Jenkins server. Unstable/failure result emails will be sent to builds@mpich.org (list subscription is moderated), though unfortunately "stable" results will not reliably be emailed given our current configuration.

Example Review Branch Usage

Build a feature on a feature branch ("foo") which is based on origin/master:

% git checkout -b foo origin/master
...
% vim foo.c
...
% git add foo.c
% git commit -m 'added foo capability'
...

We want to make sure it passes the tests and give a colleague a chance to review this new code. So let's push it up to the review repo instead of pushing directly to origin/master:

% git push git@git.mpich.org:review.git foo:foo

(note that the git@... URL could be replaced with a proper "git remote" name if you choose to add the review repository as a real remote)

Now email your colleague and let them know that you would like them to review the code on review/foo. If you are really unsure whether the continuous integration tests will pass, maybe wait until they complete. You can check the results here (from within the ANL firewall only). Let's say that either your colleague or the CI system found a problem. Let's fix it and re-push the updated version to the review repository to make sure that it still passes the tests:

% git checkout foo
...
% vim foo.c
...
% git add foo.c
% git commit --amend -m 'added foo capability, but better'
...
% git push git@git.mpich.org:review.git +foo:foo

(the + at the beginning of the refspec indicates that you want to forcibly overwrite the destination reference)

Once you are finally happy with the code, push it to origin/master. Obviously, master may have changed since you originally began the work, so you may need to either merge into the latest master or rebase on top of master first:

% git checkout master
% git merge foo
...
% git push origin master
...

(the origin/master branch is also watched by the continuous integration system, so the final merged version will also get tested if you don't want to push the merged version up to the review repo first)

At this point you would also ideally remember to delete the review branch, though these things are easy enough to delete later in a big cleanup pass:

% git push git@git.mpich.org:review.git :foo

(blank left side of the : in the refspec indicates that the destination ref should be deleted)
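
The three refspec forms used in this walkthrough (foo:foo, +foo:foo, and :foo) can be tried out against a throwaway repository before touching the real review repository. In the sketch below a local bare repository (review.git under a temporary directory, a made-up path) stands in for git@git.mpich.org:review.git, and it is added as a remote named review so that the URL does not have to be repeated.

```shell
set -eu
# Identity used only for this scratch session.
export GIT_AUTHOR_NAME="Joe Developer" GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

top=$(mktemp -d)
git init -q --bare "$top/review.git"    # stand-in for the review repository
git init -q "$top/work"
cd "$top/work"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m 'base commit'
git remote add review "$top/review.git"

# Build the feature on a topic branch and push it for review.
git checkout -q -b foo
echo 'int foo;' > foo.c
git add foo.c
git commit -q -m 'added foo capability'
git push -q review foo:foo

# Rewrite the commit after review feedback; the leading + forces the overwrite.
echo 'int foo = 0;' > foo.c
git add foo.c
git commit -q --amend -m 'added foo capability, but better'
git push -q review +foo:foo

# Delete the review branch: an empty source side means "delete the destination".
before=$(git ls-remote --heads review foo)
git push -q review :foo
after=$(git ls-remote --heads review foo)
echo "before delete: ${before:-<none>}"
echo "after delete:  ${after:-<none>}"
```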

Creating a New Repository from MPICH Origin

To restrict permissions for a specific development branch, a new repository may be created. The example below shows how to create a new repository from the MPICH origin/master. Note that these steps need to be performed by users with administrative privileges on the MPICH Git system.

First, create the new repository: in your clone of the gitolite-admin repository, edit the conf/gitolite.conf file to add an entry for the new repository with the correct permissions (comments in that file will guide you through), then commit and push the change. For more information, see the Managing Access Controls section earlier.

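Next, if the new repository is named foo, add it as a remote, refresh origin/master, and push it into the new repository. The destination is spelled out as refs/heads/master because the new repository is still empty, so git cannot work out where an unqualified master should go when the source is a remote tracking branch:

% git remote add foo git@git.mpich.org:foo.git
% git fetch origin
% git push foo origin/master:refs/heads/master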
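
The push step can be rehearsed with throwaway repositories before doing it for real. In this sketch two local bare repositories (origin.git and foo.git, made-up paths) stand in for the MPICH origin and for the empty repository that gitolite would create. Note one deliberate difference from a plain master destination: the sketch pushes to refs/heads/master, on the assumption that the destination repository is empty and git therefore has no existing ref to match an unqualified name against.

```shell
set -eu
# Identity used only for this scratch session.
export GIT_AUTHOR_NAME="Joe Developer" GIT_AUTHOR_EMAIL="joe@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

top=$(mktemp -d)

# Scratch "origin" with one commit on master.
git init -q --bare "$top/origin.git"
git --git-dir="$top/origin.git" symbolic-ref HEAD refs/heads/master
git init -q "$top/seed"
cd "$top/seed"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m 'initial commit'
git push -q "$top/origin.git" master

# Empty repository standing in for the one created through gitolite-admin.
git init -q --bare "$top/foo.git"

# An ordinary clone of origin, as a developer would already have.
git clone -q "$top/origin.git" "$top/mpich"
cd "$top/mpich"

# Add the new repository as a remote and seed it from origin/master.
git remote add foo "$top/foo.git"
git fetch -q origin
git push -q foo origin/master:refs/heads/master

# The new repository's master now points at the same commit as origin/master.
git ls-remote --heads foo
```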