The Universe of Disco


Thu, 15 Mar 2012

My Git Habits
Miles Gould asked his Twitter followers whether they used git-add -p or git-commit -a and how often. My reply was too long for Twitter, so here it is.

First the short version: I use git-add -p frequently, and git-commit -a almost never. The exception is when I'm working on the repo that holds my blog, where I rarely commit changes to more than one or two files at a time. Then I'll usually just git-commit -a -m ....

But I use git-add -p all the time. Typically what will happen is that I will be developing some fairly complicated feature. It will necessitate a bunch of changes and reshuffling elsewhere in the system. I'll make commits on the topic branch as I go along without worrying too much about whether the commits are neatly packaged.

Often I'll be in the middle of something, with a dirty work tree, when it's time to leave for the day. Then I'll just commit everything with the subject WIP ("work-in-progress"). First thing the next morning I'll git-reset HEAD^ and continue where I left off.

So the model is that the current head is usually a terrible mess, accumulating changes as it moves forward in time. When I'm done, I will merge the topic into master and run the tests.

If they pass, I am not finished. The merge I just created is only a draft merge. The topic branch is often full of all sorts of garbage, commits where I tried one approach, found it didn't work later on, and then tried a different approach, places where I committed debugging code, and so on. So it is now time to clean up the topic branch. Only the cleaned-up topic branch gets published.

Cleaning up messy topic branches

The core of the cleanup procedure is to reset the head back to the last place that look good, possibly all the way back to the merge-base if that is not too long ago. This brings all the topic changes into the working directory. Then:

  1. Compose the commits: Repeat until the working tree is clean:
    1. Eyeball the output of git-diff
    2. Think of an idea for an intelligible commit
    3. Use git-add -p to stage the planned commit
    4. Use git diff --cached to make sure it makes sense
    5. Commit it
  2. Order the commits: Use git-rebase --interactive
Notice that this separates the work of composing the commits from the work of ordering them. This is more important than it might appear. It would be extremely difficult to try to do these at the same time. I can't know the sensible order for the commits until I know what the commits are! But it's very hard to know what the commits are without actually making them.

By separating these tasks, I can proceed something like this: I eyeball the diff, and the first thing I see is something about the penguin feature. I can immediately say "Great, I'll make up a commit of all the stuff related to the penguin feature", and proceed to the git-add -p step without worrying that there might be other stuff that should precede the penguin feature in the commit sequence. I can focus on just getting the penguin commit right without needing to think about any of the other changes.

When the time comes to put the commits in order, I can do it well because by then I have abstracted away all the details, and reduced each group of changes to a single atomic unit with a one-line description.

For the most complicated cases, I will print out the diffs, read them over, and mark them up in six colors of highlighter: code to throw away gets marked in orange; code that I suspect is erroneous is pink. I make many notes in pen to remind me how I want to divide up the changes into commits. When a commit occurs to me I'll jot a numbered commit message, and then mark all the related parts of the diff with that number. Once I have the commits planned, I'll reset the topic ref and then run through the procedure above, using git-add -p repeatedly to construct the commits I planned on paper. Since I know ahead of time what they are I might do them in the right order, but more likely I'll just do them in the order I thought of them and then reorder them at the end, as usual.

For simple cases I'll just do a series of git-rebase --interactive passes, pausing at any leftover WIP commits to run the loop above, reordering the commits to squash related commits together, and so on.

The very simplest cases of all require no cleanup, of course.

For example, here's my current topic branch, called c-domain, with the oldest commits at the top:

        055a2f7 correction to bulk consumer template
        d9630bd DomainActivator half of Pobox Domain consumer
        ebebb4a Add HasDomain role to provide ->domain reader for domain consumers
        ade6ac6 stubbed domain test
        e170e77 start templates for Pobox domain consumers
        067ca81 stubbed Domain::ThumbTwiddler
        685a3ee cost calculations for DomainActivator
        ec8b1cc test fixes; trivial domain test passes now
        845b1f2 rename InvoiceCharge::CreateDomain to ..::RegisterDomain
(e)     6083a97 add durations to Domain consumers and charges
        c64fda0 tests for Domain::Activator consumer
        41e4292 repeat activator tests for 1-year and 3-year durations
        7d68065 tests for activator's replacement
(d)     87f3b09 move days_in_year to Moonpig::Util
        3cd9f3b WIP
        e5063d4 add test for sent invoice in domain.t
        c8dbf41 WIP
        9e6ffa4 add missing MakesReplacement stuff
        fc13059 bring in Net::OpenSRS module
(c)     52c18fb OpenSRS interface
        893f16f notes about why domain queries might fail
(b)     f64361f rename "croak" method to "fail" to avoid conflicts
        4e500ec Domain::Activator initial_invoice_charge_pairs
(a)     3c5cdd4 WIP
3c5cdd4 (a) was the end-of-day state for yesterday; I made it and pushed it just before I dashed out the door to go home. Such commits rarely survive beyond the following morning, but if I didn't make them, I wouldn't be able to continue work from home if the mood took me to do that.

f64361f (b) is a prime candidate for later squashing. 5c218fb (c) introduced a module with a "croak" method. This turned out to be a stupid idea, because this conflicted with the croak function from Perl's Carp module, which we use everywhere. I needed to rename it. By then, the intervening commit already existed. I probably should have squashed these right away, but I didn't think of it at the time. No problem! Git means never having to say "If only I'd realized sooner."

Similarly, 6083a97 (e) added a days_in_year function that I later decided at 87f3b09 (d) should be in a utility module in a different repository. 87f3b09 will eventually be squashed into 6083a97 so that days_in_year never appears in this code at all.

I don't know what is in the WIP commits c8dbf41 or 3cd9f3b, for which I didn't invent commit messages. I don't know why those are left in the tree, but I can figure it out later.

An example cleanup

Now I'm going to clean up this branch. First I git-checkout -b cleanup c-domain so that if something goes awry I can start over completely fresh by doing git-reset --hard c-domain. That's probably superfluous in this case because origin/c-domain is also pointing to the same place, and origin is my private repo, but hey, branches are cheap.

The first order of business is to get rid of those WIP commits. I'll git-reset HEAD^ to bring 3c5cdd4 into the working directory, then use git-status to see how many changes there are:

         M lib/Pobox/Moonpig/Consumer/Domain/Activator.pm
         M lib/Pobox/Moonpig/Role/HasDomain.pm
         M lib/Pobox/Moonpig/TemplateSet.pm
        ?? bin/register_domains
         M t/consumer/domain.t
        ?? t/lib/MockOpenSRS.pm
(This is the output from git-status --short, for which I have an alias, git s. I use this probably 99 times as often as plain git-status.)

Not too bad, probably no need for a printout. The new bin/register-domains program can go in right away by itself:

        % git add bin
        % git commit -m 'new register_domains utility program'
Next I'll deal with that new mock object class in t/lib/MockOpenSRS.pm. I'll add that, then use git-add -p to add the related changes from the other files:

        % git add t/lib
        % git add -p
        ...
        % git s
        MM lib/Pobox/Moonpig/Consumer/Domain/Activator.pm
         M lib/Pobox/Moonpig/Role/HasDomain.pm
         M lib/Pobox/Moonpig/TemplateSet.pm
        A  t/lib/MockOpenSRS.pm
        MM t/consumer/domain.t
        % git ix
        ...
The git ix command at the end there is an alias for git diff --cached: it displays what's staged in the index. The output looks good, so I'll commit it:

        % git commit -m 'mock OpenSRS object; add tests'
Now I want to see if those tests actually pass. Maybe I forgot something!
        % git stash
        % make test
        ...
        OK
        % git stash pop
The git-stash command hides the unrelated changes from the test suite so that I can see if the tests I just put into t/consumer/domain.t work properly. They do, so I bring back the stashed changes and continue. If they didn't, I'd probably amend the last commit with git commit --amend and try again.

Continuing:

        % git diff
        ...
        % git add -p lib/Pobox/Moonpig/Role/HasDomain.pm
        ...
        % git commit -m 'Domains do not have explicit start dates'
        % git diff
        ...
        % git add -p
        ...
        % git commit --fixup :/mock
That last bit should have been part of the "mock OpenSRS object" commit, but I forgot it. So I make a fixup commit, which I'll merge into the main commit later on. A fixup commit is one whose subject begins with fixup!. Did you know that you can name a commit by writing :/text, and it names the most recent commit whose message contains that text?

It goes on like that for a while:

        % git diff
        ...
        % git add -p ...
        ...
        % git commit -m 'Activator consumer can generate special charges'
        % git diff
        ...
        % git checkout lib/Pobox/Moonpig/Role/HasDomain.pm
The only uncommitted change left in HasDomain.pm was a superfluous line, so I just threw it away.

        % git diff
        ...
        % git add -u
        % git commit -m 'separate templates for domain-registering and domain-renewing consumers'
By this time all the remaining changes belong in the same commit, so I use git-add -u to add them all at once. The working tree is now clean. The history is as I showed above, except that in place of the final WIP commit, I have:

        a3c0b92 new register_domains utility program
        53d704d mock OpenSRS object; add tests
        a24acd8 Domains do not have explicit start dates
        17a915d fixup! mock OpenSRS object; add tests
        86e472b Activator consumer can generate special charges
        5b2ad2b separate templates for domain-registering and domain-renewing consumers
(Again the oldest commit is first.) Now I'll get rid of that fixup!:

        % git rebase -i --autosquash HEAD~6
Because of --autosquash, the git-rebase menu is reordered so that the fixup commit is put just after the commit it fixes up, and its default action is 'fixup' instead of 'pick'. So I don't need to edit the rebase instructions at all. But I might as well take the opportunity to put the commits in the right order. The result is:

        a3c0b92 new register_domains utility program
        ea8dacd Domains do not have explicit start dates
        297366a separate templates for domain-registering and domain-renewing consumers
        4ef0e28 mock OpenSRS object; add tests
        c3ab1eb Activator consumer can generate special charges
I have two tools for dealing with cleaned-up branches like this one. One is git-vee, which compares two branches. It's just a wrapper around the command git log --decorate --cherry-mark --oneline --graph --boundary A"..."B.

Here's a comparison the original c-domain branch and my new cleanup version:

        % git vee c-domain
        * c3ab1eb (HEAD, cleanup) Activator consumer can generate special charges
        * 4ef0e28 mock OpenSRS object; add tests
        * 297366a separate templates for domain-registering and domain-renewing consumer
        * ea8dacd Domains do not have explicit start dates
        * a3c0b92 new register_domains utility program
        | * 3c5cdd4 (origin/c-domain, c-domain) WIP
        |/  
        o 4e500ec Domain::Activator initial_invoice_charge_pairs
This clearly shows where the original and cleaned up branches diverge, and what the differences are. I also use git-vee to compare pre- and post-rebase versions of branches (with git-vee ORIG_HEAD) and local branches with their remote tracking branches after fetching (with git-vee remote or just plain git-vee).

A cleaned-up branch should usually have the same final tree as the tree at the end of the original branch. I have another tool, git-treehash, which compares trees. By default it compares HEAD with ORIG_HEAD, so after I use git-rebase to squash or to split commits, I sometimes run "git treehash" to make sure that the tree hasn't changed. In this example, I do:

        % git treehash c-domain HEAD
        d360408d1afa90e0176aaa73bf8d3cae641a0850 HEAD
        f0fd6ea0de7dbe60520e2a69fbec210260370d78 c-domain
which tells me that they are not the same. Most often this happens because I threw away all the debugging code that I put in earlier, but this time it was because of that line of superfluous code I eliminated from HasDomain.pm. When the treehashes differ, I'll use git-diff to make sure that the difference is innocuous:

        % git diff c-domain
        diff --git a/lib/Pobox/Moonpig/Role/HasDomain.pm b/lib/Pobox/Moonpig/Role/HasDomain.pm
        index 3d8bb8c..21cb752 100644
        --- a/lib/Pobox/Moonpig/Role/HasDomain.pm
        +++ b/lib/Pobox/Moonpig/Role/HasDomain.pm
        @@ -5,7 +5,6 @@ use Carp qw(croak confess);
         use ICG::Handy qw(is_domain);
         use Moonpig::Types qw(Factory Time);
         use Moose::Util::TypeConstraints qw(duck_type enum subtype);
        -use MooseX::SetOnce;

         with (
           'Moonpig::Role::StubBuild',

Okay then.

The next task is probably to deal with the older WIP commits. This time I'll omit all the details. But the enclosing procedure looks like this:

        % git checkout -b wip-cleanup c8dbf41
        % git reset HEAD^
        % ... (a lot of git-add -p as above) ...
        ...

        % git vee c8dbf41
        * 4c6ff45 (wip-cleanup) get rid of unused twiddler test
        * b328de5 test full payment cycle
        * 201a4f2 abstract out pay_invoice operation
        * 55ae45e add upper limit (default 30d) to wait_until utility
        | * c8dbf41 WIP
        |/  
        o e5063d4 add test for sent invoice in domain.t

        % git treehash c8dbf41 HEAD
        7f52ba68923e2ede8fda407ffa9c06c5c48338ae
        % git checkout cleanup
        % git rebase wip-cleanup
The output of git-treehash says that the tree at the end of the wip-cleanup branch is identical to the one in the WIP commit it is supposed to replace, so it's perfectly safe to rebase the rest of the cleanup branch onto it, replacing the one WIP commit with the four new commits in wip-cleanup. Now the cleaned up branch looks like this:

        % git vee c-domain
        * a425aa1 (HEAD, cleanup) Activator consumer can generate special charges
        * 2bb0932 mock OpenSRS object; add tests
        * a77bfcb separate templates for domain-registering and domain-renewing consumer
        * 4c44db2 Domains do not have explicit start dates
        * fab500f new register_domains utility program
        = 38018b6 Domain::Activator initial_invoice_charge_pairs
        = aebbae6 rename "croak" method to "fail" to avoid conflicts
        = 45a224d notes about why domain queries might fail
        = 80e4a90 OpenSRS interface
        = 27f4562 bring in Net::OpenSRS module
        = f5cb624 add missing MakesReplacement stuff
        * 4c6ff45 (wip-cleanup) get rid of unused twiddler test
        * b328de5 test full payment cycle
        * 201a4f2 abstract out pay_invoice operation
        * 55ae45e add upper limit (default 30d) to wait_until utility
        | * 3c5cdd4 (origin/c-domain, c-domain) WIP
        | = 4e500ec Domain::Activator initial_invoice_charge_pairs
        | = f64361f rename "croak" method to "fail" to avoid conflicts
        | = 893f16f notes about why domain queries might fail
        | = 52c18fb OpenSRS interface
        | = fc13059 bring in Net::OpenSRS module
        | = 9e6ffa4 add missing MakesReplacement stuff
        | * c8dbf41 WIP
        |/  
        o e5063d4 add test for sent invoice in domain.t
git-vee marks a commit with an equal sign instead of a star if it's equivalent to a commit in the other branch. The commits in the middle marked with equals signs are the ones that weren't changed. The upper WIP was replaced with five commits, and the lower one with four.

I've been planning for a long time to write a tool to help me with breaking up WIP commits like this, and with branch cleanup in general: It will write each changed hunk into a file, and then let me separate the hunk files into several subdirectories, each of which represents one commit, and then it will create the commits automatically from the directory contents. This is still only partly finished, but I think when it's done it will eliminate the six-color diff printouts.

[ Addendum 20120404: Further observation has revealed that I almost never use git-commit -a, even when it would be quicker to do so. Instead, I almost always use git-add -u and then git-commit the resulting index. This is just an observation, and not a claim that my practice is either better or worse than using git-commit -a. ]

[ Addendum 20120825: There is now a followup article about how to manage rewriting of published history. ]


[Other articles in category /prog] permanent link

Sun, 04 Mar 2012

Why can't Git resolve all conflicted merges?
I like to be prepared ahead of time for questions, and one such question is why Git can't resolve all merge conflicts automatically. People do show up on IRC asking this from time to time. If you're a sophisticated user the answer is obvious, but I've made a pretty good living teaching classes to people who don't find such things obvious.

What we need is a nice example. In the past my example was sort of silly. You have a file that contains the instruction:

           Pay potato tax every April 15
One branch adds an exception:
           Pay potato tax every April 15
             (Except in years of potato blight.)
While another branch broadens the original instruction:

           Pay all tax due every April 15
What's the correct resolution here? It's easy to understand that mashing together the two changes is a recipe for potential catastrophe:

           Pay all tax due every April 15
             (Except in years of potato blight.)
You get fined for tax evasion after the next potato blight. And it's similarly easy to construct scenarios in which the correct resolution is to leave the whole thing in place including the modifier, change the thing to something else completely, delete the whole thing, or to refer the matter to Legal and shut down the whole system until you hear back. Clearly it's outside Git's scope to recognize when to call in the lawyers, much less to predict what their answer will be.

But a few months ago I ran into a somewhat less silly example. At work we had two seprate projects, "Moonpig" and "Stick", each in its own repository. Moonpig contained a subsystem, "Collections", which we decided would make more sense as part of Stick. I did this work, removing the Collections code from the Moonpig project and integrating it into the Stick project. From the point of view of the Moonpig repository, the Collections system was deleted entirely.

Meanwhile, on a parallel branch of Moonpig, R.J.B. Signes made some changes that included bug fixes to the Collections. After I removed the collections, he tried to merge his changes into the master branch, and got a merge conflict, because some of the files to which he was making bug fixes were no longer there.

The correct resolution was to perform the rest of the merge without the bug fixes, which Git could conceivably have done. But then the unapplied bug fixes needed to be applied to the Collections module that was now in the completely separate Stick project, and there is no way Git could have done this, or even to have known it should be done. Human intervention was the only answer.


[Other articles in category /prog] permanent link