The Universe of Disco


Wed, 06 Jul 2022

Things I wish everyone knew about Git (Part II)

This is a writeup of a talk I gave in December for my previous employer. It's long, so I'm publishing it in several parts.

The most important material is in Part I.

It is really hard to lose stuff

A Git repository is an append-only filesystem. You can add snapshots of files and directories, but you can't modify or delete anything. Git commands sometimes purport to modify data. For example git commit --amend suggests that it amends a commit. It doesn't. There is no such thing as amending a commit; commits are immutable.

Rather, it writes a completely new commit, and then kinda turns its back on the old one. But the old commit is still in there, pristine, forever.

In a Git repository you can lose things, in the sense of forgetting where they are. But they can almost always be found again, one way or another, and when you find them they will be exactly the same as they were before. If you git commit --amend and change your mind later, it's not hard to get the old ⸢unamended⸣ commit back if you want it for some reason.
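Here is a small sketch of that claim that you can run in a scratch directory. The temporary repository, the identity settings, and the file name notes.txt are all made up for the demonstration. It amends a commit and then shows that the original commit is still readable under its old SHA.

```shell
# Throwaway repository; path, identity, and file name are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
old=$(git rev-parse HEAD)          # SHA of the original commit

echo "second draft" > notes.txt
git add notes.txt
git commit -q --amend -m "Add notes (amended)"
new=$(git rev-parse HEAD)          # SHA of the replacement commit

# The branch now points at a different commit...
[ "$old" != "$new" ] && echo "amend wrote a new commit"

# ...but the old one is still in the repository, unchanged.
git cat-file -t "$old"             # object type of the old SHA
git show "$old:notes.txt"          # the file as the old commit recorded it
```

Nothing here had to "undo" the amend. The old commit never went anywhere, so asking for it by SHA is enough.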

  • If you have the SHA for a file, it will always be the exact same version of the file with the exact same contents.

  • If you have the SHA for a directory (a “tree” in Git jargon) it will always contain the exact same versions of the exact same files with the exact same names.

  • If you have the SHA for a commit, it will always contain the exact same metainformation (description, when made, by whom, etc.) and the exact same snapshot of the entire file tree.

Objects can have other names and descriptions that come and go, but the SHA is forever.
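One way to convince yourself of this is to ask Git for the SHA of some content directly. The sketch below uses git hash-object on three made-up files: two with identical contents and one that differs by a single letter.

```shell
# Scratch directory; the file names are arbitrary.
dir=$(mktemp -d)
cd "$dir"

# Same bytes in, same SHA out, regardless of the file's name.
printf 'hello\n' > a.txt
printf 'hello\n' > b.txt
sha_a=$(git hash-object a.txt)
sha_b=$(git hash-object b.txt)
echo "$sha_a"
echo "$sha_b"

# Change one letter and the SHA is completely different.
printf 'hellp\n' > c.txt
sha_c=$(git hash-object c.txt)
echo "$sha_c"
```

The SHA is computed from the contents alone, which is why a given SHA can only ever mean one version of one file.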

(There's a small qualification to this: if the SHA is the only way to refer to a certain object, if it has no other names, and if you haven't used it for a few months, Git might discard it from the repository entirely.)

But what if you do lose something?

There are many good answers to this question but I think the one to know first is git-reflog, because it covers the great majority of cases.

The git-reflog command means:

“List the SHAs of commits I have visited recently”

When I run git reflog, the top of the output shows which commits I had checked out recently, with the top line being the commit I have checked out right now:

    523e9fa1 HEAD@{0}: checkout: moving from dev to pasha
    5c31648d HEAD@{1}: pull: Fast-forward
    07053923 HEAD@{2}: checkout: moving from pr2323 to dev
    ...

The last thing I did was check out the branch named pasha; its tip commit is at 523e9fa1.

Before that, I did git pull and Git updated my local dev branch from the remote one, updating it to 5c31648d.

Before that, I had switched to dev from a different branch, pr2323. At that time, before the pull, dev referred to commit 07053923.

Farther down in the output are some commits I visited last August:

    ...
    58ec94f6 HEAD@{928}: pull --rebase origin dev: checkout 58ec94f6d6cb375e09e29a7a6f904e3b3c552772
    e0cfbaee HEAD@{929}: commit: WIP: model classes for condensedPlate and condensedRNAPlate
    f8d17671 HEAD@{930}: commit: Unskip tests that depend on standard seed data
    31137c90 HEAD@{931}: commit (amend): migrate pedigree tests into test/pedigree
    a4a2431a HEAD@{932}: commit: migrate pedigree tests into test/pedigree
    1fe585cb HEAD@{933}: checkout: moving from LAB-808-dao-transaction-test-mode to LAB-815-pedigree-extensions
    ...

[Image: flux capacitor (magic time-travel doohickey) from “Back to the Future”]

Suppose I'm caught in some horrible Git nightmare. Maybe I deleted the entire test suite or accidentally put my Small Wonder fanfic into a commit message or overwrote the report templates with 150 gigabytes of goat porn. I can go back to how things were before. I look in the reflog for the SHA of the commit just before I made my big blunder, and then:

    git reset --hard 881f53fa

Phew, it was just a bad dream.

(Of course, if my colleagues actually saw the goat porn, it can't fix that.)
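If you'd like to rehearse this before you need it, here is a sketch of the whole nightmare in a throwaway repository. The file tests.txt and the commit messages are invented. The blunder is committing the deletion of the test suite, and the rescue is a reset to the reflog entry from just before.

```shell
# Throwaway repository; names and messages are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "all tests pass" > tests.txt
git add tests.txt
git commit -q -m "Add test suite"
good=$(git rev-parse HEAD)         # the state we will want back

# The blunder: delete the test suite and commit the deletion.
git rm -q tests.txt
git commit -q -m "Oops"

# The reflog still lists the commit from before the blunder.
git reflog

# HEAD@{1} is where HEAD was one step ago; a SHA copied from the reflog works too.
git reset -q --hard "HEAD@{1}"
cat tests.txt                      # the test suite is back
```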

I would like to nominate Wile E. Coyote to be the mascot of Git. Because Wile E. is always getting himself into situations like this one:

[Image: Wile E., a cartoon coyote, has just fired a shotgun at Bugs Bunny. For some reason the shotgun has fired backwards and blown his face off, as Git sometimes does.]

But then, in the next scene, he is magically unharmed. That's Git.

Finding old stuff with git-reflog

  • git reflog by itself lists the places that HEAD has been
  • git reflog some-branch lists the places that some-branch has been
  • That HEAD@{1} thing in the reflog output is another way to name that commit if you don't want to use the SHA.
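  • A bare @{1} looks like an abbreviation of HEAD@{1}, but it reads the reflog of the current branch rather than of HEAD: if dev is checked out, @{1} means dev@{1}.
  • The following locutions can be used with any git command that wants you to identify a commit:

    • @{17} (the current branch as it was 17 changes ago)
    • @{18:43} (the current branch as it was at 18:43 today)
    • @{yesterday} (the current branch as it was 24 hours ago)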
    • dev@{'3 days ago'} (dev as it was 3 days ago)
    • some-branch@{'Aug 22'} (some-branch as it was last August 22)

    (Use with git-checkout, git-reset, git-show, git-diff, etc.)

  • Also useful:

    git show dev@{'Aug 22'}:path/to/some/file.txt
    

    “Print out that file, as it was on dev, as dev was on August 22”

It's all still in there.
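A quick sketch of these names in action, again in a scratch repository. The branch name dev matches the examples above, but file.txt and the commit messages are made up. I've used the numbered form dev@{1} rather than a date so the result doesn't depend on when you run it.

```shell
# Throwaway repository; file name and messages are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/dev   # make the first branch be called dev

echo "version 1" > file.txt
git add file.txt
git commit -q -m "First version"
first=$(git rev-parse HEAD)

echo "version 2" > file.txt
git commit -q -am "Second version"

git reflog dev                         # the places dev has been
git rev-parse "dev@{1}"                # dev one change ago, i.e. the first commit
git show "dev@{1}:file.txt"            # the file as it was on dev back then
```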

What if you can't find it?

Don't panic! Someone with more experience can probably find it for you. If you have a local Git expert, ask them for help.

And if they are busy and can't help you immediately, the thing you're looking for won't disappear while you wait for them. The repository is append-only. Every version of everything is saved. If they could have found it today, they will still be able to find it tomorrow.

(Git will eventually throw away lost and unused snapshots, but typically not anything you have used in the last 90 days.)

What if you regret something you did?

Don't panic! It can probably be put back the way it was.

Git leaves a trail

When you make a commit, Git prints something like this:

    your-topic-branch 4e86fa23 Rework foozle subsystem

If you need to find that commit again, the SHA 4e86fa23 is in your terminal scrollback.

When you fetch a remote branch, Git prints:

       6e8fab43..bea7535b  dev        -> origin/dev

What commit was origin/dev before the fetch? At 6e8fab43. What commit is it now? bea7535b.

What if you want to look at how it was before? No problem, 6e8fab43 is still there. It's not called origin/dev any more, but the SHA is forever. You can still check it out and look at it:

    git checkout -b how-it-was-before 6e8fab43

What if you want to compare how it was with how it is now?

    git log 6e8fab43..bea7535b
    git show 6e8fab43..bea7535b
    git diff 6e8fab43..bea7535b

Git tries to leave a trail of breadcrumbs in your terminal. It's constantly printing out SHAs that you might want again.
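To see what a pair of breadcrumbs buys you, here is a sketch that stands in for the fetch: it records a "before" SHA, adds two commits, and then compares the two points. The variables before and after play the roles of 6e8fab43 and bea7535b; the repository and its contents are invented.

```shell
# Throwaway repository; everything in it is an assumption for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > log.txt
git add log.txt
git commit -q -m "First"
before=$(git rev-parse HEAD)           # the SHA you would find in your scrollback

echo "two" >> log.txt
git commit -q -am "Second"
echo "three" >> log.txt
git commit -q -am "Third"
after=$(git rev-parse HEAD)

git log --oneline "$before..$after"    # commits that arrived since $before
git diff --stat "$before..$after"      # net change between the two snapshots
```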

A few things can be lost forever!

After all that talk about how Git will not lose things, I should point out the exceptions. The big exception is that if you have created files or made changes in the working tree, Git is unaware of them until you have added them with git-add. Until then, those changes are in the working tree but not in the repository, and if you discard them Git cannot help you get them back.

Good advice is: commit early and often. If you don't commit, at least add changes with git-add. Files added but not committed are saved in the repository, although they can be hard to find because they haven't been packaged into a commit with a single SHA id.
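Here is a sketch of what "hard to find" looks like in practice, and one way to do the finding. The file draft.txt is made up. It gets added but never committed, then wiped out by a hard reset. The contents survive as an unnamed blob, which git fsck reports as dangling.

```shell
# Throwaway repository; file names are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

echo "precious, never committed" > draft.txt
git add draft.txt                  # the contents are now stored in the repository
git reset -q --hard                # removes draft.txt from the index and working tree

# The blob has no name any more, but fsck can still see it.
lost=$(git fsck 2>/dev/null | awk '/dangling blob/ {print $3}')
git cat-file -p "$lost"            # print the recovered contents
```

In a real repository fsck may list many dangling blobs, and you will have to look inside several of them to find the one you want.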

Some people automate this: they have a process that runs every few minutes and commits the current working tree to a special branch that they look at only in case of disaster.
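I don't know how any particular person sets this up, but one possible sketch uses git stash create, which builds a commit from the current state of the tracked files without touching the working tree or the stash list. The branch name disaster-snapshots is invented, and note that this approach does not capture untracked files.

```shell
# Throwaway repository; branch and file names are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "stable" > work.txt
git add work.txt
git commit -q -m "Stable state"

echo "half-finished idea" > work.txt   # an uncommitted edit

# The snapshot step; a cron job or editor hook could run this every few minutes.
snap=$(git stash create "auto-snapshot")
if [ -n "$snap" ]; then                # empty when there is nothing to snapshot
    git update-ref refs/heads/disaster-snapshots "$snap"
fi

git show disaster-snapshots:work.txt   # the edit is now in the repository
git status --short                     # the working tree is still modified
```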

The dangerous commands are git-reset and git-checkout, which modify the working tree, and so might wipe out changes that aren't in the repository. Git will try to warn you before doing something destructive to your working tree changes.
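Here is a sketch of both behaviors, with invented branch and file names. Switching branches with git checkout is refused when it would overwrite an uncommitted edit, but git reset --hard is an explicit request to discard such edits, so it goes ahead without complaint.

```shell
# Throwaway repository; branch and file names are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/dev

echo "original" > report.txt
git add report.txt
git commit -q -m "Add report"

git checkout -q -b other
echo "other version" > report.txt
git commit -q -am "Change report on other"
git checkout -q dev

echo "uncommitted edit" > report.txt

# Switching would overwrite the edit, so Git refuses and exits nonzero.
if git checkout other 2>/dev/null; then
    echo "switched branches"
else
    echo "checkout refused"
fi
kept=$(cat report.txt)             # the edit is still there

# A hard reset throws the edit away without asking.
git reset -q --hard
after=$(cat report.txt)
echo "$kept / $after"
```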

git-rev-parse

We saw a little while ago that Git's language for talking about commits and files is quite sophisticated:

            my-topic-branch@{'Aug 22'}:path/to/some/file.txt

Where is this language documented? Maybe not where you would expect: it's in the manual for git-rev-parse.

The git rev-parse command is less well-known than it should be. It takes a description of some object and turns it into a SHA. Why is that useful? Maybe not, but the git-rev-parse man page explains the syntax of the descriptions Git understands.
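A few descriptions to try, sketched in a scratch repository with made-up contents. Each command prints the SHA that the description resolves to, which makes git rev-parse a handy way to check that a piece of syntax means what you think it means.

```shell
# Throwaway repository; file name and messages are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "alpha" > file.txt
git add file.txt
git commit -q -m "First"
first=$(git rev-parse HEAD)

echo "beta" > file.txt
git commit -q -am "Second"

git rev-parse HEAD                 # the current commit
git rev-parse --short HEAD         # the same, abbreviated
git rev-parse HEAD~1               # its parent
git rev-parse "HEAD^{tree}"        # the directory snapshot inside the commit
git rev-parse HEAD:file.txt        # one file inside that snapshot
```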

A good habit is to skim over the manual every few months. You'll pick up something new and useful every time.

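My favorite is that if you use the syntax :/foozle you get the most recent commit whose message matches foozle. The search covers commits reachable from any ref, not only the current branch; to search back from HEAD alone, write HEAD^{/foozle} instead. For example: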

    git show :/foozle

or

    git log :/introduce..:/remove
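Here is a sketch with three invented commits so you can see which one each search picks. The match is case-sensitive, so the patterns below are capitalized to agree with the made-up commit messages.

```shell
# Throwaway repository; file name and messages are assumptions for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "foozle v1" > foozle.txt
git add foozle.txt
git commit -q -m "Introduce foozle subsystem"
intro=$(git rev-parse HEAD)

echo "tidy" > other.txt
git add other.txt
git commit -q -m "Unrelated cleanup"

git rm -q foozle.txt
git commit -q -m "Remove foozle subsystem"

git rev-parse ":/Introduce"                  # the commit that introduced foozle
git rev-parse ":/foozle"                     # youngest match: the removal commit
git rev-parse "HEAD^{/foozle}"               # same search, limited to HEAD's history
git log --oneline ":/Introduce..:/Remove"    # everything after the introduction
```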

Coming next week (probably), a few miscellaneous matters about using Git more effectively.

