Sat, 16 Apr 2016
I was not expecting to have a demonstration ready so soon. But today
I finished working on a project, I had all the files staged in the
index but not committed, and for some reason I no longer remember I
chose that moment to do
If the files had only been in the working tree, there would have been nothing to do but to start over. Git does not track the working tree. But I had added the files to the index. When a file is added to the Git index, Git stores it in the repository. Later on, when the index is committed, Git creates a commit that refers to the files already stored. If you know how to look, you can find the stored files even before they are part of a commit.
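Here is a small sketch of that claim, run in a throwaway repository so nothing real is at risk. The file name `demo.txt` and the scratch directory are made up for the illustration. It stages a file, removes it from both the index and the working tree without ever committing, and then reads the contents back out of the object store by ID.

```shell
#!/bin/sh
set -e

# Scratch repository (hypothetical; created only for this demonstration)
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo 'hello, index' > demo.txt
git add demo.txt                  # staged, but never committed

# The ID Git assigned to the staged contents
id=$(git hash-object demo.txt)

# Lose the file: drop it from the index and from the working tree
git rm -q -f --cached demo.txt
rm demo.txt

# The blob is still sitting in the repository
recovered=$(git cat-file blob "$id")
echo "$recovered"
```

In a real accident you would not know the ID in advance, which is why the rest of the hunt is about finding it.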
(If they are part of a commit, the problem is much easier.
Typically the answer is simply “use
Each file added to the Git index is stored as a “blob object”. Git
stores objects in two ways. When it's fetching a lot of objects from
a remote repository, it gets a big zip file with an attached table of
contents; this is called a pack. Getting objects from a pack can be
a pain. Fortunately, not all objects are in packs. When you just
To make a loose object from a file, Git checksums the contents of the file (together with a short header giving the object's type and size) and uses the checksum both as the name of the object file in the repository and as the identifier for the object, exactly as it uses the checksum of a commit as the commit's identifier. If the checksum is 0123456789abcdef0123456789abcdef01234567, the object is stored in
The pack files are elsewhere, in
So the first thing I did was to get a list of the loose objects in the repository:
This produces a list of the object IDs of all the loose objects in the repository:
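One way to build such a list is sketched below. It is not necessarily the command I used at the time. It assumes the usual loose-object layout, in which the first two hex digits of the ID name a subdirectory of `.git/objects` and the remaining digits name the file. The scratch repository and the files `a.txt` and `b.txt` are invented so that the example can run by itself.

```shell
#!/bin/sh
set -e

# Scratch repository with two staged, uncommitted files (hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo one > a.txt
echo two > b.txt
git add a.txt b.txt

# Keep only paths shaped like xx/yyyy..., which skips pack/ and info/,
# then strip the directory prefix and the slash to get the bare object ID
list=$(mktemp)
find .git/objects -type f |
    grep -E '^\.git/objects/[0-9a-f]{2}/[0-9a-f]{38,}$' |
    sed -e 's,^\.git/objects/,,' -e 's,/,,' > "$list"

cat "$list"
count=$(grep -c . "$list")
```

Run in the top directory of a real repository, the `find` pipeline alone produces the list.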
There were 500 loose objects in my repository. The goal was to find the eight I wanted.
There are several kinds of objects in a Git repository. In addition
to blobs, which represent file contents, there are commit objects,
which represent commits, and tree objects, which represent
directories. These are usually constructed at the time the commit is
done. Since my files hadn't been committed, I knew I wasn't
interested in these types of objects. The command
Then I just grepped out the blob objects:
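The filtering step might look something like the following sketch. It relies on `git cat-file -t`, which prints the type of an object given its ID. The scratch repository, the file name, and the throwaway committer identity are assumptions made for the demonstration. The commit is there only so that the repository contains a tree and a commit object to filter out.

```shell
#!/bin/sh
set -e

# Scratch repository holding one blob, one tree, and one commit (hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'some contents' > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'demo commit'

# IDs of all loose objects, assuming the xx/yyyy... layout under .git/objects
ids=$(find .git/objects -type f |
    grep -E '^\.git/objects/[0-9a-f]{2}/[0-9a-f]{38,}$' |
    sed -e 's,^\.git/objects/,,' -e 's,/,,')

# Record "ID type" for every object
types=$(mktemp)
for id in $ids; do
    printf '%s %s\n' "$id" "$(git cat-file -t "$id")"
done > "$types"

# Keep only the blobs
blobs=$(grep ' blob$' "$types" | cut -d' ' -f1)
echo "$blobs"
```

For a repository with many loose objects, `git cat-file --batch-check` reads IDs on standard input and reports the type of each, which avoids starting one process per object.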
Now it was time to grep through the blobs to find the ones I was
looking for. Fortunately, I knew that each of my lost files would
contain the string
So this loop fills the file
for i in $(cat /tmp/OBJBLOB); do git cat-file blob "$i" | grep -q org-service-curr && git cat-file blob "$i" > "/tmp/rescue/$i"; done
Instead of printing out the matching blob ID number, this loop passes
The rest was simple. I made 8 subdirectories under
Finally, I went back to the working tree and added and committed the rescued files.
It seemed longer, but it only took about twenty minutes. To recreate the eight files from scratch might have taken about the same amount of time, or maybe longer (although it never takes as long as I think it will), and would have been tedious.
But let's suppose that it had taken much longer, say forty minutes instead of twenty, to rescue the lost blobs from the repository. Would that extra twenty minutes have been time wasted? No! The twenty minutes spent to recreate the files from scratch is a dead loss. But the forty minutes to rescue the blobs is time spent learning something that might be useful in the future. The Git rescue might have cost twenty extra minutes, but if so it was paid back with forty minutes of additional Git expertise, and time spent to gain expertise is well spent! Spending time to gain expertise is how you become an expert!
Git is a core tool, something I use every day. For a long time I have been prepared for the day when I would try to rescue someone's lost blobs, but until now I had never done it. Now, if that day comes, I will be able to say “Oh, it's no problem, I have done this before!”
So if you lose something in Git, don't panic. There's a good chance that you can find someone who will be able to hunt it down again.