Perils of hacking on mature software
Yesterday I wrote up an interesting bug in git-log --follow 's
handling of empty files. Afterward
I thought I'd see if I could fix it.
People complain that the trouble of working on mature software like
Git is to understand the way the code is structured, its conventions,
the accumulated layers of cruft, and where everything is. I think
this is a relatively minor difficulty. The hard part is no so much
doing what you want, as knowing what you want to do.
My original idea for the fix was this: I can give git log a new
option, say --follow-size-threshhold= n. This would disable all
copy and rename detection for any files of size less than n bytes.
If not specified or configured, n would default to 1, so that the
default behavior would disable copy and rename detection of empty
files but not of anything else. I was concerned that an integer
option was unnecessarily delicate. It might have been sufficient to
have a boolean --follow-empty-files flag. But either way the
programming would be almost the same and it would be easy to simplify
the option later if the Git maintainers wanted it that way
I excavated the code and found where the change needed to go. It's
not actually in git-log itself. Git has an internal system for
diffing pairs of files, and git-log --follow uses this to decide
when two blobs are similar enough for it to switch from following one
to the other. So the flag actually needed to be added to git-diff ,
where I called it --rename-size-threshhold . Then git-log would
set that option internally before using the Git diff system to detect
renames.
But then I ran into a roadblock. Diff already has an undocumented
flag called --rename-empty that tells it to report on renames of
empty files in certain contexts — not the context I was interested in
unfortunately. The flag is set by default, but it is cleared internally
when git-merge is resolving conflicts. The issue it addresses is
this: Suppose the merge base has some empty file X. Somewhere along
the line X has been removed. In one branch, an unrelated empty file
Y has been created, and in the other branch a different unrelated
empty file Z has been created. When merging these two branches, Git
will detect a merge conflict: was file X moved to location Y or to
location Z? This ⸢conflict⸣ is almost certainly spurious, and is is
very unlikely that the user will thank us for demanding that they
resolve it manually. So git-merge sets --no-rename-empty
internally and Git resolves the ⸢conflict⸣ automatically.
(See this commit for further details.)
The roadblock is: how does --rename-empty fit together with my
proposed --rename-size-threshhold flag? Should they be the same
thing? Or should they be separate options? There appear to be at
least three subsystems in Git that try to decide if two similar or
identical files (which might have different names, or the same name in
different directories) are “the same file” for various purposes. Do
we want to control the behavior of these subsystems separately or in
unison?
If they should be controlled in unison, should
--rename-size-threshhold be demoted to a boolean, or should
--rename-empty be promoted to an integer? And if they should be the
same, what are the implications for backward compatibility? Should
the existing --rename-empty be documented?
If we add new options, how do they interact with the existing and
already non-orthogonal flags that do something like this? They
include at least the following options of git-diff , git-log , and
git-show :
--follow
--find-renames=n
--find-copies
--find-copies-harder
-l
Only git-log has --follow and my new feature was conceived as a
modification of it, which is why I named it
--follow-size-threshhold . But git-log wouldn't be implementing
this itself, except to pass the flag into the diff system. Calling it
--follow-size-threshhold in git-diff didn't make sense because
git-diff doesn't have a --follow option. It needs a different
name. But if I do that, then we have git-diff and git-log options
with different names that nevertheless do exactly the same thing.
Confusing!
Now suppose you would like to configure a default for this option in
your .gitconfig . Does it make sense to have both
diff.renameSizeThreshhold and log.followSizeThreshhold options?
Not really. It would never be useful to set one but not the other.
So eliminate log.followSizeThreshhold . But now someone like me who
wants to change the behavior of git-log --follow will not know to
look in the right place for the option they need.
The thing to do at this point is to come up with some
reasonable-seeming proposal and send it to Jeff King, who created the
undocumented --rename-empty feature, and who is also a good person
to work with. But coming up with a good solution entirely on my own
is unlikely.
Doing any particular thing would not be too hard. The hard part is
deciding what particular thing to do.
[Other articles in category /prog]
permanent link
|