Perils of hacking on mature software
Yesterday I wrote up an interesting bug in git-log --follow's
handling of empty files.  Afterward
I thought I'd see if I could fix it.

People complain that the trouble of working on mature software like
Git is to understand the way the code is structured, its conventions,
the accumulated layers of cruft, and where everything is.  I think
this is a relatively minor difficulty.  The hard part is not so much
doing what you want, as knowing what you want to do.

My original idea for the fix was this: I can give git log a new
option, say --follow-size-threshhold=n.  This would disable all
copy and rename detection for any files of size less than n bytes.
If not specified or configured, n would default to 1, so that the
default behavior would disable copy and rename detection of empty
files but not of anything else.  I was concerned that an integer
option was unnecessarily delicate.  It might have been sufficient to
have a boolean --follow-empty-files flag.  But either way the
programming would be almost the same and it would be easy to simplify
the option later if the Git maintainers wanted it that way.

I excavated the code and found where the change needed to go.  It's
not actually in git-log itself.  Git has an internal system for
diffing pairs of files, and git-log --follow uses this to decide
when two blobs are similar enough for it to switch from following one
to the other.  So the flag actually needed to be added to git-diff,
where I called it --rename-size-threshhold.  Then git-log would
set that option internally before using the Git diff system to detect
renames.

But then I ran into a roadblock.  Diff already has an undocumented
flag called --rename-empty that tells it to report on renames of
empty files in certain contexts (not the context I was interested in,
unfortunately).  The flag is set by default, but it is cleared internally
when git-merge is resolving conflicts.  The issue it addresses is
this: Suppose the merge base has some empty file X.  Somewhere along
the line X has been removed.  In one branch, an unrelated empty file
Y has been created, and in the other branch a different unrelated
empty file Z has been created.  When merging these two branches, Git
will detect a merge conflict: was file X moved to location Y or to
location Z?  This ⸢conflict⸣ is almost certainly spurious, and it is
very unlikely that the user will thank us for demanding that they
resolve it manually.  So git-merge sets --no-rename-empty internally
and Git resolves the ⸢conflict⸣ automatically.  (See this commit for
further details.)

The roadblock is: how does --rename-empty fit together with my
proposed --rename-size-threshhold flag?  Should they be the same
thing?  Or should they be separate options?  There appear to be at
least three subsystems in Git that try to decide if two similar or
identical files (which might have different names, or the same name in
different directories) are “the same file” for various purposes.  Do
we want to control the behavior of these subsystems separately or in
unison?

If they should be controlled in unison, should
--rename-size-threshhold be demoted to a boolean, or should
--rename-empty be promoted to an integer?  And if they should be the
same, what are the implications for backward compatibility?  Should
the existing --rename-empty be documented?

If we add new options, how do they interact with the existing and
already non-orthogonal flags that do something like this?  They
include at least the following options of git-diff, git-log, and
git-show:

--follow
--find-renames=n
--find-copies
--find-copies-harder
-l

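
Part of what makes these flags awkward to combine is that empty files defeat any similarity-based setting: two empty blobs are byte-for-byte identical, so no value of n in --find-renames=n can tell them apart. The following is a minimal sketch in Python, not Git's own code. The function names are invented for illustration, and it assumes the SHA-1 object format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id of a blob under the SHA-1 object format."""
    # A blob is hashed as the header "blob <size>\0" followed by its bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def is_exact_rename(old: bytes, new: bytes) -> bool:
    # Simplified model: identical ids mean identical contents.
    return git_blob_id(old) == git_blob_id(new)


# X, Y, and Z from the merge example: three unrelated empty files.
x = y = z = b""

print(git_blob_id(x))                               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(is_exact_rename(x, y), is_exact_rename(x, z)) # True True
```

In this model Y and Z are equally good matches for X, which is the ambiguity behind the spurious merge conflict.
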
Only git-log has --follow and my new feature was conceived as a
modification of it, which is why I named it
--follow-size-threshhold.  But git-log wouldn't be implementing
this itself, except to pass the flag into the diff system.  Calling it
--follow-size-threshhold in git-diff didn't make sense because
git-diff doesn't have a --follow option.  It needs a different
name.  But if I do that, then we have git-diff and git-log options
with different names that nevertheless do exactly the same thing.
Confusing!

Now suppose you would like to configure a default for this option in
your .gitconfig.  Does it make sense to have both
diff.renameSizeThreshhold and log.followSizeThreshhold options?
Not really.  It would never be useful to set one but not the other.
So eliminate log.followSizeThreshhold.  But now someone like me who
wants to change the behavior of git-log --follow will not know to
look in the right place for the option they need.

The thing to do at this point is to come up with some
reasonable-seeming proposal and send it to Jeff King, who created the
undocumented --rename-empty feature, and who is also a good person
to work with.  But coming up with a good solution entirely on my own
is unlikely.

Doing any particular thing would not be too hard.  The hard part is
deciding what particular thing to do. 
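
To show how small the code side of the question is, here is a minimal sketch in Python rather than Git's C. The class RenameOptions, its size_threshold field, and the mapping from --rename-empty to an integer are all assumptions made up for illustration. None of them is Git's actual data structure or behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameOptions:
    """Hypothetical model of the proposed option; not Git's real structure."""

    # Files smaller than this many bytes are never rename/copy candidates.
    # 1 is the default proposed above: only empty files are excluded.
    size_threshold: int = 1

    @classmethod
    def from_rename_empty(cls, rename_empty: bool) -> "RenameOptions":
        # One possible way to fold the existing boolean into the integer:
        # --rename-empty -> 0 (nothing excluded), --no-rename-empty -> 1.
        return cls(size_threshold=0 if rename_empty else 1)

    def is_candidate(self, size_in_bytes: int) -> bool:
        return size_in_bytes >= self.size_threshold


default = RenameOptions()
print(default.is_candidate(0))                                # False
print(default.is_candidate(1))                                # True
print(RenameOptions.from_rename_empty(True).is_candidate(0))  # True
print(RenameOptions.from_rename_empty(False) == default)      # True
```

Under this mapping the proposed default of 1 behaves like --no-rename-empty, whereas the existing flag is set by default. That mismatch is one concrete form of the backward-compatibility question, and the sketch does nothing to settle the naming or configuration questions.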
 