Perils of hacking on mature software
Yesterday I wrote up an interesting bug in git-log --follow 's
handling of empty files. Afterward
I thought I'd see if I could fix it.
People complain that the trouble of working on mature software like
Git is to understand the way the code is structured, its conventions,
the accumulated layers of cruft, and where everything is. I think
this is a relatively minor difficulty. The hard part is no so much
doing what you want, as knowing what you want to do.
My original idea for the fix was this: I can give git log a new
option, say --follow-size-threshhold= n. This would disable all
copy and rename detection for any files of size less than n bytes.
If not specified or configured, n would default to 1, so that the
default behavior would disable copy and rename detection of empty
files but not of anything else. I was concerned that an integer
option was unnecessarily delicate. It might have been sufficient to
have a boolean --follow-empty-files flag. But either way the
programming would be almost the same and it would be easy to simplify
the option later if the Git maintainers wanted it that way
I excavated the code and found where the change needed to go. It's
not actually in git-log itself. Git has an internal system for
diffing pairs of files, and git-log --follow uses this to decide
when two blobs are similar enough for it to switch from following one
to the other. So the flag actually needed to be added to git-diff ,
where I called it --rename-size-threshhold . Then git-log would
set that option internally before using the Git diff system to detect
renames.
But then I ran into a roadblock. Diff already has an undocumented
flag called --rename-empty that tells it to report on renames of
empty files in certain contexts — not the context I was interested in
unfortunately. The flag is set by default, but it is cleared internally
when git-merge is resolving conflicts. The issue it addresses is
this: Suppose the merge base has some empty file X. Somewhere along
the line X has been removed. In one branch, an unrelated empty file
Y has been created, and in the other branch a different unrelated
empty file Z has been created. When merging these two branches, Git
will detect a merge conflict: was file X moved to location Y or to
location Z? This ⸢conflict⸣ is almost certainly spurious, and is is
very unlikely that the user will thank us for demanding that they
resolve it manually. So git-merge sets --no-rename-empty
internally and Git resolves the ⸢conflict⸣ automatically.
(See this commit for further details.)
The roadblock is: how does --rename-empty fit together with my
proposed --rename-size-threshhold flag? Should they be the same
thing? Or should they be separate options? There appear to be at
least three subsystems in Git that try to decide if two similar or
identical files (which might have different names, or the same name in
different directories) are “the same file” for various purposes. Do
we want to control the behavior of these subsystems separately or in
unison?
If they should be controlled in unison, should
--rename-size-threshhold be demoted to a boolean, or should
--rename-empty be promoted to an integer? And if they should be the
same, what are the implications for backward compatibility? Should
the existing --rename-empty be documented?
If we add new options, how do they interact with the existing and
already non-orthogonal flags that do something like this? They
include at least the following options of git-diff , git-log , and
git-show :
--follow
--find-renames=n
--find-copies
--find-copies-harder
-l
Only git-log has --follow and my new feature was conceived as a
modification of it, which is why I named it
--follow-size-threshhold . But git-log wouldn't be implementing
this itself, except to pass the flag into the diff system. Calling it
--follow-size-threshhold in git-diff didn't make sense because
git-diff doesn't have a --follow option. It needs a different
name. But if I do that, then we have git-diff and git-log options
with different names that nevertheless do exactly the same thing.
Confusing!
Now suppose you would like to configure a default for this option in
your .gitconfig . Does it make sense to have both
diff.renameSizeThreshhold and log.followSizeThreshhold options?
Not really. It would never be useful to set one but not the other.
So eliminate log.followSizeThreshhold . But now someone like me who
wants to change the behavior of git-log --follow will not know to
look in the right place for the option they need.
The thing to do at this point is to come up with some
reasonable-seeming proposal and send it to Jeff King, who created the
undocumented --rename-empty feature, and who is also a good person
to work with. But coming up with a good solution entirely on my own
is unlikely.
Doing any particular thing would not be too hard. The hard part is
deciding what particular thing to do.
[Other articles in category /prog]
permanent link
Language fluency in speech and print
Long ago I worked among the graduate students at the University of
Pennsylvania department of Computer and Information Sciences. Among
other things, I did system and software support for them, and being
about the same age and with many common interests, I socialized with
them also.
There was one Chinese-Malaysian graduate student who I thought of as
having poor English. But one day, reading one of his emailed support
requests, I was struck by how clear and well-composed it was. I
suddenly realized I had been wrong. His English was excellent. It
was his pronunciation that was not so good. When speaking to him in
person, this was all I had perceived. In email, his accent vanished
and he spoke English like a well-educated native. When I next met him
in person I paid more careful attention and I realized that, indeed, I
had not seen past the surface: he spoke the way he wrote, but his
accent had blinded me to his excellent grammar and diction.
Once I picked up on this, I started to notice better. There were many
examples of the same phenomenon, and also the opposite phenomenon,
where someone spoke poorly but I hadn't noticed because their
pronunciation was good. But then they would send email and the veil
would be lifted. This was even true of native speakers, who can get
away with all sorts of mistakes because their pronunciation is so
perfect. (I don't mean perfect in the sense of pronouncing things the
way the book says you should; I mean in the sense of pronouncing
things the way a native speaker does.) I didn't notice this unless I
was making an effort to look for it.
I'm not sure I have anything else to say about this, except that it
seems to me that when learning a foreign language, one ought to
consider whether one will be using it primarily for speech or
primarily for writing, and optimize one's study time accordingly. For
speech, concentrate on good pronunciation; for writing, focus on
grammar and diction.
Hmm, put that way it seems obvious. Also, the sky is sometimes blue.
[Other articles in category /lang]
permanent link
|