potential git-diff speed up by skipping is-binary check
While working on GSoC 2011, I flagged the is-binary check as a potential slowdown in my personal notes. (Yes, it's done for every file.) I tried to disable it with a command-line switch to git-diff, but that was ugly and hack-ish.
Recently I was looking at the diff codepath again (due to a patch for --word-diff that turned out to be misjudged), so I gave this another shot.
In diff.c, we find diff_filespec_is_binary():
int diff_filespec_is_binary(struct diff_filespec *one)
{
	if (one->is_binary == -1) {
		diff_filespec_load_driver(one);
		if (one->driver->binary != -1)
			one->is_binary = one->driver->binary;
		else {
			if (!one->data && DIFF_FILE_VALID(one))
				diff_populate_filespec(one, 0);
			if (one->data)
				one->is_binary = buffer_is_binary(one->data,
						one->size);
			if (one->is_binary == -1)
				one->is_binary = 0;
		}
	}
	return one->is_binary;
}
On line 11 of the snippet, we call buffer_is_binary() from git's xdiff interface, which tries to find a NUL with memchr(). Looking at the extract, we see that the check only runs when the driver's binary setting is undecided (-1), so it is possible to skip it by defining a diff driver and setting diff.<driver>.binary.
(You might have noticed the dozen-or-so diff drivers for programming languages that git has built in, but those won't do, because they are "undecided" about binary-ness, i.e. driver.binary=-1.)
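For reference, the NUL check described above can be sketched as follows. This is a minimal sketch, not a copy of git's source: the function name looks_binary() is made up for illustration, and the 8000-byte prefix limit is an assumption based on my recollection of the FIRST_FEW_BYTES constant in xdiff-interface.c.

```c
#include <assert.h>
#include <string.h>

/* Assumption: only a bounded prefix of the buffer is inspected.
 * 8000 is taken from memory of git's FIRST_FEW_BYTES; treat it as illustrative. */
#define FIRST_FEW_BYTES 8000

/* Hypothetical stand-in for buffer_is_binary():
 * returns 1 if a NUL byte occurs in the inspected prefix, 0 otherwise. */
int looks_binary(const char *ptr, unsigned long size)
{
	assert(ptr != NULL);
	if (FIRST_FEW_BYTES < size)
		size = FIRST_FEW_BYTES;
	return !!memchr(ptr, 0, size);
}
```

The point is that the check is a single bounded memchr() per file, which already hints that skipping it cannot save much.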
It turns out that we don't have to define our own custom diff driver. In userdiff.c, we have the built-in "dummy" driver driver_true that has driver.binary=0 (false). We can turn this on by setting the diff attribute in .gitattributes, like this:
# glob [attr1 [attr2 [...]]]
* diff
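To see why a driver with a decided binary flag short-circuits the content scan, here is a toy model of the tri-state logic. All names (toy_driver, toy_filespec, toy_is_binary, toy_scans) are hypothetical simplifications, not git's actual structures; the counter exists only to make the skipped scan observable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for git's userdiff driver and diff_filespec. */
struct toy_driver {
	int binary;	/* -1 undecided, 0 text, 1 binary */
};

struct toy_filespec {
	const struct toy_driver *driver;
	const char *data;
	size_t size;
	int is_binary;	/* -1 until computed, then cached */
};

int toy_scans = 0;	/* how many times file content was actually inspected */

int toy_is_binary(struct toy_filespec *one)
{
	assert(one != NULL && one->driver != NULL);
	if (one->is_binary == -1) {
		if (one->driver->binary != -1) {
			/* The driver has an opinion: no need to look at the data. */
			one->is_binary = one->driver->binary;
		} else {
			/* Undecided driver: fall back to scanning for a NUL byte. */
			toy_scans++;
			one->is_binary = one->data != NULL &&
				memchr(one->data, 0, one->size) != NULL;
		}
	}
	return one->is_binary;
}
```

With an undecided driver (binary = -1) the first call scans the data once and caches the answer; with a driver like driver_true (binary = 0) the scan never happens.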
However, some light testing shows the gains (roughly 2% of wall-clock time) are not worth the trouble. Running time git log -p v0.99 >/dev/null in the git repo itself (1075 commits up to v0.99), here are the best-of-5 numbers on a Solaris machine at NUS:
- without diff: real 0m4.645s, user 0m3.946s, sys 0m0.692s
- with diff: real 0m4.556s, user 0m3.860s, sys 0m0.689s
Oh well.