Git bisect And Nose -- Or how to find out who to blame for breaking the build.
Posted on Fri 03 August 2012 in Posts
How did I never discover git bisect before today? Git bisect lets you identify the particular commit that broke a build, even after development has continued past that commit. So for example, say you:
- Commit some code which (unbeknownst to you) happens to break the build
- Continue committing the stuff you're working on (not realizing things have gone sideways)
- Are about to push your code up to a remote mainline, so you finally run all those unit tests and realize you broke the build somewhere, but you don't know which commit introduced the problem
In a typical environment you'd now have a fun period of checking out a previous revision, running the tests, seeing if that was the commit that broke the build, and repeating until you identified the commit that introduced the failure. I have experienced this many, many times and it is the complete opposite of fun.
If you were smart you might recognize that a binary search would be effective here. That is, if you know commit (A) is bad, and commit (B) is good, and there are 10 commits in between (A) and (B), then you'd check out the one halfway between the two, check for the failure, and in doing so eliminate half the possibilities (rather than trying all 10 in succession).
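To make the halving concrete, here's a rough sketch of that search in Python. Nothing here is git's actual implementation: the function name find_first_bad, the fake commit names c0 to c11, and the choice of c7 as the culprit are all made up for illustration, and it assumes that once the build breaks it stays broken in every later commit.

```python
def find_first_bad(commits, is_bad):
    """Return (index of the first bad commit, number of checks performed).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    checks = 0
    while bad - good > 1:
        mid = (good + bad) // 2      # the commit halfway between good and bad
        checks += 1
        if is_bad(commits[mid]):
            bad = mid                # failure was introduced at mid or earlier
        else:
            good = mid               # failure was introduced after mid
    return bad, checks


# Hypothetical history: c0 is the known-good commit (B), c11 is the
# known-bad commit (A), and there are 10 commits in between.
commits = ["c%d" % i for i in range(12)]

# Pretend c7 is the commit that broke the build.
first_bad, checks = find_first_bad(commits, lambda c: int(c[1:]) >= 7)
print(commits[first_bad], checks)  # prints "c7 4"
```

In this run the culprit turns up after 4 checks instead of the up-to-10 you'd need going through the commits one at a time.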
And if you were really smart you'd know that this is exactly what git bisect does. You tell git bisect which commit you know is good, and which commit you know is bad, then it walks you through the commits in between to identify which one introduced the failure.
But wait, there's more! There's also a lesser-known option to git bisect. If you do a "git bisect run <somecommand>" then the process becomes completely automated. What happens is git runs <somecommand> at each iteration of the bisection, and if the command returns exit code 0 it marks that commit as "good", and if it returns non-zero it marks it as "bad", and then continues the search with no human interaction whatsoever.
How cool is that?
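One detail worth knowing: according to the git-bisect documentation, the rule is slightly finer than "zero is good, non-zero is bad". Exit code 125 means "this commit can't be tested, skip it", codes 1 through 127 (other than 125) mean "bad", and anything else aborts the bisection. Here's a small sketch of that mapping in Python. The function name bisect_verdict is my own invention for illustration, not anything git provides.

```python
def bisect_verdict(exit_code):
    """Describe how `git bisect run` treats a command's exit code.

    This mirrors the rules in the git-bisect documentation; the function
    itself is a hypothetical helper, not part of git.
    """
    if exit_code == 0:
        return "good"    # command succeeded: commit is marked good
    if exit_code == 125:
        return "skip"    # commit cannot be tested (e.g. it doesn't build)
    if 1 <= exit_code <= 127:
        return "bad"     # command failed: commit is marked bad
    return "abort"       # any other code stops the bisection


for code in (0, 1, 125, 128):
    print(code, bisect_verdict(code))
```

This matters if your test command can fail for reasons unrelated to the bug you're hunting: a wrapper script can exit with 125 for those commits so they don't get blamed by mistake.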
So then the trick becomes "what's the command to use for <somecommand>?" Obviously this is project dependent (probably whatever command you use to run your unit tests), but those of us who are sane Python devs probably use Nose to run our tests. As an example, I often organize my code as follows:
project/
+--- src/
     +--- module1/
     +--- module2/
     +--- test/
Where "module1" contains code for a module, "module2" contains code for another module, and "test" contains my unit
tests. Nose is smart enough that if you tell it to start at "src" it will search all subdirectories for tests and then
run them. So let's say we know that commit 022ca08 was "bad" (i.e. the first commit we noticed the problem in) and 0b52f0c was "good" (it doesn't contain the problem). We could then do:
git bisect start 022ca08 0b52f0c --
git bisect run nosetests -w src
Then go grab a coffee, come back in a few minutes (assuming your tests don't take forever to run), and git will have identified the commit between 0b52f0c and 022ca08 that introduced the failure. Note that we have to run
from the top of the source tree (in my example the "project" directory) hence we need to tell
nosetests to look in
src via the