At Newsbrane internally we use Git to manage our repository, Python for much of our Web code, and Django as our framework. I just had an interesting time diagnosing a problem which I thought I would share.
Git is a well-known revision control system. I'm not going to go into it much save to say that it lets you edit the history of your changes to the code, as long as you haven't shared those changes with anyone. Normally when you're working with Git you make a bunch of changes and then when you want to share them, you merge your changes with anything that's changed the last time you synced up. Doing this in Git requires making a 'merge commit'.
At Newsbrane, we prefer not to make merge commits when they're not really needed -- we pretend that most small changes to the code happen linearly, one change after the other. This reduces clutter in the log and makes it easier to examine the history. But in order to do this, when you sync up with the repository, you have to perform a 'rebase' -- edit the history so that your changes seem to have been applied after the other guy's changes instead of in parallel. So while most people don't have much use for editing the history, we make the rebasing process a common part of our development cycle. The way rebasing works is that it generates the diff of the actual changes you made and 'replays' them atop the other guy's changes, one by one.
Anyway, I had just tested and committed a local change and intended to share it with the company's repository. I downloaded the code, noticed that there had been a remote change, so I rebased against it. Since this rebasing process is a bit fragile, the best practice is to test your code before pushing to ensure that the rebase didn't screw anything up. I tested the code, and it was broken. Oops! The rebase screwed up? I started looking at the backtrace. I stared it it for about 5 minutes and I just couldn't believe that that line of code could fail. Not in the way that it did. So I went looking elsewhere.
Maybe my testing failed. Django, the Python web framework we use, has a development server which lets me test my changes locally without having to publish them. The development server has a nice feature where it reloads the code when you change the source, so you don't have to explicitly restart it. Thus I generally leave it running all the time. In this case, it was running when I tried to test the code, and so I had an idea: "oh, the rebase confused it somehow, I'll restart it".
Well, restarting the dev server didn't fix the problem. Okay, so what's wrong? Based on the error, it seemed like this one file wasn't properly updated after the rebase. But I checked the source and the file was fine. Some weird operator precedence error? I looked at the backtrace again. Impossible. Check 'git status' -- if anything is different in my source code from the committed version, it'll let me know. But nothing is different. Pull my hair out, walk away from the computer.
Is the operating system's buffer cache becoming corrupted due to the interaction between 'git rebase' and the devserver? Seems very far-fetched; the operating system isn't that fragile. I'm thinking about every layer between my monitor and the disk. There's caching everywhere... surely one of these caching layers is responsible?
Programming language. Python. Python "compiles" its modules to *.pyc (bytecode) files when you load them. It doesn't compile them every time, or there would be no need for pyc files at all (it would just keep them in memory). But it has to compile them if you changed the source. Wait! Does it really?
How does Python actually determine whether you've changed the source since the last time it compiled it? It uses the date. If the date recorded in the compiled file doesn't match exactly the operating system's date of the source, it will recompile the file.
OK, now I understand what happened: we saw a race condition between the rebase and the development server. The devserver watches for changes in the files (I still have no idea how it does this), and rebase updates them quickly. As part of its operation, the rebase must update the working copy files in place for each change that it is applying. The devserver noticed the mid-rebase update (with the old version of the source), reloaded the code, causing Python to compile the module. Right after it started to reload the code, the rebaser applied my change and wrote the new Python source file out -- in the same second that it had written the previous, unchanged source file. Once the devserver finished loading the module, the .pyc for the old source is written to disk, with the modification time of the old source file (same as that of the new source file), tricking Python: Python has no way to know to recompile the source file.
So there you have it. A rare race. Who to blame? I don't think I can really blame the development server. It didn't do anything especially unexpected -- it was really the victim. Maybe we can blame it for trying to read the working copy directory while Git was writing to it. Should we blame Git? Perhaps: I can't think why it should need to write to the working copy during a rebase until it's done or finds a conflict, but maybe the developers were lazy and this was the easiest way to implement it.
But I would actually like to blame Python, which uses a fragile caching strategy for module recompilation. Modification time, especially with such a coarse (1 second) granularity, shouldn't be the only shootdown criterion for any type of cache, because sometimes lots of stuff can happen in 1 second.
Anyway, to solve the problem, I just used the UNIX command 'touch' on the Python source file (which just updates its modification time) and it started working. Like magic. Exciting, isn't it?