Friday, July 30, 2004

Lucene Index Locks
After Wednesday night's blog I was able to find all the build problems and compile everything. The problems were due to using the right import statements in the wrong files. Well, it was getting late. As I said at the time, I just needed a break in order to find the problem.

I didn't get to Thursday's blog since I didn't get to do much productive work. Before proceeding with testing the N3 loader I wanted to make sure that I had a stable system and hadn't broken anything. So of course, when the Lucene tests started failing I had to start wondering what I'd broken, and how.

My first thought was that I could not possibly have broken the Lucene code, as the N3 loading code was in a completely unrelated area. However, I've been mistaken in the past when I thought that two pieces of code were completely unrelated, so I wasn't prepared to dismiss the possibility of the problem being in my code.

The test failure was caused by Lucene failing to obtain a lock on a file. Perhaps it was some file access I'd done which prevented the lock from being acquired.

Each full run of the tests took a long time. I tracked the Lucene tests down to the store-test target, so I concentrated on running that target alone. I quickly discovered that the Lucene tests always passed when the run was restricted to store-test, and always failed when they ran as part of the full set of tests for Kowari. So now I started to think that some test which ran before the Lucene tests was causing the problem. Perhaps I'd influenced this other test in some way.

To confirm that it was definitely my code which was causing the problem, I got a fresh checkout of Kowari, and tried a full build with all the tests. All the tests passed, further indicating that it was my changes which had caused a problem, though I still couldn't work out how.

I asked advice from AN and TJ, but they couldn't help. TJ was able to tell me that these errors had been seen before, but was unable to say what they were indicative of.

Finally I started copying my files from my normal Kowari checkout to the clean checkout, one file at a time. I did this to see which file could be causing the problem. By the end of the day I had copied over every file, and the fresh checkout could still run all of the tests with no errors. I finally started comparisons between all the files in both checkouts, and there were no differences.

So at the end of the day I had two sets of identical files. One would build and pass its tests every time. The other would build but fail its Lucene test every time. This morning I even used find to run through all the files and do an MD5 comparison, but they came up the same.
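For anyone wanting to repeat that kind of check, here is a rough sketch of a checksum comparison between two directory trees. It is not the find command I actually ran; the function names (md5_of_file, tree_digests, compare_trees) are made up for illustration, and it assumes every file in both trees is readable.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = md5_of_file(full)
    return digests


def compare_trees(left_root, right_root):
    """Return (only_left, only_right, differing) as sorted lists of relative paths."""
    left = tree_digests(left_root)
    right = tree_digests(right_root)
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    # Files present in both trees whose contents hash differently.
    differing = sorted(p for p in set(left) & set(right) if left[p] != right[p])
    return only_left, only_right, differing
```

Three empty lists mean the two trees hold the same files with the same contents. A check like this only looks at file contents inside the two directories, so it says nothing about permissions, timestamps, or anything stored outside the checkouts.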

All I could do in the end was to abandon the checkout with the failing test. I have no idea why two identical sets of files can behave differently. Time to stick to the working checkout and move on.
