Thursday, July 01, 2004

Resolver Factory
Today's episode with the resolver went as slowly as yesterday's. In this case the problem stemmed from the Resolver Factory being unable to provide a Resolver for the remote model being requested. Many stack traces ensued while I tried to work out why this was happening.

Unfortunately I was a little hamstrung by being unable to log anything, as the code in question is being run by JUnit, and logging configuration is therefore unavailable. This meant that I had to put anything I was interested in into the messages being passed around with the thrown exceptions. Consequently I took longer to see the problem than I would have otherwise.
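
As a rough illustration of what I mean by putting the interesting values into the exception messages, here is a minimal sketch. The class, method, and exception names are all made up for the example and are not the real resolver code; the only real values are the URI and the session name from today's problem.

```java
// Hypothetical sketch: none of these names come from the actual resolver code.
final class ExceptionContextSketch {

    /** Made-up exception type standing in for whatever the factory really throws. */
    static class ResolverLookupException extends Exception {
        ResolverLookupException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for the low-level call that fails; it always throws here. */
    static void openSession(String sessionName) {
        throw new IllegalStateException("no such session");
    }

    /**
     * Wraps the low-level failure and copies the values I would normally
     * have logged (the requested URI and the session name) into the message.
     */
    static void lookup(String requestedUri, String sessionName)
            throws ResolverLookupException {
        try {
            openSession(sessionName);
        } catch (RuntimeException e) {
            throw new ResolverLookupException(
                "Could not connect to session \"" + sessionName
                    + "\" (requested URI: " + requestedUri + ")",
                e);
        }
    }

    public static void main(String[] args) {
        try {
            lookup("rmi://jaws.bne.pisoftware.com/server1", "server");
        } catch (ResolverLookupException e) {
            // The message now carries the context that logging would have shown.
            System.out.println(e.getMessage());
        }
    }
}
```

The original exception is kept as the cause, so the stack trace that JUnit reports still shows where the failure started.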

The client code was asking for a model on the server identified by the URI rmi://jaws.bne.pisoftware.com/server1 (we've been using the names of Bond henchmen for the desktop machines). The Resolver Factory was reporting that it couldn't connect to a session called "server". It took me some time to spot that it was supposed to be saying "server1" instead. Once I found that, I started looking at which URI had been configured, and which had been requested. This is where the lack of logging caused so many problems, as I couldn't just log the URI value whenever I wanted to see it.

Once I'd gone this far I asked AM for some help, and he suggested that it might have been trying to remove a "#" from a model URI, and was taking off the "1" instead. This made perfect sense, and after only a little searching I found it. This then led to two questions. First of all, why was a server URI being passed in when the code clearly expected a model URI? I suspect that this is to do with an assumption made for local models that doesn't hold for remote models. It isn't a major problem, but I should work it out, in case it has consequences I don't see yet.

The second question is less important, but more perplexing. The problem code looked like this:
URI server = new URI(modelStr.substring(0, modelStr.indexOf('#') - 1));
Now unless I'm misreading this, if modelStr does not have a "#" character, the indexOf method should be returning -1, and the substring method should be throwing a StringIndexOutOfBoundsException when it gets -2 as the second parameter. Instead it appears to be returning the full modelStr minus the final character. I have seen this before, and it looks like a bug in the String class, at least on Linux.

So I fixed this to leave the URI string unchanged if it does not contain a "#" character. If it does contain the character, it goes ahead and makes the above modification. This should have led me to the next bug in the chain of Resolver bugs, only things didn't quite go according to plan...
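
The shape of that fix is roughly as follows. This is only a sketch: the class name, the method name, and the "#model" fragment in the usage are invented for the example. It also assumes that the intent is to drop everything from the "#" onwards. Since the end index of substring is exclusive, the sketch leaves out the "- 1", which would otherwise also remove the character just before the "#".

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: the names here are illustrative, not the real code.
final class ServerUriSketch {

    /**
     * Returns the part of modelStr before the first '#'. If there is no '#',
     * the string is used unchanged.
     */
    static URI serverUri(String modelStr) throws URISyntaxException {
        int hash = modelStr.indexOf('#');
        if (hash < 0) {
            // No fragment to strip, so leave the string alone.
            return new URI(modelStr);
        }
        // substring's end index is exclusive, so this keeps everything
        // up to, but not including, the '#'.
        return new URI(modelStr.substring(0, hash));
    }

    public static void main(String[] args) throws URISyntaxException {
        // A server URI with no fragment, as in today's problem.
        System.out.println(serverUri("rmi://jaws.bne.pisoftware.com/server1"));
        // A model URI with an assumed fragment name.
        System.out.println(serverUri("rmi://jaws.bne.pisoftware.com/server1#model"));
    }
}
```

Both calls in main give the same server URI, ending in "server1".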

Lucene Indexes
Now that I'm using the "new" build script for the resolvers, I have found that some of the ant configuration is not quite right. Other than the path problems I have mentioned in the last few days, it seems to have some fundamentally different behaviour for some classes which I didn't realise had changed.

The main problem I've had has been with a locking file used by Lucene. Whenever I tried to start a Kowari server it would fail to run with an error describing a timeout waiting for a lock file. To remedy this I have taken to using rm /tmp/lucene-*.lock each time I've needed to run a server. This didn't always work, and occasionally I would see a startup error saying that Lucene was unable to delete the /tmp/lucene-xxxxxxxxxxx.lock file because it did not exist. To work around this I would just remove the server directory, do a clean build, and run it all again. Until this afternoon this worked fine, but now it doesn't. I either get a timeout on the lock file if it exists, or an error attempting to delete the lock file if it doesn't. Either way the server fails to start. Clean builds and empty directories don't fix it either.

It looks like I need to learn what Lucene is doing before I can do any further work.
