Thursday, May 26, 2005

Unmarshalling
After integrating the rules interfaces I thought I'd start up the Kowari server and check that everything was still OK before proceeding. It wasn't, of course (otherwise I wouldn't be talking about it here).

My error message was really helpful: Couldn't answer query.

So I started by grepping for this error message, and found it in several places within ItqlInterpreterSession. This was frustrating, as it meant that I was seeing the problem at the entry point to the query process, rather than wherever the error was really occurring.

Of more concern was that each time this message was printed, the cause of the exception was supposed to be printed as well, but I was seeing nothing. I decided to print the message from the exception as well (not just the cause of the exception), only I saw nothing here either. Finally, in frustration, I changed each occurrence of the "Couldn't answer query" message to something unique, so I could tell which one was being printed. At this point the cause of the exception suddenly started to be printed as well.
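For reference, here is a minimal sketch of printing an exception together with every nested cause, so that a wrapper message like "Couldn't answer query" doesn't hide the real failure. The class and method names (CauseChain, describe) are made up for this illustration and are not part of Kowari.

```java
import java.util.ArrayList;
import java.util.List;

public class CauseChain {
    /** Returns one "ClassName: message" line for the exception and each nested cause. */
    public static List<String> describe(Throwable t) {
        List<String> lines = new ArrayList<>();
        // The depth limit guards against a cause chain that loops back on itself.
        for (int depth = 0; t != null && depth < 32; depth++) {
            lines.add(t.getClass().getName() + ": " + t.getMessage());
            t = t.getCause();
        }
        return lines;
    }

    public static void main(String[] args) {
        // Simulate an RMI failure wrapped by a generic query error.
        Exception root = new java.rmi.UnmarshalException("error unmarshalling return");
        Exception wrapped = new RuntimeException("Couldn't answer query", root);
        for (String line : describe(wrapped)) {
            System.err.println(line);
        }
    }
}
```

Printing the class name as well as the message matters here, since an exception with a null or empty message would otherwise print as nothing at all.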

I must have been running into a dependency problem where the build wasn't picking up my code modifications, though I don't know why it suddenly started to work. Had the new strings not been printed, I'd have known that it was a build script dependency problem and done a clean build (I was on the verge of this already).

Given this problem I realised that I needed to perform clean builds more often in this debugging process. Incremental Kowari builds are already very time consuming, so performing a clean build each time meant that the rest of the debugging operation was guaranteed to be very time consuming (my latest clean build took 3 minutes, 17 seconds).

Anyway, I now knew that my problem was an "Unmarshalling" exception. So the problem was in RMI.

My first suspicion was that I'd updated a versioned interface at one end and not the other. The only changed interface was Session so I ran serialver to get the new number (the number generated by this program is essentially a checksum of relevant parts of the interface signature). However, the serial ID was exactly the same as before.
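The same number that serialver reports can also be read from inside a running JVM through java.io.ObjectStreamClass, which is handy for comparing both ends of a connection. This is only a sketch: the Versioned and Unversioned classes are hypothetical stand-ins, not Kowari types.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionCheck {
    /** Hypothetical class with an explicitly declared serial ID. */
    public static class Versioned implements Serializable {
        private static final long serialVersionUID = 42L;
    }

    /** Hypothetical class that relies on the computed default serial ID. */
    public static class Unversioned implements Serializable {
        int value;
    }

    /** Returns the serial ID the serialization machinery will use for the type. */
    public static long serialIdOf(Class<?> type) {
        // lookupAny also accepts non-serializable types, which report an ID of 0.
        return ObjectStreamClass.lookupAny(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Versioned:   " + serialIdOf(Versioned.class));
        // The default ID is derived from the class's name, members and signatures.
        System.out.println("Unversioned: " + serialIdOf(Unversioned.class));
    }
}
```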

To confirm the problem I was seeing I commented out the new methods (buildRules() and runRules()), and tried to run another query. This took a while, as I had to make sure I got every implementation of the Session interface. Sure enough, it worked. So my problem was definitely in these methods, but where?

Wondering about the other rules classes I worked out which ones are to be transferred across RMI, and made sure that each was serializable (the exceptions were already serializable, but Rules was not). I couldn't see that this would make a difference, as I hadn't yet called a method that would move these objects, but I was trying everything I could think of. While I was at it, I made sure that each of these classes had serial IDs.
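A quick way to check that a class really can cross RMI is to push it through Java serialization locally, before involving the server at all. The sketch below assumes a made-up RuleSet class standing in for the real rules classes, and the roundTrip helper is likewise hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerialRoundTrip {
    /** Hypothetical stand-in for a rules class; not Kowari's actual Rules type. */
    public static class RuleSet implements Serializable {
        // Explicit serial ID, so both ends agree even if the class is recompiled.
        private static final long serialVersionUID = 1L;
        private final String name;

        public RuleSet(String name) { this.name = name; }
        public String getName() { return name; }
    }

    /** Serializes the value to bytes and reads it back, as an RMI call would. */
    public static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // A NotSerializableException surfaces here if any field cannot be written.
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        RuleSet copy = (RuleSet) roundTrip(new RuleSet("rdfs"));
        System.out.println("Round-tripped rule set: " + copy.getName());
    }
}
```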

Once I had these changes throughout all the Session implementations I re-ran the code, and had no change in the output. I guess this was to be expected, but I had still been hopeful. :-)

While going through the various classes, it occurred to me that I had not yet made the necessary changes to RemoteSession. I wasn't yet passing the new function calls across RMI, so I thought that these would be safe to leave unimplemented for the moment. To intercept any inadvertent calls when the RMI interface was not complete, the RemoteSessionWrapperSession (blame Simon for the name, though it does make sense) was just throwing an UnsupportedOperationException for the new rules methods. With few other options to try, I decided to add the rules methods to the RemoteSession interface as well (after all, they were about to be needed anyway). Just a few extra lines, and they were implemented on SessionWrapperRemoteSession as well.
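The pairing of wrappers described above can be sketched roughly as follows. Everything here is simplified and hypothetical: the interfaces, the class names and the single String-based runRules signature are invented for illustration and do not match the real Session or RemoteSession declarations.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

public class WrapperSketch {
    /** Hypothetical local interface, standing in for Session. */
    public interface LocalSession {
        String runRules(String ruleSetName);
    }

    /** Hypothetical remote interface: every method must declare RemoteException. */
    public interface RemoteView extends Remote {
        String runRules(String ruleSetName) throws RemoteException;
    }

    /** Server side: exposes a local session through the remote interface. */
    public static class LocalAsRemote implements RemoteView {
        private final LocalSession session;
        public LocalAsRemote(LocalSession session) { this.session = session; }

        @Override
        public String runRules(String ruleSetName) throws RemoteException {
            return session.runRules(ruleSetName);
        }
    }

    /** Client side: makes a remote session look like a local one again. */
    public static class RemoteAsLocal implements LocalSession {
        private final RemoteView remote;
        public RemoteAsLocal(RemoteView remote) { this.remote = remote; }

        @Override
        public String runRules(String ruleSetName) {
            try {
                return remote.runRules(ruleSetName);
            } catch (RemoteException e) {
                // Keep the cause so the real failure is visible to the caller.
                throw new IllegalStateException("Remote call failed", e);
            }
        }
    }

    public static void main(String[] args) {
        LocalSession server = name -> "ran " + name;
        // No registry or network is involved here; the wrappers are chained directly.
        LocalSession client = new RemoteAsLocal(new LocalAsRemote(server));
        System.out.println(client.runRules("rdfs"));
    }
}
```

The point of the sketch is only that a method has to exist on all three layers (local interface, remote interface, and both wrappers) before a call can travel end to end.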

Unexpectedly, this did the trick! I still had an exception, but suddenly I had messages going over RMI, and the exceptions were being printed. With this information I discovered that the rule classes were not present in the classpath at the server end. Obviously they'd made it into the classpath during compilation, but when the distribution jar was built they were not being included.
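A missing class like this can be confirmed from the server's JVM with a small probe that asks the class loader for the class by name. This is a sketch only; ClasspathCheck is a made-up helper, and the class names to probe would be whatever the rules jars actually contain.

```java
public class ClasspathCheck {
    /** Returns true if the named class can be loaded by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass fully qualified class names on the command line.
        for (String name : args) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        }
    }
}
```

Run with the same classpath as the server, this reports whether the distribution jar actually contains what the compile-time classpath had.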

Once rules-base-1.1.0.jar and krule-base-1.1.0.jar were included it all ran fine. I'm now back into the integration of rules with a database session.
