Thursday, June 10, 2004

Reflections on Resolvers
Reflection code came easily. All that time implementing JMX for Enhydra I guess. (Was that really 4 years ago?)

Of course actually running the tests showed up lots of things not available. SR has been trying to keep these packages insulated from the rest of the system, but it seems that he's been a little too successful.

Anyway, the tests start successfully, but then the SessionFactoryFinder is failing to find the remote session, and I'm not sure why yet.

Distributed Queries
The next version of TKS is coming out tomorrow. It does a lot more, and it does it all better, which I'm pleased to see, but the resolvers didn't make it this time around. For that reason the lack of distributed resolver tests hasn't become an issue yet.

The old method of doing distributed queries had to be reintroduced, since the plan was to take this task over with the distributed resolver. SR only got onto this task a few days ago, which was poor timing given that he has left for an 8 week trip through the States today. Consequently, what code SR was able to finish has been dumped on AM, to have ready for tomorrow afternoon. Needless to say, AM is now swamped!

As a result, this afternoon several of us in the office were pulled in to give AM a hand with his bugs. Mine was to work out why constraints were not resolving when applied across multiple remote models.

I quickly narrowed the problem down to the constraint operation that joined the two models. Once I got there I started adding logging messages to see exactly what was happening, but I had to laugh when I saw the following:


if (i == null) {
logger.warn("ce2riMap.keys() = " + ce2riMap.entrySet());
try {
throw new QueryException("Failed to find register matching constraint: " + ce +
"generateJoin(" + term + ", " + destReg + ", , " + first + ")");
} catch (QueryException e) {
logger.warn("In ExecutionPlan.generateJoin", e);
throw e;
}
}
So a message was logged, an exception thrown, and then immediately caught just so it could be logged and then rethrown! Surely this code looked different once upon a time. I can only guess that it got this way by incremental changes accumulating over time, with each coder in too much of a hurry to notice what was happening. This view is supported by the first log message, as it says that it is displaying keys to a map, when it is really displaying entries.

It was the ce2riMap map mention in this code which was the problem. This is a mapping of ConstraintExpressions to an index into the list of registers which holds them. This list is used by the query optimizer to resolve queries. The problem was caused by a type of constraint expression that was being used in this context for the first time. This object did not have defined hashCode or equals methods meaning that it could not be used as a key in a map.

This ConstraintExpression object represents the results from a model, but it doesn't seem to hold its provenance. This means that the only way to compare this object with another for equality would be to iterate over the entire set. Now that would be a bad idea.

Fortunately constraints are only supposed to be resolved once, meaning that if a constraint is going to be compared as equal to another, then it must have been a clone of the original. I was able to take advantage of that behavior and give each constraint an ID in its constructor (which is based on the address of the object, obtained from System.identityHashCode()), and then made sure that ID was duplicated in the clone method. This worked fine.

Fixing this just served to show the next bug. Sometimes this job is like playing with Russian dolls.

The first bug showed up in the iTQL shell with a lot of debugging information describing the problem (in fact, the message shown was a list of the contents of ce2riMap). The new response in the iTQL shell simply said that the query could not be resolved. Log files soon showed that there was an RMI problem.

RMI was throwing an exception when it tried to serialize a GlobalizedAnswer object, for transport from one of the models' servers to the other. RMI exception traces are a pain, as they stop short of telling you where they came from with a line like: "and 21 more...".

I went through every place that an Answer interface gets returned over RMI, and confirmed that on each occasion the object was remotable or serializable. I also added in a heap of logging to tell me exactly what kind of objects were being returned over RMI. The resulting logs consistently showed that the returned objects were all fine, and yet I was still getting the exception trace.

At this point I remembered reading the javadoc for ObjectOutputStream last week (while writing the serialization code for org.jrdf.graph.memory.GraphImpl. Note to self: put this on Sourceforge). It explains that you can prevent serialization by implementing writeObject and throwing a NotSerializableException. This seemed perfect, as it would allow me to log a complete stack trace during serialization.

Of course, the modifications I made seemed to do nothing, until I realised that GlobalizedAnswer now appears in two packages, and having VIM find the class for me with ctags was taking me to the new one which is used for resolvers. Hopefully I'll remember to keep an eye out for this until resolvers are fully integrated and the obsolete packages go away.

Once I had GlobalizedAnswer throwing fully logged serialization exceptions, I quickly realised the obvious. The serialization was happening as the object was being sent to another server, rather than being returned. This is because Query objects hold an optional Answer field as their GIVEN clause. This field is almost never used, except in the context of remote queries, which is exactly why the problem only just started showing up again.

The solution is to make sure that any queries that are to be sent to a remote server, convert any Answers they include into a serializable form. To prevent any unnecessary conversions unless they were required, I added a writeObject method to Query and if the Answer representing the GIVEN clause is not an instance of Serializable then it gets replaced with a serializable Answer built from the old GIVEN. After that the default serialization method for Query gets called.

If the Answer is too large then it should be sent in a remotable form. I didn't bother with that part, partly because GIVEN clauses are typically small, and partly because this will all go away soon when resolver framework becomes available.

It all works now, but it has started logging a few Tuples as not being closed when their finalizers run. The finalizers are not there to do any work, they just clean up when it is needed, and report the need. The logged messages are probably just because SR can be a little sloppy with closing Answer and Tuples objects, and I'm sure it was just made harder for him with the deadline he was under. No wonder he hates programming in C or C++: all those free and delete calls.

CQU Results
Late last night I noticed that my academic transcript from CQU says that I failed a subject... only I didn't. In fact, the transcript says that I failed the very last subject I did in that degree, which is an obvious mistake, as I would never have been allowed to graduate if that had been the case.

So this morning I was on the phone finding out what happened. No one knows for sure, but it seems that CQU recently changed over their whole computer system, and one of my subject entries didn't make the changeover. So now I have to wait until the lecturer find the original paperwork so it can be corrected, and THEN I have to order another transcript. So much for a quick application.

No comments: