Friday, April 08, 2005

Differences
The other big thing this week has been the difference code. I believe I'd already mentioned that I seemed to have it working, except that sorting the resulting tuples produced an empty set.

After wasting some time on debugging, I tried another tack. The result of a difference is essentially just the original minuend, filtered by the data in the subtrahend. Hence, all the method calls on the Difference class which ask about the structure of the tuples should return exactly the same information as the original minuend (with the exception of the row count).
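To make the "minuend filtered by the subtrahend" idea concrete, here is a minimal sketch in plain Java collections. It is not the Kowari code: the class name DifferenceSketch, the long[] row representation and the column-index arguments are all assumptions made up for illustration. The point is only that every surviving row is a minuend row, so its shape is untouched and only the row count changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Kowari.
class DifferenceSketch {

    // Builds a comparable key from the chosen columns of a row.
    static List<Long> key(long[] row, int[] cols) {
        List<Long> k = new ArrayList<>();
        for (int c : cols) {
            k.add(row[c]);
        }
        return k;
    }

    // Returns the minuend rows whose shared columns match no subtrahend row.
    // Rows are passed through unchanged, so column count and order are the minuend's.
    static List<long[]> difference(List<long[]> minuend, int[] minuendCols,
                                   List<long[]> subtrahend, int[] subtrahendCols) {
        Set<List<Long>> excluded = new HashSet<>();
        for (long[] row : subtrahend) {
            excluded.add(key(row, subtrahendCols));
        }
        List<long[]> result = new ArrayList<>();
        for (long[] row : minuend) {
            if (!excluded.contains(key(row, minuendCols))) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> minuend = Arrays.asList(
            new long[] {1, 10}, new long[] {2, 20}, new long[] {3, 30});
        List<long[]> subtrahend = Arrays.asList(new long[] {2}, new long[] {9});
        // Match column 0 of the minuend against column 0 of the subtrahend.
        for (long[] row : difference(minuend, new int[] {0}, subtrahend, new int[] {0})) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The real class works lazily over sorted tuples rather than building a set in memory, but the contract is the same.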

So what was I returning that was different to the minuend? For almost every method I was passing the call straight through to the minuend. The exceptions were next(), isMaterialised(), getOperands(), close() and clone(). None of this seemed suspicious, so why did the sorting code think that the Difference class looked different to the minuend?

Thinking about this for a while made me realise that Difference does not implement Tuples directly, but rather extends AbstractTuples. This means that it was picking up the default implementations of a couple of methods, rather than using the minuend's implementations. I don't remember the exact methods now, but I think that the defaults I was inheriting were isUnconstrained() and getComparator(). I tracked these all down, and either implemented them myself or passed them on to the minuend.
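The trap is easy to reproduce in miniature. The sketch below uses invented stand-ins (Rows, AbstractRows, FixedRows) rather than the real Tuples and AbstractTuples, and the default value returned by the abstract class is an assumption, not what Kowari actually returns. It just shows how a wrapper that forgets to override one inherited method ends up disagreeing with the object it wraps.

```java
// Hypothetical stand-ins; these are not the Kowari interfaces.
class DelegationSketch {

    interface Rows {
        int getColumnCount();
        boolean isUnconstrained();
    }

    // Abstract base class supplying a default that subclasses silently inherit.
    static abstract class AbstractRows implements Rows {
        public boolean isUnconstrained() {
            return false; // assumed default, for illustration
        }
    }

    // A simple concrete source of rows to play the part of the minuend.
    static class FixedRows implements Rows {
        private final int columns;
        private final boolean unconstrained;

        FixedRows(int columns, boolean unconstrained) {
            this.columns = columns;
            this.unconstrained = unconstrained;
        }

        public int getColumnCount() { return columns; }
        public boolean isUnconstrained() { return unconstrained; }
    }

    // Delegates getColumnCount() but forgets isUnconstrained().
    static class LeakyDifference extends AbstractRows {
        private final Rows minuend;

        LeakyDifference(Rows minuend) { this.minuend = minuend; }

        public int getColumnCount() { return minuend.getColumnCount(); }
    }

    // Delegates every structural question to the minuend.
    static class DelegatingDifference extends AbstractRows {
        private final Rows minuend;

        DelegatingDifference(Rows minuend) { this.minuend = minuend; }

        public int getColumnCount() { return minuend.getColumnCount(); }

        @Override
        public boolean isUnconstrained() { return minuend.isUnconstrained(); }
    }

    public static void main(String[] args) {
        Rows minuend = new FixedRows(3, true);
        System.out.println(new LeakyDifference(minuend).isUnconstrained());      // false: the inherited default
        System.out.println(new DelegatingDifference(minuend).isUnconstrained()); // true: the minuend's answer
    }
}
```

Nothing warns you about the missing override, because the inherited default satisfies the interface perfectly well.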

After that, I tried the unit tests again, and had a lot more success. I still had quite a few errors, but in almost every case this was due to the unit tests themselves being incorrect, so it was easy (though time-consuming) to track them all down.

The only unit tests I had trouble with were those which included unconstrained variables. It appears that beforeFirst(prefix) is not implemented correctly for cases where the prefix includes UNBOUND. This is not so bad if the unbound variable is at the end of the prefix, as I could try to detect that and truncate the length of the prefix, but it would make the method very non-general.

Considering the general case, having unbound variables at the start of the prefix would require suffix matching, which is something that Andrae and Simon have pointed out is explicitly not supported yet. I'm not sure how I'd go about handling an unbound in the middle of a prefix, except by iterating or re-ordering the tuples. Neither of these is an appealing option.
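For what it's worth, the truncation workaround for trailing unbound values would look roughly like the sketch below. This is a hypothetical helper, not something in Kowari: the name truncateTrailingUnbound is made up, and the UNBOUND value of 0 is an assumption. It handles only the easy case and simply refuses a prefix with an unbound value at the start or in the middle, which is exactly why it isn't general.

```java
import java.util.Arrays;

// Hypothetical helper; not part of the Kowari code base.
class PrefixSketch {

    // Assumed placeholder value for an unbound variable.
    static final long UNBOUND = 0L;

    // Strips UNBOUND values from the end of a prefix.
    // Throws if an UNBOUND value appears before any bound value, since that
    // would need suffix matching, iteration or re-ordering instead.
    static long[] truncateTrailingUnbound(long[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == UNBOUND) {
            end--;
        }
        for (int i = 0; i < end; i++) {
            if (prefix[i] == UNBOUND) {
                throw new IllegalArgumentException(
                    "unbound value at position " + i + " precedes a bound value");
            }
        }
        return Arrays.copyOf(prefix, end);
    }

    public static void main(String[] args) {
        long[] truncated = truncateTrailingUnbound(new long[] {5, 7, UNBOUND, UNBOUND});
        System.out.println(Arrays.toString(truncated)); // [5, 7]
    }
}
```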

On reflection, I can't think of a use case where I need support for unbound variables in the difference, so I was more than happy to disable these tests. If I ever see a need for them I'll reconsider, but my current needs always result in bound variables.

From the passing unit tests I moved on to testing the iTQL. I'm pleased to say that this all worked perfectly on the first attempt. :-) I'm now putting together some JXUnit tests to script these iTQL tests. (JXUnit is certainly convenient, but it's frustrating how much effort goes into creating even a simple test.)

I'll check it all in this evening.
