Tuesday, April 12, 2005

Time to Write
At the moment I'm spending my days programming, and my evenings studying. That's not leaving a lot of time to blog! I'd love to write more, but it's just really difficult.

Evening study has become more pressing at the moment because I have my confirmation on the 29th of this month (that's Friday fortnight for those of you who understand the term). Because of this I'm thinking that I might spend my next few days "inverting" my working habits... study in the day, and programming at night. That's because I'm getting easily distracted from study in the evenings, but Anne thinks that when I write code I wouldn't notice a bomb going off. :-) Besides, I tend to be at my most productive when programming in the evenings. I always have.

Testing and Debugging
The subtraction operation testing went slowly, but smoothly. Once I got through the full suite of tests the first time, it didn't take long to discover that the test target I needed was test_tuples. That helped it all go faster.

I ran into some problems getting the full set of JXUnit tests to work. The simple case worked correctly, but when I tried some variations on the subtrahend, some of the tests failed. Difference itself was fine (the unit tests proved that), so the problem lay in how the arguments were sorted in the TuplesOperations.subtract() method. It turned out to be an inverted test, but it took me a little while to find it.
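For anyone who hasn't written one, a subtraction over sorted tuples is essentially a merge: walk both sorted inputs together and drop any row of the minuend that also appears in the subtrahend. What follows is a minimal illustration, not the TuplesOperations code. The class name SortedDifference and the representation of rows as long[] arrays of node IDs are my assumptions. It shows where a single inverted comparison can quietly break the result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of a difference (subtraction) between two sets of rows,
 * each row being an array of node IDs. Not Kowari's TuplesOperations code.
 */
public final class SortedDifference {

    /** Lexicographic comparison of two rows over the first `width` columns. */
    static int compareRows(long[] a, long[] b, int width) {
        for (int i = 0; i < width; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    /**
     * Returns the rows of the minuend that have no matching row in the subtrahend.
     * Both inputs are copied and sorted on the same column order before the merge.
     */
    public static List<long[]> subtract(List<long[]> minuend, List<long[]> subtrahend, int width) {
        Comparator<long[]> order = (x, y) -> compareRows(x, y, width);
        List<long[]> left = new ArrayList<>(minuend);
        List<long[]> right = new ArrayList<>(subtrahend);
        left.sort(order);
        right.sort(order);

        List<long[]> result = new ArrayList<>();
        int j = 0;
        for (long[] row : left) {
            // Advance past subtrahend rows that sort strictly before the current row.
            // Inverting this test (writing "> 0") breaks the merge and can let rows
            // that should be removed slip through.
            while (j < right.size() && compareRows(right.get(j), row, width) < 0) {
                j++;
            }
            boolean matched = j < right.size() && compareRows(right.get(j), row, width) == 0;
            if (!matched) {
                result.add(row);
            }
        }
        return result;
    }
}
```

With the comparison written correctly, subtracting {[2,2]} from {[3,1],[1,2],[2,2]} leaves [1,2] and [3,1]; flip the `< 0` to `> 0` and [2,2] survives when it shouldn't.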

The problem was hard to find because I could not modify the log4j configuration (again!). I thought I'd solved this months ago, but no matter which XML configuration files I modified, the system always picked up a file packaged inside the JAR, and never saw the changes I made to display debug info. Rather than spend another day on this, I just printed my diagnostics as error messages, and removed them once I had fixed everything.

Whoever installed the log4j system needs to document exactly how to modify its configuration. As I said, I thought I'd worked it out some time ago, but obviously not. This time around I modified the log4j.conf file in the main directory and in the conf directory. I then did a "clean" and a full rebuild, with no effect. I also ran "find" to look for any other XML files that might affect the configuration, but couldn't find any.
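One thing that would have saved me some time is a way to see which copy of a configuration file actually wins. Assuming Kowari relies on log4j 1.2's default initialisation, log4j looks on the classpath for log4j.xml (then log4j.properties) and takes the first match, so a copy packed inside a JAR can shadow anything edited on disk. Two standard log4j 1.2 system properties help here: `-Dlog4j.debug=true` makes log4j report which file it loaded, and `-Dlog4j.configuration=file:/path/to/log4j.xml` points it at an explicit file. The small sketch below (the class name ConfigLocator is my own invention) lists every copy of a resource visible to the class loader.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Diagnostic: which copies of a configuration resource are visible on the classpath? */
public final class ConfigLocator {

    /**
     * Returns every URL at which the named resource can be found, in the order the
     * class loader reports them. getResource(name) would return only the first of
     * these, which is typically the copy that classpath-based initialisation sees.
     */
    public static List<URL> findAll(String resourceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resourceName));
    }

    /** Prints each copy; a jar: URL means the file is packed inside a JAR. */
    public static void report(String resourceName) throws IOException {
        List<URL> copies = findAll(resourceName);
        if (copies.isEmpty()) {
            System.out.println(resourceName + ": not on the classpath");
        }
        for (URL url : copies) {
            String where = "jar".equals(url.getProtocol()) ? "inside a JAR" : "on disk";
            System.out.println(resourceName + " -> " + url + " (" + where + ")");
        }
    }
}
```

Calling `ConfigLocator.report("log4j.xml")` from inside the server should show a jar: URL first if the packaged copy is the one being picked up.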

The problem with asking whoever installed log4j to document it is that no one remembers who did it. I suspect Simon, but he doesn't remember. :-)

OWL Again
The next thing I worked on was the underlying semantics of OWL. Bijan has led me down a dark path, and now it seems I have to re-evaluate a lot of what I knew about OWL. Strangely, and fortunately, many of the changes in my understanding have made no practical difference to my implementation, although a couple have. At least it has led me to the documentation I need for a more complete implementation of OWL inferencing.

I should also respond to the few people who have written in about using a closed world assumption for cardinality testing. I can now say with complete confidence that this is a bad idea. For a start, it is not OWL, and it will break legitimate inferencing. You can't just switch between open and closed world like that. I agree that the closed world is better in many ways, particularly when working with databases, but it can't be used here, or else it will turn OWL into something it is not.

For people who'd like this kind of semantics on cardinality, perhaps it is time to create a "closed world" namespace. I don't know if that really works with an open world RDF, but if people are doing this sort of thing anyway, it would be better to do it in a carefully fenced-off area, rather than changing the meaning of OWL.
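To make the difference concrete, here is a toy comparison of what each reading concludes from nothing more than a count of values. It is not a reasoner, and the class and method names are hypothetical. Under OWL's open world, with no unique name assumption, too few values for owl:minCardinality is never a contradiction (the rest may simply be unknown), and too many names for owl:maxCardinality means some of those names denote the same individual; only when the names are known to be distinct (via owl:differentFrom or owl:AllDifferent) is the data inconsistent. A closed world check just counts what is stored and flags a violation.

```java
/**
 * Hypothetical illustration of how a closed world reading of owl:minCardinality and
 * owl:maxCardinality changes the answers. Not a reasoner: it only looks at counts.
 */
public final class CardinalityCheck {

    public enum Verdict { CONSISTENT, INFER_SOME_VALUES_EQUAL, INCONSISTENT, VIOLATION }

    /**
     * Open world maxCardinality, as OWL defines it (no unique name assumption).
     * names = number of distinct URIs seen as values; allPairwiseDifferent = whether
     * every pair is asserted to be different. Simplified: partial differentFrom
     * assertions that would also force inconsistency are not analysed here.
     */
    public static Verdict openWorldMax(int names, boolean allPairwiseDifferent, int max) {
        if (names <= max) {
            return Verdict.CONSISTENT;
        }
        // Too many names: at least two of them must denote the same individual,
        // unless they are all known to be different, which is a contradiction.
        return allPairwiseDifferent ? Verdict.INCONSISTENT : Verdict.INFER_SOME_VALUES_EQUAL;
    }

    /**
     * Open world minCardinality: missing values may exist but be unstated, so a
     * shortfall is never a contradiction. Both arguments are deliberately unused.
     */
    public static Verdict openWorldMin(int names, int min) {
        return Verdict.CONSISTENT;
    }

    /** Closed world with unique names: count what is stored and compare. */
    public static Verdict closedWorld(int names, int min, int max) {
        return (names < min || names > max) ? Verdict.VIOLATION : Verdict.CONSISTENT;
    }
}
```

The two answers disagree on exactly the cases that matter: where the open world reading either infers new owl:sameAs facts or stays silent, the closed world reading reports an error.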

Resolvers
I could only spend a short while on that before getting back to the collection resolver. Well, it's just a sequence resolver for the moment, as I don't need to traverse the rdf:rest links yet. When I do need to handle those collections, I'll need to update trans to handle variables bound to a set, instead of single-value bindings. This is proceeding smoothly, though it doesn't run yet, as I'm still working on the row comparator for the string pool.
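A sequence resolver mostly comes down to recognising the container membership properties rdf:_1, rdf:_2, and so on, where the suffix is a positive decimal integer with no leading zeros. Here is one way to parse that index in Java; the class and member names are hypothetical, and it is not the resolver code itself. It also orders members numerically, since a plain string ordering would put rdf:_10 before rdf:_2.

```java
import java.util.Comparator;
import java.util.OptionalInt;

/** Hypothetical helper for rdf:_n container membership properties. */
public final class ContainerMembership {

    /** The RDF namespace URI. */
    public static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Orders membership URIs numerically (rdf:_2 before rdf:_10); other URIs sort last. */
    public static final Comparator<String> BY_INDEX =
        Comparator.comparingInt(uri -> memberIndex(uri).orElse(Integer.MAX_VALUE));

    /** Returns n if uri is rdf:_n for a positive decimal integer n with no leading zeros. */
    public static OptionalInt memberIndex(String uri) {
        String prefix = RDF_NS + "_";
        if (!uri.startsWith(prefix)) {
            return OptionalInt.empty();
        }
        String digits = uri.substring(prefix.length());
        if (digits.isEmpty() || digits.charAt(0) == '0') {
            return OptionalInt.empty();   // rejects rdf:_, rdf:_0 and rdf:_01
        }
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
        }
        try {
            return OptionalInt.of(Integer.parseInt(digits));
        } catch (NumberFormatException tooLarge) {
            return OptionalInt.empty();   // beyond int range; ignored in this sketch
        }
    }
}
```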

This comparator isn't as nice to use as the one from the type resolver, as it needs to read each string pool entry from the block files, which means much more disk activity. I'll go with the obvious implementation for the moment (I don't want to prematurely optimise), but I'm wondering if there is a hint I can give to the index about where the rdf namespace starts and finishes.
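The hint I have in mind amounts to a range query. If the string pool is ordered by lexical value, every URI in the rdf namespace sits in one contiguous run, bounded below by the namespace string itself and above by the smallest string that no longer has that prefix. This sketch assumes such an ordering and uses an in-memory TreeSet in place of the on-disk index, so it only illustrates the bounds, not the block-file access. The names PrefixRange, upperBound and withPrefix are hypothetical.

```java
import java.util.NavigableSet;
import java.util.SortedSet;

/** Hypothetical sketch: find the contiguous run of entries sharing a prefix. */
public final class PrefixRange {

    /**
     * Smallest string greater than every string starting with prefix: drop any
     * trailing '\uffff' characters, then increment the last remaining character.
     * Returns null when no such bound exists (empty or all-'\uffff' prefix).
     */
    public static String upperBound(String prefix) {
        int i = prefix.length() - 1;
        while (i >= 0 && prefix.charAt(i) == Character.MAX_VALUE) {
            i--;
        }
        if (i < 0) {
            return null;
        }
        return prefix.substring(0, i) + (char) (prefix.charAt(i) + 1);
    }

    /** All entries of a sorted pool whose text starts with prefix (a live view). */
    public static SortedSet<String> withPrefix(NavigableSet<String> pool, String prefix) {
        String hi = upperBound(prefix);
        return hi == null ? pool.tailSet(prefix, true) : pool.subSet(prefix, true, hi, false);
    }
}
```

Against a real index the same two bounds would be found with a pair of binary searches, after which the comparator only has to touch entries inside the range.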

In the best possible world I'd rebuild the string pool as a Patricia tree. However, I don't know how I can do that and still maintain our read/write phases. Maybe I could build a Patricia tree on the side, and update it periodically. It wouldn't be guaranteed to give perfect answers all the time, but that may still be useful. After all, the updatedb/locate database on many systems works like this. Even Google falls behind changes around the world, and yet no one would claim that it is not useful. Unfortunately, that is one of those pie-in-the-sky ideas that I'll have to put aside until I get some time.
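For the curious, here is roughly what I mean by a Patricia-style structure: a compressed (radix) trie in which each edge carries a run of characters, so a prefix query walks down to one node and then enumerates everything beneath it. A textbook Patricia tree branches on individual bits rather than characters; this character-based variant is simpler to show. It is a minimal, in-memory sketch with hypothetical names, and it does nothing about the read/write phase problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Minimal compressed (radix) trie over strings, with sorted prefix queries. */
public final class RadixTrie {

    private static final class Node {
        final TreeMap<Character, Edge> edges = new TreeMap<>();  // keyed by first char of label
        boolean terminal;                                        // a stored key ends here
    }

    private static final class Edge {
        String label;
        Node target;
        Edge(String label, Node target) { this.label = label; this.target = target; }
    }

    private final Node root = new Node();

    /** Length of the common prefix of a (starting at offset) and b. */
    private static int common(String a, int offset, String b) {
        int n = 0;
        while (offset + n < a.length() && n < b.length() && a.charAt(offset + n) == b.charAt(n)) {
            n++;
        }
        return n;
    }

    public void insert(String key) {
        Node node = root;
        int pos = 0;
        while (pos < key.length()) {
            Edge edge = node.edges.get(key.charAt(pos));
            if (edge == null) {                       // no shared path: hang the rest off a new leaf
                Node leaf = new Node();
                leaf.terminal = true;
                node.edges.put(key.charAt(pos), new Edge(key.substring(pos), leaf));
                return;
            }
            int n = common(key, pos, edge.label);
            if (n < edge.label.length()) {            // split the edge at the divergence point
                Node middle = new Node();
                middle.edges.put(edge.label.charAt(n), new Edge(edge.label.substring(n), edge.target));
                edge.label = edge.label.substring(0, n);
                edge.target = middle;
            }
            node = edge.target;
            pos += n;
        }
        node.terminal = true;
    }

    /** All stored keys starting with prefix, in lexicographic order. */
    public List<String> withPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        Node node = root;
        StringBuilder path = new StringBuilder();
        int pos = 0;
        while (pos < prefix.length()) {
            Edge edge = node.edges.get(prefix.charAt(pos));
            if (edge == null) {
                return out;
            }
            int n = common(prefix, pos, edge.label);
            boolean prefixExhausted = pos + n == prefix.length();
            if (n < edge.label.length() && !prefixExhausted) {
                return out;                           // prefix diverges partway along this edge
            }
            path.append(edge.label);                  // everything below this edge matches
            node = edge.target;
            pos += n;
        }
        collect(node, path, out);
        return out;
    }

    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.terminal) {
            out.add(path.toString());
        }
        for (Edge edge : node.edges.values()) {
            int len = path.length();
            path.append(edge.label);
            collect(edge.target, path, out);
            path.setLength(len);
        }
    }
}
```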

SOAP
I had a request from someone at UQ today about the SOAP interface (I'm linking to Apache instead of the canonical site, because that's what Kowari uses). He was using Ruby to talk to the interface, so Java's RMI was not an option for him. Unfortunately, SOAP didn't seem to work for him either.

That made me think about our external interfaces. If you want to use Kowari from an external package, you need some sort of IPC to talk to it. RMI is great if your code is in Java, but what about those projects which are not? For instance, how about Ruby? SOAP would seem like a good idea, but there are apparently problems.
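Part of SOAP's appeal is that the wire format is just XML over HTTP, so any language with an HTTP client can build a request by hand, which is not true of RMI. This Java sketch builds a SOAP 1.1 envelope. The envelope namespace is the standard one, but the operation name, its namespace and the parameter name are placeholders I've made up, not the names Kowari's SOAP service actually uses.

```java
/**
 * Builds a SOAP 1.1 request body. The operation, its namespace and the parameter
 * name are supplied by the caller; any values shown in examples are hypothetical.
 */
public final class SoapEnvelope {

    static final String SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Escapes the five XML special characters for use in element content. */
    static String xmlEscape(String s) {
        StringBuilder b = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': b.append("&amp;"); break;
                case '<': b.append("&lt;"); break;
                case '>': b.append("&gt;"); break;
                case '"': b.append("&quot;"); break;
                case '\'': b.append("&apos;"); break;
                default: b.append(c);
            }
        }
        return b.toString();
    }

    /** A single-parameter request; the parameter element is left unqualified. */
    public static String request(String operation, String operationNs, String paramName, String value) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<soapenv:Envelope xmlns:soapenv=\"" + SOAP_ENV_NS + "\">\n"
            + "  <soapenv:Body>\n"
            + "    <op:" + operation + " xmlns:op=\"" + operationNs + "\">\n"
            + "      <" + paramName + ">" + xmlEscape(value) + "</" + paramName + ">\n"
            + "    </op:" + operation + ">\n"
            + "  </soapenv:Body>\n"
            + "</soapenv:Envelope>\n";
    }
}
```

The resulting string would be POSTed to the service endpoint with a Content-Type of text/xml and a SOAPAction header, as the SOAP 1.1 HTTP binding requires; a Ruby client would do exactly the same thing with its own HTTP library.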

The first problem was that the server reported that the wrong server name was being used to access the local database. This concerned me, as I had fixed it in November. I looked at my own source code and discovered that the line number reported in SessionFactoryFinder was a blank line, meaning that the JAR file being used was at least as old as November. I looked into this, and discovered that the suspect JAR is the latest binary download from Kowari. We had better get a newer one out there. (That is a hint, Andrew!)
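Matching a stack trace line number against the source is one way to spot a stale JAR; another is to ask the JVM directly where a class came from. This sketch uses the standard ProtectionDomain/CodeSource API, plus the manifest's Implementation-Version, which is only available if the build writes one. The class name WhichJar is hypothetical.

```java
import java.net.URL;
import java.security.CodeSource;

/** Reports where a class was loaded from, to tell an old JAR from a fresh build. */
public final class WhichJar {

    /** The JAR or directory a class was loaded from. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "bootstrap class path";   // classes from the bootstrap loader have no code source
        }
        URL location = source.getLocation();
        return location.toString();
    }

    /** Implementation-Version from the package's manifest, or "unknown" if not recorded. */
    public static String versionOf(Class<?> type) {
        Package pkg = type.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return version == null ? "unknown" : version;
    }
}
```

Printing `WhichJar.locationOf(...)` for the class named in the stack trace (SessionFactoryFinder, in this case) on the server would show the path of the JAR actually in use.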

In the meantime, I asked this person to use a CVS checkout, only to discover that Sourceforge would not let him check out a copy! In the end, I had to use my own account to get a checkout. Bizarre.

While this is all a bit of a diversion from my real task, keeping this stuff running is still important. I'll see how far I can get with it tomorrow, and if it's taking too long I'll palm it off to Simon, since he wrote the SOAP code in the first place.

Again, the time is too short, so I'll have to leave it here. It's annoying, as I have so much more to write! :-(
