Working notes: 12/01/2004

Friday, December 10, 2004

RMI Bug
TJ approached me in the morning with a new bug. He claimed that there was a regression bug, where creating a model starting with "rmi://localhost/server1#" threw a NonRemoteSessionException.

My first thought was that this was a problem due to the create command not going through the resolver interface (like it is supposed to). I should ask SR and AM about that, as the interface currently lies. It also means that we don't get model creation when new resolvers are created, which drastically reduces some of the effectiveness of the interface. Anyway, it turned out that this thought was wrong. Yes, model creation does not go through the resolver interface, but that was not the root of the problem.

When I tried to duplicate TJ's problem I found that everything was working for me. I reported this back to TJ, and he showed me that one of his 2 bug reports had mentioned that he was using the web UI (I assumed that both reports were the same, and looked at the report which didn't mention the web UI). I never use the web UI, so it hadn't even occurred to me to try his query this way.

Just like the descriptors, the web UI runs in the same JVM as the server. This means that any RMI connections it makes will be back to itself. I kludged this for the descriptors by allowing RMI connections which use the canonical host name (as these won't loop anyway). However, this wasn't working for the web UI, since it could use any host name, depending on the user.

I was stuck on this for a little while, as I had no idea how to distinguish a web UI connection from any other. Even if I could work it out, then it would still be using the wrong server name. That second problem made me realise that I needed to rename the server before the model could go through. This was already being done within the resolver framework, but it now needed to happen when creating a model as well. This code is in ItqlInterpreter.
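A rough sketch of the renaming step, assuming the canonical host name and the set of acceptable aliases are already known. ServerRenamer and canonicalize are hypothetical names for illustration and are not the actual code in ItqlInterpreter. Only the host part is rewritten here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Set;

// Hypothetical helper for illustration; not the real ItqlInterpreter code.
final class ServerRenamer {
  private ServerRenamer() {}

  /**
   * If the model URI refers to this server through one of its alias host
   * names, return the same URI with the canonical host name substituted.
   * URIs that name any other host are returned unchanged.
   */
  static URI canonicalize(URI model, String canonicalHost, Set<String> aliases) {
    String host = model.getHost();
    if (host == null || !aliases.contains(host.toLowerCase())) {
      return model;
    }
    try {
      // Rebuild the URI, replacing only the host component.
      return new URI(model.getScheme(), model.getUserInfo(), canonicalHost,
          model.getPort(), model.getPath(), model.getQuery(), model.getFragment());
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot rename server in " + model, e);
    }
  }
}
```

Doing the rename before the session is chosen means the web UI can use whatever host name the user typed, and the model URI still ends up naming the local server.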

While I'm at it, ItqlInterpreter is a terrible place for this stuff. This is the class for taking parsed iTQL and dispatching it to the correct modules. Having it do the work of creating a model seems to be inappropriate, and another argument for getting the resolvers to do it for us.

All the same, I used similar code to the resolvers to look up ServerInfo if it exists, and use the information found in there to rename the server. Unfortunately, the list of acceptable server names was not stored in there, but was instead determined by Database. So I modified this class to set the aliases on ServerInfo once it was determined. Because Database and ServerInfo are in different packages I had to make the setter method public, which I didn't want. The only thing I could do to mitigate this is to make sure that this setter can only be called once.
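The "can only be called once" guard might look something like the following. This is a minimal sketch: ServerInfoSketch, setHostnameAliases and getHostnameAliases are assumed names, and the real ServerInfo may differ in its fields and signatures.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in for ServerInfo; the names here are assumptions.
final class ServerInfoSketch {
  // null until the aliases have been determined
  private static Set<String> hostnameAliases = null;

  private ServerInfoSketch() {}

  /**
   * Public only because the caller lives in another package.
   * A second call is rejected, so the aliases cannot be changed later.
   */
  public static synchronized void setHostnameAliases(Set<String> aliases) {
    if (hostnameAliases != null) {
      throw new IllegalStateException("Hostname aliases have already been set");
    }
    // Defensive copy, so the caller cannot modify the set afterwards.
    hostnameAliases = Collections.unmodifiableSet(new HashSet<>(aliases));
  }

  /** Returns the aliases, or an empty set if they have not been set yet. */
  public static synchronized Set<String> getHostnameAliases() {
    return hostnameAliases == null ? Collections.<String>emptySet() : hostnameAliases;
  }
}
```

The guard doesn't stop another class from getting in first, but it does mean that once the aliases are set nothing else can overwrite them.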

I was supposed to leave early for our Christmas party, but I ended up staying almost until 5pm before I had this working.

Thursday, December 09, 2004

Tests
Not much today. Mostly a lot of tests and confirming that things were running the way they are supposed to. Since the tests take so long at the moment this seems to take forever, and it is hard to do much else in the meantime (except, perhaps to read).

I started to look at SOFA, but really made little headway.

Everything seems a little slow at the moment. It must be the wind down before Christmas.

IEE
I went to the IEE Christmas Luncheon today, which was a nice diversion. I suppose I go along out of a sense of obligation, as there are very few people my age who go along, or even bother to be a member at the moment. Fortunately I sat next to a really nice guy named Teck Wai, who is just a little younger than I am (or I assume he is. I often mistake Chinese people for being younger than they really are).

Since the lunch is also the AGM, the new committee was elected, and I took up a position again. It's not too onerous when you're not on the executive, and you can actually get something done when you get involved. While I was at it I convinced Teck Wai to nominate as well. Another friend of mine seconded the nomination the moment that he saw that I was suggesting a candidate who was aged under 40. :-)

The IEE is keen on expanding their involvement in IT, as many members like myself work in it. I should work on organising a few more events which would be of interest to younger members (like myself) as the main relevance of institutions like this is to introduce professionals in the same area to one another. I'll see what I can come up with, otherwise I may find myself dropping my membership due to lack of others' interest.

Wednesday, December 08, 2004

Catching Up
Back from the conference today, and I spent quite a bit of time catching up on email. One of these emails was from ML about how to traverse an RDF list in iTQL. Curious about his use of subqueries I had a go myself, and immediately found an exception.

It turns out that if a subquery is doing a walk using a blank node as the starting point, then Kowari will throw an exception as it tries to localize the blank node. I know that this has worked for me before, so it has left me a little confused. I had to go and meet Bob at this point, so I emailed AN about it.

Masters
Things went well with Bob, and his recommendation at the moment is to stop reading for a bit and concentrate on writing my confirmation. He suggests that if I have any "holes" in my knowledge in the literature review, then I should leave an annotated space for the moment, and come back to it later.

Another suggestion was to write a semantic matrix for the top few papers. This means writing a list of all the things of interest to me, and then describing each paper according to each criterion. Sounds useful, so I'll be drawing something like that over the next few weeks.

SOFA
There wasn't a lot of time at the end of the day, so I just had enough time to get the latest SOFA version and put it into a local CVS project. Creating a CVS project was new for me, and seems a little clunky, but after a discussion with DJ I agreed that it was probably appropriate to create projects that way. The big issue is that we keep a lot of things in separate projects around here rather than separate modules in a single project, which means that we do a lot more project creation than is really common.

I want to write about yesterday's Evolve conference, but I'll have to come back to it.

Monday, December 06, 2004

Tests and SOFA
Quiet day today. With the latest release happening I was intent on testing TKS as much as possible. SR had been having trouble, so I also picked up a clean checkout to confirm that all was well (which it was).

I discovered what had happened to my file modifications on Friday evening. I'd done everything required, but I hadn't removed the org.kowari.resolver version of the directories. So while all the new stuff was correct, the old code was also there. I'd been in such a hurry to leave on Friday that I failed to notice that the files that "lost" their changes were actually in the wrong area. TJ and SR found this problem, as the system was trying to compile these extra classes, and kept giving them errors.

In the meantime I finished reading the SOFA introduction and design whitepaper, and have now started making my way through the API. Next, I'll have to look at KA's implementation of the net.java.dev.sofa.model.OntologyModel and net.java.dev.sofa.model.ThingModel interfaces.

Example Rule
It turned out that the owl:inverseOf statement from OWL Lite hadn't had the entailment iTQL documented for it. This was pretty easy, and I'm surprised that it wasn't already done. However, I updated some of the example RDF, and wrote the iTQL along with a description for it. Getting everything right and tested can take a little while, so by the time I'd finished GN had gone home for the day, meaning it won't make it into the documentation this time around.
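For reference, the entailment itself is simple: if p is declared owl:inverseOf q, then every statement (x p y) entails (y q x), and vice versa. The sketch below shows that rule in plain Java over an in-memory set of triples. It is not the iTQL that went into the documentation, and the class name and the use of three-element lists as triples are just conveniences for the example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration of the owl:inverseOf rule only; not the documented iTQL.
final class InverseOfRule {
  static final String INVERSE_OF = "owl:inverseOf";

  private InverseOfRule() {}

  /**
   * Applies the rule once. Each triple is a [subject, predicate, object] list.
   * Returns only the statements which were not already present.
   */
  static Set<List<String>> entail(Set<List<String>> triples) {
    Set<List<String>> inferred = new HashSet<>();
    for (List<String> decl : triples) {
      if (!INVERSE_OF.equals(decl.get(1))) {
        continue;
      }
      String p = decl.get(0);
      String q = decl.get(2);
      for (List<String> t : triples) {
        // (x p y) entails (y q x)
        if (t.get(1).equals(p)) {
          inferred.add(List.of(t.get(2), q, t.get(0)));
        }
        // (x q y) entails (y p x)
        if (t.get(1).equals(q)) {
          inferred.add(List.of(t.get(2), p, t.get(0)));
        }
      }
    }
    inferred.removeAll(triples);
    return inferred;
  }
}
```

Given the statements (ex:hasParent owl:inverseOf ex:hasChild) and (ex:alice ex:hasParent ex:bob), this produces the single new statement (ex:bob ex:hasChild ex:alice).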

This means that the only OWL statements which aren't documented with iTQL are from versioning. Theoretically this can involve inferencing as well (is this document which is compared to be compatible with another document actually consistent?) but we won't be considering that in the first instance. Well.... maybe TJ wants to, but I won't be. :-)

Friday, December 03, 2004

Remote Moves
Today was spent moving the remote resolver out of Kowari and into TKS. This means that distributed queries will now be restricted to TKS only. A distributed query is one where models from more than one server are all used.
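To make the definition concrete, here is a small sketch of how a query's models could be checked for being distributed. It assumes model URIs have the form of a server URI followed by "#" and a model name (as in rmi://localhost/server1#), and it does not try to recognise two different host names for the same machine. DistributedQueryCheck is a made-up name, not something in Kowari or TKS.

```java
import java.net.URI;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; assumes model URI = server URI + "#" + model name.
final class DistributedQueryCheck {
  private DistributedQueryCheck() {}

  /** Strips the fragment, leaving the URI of the server holding the model. */
  static URI serverOf(URI model) {
    String s = model.toString();
    int hash = s.indexOf('#');
    return URI.create(hash < 0 ? s : s.substring(0, hash));
  }

  /** True when the models live on more than one server. */
  static boolean isDistributed(Collection<URI> models) {
    Set<URI> servers = new HashSet<>();
    for (URI model : models) {
      servers.add(serverOf(model));
    }
    return servers.size() > 1;
  }
}
```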

The reason for the move is that the remote resolver was never supposed to be in Kowari yet. The only system to have distributed queries in the past was TKS. With the new resolver framework this functionality is being performed by the remote resolver, so this resolver is only supposed to be in TKS.

TKS is built on top of Kowari, meaning that as Kowari changes, so too does TKS. With the recent release coming up, Kowari was moving quickly, and I needed to make sure that the remote resolver was working with the latest build. To create the resolver in TKS would have meant updating TKS all the time, which would have become too expensive to get any effective work done. So I built it all in Kowari, with the intention of eventually moving it all.

I'll confess, it was in the back of my mind that maybe someone would forget about it and it would accidentally make it into the Kowari release, but unfortunately that didn't happen. :-) Not to worry. As TKS picks up new features (the so-called "value-adds") then older features like this will eventually make it back into Kowari. OTOH, maybe someone out there will really want this feature and do it themselves. After all, it is probably the simplest of all of our resolvers. I'm tempted to do it myself, but somehow I think that would be inappropriate. :-)

Be that as it may, the day was spent moving this stuff. That meant changing package names, removing the open source licence, changing config files, removing from one part of CVS and adding to another.... the list went on.

By the end of the day everything was working, and I went to do one final test. However, the build failed to compile, and when I looked, all of the source files in TKS were back to their original (Kowari) form. I don't know how this happened, but it did not take too long to fix. The real work had been in the configuration files and the build scripts, so I had it going again in about 5 minutes.

While testing the changes, I read more about SOFA and KA's implementation of it in Kowari. I haven't quite finished the documentation yet, but I've covered most of it.

Thursday, December 02, 2004

Note
I'm too tired to proof read this (or yesterday's). I'll just have to do it in the morning. In the meantime I'd be grateful if you'll skip typos, and try to infer what I really meant when I write garbage. :-)

Final Tests
TJ checked in a lot of things the previous night, so he was keen for me to do a full set of tests again. I really needed to run them again anyway, as my machine had shut down overnight for some reason.

This latest test showed up a new problem with the filesystem resolver, but this did not look as simple as the one I'd seen yesterday. I showed it to ML, who recognised the problem and assured me that it was not something that I had broken. I didn't think it was, but you never know!

With everything passing, I was finally able to check this code in. I wanted to commit the files individually, so I could annotate the commit for each file appropriately. This meant that I had to carefully choose the order of commits, and try to do it when everyone else was out, just in case I made a mistake in the ordering, and they picked up a half completed set of files.

SOFA
While the tests were running I was able to read a lot more about SOFA. It certainly has a lot going for it. The main shortcomings that I see are that it does not scale very well, and there are a few OWL constructs that it cannot represent.

In the case of the latter, this is not really a problem, as most of the things that it can't do are OWL Full, with only a little OWL DL missing. For instance, there is little scope for writing relations for relations. Restrictions do not seem to cover owl:someValuesFrom or owl:complementOf, and unions on restrictions are not covered at all. However, where SOFA does not permit certain OWL constructs to be represented, often there is an equivalent construct which will suit the same purpose.

The scaling issue is really due to one of SOFA's strengths. SOFA manages to keep all of its data in memory, such that it knows what kind of relationships everything has to everything else. Our RDF store scales much better, but there is no implicit meaning behind any of the statements. As a result, modifying anything in a SOFA ontology results in consistency checks and inferencing checks being done quickly and easily. To do the same in RDF means querying a lot of statements to understand the relationships involved with the data just modified.

So while SOFA won't apply well to a large, existing dataset, it works very well with data that is being modified one statement at a time. It's a nice way of dealing with the change problem that I've avoided up until now. Experience with this should also help to apply similar principles to changing data in the statement store. Similarly, it may be possible to apply some SOFA inferences on our data by using appropriate iTQL, making the SOFA interface more efficient.

One way to make SOFA work with larger data sets is to serialize an ontology out to a file, and then to bring it back in via SOFA, but this is not very efficient. For this reason, the need to write a proper rules engine has not been removed. I had been wondering about this when I discovered that SOFA did some inferencing.

Sub Answers
Today TJ discovered a problem with some answer types running out of memory. This occurs when an answer is serialized for transport over RMI. The problem is that an answer might have only a couple of lines, but those lines contain subanswers which are huge.

When I serialized answers for RMI it occurred to me that I didn't really know how large a subanswer could be. I initially worried that a large enough set of subanswers could be too much to be handled in memory. However, I couldn't see subanswers getting too large, and so I went ahead with what I had.

Never commit code when you think that in some circumstances there could be a problem with it. I know that. Why did I choose to forget it?

After a few minutes of thought I came up with a solution. The RemoteAnswerWrapperAnswer class determines how to package an answer for the network. This decision needs to be replaced with an Answer factory. The factory then makes a choice:

If the answer has subanswers as its rows, then return an RMI reference (higher network overhead, but no memory usage).
If the answer contains data and is less than a configured number of rows, then serialize the subanswer.
If the answer contains data and is larger than a configured number of rows, then use a paged answer object.

This factory needs to be used recursively as subanswers are traversed. This will result in the large chunks of data at the bottom of the answer tree getting serialized effectively, while the tree itself will be traversed with RMI. The result should handle any amount of data. It shouldn't even take too long to implement.
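The decision the factory makes can be sketched as below. None of this is the real Kowari code: AnswerPackaging, Node and Transport are invented for the example, Node is only a stand-in for an Answer, and the treatment of an answer with exactly the configured number of rows (serialized here) is my own assumption, since the rules above don't say.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the packaging decision only; all names here are hypothetical.
final class AnswerPackaging {
  enum Transport { RMI_REFERENCE, SERIALIZED, PAGED }

  /** Minimal stand-in for an Answer: a row count plus any subanswers in its rows. */
  static final class Node {
    final long rows;
    final List<Node> subAnswers;

    Node(long rows, List<Node> subAnswers) {
      this.rows = rows;
      this.subAnswers = subAnswers;
    }
  }

  private AnswerPackaging() {}

  /** Chooses how a single answer should be sent over the network. */
  static Transport choose(Node answer, long maxSerializedRows) {
    if (!answer.subAnswers.isEmpty()) {
      return Transport.RMI_REFERENCE; // more network overhead, but nothing held in memory
    }
    // Assumption: an answer of exactly maxSerializedRows is still serialized.
    return answer.rows <= maxSerializedRows ? Transport.SERIALIZED : Transport.PAGED;
  }

  /** Applies the choice recursively, parent first, then each subanswer in order. */
  static List<Transport> plan(Node answer, long maxSerializedRows) {
    List<Transport> out = new ArrayList<>();
    collect(answer, maxSerializedRows, out);
    return out;
  }

  private static void collect(Node answer, long maxSerializedRows, List<Transport> out) {
    out.add(choose(answer, maxSerializedRows));
    for (Node sub : answer.subAnswers) {
      collect(sub, maxSerializedRows, out);
    }
  }
}
```

For a two-row answer whose rows hold one small and one huge subanswer, the plan comes out as an RMI reference for the parent, a serialized answer for the small leaf, and a paged answer for the huge one.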

Remote Resolver
One of our "value adds" (I hate that term) for TKS is to support distributed queries. Any query may be sent to a remote server, but only distributed queries can refer to more than one server in a single query.

These distributed queries now get handled by the remote resolver. When I worked on this resolver I kept it in Kowari, but now that it works properly it has to be moved to TKS. As new features come into TKS, they may reduce the "high-level" value of the remote resolver, and hence allow it down into Kowari (presuming someone else hasn't rewritten it already - after all, that's what open source is about). But for the moment, it has to stay a TKS-only feature.

AN had been looking to move this code, but was waiting until I finished with the RmiSessionFactory code. However, he was having a little difficulty with it, so now that I'm done with the looping bug I've been asked to move the remote resolver myself.

Other than changing the package names of the code, the only real differences seem to be in the Ant build scripts. By the end of the day I'd managed enough that TKS will now build the remote resolver, but I have not yet run all the tests on it.

In the meantime I tried the tests on Kowari now that the remote resolver is gone. The first thing that happened was that many of the Jena tests, and all of the JRDF tests failed. I agonized over this for a while, but then I realised that I'd been doing the TKS build while these tests were running. According to AN, a TKS build can briefly start a server, which would definitely conflict with any set of Kowari tests which were being run at the same time.

I'm now re-running the Kowari tests, and I have my fingers crossed that when I get there in the morning they will have all passed. Then I just have to see how well (or poorly) the TKS tests run.

Wednesday, December 01, 2004

Class Paths
Well it was nearly working today. The tests which failed seemed unrelated to the changes I'd made, so I was initially quite confused. A little logging eventually showed the way.

The problem was occurring in the constructor of RmiSessionFactory, where it gets the URI of the local server from EmbeddedKowariServer via reflection. The confusing part was that it was giving a ClassNotFoundException for the class "SimpleXAResourceException", which at first glance appears to be completely unrelated.

Coincidentally, it was less than a week ago that I was having a conversation with DM about just this. Even though I only needed a simple static method from EmbeddedKowariServer, the class loader does not know this, and so attempts to load up the entire class. This includes having to recursively load all the classes for the return types and the exceptions. The SimpleXAResourceException was being thrown by the startServer method, and since this class wasn't available, the reflection code for this class failed.

There were two approaches to this problem. The first was to make sure that all referenced classes are available. However, this is fraught with difficulty for two reasons. The first is that I'll only discover which classes are needed at runtime. So even if I figured out where I needed to add SimpleXAResourceException, I could simply end up with a report on the next class which wasn't found.

The second problem is that it becomes difficult to know that I caught everything. In some instances the classes are all available and everything runs flawlessly. In other parts of the code some classes are not available. I don't know where or when the classpath changes, and while I might be able to get it working for every code path run in the tests, it's always possible that a new type of usage will call this code again without some required classes available.

The other approach was to factor out all the static information from EmbeddedKowariServer and put it in a simple public class specifically designed for handling this information. This works much better, as it does not have any dependencies on non-Java packages. It also has the nice effect of putting all the server configuration info into one place.

The class I built for this was called ServerInfo. All methods and variables on this class are static. The getter methods are public and the setter methods are package scope, the intention being to only call them from EmbeddedKowariServer.
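The shape of this is roughly as follows. It is a sketch under assumptions: ServerInfoHolder stands in for ServerInfo, and the getServerURI and setServerURI names and the ReflectiveLookup helper are illustrative rather than copied from the real classes. The point is that the holder refers only to JDK types, so loading it reflectively cannot drag in classes that may be missing from the classpath.

```java
import java.lang.reflect.Method;
import java.net.URI;

// Stand-in for ServerInfo: static state only, and no non-JDK dependencies.
final class ServerInfoHolder {
  private static URI serverURI;

  private ServerInfoHolder() {}

  /** Public getter, usable from anywhere (including via reflection). */
  public static URI getServerURI() {
    return serverURI;
  }

  /** Package-scope setter, intended to be called only by the server class. */
  static void setServerURI(URI uri) {
    serverURI = uri;
  }
}

// Hypothetical caller, playing the role of the lookup in RmiSessionFactory.
final class ReflectiveLookup {
  private ReflectiveLookup() {}

  /** Returns the local server URI, or null if the holder cannot be loaded. */
  static URI localServerURI(String holderClassName) {
    try {
      Class<?> holder = Class.forName(holderClassName);
      Method getter = holder.getMethod("getServerURI");
      return (URI) getter.invoke(null); // static method, so no target instance
    } catch (ReflectiveOperationException | LinkageError e) {
      // No server classes available here, so assume there is no local server.
      return null;
    }
  }
}
```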

As usual, the tests to make sure all was well took a very long time to run on each occasion. In the meantime I used the opportunity to learn more of the SOFA API.

One little hiccough that I encountered was with the "filesystem" resolver. Fortunately, it turned out that it was just a JXUnit test that was getting back XML which differed from what it expected. The problem was that someone had checked in their own computer's name hardcoded in the file. I checked the CVS log to see who was the culprit, and I discovered that the problem had already been recognised and fixed.

With everything apparently going, I did a final CVS update and started the tests again for the night. At that point, the fact that it was all working at last, and a headache that I'd developed in the meantime, convinced me to leave half an hour early. :-)
