Wednesday, May 10, 2006

Work, work , work

Yeach. Long hours in the last few weeks have left me feeling wrung out. I tried to make time for family through this (not always successfully), so I didn't have time for anything else. The first thing I tried to catch up on afterwards was exercise. This blog came way down in the list of priorities.

Bugs

I discovered a couple of problems in Kowari in the last couple of weeks. Unfortunately I had goals to push towards, so I haven't been able to look at them yet. I'll get back to them, but in the meantime I'll write them down here for others to look at.

Full Text Searching

One thing I wanted to do (fortunately I didn't need to do it at the time) was full text indexing and querying. This is straightforward using a Lucene model. Unfortunately I couldn't get it to work! All my queries came up with empty results.

I figured that I must have been doing something wrong, so I went to the examples in the documentation. I couldn't get any results here as well. So now I think that something must be wrong. I haven't had an opportunity to run the full test suite lately (I really only have a notebook, and it's slow enough without having to run the tests in the background), but I'll have to run it soon to make sure everything is still OK. If it is, then I'll be running Kowari through a debugger to figure out why I can't get any results from Lucene models.

Normally I'd have my big Linux box doing these tests for me, but it's still in a shipping container, and I've been told that I won't see it before June. Ask me if I'm frustrated.

There is one other machine that I could use, and that's the Mulgara server. I haven't discussed Mulgara here, so I'll discuss it more later.

Datatypes

This bug doesn't affect me at all (at the moment), but it's surprising to see all the same. I discovered this bug when scanning an model which restricted a class using an owl:cardinality of "1". The data type for owl:cardinality is xsd:nonNegativeInteger, so that's what I used. So far, so good.

Then in some test code, I inserted a series of numbers into a Sequence. These were just 1, 2 and 3, and each of them was typed as xsd:int. However, when looking at the full contents of the model I was surprised to see the following numbers in the sequence:
  "3"^^<http://www.w3.org/2001/XMLSchema#int>
"2"^^<http://www.w3.org/2001/XMLSchema#int>
"1"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>
So if the number was already in the string pool as a "nonNegativeInteger", then it just picks up that value again, rather than re-inserting it.

This will be OK for simple cases like this, but it won't work in the general case. Looking at the hierarchy of XSD data types you can see that int and nonNegativeInteger are both subtypes of integer, but they are not in the same branch. So neither can be used in the place of the other, unless an application specifically calls for an integer (their closest common ancestor).

So having the string pool automatically convert one to the other is going to be a problem one day. I've already started checking literal data types, and need to make an exception for this case. I can certainly conceive of a system which would fail a necessary consistency check because of this.

Now I'm left with the option of fixing it now, or waiting to put it into the XA2 store. I still don't know when we'll have time to get to XA2, so maybe an interim fix is the way to go.

RMI Connections

By far my biggest problem is RMI connections. I've seen posts complaining about how slow Kowari is, which I find really frustrating. Queries in Kowari are nearly instantaneous, and insertions are pretty fast as well. So why do people think it's slow? The answer lies somewhere in the Session interface.

For some reason, when the first query is performed on a Kowari database, the Session can take several seconds to establish a connection. This happens whether or not the system is remote or local. I have yet to profile this delay, but it all seems to be in the RMI code. I can't imagine that RMI is that bad, so I'm guessing that it's doing a DNS lookup in there, though I may be wrong. Either day, it's an unacceptable delay. We need some way to speed this up, particularly when running on a local machine. I put a lot of work into making Kowari really, really fast, and the frustration I feel at these network layer delays nearly drives me to tears.

Another problem is the what happens with an error from the ItqlInterpreterBean. It seems that a single error (often avoidable as they are normally caused by programming bugs) can render the Session nearly useless. There is some kind of synchronization issue after an Exception is thrown, that a Session can't seem to overcome, at least for a couple of operations after that. I'm not sure why this is, but it bears investigating.

A lot of these problems seem to come down to RMI. I don't know what the best replacement technology would be (SOAP will also have DNS issues, etc) but we need to look at something. Whatever we choose, we will probably need to bypass it when accessing a local data store. At the moment we do this transparently, but perhaps we should consider something a little more optimized for local connections, since this forms the majority of the use cases I've seen.

Mulgara

As David Wood pointed out, we are forking the Kowari Project. The new project will be called "Mulgara". We felt this was necessary to keep developing the RDF database when there was uncertainty due to the relationship between Kowari and Northrop Grumman.

Some of the problems appear to be caused by a misunderstanding of Open Source Software by Northrop. This has made me reluctant to discuss Mulgara until now, as I didn't want further misunderstandings to develop. Fortunately, we've had legal advice from several quarters confirming our right to fork the project (since the project is licensed under the MPL).

My employer, Herzum Software, has offered to host the new project, putting up a new Linux box for this purpose (running Fedora Core 4). Unfortunately, the administrator for the machine has been overseas for the last couple of weeks, and with my recent long hours neither of us has had time to configure all of the necessary services. I hope to change that this week. In particular, I've discovered that Yum is not a very good tool for installing Mailman (it didn't even notice that there was no MTA installed!)

Some of the services are already going, such as Subversion, but I still need to organize a certificate (at the moment it just provides the "localhost" certificate that comes standard with Fedora Core).

Check back soon and we should have it all running.

1 comment:

Paula said...

I'd like to think you're right. After all, a literal can only be inserted if it's in the correct range for its data type (you shouldn't be able to insert "-1"^^<xsd:nonNegativeInteger>, or "1024"^^<xsd:byte>). So from that perspective it can't break any applications.

But what happens when the range of a predicate is set to xsd:int, and it gets used on a xsd:nonNegativeInteger (let's say it's the value "1")?

Come to think of it, entailment would infer a new type of xsd:int. If it's a value in the range type (which it must be) then this is a valid entailment, even though the type hierarchy does not indicate that this is possible.

I suppose this shows how the type hierarchy is not a class hierarchy. Going further down a branch leads to subsets of the elements higher up, which you'd expect from a class hierarchy. The difference is that parallel branches can have intersecting sets, which you don't expect from a class system.

Look at me conflating the elements of the XSD type hierarchy with a class hierarchy. I've been writing Java for too long. :-)