Thursday, April 15, 2004

Beta release day. Fortunately the Ontology API was not required for the release, but I was hoping to get it done anyway. However, I'm still working on it.

It turns out that the Ontology API iterates over the underlying dataset in order to do its inferencing. This is horribly inefficient, and now I understand why we want to implement OWL with Kowari queries. If we can make it work then we'll have a speed improvement of several orders of magnitude, and better scalability than Jena.

I was able to determine the first part of the problem. When returning anonymous nodes to Jena we always wrapped them in URI resource objects, so Jena had no way of working out what was anonymous. That's been fixed, but it might cause problems for other parts of the system that expect to see URI objects. SR and AN agree that I should continue on even though it might break the existing system, as this is the more correct way to do it.

Jena wants to use its own internal identifiers for anonymous objects, so I've been creating these objects and storing them in a map from Kowari nodes to anonymous objects. However, I've yet to find where Jena tries to map its internal identifiers back to the original anonymous nodes. This is a problem, as the very first query that the OWL example attempts finds an anonymous node that appears to have no properties, because Jena doesn't know how to map it back to the original node in Kowari. I'm tracing this in JDB to find the attempted mapping.
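To make the mapping problem concrete, here is a minimal sketch of a two-way map, assuming Kowari nodes can be represented as long IDs and Jena's anonymous identifiers as string labels. The class name BlankNodeMap and both method names are hypothetical and are not part of Kowari or Jena; the point is only that the reverse lookup has to exist somewhere for properties to be found.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical two-way map between store node IDs and the labels handed to
 * the Jena side. Neither the class nor its methods come from Kowari or Jena.
 */
class BlankNodeMap {
    private final Map<Long, String> storeToLabel = new HashMap<>();
    private final Map<String, Long> labelToStore = new HashMap<>();

    /** Returns the label for a store node, creating one the first time it is seen. */
    String labelFor(long storeNode) {
        String label = storeToLabel.get(storeNode);
        if (label == null) {
            label = "_:" + UUID.randomUUID();
            storeToLabel.put(storeNode, label);
            // Record the reverse direction at the same moment, so the two maps cannot drift apart.
            labelToStore.put(label, storeNode);
        }
        return label;
    }

    /** Reverse lookup: returns the store node for a label, or null if this map never issued it. */
    Long storeNodeFor(String label) {
        return labelToStore.get(label);
    }
}
```

With only the forward map, a query that starts from one of the issued labels has nothing to resolve it against, which would match the "anonymous node with no properties" symptom.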

In the meantime, the old RDFLoadUnitTest bug showed up again today. It works fine on a faster Windows machine, and refuses to work on a slower box. I added some roll-your-own logging to one of the files, and saw some very strange behaviour. By adding a few print statements here and there I've caused one failing test to start passing, and another to stop working. Other times I've seen a log claim that a value was 5, but when JUnit tested the *same* value it reported 7. This sort of thing is normally caused by threads, but there are none in use here. The nature of the errors seems to indicate that there are some file flushing problems in Windows, but I'm not sure. I'm considering giving each test its own file and trying to make them work that way, but I've tried this before and I'm not confident.
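For what it's worth, the "one file per test" idea could look something like the sketch below. It assumes the tests share a file today and that unflushed writes are the culprit, neither of which I've confirmed. The PerTestFile class, its method names, and the "rdfload-" prefix are all made up for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical test helper: one scratch file per test, with writes forced to disk. */
class PerTestFile {
    /** Creates a fresh temporary file for a single test. */
    static File create(String testName) {
        try {
            File f = File.createTempFile("rdfload-" + testName + "-", ".tmp");
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the data, then flushes and syncs before closing the stream. */
    static void writeAndSync(File f, String data) {
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data.getBytes(StandardCharsets.UTF_8));
            out.flush();
            // Ask the OS to push its buffers to the device before anyone reads the file back.
            out.getFD().sync();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file back as a string. */
    static String read(File f) {
        try {
            return new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each test would call create() in its setup and never touch another test's file, so a stale or half-written file from one test could not be read by the next.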
