Friday, April 16, 2004

Writing this after the fact, so my memory may be a little faulty.

The first item for the day was the ongoing problem with RDFUnitTest. The timing problems here are quite frustrating, and give me little indication of what I should be trying in order to fix them. When DM suggested that he could look I was only too happy to let him. This let me get back onto the more interesting task of debugging the Jena Ontology API on Kowari.

Spent most of the day using ddd/jdb to trace through what is going on with the Ontology code. Unfortunately ddd for Fedora seems to be slightly broken in that it refuses to set the current execution position, either with glyphs (the arrow) or with text. This meant that I had to manually select the current line in the source code window, so that the correct source would be shown. I also typed "list" a lot, just to see the context of my current position. This was only marginally better than using jdb on its own.

The resolver in the Ontology code is failing when looking for a property called "first". This is the property used for the data in each element of a list. The problem occurs when "first" is supposed to return an anonymous node. In this case, no node is found at all.

Tracing through what happens, it became apparent that every single check for properties like this is performed by forming a triple containing node, where node is the node you want the property for. Jena then iterates through the entire database looking for all matches. This is really inefficient, and demonstrates enormous scope for improving scalability and performance. Indeed, it seems that once the property is found, it iterates through the entire database again, based on the results of the first find.
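To make the cost concrete, here is a rough sketch of the difference between scanning every statement for a match and looking the answer up in an index. Nothing here is Jena or Kowari code: Triple, PropertyLookupSketch and the sample data are all invented for illustration, and I'm assuming the pattern has the shape (node, property, anything) with null standing in for "anything".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an RDF statement; not a Jena or Kowari class. */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class PropertyLookupSketch {

    /** A tiny list structure whose "first" value is an anonymous node. */
    static List<Triple> sampleStore() {
        List<Triple> store = new ArrayList<>();
        store.add(new Triple("_:list1", "rdf:first", "_:b1"));
        store.add(new Triple("_:list1", "rdf:rest", "rdf:nil"));
        store.add(new Triple("_:b1", "ex:label", "item"));
        return store;
    }

    /** Full scan: every statement is tested against the pattern (null = wildcard). */
    static List<Triple> scan(List<Triple> store, String s, String p, String o) {
        List<Triple> hits = new ArrayList<>();
        for (Triple t : store) {
            boolean match = (s == null || s.equals(t.subject))
                    && (p == null || p.equals(t.predicate))
                    && (o == null || o.equals(t.object));
            if (match) {
                hits.add(t);
            }
        }
        return hits;
    }

    /** Builds a (subject, predicate) index once, so later lookups avoid the scan. */
    static Map<String, List<Triple>> indexBySubjectPredicate(List<Triple> store) {
        Map<String, List<Triple>> index = new HashMap<>();
        for (Triple t : store) {
            index.computeIfAbsent(key(t.subject, t.predicate), k -> new ArrayList<>()).add(t);
        }
        return index;
    }

    /** Composite key; the NUL separator is assumed not to occur in node names. */
    static String key(String subject, String predicate) {
        return subject + "\u0000" + predicate;
    }

    public static void main(String[] args) {
        List<Triple> store = sampleStore();

        // Scan approach: cost grows with the size of the whole store.
        List<Triple> scanned = scan(store, "_:list1", "rdf:first", null);

        // Indexed approach: one hash lookup per property check.
        Map<String, List<Triple>> index = indexBySubjectPredicate(store);
        List<Triple> indexed = index.get(key("_:list1", "rdf:first"));

        System.out.println("scan found:  " + scanned.get(0).object);
        System.out.println("index found: " + indexed.get(0).object);
    }
}
```

Both approaches return the same statement; the point is only that the scan touches every statement for each property check, while the index pays that cost once up front.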

The problem we are encountering is due to our mapping nodes into the database. It seems that Jena is keeping its own RDF structure internally, which is how it knows the relationships between each node (though I've yet to completely confirm that). Jena finds the anonymous node that it needs, and then asks Kowari to give it some details on this node. Kowari then asks the JRDFFactory that it keeps for the details it stored on this node.

JRDF is the interface that AN wrote to let Jena communicate with Kowari more easily. It keeps a cache of the nodes which have passed through it, and AbstractDatabaseSession asks this interface for the nodes when Jena needs them. However, the nodes in question did not go through JRDF, and instead went through the Jena wrapper that AN wrote. After discussions with AN, it was determined that both the JRDF and Jena interfaces should be keeping their own mappings, as they each map nodes in different directions. To do this, the JRDFFactory was removed from the database session, and put in the current model along with a new non-static JenaFactory. Both of these factories are then passed to the database session whenever they are needed. The factories also inform each other of all of the nodes they have used, so that reverse mappings are always kept in sync. Unfortunately, an initial run of this was not successful. I will have to check the contents of each map to see if all the nodes are present. If so, then I'll need to determine why they weren't found when needed, otherwise I'll have to determine why they weren't put in there in the first place.
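The idea of two factories that each keep their own mapping and tell each other about every node they see can be sketched roughly as follows. This is not the real JRDFFactory or JenaFactory code: SketchNodeFactory, its methods, and the use of a long id with a String label for a node are all assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical factory that remembers nodes in both directions and
 * forwards everything it registers to a linked peer factory.
 */
final class SketchNodeFactory {
    private final Map<Long, String> nodeById = new HashMap<>();
    private final Map<String, Long> idByNode = new HashMap<>();
    private SketchNodeFactory peer;

    /** Links two factories so each informs the other of the nodes it uses. */
    static void link(SketchNodeFactory a, SketchNodeFactory b) {
        a.peer = b;
        b.peer = a;
    }

    /** Records a node locally, then tells the peer about it. */
    void register(long id, String node) {
        learn(id, node);
        if (peer != null) {
            // The peer only records the node; it does not notify back,
            // which avoids the two factories calling each other forever.
            peer.learn(id, node);
        }
    }

    private void learn(long id, String node) {
        nodeById.put(id, node);
        idByNode.put(node, id);
    }

    /** Returns the node for an id, or null if this factory never saw it. */
    String nodeFor(long id) {
        return nodeById.get(id);
    }

    /** Returns the id for a node, or null if this factory never saw it. */
    Long idFor(String node) {
        return idByNode.get(node);
    }

    public static void main(String[] args) {
        SketchNodeFactory jrdfSide = new SketchNodeFactory();
        SketchNodeFactory jenaSide = new SketchNodeFactory();
        link(jrdfSide, jenaSide);

        // An anonymous node arrives through the Jena side only...
        jenaSide.register(42L, "_:anon1");

        // ...but the JRDF side can still resolve it afterwards.
        System.out.println(jrdfSide.nodeFor(42L));
        System.out.println(jrdfSide.idFor("_:anon1"));
    }
}
```

With something like this, a lookup that returns null on one side points at a node that was never registered on either side, which is the case I need to rule out by inspecting the maps.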
