Friday, June 15, 2007

Debugging

It's a dilemma I've suffered for some time. Do I blog about getting stuff done, or do I get it done? This has had an influence on Mulgara for years, and in more recent times has also been a conflict for university.

Blogging about Mulgara is typically useful, as it gives me time to take stock of what I've done, and also which direction I should be going in. However, for university, it's really substituting one form of writing for another. Comparatively, my blogging lacks coherence and has no formal value at all. But it's not entirely useless, and when I factor in the advantages, particular for Mulgara, then blogging usually seems to win for me. I suppose this is the basis of my idea of micro-papers. At least the ideas would get written down.

All the same, there's no point in writing about work if you never do any. So tonight was a big debugging session, to follow the big debugging session I had today. Right now I'm running tests, so blogging is suddenly acceptable again.

N3

We've had an issue about N3 files not loading correctly. Every time a blank node appeared more than once, a new one was generated for each one, rather than reusing the original. I know that this code worked in the past, so this bug was really bothering me. But like everything else, it was getting pushed down the stack. However, I'm keen to get the Mulgara release out soon, and this was an important one. Another incentive was that the same non-reuse of blank nodes was happening when doing a select/insert between models on separate machines. Since I only just wrote the code to make that happen, I to get that code working too.

I started out by setting a breakpoint in StatementStoreResolver.modifyModel(). This gave me a starting point. I should mention that parsing occurs in a background thread with synchronization happening between appending triples to the queue, and removing them from the head. Apparently threading can be an issue, as Eclipse gave me some real grief trying to do simple things like "stepping" over a line. Sometimes pressing the "step" button would gray all the other buttons out, and you couldn't re-enable them withotu doing something weird, like moving up and down in the call stack.

Regardless of the vagaries of Eclipse, it didn't take long to discover that it was expecting all blank nodes in N3 files to be of the form _:### where the # characters are digits. However, all the files I had were actually in the form _:node###. I've put off fixing this expectation, since the code is expected to manage with non-numeric identifiers, and I wanted to check that it was working first.

It quickly became apparent that the author had expected to not see strings at all, especially when this possibility led to an insertion into a map with the following comment:
// don't expect to use this map
Eventually I noticed that after a blank node was put into the map, any attempt to retrieve it would always return a 0, indicating that nothing was there. But what was going in if zeros were coming out? It turned out that all the blank nodes being created had internal IDs of 0, so the number being stored was a 0, which was like storing nothing at all.

I had a hint that the blank nodes being created should have been given identifiers from the node pool, because the code contained some very suspicious comments:
  // need a new anonymous node for this ID
blankNode = //BlankNodeFactory.createBlankNode(nodePool);
new BlankNodeImpl();
Now it looked like this was wrong, but someone had apparently changed it for a reason. Could I change it back with impunity?

It took a long time to work out exactly what was happening with all of the data, but I finally tracked it all down, and was able to satisfy myself that I needed to create a blank node in the way it used to be done. This left me with two questions. First, who made this change, why, and when? Second, where could I find the node pool to allocated new node IDs?

As for the who/why/when, I can't really say. It occurred sometime between Kowari-1.0 and Kowari-1.1, along with a lot of other refactoring. Now that Kowari's CVS logs have departed for the big Sourceforge bit-bucket in the sky (ie. the project is now "inactive") then there's no way to find out who. And without the who, I can't really work out the why, since I can't see why it was done.

The appropriate place to get a new node is out of the string pool, since this is the object that manages the node pool. It makes sense too, since adding URIs and literals means allocating a node, and mapping it onto a value. Creating a blank node just means allocating a node, and not doing anything else with it. Unfortunately, the ResolverSession class which accesses the string pool for this kind of functionality doesn't provide the facility to allocate an unmapped node. I hate making changes across a large number of classes (ResolverSession is an interface with a lot of implementations), but once I satisfied myself that it really is a general operation I went ahead and did it anyway.

Of course, it's never smooth sailing, and the method name I chose (newBlankNode) conflicted with a private one already in JRDFResolverSession. So there was yet another class to be modified, but in the end it all worked properly. I can say that now, as the tests finished a few minutes ago.

Unfortunately, this is a hard test to automate. We can never guarantee what blank node IDs we will be given, and all our scripted tests rely on exact string matches. To avoid this problem in the future may need something a little trickier.

Hopefully, this has fixed the distributed insert bug as well. I'll just do a quick test of that, and then I'm off to bed.

No comments: