Thursday, June 30, 2005

Axioms
Well the test didn't work, but I had to go to bed. When I got up in the morning yesterday, I spent 15 minutes looking at the problem, and discovered it wasn't really a problem. The code was working just fine, but I had a typo in the RDF for the axioms. Doh! :-)

It's really impressive how many entailments come out of the axioms alone. There are 33 axioms, and these increased to 120 statements after the entailments. When I ran the RDFS rules over the RDFS model (and its OWL description), it turned 643 RDF statements into over 1259 statements.

It was while I was looking at this that I discovered how useful entailments are, even simple ones like this. For a start, all of the types suddenly become available, even those that are not immediately evident. This in turn showed up a couple of simple errors I had in the RDF which had slipped past me before. It turns out that entailments from bad data can lead to a lot of statements that are very obviously wrong. Quite handy.

I didn't get to work much more on this, as I needed to get back to the NGC preparation.

XA Store
Yesterday I was lucky to have DavidM online while writing Javadoc for the transaction layer. This helped me to learn a lot about the newer parts of the code much faster than if I'd had to do it on my own. This will definitely help with the handover.

Today I spent several hours working out a timetable for the handover. I expected to do this in a couple of days, but DavidW needed something from me quickly. So I tried to write an email with a broad outline. However, I ended up writing a lot more than I thought I would, and I think I'm halfway to the full preparation. :-)

One more thing I'd like to do is draw up some diagrams of the file formats. The information is all available in the source files, but it takes careful interpretation to work it out. Having diagrams on hand should be very useful.

Well I'm exhausted tonight, so I'll leave it here and get to more Javadoc in the morning.

Tuesday, June 28, 2005

TKS
For anyone who missed it, the rights to TKS have been purchased by Northrop Grumman. Importantly, they've agreed to support the Kowari project. Wow.

I've agreed to do a handover of the code, and as a part of this I'm writing Javadoc for the XA store. I helped write this over 4 years ago, so I sort of know it, and I sort of don't. DavidM has modified a few things since I was last in that code, so it's been a learning experience for me. Fortunately, writing the Javadoc is helping me learn it all.

I'll be flying to the US next week on the 6th of July, and will be returning on the 27th of July. I'm not looking forward to missing Luc (and Anne). On the other hand, I love traveling for work, particularly to the States, so it won't be all bad. It will definitely slow my blogging down though.

Rules
In the meantime I'm still working on the rules engine and the tests. It appears to work remarkably well. Must be all that effort I put into it. :-)

Since I have to work all day for NGC at the moment, I'm spending my evenings (and the upcoming weekend) working on the rules. All that is left of my paid work is the proper set of tests, and perhaps a short document describing how to write a rule configuration (in case something more than RDFS is desired). I'm pretty sure I can get all that done before I fly out.

Many of the tests specified by the W3C are quite trivial, but they reminded me that RDFS requires axiomatic statements. That is fine, as I could just load them into the destination as a separate RDF file. This bothered me though, as it's a very manual process, and a necessary one for many rule systems. So rather than expanding on the tests tonight I've been encoding the axioms into the rules engine. Now the RDFS rules file includes all the axioms, and the rules engine knows how to read them in and insert them at the beginning of execution. I still need to test that this is working correctly, and it is compiling as I type.

It's now 1am, and I have to give Luc his bottle when he wakes up in the morning, so I'd better run this thing and get to bed (please work first time!)

Monday, June 27, 2005

Tests and Improvements
The tests for the rules seem to be going fine. I could continue to write tests for it forever, with each one getting more complex, so it's hard to know how far to go. After all, they can take a long time to write. At the moment, I'm looking to test each RDFS entailment one at a time. Eventually I'll be moving on to how entailments interact, but I'll only go so far with that, as I really want to move on to the next part, which is OWL entailment.

Still, the code (without tests) is checked in. Have a look in the "rules" directory for an example of running the RDFS rules against the RDF for the RDFS rules. ;-)

I had a talk with Andrae today about RDFS/OWL and how the rules engine in Kowari works with it. I also showed him some of the internal structure, and he had a couple of ideas that will help with efficiency. I'm starting to accumulate a list. It's frustrating, as I want to make it all as fast as I can, but I also want to make it as functional as I can, and I can only work on one objective at a time. :-(

The efficiency things I want to do are:

  • Make the various implementations of Tuples.getRowCount() run in log time, rather than linear time (this needs a count in the nodes of the AVL tree).
  • Add a new version of Session.insert() which accepts an Answer rather than a query.
  • Properly merge the source and destination models for rules if they are the same model.
  • Keep a map of Constraints to their counts for each rule, and ensure that answers are using these Constraints.
There are more, but I can't think of them at the moment.
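To make the first item concrete, here is a minimal in-memory sketch of the idea, assuming a simple AVL tree of long keys. It is not Kowari's on-disk AVL code, and the names (CountedAvlTree, countLessThan, countRange) are hypothetical. Each node caches the size of its subtree, so the total count is read from the root and a range count needs only two root-to-leaf descents instead of a linear scan.

```java
/** Hypothetical sketch: an AVL tree whose nodes cache their subtree size. */
final class CountedAvlTree {

    private static final class Node {
        final long key;
        Node left, right;
        int height = 1;
        int size = 1; // nodes in this subtree, including this one

        Node(long key) { this.key = key; }
    }

    private Node root;

    private static int height(Node n) { return n == null ? 0 : n.height; }
    private static int size(Node n) { return n == null ? 0 : n.size; }

    // Recompute the cached height and size from the children.
    private static void update(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
        n.size = 1 + size(n.left) + size(n.right);
    }

    private static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        update(y); // y is now the child, so update it first
        update(x);
        return x;
    }

    private static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        update(x);
        update(y);
        return y;
    }

    private static Node rebalance(Node n) {
        update(n);
        int balance = height(n.left) - height(n.right);
        if (balance > 1) { // left heavy
            if (height(n.left.left) < height(n.left.right)) n.left = rotateLeft(n.left);
            return rotateRight(n);
        }
        if (balance < -1) { // right heavy
            if (height(n.right.right) < height(n.right.left)) n.right = rotateRight(n.right);
            return rotateLeft(n);
        }
        return n;
    }

    private static Node insert(Node n, long key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        else return n; // duplicates are ignored
        return rebalance(n);
    }

    public void insert(long key) { root = insert(root, key); }

    /** Total row count in O(1): read straight from the root. */
    public int size() { return size(root); }

    /** Keys strictly less than {@code key}, in O(log n): one root-to-leaf walk. */
    public int countLessThan(long key) {
        int count = 0;
        Node n = root;
        while (n != null) {
            if (key <= n.key) {
                n = n.left;
            } else {
                count += size(n.left) + 1; // whole left subtree plus this node
                n = n.right;
            }
        }
        return count;
    }

    /** Keys in [lo, hi), in O(log n). */
    public int countRange(long lo, long hi) {
        return hi <= lo ? 0 : countLessThan(hi) - countLessThan(lo);
    }

    public static void main(String[] args) {
        CountedAvlTree t = new CountedAvlTree();
        for (long k = 0; k < 1000; k++) t.insert(k * 2); // even keys 0..1998
        System.out.println(t.size());             // 1000
        System.out.println(t.countRange(10, 20)); // 5 (10, 12, 14, 16, 18)
    }
}
```

The cost is one extra integer per node, kept correct by recomputing it wherever heights are already recomputed.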

Javadoc
I've also been adding some long-needed documentation to the storage layer.

I pair programmed this code with DavidM, starting 4 years ago. I worked with him on it for over a year before moving on to other things. So I mostly know how it works, but there have been subtle changes since I saw it last. Plus it's been nearly 3 years since I last worked in that area. Back when this code was written, David and I were really pushed for time. To help us get it done, we were told that we could skip the documentation. That allowed us to meet our deadline, but of course we were never given the time to go back and document it as we were supposed to.

It's been an interesting experience coming back to something so complex after so long. I don't remember all of it clearly, but this process of documenting the methods is really helping me work it out. It's also interesting discovering some of the changes which have come about since I last worked on it.

Saturday, June 25, 2005

Whew
The rules engine works. Yay. :-)

While thinking about the RMI problem on Thursday night, I realised that I do want the rule configuration to serialize (I use the American spelling, with the "z", here since all the interface and method names are spelt that way). The reason is that the server holding the rules might be different from the server holding the data.

My concern with serializing like this was the unnecessary trip from the server to the client and back again, particularly when the client is on a separate machine from the server. Serendipitously, I'd already wrapped the rule structure in a remotable class. This works well, because nothing gets moved to the client at all. The rules only get serialized when the server receives this reference back and asks it for the rules. If the two servers are separate then the RMI serialization is necessary, and this approach works just fine. If the rules and the data are on the same server, then the rules will still be serialized, but only once, and the RMI transfer stays within the one JVM, so it should be very fast. Either way, there is no unnecessary transfer to the client.

It's amazingly satisfying to see the entailments being generated. :-)

Merging
For stability purposes, I've been coding against a version of Kowari that I picked up a couple of months ago. My code has been largely independent of other modifications in the system, so this should be safe. Nevertheless, I'm about to update everything and confirm that it all still works.

Once that's done, I'll need to tidy up my tests in order to check everything in. Since it's all working (and can't break any other components), I might check it in without all of the tests. That's because a few people have been asking after these rules. Hopefully they'll like them. :-)

The algorithm for running the rules is not quite complete. At the moment each rule depends on finding a change in the total result of its query in order to determine whether it needs to run. This means evaluating the full query, joins included, even when the rule does not need to run. The completed algorithm tests each individual constraint for changes before testing the joined result. This means that there will be no join processing for (almost all) rules that don't need to be run, and no additional expense for those rules which do need to be run. I expect these changes to take just a few days, but they are not needed straight away.
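As a rough illustration of the completed algorithm, here is a sketch in which a rule remembers the row count of each of its constraints and only runs when one of them has changed. Everything here is hypothetical: Constraint stands in for a single query constraint, and execute() stands in for the expensive join and insert. Comparing counts only works as a change test because entailment only ever adds statements; if statements could also be removed, an unchanged count would not prove an unchanged constraint.

```java
import java.util.*;

/** Hypothetical sketch: skip a rule unless one of its constraints changed size. */
final class ChangeDrivenRules {

    /** Stand-in for a single query constraint, e.g. [?x rdf:type ?c]. */
    interface Constraint {
        long rowCount();
    }

    static final class Rule {
        final String name;
        final List<Constraint> constraints;
        // Counts seen at the last execution; empty until the rule has run once.
        private final Map<Constraint, Long> lastCounts = new HashMap<>();
        int executions = 0;

        Rule(String name, List<Constraint> constraints) {
            this.name = name;
            this.constraints = constraints;
        }

        /** Cheap test: compares per-constraint counts, no join required. */
        boolean needsToRun() {
            for (Constraint c : constraints) {
                Long previous = lastCounts.get(c);
                if (previous == null || previous != c.rowCount()) return true;
            }
            return false;
        }

        void execute() {
            // Record the sizes the join is about to see, so anything inserted
            // afterwards (including by this rule) shows up as a change next time.
            for (Constraint c : constraints) lastCounts.put(c, c.rowCount());
            executions++;
            // ... the join and the insert of new statements would happen here ...
        }
    }

    /** Test double whose size can be changed by hand. */
    static final class CountingConstraint implements Constraint {
        long rows;
        CountingConstraint(long rows) { this.rows = rows; }
        public long rowCount() { return rows; }
    }

    public static void main(String[] args) {
        CountingConstraint subClass = new CountingConstraint(10);
        CountingConstraint type = new CountingConstraint(50);
        Rule rdfs9 = new Rule("rdfs9", List.of(subClass, type));

        if (rdfs9.needsToRun()) rdfs9.execute(); // first pass always runs
        if (rdfs9.needsToRun()) rdfs9.execute(); // nothing changed: skipped
        type.rows = 55;                          // another rule added rdf:type statements
        if (rdfs9.needsToRun()) rdfs9.execute(); // runs again
        System.out.println(rdfs9.executions);    // 2
    }
}
```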

One feature that might be nice to add would be to keep a total of all entailed statements and return this to the user when done. I'll look at doing this soon as well.

Eclipse
I've gained quite a bit out of using Eclipse, but it's been frustrating me lately. I often find myself typing up to a line ahead of the rendering, and often have to wait for the IDE to finish processing before I can do anything. As a result, I've started using VIM again whenever I want to do fast changes. I'll have to work out some way to use Eclipse again without having to spend half my day waiting for it.

Maybe it's just that my notebook is too slow. It's only a 1.33 GHz G4 (with a GB of RAM). This isn't as fast as many desktop systems, but it's quite responsive on every other application I use, so I expected that it should be fast enough.

Last night I had a go at running Eclipse on my Linux desktop, and piping the display to the X server on this notebook, but that kept crashing. On the Linux-GTK setup it crashed with a GTK Window error whenever I selected the "File" menu. So I tried the Linux-Motif setup, but that caused a Hotspot error. Yuck. I haven't tried it on the local display, plus I was using Java 1.5, so there are a few things that could be causing these errors. I guess that will keep me occupied for a day or so, when I get the time to look at it.

Thursday, June 23, 2005

Slow and Steady
I've been plodding along with the rules engine over the last week. Unfortunately, I'd initially planned to be done by now (not planning on getting sick, etc), and so I'd already committed myself to some other modeling work. That meant that I had to juggle the two together. However, I really need to finish the rules work quickly. I want to get it working, so I can move on to the next part of OWL support (needed for my thesis) and also because I won't get any money for my last couple of months until it's done. I think I'm actually supposed to invoice for time rather than the product, but I made a commitment to complete this phase, so I'll have it working before sending in the invoice. I just hope that the payment won't be too long after! :-)

After this last week, the RDF seems to be configured correctly, the code reads it all as required, and the elements all work properly. But it's tough getting them to work together in the way that I want. I'm half tempted to glue it all together in a way that I've already seen work, but I need the more general solution. At the moment, the problem is RMI. That figures.

When I tell ItqlInterpreter to apply a set of rules, I want to read the rules structure and return a result to the ItqlInterpreter. Then ItqlInterpreter can change its current session to point to the data to be worked on, and run the rules. This all seems to work (it's not fully tested, but what I've done so far works) if I have it all occur in one step on the server, but then the rules can't be separated from the data to be worked on (though I am presuming it is all found on the same server). I need to make it happen in the two steps I've outlined.

Unfortunately, I'm having real trouble getting the rules structure to pass over RMI correctly. By default, getting the rule structure from ItqlInterpreter would serialize the structure and create it in the client space. This has two problems. The first is that it's inefficient. There's no reason to move all that data over to the client, since it will only ever get used at the server. The more important problem is that the client would need to have access to the rule structure classes for de-serialization, and I'm trying to keep the client completely oblivious to all but the interfaces.

The better way to manage this situation is to pass back a remote reference to the class. I struggled for some time getting this right (I forgot how annoying RMI can be in this regard). To simplify things I decided not to ship a reference to the entire object (since the methods on that object should never be called remotely), and instead created a remotable wrapper to hold a local reference. The remote reference to the wrapper can be shipped over RMI, and when it comes back it can be queried for the local object that I want. Only that's not what I'm getting.

It took a LONG time to get a stack trace that was useful (it was being completely hidden in RMI, and was never thrown from where I thought it was being thrown), but I finally worked out that the problem is coming from the server when it tries to extract the local reference to the rules structure. At this point it tries to serialize the rules structure, which is not legal (intentionally so). I believe that this is because the object doesn't know that it is now back on its machine of origin, and does not know that it can just pass back a local reference.

Perhaps I'm approaching it all wrong. Now that I have the RMI compiling and running correctly (this took me some time) I could possibly drop the indirection of the wrapping object, and pass back the rule structure as a remote reference (like I originally tried). My only concern is that calling run() needs to pass in a DatabaseSession, and I want it to take the local session without trying to serialize it. I'll have a go in the morning anyway, and see what happens. In the worst case I can always keep a local map of the remotable objects to the local rules structures they represent. Then when an object comes in for a "run request" I can get the local rules out of the map.
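The "local map" fallback can be sketched without any RMI machinery at all. This is a variant of the idea rather than what the engine does: instead of a remote object, the server hands out a small serializable handle (here just a UUID) and keeps the real rules structure in a map on its own side. All names (RuleHandles, RuleHandle, RuleStructure, roundTrip) are hypothetical, and roundTrip only simulates the trip to the client by serializing the handle with standard Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: server-side map from opaque handles to local rule structures. */
final class RuleHandles {

    /** The only thing that travels to the client; it needs no rule classes to deserialize. */
    static final class RuleHandle implements Serializable {
        private static final long serialVersionUID = 1L;
        final UUID id;
        RuleHandle(UUID id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof RuleHandle && ((RuleHandle) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    /** Stand-in for the server-side rules, deliberately not Serializable. */
    static final class RuleStructure {
        final String name;
        RuleStructure(String name) { this.name = name; }
    }

    private final Map<RuleHandle, RuleStructure> registry = new ConcurrentHashMap<>();

    /** On the server, when the rules are read: keep them local, hand out a handle. */
    RuleHandle register(RuleStructure rules) {
        RuleHandle handle = new RuleHandle(UUID.randomUUID());
        registry.put(handle, rules);
        return handle;
    }

    /** On the server, for a "run request": map the returned handle back to the local object. */
    RuleStructure resolve(RuleHandle handle) {
        RuleStructure rules = registry.get(handle);
        if (rules == null) throw new IllegalArgumentException("Unknown or released rule handle");
        return rules;
    }

    /** Drop the local structure once the client has finished with it. */
    void release(RuleHandle handle) { registry.remove(handle); }

    /** Simulates the trip to the client and back by serializing the handle. */
    static RuleHandle roundTrip(RuleHandle handle) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(handle);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (RuleHandle) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RuleHandles server = new RuleHandles();
        RuleHandle handle = server.register(new RuleStructure("RDFS"));
        RuleHandle returned = roundTrip(handle);           // a different object after the trip...
        System.out.println(returned != handle);            // true
        System.out.println(server.resolve(returned).name); // ...but it still finds "RDFS"
    }
}
```

Because equality is based on the id rather than object identity, the copy that comes back still finds the original, local, never-serialized structure.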

It's late. I'll think about this overnight and have a go in the morning.

OWL
Bob asked that I work with him on looking at extensions to OWL. He knows I've been thinking about this lately, and it turns out that he has some ideas as well.

Yesterday was the first opportunity I've had to really see how he works on these problems, and I have to say that I'm impressed. While he doesn't have a strong background in OWL, he is able to draw on a good formal understanding of category theory, E-R diagrams, and other areas. It allows him to see where OWL is missing functionality, or how certain functionality might be achieved using OWL constructs. There were a few occasions when I could suggest using an OWL or RDF construct to achieve something, and he could quickly show whether or not it would work, and why.

The most impressive thing is that he really knows the boundaries of his knowledge, and he knows where to go looking when he doesn't know something. Conversely, I don't know how much I don't know. Even when I recognise that I need to learn more about something, I often don't even know the name of the field that I need to learn more about. I guess that just comes with experience.

The two big things to come out of our conversation were Cartesian product classes, and predicate composition. Cartesian products allow for several important relationships, but most importantly they would permit the description of a pair of relationships which may be individually repeated, but together must be unique (like composite keys in a database table). Predicate composition ends up covering a couple of the ideas I've already described, such as Euclidean predicate relationships, only it does it in a single construct.

I know that Ian Horrocks has explained that things like predicate composition can lead to undecidability, which needs to be considered for OWL DL. While that is no problem for OWL Full, a lot of people are more interested in OWL DL for the moment, so I think that it is important that any new constructs have some use in OWL DL. However, decidability is not something that either of us knows a lot about. Fortunately, I've been given a reading list by Guido, plus I have those pointers from Ian. I just need to find some time to sit down and read them! :-)

Tuesday, June 14, 2005

Quiet
Just like every night lately, I'm tired... so this will be short.

Testing and debugging went well over the last week. Since I found the dynamic compiling and error reporting of Eclipse to be so useful I decided to sink a bit of time into making the project integrate properly. I made some real progress, but it's still not done completely. Fortunately Eclipse still lets me run programs when errors are present in the build, otherwise the work I'd done would be a waste of time. It would still be nice to get everything working though. For now, I've decided I've spent enough time on it; any more and I won't save as much time as I've spent.

There are so many details that I learnt in setting this up, and it's annoying that I didn't blog them all as I went... particularly the problems. For instance, CVS didn't pick up some of the directories on SourceForge. Why not? No idea. But I could use the CVS perspective to "Check out" a subdirectory to an already existing project. So I just had to find the missing directories, and give them the correct destination.

The debug environment that I have running now lets me use breakpoints, and lets me properly analyse Kowari while it's running. The last class loader (with embedded jar loading) prevented this from working, so it's a significant step up from what I used to have. However, it's still annoying that I can't get Log4J working properly. I've discovered that the correct config file is loading (I put an error into the XML, and looked for an error message), but it ignores the "Category" statements which stipulate that Debug messages should go to the appenders. The debugging environment helps here, but logging is still important (especially when you consider how slow Eclipse is). For the moment I use warning messages instead of debug messages, and I'll have to remember to change them all back before checking in.

Anyway, the configuration now works (of course, the problems were all in those areas I tacked on to the RDF at the last minute), and I've traced all of the problems, which I'm currently working on. It should be finished by the end of the week, but I still need time to write all the tests.

Public Holiday
Yesterday was a public holiday here (the Queen's birthday - not that anyone seemed to care about that). I worked for some of the weekend, and I planned on working for Monday as well, but Anne also needed to work. She's supported me a lot with work lately, so I agreed to look after Luc on my own. I spend a lot of time with him every day, but yesterday was the first time in a long while that I spent an entire day with him, where he was the centre of attention. No internet access, just taking him places and doing things with him. He's doing very well at the moment, trying out new words and solving lots of everyday problems.

We had a great day, and I'm pleased that I was "forced" into it. I'd better concentrate on work for another few weeks, but when it's done I should take another full day like that. It's good for the soul. :-)

Monday, June 06, 2005

Substantive Coding Done
I spent the last few days concentrating on coding rather than blogging, as I have been getting close to the end. I've now finished the substantive coding for the first iteration of the rules engine. That doesn't mean it's done, but I feel good about it anyway. :-)

What I mean by substantive coding is that the code is all written, and it compiles... but that's it. Except for iterative testing during the process of coding, I haven't even tried to run it yet. I plan on starting the debugging tomorrow. This always takes time, but just starting on the debugging stage makes me feel like I'm in the home stretch.

The current implementation is based on performing the equivalent of a "select", followed by an "insert/select" if needed. This is a lot like the original proof-of-concept code, only now it is fully integrated and is being done on the server side. This is not the ideal implementation, but it will work, and should demonstrate that everything is implemented correctly. It should work pretty quickly too.
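The select-then-insert approach amounts to naive forward chaining: run every rule's "select", insert whatever is genuinely new, and repeat until a whole pass adds nothing. Here is a small in-memory sketch of that loop. It is not Kowari code: the rules are plain Java functions standing in for the queries, and Triple is a hypothetical record. The two rules shown are the RDFS subclass-transitivity and type-inheritance rules (rdfs11 and rdfs9).

```java
import java.util.*;
import java.util.function.Function;

/** Hypothetical sketch: naive forward chaining to a fixed point. */
final class NaiveFixpoint {

    record Triple(String s, String p, String o) {}

    /** Runs every rule until a full pass adds nothing; returns the number of new statements. */
    static int run(Set<Triple> graph, List<Function<Set<Triple>, Set<Triple>>> rules) {
        int added = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Function<Set<Triple>, Set<Triple>> rule : rules) {
                Set<Triple> selected = rule.apply(graph); // the "select"
                for (Triple t : selected) {
                    if (graph.add(t)) {                   // the "insert", only if new
                        added++;
                        changed = true;
                    }
                }
            }
        }
        return added;
    }

    /** rdfs11: A subClassOf B, B subClassOf C => A subClassOf C. */
    static Set<Triple> subClassTransitivity(Set<Triple> g) {
        Set<Triple> out = new HashSet<>();
        for (Triple ab : g) {
            if (!ab.p().equals("rdfs:subClassOf")) continue;
            for (Triple bc : g) {
                if (bc.p().equals("rdfs:subClassOf") && bc.s().equals(ab.o())) {
                    out.add(new Triple(ab.s(), "rdfs:subClassOf", bc.o()));
                }
            }
        }
        return out;
    }

    /** rdfs9: U subClassOf X, V type U => V type X. */
    static Set<Triple> typeInheritance(Set<Triple> g) {
        Set<Triple> out = new HashSet<>();
        for (Triple sub : g) {
            if (!sub.p().equals("rdfs:subClassOf")) continue;
            for (Triple inst : g) {
                if (inst.p().equals("rdf:type") && inst.o().equals(sub.s())) {
                    out.add(new Triple(inst.s(), "rdf:type", sub.o()));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Triple> g = new HashSet<>(List.of(
            new Triple("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
            new Triple("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
            new Triple("ex:rex", "rdf:type", "ex:Dog")));
        int added = run(g, List.of(NaiveFixpoint::subClassTransitivity, NaiveFixpoint::typeInheritance));
        System.out.println(added); // 3
    }
}
```

Every pass re-runs every rule in full, which is exactly the inefficiency that counting individual constraints is meant to remove.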

I think I'm on schedule for completion of this round of the paid work, but a few sick days recently mean that I'll need to go beyond my scheduled finishing date. I believe I'll have everything written that I'm supposed to have written, but in reality I'm being paid for time, not for completed code. The other reason to keep working on this full time is that I'm enjoying it! However, I can only afford to go for the extra time needed to make up those days I was sick. Hopefully the system will be so useful and show so much promise for further development that someone will pay me to do more of it! (Well... I can dream, can't I?) :-)

Once I have it working, and RDFS is executing correctly, I can move on to the next stage. This version will count the size of each individual constraint, rather than the size of a completed Answer, making it much more efficient. More importantly, it will match the design I'm writing my thesis about! The initial code to do this should take less than a day, but I'll need to spend some time in the query engine to make sure that constraints are being cached and re-used correctly. I'm not sure if that time should be considered "coding" or "debugging", as I'll be using constraints in a manner slightly outside of their original design (which feels like re-designing and coding), but I'll be approaching any problems like I would any unexpected error (which is debugging).

Rules vs. Ontologies
A more pressing concern is the need to make transitive constraints accept a variable predicate. This is not needed for RDFS (since its transitive predicates, rdfs:subClassOf and rdfs:subPropertyOf, are fixed and known in advance) but it will be needed for OWL. Once OWL is introduced, the specific rule for transitivity of sub-classes can be dropped in favour of a declaration in OWL that rdfs:subClassOf is a transitive property.

This brings me to a point that I've been thinking about for a while. I sort of understood it before, but I think I've only just started to really get it. What does an ontology language give us that we don't get from rules? After all, ontology inferencing (and consistency checking, but I won't go there right now) is performed by rules. Rules also allow much greater flexibility than we can achieve with ontology languages like OWL. The commonly cited example of OWL's limitations is that it can't express the "uncle" relationship. An "uncle" relationship is relatively straightforward. If person A has parent B, and person B has brother C, then A has an uncle C. This is easy to describe in rules, and impossible in OWL.
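As a rule, the uncle relationship is just a join of two predicates. Here is a tiny sketch of it over in-memory triples; the predicate names hasParent, hasBrother and hasUncle are made up for the example, and Triple is a hypothetical record. The important thing to notice is that the rule names those specific predicates, which is what the rest of this argument turns on.

```java
import java.util.*;

/** Hypothetical sketch of the rule: A hasParent B, B hasBrother C => A hasUncle C. */
final class UncleRule {

    record Triple(String s, String p, String o) {}

    static Set<Triple> uncles(Collection<Triple> facts) {
        // Index brothers by person so the join is a lookup rather than a nested scan.
        Map<String, List<String>> brothers = new HashMap<>();
        for (Triple t : facts) {
            if (t.p().equals("hasBrother")) {
                brothers.computeIfAbsent(t.s(), k -> new ArrayList<>()).add(t.o());
            }
        }
        Set<Triple> inferred = new HashSet<>();
        for (Triple t : facts) {
            if (!t.p().equals("hasParent")) continue;
            for (String c : brothers.getOrDefault(t.o(), List.of())) {
                inferred.add(new Triple(t.s(), "hasUncle", c));
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        List<Triple> facts = List.of(
            new Triple("A", "hasParent", "B"),
            new Triple("B", "hasBrother", "C"));
        System.out.println(uncles(facts)); // [Triple[s=A, p=hasUncle, o=C]]
    }
}
```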

If we can do everything in rules, and OWL is limited, then why use OWL?

The answer (for me) is demonstrated with the transitivity of subClassOf. If we just had a rule system, then we would need to have a rule for inferences on this predicate, i.e.:

  if A rdfs:subClassOf B
 and B rdfs:subClassOf C
then A rdfs:subClassOf C
That's fine, but what about "less than"? We need a new rule:
  if A < B
 and B < C
then A < C
How about "greater than"? New rule. "Equal to"? New rule. Every time a new transitive predicate appears we need a new rule to handle it. This means that rules have to be very domain-specific. They can't handle anything that wasn't known about at the time they were created.

However, using OWL a predicate can be declared to be an owl:TransitiveProperty. Suddenly we have just one rule:
  if property is transitive
 and A property B
 and B property C
then A property C
Whenever a new transitive property is introduced, we can just declare it in OWL. Of course, this goes for all of the properties of properties that are definable in OWL. So the ability to describe the properties of a property means that we can write generalised rules to make deductions on them.
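Here is a sketch of that single generic rule over in-memory triples, assuming transitivity is declared with rdf:type owl:TransitiveProperty as in OWL. The class name, the Triple record and the ex: terms are illustrative only. The rule reads the declarations from the data itself, so adding ex:lessThan (or any other transitive property) needs a new triple, not new code.

```java
import java.util.*;

/** Hypothetical sketch: one rule for every property declared owl:TransitiveProperty. */
final class GenericTransitivity {

    record Triple(String s, String p, String o) {}

    static final String TYPE = "rdf:type";
    static final String TRANSITIVE = "owl:TransitiveProperty";

    /** Adds all transitive consequences to the graph; returns how many were new. */
    static int apply(Set<Triple> graph) {
        int added = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            // "if property is transitive": read the declarations from the graph.
            Set<String> transitive = new HashSet<>();
            for (Triple t : graph) {
                if (t.p().equals(TYPE) && t.o().equals(TRANSITIVE)) transitive.add(t.s());
            }
            List<Triple> found = new ArrayList<>();
            for (Triple ab : graph) {
                if (!transitive.contains(ab.p())) continue;
                for (Triple bc : graph) {
                    // "and A property B, and B property C"
                    if (bc.p().equals(ab.p()) && bc.s().equals(ab.o())) {
                        found.add(new Triple(ab.s(), ab.p(), bc.o())); // "then A property C"
                    }
                }
            }
            for (Triple t : found) {
                if (graph.add(t)) { added++; changed = true; }
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Set<Triple> g = new HashSet<>(List.of(
            new Triple("ex:lessThan", TYPE, TRANSITIVE),
            new Triple("ex:one", "ex:lessThan", "ex:two"),
            new Triple("ex:two", "ex:lessThan", "ex:three"),
            new Triple("ex:a", "ex:knows", "ex:b"),
            new Triple("ex:b", "ex:knows", "ex:c")));
        // 1: only ex:one ex:lessThan ex:three; ex:knows was never declared transitive
        System.out.println(apply(g));
    }
}
```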

I've sort of understood this for a while, but it was only while thinking about Euclidean properties that it finally crystallised for me. Ideally, it would be possible to assert something like:
  parentOf isEuclideanTo siblingTo
So we could end up with a rule like:
  if property1 isEuclideanTo property2
 and A property1 B
 and A property1 C
then B property2 C
Of course, isEuclideanTo would not be symmetric, though it would be possible to infer backwards on it (a person's parent must be the same as their sibling's parent).
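Here is the same idea as a sketch, driven by a declaration like the one above. isEuclideanTo is the hypothetical construct being proposed here, not an OWL term, and Triple is a hypothetical record. There is one deliberate deviation from the literal rule: read strictly, choosing B = C would also give "B siblingTo B", so the sketch skips that case. A strictly Euclidean relation would include it.

```java
import java.util.*;

/**
 * Hypothetical sketch of the rule:
 *   if property1 isEuclideanTo property2, A property1 B, A property1 C
 *   then B property2 C   (B == C skipped, see the note in the loop)
 */
final class EuclideanRule {

    record Triple(String s, String p, String o) {}

    static final String EUCLIDEAN = "isEuclideanTo"; // proposed construct, not part of OWL

    static Set<Triple> infer(Collection<Triple> graph) {
        Set<Triple> inferred = new HashSet<>();
        for (Triple decl : graph) {
            if (!decl.p().equals(EUCLIDEAN)) continue;
            String p1 = decl.s(), p2 = decl.o();
            // Group the objects of property1 by subject: A -> {B, C, ...}
            Map<String, Set<String>> bySubject = new HashMap<>();
            for (Triple t : graph) {
                if (t.p().equals(p1)) bySubject.computeIfAbsent(t.s(), k -> new HashSet<>()).add(t.o());
            }
            for (Set<String> objects : bySubject.values()) {
                for (String b : objects) {
                    for (String c : objects) {
                        // Deviation from the strict rule: nobody is their own sibling.
                        if (!b.equals(c)) inferred.add(new Triple(b, p2, c));
                    }
                }
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        List<Triple> g = List.of(
            new Triple("parentOf", EUCLIDEAN, "siblingTo"),
            new Triple("ann", "parentOf", "bob"),
            new Triple("ann", "parentOf", "cat"));
        System.out.println(infer(g).size()); // 2: bob siblingTo cat, cat siblingTo bob
    }
}
```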

There are more complex types of relationships between relationships. Uncle is a good one to demonstrate this, as it requires 3 different types of relationship which are all related to each other. While possible, the RDF required to describe something like this is starting to get messy. Further complexity arises when you realise that one of the relationships can actually be either "brother" or "brother-in-law". Also, an uncle relationship can be deduced, as can a nephew/niece, but the parent in the middle of the relationship cannot be.

All the same, this kind of knowledge about properties is something we use every day. If an ontology is to describe real world objects and relationships then it will need to be capable of describing relationships between relationships.

With this in mind, I asked the OWL mailing list what people thought of such a construct. I half expected to be shouted down, but at least I'd get to find out why. Instead the response was encouraging. Ian Horrocks (who wrote half of the papers I cite) explained that property relationships are indeed useful, but that care must be taken to ensure they are decidable. He's suggested I read one of his papers on the topic.

Who knows? Maybe one day OWL can include something like this.

Friday, June 03, 2005

Modeling
Sure enough, the sore throat developed. I'll sure be glad when my immune system is used to seeing all these bugs that Luc brings home. It slowed me down a little in the last couple of days, but I still wrote a lot of code. Well, I think it was a lot. :-)

The main impact was that I didn't go out to exercise (which I really need to do in order to work efficiently) and I was also too tired to blog. I'm too tired again tonight, but sometimes you have to push the envelope.

The code over the last few days has been building data structures based on RDF. I had already done some of this work before, but it turns out that integration has needed a more thorough exploration of the data store.

I've realised that there are two ways to write this code. I can write it with a full knowledge of the data structures I'm reading, or I can write it with very little knowledge, and use the ontology to tell me how to build the data structure.

I started out by using the ontology of the rules just a little, while mostly relying on my own knowledge of the data structure and putting that explicitly into the code. This is fine, but it's not very extensible. It was while building the constraint tree that I started to see some other potential problems with the explicit approach.

Each node in the constraint tree is either a leaf, or refers to two or more child nodes. To query an RDF structure about a tree of arbitrary depth it is necessary to get all the links from parents to children as a set, and to connect them together. As each node is found for the first time, the associated Java object must be created to go with it. A problem can arise here if the node is of a type that extends another concrete type. In that case it is necessary to put off creating a node until all of its types have been found, and to then build the most specific type. This is where the ontology starts to play a part.
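A minimal sketch of that deferred construction, assuming the two query results arrive as flat (parent, child) and (node, type) rows. The node classes and the type names Conjunction, Operation and Leaf are hypothetical, with ConjunctionNode extending the concrete OperationNode to create exactly the problem described. Every type is collected first, each Java object is created once, and only then are the links wired up.

```java
import java.util.*;

/** Hypothetical sketch: build a tree of arbitrary depth from flat link and type rows. */
final class ConstraintTreeBuilder {

    static class TreeNode {
        final String id;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String id) { this.id = id; }
    }
    static class OperationNode extends TreeNode { OperationNode(String id) { super(id); } }
    static class ConjunctionNode extends OperationNode { ConjunctionNode(String id) { super(id); } }
    static class LeafNode extends TreeNode { LeafNode(String id) { super(id); } }

    /** Hard-coded "most specific first" check; the type names are illustrative only. */
    static TreeNode instantiate(String id, Set<String> types) {
        if (types.contains("Conjunction")) return new ConjunctionNode(id);
        if (types.contains("Operation")) return new OperationNode(id);
        if (types.contains("Leaf")) return new LeafNode(id);
        throw new IllegalArgumentException("No known type for " + id + ": " + types);
    }

    static TreeNode build(String rootId, List<String[]> links, List<String[]> typeRows) {
        // Pass 1: collect every type of every node before creating anything.
        Map<String, Set<String>> types = new HashMap<>();
        for (String[] row : typeRows) types.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);

        // Pass 2: create each node exactly once, now that its full type set is known.
        Map<String, TreeNode> nodes = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : types.entrySet()) {
            nodes.put(e.getKey(), instantiate(e.getKey(), e.getValue()));
        }

        // Pass 3: connect parents to children.
        for (String[] link : links) {
            TreeNode parent = nodes.get(link[0]), child = nodes.get(link[1]);
            if (parent == null || child == null) {
                throw new IllegalStateException("Untyped node in link " + Arrays.toString(link));
            }
            parent.children.add(child);
        }
        return nodes.get(rootId);
    }

    public static void main(String[] args) {
        List<String[]> links = List.of(new String[] {"_:n1", "_:n2"}, new String[] {"_:n1", "_:n3"});
        List<String[]> typeRows = List.of(
            new String[] {"_:n1", "Operation"}, new String[] {"_:n1", "Conjunction"},
            new String[] {"_:n2", "Leaf"}, new String[] {"_:n3", "Leaf"});
        TreeNode root = build("_:n1", links, typeRows);
        // ConjunctionNode with 2 children
        System.out.println(root.getClass().getSimpleName() + " with " + root.children.size() + " children");
    }
}
```

The instantiate method is where explicit knowledge of the data structure lives; choosing the class from the ontology would replace that hard-coded ordering.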

For a start, simple RDF will just give me the concrete types of the nodes, with no information about the superclasses. It is only with an ontology that the other types would become available (I don't know about anyone else, but I'll probably end up running the inferences against the rules themselves, just to see if I can). At that point I'll need to structure the types together (using the transitivity of rdfs:subClassOf) and check that there are no loops (except the obligatory subClassOf(A,A)), before finding the most specific type to instantiate. Instantiation will be of a class that would be associated with the node (I don't have that yet, but it would be trivial to add). Of course, multiple inheritance makes it that little bit harder.
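Those steps can be sketched as follows, assuming the direct rdfs:subClassOf edges have already been read into a map from each class to its direct superclasses. MostSpecificType and its methods are hypothetical names. The closure ignores the obligatory A subClassOf A, treats any other path back to the starting class as a loop, and picks the one type that is a subclass of all the others. With multiple inheritance there may be no such type, and this sketch refuses to guess.

```java
import java.util.*;

/** Hypothetical sketch: choose the most specific of a node's types. */
final class MostSpecificType {

    /** All strict superclasses of {@code cls}, following subClassOf transitively. */
    static Set<String> superclasses(String cls, Map<String, Set<String>> directSupers) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(cls);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            for (String sup : directSupers.getOrDefault(current, Set.of())) {
                if (sup.equals(current)) continue; // the obligatory A subClassOf A
                if (sup.equals(cls)) throw new IllegalStateException("subClassOf loop through " + cls);
                if (seen.add(sup)) todo.push(sup);
            }
        }
        return seen;
    }

    /** The single type that is a subclass of every other type on the node. */
    static String mostSpecific(Set<String> types, Map<String, Set<String>> directSupers) {
        List<String> candidates = new ArrayList<>();
        for (String t : types) {
            Set<String> supers = superclasses(t, directSupers); // also detects loops through t
            boolean coversOthers = true;
            for (String other : types) {
                if (!other.equals(t) && !supers.contains(other)) { coversOthers = false; break; }
            }
            if (coversOthers) candidates.add(t);
        }
        if (candidates.size() != 1) {
            // e.g. two unrelated types from multiple inheritance
            throw new IllegalArgumentException("No single most specific type among " + types);
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> supers = Map.of(
            "Conjunction", Set.of("Operation", "Conjunction"),
            "Operation", Set.of("Constraint"));
        System.out.println(mostSpecific(Set.of("Constraint", "Operation", "Conjunction"), supers)); // Conjunction
    }
}
```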

The advantage of such an approach would be that extra classes in the structure would just require an OWL definition. Then they could be used in the RDF with no changes to the Java code.

This would apply to any kind of data structure that is expressed in RDF, not just my queries. Building a system like this would be similar to a UML modeling tool with runnable objects (something that I know some tools do). It's cheating a little to link it to a class name, but that is where the idea of storing a Java AST (Abstract Syntax Tree) in RDF would come in. In that way, the basic ontology of a set of classes could be written in OWL, with RDF annotations to describe the complete implementation of the classes. An RDF structure with rdf:types referring to these classes would describe an instance graph.

I really like this idea, and it shows some of the modeling power that comes with OWL and RDF. Unfortunately this would take a few weeks, and I don't have the time for it right now. Doing a Java AST representation in RDF would take a lot longer, though the compiler would be fun.

For the moment I'm sticking to what I know of the data structure, and almost ignoring the ontology. That's a shame, but it's letting me finish it in just a few days, instead of weeks. Fortunately none of the instantiable classes in this system have descendants, so there won't be any potential confusion about the class to instantiate for any nodes in the constraint tree.

Eclipse
I spent Wednesday morning with Brad. He offered to show me how to get Kowari working with Eclipse. I had a go at this last year when we had the jars-inside-jars class loader in Kowari, and configuring this was very difficult, highly manual, and didn't really work well. It seems that the new flattened class arrangement works much better.

Brad was also able to show some of the useful tools included with Eclipse, such as refactoring, which certainly makes it seem quite compelling. I haven't made the transition just yet, as I haven't got CVS working with it. Eclipse does not seem to understand CVS directories that were created outside of it, so I'll need to get a fresh checkout, and port over my modified files. That might be a job for the weekend.

More importantly, it was very good to have a chat with someone who is using Kowari. It helps to keep focus on what I'm doing it for when I speak to people who actually interact with this stuff. In a similar way, I worked with Andrae this morning, and I appreciated hearing what another developer is doing.

Final Word
Oh, and I promised to make a comment about a "strange Luigi guy". If he's reading, then "Ciao".

Wednesday, June 01, 2005

Modal Logic
Luc got himself kicked out of day care yesterday. He was running a temperature, coughing etc. So I had to babysit while working today... slowing things down quite a bit. That means more work on the weekend, unfortunately. I hate to think what my sore throat tonight means!

Fortunately, my sister was able to watch Luc while I went in for the Logic group that's held on Wednesdays. Today we covered Modal Logic. I found it all interesting, right up to the point where we looked at the group of Kripke logics.

These logics are based on 5 axioms, labeled T, D, 4, B and 5. Each of these axioms describes a specific type of relationship: reflexive, serial, transitive, symmetric, and Euclidean. It was this last one that got my attention.
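For reference, these are the standard textbook correspondences between the five axioms and the condition each imposes on the accessibility relation R (summarised from general modal logic, not from the session's notes). The last line is the Euclidean condition: if w reaches both v and u, then v reaches u.

```latex
\begin{array}{lll}
\textbf{Axiom} & \textbf{Schema} & \textbf{Condition on } R \\
T & \Box p \to p & \forall w.\; wRw \quad \text{(reflexive)} \\
D & \Box p \to \Diamond p & \forall w.\, \exists v.\; wRv \quad \text{(serial)} \\
4 & \Box p \to \Box\Box p & \forall w, v, u.\; (wRv \land vRu) \to wRu \quad \text{(transitive)} \\
B & p \to \Box\Diamond p & \forall w, v.\; wRv \to vRw \quad \text{(symmetric)} \\
5 & \Diamond p \to \Box\Diamond p & \forall w, v, u.\; (wRv \land wRu) \to vRu \quad \text{(Euclidean)}
\end{array}
```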

OWL has no mechanism for describing how relationships relate to one another (other than with inheritance). Consequently, it is impossible to describe an "uncle" relationship as being built from a "parent" and a "brother" relationship. This is usually left to a rule language. However, it is a valid and useful thing to describe in an ontology.

Euclidean relationships go some way to addressing this. They describe a relationship among three entities A, B and C. If A relates to B and C in the same way, then B relates to C in some other way. That doesn't work for the "uncle" relationship, but it would work for describing siblings if they share a common parent.

I wonder if there is some way to incorporate this effectively into OWL?

Linking Blank Nodes
I finally worked out the best way to traverse these nodes.

The rules all need to sit in memory (even though the corpus of RDF data does not), so it is appropriate to build up the data in memory. I was concerned about the difficulty of this, particularly when it involved new queries for every branch on the tree, but I now realise that this is not needed. Instead, I have been able to get most of the information in a raw form with a simple query, and put it into HashMaps. Then I can walk my way through the data quite easily.
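A minimal sketch of that approach, assuming the single query returns (subject, predicate, object) rows with blank nodes as plain string ids. InMemoryWalk and the ex: predicates are hypothetical. The rows are indexed once into nested HashMaps, and the structure is then walked with no further queries. Because every id comes from the same result, equal ids can safely be treated as the same node.

```java
import java.util.*;

/** Hypothetical sketch: index one query's rows in memory, then walk them. */
final class InMemoryWalk {

    // subject -> predicate -> objects, in the order the rows were returned
    final Map<String, Map<String, List<String>>> index = new HashMap<>();

    InMemoryWalk(List<String[]> rows) {
        for (String[] r : rows) {
            index.computeIfAbsent(r[0], k -> new HashMap<>())
                 .computeIfAbsent(r[1], k -> new ArrayList<>())
                 .add(r[2]);
        }
    }

    List<String> objects(String subject, String predicate) {
        return index.getOrDefault(subject, Map.of()).getOrDefault(predicate, List.of());
    }

    /** Depth-first walk from {@code start} following {@code link}; each node is visited once. */
    List<String> walk(String start, String link) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!visited.add(node)) continue; // already seen: guards against cycles
            order.add(node);
            List<String> next = objects(node, link);
            // Push in reverse so children are visited in the order they were returned.
            for (int i = next.size() - 1; i >= 0; i--) stack.push(next.get(i));
        }
        return order;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"ex:rule1", "ex:hasQuery", "_:q"},
            new String[] {"_:q", "ex:hasWhereClause", "_:and"},
            new String[] {"_:and", "ex:argument", "_:c1"},
            new String[] {"_:and", "ex:argument", "_:c2"},
            new String[] {"_:c1", "ex:predicate", "rdf:type"});
        InMemoryWalk w = new InMemoryWalk(rows);
        System.out.println(w.walk("_:and", "ex:argument"));    // [_:and, _:c1, _:c2]
        System.out.println(w.objects("_:c1", "ex:predicate")); // [rdf:type]
    }
}
```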

More importantly, this has the added advantage that almost all of the data comes from a single query. This means that there are no concerns about blank nodes comparing incorrectly, regardless of transactions. However, I've also tried comparing them from one query to the next, and all seems well. This was important, as I really needed to split the queries up into at least two anyway. My only other option was to union the results of two vastly different queries together, using the same variable names, along with predicates which are given variable names but set using <tucana:is>. Yuck.

That reminds me. No one commented on me wanting to change all the tucana references to kowari. That must mean that no one minds, huh? :-) Perhaps I should change the code to accept either for the time being, before finally dropping the tucana support.