I now have a much better grasp of Drools, and feel confident enough to proceed using it. I'm still discovering some of the consequences of particular aspects of the system, so some of my ideas change from time to time. However, I've made a start implementing RDFS, and it seems to be going well... so far.
As I learned some of the specifics of Drools, it occurred to me that the natural mapping for this stuff would be to put all of the statements of a graph into the Drools working memory and have the rules operate directly on the statements. This would mean that almost all the work could be done in the .drl description files. Unfortunately this has several problems. For a start, there's no way it would scale. The only way to go about it would be to rip out the working memory implementation in Drools and replace it with a Kowari/TKS graph. This would work, and actually has some merit, but it would also be a BIG job. Perhaps I'll consider it at a later date.
Another problem is that letting Drools loose on the statements would completely sidestep all the good work done to make the trans statements work so efficiently. This is a really bad idea.
At the end of the day, it may be worthwhile taking the algorithms of Drools and applying them directly to Kowari/TKS, without using the Drools framework itself. That way it could use the efficiencies of things like trans, but still do all the work in a rules-based environment.
For now though, the rules have to be implemented differently. Each RDFS rule is being represented by a Java object. These objects will know the type of query to make, plus the types of statements to generate based on the query. I'm planning on using a generic class for all of the rules, with the differences defined in the constructors. Given that I've been able to express all of the RDFS rules in iTQL (except for rule XI), this should not be too hard. It also lets us use constructs such as trans, which will result in a massive reduction in triggered rules (this was the whole point of writing trans). In fact, a conversation with AN today has me thinking that much of the circular triggering of RDFS rules will not need to occur, because we don't need to reiterate rules for anything commutative.
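The generic-class idea above might look something like the following sketch. All the names here (InferenceRule, needsToRun and so on) are hypothetical, and the real iTQL query and Kowari/TKS session interfaces aren't shown; the point is just that the constructor carries everything that distinguishes one RDFS rule from another.

```java
// Hypothetical sketch of one RDFS rule as a constructor-configured object.
class InferenceRule {
    private final String name;           // e.g. "rdfs2"
    private final String query;          // the iTQL query this rule performs
    private final String insertTemplate; // the shape of statements it generates
    private long cachedSize = -1;        // query result size from the last run

    InferenceRule(String name, String query, String insertTemplate) {
        this.name = name;
        this.query = query;
        this.insertTemplate = insertTemplate;
    }

    String getName() { return name; }

    /** The rule's condition: has the query result grown since last time? */
    boolean needsToRun(long currentSize) {
        return currentSize > cachedSize;
    }

    /** Called after the consequence runs, to remember the new result size. */
    void ran(long currentSize) {
        cachedSize = currentSize;
    }
}
```

Since data is only ever added, the size comparison in needsToRun only has to check for growth, never shrinkage.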
Since we want the rules to be in a configurable XML file, they will need to be instantiated and put into the working memory from inside the .drl file, rather than in the Java code that invokes Drools. To do this they will be inserted by a Bootstrap rule, whose only purpose is to initialise the working set. This Bootstrap rule will also pick up an initialisation object that the Java code puts into the Drools working memory, to provide access to iTQL and a Kowari/TKS database session.
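Loading the rule definitions out of the XML file could be as simple as the following sketch. The element and attribute names are invented for illustration; the real config format isn't settled yet.

```java
// Hypothetical loader for a rule config shaped like:
//   <rules><rule name="rdfs2"/><rule name="rdfs3"/></rules>
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RuleLoader {
    /** Pull the rule names out of the config document. */
    static List<String> loadRuleNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("rule");
            List<String> names = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new RuntimeException("bad rule config", e);
        }
    }
}
```

The Bootstrap rule's consequence would then instantiate one rule object per entry and assert each into the working memory.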
Once running, each RDFS rule will be represented by a Drools rule. The first thing each of these rules will do is pick up the Java object associated with the rule (confirmed by name), plus the objects for any other rules it triggers. The Java object will then perform its query and compare the size of the result against a cached value. If there is a difference, the condition for the rule is met; otherwise the rule won't have to run. A query result can only be the same size as, or larger than, a previous result (since data is only ever added).
The consequence of a rule will be to tell the Java object for that rule to perform the required insert. The rule then tells the Drools working memory that the Java objects for the rules it triggers have all been updated. There is no need to actually modify any of these objects: if their query results have changed then their conditions will now be met, and if not, the triggered rule had no work to do, so its consequence will not be invoked.
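The condition/consequence cycle above amounts to running rules until nothing new appears. Here's a toy version of that loop, with the iTQL query and the insert faked out by functions; the method names are mine, not Drools API calls.

```java
// Toy fixpoint loop: a rule fires only when its query result has grown.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

class FixpointLoop {
    /**
     * Evaluate rules until none fires. sizes.apply(rule) stands in for
     * running that rule's query; fire.accept(rule) stands in for the
     * insert its consequence performs. Returns how many times rules fired.
     */
    static int run(List<String> rules, Function<String, Long> sizes,
                   Consumer<String> fire) {
        Map<String, Long> cached = new HashMap<String, Long>();
        int fired = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String rule : rules) {
                long now = sizes.apply(rule);
                if (now > cached.getOrDefault(rule, -1L)) { // condition: grew
                    fire.accept(rule);                   // consequence: insert
                    cached.put(rule, sizes.apply(rule)); // re-query and cache
                    fired++;
                    changed = true;
                }
            }
        }
        return fired;
    }
}
```

Because data only ever grows, this is guaranteed to settle once a pass completes with every rule's result the same size as its cached value.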
This all seems to be coming together with one exception. The data to be inferred will go into a separate model from the base facts. If it is possible to infer a statement that appears in the base data, then it will end up in the inferred model as a tautology. I'll leave it this way to start with, just to get it working, but it needs to be addressed.
There are two ways to prevent these tautological statements. The first is to let them all go in, and then remove the intersection of the base data and the inferred data. This doesn't seem ideal: if the intersection is large, a lot of statements will be redundantly added and then removed, incurring significant overhead. The other way is to test each statement for redundancy as it goes in. This means that I can't use an "INSERT INTO.... SELECT FROM" type of statement in iTQL. However, if I use some lower-level interfaces it may not be all that expensive. At least it will only result in lots of lookups and no inserts/removals. Kowari/TKS was specifically designed to look statements up very quickly, while modifications take much more time.
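The second approach boils down to a cheap lookup before every write. In this sketch, Set.contains stands in for the fast low-level Kowari statement lookup; the actual interfaces aren't shown, and the names are mine.

```java
// Hypothetical per-statement filter: only insert into the inferred model
// if the base model doesn't already hold the statement.
import java.util.Set;

class TautologyFilter {
    /** Add candidates to the inferred model; returns how many were new. */
    static int insertNovel(Set<String> base, Set<String> inferred,
                           Iterable<String> candidates) {
        int added = 0;
        for (String stmt : candidates) {
            // Cheap lookup first; only pay the write cost for new statements.
            if (!base.contains(stmt) && inferred.add(stmt)) {
                added++;
            }
        }
        return added;
    }
}
```

Since Kowari/TKS lookups are much cheaper than modifications, doing one lookup per candidate should beat inserting everything and removing the intersection afterwards whenever the overlap is significant.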
I'm still writing the XML, and I plan to have it finished early tomorrow. That then leaves me with the interesting task of building the RDFS Java objects. I'm putting off thinking about that too much until I can't avoid it. :-)
DVD Burners... again
I've had partial success with the new burner. I had a single disc burn correctly under Windows, and another not (it failed halfway through the verification). It turned out that I have 2 types of DVD-R: the first is rated for 4x burning, while the second is rated for 1x only. I did both burns at 2x, so it's no wonder that the second one failed. This had me feeling better. Under Windows at least, it looks like the drive works just fine. So far.
Back under Linux it was a different story. I was able to blank a DVD-RW with dvdrecord, but I can't use dvdrecord or cdrecord-ProDVD to burn an image. Whenever I try, I get an error straight away. Perhaps I should re-enable ide-scsi and try again (so I can say dev=0,1,0 instead of dev=ATAPI:0,0,0). I might also take a look for the latest firmware to see if that helps.
It's for reasons like this that I'm thinking I'd like to go to Mac OSX. After all, who wants to stuff around with hardware configurations when you just need to back up some files?
Wednesday, July 07, 2004