Monday, January 24, 2005

I'm sure Andrew will have blogged this years ago, but I just found Jess. This is a Java implementation of the Rete algorithm.

I still don't need Rete for Kowari yet, but I'm wondering if it will work well when considering ABox changes, as opposed to a full inference run like I'm trying to do. Rete is supposed to be good at that, but it makes all of these assumptions about having to process individual statements one at a time. Kowari is not like that, so I wonder if Rete is really the best solution, even for the case of deltas in the data.

I'm expecting to hear what's going on with work shortly. My ideal is to work on OWL full time. However, I have to eat (more importantly, so do Anne and Luc), so I have to consider the options. Kowari/OWL will continue, but it would be a lot easier if I get paid to do it, rather than have to chew into my copious free time!

This leaves me really busy at the moment, making it difficult to work on any of this stuff. At least it has been a break from the usual, which I've appreciated.

In the meantime, I'm off to my brother's wedding in Maui this week. We really can't afford to go (think "extending the mortgage"), but he's not likely to get married again (I hope not anyway!), and besides, how many times do I get to go to Hawaii? :-)

The DSTC asked that I give a talk on Kowari for them soon. I'm looking forward to it, and dreading it at the same time. I'm really out of practice with this stuff. At least it will be good practice for my Masters confirmation.

Speaking of the Masters, my supervisor, Bob, works with the DSTC, so the odds are good that he will attend. This just gets better and better. :-}


Danny said...

The Jess license is a bit inconvenient :-(
But there's also the Drools kit, a Java-object extension of Rete, and the Maryland folks have just done Rete in Python "Pychinko" which may be portable (maybe hit it with Jython). Then there's the one in Jena, and Bossam...

Quoll said...

I agree about the Jess licence. However, when I saw another rules engine I thought it worthwhile blogging, just so I had it written down somewhere that I'd seen it.

I've already seen and used Drools. I was very disappointed with the implementation, especially given its mechanism for identifying objects when running a rule. However, it introduced me to Rete, and it has taught me a lot about the direction in which I want to go.

As for Jena... it uses Rete, which is both good and bad. Rete is only useful for data which is being added and deleted one statement at a time (though it is very efficient at this). This actually suits Jena quite well, as all of its data is accessed one statement at a time. The big problem with this approach is that it only handles data which is coming in (or being removed). It does not handle inferences which are based both on new data and existing data. The only way around that is to drop the existing data and reload it along with the new data.

woolfel said...

I'm not convinced Jena2 is RETE. I've looked at their implementation and it seems to be incorrect. I did some searching and it appears Drools2 might have been the inspiration. Since drools2 was a simple forward chainer and didn't perform or scale like strict RETE (aka Forgy's original RETE algorithm), it's likely Jena2 made the same mistake. Looking at the design of the node classes in Jena2, it looks like someone misinterpreted Drools2 and took another step away from Forgy's RETE. The new Drools3 now finishes Manners 64 within 10 seconds and should be faster than CLIPS.

I don't understand the statement that RETE only handles data that is being asserted/retracted. All RETE does is define how to compile rules in an optimal way and how the compiled rule executes at runtime. Whether the data is new or existing has nothing to do with RETE. It's the result of each engine's implementation choices. The common approach for solving this type of problem is to do both forward and backward chaining. The way it works is that new data is asserted and old data residing in a database is queried in a backward chaining fashion. This approach is commonly known as bi-directional chaining.

OMG is currently working on Production RuleML, which will have the ability to tell the rule engine if a given object pattern should use forward or backward chaining. It's my tiny contribution to the PR RuleML spec, which is slated for public review summer 2006.

peter lin

Quoll said...

My understanding of inferencing in Jena comes from Brian McBride when he visited Brisbane at the end of 2004. From memory (it was a while ago now), he said that incoming data was handled with RETE, but operating against a full set of data was done with a Tableaux algorithm.

I've never actually looked at the Jena code to see what they're doing. I've just relied on descriptions from others.

As for RETE, it's based on keeping memory in each node, and using these memory tables to test if data should be processed by downstream nodes. So as data goes into the system over time, these tables build up, meaning that processing of future data is reduced. By the end, every piece of data is receiving the absolute minimal processing to put it through the system. This is one reason why RETE is good at processing data as it streams in (being asserted/retracted).
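The node memories described above can be sketched in a few lines. This is purely illustrative (the class names, the fact representation as triples, and the transitive subClassOf rule are my own choices, not taken from Jess, Drools, or any real engine): an alpha node remembers every fact that passes its test, and a join node matches each new arrival against that memory, so earlier work is reused rather than re-derived.

```python
# Illustrative Rete-style sketch; names and structure are invented for this
# example, not drawn from any real rule engine.

class AlphaNode:
    """Tests each incoming fact against one condition and remembers matches."""
    def __init__(self, test):
        self.test = test
        self.memory = []        # facts seen so far that passed the test
        self.successors = []

    def assert_fact(self, fact):
        if self.test(fact):
            self.memory.append(fact)
            for node in self.successors:
                node.activate(fact)

class JoinNode:
    """Pairs each newly arrived fact with the stored memory of another node."""
    def __init__(self, other, on_pair):
        self.other = other
        self.on_pair = on_pair

    def activate(self, fact):
        for stored in self.other.memory:
            self.on_pair(fact, stored)

# Wire up one rule: (A subClassOf B), (B subClassOf C) => (A subClassOf C).
inferred = []
sub = AlphaNode(lambda f: f[1] == 'subClassOf')

def chain(new, old):
    # the join can fire in either order, so test both directions
    if new[2] == old[0]:
        inferred.append((new[0], 'subClassOf', old[2]))
    if old[2] == new[0]:
        inferred.append((old[0], 'subClassOf', new[2]))

sub.successors.append(JoinNode(sub, chain))

# Facts arrive one statement at a time, which is the point of the structure.
sub.assert_fact(('cat', 'subClassOf', 'mammal'))
sub.assert_fact(('mammal', 'subClassOf', 'animal'))
# inferred now holds ('cat', 'subClassOf', 'animal')
```

Note that the second assertion only had to be compared against the two facts already in memory; nothing was recomputed from scratch, which is where the streaming efficiency comes from.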

One thing that RETE doesn't do is look at more than a single statement at a time (though there is some room for concurrency through pipelining). So while each statement gets minimal processing, each statement is looked at consecutively. The approach I'm taking is to find all statements in a store that match particular patterns, conjoin them with statements meeting other patterns, and the results make up the inferences. Finding all statements that match a pattern simply requires a pair of log(n) lookups, no matter how large the list is, so the statements aren't being processed individually, but in bulk.
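That set-at-a-time idea can be sketched as follows. This is my illustration of the approach described above, not Kowari code (the function names and the linear-scan `resolve` are stand-ins; a real store would answer each pattern from a sorted index with log(n) lookups): each pattern is resolved against the whole store at once, and the two result sets are joined to produce the inferences in bulk.

```python
# Illustrative set-at-a-time inference; names are invented for this sketch.

def resolve(store, pattern):
    """Return all triples matching a pattern, where None is a wildcard.
    (A real indexed store would do this with log(n) lookups, not a scan.)"""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

def infer_transitive(store, predicate):
    """One pass of (A p B), (B p C) => (A p C), computed over whole sets."""
    matches = resolve(store, (None, predicate, None))
    # index the matches by subject so the conjunction is a lookup, not a scan
    by_subject = {}
    for s, p, o in matches:
        by_subject.setdefault(s, []).append(o)
    inferred = set()
    for s, p, o in matches:
        for o2 in by_subject.get(o, []):
            inferred.add((s, predicate, o2))
    return inferred - set(store)   # only the genuinely new statements

store = {('cat', 'subClassOf', 'mammal'),
         ('mammal', 'subClassOf', 'animal')}
new = infer_transitive(store, 'subClassOf')
# new holds ('cat', 'subClassOf', 'animal')
```

The contrast with Rete is that no per-statement network is maintained: the whole rule is evaluated in one join over pre-indexed data, which only pays off when all the data is already loaded and indexed.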

This technique won't work with data that is streaming in, as it requires all the data to be available up front, and it has to be fully sorted/indexed. However, for Kowari it is faster to insert (an operation which includes indexing) everything and then process the statements in bulk than it is to process the statements on the way in, and then insert them (along with the inferred statements). The idea is to leverage what Kowari already provides.

On the other hand, a small number of insertions would work poorly with the "bulk processing" approach, as all the rules would be run, instead of just the ones associated with the changes (assuming that these changes are limited in scope across the system). In this case, a more RETE-like approach would be much better.

I have to check it out a bit more, but KAON seems to be taking the same approach. It's a little disappointing that they got theirs out before I did (it would have been nice to be original), but at the same time it validates what I'm doing. :-)