Friday, January 05, 2007

Mulgara Progress

Mulgara is going ahead nicely at the moment, thanks mostly to Andrae. He has implemented some major changes and bug fixes which were sorely needed. My only problem with any of this has been that my main support has been administrative, rather than technical.

Once upon a time I was writing TKS/Kowari/Mulgara code every day. It was very challenging and very satisfying. I also found that the whole process of blogging was useful to help keep my thoughts focussed, and keep other people in touch with what I was up to.

These days I'm involved in a lot of other people's code, and I can't blog about it at all. It also keeps me away from coding, which is very frustrating, as I don't get that rush of implementing something cool, and seeing it all the way through to completion. That leaves me to do the interesting work at night, but after a long day at work, and time with family, I don't have the motivation to put in long hours of coding. Of course, two young boys mean that my weekends are packed as well.

Any remaining time does go to Mulgara, but the less frequently I work on it, the more I have to re-learn the context of where I was the "last time I was here". Just yesterday someone sent me an email, where they included a message they received 2 months ago. The email was coherent, and apparently written by someone who really knew the issues. I found myself agreeing with a lot of it and thinking, "this guy knows more than I do". Unfortunately (or fortunately?) I soon realized that the original author was myself.

It can be difficult to make a lot of progress when you have to bring yourself back up to speed every time you look at something.

I'm not sure what to do about this lack of time for non-work activities, but I'll have to resolve it soon. I'm supposed to start back at university shortly, and without any free time available I won't be writing a thesis! That's particularly frustrating, since I've been making all sorts of progress in processing OWL recently.

RLog

So what have I been doing in Mulgara/OWL?

Some time ago I realized that I needed to get an OWL processor out there, even if it were only partly complete. Hey, I can call it OWL-Lite if I want to! The specification explicitly says that there is no list of requirements to fulfill, so I'm covered. Fortunately, there are a lot of simple OWL inferences that I can perform already, courtesy of Raphael Volz. A number of others also come to mind, given some of the operations in iTQL, so this looked like a good first step.

The problem is that most of this work is easy to code using description logic (as Raphael has done in the paper I mentioned), but converting it into the RDF format I've created for the Krule engine is tedious and prone to bugs.

I found myself wishing that someone would write a program that would parse the description logic, and convert it into Krule syntax. Even better, the dependencies could be automatically generated, based on the format of the output of one rule, and the input to another. This would avoid many of the bugs I could potentially introduce through manual transposition, and make all future rules MUCH easier to encode.
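To make the dependency idea concrete, here is a rough Python sketch of how one rule's output could be matched against another rule's input. It is not how Krule does it. It assumes every rule is already flattened into triple patterns, with variables marked by a leading "?" (a convention borrowed for this sketch only), and the rule names are made up for illustration.

```python
def is_var(term):
    # Sketch-only convention: variables start with "?".
    return term.startswith("?")


def compatible(head, pattern):
    """True if a triple produced by `head` could match `pattern`.

    Each position is compatible when either side is a variable or
    both sides are the same constant.
    """
    return all(is_var(h) or is_var(p) or h == p for h, p in zip(head, pattern))


def dependencies(rules):
    """Map each rule name to the set of rules its output may trigger.

    `rules` maps a name to (head, body), where head is one triple
    pattern and body is a list of triple patterns.
    """
    triggers = {name: set() for name in rules}
    for producer, (head, _) in rules.items():
        for consumer, (_, body) in rules.items():
            if any(compatible(head, pattern) for pattern in body):
                triggers[producer].add(consumer)
    return triggers


# Hypothetical rule set: subclass transitivity, type inheritance,
# and the "everything is a resource" rule.
RULES = {
    "trans": (("?a", "rdfs:subClassOf", "?c"),
              [("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")]),
    "subclass": (("?x", "rdf:type", "?b"),
                 [("?a", "rdfs:subClassOf", "?b"), ("?x", "rdf:type", "?a")]),
    "resource": (("?u", "rdf:type", "rdfs:Resource"),
                 [("?x", "?a", "?u")]),
}

TRIGGERS = dependencies(RULES)
```

A rule with a variable in the predicate position, like the "resource" rule here, ends up depending on every other rule, which is exactly the kind of thing that is easy to miss when working out dependencies by hand.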

So I'm motivated by laziness (who wants to manually encode all those OWL/RDFS rules and axioms?), impatience (why should I have to find all those dependencies, when computers are better at that kind of thing?), and hubris (hey, I could write a program to do it for me, and it would be cooler than every other logic interpreter out there because it works on my database).

So it looks like I want to re-write Prolog (yes, I've even looked at an interruption function in the algorithm to permit "cut" operations, à la Prolog). That takes hubris to a whole new level for me. But the funny thing is that it would be so easy. The hard work is in the rule engine, and I've already written that!

The language isn't really Prolog, or even the simpler logic languages, like Datalog. The main difference is in the allowable syntax. For a start, I want to allow domains on the predicates (so I can write owl:sameAs(x,y), for instance). I also want to use lower case letters for variables (instead of the upper case letters required by other logic processors). But in essence it's all the same.

So it's a type of logic, and it's based on RDF. Without anything better, I decided to call it RLog (which I pronounce Arr - Log). Who knows, if I finish it, it might even take off. (OK, OK, I don't have that much hubris).

It's funny that I'm looking to implement this. A couple of years ago I realized that Kowari could form the basis of a Prolog engine, though I knew that there was a lot of coding needed before we could do it. Now I'm on the verge of having written it, and I never really intended that.

Where Am I? How Did I Get Here? And What Am I Doing In This Handbasket?

I started all of this some time ago, but for the reasons I've mentioned above, I still have some way to go.

I started out by encoding my first set of OWL rules in RLog. The language didn't need much definition, just standard predicate logic, along with variables, domained predicates, comments, and so on. So then I converted all the RDFS rules, and added in the OWL rules that I want for the first cut. This showed up some interesting cases that I had to consider.

RLog tricks

The first thing I found was in rule 4b on RDFS. Strictly speaking, it should be:
  rdfs:Resource(u) :- a(x,u).
Meaning that for any triple <x a u> the element labeled "u" is a resource (or rdfs:Resource). That's what the RDF documents say anyway. However, this violates RDF, since u might be a literal. Literals are unable to have types like this, since that would require them to be the subject of a statement, which is illegal in RDF.

The way I've been getting around this in Krule so far has been to use a "type" model in Mulgara. The result removed all resources which were literals (via the "minus" operator). I decided that the best way to achieve this in RLog is with the following:
  rdfs:Resource(u) :- a(x,u), ~rdfs:Literal(u).
So now RLog has negation. Wonder how I'll make that work in the general case?
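For this one rule, at least, the negation is just a set difference, much like the "minus" in the Krule version. A minimal Python sketch, assuming the triples sit in an in-memory set and the literals are available as a separate set of nodes (both assumptions made for illustration, not how Mulgara stores them):

```python
def infer_resources(triples, literals):
    """Apply rdfs:Resource(u) :- a(x,u), ~rdfs:Literal(u).

    `triples` is a set of (subject, predicate, object) tuples and
    `literals` is the set of nodes known to be literals.
    """
    # Every node that appears in the object position of some statement.
    objects = {obj for (_subj, _pred, obj) in triples}
    # Negation as set difference: drop the literals before typing.
    return {(obj, "rdf:type", "rdfs:Resource") for obj in objects - literals}


# Hypothetical data: one literal object and one URI object.
DATA = {
    ("ex:a", "ex:name", '"Alice"'),
    ("ex:a", "ex:knows", "ex:b"),
}
INFERRED = infer_resources(DATA, literals={'"Alice"'})
```

This only works so simply because the negated term is checked against data that the rule itself does not change. The general case is the harder question.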

The next issue is a little trickier. Rule XI (Why Roman numerals? I never worked that out, except that this rule is weird) states that if a predicate URI starts with the RDF domain and an underscore, then it is a ContainerMembershipProperty.
  e.g. http://www.w3.org/1999/02/22-rdf-syntax-ns#_1
This isn't quite right, as anything after the underscore is supposed to be a number, but it's not too bad. Sesame seems to get away with this rule.

Since Mulgara is using "magic" predicates for certain operations, it made sense to extend this into RLog:
  rdfs:ContainerMembershipProperty(i) :- i(x,y), mulgaraprefix(i,"&rdf;_").
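
The difference between the loose prefix test and the stricter "underscore followed by a number" reading can be sketched in Python. This is not the implementation of the mulgaraprefix predicate, and the function names are invented for the example:

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Stricter reading: the underscore must be followed by a positive
# decimal integer with no leading zeros (rdf:_1, rdf:_2, ...).
_MEMBER = re.compile(re.escape(RDF_NS) + r"_[1-9][0-9]*")


def has_rdf_underscore_prefix(uri):
    """Loose test, as in the rule above: only the prefix is checked."""
    return uri.startswith(RDF_NS + "_")


def is_container_membership(uri):
    """Strict test: the prefix plus a valid member number."""
    return _MEMBER.fullmatch(uri) is not None
```

The two tests only disagree on URIs such as rdf:_foo or rdf:_01, which should not turn up in well-formed data.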

Fortunately, none of the OWL rules needed anything fancy, though I suspect I may need to when I get deeper into subsumption.

Beaver

I think that I've already discussed the various compiler-compiler options I looked at. To recap, I chose Beaver as an LALR parser, and JFlex as the lexer.

JFlex is under the GPL, which would normally be bad (since Mulgara is OSL), but fortunately they state that any code generated by JFlex can be under any license you like. I don't plan on changing or extending JFlex, so that's fine (I'd be more than happy to contribute back any changes I made).

The incompatibility of the OSL and the GPL means that I can't distribute JFlex to generate the lexing code, but I have two other options. The first is that I just provide the lexing code, and the original .lex file. That would let anyone use the code just fine, since the lexer can just be taken "as is". A developer would only need to download JFlex for themselves if they needed to change the lexing rules.

The other option is to add an Ant task to download JFlex if needed. This seems to be the better option, but I haven't done it yet.

As far as using Beaver/JFlex is concerned, I've finished the parser, and can easily print out the Abstract Syntax Tree (AST). Now I need to write the code that walks over the tree, finds the dependencies, and converts the AST into Krule.

None of it seems too hard, but it's a matter of finding time. Meanwhile, the boys woke up a short time ago, so I'd better go and play "Daddy" for a while...
