Monday, June 06, 2005

Substantive Coding Done
I spent the last few days concentrating on coding rather than blogging, as I have been getting close to the end. I've now finished the substantive coding for the first iteration of the rules engine. That doesn't mean it's done, but I feel good about it anyway. :-)

What I mean by substantive coding is that the code is all written, and it compiles... but that's it. Apart from some iterative testing while I was writing it, I haven't even tried to run it yet. I plan on starting the debugging tomorrow. This always takes time, but just starting on the debugging stage makes me feel like I'm in the home stretch.

The current implementation is based on performing the equivalent of a "select", followed by an "insert/select" if needed. This is a lot like the original proof-of-concept code, only now it is fully integrated and is being done on the server side. This is not the ideal implementation, but it will work, and should demonstrate that everything is implemented correctly. It should work pretty quickly too.
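To illustrate the "select" followed by an "insert/select" idea, here is a minimal sketch in Python. It is not the server-side code described above: the in-memory set of triples and the names select_subclass_chain and run_rule are assumptions made purely for illustration.

```python
# Triples are (subject, predicate, object) tuples held in a plain Python set.
# This stands in for the real store; all names here are hypothetical.

def select_subclass_chain(triples):
    """The "select": find every (A, C) linked through some B by rdfs:subClassOf."""
    sub = [(s, o) for (s, p, o) in triples if p == "rdfs:subClassOf"]
    return {(a, "rdfs:subClassOf", c)
            for (a, b) in sub
            for (b2, c) in sub
            if b == b2}

def run_rule(triples):
    """Repeat the select, inserting new results, until nothing new appears."""
    store = set(triples)
    while True:
        new = select_subclass_chain(store) - store  # the "select"
        if not new:                                 # nothing to insert: done
            return store
        store |= new                                # the "insert/select", only if needed

facts = {
    ("Dog", "rdfs:subClassOf", "Mammal"),
    ("Mammal", "rdfs:subClassOf", "Animal"),
    ("Animal", "rdfs:subClassOf", "LivingThing"),
}
closure = run_rule(facts)
```

The loop stops as soon as a select produces no statements that are not already stored, which is the "if needed" part of the description.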

I think I'm on schedule for completion of this round of the paid work, but a few sick days recently mean that I'll need to go beyond my scheduled finishing date. I believe I'll have everything written that I'm supposed to have written, but in reality I'm being paid for time, not for completed code. The other reason to keep working on this full time is that I'm enjoying it! However, I can only afford to go for the extra time needed to make up those days I was sick. Hopefully the system will be so useful and show so much promise for further development that someone will pay me to do more of it! (Well... I can dream, can't I?) :-)

Once I have it working, and RDFS is executing correctly, I can move on to the next stage. That next version will count the size of each individual constraint, rather than the size of a completed Answer, making it much more efficient. More importantly, it will match the design I'm writing my thesis about! The initial code to do this should take less than a day, but I'll need to spend some time in the query engine to make sure that constraints are being cached and re-used correctly. I'm not sure if that time should be considered "coding" or "debugging", as I'll be using constraints in a manner slightly outside of their original design (which feels like re-designing and coding), but I'll be approaching any problems like I would any unexpected error (which is debugging).

Rules vs. Ontologies
A more pressing concern is the need to make transitive constraints accept a variable predicate. This is not needed for RDFS (since its transitive predicates, rdfs:subClassOf and rdfs:subPropertyOf, are fixed in advance) but it will be needed for OWL. Once OWL is introduced, the specific rule for transitivity of sub-classes can be dropped in favour of a declaration in OWL that rdfs:subClassOf is a transitive property.

This brings me to a point that I've been thinking about for a while. I sort of understood it before, but I think I've only just started to really get it. What does an ontology language give us that we don't get from rules? After all, ontology inferencing (and consistency checking, but I won't go there right now) is performed by rules. Rules also allow much greater flexibility than we can achieve with an ontology language like OWL. The commonly cited example of OWL's limitations is that it can't express the "uncle" relationship. An "uncle" relationship is relatively straightforward: if person A has parent B, and person B has brother C, then A has an uncle C. This is easy to describe in rules, and impossible in OWL.
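As a small illustration of how easily the uncle rule is stated as a rule, here is a hedged Python sketch over a set of triples. The predicate names hasParent, hasBrother and hasUncle, and the function infer_uncles, are made up for the example and do not come from any particular vocabulary.

```python
# Hypothetical predicates: hasParent, hasBrother, hasUncle.
def infer_uncles(triples):
    """If A hasParent B and B hasBrother C, then A hasUncle C."""
    parents = [(s, o) for (s, p, o) in triples if p == "hasParent"]
    brothers = [(s, o) for (s, p, o) in triples if p == "hasBrother"]
    return {(a, "hasUncle", c)
            for (a, b) in parents
            for (b2, c) in brothers
            if b == b2}  # join on the shared person B

facts = {
    ("alice", "hasParent", "bob"),
    ("bob", "hasBrother", "carl"),
}
uncles = infer_uncles(facts)
```

The whole rule is a single join on the person in the middle, which is why it is so natural to write as a rule.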

If we can do everything in rules, and OWL is limited, then why use OWL?

The answer (for me) is demonstrated with the transitivity of subClassOf. If we just had a rule system, then we would need to have a rule for inferences on this predicate, i.e.:

  if   A rdfs:subClassOf B
  and  B rdfs:subClassOf C
  then A rdfs:subClassOf C

That's fine, but what about "less than"? We need a new rule:

  if   A < B
  and  B < C
  then A < C

How about "greater than"? New rule. "Equal to"? New rule. Every time a new transitive predicate appears we need a new rule to handle it. This means that rules have to be very domain specific. They can't handle anything that wasn't known about at the time they were created.

However, using OWL a predicate can be declared to be an owl:TransitiveProperty. Suddenly we have just one rule:

  if   property is transitive
  and  A property B
  and  B property C
  then A property C

Whenever a new property is introduced which is transitive, then we can just declare it in OWL. Of course, this goes for all of the properties of properties that are definable in OWL. So the ability to describe the properties of a property means that we can write generalised rules to make deductions on them.
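The single generalised rule can be sketched in Python as follows. This is only an illustration under assumptions: the triples live in an in-memory set, the predicates lessThan and knows are invented for the example, and transitive_closure is a hypothetical helper rather than part of any rules engine.

```python
def transitive_closure(triples):
    """One rule for every property declared to be an owl:TransitiveProperty."""
    store = set(triples)
    while True:
        # Properties that the data itself declares to be transitive.
        transitive = {s for (s, p, o) in store
                      if p == "rdf:type" and o == "owl:TransitiveProperty"}
        # if A property B and B property C then A property C
        new = {(a, p, c)
               for (a, p, b) in store if p in transitive
               for (b2, p2, c) in store if p2 == p and b2 == b} - store
        if not new:
            return store
        store |= new

facts = {
    ("rdfs:subClassOf", "rdf:type", "owl:TransitiveProperty"),
    ("lessThan", "rdf:type", "owl:TransitiveProperty"),
    ("Dog", "rdfs:subClassOf", "Mammal"),
    ("Mammal", "rdfs:subClassOf", "Animal"),
    ("1", "lessThan", "2"),
    ("2", "lessThan", "3"),
    ("alice", "knows", "bob"),   # "knows" is not declared transitive
    ("bob", "knows", "carol"),
}
inferred = transitive_closure(facts)
```

Adding a new transitive predicate here means adding one declaration to the data; the rule itself does not change.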

I've sort of understood this for a while, but it was only while thinking about Euclidean properties that it finally crystallised for me. Ideally, it would be possible to assert something like:

  parentOf isEuclideanTo siblingTo

So we could end up with a rule like:

  if   property1 isEuclideanTo property2
  and  A property1 B
  and  A property1 C
  then B property2 C

Of course, isEuclideanTo would not be symmetric, though it would be possible to infer backwards on it (a person's parent must be the same as their sibling's parent).
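A rough Python sketch of the isEuclideanTo rule follows. The predicate isEuclideanTo is the hypothetical construct from this post, not part of OWL, and apply_euclidean is an invented name. The check that B and C differ is an added assumption, so that nobody is inferred to be their own sibling; the rule as written above does not include it.

```python
def apply_euclidean(triples):
    """if P1 isEuclideanTo P2 and A P1 B and A P1 C then B P2 C."""
    store = set(triples)
    links = [(s, o) for (s, p, o) in store if p == "isEuclideanTo"]
    new = set()
    for (p1, p2) in links:
        pairs = [(s, o) for (s, p, o) in store if p == p1]
        for (a, b) in pairs:
            for (a2, c) in pairs:
                # b != c is an assumption added for this sketch
                if a == a2 and b != c:
                    new.add((b, p2, c))
    return new

facts = {
    ("parentOf", "isEuclideanTo", "siblingTo"),
    ("pat", "parentOf", "sam"),
    ("pat", "parentOf", "kim"),
}
siblings = apply_euclidean(facts)
```

Because the rule matches the pair (B, C) in both orders, siblingTo comes out symmetric even though isEuclideanTo itself is not.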

There are more complex types of relationships between relationships. Uncle is a good one to demonstrate this, as it requires three different types of relationship which are all related to each other. While possible, the RDF required to describe something like this starts to get messy. More complexity is introduced when you realise that one of the relationships can actually be either "brother" or "brother-in-law". Also, an uncle relationship can be deduced, as can a nephew/niece, but the parent in the middle of the relationship cannot be.

All the same, this kind of knowledge about properties is something we use every day. If an ontology is to describe real world objects and relationships then it will need to be capable of describing relationships between relationships.

With this in mind, I asked the OWL mailing list what people thought of such a construct. I half expected to be shouted down, but at least I'd get to find out why. Instead the response was encouraging. Ian Horrocks (who wrote half of the papers I cite) explained that property relationships are indeed useful, but that care must be taken to ensure they are decidable. He suggested I read one of his papers on the topic.

Who knows? Maybe one day OWL can include something like this.
