Wednesday, September 29, 2004

Documentation
The majority of today was spent on documentation. This started with some updates to the docs for the "having" clause, which I wrote last night, and moved on to the "node type" models. I tried to structure this similarly to the documentation for the XSD data type models, though I provided more examples.

Once I'd finished with this, I moved on to the example data AN needs for the demonstration iTQL used to implement OWL-Lite rules. He had hoped to use the Wine ontology, but it turned out to have a number of OWL Full features, which made it incompatible with the queries he wanted to perform.

Initially I suggested removing the statements which are incompatible with OWL-Lite, but the size of the file made me reconsider this. So now I'm looking at fleshing out the Camera ontology to use a few more features.

Differences
The whole morning was spent looking at the iTQL required to make owl:allValuesFrom work. It didn't take long to discover that there was a problem in our ability to perform the query.

Using iTQL it is quite easy to find a predicate that has a restriction placed on it, and in the case of owl:allValuesFrom, the class of the range for that predicate is also easy to pick up. Passing these predicates and the associated range into a subquery, it is quite straightforward to find those statements which use that predicate and have objects which are instances of the range class. With an "exclude" operation (as of today, "not" is now named "exclude" to make the semantics of this operation on constraints much clearer) it is a simple matter to find usages of the predicate with objects which declare a type outside of the given range.

This sounds fine, until you realise that some objects declare that they have more than a single type. A predicate might be used on an instance of the appropriate class, but if this is also an instance of another class, then our query will pick it up as being of the incorrect class.
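The false positive can be reproduced outside iTQL. The following is only a minimal Python sketch of the naive "type outside the range" check; the triples, the ex: names and the naive_violations helper are all made up for illustration and are not part of our code.

```python
# Hypothetical data: triples as (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"

triples = {
    ("ex:wine1", "ex:hasFlavor", "ex:f1"),
    ("ex:wine2", "ex:hasFlavor", "ex:f2"),
    ("ex:f1", RDF_TYPE, "ex:Sweet"),
    ("ex:f1", RDF_TYPE, "ex:Popular"),  # a second type; ex:f1 is still an ex:Sweet
    ("ex:f2", RDF_TYPE, "ex:Bitter"),   # genuinely outside the range
}

def naive_violations(triples, predicate, range_class):
    """Objects of `predicate` that declare any type other than `range_class`."""
    objects = {o for (s, p, o) in triples if p == predicate}
    return {
        s for (s, p, o) in triples
        if p == RDF_TYPE and s in objects and o != range_class
    }

flagged = naive_violations(triples, "ex:hasFlavor", "ex:Sweet")
print(sorted(flagged))
```

Both ex:f1 and ex:f2 are flagged, even though ex:f1 is a perfectly good instance of the range class. It is reported only because it also declares a second type.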

AN and I spent hours trying to find some way of taking the list of instances of classes which are of the wrong type, and removing those objects which are also instances of the correct type. Every time I thought of a different way of approaching the problem, I ended up with a set of data from which I needed to extract some other data, and yet there seemed to be no way to do that.

In the end, AN believed that he had found a way of doing it, using "exclude" and 3 levels of subqueries. Unfortunately, it exploits a corner case for "exclude" which has not been properly implemented, so he is going to work on coding that tomorrow. However, the partial query he already has contained enough information to make him believe that he has it right. I'll confess that I didn't follow it properly when he showed it to me, so I'm going to take his word for it until I see it in action.

AN claims that he and AM determined that it is possible to get any type of result using the "and" and "or" operators in conjunction with "exclude" and subqueries. I'm not so sure about this, but even if it is true, the complexity of the subqueries needed to obtain some results is ridiculous.

Instead of this I believe that we need a "difference" operator. This would work internally in a manner similar to the "or" inner join operator. It would match the two constraints' variables, and wherever there is a match of rows in both constraints, those rows in the first constraint would be removed. This would allow numerous operations which we have not been able to do in the past, including the owl:allValuesFrom query from this morning.
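To make the intended semantics concrete, here is a rough Python sketch of such a "difference" over two sets of variable bindings. It is not how the operator would be implemented in the query engine; the row format (one dict per row), the variable names and the choice to return the left side unchanged when no variables are shared are all assumptions of mine.

```python
def difference(left, right):
    """Remove rows of `left` that match a row of `right` on their shared variables.

    Rows are dicts mapping variable names to values. Assumption: if the two
    sides share no variables, `left` is returned unchanged.
    """
    if not left or not right:
        return list(left)
    shared = sorted(set(left[0]) & set(right[0]))
    if not shared:
        return list(left)
    seen = {tuple(row[v] for v in shared) for row in right}
    return [row for row in left if tuple(row[v] for v in shared) not in seen]

# Hypothetical bindings: objects that declare a type outside the range...
wrong_type = [
    {"$obj": "ex:f1", "$type": "ex:Popular"},
    {"$obj": "ex:f2", "$type": "ex:Bitter"},
]
# ...and objects that also declare the correct type.
right_type = [{"$obj": "ex:f1"}]

violations = difference(wrong_type, right_type)
print(violations)
```

Only ex:f2 survives, which is the result the nested "exclude" subqueries were trying to reach.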

Some years ago I thought that we might need an operation like this, but somehow we have always managed to avoid it. Once I discussed this with the others I discovered that several of them have had similar thoughts. SR had once proposed such an operation, and DM had also come up with a join operation which he called "and-not". In each case the idea was to perform an operation just as I've described here. It looks like it might be time to finally implement it. Fortunately SR looked at how it would be accomplished, and he thinks that it will be very similar to constraint conjunctions, and just as efficient.

owl:Restriction
I finally discovered the difference between owl:allValuesFrom and rdfs:range, and of course, now I feel really silly.

The mistake I made was that I thought that the restriction was applied to the predicate globally. I have indeed seen that owl:Restrictions are used by having owl:Class types inherit from them, but I had conveniently forgotten this. When applied correctly, the restriction works on a particular class.

The upshot is that a restriction on a predicate is only applicable when that predicate is being used on a subject of the type that subclasses the restriction. Any types which do not inherit from the restriction can use that predicate with impunity. This is different to rdfs:range, which declares the object type for a predicate no matter what the type of the subject is.

An example could be a "ns:hasParent" property. If this property is applied to a subject of type "ns:Human" then I want to ensure that the object is also of type "ns:Human". However, I don't want to declare the range of this predicate to be ns:Human, because when applied to subjects of type ns:Dog then the object will also need to be of type ns:Dog.
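The ns:hasParent case can be sketched in a few lines of Python. This treats both rdfs:range and owl:allValuesFrom as simple closed-world checks over declared types, with no subclass inference, which is a simplification. The instance names and both helper functions are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Hypothetical instance data for the ns:hasParent example.
triples = {
    ("ns:alice", RDF_TYPE, "ns:Human"),
    ("ns:bob", RDF_TYPE, "ns:Human"),
    ("ns:rex", RDF_TYPE, "ns:Dog"),
    ("ns:fido", RDF_TYPE, "ns:Dog"),
    ("ns:alice", "ns:hasParent", "ns:bob"),
    ("ns:rex", "ns:hasParent", "ns:fido"),
}

def types_of(node):
    """All types explicitly declared for `node`."""
    return {o for (s, p, o) in triples if s == node and p == RDF_TYPE}

def range_violations(predicate, range_class):
    """Global check: every object of `predicate` must be in `range_class`."""
    return {
        (s, o) for (s, p, o) in triples
        if p == predicate and range_class not in types_of(o)
    }

def all_values_from_violations(subject_class, predicate, value_class):
    """Scoped check: only subjects of `subject_class` are constrained."""
    return {
        (s, o) for (s, p, o) in triples
        if p == predicate
        and subject_class in types_of(s)
        and value_class not in types_of(o)
    }

print(range_violations("ns:hasParent", "ns:Human"))
print(all_values_from_violations("ns:Human", "ns:hasParent", "ns:Human"))
print(all_values_from_violations("ns:Dog", "ns:hasParent", "ns:Dog"))
```

The global range check complains about the dogs, while the two class-scoped checks both come back empty.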

Actually, this demonstrates some of the distinct advantages that OWL has over simple systems like RDFS. Once I realised this, it also occurred to me that an owl:allValuesFrom restriction based on a union of classes could start to get complex to enforce. Fortunately, unions are an OWL DL construct rather than an OWL-Lite one, but I will definitely be looking at this during the course of my research.

Just looking at the syntax now, I've realised that it is possible to have a restriction of the following form:

<owl:Class rdf:ID="Riesling">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasFlavor" />
      <owl:allValuesFrom>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Sweet" />
            <owl:Class rdf:about="#Dry" />
          </owl:unionOf>
        </owl:Class>
      </owl:allValuesFrom>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
This would make the owl:allValuesFrom iTQL much more complex. In fact, since OWL descriptions are recursively defined in terms of class IDs, restrictions, unions, intersections, complements and one-ofs, we probably need to find a more general way of representing a description in iTQL. This will be needed if we ever hope to implement OWL DL, since a "description" in a restriction can be arbitrarily complex, and iTQL as it stands will need to be just as complex to traverse the entire length of the description.
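To show what "recursively defined" means in practice, here is a small Python sketch that represents a description as nested tuples and checks a node against it. It is a closed-world toy (complement is simply "not declared"), not OWL semantics and not a proposal for iTQL syntax. The tuple layout, the satisfies function and the ex: data are all invented for the illustration.

```python
def satisfies(node, desc, types, props):
    """Check `node` against a nested description, recursing into sub-descriptions.

    `types` maps a node to its declared classes; `props` maps a
    (node, property) pair to the list of values of that property.
    """
    kind = desc[0]
    if kind == "class":
        return desc[1] in types.get(node, set())
    if kind == "union":
        return any(satisfies(node, d, types, props) for d in desc[1])
    if kind == "intersection":
        return all(satisfies(node, d, types, props) for d in desc[1])
    if kind == "complement":
        return not satisfies(node, desc[1], types, props)
    if kind == "allValuesFrom":
        _, prop, inner = desc
        return all(satisfies(v, inner, types, props)
                   for v in props.get((node, prop), ()))
    if kind == "someValuesFrom":
        _, prop, inner = desc
        return any(satisfies(v, inner, types, props)
                   for v in props.get((node, prop), ()))
    raise ValueError("unknown description kind: %r" % (kind,))

# Hypothetical data loosely following the hasFlavor restriction above.
types = {"ex:f1": {"ex:Sweet"}, "ex:f2": {"ex:Bitter"}}
props = {
    ("ex:w1", "ex:hasFlavor"): ["ex:f1"],
    ("ex:w2", "ex:hasFlavor"): ["ex:f1", "ex:f2"],
}
restriction = ("allValuesFrom", "ex:hasFlavor",
               ("union", [("class", "ex:Sweet"), ("class", "ex:Dry")]))

print(satisfies("ex:w1", restriction, types, props))
print(satisfies("ex:w2", restriction, types, props))
```

The recursion is trivial in a general-purpose language; the difficulty described above is that a fixed iTQL query has to be as deep as the description it is traversing.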

Tonight I'm probably going to have nightmares about restrictions which are owl:allValuesFrom of an owl:unionOf of classes, one of which is another owl:Restriction with an owl:someValuesFrom on an owl:intersectionOf of an owl:complementOf... and so on.

Now that I think about it, we can't even contemplate OWL DL until we've worked out how to traverse and represent descriptions in iTQL. And it's called "DL" because it's supposed to be computable! :-)

I think I need to learn more Prolog.
