Friday, March 25, 2005

OWL and Cardinality
I've been giving thought to OWL cardinality in the open world model. My general rule of thumb for the open world assumption is that a model is valid if it is possible to add new statements which make everything consistent. In other words, I'm assuming that all of those statements could exist, but they just haven't been entered yet. A model can only be invalid if there is an apparent problem, and there are no possible statements which can fix it. For instance, the following model has an inconsistency which cannot be fixed with a new statement:

  <ns:A> <owl:differentFrom> <ns:B>
  <ns:B> <owl:sameAs> <ns:A>
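
A minimal Python sketch of how this kind of unfixable clash could be detected. It assumes the model is held as plain (subject, predicate, object) tuples, and the function name `find_clashes` is invented here for illustration. Nodes linked by owl:sameAs are merged into equivalence classes, and any owl:differentFrom pair that lands inside a single class is reported:

```python
# Hypothetical in-memory model: a list of (subject, predicate, object) tuples.
SAME_AS = "owl:sameAs"
DIFFERENT_FROM = "owl:differentFrom"

def find_clashes(triples):
    """Return differentFrom pairs whose members are also (transitively) sameAs."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge every pair linked by owl:sameAs into one equivalence class.
    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    # A differentFrom pair inside one class cannot be repaired by new statements.
    return [(s, o) for s, p, o in triples
            if p == DIFFERENT_FROM and find(s) == find(o)]

model = [
    ("ns:A", "owl:differentFrom", "ns:B"),
    ("ns:B", "owl:sameAs", "ns:A"),
]
print(find_clashes(model))
```

Adding further statements to `model` can only grow the equivalence classes, so a clash found this way stays a clash.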

This has big ramifications for cardinality in OWL.

Consider owl:minCardinality. This restriction says that there must be a certain minimum number of times a predicate gets used with a subject. However, if there are not enough statements to meet this restriction, then it is always possible to add them later. Hence the open world assumption means that it is generally not possible to check owl:minCardinality for consistency.

The only way that owl:minCardinality might be violated is if there simply are not enough options available for a new statement to be legitimately added. For instance, the range of a predicate may be a class which was defined using owl:oneOf. However, I can't see how this could cause a problem unless the list of available objects was smaller than the minimum cardinality. That would certainly be inconsistent, but it would be inconsistent TBox information, rather than inconsistent ABox data. This means that the owl:minCardinality restriction would not be violated by the data, but the ontology itself would be inconsistent. A different error, but one I should still check for.
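
The TBox check described above can be sketched in a few lines of Python. The function name and arguments are invented for illustration, and the range is assumed to be given as a list of the names in an owl:oneOf enumeration, or `None` when the range is not enumerated:

```python
def min_cardinality_satisfiable(min_cardinality, one_of_range=None):
    """Open-world check: could enough distinct values ever exist?"""
    if one_of_range is None:
        # The range is not enumerated, so missing statements can always be added later.
        return True
    # The number of distinct names is only an upper bound on the number of
    # distinct individuals, since owl:sameAs could still merge some of them.
    return len(set(one_of_range)) >= min_cardinality

print(min_cardinality_satisfiable(3, ["ns:red", "ns:green"]))
print(min_cardinality_satisfiable(2, ["ns:red", "ns:green"]))
print(min_cardinality_satisfiable(5))
```

A `False` result means the restriction can never be met, whatever data is added. A `True` result only means that no problem is apparent yet.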

owl:maxCardinality is a more complex story. At first glance, the following might appear to be inconsistent:
  <owl:Class rdf:ID="myClass">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#myProperty"/>
        <owl:maxCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:maxCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>

  <myClass rdf:about="#someObject">
    <myProperty rdf:resource="#firstPropertyObject"/>
    <myProperty rdf:resource="#secondPropertyObject"/>
  </myClass>
This is only partial RDF, and I haven't validated it, but it should demonstrate a property being used twice, when it has a maximum cardinality of 1.

The difficulty in checking this for consistency is that an <owl:sameAs> statement can make it all valid:

<ns:firstPropertyObject> <owl:sameAs> <ns:secondPropertyObject>

It would seem that the only statements which can be validly checked for cardinality are those which refer to objects with explicit owl:differentFrom statements to differentiate them. However, it is only possible to work with a set of objects in which every member is declared different from every other member. To confuse things, there can be more than one such set, with intersections between these sets. Each set has to be considered completely separately, and it is only those sets larger than the cardinality constraint that will count.
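
To make the "sets of mutually different objects" idea concrete, here is a brute-force Python sketch (not iTQL, and only practical for small sets of values). The function name and its arguments are assumptions for illustration. It looks for any group of maxCardinality + 1 values of a property that are pairwise owl:differentFrom, since no owl:sameAs statement could merge such a group:

```python
from itertools import combinations

def violates_max_cardinality(values, different_pairs, max_cardinality):
    """True if some max_cardinality + 1 values are pairwise owl:differentFrom."""
    # owl:differentFrom is symmetric, so store each pair without direction.
    different = {frozenset(pair) for pair in different_pairs}
    for group in combinations(sorted(set(values)), max_cardinality + 1):
        if all(frozenset(pair) in different for pair in combinations(group, 2)):
            return True  # no owl:sameAs statement can merge these values
    return False

values = ["ns:firstPropertyObject", "ns:secondPropertyObject"]

# Without owl:differentFrom, the two values might still be the same object.
print(violates_max_cardinality(values, [], 1))

# With an explicit owl:differentFrom, the restriction is definitely violated.
pairs = [("ns:secondPropertyObject", "ns:firstPropertyObject")]
print(violates_max_cardinality(values, pairs, 1))
```

Overlapping sets are handled implicitly here, because every candidate group is tested on its own.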

I tried to post the iTQL query for this yesterday, but that was mostly to think the syntax through. It isn't all that difficult to find the owl:maxCardinality restrictions, and to conjoin this with the set of nodes found in owl:differentFrom statements. The trick is that the owl:differentFrom statements have to be selected both ways and joined to the restricted predicate in order to be counted. This needs a constraint like:
  $subject $predicate $node1
  and $node1 <owl:differentFrom> $node2
  and $subject $predicate $node2
Careful use of the count() subqueries should let me pull these sets apart, and find how often the predicate was used on that set.
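
Until the iTQL can be tested, the join itself can be emulated in plain Python to see which bindings it should produce. This is only a sketch over an in-memory list of (subject, predicate, object) tuples. The function name is invented, and the restricted predicate is passed in directly rather than being found from the owl:maxCardinality restriction:

```python
from collections import defaultdict

DIFFERENT_FROM = "owl:differentFrom"

def different_value_pairs(triples, predicate):
    """Emulate the three-way join: bindings for (subject, node1, node2)."""
    # Select owl:differentFrom in both directions, since it is symmetric.
    different = set()
    for s, p, o in triples:
        if p == DIFFERENT_FROM:
            different.add((s, o))
            different.add((o, s))

    # Index the objects used with the restricted predicate, per subject.
    objects = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            objects[s].add(o)

    # Join: both nodes must be values of the predicate on the same subject.
    return sorted((s, n1, n2)
                  for s, nodes in objects.items()
                  for n1 in nodes for n2 in nodes
                  if (n1, n2) in different)

triples = [
    ("ns:someObject", "ns:myProperty", "ns:first"),
    ("ns:someObject", "ns:myProperty", "ns:second"),
    ("ns:first", "owl:differentFrom", "ns:second"),
]
for row in different_value_pairs(triples, "ns:myProperty"):
    print(row)
```

Each differing pair shows up twice, once in each direction, which is something any counting step has to allow for.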

I can't test this approach just yet, as I'll need to build some good test data first. That will take a little while, as I need to get through my current tasks first.


Andrew said...

Cardinality in an open world has been brought to my attention before, and more recently I've been reading about how to extend or modify OWL, especially with regard to the unique name assumption (or lack thereof).

My concern has always been that while the system may not make these assumptions, they may be part of the model that people using the system have in mind. Searching around, these issues seem to have been addressed in "OWL Flight":

"...intuitively, when the designer of the ontology models a minimal cardinality constraint, the designer expects that this constraint is violated if no property filler is known to exist, rather than a property filler being created during reasoning, which satisfies this constraint. In other words, in order to check minimal cardinality constraints, we apply a form of closed-world reasoning."

They call it the Local Closed-World Assumption. There's also another paper, "OWL DL vs. OWL Flight".

Pellet was the only reasoner I found to support non-unique names (Racer, for instance, doesn't).
