Monday, March 07, 2005

Today was a bit of consolidation, going over the RDF I've already built, and tweaking it a little. I also worked on the Java/iTQL code, though I'm still trying to do more with iTQL, and less with Java.

One question at the moment is whether I should write a recursive Java function to descend the Constraint trees, or use the iTQL walk function. I'm loath to execute queries indiscriminately, but since this part of the system does not necessarily need to scale, it may not hurt. There is only a little to be gained by expanding the queries in iTQL, but since I'm considering scalability at every other level it seems strange not to do it here.

While I've experimented with some use of walk I'll probably just continue with the Java code for now.
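For what it's worth, the recursive Java option would look roughly like the sketch below. The Node class is a hypothetical stand-in for whatever the constraint tree nodes end up being (it is not a Kowari class), and the children are held in memory here, whereas in the real code each level would presumably come from its own query.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstraintWalk {
    /** Hypothetical stand-in for a constraint tree node; not Kowari's actual class. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    /** Depth-first, pre-order walk that collects the label of every node. */
    static List<String> collect(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        out.add(node.label);
        for (Node child : node.children) {
            collect(child, out); // one recursive step per child
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("and",
            new Node("a"),
            new Node("or", new Node("b"), new Node("c")));
        System.out.println(collect(tree));
    }
}
```

The cost of this approach is the one already mentioned: if every node's children have to be fetched separately, that is one query per node.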

My main problem is actually the lack of a difference operator. For instance, when I select all of the queries' selection variables I would start by finding all the properties of the selection clause:

select $rule $select_clause $predicate $object
from <rmi://localhost/server1#rules>
where $rule <rdf:type> <krule:Rule>
and $rule <krule:hasQuery> $query
and $query <krule:selectionVariables> $select_clause
and $select_clause $predicate $object;

For rule rdfs1 this returns:

[ krule:rdfs1, _node99, rdf:type, rdf:Seq ]
[ krule:rdfs1, _node99, rdf:_1, krule:var_a ]
[ krule:rdfs1, _node99, rdf:_2, krule:ref_type ]
[ krule:rdfs1, _node99, rdf:_3, krule:ref_property ]

Note that while this is an actual result, I've abbreviated the domains. Hmmm, that might be a useful feature to automate, at least for the iTQL shell. I'll have to think about implementing that.

To get the selection elements, all that is needed are the nodes referenced by the sequence predicates. There are two ways to do this. The first is to create a resolver that will let me select just those predicates. I expect to implement that code shortly. The other option is to remove the rdf:type statement using a difference operation. That wouldn't work in every circumstance, but it will here, since the sequence will have no properties other than rdf:type and rdf:_#. A difference operation is also essential for certain other operations, so this would seem to be a reasonable trigger for me to implement it now.
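To make the two options concrete, here is a minimal in-memory Java sketch that works on the (predicate, object) pairs from the rdfs1 result above. It only mimics the outcome of each approach; it is neither a resolver nor an iTQL operator, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SeqMembers {
    /** Option 1: keep only rows whose predicate is a membership property (rdf:_1, rdf:_2, ...). */
    static List<String> byMembershipPredicate(List<List<String>> rows) {
        List<String> members = new ArrayList<>();
        for (List<String> row : rows) {
            if (row.get(0).matches("rdf:_[1-9][0-9]*")) {
                members.add(row.get(1));
            }
        }
        return members;
    }

    /** Option 2: take every row and subtract the rdf:type rows (a set difference). */
    static List<String> byDifference(List<List<String>> rows) {
        List<List<String>> typeRows = new ArrayList<>();
        for (List<String> row : rows) {
            if (row.get(0).equals("rdf:type")) {
                typeRows.add(row);
            }
        }
        List<List<String>> rest = new ArrayList<>(rows);
        rest.removeAll(typeRows); // all rows minus the type rows
        List<String> members = new ArrayList<>();
        for (List<String> row : rest) {
            members.add(row.get(1));
        }
        return members;
    }

    public static void main(String[] args) {
        // (predicate, object) pairs taken from the rdfs1 result, with abbreviated names.
        List<List<String>> rows = List.of(
            List.of("rdf:type", "rdf:Seq"),
            List.of("rdf:_1", "krule:var_a"),
            List.of("rdf:_2", "krule:ref_type"),
            List.of("rdf:_3", "krule:ref_property"));
        System.out.println(byMembershipPredicate(rows));
        System.out.println(byDifference(rows));
    }
}
```

Both methods give the same three nodes here, but only because the sequence has no properties beyond rdf:type and rdf:_#. Any other property on the node would slip through the second method.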

A difference operation can be achieved with a subquery, but at that point all operations become tuple-at-a-time. This is provably less efficient. If I had to resort to a tuple-at-a-time solution, then I'd be better off saving time by putting a logic engine into Kowari and being done with it.

The RDF for the rules engine can be queried without scalability, but since OWL needs this as well, I really do need to get it implemented. A compelling, though less vital, reason is that the subquery syntax is very obscure.

Like AND and OR, the difference operator will need an infix notation. For that reason I shouldn't use a word like "difference". Saying "A difference B" just doesn't sound right. An alternative proposed by DavidM was NAND, as the operation is a little like a constraint conjunction with the inverse set of the second operand. However, I don't like this, as NAND is usually used to describe a commutative operation, while difference is noncommutative.

For these reasons I'm thinking of just using the word "minus". That way it will read quite naturally.

After some of the previous debate, I'm almost expecting opposition to my implementation of this. Some people seemed to have a problem with the idea when it was first proposed, as I think that they believed it was some kind of hack. This was why I decided to formalise the operation algebraically back in October. Even with an acceptance of the need for the operation, I just know that there will be a problem with the syntax.

Fortunately I'm now working "independently" on an open source project. If no one else wants to have a go with this implementation, then I get to decide how it is done. :-)

If someone else decides that they really hate my implementation, then they can always rip it out of Kowari, and leave me alone to run my patch. I doubt that would happen though, because the functionality is needed. OWL won't work without it, and I doubt anyone will intentionally break something just because they disagree with syntax. There is also no harm in leaving something (useful) that other people don't use. They won't be using it if they don't like it. The biggest change I can see would be if someone thought they had a better way, and then implemented that... but that would take someone willing to make the effort, and no one else has expressed an interest in that yet.

Anyway, I've been pulling out the code for ConstraintConjunction and perusing it, with the idea of using it as my template for a ConstraintDifference class. At the lowest level, I believe that it is simply a ConstraintConjunction with an inversion on the σ operator (selection). There are some extra efficiencies which can be gained, but I think they are almost the same as for an inner join, i.e. a ConstraintConjunction. I'll have a go tomorrow and see how it goes.
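As a rough illustration of that "conjunction with an inverted selection" idea, the Java sketch below treats each tuple as a map from variable name to value. A join keeps the pairs of rows that agree on their shared variables; the difference keeps the left-hand rows for which no right-hand row agrees. This is only one plain reading of the operation (an anti-join), not the algebraic formalisation from October, and none of these names are Kowari classes. The nested loops are there for clarity; a real implementation would work over sorted tuples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DifferenceSketch {
    /** True when the two rows agree on every variable they share. */
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    /** Conjunction: emit the merged row for every compatible pair. */
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    Map<String, String> merged = new LinkedHashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    /** Difference: same test, inverted. Keep a left row only if nothing on the right matches. */
    static List<Map<String, String>> minus(List<Map<String, String>> left,
                                           List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            boolean matched = false;
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> left = List.of(Map.of("x", "a"), Map.of("x", "b"));
        List<Map<String, String>> right = List.of(Map.of("x", "a", "y", "1"));
        System.out.println(join(left, right));  // rows of left that match, extended with y
        System.out.println(minus(left, right)); // rows of left with no match on the right
    }
}
```

Swapping the arguments to minus gives a different answer, which is the noncommutativity that rules out a name like NAND.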

These things always take longer than you expect, so I may well end up working on this all the way up until next week. Even if I don't, I'll still need to spend a day or two writing a complete set of tests for it.

I didn't get to write any of my confirmation over the weekend, due to family commitments. I'm about to start on that now. Hopefully I'll have something to show for it by morning.

1 comment:

Andrew said...

I think, when it comes down to it, some of this stuff is better explained in code and a working implementation.

Write some tests, make it small and as decoupled as possible and get on with the rest of it.

When Andrae was talking about refactoring some of Kowari I didn't really know what he was going to do until I saw it and then it didn't matter :-) Actually, then I understood.