Saturday, March 12, 2005

Unification
It's overdue, but I should write a short post about Friday. The idea was to take this day off to get over the bug, but I spent a bit of time looking at TuplesOperations.join anyway. In particular, I was fortunate enough to have Andrae explain what "unification" meant.

The term comes from Prolog interpreters, and is used to simplify certain joins. If one of the constraints in a join returns just one row, then it can be the subject of unification. In this case, any other constraint which shares variables with the single-row constraint can be modified: those variables are set to the pre-calculated values, and the constraint is looked up again.

As an example, consider two constraints A and B, where A looks like:
(<ns:foo> <ns:bar> $x)
and constraint B looks like:
($x <ns:pred> $y)

Now if A were to result in just one row:
<ns:foo> <ns:bar> <ns:obj>
Then B can be modified to look up:
(<ns:obj> <ns:pred> $y)

The result is that A can be dropped, and B can be re-evaluated (trivial), rather than having to go through the more expensive operation of joining A to B.

The only trick here is that the first column of B must still be labeled "$x". The label is needed for any subsequent joins, and for the select clause to get to the value. The iTQL way of setting a variable to a fixed value is with the magical predicate kowari:is:
($x <kowari:is> <ns:obj>)

So the equivalent iTQL for this is:
($x <kowari:is> <ns:obj>) AND
($x <ns:pred> $y)
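
To make the idea concrete, here is a rough sketch in Python of the example above. It is not how TuplesOperations.join is written: the in-memory STORE, the tuple representation of constraints, and the names resolve and join_with_unification are all made up for illustration. The point is only to show A being dropped when it has a single row, and B being re-resolved with $x fixed while the column keeps its "$x" label.

```python
# Hypothetical model: the store is a set of (subject, predicate, object)
# tuples, and a constraint is a 3-tuple whose variables start with "$".
STORE = {
    ("ns:foo", "ns:bar", "ns:obj"),
    ("ns:obj", "ns:pred", "ns:a"),
    ("ns:obj", "ns:pred", "ns:b"),
    ("ns:other", "ns:pred", "ns:c"),
}


def is_var(term):
    return term.startswith("$")


def resolve(constraint, store, bindings=None):
    """Return one {variable: value} row for each triple matching the constraint.

    Variables already present in `bindings` are treated as fixed values,
    but they stay in every row under their original label.
    """
    rows = []
    for triple in store:
        row = dict(bindings or {})
        for term, value in zip(constraint, triple):
            if is_var(term):
                # setdefault binds a free variable, or returns the fixed value
                if row.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            rows.append(row)
    return rows


def join_with_unification(a, b, store):
    """Join constraints a and b, using unification when a has exactly one row."""
    a_rows = resolve(a, store)
    if len(a_rows) == 1:
        # Unification: drop A and look B up again with A's values filled in.
        return resolve(b, store, a_rows[0])
    # Otherwise fall back to a naive join on the shared variables.
    b_rows = resolve(b, store)
    return [
        {**ra, **rb}
        for ra in a_rows
        for rb in b_rows
        if all(ra[v] == rb[v] for v in ra.keys() & rb.keys())
    ]


rows = join_with_unification(
    ("ns:foo", "ns:bar", "$x"), ("$x", "ns:pred", "$y"), STORE
)
```

Every row in rows still has a "$x" entry (always ns:obj here), which is what a later join or the select clause would need.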


Syntax
This dredges up some issues I have with some of the syntax in iTQL. I might just mention them here, so I have a reference point in future.

The kowari:is syntax is a "neat" way of doing it, in that it fits iTQL trivially, but it is a syntax I dislike. The reason is that it isn't a "constraint" in the usual sense, but actually changes other constraints in the query. I'd have preferred a syntax which makes it easier for a user to understand what they are doing, rather than one which fits in with the pre-existing syntax. The kowari:is syntax is also painful to work with in the query layer, as it gets parsed as a constraint, and gets put into the constraint expression as an individual entity. This means that these magical predicates have to be found, and their effect passed on to all the other constraints that they are relevant for. I don't think this algorithm is particularly elegant.
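
As a rough illustration of the kind of pass I mean, here is a sketch in Python. It assumes the constraint expression is just a flat list of 3-tuples that are all ANDed together, which is a big simplification (the real expression also nests and can contain OR), and the name apply_is_constraints is made up for this post.

```python
IS = "kowari:is"


def apply_is_constraints(constraints):
    """Find the ($var <kowari:is> value) constraints and push them into the rest.

    Returns the fixed bindings (so the variable labels are not lost) and the
    remaining constraints with those variables replaced by their values.
    """
    # First pass: hunt down the magical predicates.
    fixed = {s: o for (s, p, o) in constraints if p == IS and s.startswith("$")}
    # Second pass: rewrite every other constraint that mentions those variables.
    rewritten = []
    for c in constraints:
        if c[1] == IS and c[0] in fixed:
            continue  # not a real constraint, so it is not resolved itself
        rewritten.append(tuple(fixed.get(term, term) for term in c))
    return fixed, rewritten


fixed, rest = apply_is_constraints(
    [("$x", "kowari:is", "ns:obj"), ("$x", "ns:pred", "$y")]
)
```

Even in this toy form the special case is visible: one kind of "constraint" has to be pulled out and treated differently from all the others.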

There are several other approaches which could have been tried. The first to come to mind is simply:
($x=<ns:obj> <ns:pred> $y)

This has the advantage of not introducing an AND when no join operations are going to occur. It can be argued that this is not a consideration: after all, the unification optimisation can result in a join being dropped when an AND is present. However, that is an implementation optimisation, and the join operation can occur, depending on the data. The kowari:is predicate on the other hand cannot result in a join.

On the other hand, this syntax has the disadvantage of getting lost in all the other constraints which include the variable $x. It would be redundant to set the variable for each constraint, but if you don't then this syntax would have one constraint modifying all of the others: a situation I would prefer to avoid.

So this kind of syntax would simplify some queries, but would possibly be confusing for others. It may be possible to create something that is distinct from a constraint which applies to all of the constraints, but then the kowari:is constraint almost does that anyway (if you're willing to accept that it is not a normal constraint). There are always tradeoffs. I won't be changing it any time soon, but I certainly won't be averse to having a go if someone comes up with a better idea.

1 comment:

Andrew said...

I'm glad someone else is finally seeing the light with tucana:is. The real problem I had, from a coding perspective, is that it smells - it's a special case in a number of places in the code.

I remember being told that tucana:is could be considered as "constraining the graph with all values in it". I think this is a bit of a hack - and doesn't really match what the other constraints do. They either implicitly constrain what is in the FROM clause or explicitly constrain what is in the IN.

One way to save tucana:is would be for it to operate in this manner - constraining the FROM or IN. Then it is a normal constraint and can be joined.

Also, when you talk about adding assignment, the other suggestion I had, apart from doing it within the WHERE, is to do it before the SELECT. It would be similar to the PREFIX clause in SPARQL, but it assigns variables.

I also had some realisation about the different SPARQL operations (CONSTRUCT for example) and how we could do that with assign clause - I think that would be very interesting.