Tuesday, August 17, 2004

Cardinality
Cardinality implementation is an elusive beast. I've been working on it this evening, as I didn't get as far as I wanted today. After discussing things with AN this morning, we came up with a good plan for implementation, but turning this into code has not been as simple as I'd hoped for.

To start with, AN has refactored a lot of the constraint resolution for Kowari. I don't have a major problem with that per se, but it invalidates a lot of what I knew about the way things were done. Consequently, I spent quite a bit of time finding code that I knew existed, but I didn't always know where to look for it since it was refactored into a new home.

Cardinality constraints have to be done after everything else has been resolved. To force these constraints to be executed last, I've set the size of the constraint's row count to Long.MAX_VALUE. AN suggested using the size of the graph plus one, but there is no graph for this kind of constraint, so I opted for a number that would be obviously larger than any other.

It may have been possible to tell the execution plan about the need for cardinality constraints to always come last, but I didn't like this idea. The code for the execution plan is clean and generic, so the last thing we want is for it to start making special cases for particular types of constraints. Conversely, setting the size of a constraint to a large number like this is really just subverting the whole execution plan anyway, so perhaps some special knowledge wouldn't be so bad after all. I'll leave it "as is" for the moment.

Essentially, cardinality resolution gets done in one of two ways:

  • If the variable of the cardinality constraint is not found in the tuples the constraint is being joined on, then the constraint will be applied to the entire tuples. First, the variable has to be associated with a column (this must be done with another joined constraint, otherwise the variable is completely unbound and will throw an exception). Once associated, that column can be searched for unique values, by joining on a singleton tuples with the variable associated with that column. The rows of the resulting tuples then get joined to the entire set one at a time. If the results are of the correct size, then they get appended to the final result.

  • If the variable is found in the tuples to which this constraint will be joined, then a join on that variable will provide a unique list of values. This list can be used one item at a time, to join against the original tuples, and if the sizes of the resulting tuples are correct then they will be appended to the result.
From what I can tell, I'll need to implement this whenever a ConstraintConjunction is executed. Execution of constraints happens inside ExecutionPlan but the work gets delegated to the TuplesOperations.join() method. This creates a problem, since this method is only aware of tuples, and not of the constraints they belong to.

As a result, I'm thinking I need a new type of Tuples which can represent the need to constrain by cardinality. However, it's late, so I'll sleep on it, and get AN's opinion in the morning (maybe even SR, as he is back now).

No comments: