Wednesday, August 18, 2004

Syntax
Today was spent doing very little coding, which I found quite frustrating. TJ won't be happy, but there was nothing I could do about it.

The constraint implementation described yesterday needed a new type of Tuples object so that it would perform the correct transformation when joined against. This didn't seem quite right, and since the relevant code belongs to others, I took the time to ask their advice. Of course, I ended up with 4 people offering 5 opinions, so there was a lot of discussion before we could reach consensus.

After considering it for some time I realised that cardinality needs 4 things:

  1. A constraint expression.
  2. A list of variables (usually just one) from the constraint expression.
  3. The type of integer comparison to perform (>, < or =).
  4. The scalar integer value to compare against.
The list of variables provides the grouping of item 1 needed for the counts. I couldn't come up with a use-case which would ever need more than one variable in this list, but it's normally best not to impose artificial restrictions that aren't needed. Besides, as soon as we decide not to support something like this, we invariably get an email requesting that very feature, normally within 6 months.
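To make the four items concrete, here is a minimal Java sketch of how they might be held together. The class, enum and field names are hypothetical and are not taken from the Kowari code. The constraint expression is kept as opaque text purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for the four parts of a cardinality constraint.
// None of these names come from the Kowari code base; they are illustrative only.
final class CardinalityConstraint {

  // Item 3: the kind of integer comparison to perform.
  enum Comparison { LESS_THAN, EQUAL, GREATER_THAN }

  final String constraintExpression;    // item 1, kept as opaque text in this sketch
  final List<String> groupingVariables; // item 2, usually a single variable
  final Comparison comparison;          // item 3
  final long value;                     // item 4

  CardinalityConstraint(String constraintExpression, List<String> groupingVariables,
                        Comparison comparison, long value) {
    if (groupingVariables.isEmpty()) {
      throw new IllegalArgumentException("at least one grouping variable is required");
    }
    this.constraintExpression = constraintExpression;
    // A list rather than a single variable, so there is no artificial one-variable limit.
    this.groupingVariables = Collections.unmodifiableList(new ArrayList<>(groupingVariables));
    this.comparison = comparison;
    this.value = value;
  }

  // Applies items 3 and 4 to the count computed for one group.
  boolean accepts(long count) {
    switch (comparison) {
      case LESS_THAN:    return count < value;
      case EQUAL:        return count == value;
      case GREATER_THAN: return count > value;
      default: throw new AssertionError(comparison);
    }
  }
}
```

Holding the grouping variables as a list means the single-variable case is simply the common case, not a restriction built into the structure.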

Since item 1 is a Tuples and the output is also a Tuples, this can really be described as a parameterized mapping from tuples to tuples. Hence it makes sense to treat it as a function (yes, I've been reading more of the text on Predicate Logic). The comparison in item 3 would probably be best expressed by declaring 3 different functions, which in turn reduces the number of parameters required.
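Treating cardinality as a mapping from tuples to tuples, a minimal sketch might look like the following. It assumes a deliberately simple model in which a tuples is a list of rows and each row maps variable names to values. The real Tuples interface is not used here, and the method names are hypothetical. Each function groups the rows by the grouping variables, counts each group, and keeps only the rows whose group count passes its comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;

// Sketch: cardinality as three functions from tuples to tuples.
// A "tuples" is modelled here as a list of rows, each row mapping variable names to values.
final class Cardinality {

  private Cardinality() {}

  // Keeps rows whose group (values of the grouping variables) has exactly n rows.
  static List<Map<String, String>> occursExactly(List<Map<String, String>> tuples, List<String> vars, long n) {
    return filterByGroupCount(tuples, vars, count -> count == n);
  }

  // Keeps rows whose group has fewer than n rows.
  static List<Map<String, String>> occursLessThan(List<Map<String, String>> tuples, List<String> vars, long n) {
    return filterByGroupCount(tuples, vars, count -> count < n);
  }

  // Keeps rows whose group has more than n rows.
  static List<Map<String, String>> occursMoreThan(List<Map<String, String>> tuples, List<String> vars, long n) {
    return filterByGroupCount(tuples, vars, count -> count > n);
  }

  private static List<Map<String, String>> filterByGroupCount(
      List<Map<String, String>> tuples, List<String> vars, LongPredicate test) {
    // First pass: count the rows in each group.
    Map<List<String>, Long> counts = new HashMap<>();
    for (Map<String, String> row : tuples) {
      counts.merge(groupKey(row, vars), 1L, Long::sum);
    }
    // Second pass: keep the rows whose group count satisfies the comparison.
    List<Map<String, String>> result = new ArrayList<>();
    for (Map<String, String> row : tuples) {
      if (test.test(counts.get(groupKey(row, vars)))) {
        result.add(row);
      }
    }
    return result;
  }

  // The group key is the list of values bound to the grouping variables, in order.
  private static List<String> groupKey(Map<String, String> row, List<String> vars) {
    List<String> key = new ArrayList<>(vars.size());
    for (String v : vars) {
      key.add(row.get(v));
    }
    return key;
  }
}
```

Because the comparison is fixed by the choice of function, each one only needs the tuples, the grouping variables and the integer.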

No one seemed to dispute any of this, but we could not get consensus on the syntax to express it. AN liked the idea of a pure function notation, pointing out that we mostly do that already, particularly with trans() and walk(). AN's preference was for a function-like syntax with all the parameters appearing as 3-element statements separated by "and" keywords, which is exactly what trans() and walk() do. SR came up with a much wordier syntax that made it look like a statement. The first element of this statement was the integer from item 4, the second was a predicate describing the scalar comparison from item 3, and the third was like a subquery, with item 2 as its "select" clause and item 1 as its "where" clause.

After spending many hours reaching consensus on the semantics, I was frustrated at how long it was taking to work out the syntax. I wondered whether I could implement the function without finalizing the syntax, but I was reluctant to do so: a poor choice in how some of the required items were represented could make parsing the syntax into them much harder than it needed to be.

Finally TJ entered the fray and asked about using subqueries and count(). At first this didn't appear useful to us, as count() only provides a scalar integer as a result, but then TJ showed how that count could be left in the results (with its implicit variable name of $k0), after which the lines which didn't match on $k0 could be dumped. After investigating this, it appears to work, and will only require a minimal change.
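A rough sketch of TJ's idea follows, using a simplified row model (a list of maps from variable name to value), which is an assumption made for illustration. Each outer row is extended with a k0 column holding the count of matching rows from the subquery. The class and method names are hypothetical. The per-row subquery is approximated by counting inner rows that share the outer row's value for one join variable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the count() subquery idea: each outer row is extended with "k0"
// (standing in for iTQL's implicit $k0), the number of inner rows that share
// its binding for the join variable.
final class CountSubquery {

  private CountSubquery() {}

  static List<Map<String, String>> withCount(
      List<Map<String, String>> outer, List<Map<String, String>> inner, String joinVar) {
    // Pre-count inner rows per value of the join variable, instead of
    // re-running the subquery once for every outer row.
    Map<String, Integer> counts = new HashMap<>();
    for (Map<String, String> row : inner) {
      counts.merge(row.get(joinVar), 1, Integer::sum);
    }
    List<Map<String, String>> result = new ArrayList<>();
    for (Map<String, String> row : outer) {
      Map<String, String> extended = new HashMap<>(row);
      // Outer rows with no matching inner rows get a count of 0.
      extended.put("k0", Integer.toString(counts.getOrDefault(row.get(joinVar), 0)));
      result.add(extended);
    }
    return result;
  }
}
```

The count is stored as text only to keep every row a `Map<String, String>`.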

The required change is the introduction of a "having" clause which can use a constraint with a predicate of occurs, occursLessThan or occursMoreThan. This will be applied as a kind of filter at the last moment.
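The "having" clause can then be sketched as a final filter over rows that already carry the count. The enum constants mirror occurs, occursLessThan and occursMoreThan. The Java names, and the choice to store the count as text, are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a "having" filter applied as the last step of a query.
// The constants mirror occurs / occursLessThan / occursMoreThan; the names are hypothetical.
final class Having {

  private Having() {}

  enum Predicate { OCCURS, OCCURS_LESS_THAN, OCCURS_MORE_THAN }

  // Keeps only the rows whose count column satisfies the predicate.
  static List<Map<String, String>> apply(
      List<Map<String, String>> rows, String countVar, Predicate p, long value) {
    List<Map<String, String>> kept = new ArrayList<>();
    for (Map<String, String> row : rows) {
      long count = Long.parseLong(row.get(countVar));
      if (matches(p, count, value)) {
        kept.add(row);
      }
    }
    return kept;
  }

  private static boolean matches(Predicate p, long count, long value) {
    switch (p) {
      case OCCURS:           return count == value;
      case OCCURS_LESS_THAN: return count < value;
      case OCCURS_MORE_THAN: return count > value;
      default: throw new AssertionError(p);
    }
  }
}
```

The surviving rows still contain the count column. The filter only removes rows and does not project the count away, which matches the $k0 limitation described in this post.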

The advantage of doing things this way is that it involves a minimal change and should be reasonably fast to implement. The problem with this method is that the result must come with the $k0 variable built in, so it couldn't be used for insertions. However, I don't think this kind of query needs that functionality, so it might be OK. Another problem is that the syntax is so obtuse that I can barely remember it. When I traced through how it was going to work, it took me 5 minutes to see that it would do what we want. For this reason alone I'm concerned about implementing the code this way, but cardinality is already late, and I really need something done before I go on holidays at the end of this week.

I suppose the best outcome would be to get it working, have someone discover a problem with it in my absence, and then on my return I could do it the original way... once someone has worked out a syntax!

Inferencing
In the meantime, I've been getting the RDFS inferencing paths working again. Unfortunately I've been using my Mac, which led to 2 problems. First, something is already bound to port 8080, which the HTTP server needs. Strangely, I couldn't easily figure out what was using that port, as netstat does not report anything on it. I should just point my browser at it and see if I get a response.
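Another way to check whether port 8080 is really taken, alongside netstat, is simply to try binding it. Below is a small Java sketch of that check (the class name is hypothetical). It only reports whether the bind fails, not which process owns the port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class PortCheck {

  private PortCheck() {}

  // Returns true if the port cannot be bound on the wildcard address,
  // which usually means another socket already holds it.
  static boolean isInUse(int port) {
    try (ServerSocket socket = new ServerSocket()) {
      socket.bind(new InetSocketAddress(port));
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;
    System.out.println("port " + port + (isInUse(port) ? " is in use" : " is free"));
  }
}
```

A failed bind can also come from permissions (for example, ports below 1024). For 8080, though, it almost always means something else is listening.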

The second problem arose when I moved the server over to my desktop machine. It is named "chaos", and I have an entry for it in /etc/hosts. Unfortunately, Java fails to find a server named "chaos", so iTQL falls back to 127.0.0.1, which is never going to work, even if the server is on the same machine as the client.
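To separate the JVM's name lookup from the rest of iTQL, a small Java check like the one below (class name hypothetical) reports whether `InetAddress` can resolve "chaos" at all. It also prints what the JVM believes the local host is. It does not explain why iTQL falls back to 127.0.0.1; it only shows whether the lookup itself is failing.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ResolveCheck {

  private ResolveCheck() {}

  // Returns the resolved address as text, or null if the JVM cannot resolve the name.
  static String resolve(String host) {
    try {
      return InetAddress.getByName(host).getHostAddress();
    } catch (UnknownHostException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    String host = args.length > 0 ? args[0] : "chaos";
    String addr = resolve(host);
    System.out.println(host + " -> " + (addr == null ? "unresolved" : addr));
    // Also show what the JVM thinks this machine is called.
    try {
      System.out.println("local host: " + InetAddress.getLocalHost());
    } catch (UnknownHostException e) {
      System.out.println("local host lookup failed: " + e.getMessage());
    }
  }
}
```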

I ended up running both the iTQL client and the server on my desktop machine (with the iTQL GUI coming to me via X), which worked fine. However, running the inferencing rules had the same addressing problems, meaning I would need to move all of the Drools code up there as well. I took that as my cue to leave it until morning.
