Tuesday, June 01, 2004

Resolver SPI
Read more of this today, including the Lucene and XML resolvers. I found one amusing piece of code in the XML resolver's next() method. It had obviously evolved from a very different method, and for a 30 line bit of code it was quite obfuscated. At first glance it contained at least 3 inappropriate control structures.

Both SR and AM had their names on it, so I asked AM. After laughing he offered a quick fix that didn't work. I tested it, watched it fail, then fixed it properly.

These things are no one's fault, as changing requirements and tight deadlines can lead to code being changed a little at a time, with no one getting the time to go back and refactor. However, it can be funny when you are asked to read it.

Now that I have a decent handle on this code I'll be writing a "Remote Resolver". This means taking the current remote session code and wrapping it in the Resolver SPI, so that remote sessions will look just like every other type of data store.

A significant aspect of this code is that it will be for TKS and not Kowari. That feels a little strange, as I haven't worked on purely TKS code for a little while.

I also fixed the JXUnit tests which had been failing on the transitive queries. Yes, it was due to the order of the returned data.

Inferencing
I described what had been going on with transitive statements for AN this afternoon. As I suspected, it was not quite what he'd had in mind for it, though on further reflection he says that what we have is still useful.

When trans() is used to describe a transitive predicate and two variables, it will infer every possible statement with that predicate. This works exactly as expected.

The problem is when trans() is used with either the subject or object set to a fixed value. AN had planned for this to result in every possible inference from the graph that was reachable from the fixed value. To illustrate, consider the following example data:
  [a predicate b]
  [b predicate c]
  [c predicate d]
  [x predicate y]
  [y predicate z]


The list of possible inferred statements for a transitive predicate is:
  [a predicate c]
  [a predicate d]
  [b predicate d]
  [x predicate z]
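
As a rough illustration of how that list falls out of the data, here is a minimal Python sketch. It is not how trans() is implemented in Kowari or TKS. The names `base`, `transitive_closure` and `inferred` are made up for this example, and statements are reduced to (subject, object) pairs for a single transitive predicate.

```python
# Example data as (subject, object) pairs for one transitive predicate.
# All names here are hypothetical; this is not the Kowari/TKS code.
base = {("a", "b"), ("b", "c"), ("c", "d"), ("x", "y"), ("y", "z")}


def transitive_closure(pairs):
    """Return every (s, o) linked by a chain of one or more statements."""
    closure = set(pairs)
    while True:
        # Join the closure with itself: (s, m) and (m, o) give (s, o).
        new = {(s, o) for (s, m) in closure for (m2, o) in closure if m == m2}
        if new <= closure:
            return closure
        closure |= new


def inferred(pairs):
    """Only the statements that were not already in the original data."""
    return transitive_closure(pairs) - set(pairs)


if __name__ == "__main__":
    for s, o in sorted(inferred(base)):
        print(f"[{s} predicate {o}]")
```

The repeated self-join is the naive way to do this and is only meant to show the result set, not an efficient way to compute it.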


The list of possible inferred statements reachable from a is:
  [a predicate c]
  [a predicate d]
  [b predicate d]


Note that x, y, and z are not reachable from a. The reason for having such a thing is purely for the sake of efficiency. If a graph is modified in a part known to be disconnected from other parts, then being able to restrict re-inferencing to only that part will be a major improvement in performance.
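
A small Python sketch of this "reachable from a fixed value" idea follows. It is only an illustration under assumptions: the function names are invented, statements are (subject, object) pairs for one transitive predicate, and "reachable" is taken to mean following statements from subject to object. On this example data, treating the graph as undirected would give the same answer.

```python
# Example data as (subject, object) pairs for one transitive predicate.
# Hypothetical names throughout; this is not the Kowari/TKS implementation.
base = {("a", "b"), ("b", "c"), ("c", "d"), ("x", "y"), ("y", "z")}


def reachable_nodes(pairs, start):
    """Nodes reachable from `start` by following statements subject -> object."""
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, o in pairs:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def inferred_reachable_from(pairs, start):
    """All inferences within the part of the graph reachable from `start`."""
    nodes = reachable_nodes(pairs, start)
    # Keep only the statements whose subject lies in the reachable part.
    sub = {(s, o) for (s, o) in pairs if s in nodes}
    closure = set(sub)
    changed = True
    while changed:
        new = {(s, o) for (s, m) in closure for (m2, o) in closure if m == m2}
        new -= closure
        closure |= new
        changed = bool(new)
    return closure - sub


if __name__ == "__main__":
    for s, o in sorted(inferred_reachable_from(base, "a")):
        print(f"[{s} predicate {o}]")
```

Restricting the closure to the reachable statements is what gives the efficiency win: the x, y, z part of the graph is never touched when the anchor is a.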

At the moment what we have is the set of inferred statements which have a subject (or object) of a, i.e.
  [a predicate c]
  [a predicate d]
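
To make the anchored behaviour concrete, here is a minimal Python sketch. Again the names are invented for illustration and this is not the actual trans() code. It walks outward from the fixed node and keeps only the statements that have that node as subject (or, in the mirrored version, as object).

```python
# Example data as (subject, object) pairs for one transitive predicate.
# Hypothetical names throughout; this is not the Kowari/TKS implementation.
base = {("a", "b"), ("b", "c"), ("c", "d"), ("x", "y"), ("y", "z")}


def inferred_from_subject(pairs, subject):
    """Inferred statements whose subject is the fixed node."""
    direct = {o for (s, o) in pairs if s == subject}
    seen = set()
    frontier = list(direct)
    while frontier:
        node = frontier.pop()
        for s, o in pairs:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    # Objects already stated directly are not inferences.
    return {(subject, o) for o in seen - direct}


def inferred_to_object(pairs, obj):
    """Inferred statements whose object is the fixed node."""
    flipped = {(o, s) for (s, o) in pairs}
    return {(s, o) for (o, s) in inferred_from_subject(flipped, obj)}


if __name__ == "__main__":
    for s, o in sorted(inferred_from_subject(base, "a")):
        print(f"[{s} predicate {o}]")
```

Note that [b predicate d] never appears when the subject is fixed to a, which is exactly the difference from the reachable set.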


This is not entirely useless. For instance, if the predicate happened to be subClassOf, then this kind of statement would allow us to say, "give us all classes that a is a subclass of". This is needed for backward chaining, and once AN realised it he seemed a little mollified.

The reason I wrote anchored transitive queries to be like this and not what was originally desired was sort of accidental. The inferred statements which include a given node are a subset of all the inferred statements reachable from that node. I wrote code which could find the former, with the intent of extending it to find the latter as well.

Making sure I was on the right track, I ran an intermediate test. Once I saw the results coming out from this test I recalled that TJ had mentioned that he wanted something that returned this particular set of statements. With AN away, I went and asked TJ, and he agreed that this was exactly what he wanted and that I should leave it there.

I now realise that TJ has always been an advocate of generating inferences on the fly with the use of backward chaining. Hence his approval of this kind of function. Had AN been there to discuss it then I'd have realised that he and TJ were not after the same thing, and that the other type of function is also needed. I had a suspicion that this was the case, and so I've been keen to talk to AN about it. I'm glad I've done so now.

Ironically, AN had yet to consider backward chaining, how we would do it, and the syntax to use. By seeing the current trans() statement with the modified syntax (I discussed this modification last week), he has discovered that we already have the tools for backward chaining, and that the syntax is already done for him. It might not be very pretty syntax, but it makes sense. Most importantly, it already works, and since it was a needed step to doing what he wanted all along, we haven't really wasted any time by doing it.

Once the distributed resolver is working I'll be getting back to implementing this.

JRDFmem
The graph is now serializable, which was the last major obstacle to usability. I also spent some time last night moving directories around, separating packages into separate jars, and cleaning the whole build file up a little. I've spoken to AN about it, and I think I'm on the right track to have it included with JRDF on Sourceforge soon.

While I want to move on I keep having ideas for improvements, and I keep wanting to add them in. I might do some of them, but I'll have to cut it off soon, or else I'll accomplish nothing.

Masters
Assoc. Prof. Bob Colomb from UQ has agreed to see me this week to discuss the possibility of doing my Masters. I was starting to get frustrated, as the only responses I've had so far have been from QUT, and on each occasion I was told that they didn't specialise in my area, and maybe I could talk to someone else. I received my latest response this evening from Dr Raymond Lau, which was encouraging though not helpful. He is going on leave, but he says that QUT definitely does do research in the area, and that I should have success if I keep asking. Oh well, Rome wasn't built in a day.

A Masters isn't essential for me, but I think it would be helpful for future work overseas. I've discovered that Bachelor degrees in some countries are really inadequate for professional work, and that candidates are not considered without Honours or Masters degrees, depending on location.

An example of this is Engineering in the UK vs. Australia. In the UK there is a 3 year degree which gets you exactly nowhere. The IEE only accepts graduates who have also completed an extra year of Honours. In Australia the standard engineering degree is already 4 years. I'm on the Queensland committee for the IEE, and discussions with professors of 3 Queensland universities have revealed that the increasing scope of engineering and technology today has led to engineering courses packing in more content than most other degrees, and reducing the credit allotted so that students don't exceed the maximum study levels set by the universities' administrations. The result is that a standard Australian undergraduate engineering degree is deemed by the IEE to be at least the equivalent of an honours degree in England. Some even suggest putting it on a par with a Masters, though this is not an official position.

If a Masters can be considered the equivalent of the undergraduate degrees that I'm familiar with, then it is no wonder a Masters is considered mandatory in some places. So even if I meet the standard, without an appropriate piece of paper I may encounter difficulty in the future.

For anyone who doesn't know, I have a Bachelor of Computer Systems Engineering, and a Bachelor of Science (with Distinction), majoring in quantum physics. I started postgraduate Honours in physics at the beginning of 2002, but family circumstances led to me withdrawing. I hope to get back to it one day. To that end, having a Masters, even in a different discipline, may be of use to me.
