Tuesday, January 18, 2005

The Deafening Silence
No, I haven't died or gone away. I've just been really, really busy. I want a job again just so I can slow down! :-)

To those of you who took the time to write to me (for one reason or another), thanks for the support. It's all a bit frustrating, even when you know that it has nothing to do with you.

The last few weeks I've been doing "Daddy Daycare" while Anne has been back at work. It's actually a lot of fun, but I don't get all that much done with Luc around. Fortunately Luc has started at daycare this week, so I've started to catch up on what I'm trying to do. Unfortunately, we were struggling before Luc started at daycare. Now that he's there we are rapidly going backwards again. Hopefully I'll have work again soon.

For a start, I'm writing lots of job applications. Ideally, I want to continue the OWL work (more on that shortly), but if I can't organise something that can pay me soon, then I'll have to take something more immediate that will take me away from it. Hopefully not, but I have to keep my options open.

Ideally, I'll continue on OWL development for Kowari until I have a working, scalable implementation. This should also get me to a point of submitting my Masters thesis. Once I have accomplished both of those, I'd like to use whatever contacts I've made to try to find work in America. I have only done very short working trips overseas before, and I want to get the experience that only a few years abroad can provide.

I'm going to have to take the time to write up a lot of technical details on what I've learnt recently, but for the time being I'll try to give an overview.

I'm definitely continuing OWL inferencing work. That is the basis for the Masters, so it is important to me to keep it up. How much time I can put into keeping it up will be determined by the work I find, but I'll be doing it nonetheless.

The Rete algorithm seems to be the best approach I can take to the rules system. I was wondering how far it would get me after having used Drools for RDFS, but I had a conversation at the W3C day of the Evolve conference which made me look at it more carefully. Consequently I found the original paper by Charles Forgy, and the contents were quite enlightening.

For a start, Rete is for managing changing data. That's very interesting for adding and removing data, but it means that it isn't useful for doing a full set of inferences over an entire system. My plan has been to do the full set of inferences, with change management to come later, so Rete would appear to be of little interest to me at this point.

However, a closer analysis of what Rete is trying to do shows that it can offer Kowari something after all. For a start, it makes the assumption that finding and processing data means iterating over all of the data. This is normally the case, but it doesn't apply to Kowari. Because each index is an instance of the data, and because we index in every possible direction, we can find all the data which matches a constraint with a pair of binary searches. We can also count the size of the result quickly and easily by stepping over the blocks in the AVL tree. This means that there is very little cost to finding all the data again.

Rete is only supposed to be useful for changing data because it avoids finding all of the data for each node again. However, if all the data on each node can be retrieved on each iteration, then the algorithm would also be useful for performing the rules on a complete data set. Using semi-naïve evaluation would also permit the nodes with unchanged results to be skipped most of the time. The result is that a hybrid of the Rete algorithm may allow Kowari to scalably perform all the rules on the entire data set.
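To make the semi-naïve idea concrete, here is a toy version using transitive closure as the rule (think subClassOf entailment). The representation and function name are mine, not Kowari's; the point is that each round joins only the freshly derived "delta" against the known facts, so unchanged results are never re-derived from scratch:

```python
def seminaive_closure(edges):
    """Transitive closure of a binary relation, semi-naïve style: only
    the tuples derived in the previous round (the delta) are fed back
    through the rule, rather than re-running it over everything."""
    total = set(edges)
    delta = set(edges)
    while delta:
        # The rule: (a, b) and (b, c) implies (a, c). Each side of the
        # join must involve at least one fresh tuple from the delta.
        new = {(a, d) for (a, b) in delta for (c, d) in total if b == c}
        new |= {(a, d) for (a, b) in total for (c, d) in delta if b == c}
        delta = new - total   # keep only genuinely new facts
        total |= delta        # and iterate until nothing new appears
    return total
```

With a chain A→B→C→D this derives (A,C), (B,D) in the first round and (A,D) in the second, then terminates because the delta is empty.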

I've also worked out that the nodes in this Rete hybrid are not the rules at all. Instead, they are generally the constraints which make up individual queries. The efficiencies of the Rete algorithm mean that any shared nodes will only execute their work once. This will have real benefits to the execution of the full queries within the rules framework, though it will need some re-working of the query code, particularly in terms of caching.
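The node-sharing idea can be sketched like this. Everything here is hypothetical (the class names, the `resolve` method, and the cache policy are my assumptions, not Kowari's query code); it just shows that if two rules mention the same constraint, they get the same node, and the constraint is resolved once:

```python
class DemoStore:
    """Stand-in triple resolver; counts resolutions to show sharing."""
    def __init__(self, triples):
        self.triples = triples
        self.resolutions = 0

    def resolve(self, pattern):
        self.resolutions += 1
        s, p, o = pattern  # None acts as a wildcard
        return {t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}

class ConstraintNode:
    """Hypothetical node in the hybrid network: one node per distinct
    constraint, shared by every rule that mentions that constraint."""
    def __init__(self, pattern, store):
        self.pattern = pattern
        self.store = store
        self._cache = None  # would be invalidated when the store changes

    def solve(self):
        if self._cache is None:
            self._cache = self.store.resolve(self.pattern)
        return self._cache

_nodes = {}

def node_for(pattern, store):
    """Return the shared node for `pattern`, creating it on first use,
    so rules with a common constraint do its work only once.
    (Keyed on pattern alone for brevity; a real version would key on
    the store too.)"""
    if pattern not in _nodes:
        _nodes[pattern] = ConstraintNode(pattern, store)
    return _nodes[pattern]
```

Two rules asking for `(None, "type", "Person")` would both receive the same node, and `store.resolutions` stays at 1 no matter how many times either rule calls `solve()`.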

Having a scalable solution operating on the entire dataset will certainly be a first step. It will be inefficient for minor modifications on large systems, but it will still work. Once this is going, I can then move on to the "change" problem. At this stage it looks like a standard Rete graph will perform the rules in these circumstances quite well. Its rule data will look similar to the hybrid's, but the execution would differ.

Anyway, it's really late now and I'm barely coherent. I'll put this away for the moment and come back to the description when I can actually look at the screen without my eyes closing involuntarily. :-)

1 comment:

Rob said...

Hey Paul. A lady from diversiti is "urgently seeking" J2EE developers for positions at Suncorp. I told her I knew some highly skilled developers looking for work. If you like I can pass your details on to her.