Sunday, April 15, 2007

Distributed Resolver

TQL has always allowed multiple models to be mentioned in a query, either in an expression in the from clause, or on individual constraints via an in clause. This lets you perform selections over expressions describing unions and intersections of models, with arbitrary complexity. However, all of the models referenced must reside on the same server.
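As a rough sketch (the server and model URIs here are invented for illustration), a union of models in the from clause, and a constraint restricted to a particular model with an in clause, look something like this:

```
select $subj $pred $obj
  from <rmi://localhost/server1#model1> or <rmi://localhost/server1#model2>
  where $subj $pred $obj;

select $person $name
  from <rmi://localhost/server1#model1>
  where $person <http://xmlns.com/foaf/0.1/name> $name
    in <rmi://localhost/server1#model2>;
```

Intersections use and in place of or in the from expression.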

After 2 years of procrastinating, I finally implemented a resolver for distributing queries. This means that models on other servers (on the same machine, or across the network) can also be accessed. It's not an optimal approach, but it works quite well.
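With the distributed resolver, the same query syntax can now name models on different servers; a sketch (hosts and model names invented), where host1 and host2 are separate machines:

```
select $subj $pred $obj
  from <rmi://host1/server1#model1> or <rmi://host2/server1#model2>
  where $subj $pred $obj;
```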

I actually did something similar 2 and a half years ago, but that was for TKS, not Kowari. This means it could never be released as open source code, and I lost access to it once Tucana was sold. It was not particularly difficult to implement in the first place, so I knew I could do it again, but I really didn't want to. I kept putting it off, as it was demoralizing to have to do something from scratch that I'd already done before, even if I had forgotten the specifics. At least this time I think I did a better job (2.5 years of extra experience will do that for you). I also have the feeling that I may have implemented more Resolver methods back then than I needed to.

One thing that is missing is blank node resolution. At the moment, blank nodes from 2 separate servers may be given the same temporary identifiers. This is a bug. The solution I want is for blank nodes from another server to use extended string representations that include the server's IP address, rather than the simple _:## format currently in use. This will require a blank node factory that returns the required type of blank node as needed. The factory will need to be given to the Answer as it is created, so that it can return the correct type of blank nodes as they arrive. I'll have to check more carefully, but I think this can happen in AnswerWrapperRemoteAnswer.
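The factory idea can be sketched as follows. This is a minimal, hypothetical illustration of qualifying remote blank node labels with the originating server's address; the class and method names are my own invention, not actual Mulgara APIs:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: a factory that rewrites a remote server's simple
 * blank node labels (e.g. "_:42") into server-qualified labels, so that
 * "_:42" from two different servers can never collide.
 */
class RemoteBlankNodeFactory {
    private final String serverAddress;  // e.g. "10.0.0.1"
    private final Map<String, String> mapped = new ConcurrentHashMap<>();

    RemoteBlankNodeFactory(String serverAddress) {
        this.serverAddress = serverAddress;
    }

    /**
     * Maps a simple local label like "_:42" to "_:10.0.0.1:42".
     * The same local label always maps to the same qualified label.
     */
    String getBlankNode(String localLabel) {
        return mapped.computeIfAbsent(localLabel,
                l -> "_:" + serverAddress + ":" + l.substring(2));
    }
}
```

An instance of such a factory would be created per remote server and handed to the Answer, so each arriving blank node is relabeled consistently as results stream in.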

For the moment, the approach is naïve, and does not take into account how much data may need to move across the network. I'll get to a more scalable approach eventually, but that may be part of a commercial offering from my employer, Fourth Codex. I'm expecting to have to add some of the required infrastructure to Mulgara, but keep the "secret sauce" in an optional external module. Of course, this will depend on my time and resources at work.

1 comment:

Anonymous said...

and here i was wondering exactly what was appealing about herzum. that explains it :)