Thursday, April 29, 2004

Back at work, but with a head full of cotton wool, so don't try and read too much into what I write here today... I'll probably refute it all tomorrow.

Early part of the morning was spent dealing with email and spam. It's about time I changed my IEEE address as it's starting to get out of hand, even with filters. Unfortunately I used that address for registering with MSDN for work last year, and within 48 hours I'd gone from about 2 spams a week up to nearly a hundred a day. Great security at MS.

Back to reading more of the Jena implementation of its Ontology API. I think I have a grasp of it all now. Neither the general structure nor watching it run in the debugger left me impressed. The question is, can we re-implement Jena as an API for TKS/Kowari? Should we? This was a topic of discussion with TJ this afternoon.

The current implementation just provides a mapping between Jena and a Kowari datastore. It leaves Jena to do everything it would normally do, and only replaces the datastore with a Kowari one. This means that all of Jena's inefficiencies are still there, and it needs to use a map of Jena nodes to the URIs used internally by Kowari and another map back again. (This is the code I've had to debug since Easter.) These maps mean that the system doesn't scale for size. The reliance on Jena code means that the system also doesn't scale for speed. It provides a compatibility layer, but it doesn't add anything to Jena at all.
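To make the size problem concrete, here is a minimal sketch of that kind of two-way bookkeeping. It is not the actual Jena or Kowari code: the class name NodeMapSketch, its methods, and the use of plain strings for nodes and URIs are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the compatibility layer's bookkeeping.
// Plain strings stand in for Jena nodes and Kowari's internal URIs.
final class NodeMapSketch {
    private final Map<String, String> jenaToKowari = new HashMap<>();
    private final Map<String, String> kowariToJena = new HashMap<>();

    // Every node that passes through the layer is recorded in both directions.
    void register(String jenaNode, String kowariUri) {
        jenaToKowari.put(jenaNode, kowariUri);
        kowariToJena.put(kowariUri, jenaNode);
    }

    String toKowari(String jenaNode) {
        return jenaToKowari.get(jenaNode);
    }

    String toJena(String kowariUri) {
        return kowariToJena.get(kowariUri);
    }

    // Entries held in memory: two per distinct node, however large the store gets.
    int entriesHeld() {
        return jenaToKowari.size() + kowariToJena.size();
    }
}
```

If the maps live in memory like this, their footprint grows with the number of distinct nodes touched, which is one way a layer like this could stop scaling for size.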

Personally, I think we should properly implement the Jena API. This would let us slot in with numerous existing systems, and get us off the ground in a hurry. It would also probably give us some real kudos in the community, as it would be one of the first Jena implementations which could handle large quantities of data, and do it quickly. Re-implementing the ontology API to directly access Kowari would also make a certain amount of sense, though the extra gain won't be so dramatic. This is because the Jena ontology API doesn't do a whole lot of work, leaving most of it up to Jena.

There are a lot of classes in the Jena API, so it will take some time to reimplement them directly on top of Kowari. I don't yet see anything insurmountable, nor anything that isn't scalable, so I'm confident that there isn't a huge risk in doing this. Even if we were to find an operation that would be grossly inefficient, a simplistic and inefficient implementation couldn't be any slower than what is offered by Jena now. The Resolver might prove to be cumbersome, but for the moment we could probably use the one found in Jena.

TJ has two major concerns with reimplementing the Jena API for Kowari. The first is that he's worried we might find something in the API which is just not efficient to put over the top of Kowari, creating a significant drain on our resources for no real benefit. As I mentioned in the last paragraph, I don't think this is a real risk. Jena currently performs most operations by iterating over the entire database, and in the worst case we can match that performance.
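The worst-case argument can be sketched in a few lines. This is only an illustration of the reasoning, not Jena or Kowari code: TripleStoreSketch, Triple, and both lookup methods are hypothetical names, and a simple subject index stands in for whatever indexes the real store has.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical triple type, for illustration only.
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Hypothetical store offering both a full scan and an indexed lookup.
final class TripleStoreSketch {
    private final List<Triple> all = new ArrayList<>();
    private final Map<String, List<Triple>> bySubject = new HashMap<>();

    void add(Triple t) {
        all.add(t);
        bySubject.computeIfAbsent(t.subject, k -> new ArrayList<>()).add(t);
    }

    // Fallback: visit every triple. Cost grows with the size of the store.
    List<Triple> findBySubjectScan(String subject) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : all) {
            if (t.subject.equals(subject)) {
                out.add(t);
            }
        }
        return out;
    }

    // Preferred: go straight to the matching triples through the index.
    List<Triple> findBySubjectIndexed(String subject) {
        return bySubject.getOrDefault(subject, Collections.<Triple>emptyList());
    }
}
```

Both methods return the same triples. An operation that maps onto an index gets the fast path, and one that doesn't can fall back to the scan, which is the "no slower than iterating over everything" floor described above.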

TJ's second concern is that we might be able to offer features and efficiencies that would be useful to developers that the Jena API simply can't take advantage of. Neither of us knows what these might be, but I do see where he's coming from. At this stage I think I'd need to be actually implementing this before I'd see it. Maybe AN could offer some insight here.

For this reason TJ wants to create an externally available API to access the Kowari query layer. It wouldn't have to differ too much from what we currently have, but it would clean things up a lot. This would offer an alternative (and more efficient) interface into Kowari/TKS rather than using an iTQL interpreter. TJ even suggested it could be done as an extension to JRDF (though I'm not convinced of that one). From a technical point of view I agree with the eventual need for this API.

TJ thinks that writing an API like this for a datastore that is as scalable and as fast as Kowari/TKS could entice many developers away from using the Jena API and into using ours. He knows of at least one company that would happily do so. I'm not so sure how widely accepted it would be though. It really depends on how much the developers out there have come to accept Jena as the standard API for this sort of thing. After all, there are viewers and other pieces of software out there that are already Jena-aware. It's a shame AN wasn't here, as he would have quite a bit to contribute to the conversation. I'm sure DW has some definite ideas as well.

We both agreed that ideally we'd have both a Jena API, and our own - giving the best of both worlds. We can even implement the Jena API in terms of our own (since our own would be a reasonably thin wrapper anyway). Unfortunately, as always, it's a matter of time and money. I'll let wiser heads rule on this.
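The layering could look roughly like the sketch below. Everything in it is hypothetical: NativeStore stands in for the proposed Kowari-facing API, ModelFacade stands in for a Jena-style interface (its method names are illustrative and are not Jena's real signatures), and the in-memory store exists only so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical native API: a thin layer over the query engine.
interface NativeStore {
    void insert(String subject, String predicate, String object);

    // A null argument acts as a wildcard for that position.
    List<String[]> match(String subject, String predicate, String object);
}

// Hypothetical Jena-style facade. Method names are illustrative only.
interface ModelFacade {
    void addStatement(String subject, String predicate, String object);

    List<String> objectsOf(String subject, String predicate);
}

// Toy in-memory store so the sketch is self-contained.
final class InMemoryNativeStore implements NativeStore {
    private final List<String[]> rows = new ArrayList<>();

    public void insert(String subject, String predicate, String object) {
        rows.add(new String[] {subject, predicate, object});
    }

    public List<String[]> match(String subject, String predicate, String object) {
        List<String[]> out = new ArrayList<>();
        for (String[] r : rows) {
            if ((subject == null || subject.equals(r[0]))
                    && (predicate == null || predicate.equals(r[1]))
                    && (object == null || object.equals(r[2]))) {
                out.add(r);
            }
        }
        return out;
    }
}

// The facade holds no data of its own; every call delegates to the native API.
final class FacadeOverNative implements ModelFacade {
    private final NativeStore store;

    FacadeOverNative(NativeStore store) {
        this.store = store;
    }

    public void addStatement(String subject, String predicate, String object) {
        store.insert(subject, predicate, object);
    }

    public List<String> objectsOf(String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (String[] r : store.match(subject, predicate, null)) {
            out.add(r[2]);
        }
        return out;
    }
}
```

Because the facade only delegates, callers of either interface see the same data, and any improvement to the native layer reaches both.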

Until we have a decision on this, I really need to have a better idea of OWL, since I really only know how Jena does OWL. That way I can see what is involved in providing our own support without involving Jena. Knowing what OWL needs to do/doesn't need to do will help me work out whether Jena is doing it the best possible way, or if we can do it better.

So I'm back into the W3C documents and reading again. That requires a clear head. I really wish I didn't have this cold. :-}

BTW, does anyone know how I can get Mozilla to use the spell checker provided with Blogger? All I get is a popup that appears too briefly to see what it says.
