Out of Hours
The rest of my week was almost as busy as the time I spent elucidating Kowari.
On Tuesday DavidW and I went down to the University of Maryland's MIND lab to meet some of the MINDSWAP group. We were shown a very impressive demonstration of ontology debugging in Pellet, using a couple of different methods labelled white-box debugging and black-box debugging. As the names imply, the white-box method carefully follows the reasoning used by Pellet, while the black-box method looks at inputs and outputs. I'm not sure how much of this could be automated, but as a tool for debugging it was really impressive. It was enough to make me consider hooking Kowari into the back end of Pellet.
In fact, hooking Pellet into Kowari has a couple of things going for it. First, it gives me a point of comparison for my own work, both in terms of speed and correctness. Ideally, my own code will be dramatically faster (I can't see why it wouldn't be). However, having followed Pellet for a little while, I'd expect it to provide a more complete solution, in terms of entailment, and particularly with consistency. The second reason is to provide a good set of ontology debugging tools.
I was also asked to give a demonstration of the new rules engine in Kowari. I'd been running it for a couple of weeks at this point, and trusted it, but it still made me nervous to show it to a room full of strangers, all of whom understand this stuff. Everyone seemed happy, but it gave me a little more motivation to get back to work on completing the implementation.
Fujitsu has a lab upstairs from MINDSWAP, and a couple of their people had asked to come along to meet me. While we were there, they asked me several questions about how to make Kowari load data quickly. It seemed that they were sending insertions one statement at a time, so we suggested blocking them together to avoid some of the RMI overhead. They also invited me back the following day to see what they'd been doing.
OWL for Dinner
Afterwards, DavidW and I went to dinner with Jim Hendler. Nice guy, and he was quite happy to answer some of my questions about RDFS and OWL. The one thing I remember taking away from that night was a better understanding of the agendas of the data modellers and the logic people participating in OWL. This culminated in the explanation that RDFS is that part of OWL that everyone could easily agree to in the short term, thereby enabling an initial release of a standard for this kind of work. This explains quite a lot.
It wasn't explicitly mentioned, but I sort of inferred that the separation of OWL DL and OWL Full was the compromise arrived at between the ontologists who needed to express complex structures (OWL Full) and the logic experts who insisted on decidability (OWL DL).
The following night I was back down at the University of Maryland, this time visiting the Fujitsu labs.
The evening started with a follow-up question about bulk loads. There were two problems. The first was that they were running out of memory with insertions, and the second was speed.
The memory problem turned out to be a result of their insertion method. It seemed that they were using a single INSERT iTQL statement, with all of the statements as part of a single command. Of course, this was being encoded as a Java String which had to be marshalled and unmarshalled for RMI. Yuck. As a quick fix I suggested limiting the size of the string and splitting the load across several queries (this worked), but I also suggested using N3 as the input format to avoid RMI altogether.
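The batching idea is simple enough to sketch. This is not Kowari's API, just a minimal illustration of chunking serialized triples into a handful of INSERT commands instead of one enormous string; the batch size and model URI are placeholders to tune for your own setup:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: group RDF statements into a few iTQL INSERT commands,
 * keeping each command's string small enough to marshal over RMI
 * without exhausting memory. Not real Kowari API, just the idea.
 */
public class BatchInsert {
    static final int BATCH_SIZE = 1000;  // placeholder; tune to keep RMI strings small

    /**
     * Triples are assumed to be pre-serialized iTQL patterns,
     * e.g. "<ex:s1> <ex:p> 'o1'". Returns one INSERT command
     * per batch of at most BATCH_SIZE statements.
     */
    static List<String> buildCommands(List<String> triples, String model) {
        List<String> commands = new ArrayList<>();
        for (int i = 0; i < triples.size(); i += BATCH_SIZE) {
            StringBuilder cmd = new StringBuilder("insert ");
            int end = Math.min(i + BATCH_SIZE, triples.size());
            for (String t : triples.subList(i, end)) {
                cmd.append(t).append(" ");
            }
            cmd.append("into <").append(model).append("> ;");
            commands.add(cmd.toString());
        }
        return commands;
    }

    public static void main(String[] args) {
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            triples.add("<ex:s" + i + "> <ex:p> 'o" + i + "'");
        }
        List<String> cmds = buildCommands(triples, "rmi://localhost/server1#model");
        System.out.println(cmds.size() + " commands");  // 2500 triples in batches of 1000
    }
}
```

Each of the resulting command strings gets sent as its own query, so no single RMI call carries the whole data set.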
The speed problem was partly due to the RMI overhead (it was marshalling a LOT of text to send over the network!), but mostly because the insertions were not using transactions. I explained this, and showed them how to perform the whole write in a single transaction. The result was a speed improvement of an order of magnitude. I'm sure that made me look very good. :-)
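For anyone wondering, wrapping a bulk load in a single transaction from iTQL looks roughly like this (the model URI and statements here are made up for illustration):

```
set autocommit off;
insert <ex:s1> <ex:p> 'o1' into <rmi://localhost/server1#model> ;
insert <ex:s2> <ex:p> 'o2' into <rmi://localhost/server1#model> ;
commit;
set autocommit on;
```

With autocommit on (the default), every insert pays the full cost of a commit; turning it off lets all the inserts land in one transaction.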
While there I was also shown a project for a kind of "ubiquitous computing environment". This integrated a whole series of technologies that I was familiar with, but hadn't seen together like this before.
The idea was to take data from any device in the vicinity, and direct it to any other device that was compatible with the data type. Devices were found dynamically on the network (with Zeroconf, IIRC) and then queried for a description of their services. These descriptions were returned in OWL-S, providing enough info to describe the name of the service, the data formats accepted or provided by the service, URLs for pages that control the service, and so on. They even had a GUI configuration tool for graphically describing a work flow by connecting blocks representative of the services.
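I don't have their actual service descriptions, but an OWL-S profile along these lines could carry that kind of information. This is a hand-waved sketch in N3: the namespaces and property names are from OWL-S 1.1, while the projector service itself is entirely made up:

```turtle
@prefix service: <http://www.daml.org/services/owl-s/1.1/Service.owl#> .
@prefix profile: <http://www.daml.org/services/owl-s/1.1/Profile.owl#> .

# A hypothetical device advertising itself on the network
<#projectorService> a service:Service ;
    service:presents <#projectorProfile> .

<#projectorProfile> a profile:Profile ;
    profile:serviceName "Meeting Room Projector" ;
    profile:textDescription "Displays image and video streams sent by nearby devices." .
```

A client that discovers the device can query a description like this to learn what the service is called and what it accepts, which is all the workflow GUI needs to offer it as a connectable block.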
As I said, there was no new technology in this implementation, but it's the first time I've ever seen anyone put it all together and make it work. The devices they had working like this included PDAs, desktops, intelligent projectors, cameras, displays, databases, file servers and telephones. It looked great. :-)
Sunday, August 28, 2005