Tuesday, May 29, 2007

Microformats

After the keynote, my first session for the day was on Semantic Microformats, given by Uche Ogbuji of Zepheira. After having had my first exposure to microformats in the form of GRDDL only the day before, I figured that this talk would provide some more info, which it did.

Everything you need to know about microformats can be found on the web, but attending a talk like this was exactly the kind of thing I needed. A talk like this can save hours of going through websites learning specifications, when a simple example can provide 99% of what I need.

RDF from Atom

Next I followed David down to a talk by Bradley Allen from Siderean Software, entitled: A Semantic Web Without RDF/XML: Building RDF Applications in Atom. This was held in the only conference room on the ground floor, so I was a little late after taking a few minutes to find it.

Bradley had a number of things to say, but I'd summarize it as saying that it is too hard to go out there and create RDF for everything, but we have a lot of perfectly good Atom out there to work with. He gave a brief rundown on how Atom can be easily converted to RDF, and how this then forms the basis of a semantic web (i.e. everything already marked up with semantic [RDF] data) with no need to rewrite any data.
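To make the conversion idea concrete, here is a minimal sketch of my own in Python. It is not anything Bradley showed, and it is certainly not Siderean's implementation. The sample feed, the `atom_to_triples` function, and the choice of Dublin Core terms as predicates are all assumptions I made for illustration; a real mapping would cover far more of Atom.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"   # Atom namespace in ElementTree notation
DC = "http://purl.org/dc/terms/"         # Dublin Core terms (assumed vocabulary)

# A made-up feed, used only to exercise the sketch.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <id>urn:example:feed</id>
  <updated>2007-05-29T12:00:00Z</updated>
  <entry>
    <id>urn:example:entry-1</id>
    <title>First post</title>
    <updated>2007-05-29T12:00:00Z</updated>
    <author><name>Alice</name></author>
  </entry>
</feed>"""

# Hypothetical mapping from simple Atom elements to predicate URIs.
PREDICATES = {
    "title": DC + "title",
    "updated": DC + "modified",
    "summary": DC + "abstract",
}


def atom_to_triples(feed_xml):
    """Return (subject, predicate, object) tuples, one subject per Atom entry."""
    root = ET.fromstring(feed_xml)
    triples = []
    for entry in root.findall(ATOM + "entry"):
        subject = entry.findtext(ATOM + "id")  # the entry id doubles as the subject URI
        if subject is None:
            continue  # nothing to hang statements on
        for tag, predicate in PREDICATES.items():
            value = entry.findtext(ATOM + tag)
            if value is not None:
                triples.append((subject, predicate, value.strip()))
        # Authors are nested one level deeper: <author><name>...</name></author>
        for name in entry.findall(ATOM + "author/" + ATOM + "name"):
            triples.append((subject, DC + "creator", (name.text or "").strip()))
    return triples


for triple in atom_to_triples(SAMPLE_FEED):
    print(triple)
```

The point is that the feed already carries a subject (the entry id) and a handful of well-defined properties, so nothing has to be re-authored to get statements out of it.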

The main problem with the current definition of the semantic web (as opposed to the general field of "semantics") is that it requires a lot of metadata to be provided by content authors. There are a number of attempts to automate this by trying to extract the metadata with natural language processing, statistical analysis, even fancy greps, but this is still in its infancy. Then there are ways to provide hints (blatant or otherwise) to metadata extractors, which is what microformats can provide. The approach of using Atom is even more pragmatic, in that it is using metadata that already exists, and is easily convertible to the semantic web's standard structure.

I'm sure Siderean see their approach as being much more advanced than I characterize here, but that was the general gist of the talk. (Apologies to Bradley, but it was over a week ago now, and my notes are sparse.)

Syndicated OWL

My favorite talk of the day was from Christian Halaschek-Wiener from MINDSWAP. The thing I like about MINDSWAP is that their theoretical work gets expressed really neatly and efficiently in Pellet. This is one of the many reasons I'm interested in getting Pellet integrated into Mulgara (something I have yet to discuss on this blog - don't worry, we still need a rules engine as well!)

Christian had a lot to get through, and I'm kicking myself for missing the first 5 minutes. (Not only was that annoying for me, but it's rude to the presenter.) All the same, the talk was fascinating, and has much wider application than the domain of syndication.

Essentially, syndication with OWL means that you're bringing in data from a number of different places, usually at different times. More importantly, it means that you will be getting updates to your information. This has ramifications for any inferences you've made, or non-inferable queries you wish to make.

It turns out that most of the state that Pellet builds up while performing its calculations remains valid after small deltas to the TBox (and ABox) data. This means that only a few things need be recalculated with each update. Christian showed reasoning timing graphs starting with 10,000 statements (I think that was the number), and then over a range where new statements were added. The previous version of Pellet took about the same amount of time (with small increases) each time new statements came in. In contrast, the newer version of Pellet took about the same time for the original data, but had a massive drop in time for the calculations when new data was added. In other words, it wasn't having to recalculate the state for the majority of the system, and was only having to deal with the differences.
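The "only deal with the differences" idea is easy to picture with a toy. The sketch below is emphatically not Pellet's algorithm (Pellet is a tableau reasoner, and this is a single forward-chaining rule over subclass relationships). The `IncrementalTypes` class, the schema, and the `work_done` counter are all hypothetical names of mine, and it only handles additions, never retractions.

```python
from collections import defaultdict


class IncrementalTypes:
    """Toy store for one rule: (x type C) and (C subClassOf D) gives (x type D)."""

    def __init__(self, subclass_of):
        # Fixed schema: class name -> set of direct superclasses.
        self.subclass_of = subclass_of
        # individual -> every asserted or inferred class, kept between updates.
        self.types = defaultdict(set)
        # Number of new facts recorded, as a rough measure of effort.
        self.work_done = 0

    def add(self, individual, cls):
        """Add one type assertion and derive only the facts that are new."""
        frontier = [cls]
        while frontier:
            current = frontier.pop()
            if current in self.types[individual]:
                continue  # already known, so nothing is recomputed
            self.types[individual].add(current)
            self.work_done += 1
            frontier.extend(self.subclass_of.get(current, ()))


schema = {"Student": {"Person"}, "Person": {"Agent"}}
kb = IncrementalTypes(schema)

# Initial load: every individual costs three facts (Student, Person, Agent).
for i in range(1000):
    kb.add(f"s{i}", "Student")
before = kb.work_done

# One later update touches only the new individual.
kb.add("new", "Person")
delta = kb.work_done - before

print(before, delta)
```

The initial load pays for every individual, while the later update pays only for the facts it actually introduces, because the earlier conclusions are kept rather than thrown away and rebuilt.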

I've thought about and discussed change management in OWL for a few years now, but I never put any effort into it. Christian's work [PDF] looks to provide a solid foundation for this. Well done Christian.

I'd really like to learn more about what he's done, and see if it only applies to tableaux reasoning, or if there is some applicability to rules inferencing (since Mulgara wants both). I suppose that will happen in my Copious Free Time.
