Thursday, May 24, 2007

SemTech Day 1

After registering, I looked around for where I needed to be, and for any familiar faces. Right off the bat I saw Bernadette talking with Eric Miller. I'd been looking forward to seeing Bern again, and DavidW has been telling me for some time that he wanted to introduce me to Eric, so this was a pleasant coincidence. Eric is as sharp as I was led to believe, but far more personable and amusing than I expected. He's a really nice guy, and I can see why he's been so influential in the semantic web community.

The first talk I attended was by Brian and David, titled "Putting the Web Back into the Semantic Web." I already had some notion of what this was about, and I knew that they leverage NetKernel quite heavily.

Brian had shown me last year how NetKernel provides a RESTful interface and allows data and processing operations to be described in URLs. This basically allows efficient remote access to arbitrary functions, with nestable function calls. It seemed cute when I first saw it, but I didn't see that I needed it, despite the papers on efficiency that Brian had sent me. I certainly didn't understand his desire to use it as a framework inside Mulgara. But he knows what he's doing, so I've been prepared to give him the benefit of the doubt.
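To give a flavour of the idea, here is a sketch of what a nested call can look like. NetKernel identifies operations with "active:" URIs, where each argument is itself a URI, so requests compose by nesting. The service and argument names below are my own invention for illustration, not anything from the talk:

    active:toUpper+operand@active:fetch+url@http://example.com/greeting.txt

Resolving the outer URI causes the inner "fetch" request to be resolved first, and its result is then passed as the operand to the "toUpper" service. A single URL thus names an entire computation over remote data.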

I won't go into the details of the talk, but I was very impressed at the ease with which they were able to access and process data using NetKernel. More impressive were the reports of what a NetKernel infrastructure could provide. Efficient transport over HTTP is a given, but near-linear scalability with threads and clustering really caught my attention. I guess it makes sense, since Google get analogous scalability with their stateless Map-Reduce libraries. I could hear similar murmurs of appreciation from the audience.

David went on to describe GRDDL, showing GRDDL markup on his own home page. I've been curious about microformats for some time, but too busy to look into them, so this was a nice, practical introduction. Introduce a NetKernel module that performs GRDDL parsing into RDF, and you can then construct URLs that refer to the RDF extracted from a web page. They've also written another module which takes RDF data and inserts it into a Mulgara database. Using these together, a single call to NetKernel will extract the RDF from a web page and insert it into a local RDF store. A little presentation-layer work backed by the Mulgara store means that you can now browse through all the extracted RDF. This gets particularly interesting if the RDF contains linkages across the store.
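As a sketch of how such a composition might look (the module and argument names here are hypothetical, not the ones they used):

    active:mulgaraInsert+operand@active:grddl+source@http://example.com/homepage.html

The inner request runs the GRDDL transform over the page and yields RDF; the outer request takes that RDF as its operand and inserts it into the Mulgara store. One dereference, and the extracted data is queryable.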

I was impressed by this talk, and have since heard several others say the same thing. But it was a later conversation with Eric that made me really rethink the idea.

RDF can be used in several ways. The most trivial is simply for storing any kind of information. It works particularly well for information with a network topology, but it can be applied to anything really, even binary file formats like MP3 or PNG. I'm not saying that this is necessarily a good idea (for a binary format it's a terrible idea), but it is certainly possible.
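For instance, here is a minimal N3 sketch of the kind of network-shaped data that RDF expresses naturally (the vocabulary is made up):

    @prefix ex: <http://example.com/net#> .
    ex:serverA ex:connectedTo ex:serverB , ex:serverC .
    ex:serverB ex:connectedTo ex:serverC .
    ex:serverA ex:hostName    "alpha.example.com" .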

Another approach to using RDF is for modeling. This uses RDFS, and possibly OWL or SKOS. Almost every application that uses RDF is doing exactly this. Unfortunately, while RDF is a great structure for this type of data, it makes for a terrible API for developers to work with. This explains the need for projects like ActiveRDF, and Alexandria's upcoming modeling API (which works well, but needs more features).
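A small RDFS example (invented vocabulary) shows the sort of modeling I mean:

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.com/model#> .
    ex:Employee rdfs:subClassOf ex:Person .
    ex:manager  rdfs:domain ex:Employee ;
                rdfs:range  ex:Employee .

Walking a model like this through a raw triple interface means a lot of boilerplate lookups, which is exactly the pain that object-style mapping layers like ActiveRDF are meant to remove.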

Yet another approach is simply for linking resources. When I first heard about RDF in 2001, one of my first thoughts for using it was to describe links between arbitrary resources. Since those resources are described with URIs, my natural thought was to link URLs together, thereby creating a "Web" of physical resources. This is already achieved in the World Wide Web by the "anchor" linkages described inside HTML pages, but these links can only be created and modified by a page's author. As an author of a page, I will often link to other pages, but I am unable to link those pages back to mine. There are also occasions when there is value in linking arbitrary third party pages, such as when one page provides specifics for concepts described in another page, or when describing a workflow.
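This kind of third-party link is trivial to state in RDF. A sketch (the predicate is one I've invented):

    @prefix ex: <http://example.com/links#> .
    <http://one.example.org/concepts.html>
        ex:detailedBy <http://another.example.net/specifics.html> .

Neither page needs to change, and neither author needs to be involved; the link lives in whatever store holds the statement.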

Of course, all three of these approaches can be used at once, to varying degrees. RDF is great that way. However, developers usually use RDF to meet a specific need, meaning that the variety of ways RDF can be used is often overlooked. I think the last approach of linking arbitrary resources is underutilized, though I've seen it used here and there. Some people even complain that you shouldn't use RDF to link pages together, since any application using such data expects the resources in question to be URLs referring to resources with a physical presence on the net. The RDF specification deliberately uses URIs to indicate that resources need not have a physical presence, and many people get upset when you create an application that presumes a URI is a dereferenceable URL. But insisting that those resources cannot have a physical presence amounts to saying that RDF isn't allowed to describe certain types of resources, so I disagree. But I digress...

Since NetKernel allows you to reference any operation on any kind of data via a single URL, RDF can be used to link any kind of resource to any data you can dream of. Not just static data as we do now, but real-time, arbitrarily complex, calculated data. Not only that, but it's trivial to do.
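Putting the pieces together, the object of an RDF statement can simply be a NetKernel URI. A sketch, with everything other than the active URI scheme invented by me:

    @prefix ex: <http://example.com/links#> .
    <http://example.org/sales-report.html>
        ex:liveFigures <active:summarize+operand@http://example.org/sales-data.rdf> .

The statement links an ordinary web page to a computation: dereference the object and you get freshly calculated data rather than a static document.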

This got my head spinning. It opens up a lot of possibilities.

Continued

I put this post down while traveling back to Chicago, so it makes sense to break it here and continue later. There's lots to talk about, and it got me thinking about a lot of things that I want to write down while it's still fresh, so I can clarify my perspective.
