Friday, February 13, 2009

Prodding

Going back through a recent flurry of activity by Webster Mudge on Google Groups, I noticed a couple of things directly related to me.

First was a link to David Wood's post from last month in which he talks about how I did SKOS using RLog (some nice compliments BTW, thanks David). Both in this post and personally, David has been hassling me to integrate RLog into Mulgara. I'd love to get this done, but SPARQL and scalability have been priorities for me, and no one ever asked for RLog before. But it's been shuffling to the top of my list recently, so I'm going to see what I can get done in the next week, before I get loaded with new priorities.

The other link was to SquirrelRDF and included the comment, “Great idea, bummer it's tied to Jena.” This intrigued me, and I wondered if it was something Mulgara could do, so I checked it out. Only, once I got there I discovered that Mulgara already does it, and has done for years!

That's one of the biggest problems with Mulgara: lack of documentation. People just aren't aware of what the system can do, and there's no easy way to find out. I'd love to fix this on the Wiki, but when I'm accountable for getting things done, and not for telling people how to do it, then I tend to opt for the "getting things done" work instead.

Resolvers

For anyone interested, Mulgara has a system we call the "Resolver" interface. The purpose of this is to present any data as live RDF data. Included with Mulgara are resolvers for accessing: Lucene indexes; RDF files over HTTP; filesystem data; GIS data; RDBMSs (via JDBC, and using D2RQ); JAR files; plus a few resolvers specifically for representing aspects of literals and URIs stored in the database. Most are read-only interpretations of external data, but some are writable.

We also have a related system called "Content Handlers". These are for handling raw file formats and returning RDF triples. We support the obvious RDF/XML and N3 file formats, but also interpret Unix MBox files and MP3 files (the latter was done as a tutorial). This mixes well with things like the HTTP and file resolvers, as it lets us refer to a graph such as http://www.w3.org/2000/01/rdf-schema in a query. In this example the graph will not be in the local database (it could be, but only if you'd created it), so the HTTP resolver will be asked to retrieve the contents from the URL. Once the data arrives, it is sent to the RDF/XML content handler (chosen because the "application/rdf+xml" MIME type was recognized), which then turns it into a queryable local graph in memory. The query can then continue as if everything were local. If the data is on the local filesystem, or the MIME type isn't recognized, then it falls back to relying on filename extensions.
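The selection rule described above (MIME type first, filename extension as the fallback) can be sketched roughly as follows. This is only an illustration and not Mulgara's actual code: the class name, the handler labels, the "text/n3" MIME type and the extension table are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the lookup order only; none of these names are Mulgara's.
class ContentHandlerChooser {
    private final Map<String, String> byMimeType = new HashMap<>();
    private final Map<String, String> byExtension = new HashMap<>();

    ContentHandlerChooser() {
        byMimeType.put("application/rdf+xml", "RDF/XML");
        byMimeType.put("text/n3", "N3");      // assumed MIME type for N3
        byExtension.put("rdf", "RDF/XML");    // assumed extensions
        byExtension.put("n3", "N3");
        byExtension.put("mbox", "MBox");
        byExtension.put("mp3", "MP3");
    }

    /**
     * Returns the label of the handler to use, or null if nothing matches.
     * mimeType may be null, for example when the data is a local file.
     */
    String choose(String mimeType, String path) {
        if (mimeType != null) {
            // Drop parameters such as "; charset=utf-8" before the lookup.
            String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            String handler = byMimeType.get(base);
            if (handler != null) {
                return handler;
            }
        }
        // Fallback: the extension of the last path segment, if it has one.
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash || dot == path.length() - 1) {
            return null;
        }
        return byExtension.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

With this sketch, the RDF Schema URL is matched by its MIME type even though it has no extension, while a local file such as data.n3 is matched by extension alone.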

It's the way these things hook together that lets us connect SPARQL sources easily. It may be messy, but it is perfectly possible to select from a graph with a URI like:
http://host/sparql?default-graph-uri=my%3Agraph&
query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E
+CONSTRUCT+%7B+%3Fs+%3Fp+%3Fo+%7D+WHERE+%7B+%3Fs+%3Fp+%3Fo+.
+%3Fp+rdfs%3Adomain+%3Cmy%3AClass%3E+%7D
I've split the URI over a few lines to make it fit better, and I also used the graph name of my:graph just to keep it shorter. It's legal, though unusual.
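A URI like this doesn't have to be encoded by hand. Here is a minimal sketch of building one with the standard java.net.URLEncoder, assuming a query written with the standard SPARQL PREFIX and CONSTRUCT keywords. The class and method names are made up for the example, and http://host/sparql is just the placeholder endpoint from above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only URLEncoder is a real library class here.
class SparqlGraphUri {

    /** Builds endpoint?default-graph-uri=...&query=... with both values form-encoded. */
    static String build(String endpoint, String defaultGraph, String query) {
        return endpoint
            + "?default-graph-uri=" + encode(defaultGraph)
            + "&query=" + encode(query);
    }

    private static String encode(String value) {
        // The Charset overload needs Java 10 or later.
        // URLEncoder does form encoding, so spaces become '+'.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Calling SparqlGraphUri.build("http://host/sparql", "my:graph", query) with the CONSTRUCT query as a plain string gives a single-line URI that can be used as a graph name in another query.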

Mulgara originally aimed at being highly scalable, and we're in the process of regaining that title (honest... the modest improvements we've had recently are orders of magnitude short of XA2). However, the sheer number of features and the flexibility of the system are probably its most compelling attributes at the moment. If only I could document it all, and spread the word.

Oh well, back to the grind. At the moment I'm alternating between RESTful features (I want to PUT and DELETE individual statements) and a class that will transparently memory map a file larger than 2GB. For the latter, I'd love to offer an extension to java.nio.Buffer, but this package has been completely locked down by Sun. I hate not being able to extend on built-in functionality. :-(
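For what it's worth, one common way around the 2GB limit without touching java.nio.Buffer is to map the file as several regions and translate a long position into a (region, offset) pair. The sketch below shows that general technique and is not the class being written for Mulgara. The name LargeMappedFile and its methods are invented for the example, and it only handles single bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: wraps several mappings instead of extending Buffer.
class LargeMappedFile {
    private final MappedByteBuffer[] chunks;
    private final long chunkSize;
    private final long size;

    /** Maps the first size bytes of file in regions of at most chunkSize bytes. */
    LargeMappedFile(Path file, long size, int chunkSize) throws IOException {
        if (size < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and chunkSize > 0");
        }
        this.size = size;
        this.chunkSize = chunkSize;
        int count = Math.toIntExact((size + chunkSize - 1) / chunkSize);
        chunks = new MappedByteBuffer[count];
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            for (int i = 0; i < count; i++) {
                long start = (long) i * chunkSize;
                long length = Math.min(chunkSize, size - start);
                // A READ_WRITE mapping past the end of the file extends the file.
                chunks[i] = channel.map(FileChannel.MapMode.READ_WRITE, start, length);
            }
        }
        // The mappings stay valid after the channel is closed.
    }

    long size() {
        return size;
    }

    byte get(long position) {
        check(position);
        return chunks[(int) (position / chunkSize)].get((int) (position % chunkSize));
    }

    void put(long position, byte value) {
        check(position);
        chunks[(int) (position / chunkSize)].put((int) (position % chunkSize), value);
    }

    private void check(long position) {
        if (position < 0 || position >= size) {
            throw new IndexOutOfBoundsException("position " + position);
        }
    }
}
```

A single FileChannel.map call is limited to Integer.MAX_VALUE bytes, so in practice the chunk size would be something like 1 << 30. It is a constructor parameter here mainly so the chunk arithmetic can be tried out on a tiny file. Multi-byte reads that straddle two regions need extra care, which this sketch leaves out.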
