Sunday, July 25, 2004

Directions
As I said on Thursday, there are lots of things to do next, and I was trying to work out the best direction to take. AN suggested that exporting and importing N3 would be a good idea, as other people need it as well. This suited me well, as it let me put the decision off for another day.

After creating an "export" command in the iTQL grammar, I learned that the "backup" command already writes either binary backups or RDF. Since this command is already overloaded, it made sense to back out the "export" command and just overload "backup" a little more. I didn't quite finish, but it will soon write N3 files whenever the output filename has a .n3 extension.
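Choosing the output format from the filename can be sketched as below. This is a minimal illustration, not the Kowari/iTQL code: the function names `backup_format`, `n3_term` and `to_n3` and the `Literal` class are hypothetical, and since the post doesn't say how "backup" currently picks between binary and RDF, the `.rdf` rule and the binary fallback are assumptions. The N3 writer only covers URIs and plain string literals, written one statement per line.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Literal:
    """A plain string literal; any other term is treated as a URI reference."""
    value: str


def backup_format(filename: str) -> str:
    """Choose an output format from the file extension (hypothetical rules)."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".n3":
        return "n3"
    if suffix == ".rdf":  # assumption: .rdf selects RDF/XML
        return "rdfxml"
    return "binary"  # assumption: anything else keeps the binary backup


def n3_term(term) -> str:
    """Render one term: literals are quoted and escaped, URIs go in <...>."""
    if isinstance(term, Literal):
        escaped = (term.value.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
        return f'"{escaped}"'
    return f"<{term}>"


def to_n3(triples) -> str:
    """Serialise (subject, predicate, object) triples, one statement per line."""
    return "".join(
        f"{n3_term(s)} {n3_term(p)} {n3_term(o)} .\n" for s, p, o in triples
    )
```

With these rules, `backup_format("model.n3")` selects N3, and `to_n3` produces the statements that would be written to that file. Keying on the extension means the grammar needs no new keyword, which is the same reasoning as overloading "backup" rather than keeping a separate "export" command.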

XQuery
The main reason I didn't get a trivial thing like N3 exporting done was a longish video conversation with SR. We discussed a number of things, including the appropriateness of XQuery for the DAWG.

It seems that XQuery can query RDF data with few technical problems. I had thought that this would be the answer to all potential problems, and that XQuery would therefore be a suitable strawman. However, SR made the important observation that while XQuery can easily find any required data, providing an API based on XQuery is problematic.

The easiest way to describe the problem is to go back to that old comparison with SQL. SQL is very restrictive, partly in the structure of its queries, and particularly in the structure of the returned data. However, when programming an API, it is this very structure that makes it so useful. Returned data always comes back as rows of typed columns. This allows an API to retrieve any element, typically with statements that iterate over the rows and access each element by column.
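To make the SQL side of the comparison concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for any SQL API (JDBC's `ResultSet` plays the same role in Java). The helper `rows_as_dicts` and the `person` table are hypothetical. The point is that one generic function works for any SELECT, because the driver always reports the result as rows with named columns via `cursor.description`.

```python
import sqlite3


def rows_as_dicts(conn, sql, params=()):
    """Generic wrapper: SQL results are always rows of named columns,
    so this works for any SELECT without knowing the query in advance."""
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]  # column names of the result
    return [dict(zip(columns, row)) for row in cursor]


def demo():
    """Build a throwaway in-memory table and query it through the wrapper."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [("Alice", 34), ("Bob", 27)])
    try:
        return rows_as_dicts(conn, "SELECT name, age FROM person ORDER BY name")
    finally:
        conn.close()
```

Calling `demo()` gives one dictionary per row, keyed by column name. The wrapper never has to inspect the query text, since the shape of the answer is guaranteed by the language.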

XQuery is much more open, both in how it retrieves data and in the format of what it returns. While the flexibility of the query structure is an advantage, the same characteristic is a problem for the returned data. XQuery can format returned data in almost any manner, from an RDF document through to an Excel spreadsheet. This flexibility is a liability for a potential API, as there is no consistent way to access the returned data. Unlike SQL, it is possible for a user to write a query that returns data in any format at all. Even if a return format is specified, a user can always write a query that doesn't follow it, breaking any API that tries to wrap it.
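The wrapping problem can be sketched from the API's side. Suppose, purely as an assumption, that a wrapper expects every query to return a hypothetical `<results><row><col name="...">` document. Nothing in XQuery enforces that shape, so the wrapper can only check each answer after the fact and reject anything else. The element names and the functions `parse_rows` and `is_wrappable` are illustrative only, not part of XQuery or any DAWG proposal, and the sketch takes the query's output as a string rather than running XQuery itself.

```python
import xml.etree.ElementTree as ET


class UnexpectedResultShape(Exception):
    """Raised when a query result does not follow the agreed format."""


def parse_rows(xml_text):
    """Hypothetical wrapper: accepts only
    <results><row><col name="...">value</col>...</row>...</results>."""
    root = ET.fromstring(xml_text)
    if root.tag != "results":
        raise UnexpectedResultShape(f"expected <results>, got <{root.tag}>")
    rows = []
    for row in root:
        if row.tag != "row":
            raise UnexpectedResultShape(f"expected <row>, got <{row.tag}>")
        record = {}
        for col in row:
            if col.tag != "col" or "name" not in col.attrib:
                raise UnexpectedResultShape("expected <col name=...>")
            record[col.attrib["name"]] = col.text or ""
        rows.append(record)
    return rows


def is_wrappable(xml_text):
    """True only if the query output happens to follow the expected shape."""
    try:
        parse_rows(xml_text)
        return True
    except (ET.ParseError, UnexpectedResultShape):
        return False
```

A query that builds the expected elements parses cleanly, but an equally valid query that returns an HTML table, or CSV text destined for a spreadsheet, is rejected. Any such contract has to be imposed from outside the language.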

While a full-featured and flexible query language is useful, one which cannot be effectively wrapped in an API is a liability.

I'm sure SR has more to say on this, and I'm looking forward to seeing how the DAWG approaches his results.
