Tuesday, July 20, 2004

I was a little distracted today with the proceedings of the DAWG.

Over the last few days Jeff Pollock has expressed frustration that the DAWG has not made it a formal requirement to commit to a language based on XQuery. Personally, I've been a little surprised at this, given that the group seems quite happy to consider such a proposal, but wants to investigate the ramifications first. The relevant section of the minutes says:
We discussed Proposed XQuery requirement and/or objective without reaching critical mass around any particular wording. ACTION SimonR: write a document discussing tradeoffs with adapting XQuery as an RDF query language for discussion thru the September meeting in Bristol.

The issue seems to be that Jeff insists that a commitment to XQuery as a "requirement or objective" must be made now. Of course, Jeff has his own agenda on this, as does everyone else. That is the point of a committee after all. However, I haven't found his arguments to be persuasive.

Jeff made 4 points for using XQuery as the basis of the query language to be proposed by the DAWG. The first 3 of these are quite valid, but miss the point. In two of them he points out that XQuery is modular, general purpose, and well structured. In the other he points out that XQuery is a W3C specification, and describes the importance of supporting such standards. In other words, he describes many of the strengths of XQuery as a language, with no reference to its applicability to RDF.

However, it was the fourth point that frustrated me. Excerpting from his message:
the output of RDF and OWL (and most likely SWRL) specifications was solidly grounded upon XML inside the SemWeb layer cake

RDF can certainly be represented as XML, but to claim that the RDF specification is "solidly grounded" on XML is incorrect. XML describes a tree structure, while RDF describes a directed graph. Many RDF documents have a simple tree structure and are easily represented in XML, but when an RDF document contains loops it cannot be directly represented in XML. One method to overcome this is to label a branch of the XML with an ID attribute, and to refer to that ID from elsewhere in the document. While this works, it circumvents standard XML structure.
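To make the loop problem concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The `ex` vocabulary, the resource names, and the `to_rdf_xml` helper are all made up for illustration. It takes a two-node loop and writes it out the only way a tree can hold it: each resource gets a flat, labelled branch, and the loop lives in the ID references rather than in the element nesting.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/terms#"  # hypothetical vocabulary, for illustration only

# A two-node loop: alice knows bob, and bob knows alice.
triples = [("alice", "knows", "bob"), ("bob", "knows", "alice")]


def to_rdf_xml(triples):
    """Serialise triples as flat rdf:Description elements linked by ID."""
    root = ET.Element(f"{{{RDF}}}RDF")
    nodes = {}
    for subject, predicate, obj in triples:
        if subject not in nodes:
            # Label the branch with an ID so other branches can refer to it.
            nodes[subject] = ET.SubElement(
                root, f"{{{RDF}}}Description", {f"{{{RDF}}}ID": subject}
            )
        # Refer to the object by ID instead of nesting it as a child element:
        # nesting the members of a loop inside each other would never end.
        ET.SubElement(
            nodes[subject], f"{{{EX}}}{predicate}", {f"{{{RDF}}}resource": "#" + obj}
        )
    return root


root = to_rdf_xml(triples)
ET.register_namespace("ex", EX)
print(ET.tostring(root, encoding="unicode"))
```

The element tree never gets deeper than two levels below the root. The graph structure is carried entirely by attribute values, which is exactly the part an XML tree tool does not follow on its own.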

It is certainly possible for XQuery to deal with RDF/XML, though it is not as trivial as one might expect for an XML-based structure. To claim that XQuery is appropriate for RDF and OWL because they are solidly grounded on XML is incorrect.
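As a rough illustration of why this is not trivial, the sketch below encodes the same single statement in two forms that RDF/XML permits, then runs one path expression over both. ElementTree's limited XPath subset stands in for an XQuery path expression here, and the `ex` vocabulary and resource URIs are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/terms#",  # hypothetical vocabulary
}

HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:ex="http://example.org/terms#">'
)

# Form 1: the object is nested inside the property element.
NESTED = HEADER + """
  <rdf:Description rdf:about="http://example.org/alice">
    <ex:knows>
      <rdf:Description rdf:about="http://example.org/bob"/>
    </ex:knows>
  </rdf:Description>
</rdf:RDF>"""

# Form 2: the same statement, with the object given by reference.
FLAT = HEADER + """
  <rdf:Description rdf:about="http://example.org/alice">
    <ex:knows rdf:resource="http://example.org/bob"/>
  </rdf:Description>
</rdf:RDF>"""

# A path written against the nested form.
PATH = "rdf:Description/ex:knows/rdf:Description"


def count_matches(document):
    """Count the elements the path selects, relative to the rdf:RDF root."""
    return len(ET.fromstring(document).findall(PATH, NS))


print(count_matches(NESTED))  # 1
print(count_matches(FLAT))    # 0
```

Both documents describe the same graph, but a path written against one layout finds nothing in the other. A query over RDF/XML has to anticipate every layout the serializer might have chosen, or the data has to be normalised first.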

That said, I have no personal objection to the use of XQuery. I'm looking forward to reading Simon's document on the benefits and problems associated with using it. In the meantime, Jeff should consider his arguments more carefully in future, as his current ones don't carry any weight.

RDFS Rules
My initial idea to work around the problem of duplicated variables described yesterday was to replace the rule objects which use them with a different object type that did the workaround. Once I got to implementing the new class, I realised that I really wanted to make all the rule objects appear the same in the .drl file, so the same tests could be called on them. That meant that both classes should implement a common interface. After considering the operations needed by these classes, I finally opted for an abstract class instead.

The classes are nearly done, but I'm still working on the new insertion code for the workaround class. I also realised that rule XI, which needs namespace string matching, can also be done with an extension of the abstract class. Up until now I'd been avoiding rule XI because we didn't have iTQL that could be used for it, but now that we need to provide workarounds for other missing iTQL functionality, it makes sense to implement this rule as well.
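The shape of that design can be sketched roughly as follows. This is Python rather than the project's own code, and every name in it (`Rule`, `PrefixMatchRule`, `infer`, `apply`) is hypothetical. The abstract class holds the shared insertion step, while each subclass only decides which statements it entails. The prefix-matching subclass simply shows string matching done in code instead of in the query language; it is not meant as a faithful rendering of rule XI.

```python
from abc import ABC, abstractmethod


class Rule(ABC):
    """Common base so that every rule can be driven by the same tests."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def infer(self, statements):
        """Return the set of statements this rule entails."""

    def apply(self, statements):
        """Shared insertion step: add inferred statements, return how many were new."""
        new = self.infer(statements) - statements
        statements |= new
        return len(new)


class PrefixMatchRule(Rule):
    """A rule that matches predicates by string prefix, in code."""

    def __init__(self, name, prefix, entailed_type):
        super().__init__(name)
        self.prefix = prefix
        self.entailed_type = entailed_type

    def infer(self, statements):
        # Every predicate starting with the prefix is typed as entailed_type.
        return {
            (p, "rdf:type", self.entailed_type)
            for (_, p, _) in statements
            if p.startswith(self.prefix)
        }


statements = {("ex:bag", "rdf:_1", "ex:item")}
rule = PrefixMatchRule("membership", "rdf:_", "rdfs:ContainerMembershipProperty")
print(rule.apply(statements))  # 1
print(rule.apply(statements))  # 0
```

Putting `apply` in the base class is what favours an abstract class over a bare interface: the calling code sees one type and one operation, and the workaround logic stays inside the subclass.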

I also spent a little time documenting after-the-fact requirements for the paging pre-fetching.

I had an extended lunch while I went into UQ again. I've been told by both Bob and the nice lady in ITEE postgraduate administration (Kathy) that my proposal will be accepted by the university, but that they are notoriously slow at getting through these things. Kathy explained that she spends a significant portion of every week trying to get a response out of the main university administration on student applications, so I may be waiting a while to get back an official response. In the meantime, the application form from the university says:
I understand that I will be enrolled as a student on the commencement date I specify above, and that I have agreed to start my research project on this date even if I have not received written advice from the University about my admission.

This is a little annoying, as there are a few texts in the Physical Sciences library that I'd like to borrow, but I can't until the application is processed. Having the application accepted would give me a little peace of mind as well. Kathy promised to help. At least I'm more fortunate than overseas applicants, whose visas typically require an acceptance.

I spent most of my time in a discussion with Bob about what I should be doing to start with. At this point he just seems to be happy to help me find my feet while I work out the specific direction I should be going in. He also lent me a few PhD and Masters theses to provide a rough guide of the sorts of things expected eventually.

Probably the most useful remark Bob made was not to let myself get caught up with coding. It usually has little bearing on the thesis, and a student can fool themselves into thinking that they are getting something useful done, when they're really standing still. I still have some ideas which need me to write code, but I'll be careful to keep this warning in mind.

We also spent a little time discussing Bob's current research. He is working with a group out of DSTC along with people from IBM and Sandpiper Software. They are building a translation mapping from one ontology description framework to another. These include UML, OWL, E-R diagrams, and others. It certainly touches on what I'm interested in, although it is in a slightly different direction, particularly given that my emphasis is squarely on OWL. Still, I'm interested in learning a little more, so I'll read up on it in the coming weeks.


Anonymous said...

Hey there-

Thanks for the attention to the DAWG proceedings. As I pointed out in one of my posts, we all take this work very seriously.

I just wanted to point out three things:

(1) the DAWG charter says "It is a requirement that the query language be compatible with an XQuery context" ... thus my frustration at only being able to assign an action item (vs. a requirement or objective) for the DAWG working group. This is especially important when a group reaches the strawman milestone and votes on requirements.

(2) I am right on point in _all_ my arguments - precisely because I am arguing for a surface layer query syntax for _all_ of the SemWeb - not just RDF. We know it works for OWL because we've implemented it, and we can assert strongly that it will work for RDF (see Rob S. notes on DAWG), and we are confident that Simon's work (with help from others) will show conclusively that the BRQL algebra can be expressed in XQuery syntax.

(3) The whole XML grounding thing is confusing, but I will concede half-way to your point -- the RDF purists never wanted RDF to have an XML representation. The fact is, however, that it does. And further, it does precisely because of the importance of having a common syntactic representation for the _entire_ SemWeb stack.

Oh, and one other note, my agenda is for the SemWeb architecture to succeed in the market. I don't believe it will if the DAWG produces a one-off query language that is only good for RDF. We can disagree on that point, but please don't imply that my position is based on my company's desire to support XQuery.

Thanks for caring! I hope that you become convinced over time, and through Simon's and others' work to show how XQuery will work for RDF (with all the implications that it will work for OWL and a future rules spec).



Quoll said...

While not disagreeing with your position (I actually like the idea of XQuery - I just don't yet know if it's a good idea), I still don't think that your arguments are persuasive here. I should address what you've said in order:

(1) You have misquoted the DAWG charter. You said, "It is a requirement that the query language be compatible with an XQuery context". The actual statement is, "There is a requirement for RDF data to be accessible within an XML Query context."

This does not mean that XQuery should be used, but rather that the data be accessible from it. The surrounding context to this statement makes it very clear that XQuery may not necessarily be chosen as the RDF query language, but rather that a translation between the two must be available. It certainly expresses a desire to consider XQuery favourably in order to re-use W3C technology, but only within the limits of applicability to RDF.

(2) I'll confess I wasn't clear when I said that your arguments missed the point. I never claimed that your arguments were wrong (in fact, I said that your points were valid), but that they miss the point.

The "point" that I'm referring to here is the applicability of XQuery to RDF. Your arguments are all about the merits of XQuery as a language, but none of them are technical arguments discussing its applicability to RDF. You claim that XQuery is better than SQL, but lots of things are better than SQL. You point out that XQuery is a W3C spec, but so are many other specs which wouldn't work for querying RDF. You claim that XQuery is more general purpose than SQL (again pointing out its superiority to SQL).

I don't fault you on these statements, but none of them refer to the applicability of XQuery to RDF, something the charter of the DAWG explicitly refers to. Your points should be describing why XQuery is good to use with RDF, but none of these statements do that, and hence I do not find them compelling.

(3) On the third point I can see that we would never agree. At least here you were talking about the applicability of XQuery to RDF.

I acknowledge that for better or worse RDF has an XML serialization. If it did not then we would not even be discussing this.

The XML representation of RDF can be quite dissimilar to standard XML structures, where the data layout is much more straightforward. Reflexive properties are a good example of this. There is symmetry in a reflexive property that XML cannot express, unless it resorts to duplication of resources which should only appear in the graph once. Either way, many (including myself) consider the representation to be quite poor.

The fact that an RDF document is in XML means that it can be queried with XQuery. However, it may be structured so poorly that using XQuery might be very inefficient and difficult.

My concern with your arguments was that many of them were based on XQuery being "good" without any regard for applicability. Chocolate is "good" in the general case, but not for a diabetic!

Now Simon and others may come back saying that the representation of RDF in XML is such that XQuery can easily traverse it, regardless of any structural irregularities in the XML. That will certainly validate the charter's desire to use XQuery. However, if they come back saying that the difficulties are too great, then the charter allows for that as well. In the latter situation the charter calls for a non-XQuery language to be proposed, and a mechanism for translating from one to the other to be provided.