Monday, October 22, 2007

Theory and Practice

The development of OWL comes from an extensive history of description logic research. This gives it a solid theoretical foundation for developing systems on, but that still doesn't make it practical.

There are numerous practical systems out there which do not have a solid theoretical foundation (and this can sometimes bite you when you really try to push a system), but can still be very useful for real world applications. After all, Gödel showed us that every system will either be incomplete or inconsistent (maybe that explains why Quantum Physics and General Relativity cannot both be right. Since Gödel's Theorem requires that there be an inconsistency somewhere in our description of the universe, then maybe that's it). :-)

If theoretically solid systems not an absolute requirement for practical applications, then are they really needed? I'd like to think so, but I don't have any proof of this. In fact, the opposite seems to be true. Systems with obvious technical flaws become successful, while those with a good theoretical underpinning languish. There are many reasons for failure, with social and marketing being among the more common. Ironically, the problems can also be technical.

John Sowa's essay on Fads and Fallacies about Logic describes how logic was originally derived from trying to formalize statements made in natural language. He also mentions that a common complaint made about modern logic is its unreadability. But this innocuous statement doesn't do the evolution justice. Consider this example in classical logic:
  • All men are mortal.
  • Socrates is a man.
  • Therefore: Socrates is mortal.
Now look at the following definition of modal logic:

Given a set of propositional letters p1,p2,..., the set of formulae of the modal logic K is the small set that:
  • contains p1,p2,...,
  • is closed under Boolean connectives, ∧, ∨ and ¬, and
  • if it contains Φ, then it also contains □Φ and ◇Φ.
The semantics of modal formulae is given by Kripke structures of M=⟨S,π,K⟩, where S is a set of states, π is a projection of propositional letters to sets of states, and K is the accessibility relation which is a binary relation on the states S. Then, for a modal formula Φ and a state sS, the expression M, sΦ is read as "Φ holds in M in state s". So:
  • M, spi     iff sπ(pi)
  • M, sΦ1Φ2    iff M,sΦ1 and M,sΦ2
  • M, sΦ1Φ2    iff M,sΦ1 or M,sΦ2
  • M, s ⊨ ¬Φ     iff M,sΦ
  • M, s ⊨ □Φ     iff there exists s'S with (s,s') ∈ K and M,s'Φ
  • M, s ⊨ ◇Φ     iff for all s'S if (s,s') ∈ K, then M,s'Φ


(Courtesy of The Description Logic Handbook. Sorry if it doesn't render properly for you... I tried! The UniCode character ⊨ (⊨) is rendered in Safari, but not in Firefox on my desktop - though it shows up on my notebook.)

While several generations removed, it can still be hard to see how this formalism descended from classical logic. Modal logic is on a firm theoretical foundation, and it is apparent that the above is a precise description, yet it is not the sort of tool that professional programmers are ever likely to use. This is because the complexity of understanding the formalism is a significant barrier to entry.

We see this time and again. Functional programming is superior to imperative programming in many instances, and yet the barrier to entry is too high for many programmers to use it. Many of the elements that were considered fundamental to Object Oriented Programming are avoided or ignored by many programmers, and are not even available in some supposedly Object Oriented languages (for instance, Java does not invoke methods by passing messages to objects). And many logic formalisms are overlooked, or simply not sufficiently understood for many of the applications for which there were intended.

After working with RDF, RDFS and OWL for some years now, I've started to come to the conclusion that these systems suffer from these same problems with the barrier to entry. It took me long enough to understand the complexities introduced in an open world model without a unique name assumption. Contrary to common assumption, RDFS domain and range is descriptive rather than prescriptive. And Cardinality restrictions rarely create inconsistencies.

Part of the problem stems from the fact that non-unique names and an open world are a complete different set of assumptions from the paradigms that programmers have been trained to deal with. It takes a real shift in thinking to understand this. Also, computers are good at working with the data they have stored. Working with data that is not stored is more the domain of mathematics: a field that has been receiving less attention in the industry in recent years, particularly as professionals have moved away from "Computer Science" and into "Information Technology". Even those of us who know better still resort to the expediency of using many closed world assumptions when storing RDF data.

Giving RDF, RDFS, and OWL to the general world of programmers today seems like a recipe for implementations of varying correctness with little hope of interoperability - the very thing that these technologies were designed to enable.

However, the RDF, RDFS and OWL were designed the way they are for very sound reasons. The internet is and open world. New information is being asserted all the time (and some information is being retracted, meaning that facts on the web are both temporal and non-monotonic, neither of which is dealt with by semantic web technologies, but let's deal with one problem at a time). There are often many different ways of referring to the same things (IP addresses and hostnames are two examples). URIs are the mechanism for identifying things on the internet, and while URIs may not be unique for a single resource, they do describe a single resource, and no other. All of these features were considered when RDF and OWL were developed, and the decisions made were good ones. Trying to build a system that caters to programmers presumptions by ignoring these characteristics of the internet would be ignoring the world as it is.

So I'm left thinking that the foundations of RDF and OWL are correct, but somehow we have to present them in such a way that programmers don't shoot themselves in the foot with them. Sometimes I think I have some ideas, but it's easy to become disheartened.

Certainly, I believe that education is required. To some extent this has been successful, as I've seen significant improvement in developers' understanding (my own included) in the last few years. We also need to provide tools which help guide developers along the right path, even if that means restricting some of their functionality in some instances. These have started to come online, but we have a long way to go.

Overall, I believe in the vision of the semantic web, but the people who will make it happen are the people who will write software to use it. OWL seems to be an impediment to the understanding they require, and the tools for the task are still rudimentary. It leaves me wondering what can be done to help.

No comments: