Monday, May 28, 2007

Day 2

Once again, I'm writing a lot of my own thoughts here, rather than just trying to keep track of everything that happened at the conference. If you're interested in that stuff, then I'm sure lots of people wrote about it. For me, I found that it got me thinking about a lot of things, which is the main reason for me to write.

Random Thoughts

This doesn't mention any sessions at all. I just had a series of thoughts on various topics as they got mentioned at the conference, and thought I'd note them down.

AllegroGraph

My last couple of posts mentioned AllegroGraph and immediately started diving into the Mulgara implementation. One justification for this is that such a well advanced system inspired me to examine where Mulgara is, and where we want to take it. However, on some level I confess to feeling a little jealous. After all, we had the chance to be even more advanced than this, but lost it. However, we still seem to have a few unique features, and have more coming (which I'm sure you'll hear about soon).

From a philosophical point of view, having high quality open source infrastructure will help enable the Semantic Web to progress in the same way that it helped the Web and Web 2.0 to develop. Even when a project demands access to some of the features that only commercial vendors provide, an open source alternative can help bootstrap a project in its early phases, and provides competition for all to benefit from.

So while I may feel a little jealous, I'm hoping this will help inspire me to get Mulgara to do more. It can't just be Andrae and me though. Fortunately, this conference appears to have helped us there. We may be getting some more help from places like Topaz, but I'm also hoping to lift the profile of the software. After all, the more people who use it, the more likely it is that someone will need to scratch an itch and submit some code!

Codex/DLV

The core of the fourth codex suite is the Codex system. This is the new commercial name for OntoDLV [PDF]. This blog isn't an advertisement for my current employer, but it's worth mentioning the system all the same, as I find it interesting.

One of the features of Codex is to translate its ontology language into a program that can be run on DLV, a disjunctive logic reasoner. DLV offers an interesting set of features, including closed world reasoning, true negation along with negation as failure, non-monotonicity, and disjunctions in the head of a rule. Some of these are the opposite of OWL reasoning (closed world, negation as failure, non-monotonicity) while the disjunctive feature is completely orthogonal. Building an ontology language on top of this reasoner creates a very different kind of modeling environment to OWL. Interestingly, the guys in Italy have recently been looking at the consequences of importing OWL queries into the "rules" part of the OntoDLV language. This would make for an interesting complement of features.
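As a rough illustration of how true negation, negation as failure, and non-monotonicity differ, here is a minimal Python sketch. It is not DLV syntax and says nothing about how DLV or Codex work internally; the `facts` and `neg_facts` sets and every function name are hypothetical, chosen only for this example.

```python
# Toy knowledge base. All names here are hypothetical; this is not DLV syntax.
facts = {("employee", "alice")}      # atoms known to be true
neg_facts = {("employee", "bob")}    # atoms explicitly asserted to be false


def true_negation(atom):
    """True only when the atom has been explicitly asserted as false."""
    return atom in neg_facts


def negation_as_failure(atom):
    """Closed world: anything that cannot be shown true is taken as false."""
    return atom not in facts


def open_world(atom):
    """Open world: True, False, or None when nothing is known either way."""
    if atom in facts:
        return True
    if atom in neg_facts:
        return False
    return None


carol = ("employee", "carol")
before = negation_as_failure(carol)  # nothing is known about carol yet
facts.add(carol)                     # a new fact arrives
after = negation_as_failure(carol)   # the earlier conclusion is withdrawn
```

The change from `before` to `after` is the non-monotonic part: adding a fact removed a conclusion that had previously been drawn. The `open_world` function never draws that conclusion in the first place, so adding facts can only turn a `None` into an answer.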

OWL is built the way it is because of the interactions with the World Wide Web that it was designed to describe. An open world assumption is vital on the web, as things are being added all the time. Being monotonic may not work perfectly (as different sources may disagree) but is still important, as there is no way to assert priority of one source over another. The lack of a unique name assumption is also important, as it is very common that things are referred to by more than one name. In fact, it occurs to me that having a truly open world assumption means that you must not have a unique name assumption, particularly if there is an equality or equivalent operation in a language. Now that I think about it, this makes it seem strange that many description logics have an open world assumption, but also presume the unique name assumption.

Contrary to the properties of the web, the corporate environment is often quite different. It may be because "the enterprise" environment is a consequence of decades of training to conform to existing systems, such as relational databases, or the type of modeling needed in this environment is inherently different. Whichever way it is, corporations model, collect, and think about data in a particular way, most of which doesn't work well with OWL. Records are often taken over, and over again, with the same structure each time (yes, corporations may have several nearly identical databases with different record structures, and OWL can help here. But bear with me). Entities have been allocated unique identifiers, and do not need to be "declared" to be distinct. Most importantly, if a company does not know about data, this almost invariably means that such data does not exist. (e.g. an employee record cannot exist until it has been entered by someone authorized to do so. Accountants don't want to know about possible future employees).

While listening to people at the conference I started to recognize that people want each of these different properties, depending on their environment. It seems that Codex has something to offer many people, especially with its ability to import OWL data.

Reasoning

Another thought that I had while listening at the conference was that OWL reasoning is often done in a semi-closed way. Inferences are usually made in a way that accepts the possibility of unknown facts (open-world), but it can necessarily only reason on the data that it knows. This has a closed world flavor for me, though it is definitely kept consistent with the open world model.

There are lots of things that might be true that we don't consider when reasoning. No one ever declares that all individuals are in fact distinct (when it is known that they are). Classes are never declared to be disjoint, unless there is a specific reason for it (such as sharing a common superclass).

I'm not saying that people make inferences that presume unique names, nor that they do anything which would be invalid were two things revealed to be identical. The math behind valid inferences is too solid for that. But people see reasoning happen on the limited data they provide, and tend to think of that data as being their entire Universe of Discourse (I've wanted to use that phrase again for years). Consequently, people are repeatedly making closed world presumptions, and wonder why inferences and calculations don't come up with the answers they expected.

We also never see the "possible worlds" that a current model allows for. Given the infinite nature of the open world presumption, these possibilities might surprise people. In some cases, the model may prove to be completely inconsistent with reality, indicating that more modeling information is required for the system to be really useful. I know that people like Ian and Bijan know how all this works intimately, but most people are surprised when they see owl:cardinality apparently violated, so this sort of consideration is way beyond where most people are thinking.
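The cardinality surprise can be sketched in a few lines of Python. This is only a toy check for a "at most one value" restriction, not how any real OWL reasoner is implemented; the function name, the result strings, and the sample names are all assumptions made for illustration.

```python
def check_max_one(values, different_from, unique_names):
    """Toy check of an 'at most one value' restriction on a property.

    values:         names given as values of the property for one subject
    different_from: set of frozenset({a, b}) pairs declared distinct
    unique_names:   True to presume that different names mean different things
    """
    names = sorted(set(values))
    if len(names) <= 1:
        return ("ok", [])
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if unique_names:
        # Two names are two things, so the restriction is broken.
        return ("violation", pairs)
    clashes = [p for p in pairs if frozenset(p) in different_from]
    if clashes:
        # The names must be equal and are declared distinct: a contradiction.
        return ("inconsistent", clashes)
    # No unique names and no declared difference: the names are merged.
    return ("inferred same", pairs)


mothers = ["Mary", "MrsSmith"]
closed = check_max_one(mothers, set(), unique_names=True)
opened = check_max_one(mothers, set(), unique_names=False)
declared = check_max_one(
    mothers, {frozenset({"Mary", "MrsSmith"})}, unique_names=False
)
```

Here `closed` reports a violation, `opened` quietly concludes that the two names refer to the same individual, and `declared` only becomes inconsistent because the difference was stated explicitly. The middle case is the one that tends to look like a violated cardinality to people expecting database-style behavior.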

I suppose I had this thought partly because almost no-one declares every owl:Thing as owl:differentFrom every other distinct owl:Thing. Similarly, classes are rarely declared as owl:disjointWith unrelated classes. Sure, Man and Woman may be declared disjoint, but I've yet to see Vehicle and BillingCenter get declared as disjoint from one another. This becomes more noticeable when you consider the differences between a closed world reasoner (like Codex) and an open world one. It's not that an open world ontology is any less expressive, it's just that you need more information to encode the same scenario. The funny thing is that people often have this information, but they don't put it in. Consequently, open world reasoners don't come up with as many inferences - not because they are less capable, but because they don't know as much about their system as a closed world reasoner does.
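One way to see how much the missing owl:differentFrom statements matter is to ask how many distinct individuals a set of names is guaranteed to contain. The Python sketch below is a brute-force toy under stated assumptions, not a reasoner: `min_distinct` is a hypothetical helper, it only considers explicit "different from" pairs, and it is only practical for a handful of names.

```python
from itertools import product


def min_distinct(names, different_from):
    """Smallest number of individuals the names could denote.

    different_from holds (a, b) pairs declared to be distinct, where both
    a and b appear in names. Without a unique name assumption, any two
    names not forced apart may turn out to be the same thing.
    """
    names = sorted(set(names))
    if not names:
        return 0
    for k in range(1, len(names) + 1):
        # Try every way of mapping the names onto k individuals.
        for assignment in product(range(k), repeat=len(names)):
            label = dict(zip(names, assignment))
            if all(label[a] != label[b] for a, b in different_from):
                return k
    return len(names)  # only reached if a name is declared different from itself


people = ["a", "b", "c"]
with_unique_names = len(set(people))          # every name is its own individual
nothing_declared = min_distinct(people, set())
one_pair = min_distinct(people, {("a", "b")})
all_pairs = min_distinct(people, {("a", "b"), ("a", "c"), ("b", "c")})
```

With unique names the count is 3 straight away. Without them, the guaranteed count is 1 until the differences are declared, and only reaches 3 once every pair has been stated. The information needed is the same in both cases; the open world version just has to be told.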

3 comments:

peter royal said...

i must admit that i'd be far more into contributing to mulgara if the license was more on the BSD-style side of things. alas, i realize it probably is what it is in order to be compatible with the original kowari code..

Quoll said...

That's exactly right. At least we got rid of the entity required to be the "initial contributor" in the MPL.

What is it about the OSL that you don't like? I know it won't be for everyone, but I'm interested in the reasons.

For open source projects, I like those licenses that keep the source code open. I think it would bother me to discover that something I'd freely contributed to the community was being wrapped up in a proprietary system, with others gaining an advantage without contributing back to the community. I could live with it, but it would bug me.

With the exception of keeping the source code open, the OSL seems pretty permissive to me (obviously IANAL).

Of course, Stallman always said it is OK to add extensions and sell them. You just need to include the source code whenever you sell the software. I don't really know if that applies in the OSL, as it depends on the definition of "public" in section 1.c.

Giving away source code when you sell something doesn't bother me much, as I've yet to deal with a client who wants to go to the effort of learning the architecture and making wholesale modifications. If they buy the software, then they understand that it's cheaper to ask for support and/or modifications from the supplier than it is to do it themselves.

peter royal said...

its section 1c, the requirement that the OSL be used on any derivative work.

since i could see "communicate" being used to mean web applications.. frequently i'll tweak BSD-licensed software to meet specific needs that i've got. if the fix is quality, i'll generally contribute it back, but sometimes not.

ultimately, requiring derivative works to be under the OSL is actually limiting the freedom that users of the software have to do what they want with it :)

(and yes, the use of OSL has expressly kept us from using mulgara @ my day job, since we would want the ability to sit on any modifications that we view as advantageous for periods of time. its the cost of maintaining an internal fork vs contributing back, the longer the internal fork goes the less value it has over time as divergence with the main project occurs. i find the OSL to be discouraging of such behavior due to 1c)