Thursday, March 03, 2005

I want to keep it brief tonight, but as usual I have quite a bit to talk about. I'll probably gloss over a heap. :-)

I finished the RDFS rules structure today, sans rule XI. This is because rule XI requires prefix matching on strings. I don't have a problem with that, but I want to know what the internal structure of the query will be before I go ahead and encode it in RDF. I will probably start on a resolver to do prefix matching some time next week.
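The post doesn't say which strings rule XI has to match. Assuming it means URIs that share a prefix (the `rdf:_1`, `rdf:_2`, ... container membership properties are a likely candidate, but that is a guess), a prefix-matching resolver can be little more than a range scan over a sorted index. The sketch below is a minimal illustration under that assumption: `PrefixMatcher` is a hypothetical name, and a `TreeSet` stands in for a sorted string pool. It is not Kowari's resolver API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Hypothetical prefix-matching index. A TreeSet stands in for a sorted
 * string pool; this is not Kowari's resolver API.
 */
final class PrefixMatcher {
    private final NavigableSet<String> index = new TreeSet<>();

    void add(String value) {
        index.add(value);
    }

    /** Returns every stored string that starts with prefix, in sorted order. */
    List<String> match(String prefix) {
        List<String> result = new ArrayList<>();
        // Strings sharing a prefix are contiguous in sorted order, so begin at
        // the first entry >= prefix and stop at the first one that doesn't match.
        for (String candidate : index.tailSet(prefix, true)) {
            if (!candidate.startsWith(prefix)) {
                break;
            }
            result.add(candidate);
        }
        return result;
    }
}
```

Matching the RDF namespace plus `_` against an index holding `rdf:_1`, `rdf:_2`, `rdf:_10` and `rdf:type` returns the three container membership URIs (in string order, so `_10` sorts before `_2`) and skips `rdf:type`. Because the scan stops at the first non-matching entry, its cost depends on the number of matches rather than the size of the index.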

I also reverted some of the OWL in which I had tried to describe the structure of a transitive query in painful detail. There were a few reasons for this. The first was that I had to subclass everything: the constraints, the properties, and the element types. This completely divorced the structure from the rest of the class system, which was the opposite of what I was trying to achieve.

The other reason is that the Java classes in Kowari don't try to manage any of this either. Instead, they check that the structure is correct when it comes time to process it. Since the class structure that this RDF is supposed to represent doesn't care about this aspect of the structure, it seemed unnecessary to try to encode it in OWL. The final result is much cleaner-looking RDF.
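As a rough illustration of "check the structure at processing time" applied to a transitive query, here is a minimal sketch. `TransitiveQuery`, its constructor checks, and the `{subject, predicate, object}` string-array triples are all hypothetical and are not Kowari's actual classes; the point is only that the validity checks live in code rather than in the ontology.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical transitive query: everything reachable from a start node by
 * repeatedly following one predicate. Not Kowari's actual class.
 */
final class TransitiveQuery {
    private final String predicate;
    private final String start;

    TransitiveQuery(String predicate, String start) {
        // The structure is checked here, when the query is processed,
        // rather than being spelled out in the ontology.
        if (predicate == null || predicate.isEmpty()) {
            throw new IllegalArgumentException("a transitive query needs a predicate");
        }
        if (start == null || start.isEmpty()) {
            throw new IllegalArgumentException("a transitive query needs a start node");
        }
        this.predicate = predicate;
        this.start = start;
    }

    /** Each triple is {subject, predicate, object}; returns nodes reachable from start. */
    Set<String> evaluate(List<String[]> triples) {
        // Index only the edges labelled with our predicate.
        Map<String, List<String>> edges = new HashMap<>();
        for (String[] t : triples) {
            if (t.length != 3) {
                throw new IllegalArgumentException("malformed triple");
            }
            if (t[1].equals(predicate)) {
                edges.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t[2]);
            }
        }
        // Depth-first walk; the visited set also stops cycles from looping forever.
        Set<String> reached = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            for (String next : edges.getOrDefault(node, Collections.emptyList())) {
                if (reached.add(next)) {
                    pending.push(next);
                }
            }
        }
        return reached;
    }
}
```

Given `a subClassOf b` and `b subClassOf c`, a query on `subClassOf` starting at `a` reaches `b` and `c`, while edges on other predicates are ignored. A malformed query is rejected with an exception instead of being ruled out by OWL subclassing.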

My challenge now is to load it all effectively and build a Kowari class structure out of it. That will take some time, so I'm glad that I've budgeted accordingly. The only reason I'm planning to get to the string prefix resolver is so I can take a break from this code. :-)

DavidW asked for the numbers RDF generator yesterday, and I dredged it back up again. It's pretty small, but I decided to GPL it anyway, for the principle of it. I also considered the structure of the RDF (which was designed for me by Simon), and realised there were problems with it.

Way back when I first wrote this code, Simon said that he thought that people might balk at the idea of declaring a node to be the owl:sameAs a literal value (in fact, Simon used the old owl:sameIndividualAs predicate). Well, I checked the OWL language guide, and he's right. Instead, I've opted to use owl:hasValue, which is valid for OWL Full and OWL DL, but not OWL Lite. Applying values like this meant that I needed nodes to refer to each other instead of directly to values, so I've had to give them all names. In this case I've opted for where n is the same as the value of the node. That ought to blow out the string pool. :-)

The resulting output validates fine as RDF, but it is not valid OWL. The only reason for this is that I use a heap of undefined classes. This is easy to fix with a few headers, but I felt it was unnecessary, because this generator is supposed to create an enormous quantity of RDF for testing purposes. OWL is not a requirement here. However, the semantics of owl:hasValue are useful, so I've used it.
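The generator's actual output format isn't given here, so the following is only a sketch of the shape described: named nodes whose values are attached with owl:hasValue, plus an undefined class, streamed out as N-Triples. The namespace `http://example.org/numbers#`, the `Number` class and the `next` property are all invented for illustration; only the RDF, OWL and XSD URIs are real.

```java
import java.io.PrintWriter;

/**
 * Hypothetical sketch of a numbers RDF generator writing N-Triples.
 * The namespace, the Number class and the "next" property are invented.
 */
final class NumbersGenerator {
    static final String NS = "http://example.org/numbers#"; // assumed namespace
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_HAS_VALUE = "http://www.w3.org/2002/07/owl#hasValue";
    static final String XSD_LONG = "http://www.w3.org/2001/XMLSchema#long";

    /** Streams triples for the numbers 1..count, so huge outputs never sit in memory. */
    static void generate(long count, PrintWriter out) {
        for (long n = 1; n <= count; n++) {
            // Every node gets a name, so nodes can refer to each other.
            String node = "<" + NS + n + ">";
            // Deliberately undefined class: fine as RDF, but not valid OWL.
            out.println(node + " <" + RDF_TYPE + "> <" + NS + "Number> .");
            // The value is attached with owl:hasValue rather than owl:sameAs.
            out.println(node + " <" + OWL_HAS_VALUE + "> \"" + n + "\"^^<" + XSD_LONG + "> .");
            // Invented link to the successor, standing in for node-to-node references.
            if (n < count) {
                out.println(node + " <" + NS + "next> <" + NS + (n + 1) + "> .");
            }
        }
        out.flush();
    }
}
```

Calling `generate(3, new PrintWriter(System.out))` prints eight triples. Writing straight to a stream rather than building a model in memory is what makes it practical to emit the enormous quantities of test data mentioned above.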

I had a couple of meetings today, which worked out very well for me. It started with a chance meeting with Janet again, and she invited me over to a morning coffee break with her group. Unfortunately, I ended up doing a lot of the talking, but I was well received and have been asked back. Janet had some useful observations to make as well, so I'm starting to appreciate how valuable it can be to have regular contact with other people who are familiar with a lot of your research issues.

Then at lunch I had a talk with Peter Bruza. He was a little confused by my request for a meeting, but was kind enough to make the time anyway. I tried to explain my questions about the lack of ontology scope in languages like OWL, and how this restricts the kind of inferences I can make. He agreed with me, and then went on to discuss deductive, inductive, and abductive reasoning, and how he feels that abductive reasoning is what we really need.

I didn't understand abductive reasoning (and I'm a little hazy on inductive reasoning too) so he took the time to explain it to me. In my own words, it's a little like the following...

Deductive reasoning is where all of the facts are taken, and logic is applied to deduce a new fact. For instance:

  1. Confucius is a man
  2. All men are mortal
Therefore, Confucius is mortal.
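To make the deductive chain concrete, here is a toy forward chainer (an illustration only, not how any particular reasoner works). Fact 1 becomes an asserted class for the individual, fact 2 becomes an "every A is a B" rule, and the rules are applied until nothing new can be deduced. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy forward chainer over "every A is a B" rules; illustrative only. */
final class Deduction {
    /**
     * Starting from the classes an individual is asserted to belong to, keep
     * applying the rules until nothing new can be deduced.
     */
    static Set<String> classesOf(Set<String> asserted, Map<String, Set<String>> superclasses) {
        Set<String> known = new HashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over a snapshot so we can add to 'known' inside the loop.
            for (String c : new HashSet<>(known)) {
                for (String sup : superclasses.getOrDefault(c, Collections.emptySet())) {
                    changed |= known.add(sup);
                }
            }
        }
        return known;
    }
}
```

With the asserted class `Man` and the rule `Man -> Mortal`, `classesOf` returns `{Man, Mortal}`: the conclusion follows entirely from the facts, with no guessing involved.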

Abductive reasoning does not follow this chain. Instead, it makes a "guess", and then compares it against the facts for consistency. For instance:

  1. All men are humans
  2. All women are humans
  3. Confucius is a human
  4. Confucius is not a woman

At this point we can guess that Confucius is a man. We can't know this to be true, because we haven't been told whether there is anything human that is neither a man nor a woman. After making the guess, we can test it for consistency against all the known statements, and if it passes, it is a valid theory.
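The guess-then-check pattern can be sketched as follows. This is a deliberately simplified illustration, not Peter's method: candidate explanations are the known subclasses of the observed class (facts 1 and 2), and a candidate is dropped if it clashes with a known negative fact (fact 4). A real system would test each guess against the whole knowledge base. `Abduction.guesses` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy abduction: propose explanations, keep the consistent ones. Illustrative only. */
final class Abduction {
    /**
     * Each subclass of observedClass would explain the observation, so each is a
     * candidate guess. Guesses that contradict a known "is not" fact are discarded.
     * Survivors are hypotheses, not proven facts.
     */
    static List<String> guesses(String observedClass,
                                Map<String, String> subclassOf,
                                Set<String> knownNot) {
        List<String> consistent = new ArrayList<>();
        for (Map.Entry<String, String> e : subclassOf.entrySet()) {
            boolean explainsObservation = e.getValue().equals(observedClass);
            boolean contradicted = knownNot.contains(e.getKey());
            if (explainsObservation && !contradicted) {
                consistent.add(e.getKey());
            }
        }
        return consistent;
    }
}
```

With `Man -> Human` and `Woman -> Human` as rules, `Human` as the observation, and `Woman` as a known negative, the only surviving guess is `Man`. Note that the code can only choose among classes it already knows about, which mirrors the caveat above: a third kind of human that nobody has mentioned is invisible to it.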

It's kind of like the "Formulate a theory" step in the scientific process. Automating this process would be a real breakthrough, so I can see why Peter is so interested in it.

I explained that the specific problems I'm looking at already have all of the data needed to perform a deduction, rather than an abduction, so in some ways my problem should be easier than his. My issue is that I can't describe the ontology well enough for these deductions to be made. I described this in detail, with some examples, and Peter agreed with my analysis. He also said that the problem is an interesting one, and a good topic for a PhD.

Damn. I didn't want to hear that. How am I going to find out now? (I can't afford to take time off work to do a PhD, as a PhD income is rather paltry). I guess I'll still work on it, but I was hoping that someone had already made a good start on it.

In the meantime, Peter has lent me one of his papers. His abduction work needs a way to encode his information (otherwise a computer can't work on it), so I'm hoping to find some insights in his methods.

I also keep running into the term "non-monotonic reasoning". I have an idea that this refers to reasoning with facts that may lead to different answers at different times, but I need to find a clearer, more specific definition.
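The standard textbook illustration of non-monotonicity (not from this post) is "birds fly unless known otherwise". The toy knowledge base below treats anything not stated as false (a closed-world assumption), so adding a fact can withdraw an earlier conclusion. In monotonic logics such as RDF and OWL entailment, adding facts can only add conclusions, never remove them. The class and its string-based facts are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy closed-world knowledge base with one default rule; illustrative only. */
final class DefaultReasoner {
    private final Set<String> facts = new HashSet<>();

    void add(String fact) {
        facts.add(fact);
    }

    /**
     * Default rule: a bird flies unless it is known to be a penguin.
     * Treating "not known" as "false" (closed world) is what lets a later
     * fact withdraw this conclusion.
     */
    boolean flies(String x) {
        return facts.contains(x + " is a bird") && !facts.contains(x + " is a penguin");
    }
}
```

After adding "Tweety is a bird", `flies("Tweety")` is true; after also adding "Tweety is a penguin", it becomes false. The answer changed because more was learned, which is exactly the "different answers at different times" behaviour.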

I then got a message from a friend about a seminar at lunch tomorrow in which a system called BioMOBY is being presented. BioMOBY is a "biological web service interoperability initiative". Interestingly, they appear to be using Semantic Web technologies. They make the claim that:
interoperability between bioinformatics web services can be largely achieved simply by ontologically specifying the data structures being passed between the services (syntax) even without rich specification of what those data structures mean (semantics).

This, along with other statements, has me thinking that Kowari may be able to work effectively with this data, a point that I'm sure was not lost on the friend who sent me the seminar abstract. So I'll probably be losing yet another lunchtime to work. :-)

There's more to say, but it's been a long day, so I'll say it another time.

1 comment:

Andrew said...

Following what you said above, non-monotonic reasoning is important because you can make closed-world statements that can change or be retracted when other information becomes known. So you can make assumptions.

With RDF you make only true statements in an open world.

I think AI researchers gave up on monotonic reasoning a long while ago (at least 20 years ago), so RDF/OWL is pretty conservative. At least that's the feeling I get.