Friday, February 25, 2005

Update
So where have I been, and what have I been doing? Why have I not blogged for the past few days? Have I been accomplishing anything? The answer to the final question is, "Yes." The others require explanation...

Over the last few days I've had various evening engagements. Last night was the opening of the French Film Festival, along with a party. I'd provide a link, but the site uses a frameset with cookie-based navigation. Very messy, and it can't be consistently deep-linked to. Try Palace Cinemas and click on "Festivals and Events" if interested.

Unfortunately, this clashed with the opening night of Flickerfest, which I'd rather have seen (we had tickets with friends for the French film, so we had to forgo Flickerfest). We have Flickerfest passes, so at least I'm getting to see the remainder. Tonight's screening was really enjoyable. I particularly enjoyed "7:35 De La Manana", which was a romantic song and dance number, performed by a suicide bomber and his captives. You'd have to see it to get it. :-)

Other nights this week were just a matter of falling asleep early, due to exhaustion.

This impacts my blog because I write at night. I had no ability to post during the day, due to the bug I described where I log into other people's accounts while on campus. Even if I had written during the day, it would have been difficult to post at night, simply because I haven't been turning the computer on after hours.

Rule Structure
After a few days of experimenting I now have a first draft of an OWL description of the rule data. Now if only I had an OWL engine to test the validity of the RDF I write to this specification. :-)

I'm reasonably happy with this draft, but I already want to make some changes.

I started out by building a structure in a ball and stick diagram to represent a simple rule. Unfortunately, I found that this approach has problems. While a ball and stick diagram is effective for illustrating the main elements of a graph, it quickly gets out of control when types and inheritance are included. Consequently, I stuck to the main points, and used RDFS/OWL to flesh it out when I came to write the RDF-XML.

In fact, I realise now that I only really like to use ball and stick diagrams for ABox data. TBox data may look fine in UML, but in RDF it isn't so pretty. The biggest problem is that it gets merged in with the ABox data. After all, RDF is good for merging data in this way. But having all the data in one place makes the diagram unreadable.

Structure
The biggest problem I had with this design was how to structure things. By this I mean decisions like where to use blank nodes, where collections are appropriate, and so on.

I had first thought that I would be navigating my way through the RDF programmatically. For this reason I thought that the JRDF interface might be the way to go. However, after letting myself be fooled like this for a while I started realising just how many queries would be required for even the simplest parts of the graph. I felt like a fool, considering that I am using these rules to perform set-at-a-time operations on the very same database. All I need to do is use iTQL to do all the work in one fell swoop.

To help here I can do a couple of things to make the queries easier. The first is to avoid collections where possible. The next is to be careful about the choice of blank nodes.

The only problem I have is that a variable list from a select clause must be ordered. This means that I have to use a sequence, meaning the use of the rdf:_ predicates. I don't believe that DavidM got around to prefix matching in the stringpool, so there is no way to easily query this data.
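To make the ordering problem concrete, here is a minimal sketch of recovering a variable list from rdf:_n predicates by prefix matching. It works on a plain in-memory map rather than on Kowari, and the class name, method name, and the sample variable names are all hypothetical. Note that the index has to be compared numerically, since rdf:_10 sorts before rdf:_2 as a string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: not part of Kowari or JRDF.
class SequenceOrder {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String MEMBER_PREFIX = RDF_NS + "_";

    // Takes the predicate -> object pairs of one sequence node and returns
    // the objects of the rdf:_n predicates, ordered by n.
    static List<String> orderedMembers(Map<String, String> predicateToObject) {
        SortedMap<Integer, String> byIndex = new TreeMap<>();
        for (Map.Entry<String, String> entry : predicateToObject.entrySet()) {
            String predicate = entry.getKey();
            if (!predicate.startsWith(MEMBER_PREFIX)) {
                continue; // e.g. rdf:type, not a membership predicate
            }
            try {
                int n = Integer.parseInt(predicate.substring(MEMBER_PREFIX.length()));
                byIndex.put(n, entry.getValue());
            } catch (NumberFormatException e) {
                // Starts with "rdf:_" but has no numeric suffix: ignore it.
            }
        }
        return new ArrayList<>(byIndex.values());
    }

    public static void main(String[] args) {
        Map<String, String> seq = new HashMap<>();
        seq.put(RDF_NS + "type", RDF_NS + "Seq");
        seq.put(RDF_NS + "_2", "?predicate");
        seq.put(RDF_NS + "_1", "?subject");
        seq.put(RDF_NS + "_3", "?object");
        System.out.println(orderedMembers(seq));
    }
}
```

This is the "programmatic" route, so it only shows what a prefix match would need to return; doing the same thing inside a query is the part that needs stringpool support.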

There are several approaches I can think of here. The most obvious is to use the programmatic approach. This works, but is a horrible way to go. I'll only use it as a last resort.

The next idea is to implement prefix matching myself, including a syntax in iTQL. I know how to go about this, although the syntax would probably become a debating point, like it always does. The main problem is finding the time for this.

The last idea is to build a resolver that can find these values. It may be a decent halfway measure, though it may include some hacks. For instance, it could return tuples of rdf:_1, ... rdf:_100 to match against sets of fewer than 100 members. A terrible idea for the general case, but fine when you know that you will never have more than half a dozen values.
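The bounded hack could look something like the following sketch, which just enumerates the membership predicates up to a fixed limit (container membership properties are numbered from rdf:_1). The class and method names are made up for illustration, and how a real resolver would hand these back as tuples is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a fixed list of rdf:_n predicate URIs to join against.
class BoundedMembership {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns rdf:_1 .. rdf:_limit as full URIs, in order.
    static List<String> membershipPredicates(int limit) {
        List<String> predicates = new ArrayList<>();
        for (int n = 1; n <= limit; n++) {
            predicates.add(RDF_NS + "_" + n);
        }
        return predicates;
    }

    public static void main(String[] args) {
        // Half a dozen is plenty for a select clause.
        for (String predicate : membershipPredicates(6)) {
            System.out.println(predicate);
        }
    }
}
```

Anything past the limit is silently missed, which is exactly why this only makes sense when the maximum size is known in advance.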

Before I commit to anything I will look in the string pool and consider what I need to do for prefix matching. I can then work from there.

There are still some corners of the structure where I'm trying to determine programmatic control versus iTQL. The best example of this is the selection of RDF structure versus an iTQL string. Since I allow both, I need code which can handle either. So do I find all rules together and sort them out as I iterate over them, or do I select them separately? This decision has an impact on the structure, so I have to consider it. At this point I think I'll be building them separately.

All in all, I'm relatively happy with the ontology. While it's not "diagrammatic" it still provides me with a solid template to build the RDF rules with. Since every rule is structured uniquely, it gets tempting to fiddle with the structure to suit the occasion, but an ontology like this prevents me from getting off track like that. It will also make me carefully consider the structure any time I decide that a change really is necessary.

Constraints
One thing I hadn't expected is that I got to rename a few classes. While the RDF will need to map into the Kowari query system, the names I give the RDF nodes need have no bearing on the Kowari class structure (although it tries to reflect it). As a result, I was able to name the constraint classes in a way that I'm happier with.

In particular, I'm using the name Constraint rather than ConstraintOperation. In the same way, I'm using SimpleConstraint instead of ConstraintImpl. ConstraintConjunction and ConstraintDisjunction are both subclassed from ConstraintOperation, which in turn is a subclass of Constraint. Very similar to the Java code, but just different enough that I feel more comfortable about it.

OCL
At the request of a friend, I'm spending a bit of time out of hours learning and working with OCL. I haven't learnt all that much yet, but it makes sense so far. I'll certainly be picking up more UML as I go!

Working with OCL, and considering the MOF, I'm starting to look for these layers in OWL and RDF. Sometimes OWL seems really restrictive since so many meta-layers have been collapsed into one.

Object Database
I had a weird thought the other day. I imagined a class ontology stored in RDF, including class images stored as literals. A ClassLoader implementation could then create instances of a class directly from the datastore, as opposed to using the file system for the classpath.
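A minimal sketch of the idea, assuming the class images have already been pulled out of the store as byte arrays keyed by class name. The Map here is only a stand-in for the RDF datastore, and DatastoreClassLoader is a hypothetical name; the only real API used is the standard ClassLoader.findClass/defineClass pair.

```java
import java.util.Map;

// Hypothetical loader: the Map stands in for class images stored as literals.
class DatastoreClassLoader extends ClassLoader {
    private final Map<String, byte[]> classImages;

    DatastoreClassLoader(Map<String, byte[]> classImages, ClassLoader parent) {
        super(parent);
        this.classImages = classImages;
    }

    // Called by loadClass() only after the parent loader has failed to find
    // the class, so ordinary classpath classes are still resolved as usual.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] image = classImages.get(name);
        if (image == null) {
            throw new ClassNotFoundException(name);
        }
        // Turn the stored bytes into a Class object.
        return defineClass(name, image, 0, image.length);
    }
}
```

An instance would then be created with something like loader.loadClass(name).getDeclaredConstructor().newInstance(), with the name coming from the class ontology.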

The examples I can come up with to use this seem pretty contrived, but it seems useful nonetheless, particularly when bringing in new (trusted) RDF with class definitions at runtime. The nice thing is that the class definitions could come with their own ontologies. (Is there a standard for serialising UML into RDF? Could OWL do the job? Where would OCL fit in?) Does anyone have any ideas on this? Do I just want to do it because I can? (Too much software suffers from this.)

The only problem I have is that I don't think we can store an arbitrary blob in Kowari. The string pool will take it, but I don't know if the interfaces will (I'm pretty sure they don't). Of course, I can always uuencode a binary, but putting the blob in directly would be better. Maybe I need to put a new datatype into the string pool.
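As a sketch of the text-encoding fallback, the following round-trips a binary through a string literal. It uses Base64 from the standard library in place of uuencode (the idea is the same: binary in, printable text out), and BlobLiteral is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: encode a blob as text so it can be stored as a
// plain string literal. Base64 is swapped in here for uuencode.
class BlobLiteral {
    static String encode(byte[] blob) {
        return Base64.getEncoder().encodeToString(blob);
    }

    static byte[] decode(String literal) {
        return Base64.getDecoder().decode(literal);
    }

    public static void main(String[] args) {
        // The first four bytes of any Java class file.
        byte[] image = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        String literal = encode(image);
        System.out.println(literal);
        System.out.println(Arrays.equals(image, decode(literal)));
    }
}
```

The cost is roughly a one-third increase in size plus the encode and decode on every access, which is why a native binary datatype in the string pool would be the better option.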

Weekend
Well it's late, and I want to ride in the morning, so I'll leave it here. I plan on working a bit this weekend, so I'll see how far I get.

Aside from work, I also need to start writing the confirmation on my weekends. I was putting this off until I'd finished reading some more papers, but I seem to be constantly accumulating literature, so at some point I'm going to have to stop. Maybe I should draw the line where I am now.

13 comments:

Rob said...

RE: Object database

I am using a MDA tool (yuk!) for generating a web application base from UML. It uses XMI to generate Java classes.

I was also thinking of generating ontologies from UML (XMI). Let me know if you do any work in this area.

Andrew said...

SPComparator provides a way to do prefix searching.

Have a look at:
public int comparePrefix(ByteBuffer d1, ByteBuffer d2, int d2Size);

It's not exposed much above the level of the StringPool yet though.

Andrew said...

As far as RDF and Java is concerned Mindswap did a thing called "Dynamic Java Class Loader using OWL":
http://www.mindswap.org/~mhgrove/OWLClassLoader/

Related, and I'm just putting it here because I think it's cool, is JPred - executing Java code based on predicate logic:
http://www.cs.ucla.edu/~todd/jpred/

Paula said...

Hi Rob,
I'm intrigued at your comment about "generating ontologies from UML". Any UML *is* an ontology. So what are you referring to? Do you mean translating into an ontology language like OWL? If so, then look at my supervisor's research. He's specifically looking at that sort of thing.

UML and OWL intersect a great deal, but each includes items which are not available in the other. For instance, in UML I can define a relationship from object A to object B, where object B will be deleted when object A is deleted. OWL's open-world assumption does not really deal with the concept of an object lifetime. Conversely, OWL has cardinality constraints, which are much more complete than UML linkages.

In both cases it is necessary to drop information when transferring from one ontology language to the other. While not insurmountable, it is an important issue to consider.

Paula said...

Andrew,
Yes, I know that the comparators can match around a string prefix. I also know that iTQL does not have a language binding for it. What I didn't know is how far up the interface chain the comparators are visible.

Thanks for the pointer. I'll go poking around and see what I can come up with.

As a part of my current work I also need to get a "collection" function into Kowari. It might increase my time estimate, but I should include this as a part of that process.

Paula said...

Andrew,
Both of these URLs look interesting. The first because it aligns with what I was thinking about. The second because I've been up to *here* (think of me holding my hand up to my neck) in predicate logic lately.

Actually, I've started to think of the ramifications for predicate logic on Kowari. It won't be as efficient or scalable as the rules system I'm building, but it would have one important advantage... provenance.

It's one thing to say, "Given A and B, we infer C". However, it may be important to ask, "Why do we know C?" and to be able to respond, "Because of A and B." It is not easy (and sometimes impossible) to keep this information while doing set-at-a-time inferencing. The rule that generates a statement may be kept (useful for backward chaining when calculating ontology changes), but usually not the data that caused the rule to be triggered.

In other words, given a rule ℛ:
∀x: Ax ∧ Bx → Cx
we can either calculate each Cx one at a time, or all values of C at once. The problem with doing them all at once is that we can store ℛ with Cx, but we lose the Ax and Bx.

So predicate logic can still have an important role to play for Kowari, but it depends on the requirements.
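A toy sketch of the difference, using in-memory sets of strings rather than anything in Kowari (all names here are hypothetical). The first method computes every C at once and returns only the results; the second walks the candidates one at a time and records which premises produced each conclusion.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the rule: for all x, A(x) and B(x) implies C(x).
class RuleProvenance {
    // Set-at-a-time: one intersection, no record of why each x is in C.
    static Set<String> inferAll(Set<String> a, Set<String> b) {
        Set<String> c = new TreeSet<>(a);
        c.retainAll(b);
        return c;
    }

    // One-at-a-time: each inferred statement keeps the premises behind it.
    static Map<String, List<String>> inferWithProvenance(Set<String> a, Set<String> b) {
        Map<String, List<String>> why = new TreeMap<>();
        for (String x : a) {
            if (b.contains(x)) {
                why.put("C(" + x + ")", Arrays.asList("A(" + x + ")", "B(" + x + ")"));
            }
        }
        return why;
    }

    public static void main(String[] args) {
        Set<String> a = new TreeSet<>(Arrays.asList("p", "q", "r"));
        Set<String> b = new TreeSet<>(Arrays.asList("q", "r", "s"));
        System.out.println(inferAll(a, b));
        System.out.println(inferWithProvenance(a, b));
    }
}
```

With a rule this simple the premises could be rebuilt from the conclusion, so the example only illustrates the bookkeeping; the loss matters once the rule body involves joins over several variables.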

Paula said...

Damn it's hard to format these comments. :-)

Andrew said...

Okay, when you said "I don't believe that DavidM got around to prefix matching in the stringpool, so there is no way to easily query this data."

I was just answering that - there is prefix matching in the stringpool - there just isn't syntax.

The way I currently understand it is that datatypes have to extend SPComparator and AbstractSPStringComparator is a default implementation. However, I don't think URIs at the moment support it.

The way it's implemented would suggest another magical predicate like gt, lt, is, etc. and it would work on any supporting type.

Rob said...

I guess what I meant was translating the UML into OWL. UML is used a lot at work, and I thought it would be nice to have a tool for generating OWL from the UML class diagrams.

Paula said...

Yeah, you're right. I know what I meant, I just didn't say it right. :-)

I know about the comparators, because I used them to implement the type resolver. However, they're reasonably low level, and not really a string pool interface per se. At least, there's no interface which says "return all strings which start with XXX".

I'll be working on this shortly, but I have to say that I hate the "less than" and "greater than" magic predicates. The implementation is horrid. :-)

Thinking about it, the easiest thing to do would be a string resolver. It would be almost exactly the same as the type resolver... only with a prefix parameter. If only I could think of a clean way to pass in a parameter to a resolver...
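The kind of interface I mean, "return all strings which start with XXX", is easy to sketch over any sorted collection. This uses a java.util.TreeSet as a stand-in for the string pool and is not how the Kowari comparators work; PrefixLookup and startingWith are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical prefix lookup over a sorted set of strings.
class PrefixLookup {
    // In sorted order every string with the given prefix is contiguous and
    // starts at the first element >= prefix, so seek there and scan until
    // the prefix stops matching.
    static List<String> startingWith(NavigableSet<String> sorted, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String s : sorted.tailSet(prefix, true)) {
            if (!s.startsWith(prefix)) {
                break;
            }
            matches.add(s);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> pool = new TreeSet<>(
            Arrays.asList("owl:Class", "rdf:_1", "rdf:_2", "rdf:type", "rdfs:label"));
        System.out.println(startingWith(pool, "rdf:_"));
    }
}
```

The lookup costs one seek plus one step per match, which is the same shape of operation the comparators already make possible at the lower level.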

Paula said...

BTW, I made a mistake about UML and cardinality constraints on predicates. AFAIK this can be done just fine. Sorry. I remembered this while swimming yesterday. :-)

OTOH, I don't think that constraints like owl:oneOf work. This is because UML doesn't handle the concept of an instance (at least, it doesn't to my knowledge).

Anonymous said...

Hmm... I'm not used to posting comments, and hadn't thought to until after posting a reply in my own blog. It links to the W3C technical note on the relationship between UML and RDF Schema, and points to one piece of work that tries to create a more complete XMI-in-RDF mapping.

Benjamin.

Paula said...

Thanks Benjamin.

I learnt about the XMI spec just the other day. I really should learn to do a Google before speculating about the existence of something on my Blog. :-)

However, I appreciate the link to the RDF. Thanks.