Tuesday, February 20, 2007

Interfaces

Interface design appears to fall into two categories.

The first category is where a set of functionality is to be provided, or is available in some way, and the interfaces are then built to provide access to all of this functionality. These designs are typically obtuse, and force a developer to jump through obscure hoops for their own internal gratification. 90% of interfaces fall into this category.

The second category is where the designer thinks about the task to be performed, and then thinks about the code they would want to write in order to accomplish this task. Subsequent implementation may force some extra requirements into the design, but it is almost always possible to write implementations that largely fit the initial design.

Of course, the first category is built from the bottom up, and the second is top down. Sometimes it is necessary to use the bottom up approach, particularly when providing access to a pre-existing system, but the results are almost never pretty. Unfortunately, given how difficult many of the interfaces available today are to use, bottom up design seems to be the norm. I want to name names here, but with the exception of Microsoft (MFC, or COM+ anyone?) it hardly seems fair to many of those hard working developers out there. (Actually, MFC doesn't seem to be a bottom up design. It's just full of weird inconsistencies where 6 similar-but-slightly-different tasks rely on 6 completely unrelated mechanisms.)

As an aside, when I say "interface", I'm not referring to Java interface definitions. I'm referring to the more general concept. These can be defined as Java interfaces, C++ headers, IDL files, XML descriptions, IUnknown querying, and more.

RDFS and OWL

Many interfaces I've seen for interacting with RDFS and OWL have had bottom up interfaces, often because they are trying to provide all of the functionality available in these languages. However, the result is usually very messy, and difficult to work with. For many of these interfaces, you really need to know OWL in order to use the interface.

However, the task that most developers want to achieve is often much simpler than anything that would require a complete knowledge of OWL. Typically, a developer will want to define two things: a model, and instance data.
  • The model will generally involve a taxonomy of classes, each with their own specific fields. It will describe properties on those fields, such as data types, lists, which ones are key fields, which ones are optional, and so on. It will also describe relationships between the classes, and possibly restrictions on those relationships.
  • The instance data will simply be a set of objects which each have a type defined in the model.
To be sure, all OWL interfaces out there allow all of this, but I personally haven't found them easy to use.

It should also be noted that OWL isn't the only way to do this. UML has done all of this for a long time now. This should be no surprise, as almost all features from each language (OWL and UML) can be mapped into a representation in the other. The exceptions are rarely employed and can be worked around (one of these exceptions is n-m associations in UML). However, UML is typically used statically at design time, and not dynamically at runtime. This makes sense, since UML is a closed world model, and a runtime system would need to allow temporarily incomplete systems while instance data was being built. OWL has a natural advantage in this regard.

RDFS/OWL Interfaces

When I needed a modeling interface, I made a conscious decision to avoid most of the OWL constructs, and only pick what I needed. I reasoned that the underlying language would already support any new constructs I might require, and trusted that it would be possible to make sensible additions to the interface if any new requirements came along. My justification here is my experience that interface changes are usually trivial, while modifying an underlying construct is often difficult or impossible.

Once I made that choice, my next step was to work out just what I wanted to do. The list was short, consisting of the class definition and object instantiation described above.

So what is the easiest way to describe each of these things? To me, a class definition is the name of the class, along with any inheritances it may have. It also contains a collection of fields. So a class definition should be a constructor which accepts a name, a list of other class definitions (or their names if I wanted to get into referencing classes before their definitions), and a list of fields. Fields would also require a name, a datatype (object or simple type), and some flags to indicate if they represented required data, a key field, and if they represented a list.
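As a rough illustration of the shape described above, here is a minimal Java sketch. The names `FieldDef`, `ClassDef` and `allFields` are hypothetical, chosen only for this example, and are not the API used in the actual code. Datatypes are represented as plain strings, which is an assumption made to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names throughout; this is only a sketch of the idea.
final class FieldDef {
    final String name;
    final String datatype;   // a simple type name, or the name of another class
    final boolean required;  // must every instance supply a value?
    final boolean key;       // does this field identify the instance?
    final boolean list;      // may the field hold several values?

    FieldDef(String name, String datatype, boolean required, boolean key, boolean list) {
        this.name = name;
        this.datatype = datatype;
        this.required = required;
        this.key = key;
        this.list = list;
    }
}

final class ClassDef {
    final String name;
    final List<ClassDef> superClasses;
    final List<FieldDef> fields;

    // The constructor takes exactly the three things listed in the text:
    // a name, the inherited class definitions, and the fields.
    ClassDef(String name, List<ClassDef> superClasses, List<FieldDef> fields) {
        this.name = name;
        this.superClasses = Collections.unmodifiableList(new ArrayList<>(superClasses));
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    // Inherited fields first, then the fields declared on this class.
    // A class reachable through two parents would contribute its fields
    // twice; the sketch does not try to de-duplicate.
    List<FieldDef> allFields() {
        List<FieldDef> all = new ArrayList<>();
        for (ClassDef parent : superClasses) {
            all.addAll(parent.allFields());
        }
        all.addAll(fields);
        return all;
    }
}

class ClassDefDemo {
    public static void main(String[] args) {
        ClassDef agent = new ClassDef("Agent", List.of(),
                List.of(new FieldDef("id", "string", true, true, false)));
        ClassDef person = new ClassDef("Person", List.of(agent),
                List.of(new FieldDef("email", "string", false, false, true)));
        for (FieldDef f : person.allFields()) {
            System.out.println(person.name + "." + f.name + " : " + f.datatype);
        }
    }
}
```

Even in this tiny demo the verbosity is visible: every field needs its own constructor call before the class definition can be built.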

So was this approach useful? Well, other than being verbose (having to describe all the fields, and then construct the class definition), it seems easy to use, and has been quite successful in the code we've used it in.

A more interesting question has been object instantiation.

I decided to take a leaf out of Perl here, where objects are just a hashmap, keyed on field name. This works quite well, and provided a cheap way for me to get objects up and running. Wrapping the hashmap in a class that is given a copy of the class definition allows the data to be checked for consistency and completeness, and in some cases inferencing can be performed.
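A minimal sketch of that wrapper idea, in Java. Everything here is hypothetical: the class name `ModelObject` and its methods are made up for the example, and the class definition is reduced to a map from field name to a "required" flag so that the block stands alone. Inferencing is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a hashmap-backed object that knows its definition.
final class ModelObject {
    private final String typeName;
    // Assumption: the definition is just field name -> "is it required?".
    private final Map<String, Boolean> requiredByField;
    private final Map<String, Object> values = new HashMap<>();

    ModelObject(String typeName, Map<String, Boolean> requiredByField) {
        this.typeName = typeName;
        this.requiredByField = new LinkedHashMap<>(requiredByField);
    }

    // Consistency check: only fields named in the definition may be set.
    void put(String field, Object value) {
        if (!requiredByField.containsKey(field)) {
            throw new IllegalArgumentException(typeName + " has no field named " + field);
        }
        values.put(field, value);
    }

    Object get(String field) {
        return values.get(field);
    }

    // Completeness check: required fields that have not been given a value.
    List<String> missingFields() {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : requiredByField.entrySet()) {
            if (e.getValue() && !values.containsKey(e.getKey())) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }
}

class ModelObjectDemo {
    public static void main(String[] args) {
        Map<String, Boolean> personDef = new LinkedHashMap<>();
        personDef.put("id", true);      // required
        personDef.put("email", false);  // optional

        ModelObject p = new ModelObject("Person", personDef);
        System.out.println("missing before: " + p.missingFields());
        p.put("id", "p1");
        System.out.println("missing after:  " + p.missingFields());
    }
}
```

Because the object is allowed to be incomplete while it is being populated, completeness is a query (`missingFields`) rather than something enforced on every `put`.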

My only regret with this approach has been that it is verbose in a similar way to defining classes. The hashmap has to be created and fully populated, with each field taking its own verbose call for insertion. While easy, it isn't the way I would like to create these objects. Using the objects is similarly verbose, with all access going through get and put methods.

Thinking about it, the most obvious thing that I want to do here is to simply create the object with an inbuilt language constructor. In the case of Java, this means a call to new, with appropriate parameters. This is what UML would have provided, but UML would have been turned into Java source code before compilation. What I'm looking for here is dynamic creation of the class.

Bytecode Libraries

This is where my interest in bytecode libraries like ASM and BCEL comes in. Using these libraries it is possible to turn a class definition into a Java class, that can be instantiated with a custom class loader.

Make the custom class loader the current class loader, and you could theoretically use the new keyword. However, you'd have to cheat by doing a bait-and-switch, where you let the compiler build against one instance of a class, but have the class loader provide your class instead. OK, so this is an unserviceable hack, but it's fun to know it's possible. Accessing fields isn't so easy though, and reflection is the only effective way.
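To show what field access through reflection looks like, here is a small Java sketch using only `java.lang.reflect.Field`. The class `GeneratedPerson` is a stand-in written by hand for this example; in the scenario above it would have been produced at runtime by a bytecode library, which the sketch does not attempt. The helper `ReflectiveAccess` and its method names are likewise hypothetical.

```java
import java.lang.reflect.Field;

// Stand-in for a class that would really be generated at runtime, so the
// calling code cannot name its fields at compile time.
class GeneratedPerson {
    public String name = "unset";
}

final class ReflectiveAccess {
    // Look up a public field by name and read it.
    static Object read(Object target, String fieldName) {
        try {
            Field f = target.getClass().getField(fieldName);
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + fieldName, e);
        }
    }

    // Look up a public field by name and assign to it.
    static void write(Object target, String fieldName, Object value) {
        try {
            Field f = target.getClass().getField(fieldName);
            f.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot write " + fieldName, e);
        }
    }
}

class ReflectiveAccessDemo {
    public static void main(String[] args) {
        Object p = new GeneratedPerson();   // only known as Object from here on
        ReflectiveAccess.write(p, "name", "Paula");
        System.out.println(ReflectiveAccess.read(p, "name"));
    }
}
```

This ends up no less verbose than get and put on a hashmap, which is part of why the approach feels unsatisfying.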

A bigger problem is evolving class definitions over time. I've dynamically built classes in the past, but never tried to update a class after it has already been loaded once. I suppose the simplest way would be for each modification to get a new version ID that becomes a hidden part of the name, but that could lead to problems.

A better language to do this in is Ruby. Ruby already lets you define classes at runtime. More importantly, it lets you update them at runtime as well. I'm still a Ruby beginner, and I know nothing about the VM, so most of my ideas are just that, but it seems like a good idea.

I haven't had the chance to work on any of this modeling code for some months now, but I'm hoping I'll get the chance again soon. Depending on what seems most "natural" at the time, I may get to do some interesting things yet.

I should point out that most of my API is based on RDFS, with just a little OWL (InverseFunctionalProperty, Transitive, cardinality). While this sounds very restrictive, these few constructs provide a great deal of functionality. I'm looking forward to applying a few more.

5 comments:

Anonymous said...

you can do this trivially in ruby (and even groovy on the jvm) by utilizing their 'method missing' support.. so rather than making methods for each individual property, you let all property access bubble into the 'method missing' method, and dynamically dispatch.

you can then evolve the class at runtime at will..

take a peek at http://www.activerdf.org/, if you haven't already.. its ruby code that makes objects from an ontology. property access is kinda funky though, supplying namespace::localName on the other side of the dot. it'd be handy to be able to just refer to a property by the localName, but then you can have collisions between ontologies..

but then again, what's wrong with having some properties be 'first class', and available via direct bean-like names, and others via a generic get() that takes a full uri?

and, if you haven't yet seen it, http://www.betaversion.org/~stefano/linotype/news/99/ .. same vein of thinking.

Paula said...

Sounds like Ruby does what I thought it could. More incentive to learn it.

I looked at ActiveRDF briefly, but there were a few difficulties with it from my perspective.

First, most of my paid work is Java, so the ActiveRDF interfaces aren't usable from a Java client. I'm specifically after something I can provide to Java developers.

Second, it would be cool to put Mulgara behind ActiveRDF, but my Ruby skills are rudimentary at best. I just need to learn more Ruby.

Rob said...

Hi Paul,

(for Java) two words: Dynamic Proxy
or generally: Active Record (ie. Ruby).

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/reflect/Proxy.html

This is what powers RDF Beanz:
http://geko.svn.sourceforge.net/viewvc/geko/rdfbeans/src/java/org/chitty/rdfbeans/model/impl/RdfBeanImpl.java?revision=21&view=markup

checkout the RDF Bean Unit Tests:
RdfBean bean = session.get(RESOURCE_URI);
Foo foo = (Foo) bean; // no exception thrown
foo.getFooProperty(); // no exception thrown

http://geko.svn.sourceforge.net/viewvc/geko/rdfbeans/src/java/org/chitty/rdfbeans/test/JrdfSessionUnitTest.java?view=markup

I don't think my talk was very clear when I was in Chicago.

Hope this helps,
Rob

Alan Lovejoy said...

What you want to do is trivial in any dialect of Smalltalk or LISP.

Paula said...

Yes, but I'm not enamored of the efficiency of implementation for Lisp, and I don't know Smalltalk.

Besides that, this is a Java system, and I'm trying to provide accessibility to other Java systems. Lisp is inappropriate.