Tuesday, November 22, 2005

Projects
I've spent my last week and a bit on JNI. It was fun when I started (since I haven't done much C lately), but ultimately tedious. After all, I'm just writing a wrapper around someone else's library. Sure, there are some cute challenges, but it's not quantum physics (that reminds me... I really need to get back to postgrad physics at some point. I should finish this Masters).

In the meantime, I've been looking at a project for web services to keep my mind fresh. Or more precisely, I've been looking at another project when I'm not working or looking after the boys (yes, I'm averaging below 5 hours sleep per night). It's a project I've mentioned on and off for a while now, but I've decided to do something about it.

The principles of the project came about through a combination of several factors. I've had some experience with an application server (where I wrote the JMX code). I've also been working with OWL from a theoretical point of view, from an inferencing perspective, and as a modeling language in several systems. I've been working with UML modeling, and writing an OCL interpreter to manage those models at runtime. I'm also an engineer who really likes bit-banging as opposed to all this high-level abstraction stuff (though the high-level stuff does have its charms). Finally, I've had several interesting conversations with people who've helped me crystallize what I'm trying to achieve.

The Idea
The idea is to use OWL to describe a class that can be instantiated in Java. There are several aspects of OWL that don't make it ideal for this kind of modeling, but it is still possible to use it. This is demonstrated by the ability to map most UML features into OWL. Fortunately, the flexibility of RDF allows almost any conceivable type of annotation to be added to the class, filling in any areas where OWL is not up to the task.

What would be the point of this? Well, the first thing that comes to mind is Web Services (thanks to David for this suggestion). Currently, services can be described in OWL-S. If a client does not know about a described service, then it can always try to model the service and simulate it (this technology is still in development). However, why simulate the model when you can instantiate a class that meets the description of the model? A real class would perform better, and offer much more flexibility to the client system.

But How?
One approach for this would be to convert the OWL class into some implementing Java source code, write it to disk, and convert it into a class file with javac. I've never liked this approach. It is very slow, relies on the presence of the compiler and knowing where the compiler is, uses up disk space, and requires the entire file to be re-written for even minor changes. JSPs on Tomcat are a good example of this. Ever noticed how slow the pages are the first time you look at them? That's because the JSP is being converted to plain old Java, written to a source file, compiled to a class file, loaded, and finally run.
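To make the text-generation route concrete, here is a minimal sketch of its first step only: turning a class description into Java source text. The `ClassDescription` record and its property-name-to-type map are hypothetical stand-ins for what might be read from an OWL class (for instance, datatype properties whose domain is that class). Writing the file, running javac and loading the result are the remaining steps, and they are exactly where the costs described above come in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for an OWL class description:
// a class name plus property names mapped to Java type names.
record ClassDescription(String name, Map<String, String> properties) {}

class SourceGenerator {
    // Emits Java source text: one private field and one getter per property.
    static String generate(ClassDescription desc) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(desc.name()).append(" {\n");
        for (Map.Entry<String, String> p : desc.properties().entrySet()) {
            sb.append("    private ").append(p.getValue()).append(' ')
              .append(p.getKey()).append(";\n");
        }
        for (Map.Entry<String, String> p : desc.properties().entrySet()) {
            String getter = "get" + Character.toUpperCase(p.getKey().charAt(0))
                    + p.getKey().substring(1);
            sb.append("    public ").append(p.getValue()).append(' ')
              .append(getter).append("() { return ").append(p.getKey()).append("; }\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("name", "String");
        props.put("age", "int");
        System.out.print(generate(new ClassDescription("Person", props)));
    }
}
```

Even in this trivial form, any change to the description means regenerating the whole file and recompiling it.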

The way around this would be to have a compiler built in. This would avoid executing an external process. Then if the compiler could output to a memory buffer instead of a class file, the results could be fed directly into a class loader, without having to use the disk.
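For what it's worth, later JDKs (Java 6 onward) expose the compiler through the `javax.tools` API, which makes this built-in, memory-only route possible. The sketch below assumes a full JDK rather than a bare JRE (`ToolProvider.getSystemJavaCompiler()` returns null when no compiler is available). It compiles a source string, captures the class file in a byte array instead of writing it to disk, and defines the class directly from those bytes. The class names `InMemoryCompiler` and `Greeter` are purely illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

class InMemoryCompiler {
    // Source code held in a String rather than a file on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                    + JavaFileObject.Kind.SOURCE.extension), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    // Class file output captured in a memory buffer.
    static final class ByteOutput extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteOutput(String className) {
            super(URI.create("bytes:///" + className.replace('.', '/')
                    + JavaFileObject.Kind.CLASS.extension), JavaFileObject.Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    // Compiles one source string; returns binary class name -> class file bytes.
    static Map<String, byte[]> compile(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("No system compiler (JRE only?)");
        Map<String, ByteOutput> outputs = new HashMap<>();
        StandardJavaFileManager std = compiler.getStandardFileManager(null, null, null);
        JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(std) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                    String name, JavaFileObject.Kind kind, FileObject sibling) {
                ByteOutput out = new ByteOutput(name); // redirect output to memory
                outputs.put(name, out);
                return out;
            }
        };
        boolean ok = compiler.getTask(null, fm, null, null, null,
                List.of(new StringSource(className, source))).call();
        if (!ok) throw new IllegalStateException("Compilation failed");
        Map<String, byte[]> result = new HashMap<>();
        outputs.forEach((name, out) -> result.put(name, out.bytes.toByteArray()));
        return result;
    }

    // Defines the compiled classes straight from the byte arrays.
    static Class<?> load(String className, Map<String, byte[]> classes)
            throws ClassNotFoundException {
        ClassLoader loader = new ClassLoader(InMemoryCompiler.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                byte[] b = classes.get(name);
                if (b == null) throw new ClassNotFoundException(name);
                return defineClass(name, b, 0, b.length);
            }
        };
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        String src = "public class Greeter implements java.util.function.Supplier<String> {"
                + " public String get() { return \"hello\"; } }";
        Class<?> c = load("Greeter", compile("Greeter", src));
        Object o = c.getDeclaredConstructor().newInstance();
        System.out.println(((java.util.function.Supplier<?>) o).get());
    }
}
```

This still starts from source text, so it only removes the disk and the external process, not the parsing step.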

However, a normal compiler would still expect to work on Java source code, which is just text. This still leaves the system rather inflexible, requiring a full recompile for every modification. Ideally, I'd want a compiler that could work directly on the Abstract Syntax Tree (AST) of Java. This would allow for easier and faster modifications. Since compilers have to build an AST internally anyway, this would also make compilation faster, because the text-to-AST conversion could be skipped.
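As a rough illustration of why working on the tree is cheaper than working on text, here is a toy expression AST. The node types are hypothetical and nothing like a full Java AST; the point is only that a subtree can be replaced directly, with no re-parsing of the surrounding code, and text is produced only when it is actually wanted.

```java
import java.util.function.UnaryOperator;

// Hypothetical toy AST: just enough node types to show an edit on the tree.
interface Node {}
record Literal(int value) implements Node {}
record Var(String name) implements Node {}
record Add(Node left, Node right) implements Node {}

class AstEdit {
    // Rebuilds the tree bottom-up, applying 'edit' to every node.
    static Node rewrite(Node n, UnaryOperator<Node> edit) {
        if (n instanceof Add a) {
            n = new Add(rewrite(a.left(), edit), rewrite(a.right(), edit));
        }
        return edit.apply(n);
    }

    // Renders the tree as source-like text, only when text is needed.
    static String print(Node n) {
        if (n instanceof Literal l) return Integer.toString(l.value());
        if (n instanceof Var v) return v.name();
        Add a = (Add) n;
        return "(" + print(a.left()) + " + " + print(a.right()) + ")";
    }

    public static void main(String[] args) {
        Node expr = new Add(new Var("x"), new Add(new Literal(1), new Var("y")));
        // Rename variable y to z directly in the tree.
        Node renamed = rewrite(expr, node ->
                node instanceof Var v && v.name().equals("y") ? new Var("z") : node);
        System.out.println(print(renamed)); // (x + (1 + z))
    }
}
```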

If the compiler is to be operating directly on the AST, then where would the AST come from? Normally the AST would come from Java text, but I'm trying to avoid having to repeatedly convert text into an AST. I'd like to build a class from OWL, so should I be compiling that into an AST every time? OWL is more structured than text, but converting it would still be a lot of work to repeat on a dynamic system.

Ideally, it would be possible to convert each of these into an AST, but to then persistently store the AST for manipulation and compilation at any time. At this point I realized that I have an RDF database (after all, I'm doing modeling with OWL), and this would be perfect for storing the AST. This started to open up new possibilities.
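A minimal sketch of what storing the tree in RDF might look like: each AST node becomes a blank node, with one triple for its kind, one per attribute, and one per child. The `ast:` predicate names, the `_:nN` labels and the plain in-memory `Triple` list are assumptions for illustration only; a real store would use its own API and a proper schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// One RDF statement; objects are either node labels or quoted literals.
record Triple(String subject, String predicate, String object) {}

// Hypothetical generic AST node: a kind, some attributes, ordered children.
record AstNode(String kind, Map<String, String> attrs, List<AstNode> children) {}

class AstToRdf {
    private final List<Triple> triples = new ArrayList<>();
    private int nextId = 0;

    // Emits triples for 'node' and its subtree; returns the node's label.
    String encode(AstNode node) {
        String id = "_:n" + nextId++;
        triples.add(new Triple(id, "ast:kind", "\"" + node.kind() + "\""));
        node.attrs().forEach((k, v) -> triples.add(new Triple(id, "ast:" + k, "\"" + v + "\"")));
        for (int i = 0; i < node.children().size(); i++) {
            String child = encode(node.children().get(i));
            // The index in the predicate keeps child order, which a set of triples loses.
            triples.add(new Triple(id, "ast:child" + i, child));
        }
        return id;
    }

    List<Triple> triples() { return triples; }

    public static void main(String[] args) {
        AstNode field = new AstNode("FieldDeclaration",
                Map.of("name", "age", "type", "int"), List.of());
        AstNode cls = new AstNode("ClassDeclaration", Map.of("name", "Person"), List.of(field));
        AstToRdf enc = new AstToRdf();
        enc.encode(cls);
        enc.triples().forEach(t ->
                System.out.println(t.subject() + " " + t.predicate() + " " + t.object() + " ."));
    }
}
```

Because each node is addressable in the store, a later change to one field only touches that field's triples.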

The system would involve storing a complete Java AST in RDF. While the schema definition will take a little while, there is nothing hard in this (the schema is not required for storage, but rather to understand the structure in order to implement the API for the AST). To get things into the schema will require something that can compile Java source text into this AST. There are several open source Java compilers, along with SableCC definition files for parsing Java source, so this should be reasonably straightforward as well. Compiling OWL into the AST is a different matter, but appears to be possible with a set of inferencing rules.
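As a hedged sketch of the rule-based direction, the code below applies one illustrative rule to a handful of OWL triples: every `owl:DatatypeProperty` whose `rdfs:domain` is a given class yields a field in that class, with the Java type picked from its `rdfs:range`. The `ex:` namespace, the small XSD-to-Java type table and the string output are assumptions; in the proposed system such rules would live in the inferencing engine rather than in Java loops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simple RDF statement with prefixed names (e.g. "rdf:type").
record Statement(String s, String p, String o) {}

class OwlToFields {
    // Assumed mapping from a few XSD datatypes to Java types.
    static final Map<String, String> XSD_TO_JAVA = Map.of(
            "xsd:string", "String", "xsd:int", "int", "xsd:boolean", "boolean");

    // Returns the object of the first (s, p, ?o) statement, or null if absent.
    static String objectOf(List<Statement> g, String s, String p) {
        for (Statement t : g) if (t.s().equals(s) && t.p().equals(p)) return t.o();
        return null;
    }

    // Rule: ?prop rdf:type owl:DatatypeProperty ; rdfs:domain ?cls ; rdfs:range ?r
    //   =>  field "<javaType(?r)> <localName(?prop)>" in ?cls
    static List<String> fieldsFor(List<Statement> g, String cls) {
        List<String> fields = new ArrayList<>();
        for (Statement t : g) {
            if (t.p().equals("rdf:type") && t.o().equals("owl:DatatypeProperty")
                    && cls.equals(objectOf(g, t.s(), "rdfs:domain"))) {
                String range = objectOf(g, t.s(), "rdfs:range");
                // Unknown or missing range falls back to Object.
                String type = range == null ? "Object" : XSD_TO_JAVA.getOrDefault(range, "Object");
                String name = t.s().substring(t.s().indexOf(':') + 1);
                fields.add(type + " " + name);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Statement> graph = List.of(
                new Statement("ex:name", "rdf:type", "owl:DatatypeProperty"),
                new Statement("ex:name", "rdfs:domain", "ex:Person"),
                new Statement("ex:name", "rdfs:range", "xsd:string"));
        System.out.println(fieldsFor(graph, "ex:Person")); // [String name]
    }
}
```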

The final step is the transformation of the AST into class files. This is a well-documented procedure, though one that I've yet to learn properly. I can always leverage an open source compiler's implementation, but I will need a good understanding of this process if I'm to customize it accordingly. Besides, I've been meaning to read the Java spec for years.
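To give a feel for what the back end ultimately has to produce, here is a hand-assembled class file, written straight into memory with `DataOutputStream`. It follows the layout in the Java Virtual Machine Specification: the `0xCAFEBABE` magic number, version numbers (52, i.e. Java 8, is an arbitrary choice here), a constant pool, then the class, its one method and that method's `Code` attribute. `writeUTF` conveniently produces the u2-length-plus-modified-UTF-8 layout that `CONSTANT_Utf8` entries use. This is only a sketch with a fixed constant pool; a real back end would build the pool and bytecode from the AST.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class MinimalClassWriter {
    // Constant pool tags from the JVM specification.
    static final int UTF8 = 1, CLASS = 7, METHODREF = 10, NAME_AND_TYPE = 12;

    // Emits a public class 'name' (unnamed package) extending Object whose only
    // member is a public no-arg constructor that calls super().
    static byte[] write(String name) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(0xCAFEBABE);                                   // magic
        out.writeShort(0);                                          // minor_version
        out.writeShort(52);                                         // major_version
        out.writeShort(10);                                         // constant_pool_count (9 entries + 1)
        out.writeByte(CLASS); out.writeShort(2);                    // #1 this class
        out.writeByte(UTF8); out.writeUTF(name);                    // #2
        out.writeByte(CLASS); out.writeShort(4);                    // #3 superclass
        out.writeByte(UTF8); out.writeUTF("java/lang/Object");      // #4
        out.writeByte(UTF8); out.writeUTF("<init>");                // #5
        out.writeByte(UTF8); out.writeUTF("()V");                   // #6
        out.writeByte(UTF8); out.writeUTF("Code");                  // #7
        out.writeByte(NAME_AND_TYPE); out.writeShort(5); out.writeShort(6); // #8
        out.writeByte(METHODREF); out.writeShort(3); out.writeShort(8);     // #9 Object.<init>
        out.writeShort(0x0021);                 // access_flags: ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                      // this_class
        out.writeShort(3);                      // super_class
        out.writeShort(0);                      // interfaces_count
        out.writeShort(0);                      // fields_count
        out.writeShort(1);                      // methods_count
        out.writeShort(0x0001);                 // method access_flags: ACC_PUBLIC
        out.writeShort(5);                      // name_index: <init>
        out.writeShort(6);                      // descriptor_index: ()V
        out.writeShort(1);                      // method attributes_count
        out.writeShort(7);                      // attribute_name_index: Code
        out.writeInt(17);                       // attribute_length of the Code body below
        out.writeShort(1);                      // max_stack
        out.writeShort(1);                      // max_locals
        out.writeInt(5);                        // code_length
        out.writeByte(0x2a);                    // aload_0
        out.writeByte(0xb7); out.writeShort(9); // invokespecial #9
        out.writeByte(0xb1);                    // return
        out.writeShort(0);                      // exception_table_length
        out.writeShort(0);                      // Code attributes_count
        out.writeShort(0);                      // class attributes_count
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = write("Generated");
        Class<?> c = new ClassLoader() {
            Class<?> define() { return defineClass("Generated", bytes, 0, bytes.length); }
        }.define();
        System.out.println(c.getDeclaredConstructor().newInstance().getClass().getName());
    }
}
```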

Once the class binary is generated, a custom class loader will let this class be immediately loaded and instantiated. This could be very dynamic, allowing infinitely flexible new classes, with methods customized at runtime. Building these classes from semantic documents like OWL-S means that the system can dynamically reconfigure itself to manage what it discovers about the world.
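One practical detail behind "immediately loaded": a given class loader can define a class name only once, so picking up a regenerated version of a class means defining it in a fresh loader. A minimal sketch, assuming the class bytes have already been produced in memory by whatever back end; `MemoryClassLoader` and `loadVersion` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Serves class definitions from in-memory byte arrays instead of the file system.
class MemoryClassLoader extends ClassLoader {
    private final Map<String, byte[]> classes;

    MemoryClassLoader(ClassLoader parent, Map<String, byte[]> classes) {
        super(parent);
        this.classes = new HashMap<>(classes);
    }

    // Called only after the parent loader fails to find the class (normal delegation).
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = classes.get(name);
        if (bytes == null) throw new ClassNotFoundException(name);
        return defineClass(name, bytes, 0, bytes.length);
    }

    // Each call uses a new loader, so a regenerated version of the same class
    // name is loaded independently of any earlier version.
    static Class<?> loadVersion(String name, byte[] bytes) throws ClassNotFoundException {
        MemoryClassLoader loader =
                new MemoryClassLoader(MemoryClassLoader.class.getClassLoader(), Map.of(name, bytes));
        return loader.loadClass(name);
    }
}
```

Instantiation is then ordinary reflection, e.g. `loadVersion(name, bytes).getDeclaredConstructor().newInstance()` for a class with a public no-arg constructor. Once nothing references an old version's loader, classes or instances, that version becomes eligible for unloading.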

AST
Pivotal to all of this is the AST. James Gosling once gave an interview about Java 1.5 (I can't find it anymore) where he talked about the inclusion of an AST API. He was also working on a project called "Jackpot" which provided this API. Obviously this never eventuated for Java 1.5, though it's been suggested for Java 1.6. So if I can't use an official AST, what should I be using?

Should I go with the internal AST of a working compiler, like Kaffe? Should I go with the AST given to me by a SableCC parser? I figured that standards are a good thing here, so I went looking for what other people use.

The one AST that seems to have the best penetration comes from Eclipse. This bothers me for a few reasons. First, Eclipse is still slow and occasionally crashes on my Mac (though recently it's been getting faster and more stable). Second, there seems to be a steep learning curve to get into the internals of Eclipse. Finally, when I looked at the structure, it appeared more complex than the ASTs I've seen elsewhere (maybe it's somehow more complete?).

Anyway, I haven't coded anything yet, so I'm still looking.

Possibilities
Having an architecture that stores the AST, compiling data into it, and then emitting it into binary, has several other advantages. Obviously, it becomes easy to modify code programmatically, and it is possible to have a single system that can compile multiple languages into the one AST format (Java, Jython, or annotated OWL).

This kind of system also makes it easy to work backwards from binary to text. Existing class files can be decomposed into their AST, and an AST can be converted into Java source text. Jad already converts from class to source code quite successfully (I'm guessing it must use an AST internally), so a precedent has been set here. However, this system would provide extra functionality. It could take a class file at run time, decompose it, update the existing code (for instance, adding instrumentation to methods), and then reload the class. I've heard of systems which do this sort of thing (particularly for adding instrumentation), but not with an API to control the modifications.
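To sketch the kind of controlled modification described here, the example below works on a toy AST (hypothetical `ClassDecl`, `MethodDecl` and `Stmt` records, not any real library's API) and inserts a trace statement at the start of every method body. In the proposed system the tree would come from decomposing a class file, and the result would be compiled and reloaded; here it is simply rendered as text for inspection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy AST: a class holds methods, a method holds statements.
record Stmt(String code) {}
record MethodDecl(String name, List<Stmt> body) {}
record ClassDecl(String name, List<MethodDecl> methods) {}

class Instrumenter {
    // Returns a copy of the class in which each method starts with a trace call;
    // the original tree is left untouched.
    static ClassDecl addTracing(ClassDecl cls) {
        List<MethodDecl> methods = new ArrayList<>();
        for (MethodDecl m : cls.methods()) {
            List<Stmt> body = new ArrayList<>();
            body.add(new Stmt("System.out.println(\"enter " + cls.name() + "." + m.name() + "\");"));
            body.addAll(m.body());
            methods.add(new MethodDecl(m.name(), List.copyOf(body)));
        }
        return new ClassDecl(cls.name(), List.copyOf(methods));
    }

    // Renders the tree as Java-like text (every method shown as void and no-arg).
    static String print(ClassDecl cls) {
        StringBuilder sb = new StringBuilder("class " + cls.name() + " {\n");
        for (MethodDecl m : cls.methods()) {
            sb.append("  void ").append(m.name()).append("() {\n");
            for (Stmt s : m.body()) sb.append("    ").append(s.code()).append('\n');
            sb.append("  }\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        ClassDecl account = new ClassDecl("Account", List.of(
                new MethodDecl("close", List.of(new Stmt("open = false;")))));
        System.out.print(print(addTracing(account)));
    }
}
```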

Ontologies
Normally, I'd be horrified at the idea of letting programmers out with such a powerful API, but the purpose here is not to permit programmers to perform ad hoc modifications, but rather to modify code according to the model described in an ontology.

I think this idea offers some interesting options for dynamically implementing models, and also for using an ontology to describe a program which can then be built automatically.

Ontologies are an area of research that is still moving quickly. It's hard to know exactly what this would contribute to it, but I think it would be quite useful.
