Saturday, May 14, 2005

Work, work, work
This has been a week of resolvers, testing, rules engines, and Carbon on Java.

Where to start? This is why I should be writing this stuff as it happens, rather than as a retrospective, but better late than never.

Prefix Resolver
I backed out the changes that I made to the string pool for prefix searching, but I've kept a copy. It's all been replaced with simpler code that uses Character.MAX_VALUE appended to the prefix string. This seems to work well with the findStringPoolRange() method from the string pool implementations.
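The trick can be sketched with a sorted map standing in for the string pool (the names below are illustrative; the real code goes through findStringPoolRange()). Appending Character.MAX_VALUE gives an upper bound for a half-open range search, assuming no stored string itself contains that character:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class PrefixRange {

    /** Returns the sub-map of pool entries whose keys start with prefix. */
    static SortedMap<String, Long> findRange(SortedMap<String, Long> pool, String prefix) {
        // Any key starting with the prefix sorts below prefix + Character.MAX_VALUE
        // (assuming keys never contain '\uFFFF'), so a range search suffices.
        return pool.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        SortedMap<String, Long> pool = new TreeMap<>();
        pool.put("rdf:_1", 1L);
        pool.put("rdf:_2", 2L);
        pool.put("rdf:type", 3L);
        System.out.println(findRange(pool, "rdf:_").keySet()); // [rdf:_1, rdf:_2]
    }
}
```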

I got this working with the prefixes defined in string literals, which makes conceptual sense, but it has some practical problems. In order to find all elements of a sequence, it is necessary to match a prefix of the container membership namespace. When represented as a literal, this has to appear in its full form. However, if I use a URI instead, then I can take advantage of aliases, something that can't (and shouldn't) be done with string literals. This means that the prefix can be shown instead as <rdf:_>.

While the syntax for this is a lot more practical, it does cheat the semantics a little. After all, there is no URI of <rdf:_>, while there is a URI of <rdf:_1>. But the abbreviation is so much nicer that I'm sticking to it. I'm now supporting both URI and Literal prefixes, so anyone with a problem can continue to use the full expanded form as a string literal.
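Checking whether a matched URI really is a membership property comes down to testing for the namespace followed by "_" and a decimal index. A quick helper might look like this (my own code for illustration, not Kowari's):

```java
public class Membership {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** True if the URI has the form rdf:_N, i.e. the namespace, "_", and digits. */
    static boolean isMembershipProperty(String uri) {
        if (!uri.startsWith(RDF_NS + "_")) return false;
        String index = uri.substring(RDF_NS.length() + 1);
        // rdf:_1, rdf:_2, ... carry a decimal index after the underscore
        return !index.isEmpty() && index.chars().allMatch(Character::isDigit);
    }
}
```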

Backing the old code out and putting in the dual URI/Literal option took longer than expected (as these things always do), so a lot of the week went into this and the testing. However, it was worth it, as I can now select on a sequence. To do this I need a prefix model for the resolver (I'll also include the appropriate aliases here for convenience):

alias <> as tucana;
alias <> as krule;
alias <> as rdf;
alias <> as rdfs;
alias <> as owl;
create <rmi://localhost/server1#prefix> <tucana:PrefixModel>;
The Kowari Rules schema uses a sequence of variables for a query, so I can use the prefix resolver along with the new <tucana:prefix> predicate to find these for me:
select $query $p $o from <rmi://localhost/server1#rules>
where $query <krule:selectionVariables> $seq
and $seq <rdf:type> <rdf:Seq>
and $seq $p $o
and $p <tucana:prefix> <rdf:_> in <rmi://localhost/server1#prefix>;
This has proven to be really handy for reading the RDF for the rules (which is why I wrote this resolver first). However, the real importance of this is that I now have the tools to do full RDFS entailment. The final RDFS rule can now be encoded as:
select $id <rdf:type> <rdfs:ContainerMembershipProperty>
from <rmi://localhost/server1#model>
where $x $id $y
and $id <tucana:prefix> <rdf:_> in <rmi://localhost/server1#prefix>
into <rmi://localhost/server1#model>;
So all I need now is the rule engine to do it. I spent the latter part of this week on just that. It looks like it's the parsing of the rules that takes most of the work, as the engine itself looks relatively straightforward. I'll see how this comes together in the next week.

Meanwhile, I wrote a series of tests for the prefix matching (which I've mentioned several times in the last few weeks), and after many iterations of correcting both tests and code, I have everything checked in. Whew.

You'll note that the "magic" predicates and model types are all still using the "tucana" namespace. We really need to change that to "kowari". I'm a little hesitant to simply make wholesale changes here, as it will break other code. On the other hand, it is probably a good idea to do it sooner rather than later.

Does anyone have any objections to this change?

Checking anything into SourceForge still needs the full test suite to be run first (this is even more important now that we rarely see each other in person). During the tests I noted that there are some warnings about calls to walk and trans. In each case they are complaining about variables rather than fixed values for a resource.

In the case of walk, this will need to be addressed by allowing a variable, so long as it has been bound to a single value. This will be necessary in order to query for the node where the walking will start.

For trans, a bound variable of any type will need to be supported. This will make it possible to select all transitive statements in OWL.

However, I'm aiming to have the basic rule engine with RDFS completed within the month. While these changes are vital, can I justify doing them now? They are not needed for RDFS, so perhaps I should postpone them. I'm just reluctant to put them off, as the changes are important and should only take a few days to do.

Maybe I should put my other projects on the back burner and start using my "after hours" time to get this done instead? That'd be a shame, as I need some sanity in my life.

As my code for wrapping the Carbon classes became more complete and refined, I decided to put it up on SourceForge. Since it was just supposed to wrap the Carbon metadata classes for Java, I called it jCarbonMetadata. However, I'm starting to wonder if the scope is shifting.

The latest part of this project has been to implement the query class. To properly write the javadoc for this code, I really need to duplicate Apple's documentation, but I don't think I'd be allowed to do that. However, I'm not sure about this. I'm providing access to Apple's classes, so it makes sense to use Apple's docs. Writing my own descriptions sounds like a recipe for disaster, particularly as I'm often doing this stuff after midnight. Linking isn't very practical either.

I've been making progress anyway, writing wrapper functions around many of the MDQuery functions. Just when I thought I was near completion I discovered a method that threw my whole plan into chaos.

The initial idea was to write a wrapper around the three relevant MD classes: MDItem, MDQuery, MDSchema. The methods for these classes all return strings and collection objects from the CoreFoundation framework. Rather than provide a wrapper implementation for each of these objects, I've been converting everything to its Java equivalent and returning that instead. Not only does this mean less work (as I have fewer classes to implement), but it makes more sense for a Java programmer anyway, as the Java classes are the ones they need to be using.

This was all going well until I got to the function called MDQueryCopyValuesOfAttributes. This function returns a CFArray of values. I was expecting to convert this into a java.lang.Object[] until I read the following:
"The array contents may change over time if the query is configured for live-updates."
So if I just copy the results into an array I'll lose this live-update functionality!

There are two approaches I can take here. The first is to just have a static interface, and wear the loss in functionality. The CFArray has to be polled for a change in size anyway, so why not poll the function call instead?
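The static option amounts to re-running the query and comparing snapshots. A rough sketch of one polling step, where the Supplier stands in for the native query call (that indirection is my own assumption, not how the wrapper is actually structured):

```java
import java.util.function.Supplier;

class ResultPoller {

    /** One polling step: re-run the query and keep the new snapshot if its size changed. */
    static Object[] poll(Supplier<Object[]> query, Object[] lastSnapshot) {
        Object[] current = query.get();
        // Comparing sizes mirrors polling the CFArray for a change in count.
        return (current.length != lastSnapshot.length) ? current : lastSnapshot;
    }
}
```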

The second approach is to wrap a CFArray in another Java class. While this would take more work, it is a more complete solution.

For the moment I'm hedging my bets, and have written two methods. The first method will return an array, while the second will return a dynamically updated MDQuery object. I have a stub class for this at the moment, though I don't expect to flesh it out until everything else is done. I thought that the method that returns the Object[] might just use the CFArray method, and have this class provide a toArray() method, but in the end I decided that I can cross the JNI boundary less often if I provide separate methods in C.

If I'm going to look at CFArray I figured that I might as well find out what else it offers. Most of the functions are as you'd expect, but there is also a function called CFArrayApplyFunction for applying a function to each element of the array (this brings flashbacks to the STL for me). That seems relatively useful, so I've started thinking about the rest of the Core Foundation (CF).
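For comparison, the same element-wise visit is trivial to express on the Java side. This analogue is entirely my own, not part of any Carbon binding:

```java
import java.util.function.Consumer;

class CFArrayStyle {

    /** Java analogue of CFArrayApplyFunction: visit each element with a callback. */
    static <T> void applyFunction(T[] values, Consumer<? super T> fn) {
        for (T value : values) {
            fn.accept(value); // same per-element application as the CF call
        }
    }
}
```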

Apple have explicitly said that Carbon is not going away, and indeed that there are some things that can only be done using Carbon. With this in mind, maybe there is room to implement the whole of CF in Java? I don't think any of it would be too hard, though it would take a lot of time.

This made me wonder if I should really be converting all of the CF objects into their Java equivalents. Perhaps I should implement each of these objects in Java, and return those instead. Then they can be converted into the Java classes by appropriate pure Java functions.

Considering this a little more, I decided that even if I implement all of these classes, then I'd still do conversions in C, through a JNI interface. There are two reasons for this. The first is that some objects must be converted in C, such as numbers, and for this reason most of the conversion functions have already been written. The second reason is to minimise crossing the JNI boundary. Converting objects like CFArray and CFDictionary in Java would mean repeatedly calling the same JNI access functions.

As I said earlier, I'll just stick to the three main classes, and consider the remaining CF classes if and when I need them. I still have other projects on the go, so I'll have to weigh the importance of these things as they come up. For the moment I just want this code for a Kowari resolver.

I'm sure there's other stuff to talk about, but I'd better get back to work.
