Friday, December 10, 2004

RMI Bug
TJ approached me in the morning with a new bug. He claimed there was a regression, where creating a model whose name starts with "rmi://localhost/server1#" threw a NonRemoteSessionException.

My first thought was that this was a problem due to the create command not going through the resolver interface (as it is supposed to). I should ask SR and AM about that, as the interface currently lies. It also means that we don't get model creation when new resolver types are added, which significantly reduces the effectiveness of the interface. Anyway, it turned out that this thought was wrong. Yes, model creation does not go through the resolver interface, but that was not the root of the problem.

When I tried to duplicate TJ's problem I found that everything was working for me. I reported this back to TJ, and he showed me that one of his two bug reports mentioned that he was using the web UI (I had assumed that both reports were the same, and had looked at the one which didn't mention the web UI). I never use the web UI, so it hadn't even occurred to me to try his query this way.

Just like the descriptors, the web UI runs in the same JVM as the server. This means that any RMI connections it makes will be back to itself. I kludged this for the descriptors by allowing RMI connections which use the canonical host name (as these won't loop anyway); however, this didn't work for the web UI, since it can use any host name, depending on the user.

I was stuck on this for a little while, as I had no idea how to distinguish a web UI connection from any other. Even if I could, it would still be using the wrong server name. That second problem made me realise that I needed to rename the server before the model creation could go through. This was already being done within the resolver framework, but it now needed to happen when creating a model as well. This code is in ItqlInterpreter.

While I'm at it, ItqlInterpreter is a terrible place for this stuff. This is the class for taking parsed iTQL and dispatching it to the correct modules. Having it do the work of creating a model seems to be inappropriate, and another argument for getting the resolvers to do it for us.

All the same, I used code similar to the resolvers' to look up ServerInfo if it exists, and used the information found there to rename the server. Unfortunately, the list of acceptable server names was not stored in there, but was instead determined by Database. So I modified this class to set the aliases on ServerInfo once they were determined. Because Database and ServerInfo are in different packages I had to make the setter method public, which I didn't want. The only thing I could do to mitigate this was to make sure that the setter can only be called once.
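
The guard itself is trivial. A sketch (the field and method names here are my own, not necessarily what's in the code):

    // Sketch of the set-once setter on ServerInfo (names assumed). It has
    // to be public so Database can call it from another package, so the
    // only protection left is to refuse a second call.
    public static void setHostnameAliases(Set aliases) {
      if (hostnameAliases != null) {
        throw new IllegalStateException("Hostname aliases may only be set once");
      }
      hostnameAliases = Collections.unmodifiableSet(new HashSet(aliases));
    }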

I was supposed to leave early for our Christmas party, but I ended up staying almost until 5pm before I had this working.

Thursday, December 09, 2004

Tests
Not much today. Mostly a lot of tests and confirming that things were running the way they were supposed to. Since the tests take so long at the moment this seems to take forever, and it is hard to do much else in the meantime (except, perhaps, to read).

I started to look at SOFA, but really made little headway.

Everything seems a little slow at the moment. It must be the wind down before Christmas.

IEE
I went to the IEE Christmas Luncheon today, which was a nice diversion. I suppose I go along out of a sense of obligation, as there are very few people my age who go along, or even bother to be a member at the moment. Fortunately I sat next to a really nice guy named Teck Wai, who is just a little younger than I am (or so I assume; I often mistake Chinese people for being younger than they really are).

Since the lunch is also the AGM, the new committee was elected, and I took up a position again. It's not too onerous when you're not on the executive, and you can actually get something done when you get involved. While I was at it I convinced Teck Wai to nominate as well. Another friend of mine seconded the nomination the moment that he saw that I was suggesting a candidate who was aged under 40. :-)

The IEE is keen on expanding their involvement in IT, as many members like myself work in it. I should work on organising a few more events which would be of interest to younger members (like myself) as the main relevance of institutions like this is to introduce professionals in the same area to one another. I'll see what I can come up with, otherwise I may find myself dropping my membership due to lack of others' interest.

Wednesday, December 08, 2004

Catching Up
Back from the conference today, and I spent quite a bit of time catching up on email. One of these emails was from ML about how to traverse an RDF list in iTQL. Curious about his use of subqueries I had a go myself, and immediately found an exception.

It turns out that if a subquery is doing a walk using a blank node as the starting point, then Kowari will throw an exception as it tries to localize the blank node. I know that this has worked for me before, so it has left me a little confused. I had to go and meet Bob at this point, so I emailed AN about it.

Masters
Things went well with Bob, and his recommendation at the moment is to stop reading for a bit and concentrate on writing my confirmation. He suggests that if I have any "holes" in my knowledge in the literature review, then I should leave an annotated space for the moment, and come back to it later.

Another suggestion was to write a semantic matrix for the top few papers. This means writing a list of all the things of interest to me, and then describing each paper according to each criterion. Sounds useful, so I'll be drawing something like that up over the next few weeks.

SOFA
There wasn't a lot of time at the end of the day, so I just had enough time to get the latest SOFA version and put it into a local CVS project. Creating a CVS project was new for me, and seems a little clunky, but after a discussion with DJ I agreed that it was probably appropriate to create projects that way. The big issue is that we keep a lot of things in separate projects around here rather than separate modules in a single project, which means that we do a lot more project creation than is really common.

I want to write about yesterday's Evolve conference, but I'll have to come back to it.

Monday, December 06, 2004

Tests and SOFA
Quiet day today. With the latest release happening I was intent on testing TKS as much as possible. SR had been having trouble, so I also picked up a clean checkout to confirm that all was well (which it was).

I discovered what had happened to my file modifications on Friday evening. I'd done everything required, but I hadn't removed the org.kowari.resolver version of the directories. So while all the new stuff was correct, the old code was also there. I'd been in such a hurry to leave on Friday that I failed to notice that the files that "lost" their changes were actually in the wrong area. TJ and SR found this problem, as the system was trying to compile these extra classes, and kept giving them errors.

In the meantime I finished reading the SOFA introduction and design whitepaper, and have now started making my way through the API. Next, I'll have to look at KA's implementation of the net.java.dev.sofa.model.OntologyModel and net.java.dev.sofa.model.ThingModel interfaces.

Example Rule
It turned out that the owl:inverseOf statement from OWL Lite hadn't had its entailment iTQL documented. This was pretty easy, and I'm surprised that it wasn't already done. However, I updated some of the example RDF, and wrote the iTQL along with a description for it. Getting everything right and tested can take a little while, so by the time I'd finished GN had gone home for the day, meaning it won't make it into the documentation this time around.
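
For reference, the entailment is: if $p owl:inverseOf $q, then every statement [$s $p $o] entails [$o $q $s]. The iTQL comes out roughly like this (the model URIs here are placeholders, and the real documentation spells things out more carefully):

    insert
      select $o $q $s
      from <rmi://example.com/server1#base>
      where $s $p $o
        and $p <http://www.w3.org/2002/07/owl#inverseOf> $q
    into <rmi://example.com/server1#inferred>;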

This means that the only OWL statements which aren't documented with iTQL are from versioning. Theoretically this can involve inferencing as well (is a document that is declared compatible with another actually consistent?), but we won't be considering that in the first instance. Well... maybe TJ wants to, but I won't be. :-)

Friday, December 03, 2004

Remote Moves
Today was spent moving the remote resolver out of Kowari and into TKS. This means that distributed queries will now be restricted to TKS only. A distributed query is one which uses models from more than one server.

The reason for the move is because the remote resolver was never supposed to be in Kowari yet. The only system to have distributed queries in the past was TKS. With the new resolver framework this functionality is being performed by the remote resolver, so this resolver is only supposed to be in TKS.

TKS is built on top of Kowari, meaning that as Kowari changes, so too does TKS. With the recent release coming up, Kowari was moving quickly, and I needed to make sure that the remote resolver was working with the latest build. To create the resolver in TKS would have meant updating TKS all the time, which would have become too expensive to get any effective work done. So I built it all in Kowari, with the intention of eventually moving it all.

I'll confess, it was in the back of my mind that maybe someone would forget about it and it would accidentally make it into the Kowari release, but unfortunately that didn't happen. :-) Not to worry. As TKS picks up new features (the so-called "value-adds") then older features like this will eventually make it back into Kowari. OTOH, maybe someone out there will really want this feature and do it themselves. After all, it is probably the simplest of all of our resolvers. I'm tempted to do it myself, but somehow I think that would be inappropriate. :-)

Be that as it may, the day was spent moving this stuff. That meant changing package names, removing the open source licence, changing config files, removing files from one part of CVS and adding them to another... the list went on.

By the end of the day everything was working, and I went to do one final test. However, the build failed to compile, and when I looked, all of the source files in TKS were back to their original (Kowari) form. I don't know how this happened, but it did not take too long to fix. The real work had been in the configuration files and the build scripts, so I had it going again in about 5 minutes.

While testing the changes, I read more about SOFA and KA's implementation of it in Kowari. I haven't quite finished the documentation yet, but I've covered most of it.

Thursday, December 02, 2004

Note
I'm too tired to proofread this (or yesterday's). I'll just have to do it in the morning. In the meantime I'd be grateful if you'd skip over the typos, and try to infer what I really meant when I wrote garbage. :-)

Final Tests
TJ checked in a lot of things the previous night, so he was keen for me to do a full set of tests again. I really needed to run them again anyway, as my machine had shut down overnight for some reason.

This latest test showed up a new problem with the filesystem resolver, but this one did not look as simple as the one I'd seen yesterday. I showed it to ML, who recognised the problem and assured me that it was not something that I had broken. I didn't think it was, but you never know!

With everything passing, I was finally able to check this code in. I wanted to commit the files individually, so I could annotate the commit for each file appropriately. This meant that I had to carefully choose the order of commits, and try to do it when everyone else was out, just in case I made a mistake in the ordering, and they picked up a half completed set of files.

SOFA
While the tests were running I was able to read a lot more about SOFA. It certainly has a lot going for it. The main shortcomings that I see are that it does not scale very well, and there are a few OWL constructs that it cannot represent.

In the case of the latter, this is not really a problem, as most of the things that it can't do are OWL Full, with only a little OWL DL missing. For instance, there is little scope for writing relations for relations. Restrictions do not seem to cover owl:someValuesFrom or owl:complementOf, and unions on restrictions are not covered at all. However, where SOFA does not permit certain OWL constructs to be represented, often there is an equivalent construct which will suit the same purpose.

The scaling issue is really due to one of SOFA's strengths. SOFA manages to keep all of its data in memory, such that it knows what kind of relationships everything has to everything else. Our RDF store scales much better, but there is no implicit meaning behind any of the statements. As a result, modifying anything in a SOFA ontology results in consistency checks and inferencing checks being done quickly and easily. To do the same in RDF means querying a lot of statements to understand the relationships involved with the data just modified.

So while SOFA won't apply well to a large, existing dataset, it works very well with data that is being modified one statement at a time. It's a nice way of dealing with the change problem that I've avoided up until now. Experience with this should also help to apply similar principles to changing data in the statement store. Similarly, it may be possible to apply some SOFA inferences on our data by using appropriate iTQL, making the SOFA interface more efficient.

One way to make SOFA work with larger data sets is to serialize an ontology out to a file, and then to bring it back in via SOFA, but this is not very efficient. For this reason, the need to write a proper rules engine has not been removed. I had been wondering about this when I discovered that SOFA did some inferencing.

Sub Answers
Today TJ discovered a problem with some answer types running out of memory. This occurs when an answer is serialized for transport over RMI. The problem is that an answer might have only a couple of lines, but those lines contain subanswers which are huge.

When I serialized answers for RMI it occurred to me that I didn't really know how large a subanswer could be. I initially worried that a large enough set of subanswers could be too much to be handled in memory. However, I couldn't see subanswers getting too large, and so I went ahead with what I had.

Never commit code when you think that in some circumstances there could be a problem with it. I know that. Why did I choose to forget it?

After a few minutes of thought I came up with a solution. The RemoteAnswerWrapperAnswer class determines how to package an answer for the network. This decision needs to be replaced with an Answer factory. The factory then makes a choice:

  • If the answer has subanswers as its rows, then return an RMI reference (higher network overhead, but no memory usage).
  • If the answer contains data and is less than a configured number of rows, then serialize the subanswer.
  • If the answer contains data and is larger than a configured number of rows, then use a paged answer object.
This factory needs to be used recursively as subanswers are traversed. This will result in the large chunks of data at the bottom of the answer tree getting serialized effectively, while the tree itself will be traversed with RMI. The result should handle any amount of data. It shouldn't even take too long to implement.
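
In sketch form, the decision looks like this (the class names below are invented for illustration; only the three-way logic is the point):

    // Sketch only: class names invented, and the row limit would come
    // from configuration.
    public class AnswerFactory {
      private static final long MARSHALL_LIMIT = 1000;  // configured row limit

      public static Answer wrapForRmi(Answer answer) throws TuplesException {
        if (hasSubAnswerColumns(answer)) {
          // Rows hold subanswers: hand back an RMI reference. Higher
          // network overhead, but no memory cost, and each subanswer is
          // wrapped by this same factory as the client reaches it.
          return new AnswerRemoteReference(answer);
        }
        if (answer.getRowCount() <= MARSHALL_LIMIT) {
          // Small flat answer: serialize the lot.
          return new SerializedAnswer(answer);
        }
        // Large flat answer: page it over the network.
        return new PagedAnswer(answer);
      }

      // Hypothetical helper: true when the answer's rows contain nested Answers.
      private static boolean hasSubAnswerColumns(Answer answer) {
        throw new UnsupportedOperationException("sketch: inspect column types here");
      }
    }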

Remote Resolver
One of our "value adds" (I hate that term) for TKS is to support distributed queries. All queries may be made to a remote server, but only distributed queries can refer to more than one server in a single query.

These distributed queries now get handled by the remote resolver. When I worked on this resolver I kept it in Kowari, but now that it works properly it has to be moved to TKS. As new features come into TKS, they may reduce the "high-level" value of the remote resolver, and hence allow it down into Kowari (presuming someone else hasn't rewritten it already - after all, that's what open source is about). But for the moment, it has to stay a TKS-only feature.

AN had been looking to move this code, but was waiting until I finished with the RmiSessionFactory code. However, he was having a little difficulty with it, so now that I'm done with the looping bug I've been asked to move the remote resolver myself.

Other than changing the package names of the code, the only real differences seem to be in the Ant build scripts. By the end of the day I'd managed enough that TKS will now build the remote resolver, but I have not yet run all the tests on it.

In the meantime I tried the tests on Kowari now that the remote resolver is gone. The first thing that happened was that many of the Jena tests and all of the JRDF tests failed. I agonized over this for a while, but then I realised that I'd been doing the TKS build while these tests were running. According to AN, a TKS build can briefly start a server, which would definitely conflict with any set of Kowari tests being run at the same time.

I'm now re-running the Kowari tests, and I have my fingers crossed that when I get there in the morning they will have all passed. Then I just have to see how well (or poorly) the TKS tests run.

Wednesday, December 01, 2004

Class Paths
Well it was nearly working today. The tests which failed seemed unrelated to the changes I'd made, so I was initially quite confused. A little logging eventually showed the way.

The problem was occurring during the constructor to RmiSessionFactory where it gets the URI of the local server from EmbeddedKowariServer via reflection. The confusing part was that it was giving a ClassNotFoundException for the class "SimpleXAResourceException", which at first glance appears to be completely unrelated.

Coincidentally, it was less than a week ago that I was having a conversation with DM about just this. Even though I only needed a simple static method from EmbeddedKowariServer, the class loader does not know this, and so attempts to load up the entire class. This includes having to recursively load all the classes for the return types and the exceptions. The SimpleXAResourceException was being thrown by the startServer method, and since this class wasn't available, the reflection code for this class failed.
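
To illustrate (the class and method names here are assumed, and exception handling is elided):

    // Sketch of the failure mode. Reflecting on a class forces it to be
    // loaded and linked, which resolves the parameter, return and declared
    // exception types of its public methods.
    Class serverClass = Class.forName("org.kowari.server.EmbeddedKowariServer");
    // Fails with a ClassNotFoundException for SimpleXAResourceException:
    // startServer() declares it, and building the Method objects forces
    // that class to be resolved, even though all we wanted was one
    // unrelated static method.
    Method getUri = serverClass.getMethod("getServerURI", new Class[] {});
    Object serverUri = getUri.invoke(null, new Object[] {});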

There were two approaches to this problem. The first was to make sure that all referenced classes are available. However, this is fraught with difficulty for two reasons. To begin with, I'll only discover which classes are needed at runtime. So even if I figured out where I needed to add SimpleXAResourceException, I could simply end up with a report on the next class which wasn't found.

The second problem is that it becomes difficult to know that I caught everything. In some instances the classes are all available and everything runs flawlessly. In other parts of the code some classes are not available. I don't know where or when the classpath changes, and while I might be able to get it working for every code path run in the tests, it's always possible that a new type of usage will call this code again without some required classes available.

The other approach was to factor out all the static information from EmbeddedKowariServer and put it in a simple public class specifically designed for handling this information. This works much better, as it has no dependencies outside the standard java.* packages. It also has the nice effect of putting all the server configuration info into one place.

The class I built for this was called ServerInfo. All methods and variables on this class are static. The getter methods are public and the setter methods are package scope, the intention being to only call them from EmbeddedKowariServer.
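
In outline it looks something like this (the fields shown are illustrative, not an exact listing):

    import java.net.URI;

    // Outline of ServerInfo. Everything is static; getters are public,
    // setters are package scope so that only EmbeddedKowariServer (in the
    // same package) can call them.
    public class ServerInfo {
      private static URI serverUri = null;
      private static String boundHostname = null;

      public static URI getServerURI() { return serverUri; }
      public static String getBoundHostname() { return boundHostname; }

      static void setServerURI(URI uri) { serverUri = uri; }
      static void setBoundHostname(String hostname) { boundHostname = hostname; }
    }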

As usual, the tests to make sure all was well took a very long time to run on each occasion. In the meantime I used the opportunity to learn more of the SOFA API.

One little hiccough that I encountered was with the "filesystem" resolver. Fortunately, it turned out that it was just a JXUnit test that was getting back XML which differed from what it expected. The problem was that someone had checked the file in with their own computer's name hardcoded in it. I checked the CVS log to see who the culprit was, and discovered that the problem had already been recognised and fixed.

With everything apparently going, I did a final CVS update and started the tests again for the night. At that point, the fact that it was all working at last, plus a headache that I'd developed in the meantime, convinced me to leave half an hour early. :-)

Tuesday, November 30, 2004

Local Sessions
My idea to put LocalSessionFactory in with RmiSessionFactory didn't work out quite as I'd planned. The classes associated with RmiSessionFactory were to be found in server-rmi-base-1.1.0.jar, which was really the wrong place to put the local session factory classes. So I made sure that server-local-1.1.0.jar was included everywhere that server-rmi-base-1.1.0.jar was. However, this still didn't prevent the ClassNotFoundExceptions.

I tried placing this jar into numerous places without any luck, so in the end I started putting the LocalSessionFactory classes into different jars. This didn't work either. I spoke to various people who ought to have known, but no ideas were forthcoming. I guess this explains why TJ wants to change the classloader framework!

Finally, I was frustrated enough to just put it all in server-rmi-base-1.1.0.jar. This eliminated the ClassNotFoundException, and left me with an exception relating to unknown transaction phases.

At this point I took a step back to re-examine what I was trying to do.

When the original code needed a session factory, it would always get an RmiSessionFactory, which wrapped a RemoteSessionFactory, which wrapped a local session factory. The local session factory is configurable, but at this stage it is always a Database object (Database implements the SessionFactory interface). So if I wanted to avoid RMI altogether, all I needed to do was get hold of a Database object, just like RemoteSessionFactory does.

The Database object is a single object created by EmbeddedKowariServer (well, it's actually created by the ServerMBean that EmbeddedKowariServer creates). The RmiServer class uses EmbeddedKowariServer to find this object, and gives it to the RemoteSessionFactory constructor. So if I wanted to use a local session factory, all I should need to do would be to get it from EmbeddedKowariServer.

I tried this, and again it didn't work. This time the exceptions all said "Unable to commit", and the database files had not even been created on the disk. I discussed this with DM, but we were unable to work out exactly what was happening. All I can guess is that the RmiServer somehow initialized the database before using it for the first time, but I couldn't see where that happened.

I'd now made it to the end of the day, with very little progress. At this point I realised that about the only thing I was trying to do was to avoid using RMI with descriptors. All other "clients" would occur in another JVM and RMI couldn't be avoided anyway. So it seemed reasonable to use RMI for descriptors as well.

To do this I just needed to change the test in RmiSessionFactory so that it would allow a connection back to the current server whenever the URL used to find the server was exactly the server's own URL. This is safe, as the looping bug only occurs when connecting to a server using a name that the server is not familiar with.
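
The changed test amounts to something like this (the helper method and variable names are assumed):

    // Sketch of the adjusted test. A connection made with exactly the
    // server's own URI cannot loop, so only unfamiliar aliases of this
    // machine need to be rejected.
    if (isThisJvm(serverUri) && !serverUri.equals(localServerUri)) {
      throw new NonRemoteSessionException(
          "Attempt to connect to the local server via an alias: " + serverUri);
    }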

Once this was done I set the tests running, and crossed my fingers that all would be well. I'm guessing that something will go wrong, but hopefully I'm converging on the solution.

Monday, November 29, 2004

Kowari and JDK 1.5
I forgot to say... Last Thursday night I got Kowari building and running with JDK 1.5. Maybe I should submit those XMLC patches.

Unfortunately, I couldn't check my changes in, as it now has problems with JDK 1.4. This is mostly because the XMLC interfaces now require certain methods to be implemented, which the JDK 1.4 system doesn't have.

When I get some time I'll have a go at bringing in the latest W3C libraries to see if these will override the built-in libraries with the latest interfaces.

OSX and MIDP
I've also been wasting some time looking at developing for my new phone with MIDP on OSX. It looks like I can do it, so I'll have to spend some time brushing up on J2ME. All I really want is a calculator, and I can probably find one if I go looking, but where would be the fun in that?

In the meantime, I've discovered that the phone has numerous bugs. For a start, it can't render Slashdot. It starts to show some headers, but when the page is complete all that is visible is a very large page of whitespace. The browser also makes it impossible to select some links on certain pages (for instance, the Sony Ericsson page!).

The other big bug is in the email client. It supports SSL, so after fiddling with the configuration for a couple of days, plus some support from my phone provider, I finally got it to connect to GMail. If there is no mail there, then it all works fine, connecting to the server, and telling me that I have no new messages. However, if the smallest text message is present, it will start to download it, and then hang. After a few minutes of complete unresponsiveness it starts to flash the screen and then reboots. Yuck. At least I can make phone calls with it. :-)

Discussions
With TJ back we spent a little time talking about the needs of the customers he spent the week with. Most of it seems to come back to the work we are currently doing, but I think it has focused TJ's ideas a little.

We had a chat this afternoon about the OWL resolver he would like to see. I was initially worried about this, but in the end I've been mollified. Basically, it will return statements from a model of inferenced data, combined with the model of base facts. The techniques I'm using to create an inferred model will be essentially the same, but now there will be some extra information on how the models relate. This will allow many operations to occur automatically whenever the OWL resolver is used. I'm all for this approach, so I'm happy to go ahead with it.

In the meantime TJ is very keen on the idea of the "change problem", which I've been avoiding up until now. While I know how essential it is, I think I've talked him into letting us get inferencing working correctly first. I think this is important, as we need to learn to walk before we can run. While I've done no work and little research on the problem so far, I have been thinking about it, and I can think of several things we can do to make this work more efficiently than the brute-force approach. However, I also think that we will learn some lessons as we implement the brute-force method. So I believe that putting off this design for the time being will allow us to get a working (though slowly updating) system in a shorter timeframe, and allow the final version to be better designed and faster than if we attempted it from the outset.

Looping Bug
I was back onto this after a weekend away from it all.

An initial discussion with AN revealed that the JRDF and Jena code actually use the LocalSessionFactory class, and not RMI (yay, I was beginning to lose faith). This ran counter to SR's earlier assertion that the code which returned a LocalSessionFactory was no longer in use. AN seemed initially skeptical that no one else used anything but RMI, but I eventually convinced him.

So this left me in a position of trying to create a LocalSessionFactory and returning it from the findSessionFactory() method when the NonRemoteSessionException is thrown from the RmiSessionFactory constructor. Writing the code for this was the easy part. Getting it to work in the tests was the difficult part.
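
The easy part looked roughly like this (a sketch; the constructor arguments are assumed):

    // Sketch of the fallback in SessionFactoryFinder.findSessionFactory().
    try {
      return new RmiSessionFactory(serverUri);
    } catch (NonRemoteSessionException e) {
      // The "remote" server is really this JVM, so skip RMI entirely.
      return new LocalSessionFactory(serverUri);
    }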

Unfortunately, no one seemed to have access to the LocalSessionFactory class. I spent more than a little time looking through the myriad build scripts trying to find where I should add it, but with little luck. On each occasion when I thought I'd found it, I added the needed classes to the appropriate jar and re-ran the tests. On each occasion I discovered that I hadn't found the correct place.

I really beat my head against a brick wall on this today, with little progress. Only as I was leaving did I think to look for the jar file which implements RemoteSessionFactory as this must also have access to the local data store. I thought I might get to it tonight, but I've spent quite a bit of time catching up on last week's blog, so I'll be doing it in the morning instead.

Busy, busy, busy
Well I've just been through a rather full week, both at home, and at work. As a result, I didn't log on to blogger once. Darn.

So what did I do? Well if only I'd been blogging as I went I'd have an easy answer to that. Since I haven't, I'll just have to write a few loose and disconnected notes. It won't be a complete description, but it will give a vague idea of what I've been on about.

I've been pretty slack about this blog in the last few weeks. It is obviously a very busy time of the year from a social context, and I'm tending to go to bed earlier at the moment due to training (I do a lot of blogging at night).

Actually, the training is going quite well, and I had a great race on the 21st. I was aiming for an hour, and was initially disappointed that I got 1:00:18, but then I discovered that the bike leg was 1.8km too long (they wanted to put the turns at appropriate places in the road), so I'd have made the time if I'd done the advertised distance. I have a longer race this Sunday, and the last race has inspired me to work extra hard... which has led to me blogging less.

Sorry, I don't usually write much about training, but I've been really happy with my progress lately. Oh, and for anyone who needed proof I was at Noosa, you can find some pics at Supersport images if you look for BIB number 2426. (You probably can't see it, but my bike has a flat front tire - very frustrating).

I'll try and make more of an effort with blogging this week.

In the meantime, here are a few notes on the past week:

Monday - 22nd
TJ is away training a client this week, so things seem a little more relaxed today. I started the day looking at doing more of the "OWL inferencing with iTQL" documentation, but by the end of the day I was back onto the looping bug.

Just to describe this bug again...

DM has put some effort into divorcing model names from locations. Consequently, it is now possible to move models from one machine to another. However, we still don't have any way of separately describing locations and names, so model names still have their location in them. The difference is that the model can be moved and the new location in the name will reference it correctly. (In the past the whole name had to be the same, including location. This is why models couldn't move from one machine to another).

The problem with the current code is that it expects models to be restricted to a specific set of names on the current machine. Hence, a model which is called rmi://machine.domain.com/server1#foo must be referred to by that name. It can't be called rmi://machine/server1#foo even though that machine name resolves to the same address.

This problem manifests when a query is made using a name that the server does not recognise. If the name rmi://machine/server1#foo is queried, then the request will be sent to the address for "machine". Since this server does not recognise that particular name (it expects the names machine.domain.com or localhost), it forwards the request on to the server named "machine". Not only do we get infinite recursion, we even do it through RMI! :-)

I had thought that I could try and compare the hostname in the model name. However, this isn't a complete solution, as it is always possible to miss an IP address on the current machine, or a DNS server could hold a name for a host which that computer has no knowledge of. As a failsafe, the host has to be queried for its "servername" and this will get compared to the servername on the connecting client.

The servername is found in the startup class EmbeddedKowariServer, which determines the name from the user configuration, or else calculates a default name. However, as the program starts, the EmbeddedKowariServer class is unable to see the RmiSessionFactory class in its classpath, as it isn't until later that the appropriate class loaders are set up.

Similarly, the RmiSessionFactory class is unable to see EmbeddedKowariServer in its classpath while compiling. However, I experimented a bit with reflection, and discovered that EmbeddedKowariServer was available at runtime. That took me until the end of the day.

Tuesday - 23rd
I started out by modifying the session interface so it could test both ends of an RMI connection in the constructor of RmiSessionFactory. If they compared equal, I initially expected that I could silently change the RmiSessionFactory to return new local sessions instead. However, I had difficulty getting access to a class which would do this, and after some discussion with DM and SR I changed tack. Incidentally, all of my suggestions on how to proceed were considered to either be inappropriate, or else impossible by both SR and DM. However, some of what I'd already done was working fine, which invalidated some of the "impossibility" arguments I was given. :-)

When the RmiSessionFactory constructor detects that the client and server are on the same machine it now throws a new exception called NonRemoteSessionException. I changed the SessionFactoryFinder to pick up this exception, and fall back to using a local connection. However, even though the code for a local session was all there, the class paths were not. Asking SR about it, he said that this code would never be run anyway, and that it should probably be excised (he was wrong, but I didn't discover that until the following Monday).

This led to a discussion which resulted in two things. The first was that I needed to try to recognise the name of the current machine, and change the requested model name wherever possible. That means picking up all the known IP addresses, and comparing these to the results of a DNS lookup on the host name in the model name. This would catch almost all problems. The second part was to continue to test the client and server for their names, to prevent loops where the first method fails.
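
The first part reduces to something like this, using the standard java.net classes (the method name is my own):

    import java.net.InetAddress;
    import java.net.NetworkInterface;
    import java.net.SocketException;
    import java.net.UnknownHostException;
    import java.util.Enumeration;
    import java.util.HashSet;
    import java.util.Set;

    // Sketch: gather every IP bound to a local interface, then check
    // whether the hostname from the model URI resolves to one of them.
    static boolean isLocalHostname(String host) throws SocketException {
      Set localAddresses = new HashSet();
      for (Enumeration ifs = NetworkInterface.getNetworkInterfaces(); ifs.hasMoreElements(); ) {
        NetworkInterface ni = (NetworkInterface) ifs.nextElement();
        for (Enumeration addrs = ni.getInetAddresses(); addrs.hasMoreElements(); ) {
          localAddresses.add(addrs.nextElement());
        }
      }
      try {
        InetAddress[] resolved = InetAddress.getAllByName(host);
        for (int i = 0; i < resolved.length; i++) {
          if (resolved[i].isLoopbackAddress() || localAddresses.contains(resolved[i])) {
            return true;
          }
        }
      } catch (UnknownHostException e) {
        // Unresolvable name: treat it as remote.
      }
      return false;
    }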

Wednesday - 24th
The first half of this day was spent at the WISE conference being held here in Brisbane. My school organised my attendance, but I could only afford a small amount of time at it. Fortunately, I got to see my supervisor delivering a paper on his current work, and it explained a couple of points for me. Most of the other presentations were a bit lacking.

The worst part was that many presenters were not there. In some cases they had been denied visas, but in quite a few cases presenters had shown up for registration on Monday, and had then spent the rest of the week at the Gold Coast instead. I have no objection to them not attending for most of the time, but it just seems rude to not show up for your own presentation.

I also learned something surprising about conferences. Bob wasn't impressed with a few of the presentations, and so I asked if the presenters were ever given feedback on their performance. I was surprised to learn that this never happens. The closest thing to feedback is the number of questions posed to the presenter, but this could indicate numerous different things, so it becomes hard to judge.

As a future presenter, this has its good points and bad. As a plus, it's nice to think that I'd gain credit from the school simply for presenting the paper, and I can be as bad as I like. Indeed, I can even leave a conference feeling really good about myself, no matter how I performed (who wants to have their ego deflated?). However, the negatives are probably bigger. Unless I have some brutally honest friends about, I'm unlikely to get honest feedback, which will make it very hard to improve. Let's face it, no one is great at their first presentations, so I will be looking for every avenue for possible improvement. It's just a shame that the obvious one is not there.

After returning to the code I started on the methods for renaming models to a canonical form. I continued some of it in the evening, but it was not quite finished on this day.

Thursday - 25th
It took me a few hours, but by lunch I'd completed the methods to return canonical forms of a model name. Model names had to go back and forth between local and global space, which made it a little less straightforward, but it seemed to work well.
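
The global-space half of it boils down to something like this (a sketch, reusing the host check sketched above; port handling is elided):

    import java.net.SocketException;
    import java.net.URI;
    import java.net.URISyntaxException;

    // Sketch of the canonical form (names assumed): rewrite the host part
    // of a model URI to the server's canonical hostname whenever the URI
    // merely uses an alias for this machine.
    static URI canonicalize(URI modelUri, URI serverUri)
        throws URISyntaxException, SocketException {
      String host = modelUri.getHost();
      if (isLocalHostname(host) && !host.equals(serverUri.getHost())) {
        return new URI(modelUri.getScheme(), serverUri.getHost(),
            modelUri.getPath(), modelUri.getFragment());
      }
      return modelUri;
    }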

The next job was to find all points in the code which attempted to use a model name, and make sure it was replaced with the canonical version. This mostly happened in DatabaseSession but when I'd finished I did a little grepping and discovered that it was needed in some new classes called SetModelOperation, RemoveModelOperation and ModifyModelOperation. The appearance of these told me that each operation is starting to be factored out into its own class. Later in the afternoon I was having an implementation discussion with AM and discovered that he is the culprit. I'm expecting to see more operation classes soon.

By the end of the day I was reasonably confident that this bug was nailed. Everything appeared to be working, so all that remained was a full set of tests to be run in the morning.

Friday - 26th
The tests failed. Doh!

Unfortunately, our full set of tests now takes some time to run, so this stage can get very slow. I got to spend a bit of time reading documentation while tests ran. It can be very frustrating.

I finally tracked the problem down to the Kowari Descriptors. These are essentially clients which run within the server. When they try to create a model they explicitly ask for an RMI connection to the current server and send the request out on that. Since the client and the server are on the same machine, the RmiSessionFactory constructor throws an exception, and the test fails.

So what went wrong? Well, it turned out that a lot of my code for handling the problem worked by intercepting the Resolver layer. I was under the impression that all requests were now supposed to go through this layer. However, the ItqlInterpreter does not do this. Instead, it obtains a session factory directly, and asks for a new session from it. Since the model will always start with "rmi://", the factory will always be an RmiSessionFactory. I discussed the problem with SR, and he explained the code which did some of this work (to the annoyance of KA, who was trying to work in the same room. Sorry about that KA).

SR suggested not setting a session to a server, as this was supposed to leave the session connected to the local server. I was skeptical of this, as I couldn't see how the local server was ever set, but I gave it a go anyway. Unfortunately, this simply resulted in a lot of errors referring to a NullSessionException.

Closer inspection of this code revealed that ALL connections were being made with RMI, and only the RemoteSessionFactory is able to talk to a database on the same machine as itself. Since most clients are in a separate JVM this is not a big issue, but it certainly causes problems in some circumstances. More discussion with SR led me back to Monday's (now deleted) code which attempted to get a local session when an RmiSessionFactory constructor failed. However, that had failed on Monday, so now I needed to figure out how a RemoteSessionFactory was able to get these sessions for itself.

And that got me to the weekend (now that I know it's OK, it gives me some kind of perverse pleasure to start sentences with a conjunction. Not really sure why. Maybe it's because I know I'll be annoying someone out there who thinks that you can't do it). :-)

I was nowhere near as productive as I'd like, and I hate that. In particular, I'm frustrated that I haven't been able to return to working on OWL code. At least it will give me incentive to get this done as quickly as possible.

Friday, November 19, 2004

XMLC
I mentioned that I would talk about Barracuda and JDK 1.5, only I forgot.

The main problem is that Java now includes updated org.w3c.dom packages, which now implement DOM Level 3 (JDK 1.4 only did DOM Level 2). The new code includes numerous extra methods, and a few new subpackages. The subpackages don't affect Barracuda, but the new interface methods do, since Barracuda does not implement them.

At the moment I'm just putting all the method signatures into the correct places in Barracuda. I'm not doing any work in those methods (since no Kowari code will ever call them), but I'm putting in all the Javadoc, which is time consuming. The only thing these methods do is throw an appropriate exception (usually UnsupportedOperationException).
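
Each one ends up looking like this (getFeature is a real DOM Level 3 method on org.w3c.dom.Node; the Javadoc is trimmed here):

    // DOM Level 3 addition. No Kowari code path calls it, so it just throws.
    public Object getFeature(String feature, String version) {
      throw new UnsupportedOperationException("DOM Level 3 getFeature not supported");
    }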

Maybe if I submit these changes someone can use this as a template for a proper implementation.

Remote Server IDs
I don't want to spend long on this blog tonight, as I'm trying to upgrade Barracuda to JDK 1.5, and I want to get back to it.

I spent a little while talking with DM today about the new bug with remote server names. We are finally storing model names as relative URIs (well, it takes URIs, and if they happen to be URLs, then it accepts relative URLs). This is great, but relative URLs now mean that we can refer to a model using numerous machine names, while the current code expects only the canonical machine name. The result is that if a query is issued against a model using a URL based on a non-canonical machine name, then the server doesn't recognise the model as being local. Instead it connects to the "remote" machine that it finds in the URL, and sends the query on to that machine.

An example might be a query on a model stored on a machine named mycomputer.domain.com. The model's URI could be rmi://mycomputer.domain.com/server1#model. If the model is instead referred to as rmi://mycomputer/server1#model then the query will go to the correct server, but the server will not recognise that the model is local, so it will make a connection to mycomputer and forward the query. This leads to an infinite loop.

Unfortunately, it is not possible for a computer to know absolutely every name that it can be known by. For instance, all it takes is a new entry in a DNS for a previously unknown name to come into effect. As a result, a client needs some way of telling whether the server it is connected to is actually the local server. This is easily accomplished by giving each server a unique identifier and checking for it with each new connection. I had considered some kind of UUID, but server URIs are really supposed to be unique, and they can be used to give the user useful information in the event of a failure.

I had a chat with DM about the DatabaseSession.createModel() method today, but after thinking about it I realised that it operates a little differently to the other types of operations. For a start, it refers to models which should not exist, but which nevertheless represent valid URIs. It then makes the assumption that the model is local, and does a lot of its own work to create the model, rather than passing the work on to a resolver, like other operations do. However, there is no reason that it can't handle non-local URLs, and pass the createModel request on to the appropriate server. It just needs a little extra checking, and then it can pass the request on to a RemoteResolver instance.

Another thing that makes this operation a little different is that it will probably never get called unless the model being referred to is local. This is because the iTQL client (and the ItqlInterpreter class) will only connect to the correct machine to make this request. So you'd think I could just ignore this operation. However, there are other ways to call this method (eg. JRDF does it), so it would just be asking for trouble if I didn't cover this case.

Finding the ID
In order to find the server URI, I had to start looking in EmbeddedKowariServer. SR has been complaining about this code for some time, comparing it to a post-apocalyptic wasteland where nomads have taken pot shots at various pieces of code as the need arose. After looking at the code I started to see his point.

For instance, there is a static value which holds the hostname, and another instance variable which also holds the hostname. There are two places where the instance hostname can be set, and the static hostname gets set on the following line. One page later appears the line:

EmbeddedKowariServer.setBoundHostname(this.getHost());
Of course, this just sets the static hostname to the instance hostname, which is what it was already set to. Someone wasn't paying attention here.

Now I was looking for the part of the code which would give me the server's URI, and I could do that by checking which parts of the code referred to the hostname. Unfortunately, silly things like the one I just described make it annoying to track down every part of the code which refers to the hostname.

Anyway, I finally found what I want in the constructor for RmiServerMBean (wow, I haven't seen MBeans since I implemented the JMX framework for Enhydra in 2000). However, there is no simple way to get the URI from this object, mostly because the code where I need it (RmiSessionFactory and RemoteSessionFactory) doesn't know anything about the server MBean that launched them. (Not a very clean MBean design either. These things are supposed to provide instrumentation and control for other objects, not actually implement a service).

I spoke with SR about this problem (among other things - he really hates EmbeddedKowariServer. I mean, who ever heard of a database that contains an HTTP server? It should be the other way around! Kowari would be a lot smaller and more modular then). We eventually agreed that since a server will have only one public URI, then it can be kept statically, and we can just put it in a central location (like EmbeddedKowariServer) to allow anyone to find it.

Exceptions
The other trick is working out where to do the test of the remote server (to check if it is local), and what to do in the event of a failure.

The thing that seems the most logical is to do the test in RmiSessionFactory just after it has obtained a RemoteSessionFactory (I alluded to this above). If it turns out that the RemoteSessionFactory is on the same server, then it should be closed, and an exception thrown. The SessionFactoryFinder.newSessionFactory() code which called the RmiSessionFactory can then catch this exception, log a warning that a non-standard name was used, and then use a local SessionFactory instead.

I made a start on this code, but it was Friday afternoon, so I wasn't going to pretend to myself that I wanted to finish this before Monday. :-)

Oh, and I looked a little at the documentation of iTQL for OWL predicates, but that was all in the morning.

Thursday, November 18, 2004

Documentation
Today has been spent reading yet more OWL documentation, and looking again at some Kowari docs. There are still a couple of OWL predicates for which there are no iTQL implementations, and I need to either provide them, or explain the current shortcomings of iTQL.

TJ has also set me a series of tasks, starting with solving a problem of a server knowing not to make a remote connection to itself when using a Remote Resolver. If a model is to be found on a particular server, then the name provided could be any one of several qualified or unqualified names, or even "localhost", not to mention numerous IPs. When this happens, the server has to know when to handle a query itself, and when to pass it on.

The only thing I can think of that would work reliably is for each server to generate a GUID on startup, and to set up a protocol to negotiate this with all connecting clients. That way, if a server wants to pass a query on, and the connection it makes happens to be to itself (via a name or IP that it didn't recognise), then it can drop the connection and handle the query locally. That way there can be no IPs or names that escape the server's attention (eg. dropping and restarting a network connection could allocate a previously unknown IP to a server).

So I'll be spending my next few days working on this, plus the documentation mentioned above.

Other than that (and filing my tax return) it has been a very quiet day.

Firefox and Virtual Desktops
I've been trialing CodeTek Virtual Desktop, and overall I've been really happy with it. It's a little expensive (given that I only have Australian dollars to play with), but I think I'm using it enough to justify the expense when my free trial runs out. The only thing that makes me wonder if I should get it is that Apple might try providing virtual desktops as a built-in part of Tiger.

One problem with the CodeTek software has been that it doesn't play well with Firefox. I normally use Safari as it has a full Aqua GUI and is well integrated, but Firefox is the browser I use for Blogger so I get all the cute features in the editor. (Camino would seem to offer the best of both camps, but it does not properly render all pages, eg. The Blogger editing page).

When I click on a link to create a new post in Blogger, or I change to another application and back, then I cannot get Firefox to accept any kind of keyboard input. Initially I was thinking that it was a Firefox problem, until I discovered that I can get the focus onto Firefox when I change virtual screens and back. Firefox is the only application I've found with this problem, so it seems to be an interaction bug. I suspect that it is because Firefox uses Carbon, while almost everything else I use is based on Cocoa (maybe I should check out IE?). Firefox is about to start moving over to Cocoa for OS X, so this problem might eventually go away even without any intervention from CodeTek.

Wednesday, November 17, 2004

More OWL
Absolutely nothing of interest to report on today.

I read more OWL documentation. I find it interesting that I can read one part of the documentation and not think much of it, but after I learn more about OWL and come back to the same document I discover something really significant that I didn't get the first time through.

As a result, I'm reading this stuff, and re-reading it. Once I work on it a bit more I'll probably have to come back and re-read it again.

One set of things I really need to search for is papers which describe how DL was applied to OWL to arrive at OWL DL. Ian Horrocks probably wrote some of those, so I should go looking there.

Phone
In the midst of this, my new phone arrived. It's a Sony Ericsson K700i, and I got it on my current account for no extra money. Apparently I'm already being ripped-off adequately. :-)

I just wanted a phone that can do Bluetooth, so I can sync it to my PowerBook. Well, it does that really well (I've been impressed by some of the connectivity features, both built-in, and through Salling Clicker), but the other features look very nice as well. It comes with video (no 3G unfortunately, so it only works with MMS) and a snapshot camera, has a very large screen, and runs Java programs really nicely (it seems to implement Java3D. Does this phone have hardware 3D support?). All of this comes in a small and light phone, which is another important consideration. Unfortunately, it has a sharpish edge which I find uncomfortable against my ear. Oh well, maybe I'll get used to that.

There are a couple of configuration issues that bother me. First, I can't change the volume of things like the camera snapshot sound (it's REALLY loud). I also wish I could change the volume of different ring tones. For instance, the main ring tone is just fine, while the MP3 I've recorded for Anne's ringtone is a bit soft. In fact, all recorded MP3s sound soft, so it would be nice to either modify the levels on the recording, or else just allow each item to be played back at an individually assigned volume.

One feature I like is that I also have individuals' photos come up whenever I get a phone call or message from them. I'm used to individual ringtones, but I quite like this feature (though I'll probably get over it soon enough).

Overall, I like the phone. I suppose I should start brushing up on my MIDP again.

Tuesday, November 16, 2004

CVS
I mentioned to AN how he had not called setRows on the "trans" and "walk" constraints when he added the code which called getRows on all constraint types. He insisted that he had, and so we looked at the CVS logs.

Sure enough, he had put this code in, almost verbatim to my own changes (he used an extra space when casting the constraint). CVS dutifully logged both of our changes, so it was plainly visible that we had both made this modification.

So how was I not able to see AN's code? I had done numerous CVS updates, and even checked in my own modifications (overwriting AN's), all without complaint from the system. There were no conflicts or updates to be seen on this file.

So I'd wasted Monday on a bug that should never have existed, and Friday seems to have been wasted on a similar problem (although the HybridTuples problem found while trying to print the constraint tree was an important fix). Similarly, I'd witnessed CVS failing for AM a couple of weeks ago. So what is the deal here?

I mentioned this problem to TJ at our morning meeting. Unfortunately, I have no idea why CVS might be behaving this way, nor any reasonable suggestions for an alternative file repository.

XA API
I went back to the RemoteResolver code to see what I needed to do with the getXAResource method. Looking at the javax.transaction.xa package just confused me, as I couldn't work out where the Xid was supposed to come from.

I finally started searching for examples. The most informative I could find was an IBM site. Once I got to see how client code uses this interface I got a much better idea of what was required... and it became clear very quickly that the Resolver interface is not built to be able to handle this API.

The main problem is that the transaction manager needs to associate an Xid with a session. A session will get associated with a RemoteResolver instance, so the Xid will really get mapped to a RemoteResolver instance (in fact, it will get mapped to a set of resolver instances). So when the Xid is first used (by calling XAResource.start(xid,...)) the mapping to the resolver needs to be set up. This would usually be done by creating the resolver in the start(xid,...) method. However, the getXAResource() method is currently a non-static member of Resolver. This means that the Resolver object has to be created before it can be mapped to an Xid, which could cause problems if it gets used before a transaction that it will become a part of. A bigger problem is that the XAResource will be associated with a single resolver session, and yet it is supposed to manage numerous sessions with different Xids. Since this last proposition is impossible, it becomes clear that getXAResource() cannot be attached to the Resolver class. The appropriate place is probably the ResolverFactory, though that will still restrict it to a single Resolver type, when XAResource objects are supposed to be able to manage an entire transaction (involving multiple sessions, probably on multiple Resolver types).
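
For anyone unfamiliar with the API, a transaction manager drives an XAResource along these lines (simplified, exception handling elided; the factory call and Xid source are the hypothetical parts):

    import javax.transaction.xa.XAResource;
    import javax.transaction.xa.Xid;

    // Simplified XA lifecycle. The point is that start(xid, ...) is the
    // first moment a resource learns which transaction it belongs to, so
    // any Xid-to-resolver mapping has to be created there.
    XAResource resource = resolverFactory.getXAResource();  // hypothetical factory method
    Xid xid = newXidFromTransactionManager();               // supplied by the manager
    resource.start(xid, XAResource.TMNOFLAGS);
    // ... operations are performed on behalf of xid ...
    resource.end(xid, XAResource.TMSUCCESS);
    if (resource.prepare(xid) == XAResource.XA_OK) {
      resource.commit(xid, false);  // second phase of a two-phase commit
    }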

The upshot of all of this is that getXAResource() cannot be implemented on the current interface. Ideally we'd remove it before it got released in the next few days, but the DummyXAResource objects which are currently being returned are being used for some logging, so there would be significant changes needed, and we don't have the time.

OWL
I saw Bob again today, and we discussed OWL a little more. I'd actually done some work this time, so it all went smoothly. :-)

I still need to write up something about the differences (and reasons for those differences) between the OWL species; however, I now have some new priorities. The first thing I need to do is to get hold of a copy of the Description Logic handbook. There is a copy in the library, but it is currently out. It is a relatively recent publication, so it is unlikely I will find it online (though I can find the chapters for it easily enough). While discussing this, Bob was able to explain some of the differences between Description Logic and Datalog. I knew some of it (and I still need to learn more), but it has been a little confusing for me, as different people have referred to "DL" with some meaning Description Logic, and others meaning Datalog. I understand that DL is supposed to mean Description Logic, but not everyone seems to get that. At least I understand more of the differences now.

The other thing I need to do is start writing my confirmation. Theoretically I don't need to have it done until next September, but I want to do it sooner. Unfortunately, the head of the school will be away for some time, so anyone who wants theirs done before August will need to do it in February. This timetable gets squeezed as everyone (including Bob) will be away for the Southern Hemisphere summer. So if I want any feedback from Bob it needs to be in the next 3 weeks. After that I won't be seeing him again until February... which is just before the confirmation will be due. Consequently, I'm dropping everything else to work on the start of the confirmation over the next couple of weeks.

Fortunately, after working out that I couldn't do anything with the XA API, TJ asked what type of resolver I wanted to work on, and was more than happy with my suggestion of an OWL resolver to represent OWL Abstract Syntax. So while I have to work on my confirmation out of hours, I can continue to work on OWL at work. :-)

I spent the rest of the day (what was left of it) going over OWL documentation. I'll be doing the same tomorrow.

Monday, November 15, 2004

Trans and Walk
For some reason the trans and walk JXUnit tests started failing when I tried to check in the new bug fixes from Friday.

For some reason, recent configuration changes have caused the output from JXUnit tests to no longer include stack traces of any exceptions which are thrown. This is really annoying, and it means that it can now take some time to duplicate the conditions of the error, just to see the trace.

Once I could see the exception, I was able to look at the offending code and discover that the problem happened when the Tuples.getRows() method was called on a transitive or walk constraint. The constraint had never had its row count set, which made calling getRows() illegal.
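
The broken invariant, in miniature (illustrative names, not Kowari's actual classes):

    // A toy version of the rule that was violated: the row count must be
    // set before getRows() may legally be called.
    class RowCountSketch {
        private long rows = -1;                      // -1 means "never set"
        void setRows(long rows) { this.rows = rows; }
        long getRows() {
            if (rows < 0) throw new IllegalStateException("row count never set");
            return rows;
        }
    }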

For some reason, other people had this code working correctly for them. I did a CVS update (several times) with no effect, so I thought I was up to date. However, I could not work out how others could have this code passing when I couldn't.

I asked AM about this, and he explained that AN had needed a workaround for a problem, so he had re-introduced the setRows() and getRows() methods for use with a ConstraintExpression.getCanonicalForm() method. AM was not too happy about it, as it took a previously immutable object and made it mutable.

So now that I knew where my problem was coming from, the solution (while AN's hack is still in place) was to set the rows for the "trans" and "walk" constraint types. Once this was done everything worked correctly, and I could check all the bug fixes in for the release that TJ was orchestrating for the day.

This left me free to pursue some documentation, which was boring, but necessary.

Friday, November 12, 2004

Debugging Joins
AM and I spent a lot of time today trying to debug the joins which occur when using a remote resolver.

The first part was trying to print out the constraint expression tree. Yesterday we discovered that this was causing a NullPointerException, and we spent some hours tracking it down.

The problem occurred during the sort of a Tuples. This created a new HybridTuples (via a factory method, so we didn't immediately see that the returned type was really a HybridTuples), and it was in the HybridTuples constructor that we discovered that the incoming Tuples object was being kept, but not cloned. Consequently, when the Tuples which had been given to this constructor was closed later, the HybridTuples object became invalid.
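
The shape of the fix (a sketch using a stand-in Tuples interface, not the real class):

    // Stand-in for the real interface; only clone() and close() matter here.
    interface Tuples extends Cloneable {
        Object clone();
        void close();
    }

    class HybridTuplesSketch {
        private final Tuples tuples;
        HybridTuplesSketch(Tuples incoming) {
            // Keeping 'incoming' directly was the bug: the caller closes it
            // later, invalidating this object. Cloning gives us our own copy.
            this.tuples = (Tuples) incoming.clone();
        }
        void close() { tuples.close(); }
    }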

Once this was solved, we were able to observe the constraint tree. It was immediately apparent there was a problem, as the entire tree was being duplicated, with the results of both sides being unioned (and duplicates removed). While this worked for some queries, it created a problem for others when the joining code expected the columns of the tuples to be ordered in a particular way.

AM had seen something like this before, and discovered the problem in a piece of code which builds up the constraints. It was a simple typo with the wrong constraint variable being used.

A concern here was that AM said he had fixed this code before. This seems similar to a problem he had a couple of weeks ago, where a "cvs update" was not getting the latest version of a file. We could look at the CVS log for the file, and everything seemed in order, but for some reason the latest version of the file would not come down for him. In the end he had to delete the file and get it again. In this case he was lucky that it was not a file he had modified.

Thursday, November 11, 2004

Working Exclusion
OK, I'm writing this one really late (by 5 days), but here goes...

This was the first day that I felt I was really productive for the release since last weekend. I'd got a bit done for the release before now, and a lot done on the paper with DW and TA, but I have obviously been tired from the Noosa tri. I'll make the effort to take time off after it next year.

This day was spent debugging the excludes code. I finished the port of the ConstraintNegationTuples class to StatementStoreInverseResolution, which was at least enough to make it run. Of course, it didn't work, which meant that I had to sit down and figure out exactly what it was doing. I'd love to put a full description in here, but since I'm having to go back a few days to remember, and because I'm currently a little short on time I'll just let the reader look at the source code and figure it out for themselves.

The first thing to recognise was that there are two situations. The first is when the metanode (i.e. the model) is fixed. When this happens it is necessary to slice that model out of the statement store in order to do the inversion. This has consequences for the beforeFirst method, as the first integer will always need to match the given model. It also means that the iterator has to be limited to a single model.

The second situation is for a variable metanode. This ranges over the entire datastore, and the code is simpler. There were just a few tricks making sure it worked side-by-side with the fixed metanode case.
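
To illustrate the fixed metanode case, this is roughly what the prefix handling has to do (the names are mine, not the real API):

    // Sketch: build an index prefix for a fixed metanode. The model's local
    // node must always come first, with any bound values following it.
    class PrefixSketch {
        static long[] prefixFor(long fixedModel, long[] boundValues) {
            long[] prefix = new long[boundValues.length + 1];
            prefix[0] = fixedModel;                  // first integer is the model
            System.arraycopy(boundValues, 0, prefix, 1, boundValues.length);
            return prefix;
        }
    }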

Once I had it all running I put it up against the relevant JXUnit tests, and it all worked. I was very happy. :-)

Remote Joins
The next thing to work on was the Remote Resolver tests which were failing. According to the symptoms from ML, it sounded like the join code was failing. I don't know why a problem like that would only show up against this particular resolver, but since it does, it falls to me to fix it.

Given that it appeared to be "join code" that was at fault, I asked AM for some assistance. He was happy to help, and immediately suggested a logging statement to observe the structure of the constraint expression. The system immediately failed with a NullPointerException. We spent a little time looking for the culprit, but came up with nothing before calling it a day.

Wednesday, November 10, 2004

Exclude
I got a lot more done today, but I didn't finish the exclude code as I'd expected. I started out by realising that ConstraintNegationTuples has a second constructor that I hadn't noticed before, and this one accepts the parameters I'm interested in. After that, it was a matter of only 2 more methods to make it a full Resolution object, so I copied it over my earlier work and modified it.

Of course things never go quite as smoothly as planned. The semantics of the statement store are slightly different, so I had to update a few things. Then when I ran the code I discovered that this inverted resolution class wasn't even being created. Quite a bit of debugging later, and I discovered that the Constraint object I was being given was not a ConstraintNegation. I was also able to log the fact that a ConstraintNegation was being created by the client, and received by the server. So somewhere after that I was losing the information that the constraint was inverted.

It doesn't sound like a lot, but the last two paragraphs were actually quite a bit of work. Certainly more effort than over the last two days.

Java 1.5
Tonight I tried to continue the debugging at home, but that led to its own problem. I've installed Java 1.5 at home, but Kowari doesn't support it yet. So I could either go back to the old compiler, setting up the environment variables, etc., or else I could try to get Kowari to compile under 1.5.

Since it was after hours, I figured I could do the more enjoyable thing and had a go at porting. :-)

The first class to have problems was PIErrorHandler. This uses the class WrappedRuntimeException, which is an internal class from the Apache XML support library in Java 1.4. Speaking with KA, I learned that he was intent on obtaining the source of all exceptions. Since this class can be thrown, it is necessary to pull out the wrapped exception to find out the real source of the problem. The best way to do this seemed to be to use reflection to find the getException method. This matched up nicely with a method of the same name in the SAXException class, which can also be thrown.
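
The reflective unwrapping looks something like this (my sketch of the approach, not the code as checked in):

    import java.lang.reflect.Method;

    class UnwrapSketch {
        // Any exception exposing a getException() method (SAXException, or
        // the 1.4-internal WrappedRuntimeException) gives up its wrapped
        // cause, without compiling against the internal class.
        static Throwable unwrap(Throwable t) {
            try {
                Method getException = t.getClass().getMethod("getException");
                Object wrapped = getException.invoke(t);
                if (wrapped instanceof Throwable) return (Throwable) wrapped;
            } catch (Exception e) {
                // no getException() method, or invocation failed: keep t
            }
            return t;
        }
    }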

The next problem was with another "1.4 only" class called SAXSourceLocator. Again, this is an internal XML class which is not included in Java 1.5. However, this time it meets a publicly available interface: javax.xml.transform.SourceLocator. There is no public implementation of this interface, but there is an identical interface, org.xml.sax.Locator, which does have a public implementation called org.xml.sax.helpers.LocatorImpl. With identical methods, all I needed to do was merge the two into a class which extends LocatorImpl and implements SourceLocator. I liked the way this worked: it borrows the implementing methods from LocatorImpl for the unrelated SourceLocator interface, and it required no code to do it. It made sense to me, as the interfaces were created for the same reasons. Maybe they should have been merged already.
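
The whole class amounts to something like this (give or take the package declaration):

    import javax.xml.transform.SourceLocator;
    import org.xml.sax.helpers.LocatorImpl;

    // LocatorImpl implements org.xml.sax.Locator, whose four methods
    // (getPublicId, getSystemId, getLineNumber, getColumnNumber) have the
    // same signatures as SourceLocator, so the body is empty.
    class SourceLocatorImpl extends LocatorImpl implements SourceLocator {
    }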

The next problems came from the changes to Xalan in the latest JVM, due to the need to implement XSLT extensions. Reading the online documentation, this requires the use of org.apache.xalan.extensions.XSLProcessorContext and org.apache.xalan.templates.ElemExtensionCall. Unfortunately, Java 1.4 never made these classes publicly available, and the version of Xalan that comes with 1.5 doesn't include them.

I've been looking online for the latest way to implement XSL element extensions, but the only examples I can find are for the previous version of Xalan. The latest version of Xalan includes the new XSLT Extensions library, which is an example of this, but I've yet to find any documentation on how it was done. At this point I'm thinking that I will need to find the source code for this library in order to find out how it is done. After that I'll be trying to work out how to do a portable implementation that will work with both versions.

Tuesday, November 09, 2004

More Paper
I spent today bouncing back and forth between coding and helping write the paper for the WWW 2005 conference, to the detriment of the code. I made some progress with the Resolution implementation, but it was pretty slow. It really needs to be based on ConstraintNegationTuples, but this class has a lot of things in it that are irrelevant to the new code base, and the constructor takes parameters that don't really work in the new code base either. Unfortunately, rather than concentrating on it, I kept getting interrupted to work on the paper.

As for the paper itself, it seems to be coming along well. Initially I was trying to make sure it would be ready for TA and DW before the due time, but then there was an extension to 5pm due to server problems. I find that funny: the WWW is having web server problems. :-)

Basically, the paper just describes the internal workings of Kowari, and some of the work we've been doing. TA likes it overall, but I'm not that satisfied. For a start, I haven't had the time to spend on it that I would like. To my eyes, this really comes out when reading through, as there was not enough time to smoothly merge the text that DW and I had written. Consequently, both of us would refer to the same things, only in different contexts. This had the effect of making it feel less structured than I would like; however, I may just be extra critical because I wrote some of it.

Through the course of the evening I got to do some more corrections, so quite a lot of time went into this paper today. Ideally I'd have liked to spend more time on it, spread over the course of a week or two, particularly given that this is for an international conference. Oh well, hopefully it will get accepted.

Spelling
I've noticed that the spell checker on Blog Spot has died again. I know this should make me do a better job proof reading, but given limited time, I don't think I'll be bothering.

Monday, November 08, 2004

Inverse Resolution
I spent quite a bit of time catching up on miscellaneous paperwork today, so I was less obviously productive than usual. That's a little annoying, as I plan on having the exclude statement working by Tuesday afternoon.

I spent a little more time converting the StatementStoreResolution class into a StatementStoreInverseResolution to do exclusion for the XA statement store. I didn't get the time to finish it, so I will be picking it up again in the morning.

As for handling an inverted constraint, I've been wondering if there should be a method on the resolver interface to test for that capability, or if an exception should be thrown. It seems cleaner to have a method that tells the calling code it can't pass in an inverted constraint, but the result of that would probably be to throw an exception anyway. So maybe the calling code doesn't need to know ahead of time, and an exception can be thrown by the resolver. That's probably how it works by default, but I'll check with SR that he is happy with that.
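
The two alternatives, as I see them (illustrative signatures only, not the real Resolver interface):

    interface InvertibleResolverSketch {
        // Option 1: the caller asks up front.
        boolean supportsInvertedConstraints();

        // Option 2: the resolver refuses at resolve time by throwing.
        // ('Object' stands in for the real Constraint and Resolution types.)
        Object resolve(Object constraint) throws Exception;
    }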

Paper
In the meantime I've been writing parts of a paper for the WWW conference in Japan next year. I got a little done on Friday night before I left, and I'll be working on more tonight. It's been a bit difficult with the timing of the triathlon, but now that is done I'll have to concentrate on work and study.

The paper is for Tucana, but hopefully DW will let me put UQ down next to my name as well. That way it is for work and for uni. I just wish I'd had more time to sit down and concentrate on it.

Friday, November 05, 2004

Exclusion
This is my second day of really brief notes.

According to AN I should have been able to take the exclude code from the old code base, and copy it into the correct position in the new. To do this I started searching for the appropriate code in both code bases. It took some time to confirm, but I eventually concluded that this was not possible.

I also got to see the full extent of the differences between constraint resolution in the old code base and the retro-version in the new (the newer version of this code is in the old code base). This made it very clear that we need to port the constraint resolution code into the new system as soon as possible. There are several reasons for this. The most important is that a new constraint type requires changes to a lot of if statements in the new code base, but only needs new classes in the old code base (older system, but newer code). This makes the older system more modular, and easier to understand and modify. We will want these changes made quickly, before much more work is done on the new code base and the changes become harder to make (a situation we found ourselves in when trying to move everything over to the current resolver code base).

In terms of the "exclude" code, I've worked out where the changes need to be. In the old codebase it was wrapped around a normal constraint, and inverted the results as they came back. The new architecture does not allow for this. The best example of this is shown with a remote resolver. It is impossible to return all data except the result of a constraint because the only data available is that result. There is no way to get the rest of the data from the server (without constructing another query).

So each resolver will need to handle an "inverted" constraint and return the correct answers. For our purposes, we only need to be capable of doing this on the StatementStoreResolver to get our old functionality. However, in future we may need to consider mandating that all resolvers be capable of handling an inverted constraint.

In the meantime, the StatementStoreResolver code needs a new Resolution object (basically a Tuples) which can return all data except that which a constraint resolves to. This won't be too hard, but will take a little while to code.
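
As a toy illustration of what the new object has to do (the real thing works over the statement store indexes, not in-memory lists):

    import java.util.ArrayList;
    import java.util.List;

    class InvertSketch {
        interface Constraint<T> { boolean matches(T statement); }

        // Return everything in the store except what the constraint matches.
        static <T> List<T> invert(List<T> store, Constraint<T> constraint) {
            List<T> result = new ArrayList<T>();
            for (T statement : store) {
                if (!constraint.matches(statement)) result.add(statement);
            }
            return result;
        }
    }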

Thursday, November 04, 2004

Other Stuff
With Noosa being run on Sunday the 7th I was a bit too distracted to sit down after work on the 4th and 5th to write about my day at work. So once more I'm writing after the fact. This is just a couple of quick notes to remind myself of what I was doing on those days.

TKS Tests
Most of the day was spent on TKS, so I could confirm the distributed query tests. I was led to believe that all our recent work had been moved over and would be running. In particular the Remote Resolver code appeared to be in there, so I figured it should be quite easy.

Instead I discovered that I couldn't build the code at all. After struggling with missing classes, tests I expected to see and couldn't, and configurations which didn't mesh with my understanding of the architecture, I discovered that the recent work had not been moved into TKS yet. TJ suggested I move onto something else.

One misleading aspect of the TKS code was the presence of the Remote Resolver classes. These have only just recently appeared in TKS, and I don't understand why they went in when nothing else has. Now that I think about it, I should look at the CVS log to see who did it, and then I can ask them.

In the meantime, I started looking at implementing exclude on the new codebase. In the course of this I've discovered that all the recent changes to clean up and modularise the constraint resolution code have been left behind in the old system. AN is rather frustrated at this (as am I), but I suppose we need to get the new codebase running and passing all tests before we start bringing features like this forward.

Wednesday, November 03, 2004

Debugging
The code from yesterday did pretty well, though there were a few errors. Overall the structure worked out, and it was just a few little things that wouldn't work. For instance, I had an array of SPLimit objects for each type, but SPObject (the parent class) will not accept a data type of TCID_FREE (0), so I had to leave that array entry as null.

The only significant issue was in a line which renames the variables of a tuples to those of the requesting constraint. Since the data has been sorted, it is contained in a HybridTuples object. It appears that the renameVariables method in this class expects to see variables which are named "Subject", "Predicate", "Object" or "Model". However, I had copied the Tuples returned from the XAStringPoolImpl which names the single column "gnodes".

While I could just fix this by renaming my single column to "Subject", I was at a loss as to why the method had previously worked on the Tuples returned from XAStringPoolImpl. I eventually realised that this was because the tuples was always renamed before being resorted into a HybridTuples, so the HybridTuples.renameVariables() method had never been called on data returned from XAStringPoolImpl before.

Looking at the code from various angles, I was able to determine that the name "gnodes" was just a convenient placeholder, and that "Subject" was not only more convenient, it also made semantic sense in the context of where this code appears.

Once it seemed to be running, I gave the code to ML to try with his code. Unfortunately there had been no unit tests for this code, and I was running a little short on time to build a comprehensive set, so letting it run in context was my best bet. There was a small bug, where I hadn't realised that the limit parameters could be set to null to indicate no limit, but I fixed this easily, and after that it worked as advertised.

Running everything in the unit tests showed up a slew of new failures in the JRDF tests. This ended up being because these tests required all new resolvers to be in the classpath, as it does reflection on each of them. Adding the nodetype resolver to its classpath fixed this. To prevent this in future, RT (who is maintaining that code) changed the build script to include the whole distribution in the classpath for that test.

Now I have to move the Remote Resolver from Kowari and into TKS, so that the distributed tests in TKS can validate this code. I've checked out TKS, and I've started exploring the paths (again). I'm not really sure what to do, but hopefully I'll figure it out early tomorrow.

Tuesday, November 02, 2004

In Memory String Pool
I spent the whole day on just two aspects of this class.

The first part was trying to create a new type of object which matches a given type of SPObject, and compares either larger or smaller than every instance of that type. I called this class SPLimit. I had initially hoped to create a kind of SPObject whose byte buffer would always compare larger or smaller, but on closer inspection I realised that the byte buffer could be translated into almost any kind of data type, and this plan would not work.

Instead I had to override the comparator to enforce my semantics. To avoid excessive usage of the expensive instanceof operator, I introduced a new method to SPObject called comparatorOverride which returns a boolean. The comparator calls this on its arguments before deciding to proceed normally or with my new comparison operation.
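
In outline, it works like this (a self-contained sketch with stand-in names, not the real SPObject hierarchy):

    import java.util.Comparator;

    class LimitSketch {
        interface Value {
            int typeId();
            boolean comparatorOverride();   // true only for limit objects
            int compareTo(Value other);     // normal within-type comparison
        }

        // A limit sorts below (or above) every real value of its type.
        // (Comparing two limits to each other isn't needed in practice.)
        static class Limit implements Value {
            final int type; final boolean smallest;
            Limit(int type, boolean smallest) { this.type = type; this.smallest = smallest; }
            public int typeId() { return type; }
            public boolean comparatorOverride() { return true; }
            public int compareTo(Value o) { throw new UnsupportedOperationException(); }
        }

        static final Comparator<Value> ORDER = new Comparator<Value>() {
            public int compare(Value a, Value b) {
                if (a.typeId() != b.typeId()) {      // order by type first
                    return a.typeId() < b.typeId() ? -1 : 1;
                }
                if (a.comparatorOverride()) return ((Limit) a).smallest ? -1 : 1;
                if (b.comparatorOverride()) return ((Limit) b).smallest ? 1 : -1;
                return a.compareTo(b);               // two ordinary values
            }
        };
    }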

I also re-worked the existing lowValue-highValue version of the findGNodes method to make it a little simpler and more straightforward. I also noticed that I could probably avoid creating a new set for appending an SPObject if I play with the iterators a little more. I haven't done this yet, but it may be worthwhile, as it would allow subsets to be used exclusively, saving significant memory in some circumstances.

The other part of the code was a new Tuples object for returning from the two findGNodes methods. Internally I had a sorted set of the required SPObjects, but the required data was a Tuples of local nodes (which are all long integers). This was a relatively simple wrapper which does a translation from SPObject to long as required. Unfortunately, it took a long time to implement something so simple (and get it right), as Tuples has 21 methods other than the constructor.

I left the SetWrapperTuples as a private inner class of MemoryStringPoolImpl. This was partly to hide it, and also to provide access to the private maps that the MemoryStringPoolImpl holds, so it can do local node lookups on the fly.
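
The core of the wrapper, shorn of the other 20-odd Tuples methods (again with stand-in types; the real class sits inside MemoryStringPoolImpl so it can reach the node maps):

    import java.util.Iterator;
    import java.util.Map;
    import java.util.SortedSet;

    class SetWrapperSketch<T> implements Iterable<Long> {
        private final SortedSet<T> objects;      // the sorted SPObject stand-ins
        private final Map<T, Long> localNodes;   // object -> local node (long)

        SetWrapperSketch(SortedSet<T> objects, Map<T, Long> localNodes) {
            this.objects = objects;
            this.localNodes = localNodes;
        }

        // Translate from object to local node lazily, as iteration proceeds.
        public Iterator<Long> iterator() {
            final Iterator<T> it = objects.iterator();
            return new Iterator<Long>() {
                public boolean hasNext() { return it.hasNext(); }
                public Long next() { return localNodes.get(it.next()); }
                public void remove() { throw new UnsupportedOperationException(); }
            };
        }
    }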

This took me all day, and I only just got it compiling by the time I had to leave, so I won't be testing and debugging until tomorrow.

Thinking
I saw Bob again today, and discussed a couple of useful things. I hadn't done as much as I should have, but I figured I could wing it. However, toward the end he confirmed with me that after the triathlon this weekend I would start "thinking again".

Oops. Looks like I got caught out. :-)

While I'm at it, this blog has been suffering lately as well. Perhaps I'll start putting some more effort in again after the weekend. In the meantime, I'm just concentrating on exercise, food and rest, so my technical interests are taking a back seat.

Monday, November 01, 2004

Memory String Pool
After removing the exception that was thrown when the node type resolver is created with the canWrite parameter set to true, I found I was still getting the error. This was simply due to the file not being re-built, so I mentioned it to ML, who has been fixing these things. He told me later that it was a couple of items which were out of order in the Ant script. Later in the day I did an update, and a file which had implemented a new method wasn't rebuilt before a calling class was compiled against it, resulting in a compilation error. ML claimed that this problem could not be fixed, which I'm a little surprised at.

So for the moment, I'm still having to do clean rebuilds.

With the exception that I mentioned above removed, I was able to create models of the appropriate type. That just left me needing to perform queries against it. While looking at the code I realised that the ResolverSession implementations were returning sorted data from the findStringPool* methods. I was previously sorting the returned data so that I could append it, but since I had already appended the data from the two string pools, it was already sorted, and I could use it in an append operation without further processing. I had been worrying about the excessive use of new HybridTuples objects, so this was pleasing to see.

I had a little debugging to get through before discovering that the in-memory string pool does not implement either of the findGNodes methods. Looking internally, I discovered that this string pool is implemented with hash tables, making these methods impossible. ML asked me about this at almost the same moment I discovered it for myself, so I had to find a fix that would suit us both.

Fortunately, the SPObjects which are stored in the string pool are all comparable. This means that they can be stored in a SortedSet. While this will use a little more memory, the objects are already being stored, so the only space overhead will be that of the tree structure for the set. So I've added a new index of the SPObjects which just stores them in order.

The first findGNodes method was relatively straightforward. Using the SortedSet.subSet method did a lot of the work, though I needed to fiddle a little with the results in order to appropriately include or exclude the first or last item, according to the parameters of findGNodes. I still need to write a small wrapper class which can convert the resulting subset into a Tuples.
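
The fiddling looks roughly like this (Strings standing in for SPObjects; note that the real code keeps the live subSet view rather than copying it, which is where the memory saving comes from):

    import java.util.SortedSet;
    import java.util.TreeSet;

    class RangeSketch {
        // SortedSet.subSet(low, high) includes 'low' and excludes 'high',
        // so the inclusive/exclusive flags need adjusting at each end.
        static SortedSet<String> find(TreeSet<String> index,
                String low, boolean inclLow, String high, boolean inclHigh) {
            SortedSet<String> sub = new TreeSet<String>(index.subSet(low, high));
            if (!inclLow) sub.remove(low);
            if (inclHigh && index.contains(high)) sub.add(high);
            return sub;
        }
    }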

The second findGNodes method needs to be able to find an entire data type. The most efficient way I can think of doing this would be to create a couple of SPObjects which are guaranteed to be the smallest and largest of a data type, and use the SortedSet.subSet method again. Hopefully that won't be too hard.

Friday, October 29, 2004

New Build System
ML has spent the last 2 weeks re-writing the Ant build script for Kowari. It had been getting messy and difficult to modify, but it also occasionally missed some dependencies, meaning that we had to do clean builds most of the time. A full clean build took about 5 minutes on my system, so this was very wasteful.

ML's work was checked in on Thursday, so today I pulled in the new scripts, and started work on the NodeType resolver scripts to make sure everything was integrated. Unfortunately this took quite some time, and it wasn't until after lunch that I could compile again. Then ML told me that he needed a new method on ResolverSession, so I had to track down all of the implementing classes again in order to add the method.

I was starting to run my first tests by the end of the day, but hit a couple of initial hurdles. Initially the system would not recognise my model type. I had accidentally changed the type's URI, but this was not the issue. I realised that Kowari had not picked up my resolver factory, and so I spent a little time trying to remember how the factory was supposed to get registered. SR reminded me that this is done in the conf/kowari-config.xml file.

The next problem was that the newResolver() method was being called with a true value in the canWrite parameter. I was under the impression that this meant that the object was not going to be treated as read-only, so I was throwing an exception. SR explained that this was not the case, and that it was more like a hint for the system. If I really wanted to be read-only then I should just throw exceptions from any modifying methods. I'm already doing this, so I'm covered. In the meantime, I just have to remove the check on the canWrite parameter in order to proceed.

Thursday, October 28, 2004

Resolving
The resolve method came along reasonably well today. It needed quite a bit of pre-initialised information that I hadn't provided, so I spent some time getting all of that together. This included things like preallocated local nodes for the URIs of rdf:type, rdfs:Literal, and the URI objects and local nodes for the Kowari Node Type model URI, and the URI representing a URI Reference.

With this info in place I could get going on the resolve implementation, which is based mainly on the String Pool. Unfortunately, there turned out to be no easy way to get the String Pool, so this resulted in a half-hour discussion about how it should be done.

The String Pools are stored internally in the ResolverSession object which is given to every Resolver instance. Ideally, we don't want anyone implementing a resolver to have unfettered access to this data, as it really is internal. It is only internal resolvers, such as the node type resolver, which need to see this information.

It is on occasions like this that I see the point of the "friend" modifier in C++ (a modifier I try to avoid... unless it is really called for). The closest analogy to this in Java is "package scope", which applies to anything with protected or default visibility. Unfortunately, this only extends to classes in the same package (not even a subpackage), so it is of limited use.

Rant on Inheritance
Java does annoy me a little on inheritance. Both default and "protected" modifiers permit classes in the same package to see the data. However, there is no way to only allow inheritance, while hiding the same information from the rest of the package.

It would also be nice to permit subpackages to have the same access rights as package peers. That way, "internal" packages could see data that external packages can't, without having to pollute the parent package that everyone needs to see.

Back to String Pool Access
I suggested a class in the Resolver package which could see the internal data from the ResolverSession as this would let me get to the String Pool data that I need, without changing the ResolverSession interface. SR was against this, as he didn't see it providing much benefit in data hiding.

I finally ended up adding some of the String Pool methods to the ResolverSession interface. At least these methods were all about finding data in the string pools, so they were read-only operations. So even though external packages can now see the methods, they can't cause any damage.

In the course of the discussion, it turned out that ML needs some access to the string pool as well. So instead of adding just a single method to the ResolverSession interface, I had to add two. The result was a modification of 6 classes which all implement this interface. Fortunately, 4 of them were only test classes which did not need to support all of the methods, so I just threw an UnsupportedOperationException.

Since each ResolverSession has two string pools, the methods I implemented actually performed the operation twice (on both the persistent and the temporary string pools), and then used tuples operations to concatenate the results. The NodeType resolver then calls the new findStringPoolType method twice (once for typed literals and again for untyped literals) and concatenates those results. So there's a bit of concatenation going on.

Finally, the results needed to be sent back as a Resolution object. Resolutions are just a Tuples interface along with 2 methods, called getConstraint and isComplete. The last time I implemented this class I was able to extend a TuplesImpl, but this time I don't necessarily know what type of tuples the data will be based on, so I had to wrap a Tuples object instead.

Wrapping an object with 23 public methods is a pain.

By the end of the day I had it all coded and compiling, but I hadn't run anything yet.

Wednesday, October 27, 2004

Node Type Resolver
Spent the day bringing the URL Resolver over into the Node Type package, and implementing all the fiddly bits. This means that I still have the main resolve() method to implement, but hopefully that will be very similar to the previous implementation.

Many of the other methods were trivial to implement, as the model is read-only.

The only issue I had with some of the methods was for model creation and deletion. I asked SR about these, and he told me about a class called InternalResolver which has several protected methods specifically for these kinds of operations. I had to extend NodeTypeResolver from this class, but this only required a few small changes. I also needed to promote an internal method called findModelType from private to protected so I could use it. It really is a utility method, so it was a valid change to make. I think it was only kept internal to the class as SR didn't need it for the two resolvers that used it.

Tuesday, October 26, 2004

Node Types
Bringing node types into the new resolver system had me stumped for most of the morning, but I eventually started to get a picture of it.

For a start, I will be implementing it as a new resolver, and registering it as "internal". It will then get found based on model type rather than protocol. As before, it will build its data from the string pool, only now it will be using more than one string pool, so it will be appending data.

The trick is to make sure that the node type resolver uses the same string pool, on the same session, as all the other resolvers. I was concerned about how to get this, but SR was able to reassure me that I can get it easily.

The other important requirement is that constraints to be resolved against a node type model will occur last. This is so all other resolvers will have already populated the string pools with data returned from their query. This is a little harder to guarantee.

At the moment, the order of resolution is based on the approximate size of the data that the constraint expects to return. One way to be executed last would be to return a size of Long.MAX_VALUE. Unfortunately, several other resolvers try to bias themselves to go last by doing this. In this case the resolver absolutely must go last, so it can't necessarily rely on this trick.

In the interim, SR has suggested that I try returning Long.MAX_VALUE as a size. If another resolver tries to get in the way then we can deal with it then. Since most resolvers play well, this should not be a real problem, at least not in the general case.

Armed with this design I've started coding up the new resolver. It will probably take me a day or two.