Working notes: 11/01/2004

Tuesday, November 30, 2004

Local Sessions
My idea to put LocalSessionFactory in with RmiSessionFactory didn't work out quite as I'd planned. The classes associated with RmiSessionFactory were to be found in server-rmi-base-1.1.0.jar, which was really the wrong place to put the local session factory classes. So I made sure that server-local-1.1.0.jar was included everywhere that server-rmi-base-1.1.0.jar was. However, this still didn't prevent the ClassNotFoundExceptions.

I tried placing this jar into numerous places without any luck, so in the end I started putting the LocalSessionFactory classes into different jars. This didn't work either. I spoke to various people who ought to have known, but no ideas were forthcoming. I guess this explains why TJ wants to change the classloader framework!

Finally, I was frustrated enough to just put it all in server-rmi-base-1.1.0.jar. This eliminated the ClassNotFoundException, and left me with an exception relating to unknown transaction phases.

At this point I took a step back to re-examine what I was trying to do.

When the original code needed a session factory, it would always get an RmiSessionFactory, which wrapped a RemoteSessionFactory, which wrapped a local session factory. The local session factory is configurable, but at this stage it is always a Database object (Database meets the SessionFactory interface). So if I wanted to avoid RMI altogether, all I needed to do was get hold of a Database object, just like RemoteSessionFactory does.

The Database object is a single object created by EmbeddedKowariServer (well, it's actually created by the ServerMBean that EmbeddedKowariServer creates). The RmiServer class uses EmbeddedKowariServer to find this object, and gives it to the RemoteSessionFactory constructor. So if I wanted to use a local session factory, all I should need to do would be to get it from EmbeddedKowariServer.

I tried this, and again it didn't work. This time the exceptions all said "Unable to commit", and the database files had not even been created on the disk. I discussed this with DM, but we were unable to work out exactly what was happening. All I can guess is that the RmiServer somehow initialized the database before using it for the first time, but I couldn't see where that happened.

I'd now made it to the end of the day, with very little progress. At this point I realised that about the only thing I was trying to do was to avoid using RMI with descriptors. All other "clients" would occur in another JVM and RMI couldn't be avoided anyway. So it seemed reasonable to use RMI for descriptors as well.

To do this I just needed to change the test in RmiSessionFactory so that it would ignore connections to the current server if the URL for the server was exactly the URL used to find the server. This is safe, as the looping bug only occurs when connecting to a server using a name that the server is not familiar with.

Once this was done I set the tests running, and crossed my fingers that all would be well. I'm guessing that something will go wrong, but hopefully I'm converging on the solution.

Monday, November 29, 2004

Kowari and JDK 1.5
I forgot to say... Last Thursday night I got Kowari building and running with JDK 1.5. Maybe I should submit those XMLC patches.

Unfortunately, I couldn't check my changes in, as it now has problems with JDK 1.4. This is mostly because the XMLC interfaces now require certain methods to be implemented, which the JDK 1.4 system doesn't have.

When I get some time I'll have a go at bringing in the latest W3C libraries to see if these will override the built-in libraries with the latest interfaces.

OSX and MIDP
I've also been wasting some time looking at developing for my new phone with MIDP on OSX. It looks like I can do it, so I'll have to spend some time brushing up on J2ME. All I really want is a calculator, and I can probably find one if I go looking, but where would be the fun in that?

In the meantime, I've discovered that the phone has numerous bugs. For a start, it can't render Slashdot. It starts to show some headers, but when the page is complete all that is visible is a very large page of whitespace. The browser also makes it impossible to select some links on certain pages (for instance, the Sony Ericsson page!).

The other big bug is in the email client. It supports SSL, so after fiddling with the configuration for a couple of days, plus some support from my phone provider, I finally got it to connect to GMail. If there is no mail there, then it all works fine, connecting to the server, and telling me that I have no new messages. However, if the smallest text message is present, it will start to download it, and then hang. After a few minutes of complete unresponsiveness it starts to flash the screen and then reboots. Yuck. At least I can make phone calls with it. :-)

Discussions
With TJ back we spent a little time talking about the needs of the customers he spent the week with. Most of it seems to come back to the work we are currently doing, but I think it has focused TJ's ideas a little.

We had a chat this afternoon about the OWL resolver he would like to see. I was initially worried about this, but in the end I've been mollified. Basically, it will return statements from a model of inferenced data, combined with the model of base facts. The techniques I'm using to create an inferred model will be essentially the same, but now there will be some extra information on how the models relate. This will allow many operations to occur automatically whenever the OWL resolver is used. I'm all for this approach, so I'm happy to go ahead with it.

In the meantime TJ is very keen on the idea of the "change problem", which I've been avoiding up until now. While I know how essential it is, I think I've talked him into letting us get inferencing working correctly first. I think this is important, as we need to learn to walk before we can run. While I've done no work and little research on the problem so far, I have been thinking about it, and I can think of several things we can do to make this work more efficiently than the brute-force approach. However, I also think that we will learn some lessons as we implement the brute-force method. So I believe that putting off this design for the time being will allow us to get a working (though slowly updating) system in a shorter timeframe, and allow the final version to be better designed and faster than if we attempted it from the outset.

Looping Bug
I was back onto this after a weekend away from it all.

An initial discussion with AN revealed that the JRDF and Jena code actually use the LocalSessionFactory class, and not RMI (yay, I was beginning to lose faith). This ran counter to SR's earlier assertion that the code which returned a LocalSessionFactory was no longer in use. AN seemed initially skeptical that no one else used anything but RMI, but I eventually convinced him.

So this left me in a position of trying to create a LocalSessionFactory and returning it from the findSessionFactory() method when the NonRemoteSessionException is thrown from the RmiSessionFactory constructor. Writing the code for this was the easy part. Getting it to work in the tests was the difficult part.

Unfortunately, no one seemed to have access to the LocalSessionFactory class. I spent more than a little time looking through the myriad build scripts trying to find where I should add it, but with little luck. On each occasion when I thought I'd found it, I added the needed classes to the appropriate jar and re-ran the tests. On each occasion I discovered that I hadn't found the correct place.

I really beat my head against a brick wall on this today, with little progress. Only as I was leaving did I think to look for the jar file which implements RemoteSessionFactory as this must also have access to the local data store. I thought I might get to it tonight, but I've spent quite a bit of time catching up on last week's blog, so I'll be doing it in the morning instead.

Busy, busy, busy
Well I've just been through a rather full week, both at home, and at work. As a result, I didn't log on to blogger once. Darn.

So what did I do? Well if only I'd been blogging as I went I'd have an easy answer to that. Since I haven't, I'll just have to write a few loose and disconnected notes. It won't be a complete description, but it will give a vague idea of what I've been on about.

I've been pretty slack about this blog in the last few weeks. It is obviously a very busy time of the year from a social context, and I'm tending to go to bed earlier at the moment due to training (I do a lot of blogging at night).

Actually, the training is going quite well, and I had a great race on the 21st. I was aiming for an hour, and was initially disappointed that I got 1:00:18, but then I discovered that the bike leg was 1.8km too long (they wanted to put the turns at appropriate places in the road), so I'd have made the time if I did the advertised distance. I have a longer race on this Sunday, and the last race has inspired me to work extra hard... which has led to me blogging less.

Sorry, I don't usually write much about training, but I've been really happy with my progress lately. Oh, and for anyone who needed proof I was at Noosa, you can find some pics at Supersport images if you look for BIB number 2426. (You probably can't see it, but my bike has a flat front tire - very frustrating).

I'll try and make more of an effort with blogging this week.

In the meantime, here are a few notes on the past week:

Monday - 22nd
TJ is away training a client this week, so things seem a little more relaxed today. I started the day looking at doing more of the "OWL inferencing with iTQL" documentation, but by the end of the day I was back onto the looping bug.

Just to describe this bug again...

DM has put some effort into divorcing model names from locations. Consequently, it is now possible to move models from one machine to another. However, we still don't have any way of separately describing locations and names, so model names still have their location in them. The difference is that the model can be moved and the new location in the name will reference it correctly. (In the past the whole name had to be the same, including location. This is why models couldn't move from one machine to another).

The problem with the current code is that it expects models to be restricted to a specific set of names on the current machine. Hence, a model which is called rmi://machine.domain.com/server1#foo must be referred to by that name. It can't be called rmi://machine/server1#foo even though that machine name resolves to the same address.

This problem manifests when a query is made on a name that the server does not recognise. If the name rmi://machine/server1#foo is queried, then the request will be sent to the address for "machine". Since this server does not recognise that particular name (it expects the names machine.domain.com or localhost) then it forwards the request on to the server named "machine". Not only do we get infinite recursion, we even do it through RMI! :-)

I had thought that I could try and compare the hostname in the model name. However, this isn't a complete solution, as it is always possible to miss an IP address on the current machine, or a DNS server could hold a name for a host which that computer has no knowledge of. As a failsafe, the host has to be queried for its "servername" and this will get compared to the servername on the connecting client.

The servername is found in the startup class EmbeddedKowariServer, which determines the name from the user configuration, or else calculates a default name. However, as the program starts the EmbeddedKowariServer class is unable to see the RmiSessionFactory class in its classpath, as it isn't until later that the appropriate class loaders are set up.

Similarly, the RmiSessionFactory class is unable to see EmbeddedKowariServer in its classpath while compiling. However, I experimented a bit with reflection, and discovered that EmbeddedKowariServer was available at runtime. That took me until the end of the day.
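A minimal sketch of that kind of reflective lookup, for a class that is missing at compile time but present at runtime. The helper name RuntimeLookup is hypothetical, and the class and accessor names are passed in as strings because the real ones on EmbeddedKowariServer are not given here.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper (not Kowari code): reads a static, no-argument accessor
// from a class that is only visible at runtime, not at compile time.
final class RuntimeLookup {

    /** Returns the accessor's value as a string, or null if it cannot be read. */
    static String readStatic(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method accessor = cls.getMethod(methodName);
            if (!Modifier.isStatic(accessor.getModifiers())) {
                return null; // an instance method would need an object we do not have
            }
            Object value = accessor.invoke(null);
            return value == null ? null : value.toString();
        } catch (ReflectiveOperationException e) {
            // The class or method is missing, or the accessor itself threw.
            return null;
        }
    }
}
```

Returning null rather than throwing keeps the caller free to fall back to some other way of finding the server name when the class is not on the classpath.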

Tuesday - 23rd
I started out by modifying the session interface so it could test both ends of an RMI connection in the constructor of RmiSessionFactory. If they compared equal, I initially expected that I could silently change the RmiSessionFactory to return new local sessions instead. However, I had difficulty getting access to a class which would do this, and after some discussion with DM and SR I changed tack. Incidentally, all of my suggestions on how to proceed were considered to either be inappropriate, or else impossible by both SR and DM. However, some of what I'd already done was working fine, which invalidated some of the "impossibility" arguments I was given. :-)

When the RmiSessionFactory constructor detects that the client and server are on the same machine it now throws a new exception called NonRemoteSessionException. I changed the SessionFactoryFinder to pick up this exception, and fall back to using a local connection. However, even though the code for a local session was all there, the class paths were not. Asking SR about it, he said that this code would never be run anyway, and that it should probably be excised (he was wrong, but I didn't discover that until the following Monday).

This led to a discussion which resulted in two things. The first is that I needed to try to recognise the name of the current machine, and change the requested model wherever possible. That means picking up all the known IP addresses, and comparing these to the results of a DNS lookup on the host name in the model name. This would catch almost all problems. The second part was to continue to test the client and server for their names, to prevent loops where the first method fails.
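The first part of that plan can be sketched with the standard java.net classes. This is only an illustration under assumptions: the class name LocalHostCheck is made up, and it has the gaps already noted above (an address or DNS name the machine does not know about will be missed), which is why the second check is still needed.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not Kowari code): decides whether a host name from a
// model URI refers to the machine this JVM is running on.
final class LocalHostCheck {

    /** Collects every address currently bound to a local network interface. */
    static Set<InetAddress> localAddresses() {
        Set<InetAddress> result = new HashSet<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics != null) {
                for (NetworkInterface nic : Collections.list(nics)) {
                    result.addAll(Collections.list(nic.getInetAddresses()));
                }
            }
        } catch (SocketException e) {
            // Interfaces could not be listed; fall through with what we have.
        }
        return result;
    }

    /** True if any address the host name resolves to belongs to this machine. */
    static boolean isLocalHost(String host) {
        Set<InetAddress> local = localAddresses();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                if (address.isLoopbackAddress() || local.contains(address)) {
                    return true;
                }
            }
        } catch (UnknownHostException e) {
            // An unresolvable name cannot be shown to be local.
        }
        return false;
    }
}
```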

Wednesday - 24th
The first half of this day was spent at the WISE conference being held here in Brisbane. My school organised my attendance, but I could only afford a small amount of time at it. Fortunately, I got to see my supervisor delivering a paper on his current work, and it explained a couple of points for me. Most of the other presentations were a bit lacking.

The worst part was that many presenters were not there. In some cases they had been denied visas, but in quite a few cases presenters had shown up for registration on Monday, and had then spent the rest of the week at the Gold Coast instead. I have no objection to them not attending for most of the time, but it just seems rude to not show up for your own presentation.

I also learned something surprising about conferences. Bob wasn't impressed with a few of the presentations, and so I asked if the presenters were ever given feedback on their performance. I was surprised to learn that this never happens. The closest thing to feedback is the number of questions posed to the presenter, but this could indicate numerous different things, so it becomes hard to judge.

As a future presenter, this has its good points and bad. As a plus, it's nice to think that I'd gain credit from the school simply for presenting the paper, and I can be as bad as I like. Indeed, I can even leave a conference feeling really good about myself, no matter how I performed (who wants to have their ego deflated?). However, the negatives are probably bigger. Unless I have some brutally honest friends about, I'm unlikely to get honest feedback, which will make it very hard to improve. Let's face it, no one is great on their first presentations, so I will be looking for every avenue for possible improvement. It's just a shame that the obvious one is not there.

After returning to the code I started on the methods for renaming models to a canonical form. I continued some of it in the evening, but it was not quite finished on this day.

Thursday - 25th
It took me a few hours, but by lunch I'd completed the methods to return canonical forms of a model name. Model names had to go back and forth between local and global space, which made it a little less straightforward, but it seemed to work well.
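As a rough illustration of the host part of that canonicalisation (it does not attempt the local/global translation mentioned above), the sketch below rewrites any known alias of the current machine to one canonical host name. The class name ModelNameCanonicalizer and the idea of handing it a fixed alias set are assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not Kowari code): rewrites model URIs so that every
// alias of this machine is replaced by a single canonical host name.
final class ModelNameCanonicalizer {
    private final String canonicalHost;
    private final Set<String> aliases = new HashSet<>();

    ModelNameCanonicalizer(String canonicalHost, Set<String> knownAliases) {
        this.canonicalHost = canonicalHost;
        for (String alias : knownAliases) {
            aliases.add(alias.toLowerCase(Locale.ROOT)); // host names are case-insensitive
        }
    }

    /** Returns the model URI with its host replaced if the host is a known alias. */
    URI canonicalize(URI model) {
        String host = model.getHost();
        if (host == null || !aliases.contains(host.toLowerCase(Locale.ROOT))) {
            return model; // not one of ours, leave it alone
        }
        try {
            return new URI(model.getScheme(), model.getUserInfo(), canonicalHost,
                    model.getPort(), model.getPath(), model.getQuery(), model.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot canonicalize " + model, e);
        }
    }
}
```

With "machine" and "localhost" registered as aliases of machine.domain.com, rmi://machine/server1#foo is rewritten to rmi://machine.domain.com/server1#foo, while URIs for other hosts pass through unchanged.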

The next job was to find all points in the code which attempted to use a model name, and make sure it was replaced with the canonical version. This mostly happened in DatabaseSession but when I'd finished I did a little grepping and discovered that it was needed in some new classes called SetModelOperation, RemoveModelOperation and ModifyModelOperation. The appearance of these told me that each operation is starting to be factored out into its own class. Later in the afternoon I was having an implementation discussion with AM and discovered that he is the culprit. I'm expecting to see more operation classes soon.

By the end of the day I was reasonably confident that this bug was nailed. Everything appeared to be working, so all that remained was a full set of tests to be run in the morning.

Friday - 26th
The tests failed. Doh!

Unfortunately, our full set of tests now takes some time to run, so this stage can get very slow. I got to spend a bit of time reading documentation while tests ran. It can be very frustrating.

I finally tracked the problem down to the Kowari Descriptors. These are essentially clients which run within the server. When they try to create a model they explicitly ask for an RMI connection to the current server and send the request out on that. Since the client and the server are on the same machine, the RmiSessionFactory constructor throws an exception, and the test fails.

So what went wrong? Well it turned out that I had a lot of my code for handling the problems intercepting the Resolver layer. I was under the impression that all requests were now supposed to go through this layer. However, the ItqlInterpreter does not do this. Instead, it obtains a session factory directly, and asks for a new session from it. Since the model will always start with "rmi://" then the factory will always be an RmiSessionFactory. I discussed the problem with SR, and he explained the code which did some of this work (to the annoyance of KA, who was trying to work in the same room. Sorry about that KA).

SR suggested not setting a session to a server, as this was supposed to leave the session connected to the local server. I was skeptical at this, as I couldn't see how the local server was ever set, but I gave it a go anyway. Unfortunately, this simply resulted in a lot of errors referring to a NullSessionException.

Closer inspection of this code revealed that ALL connections were being made with RMI, and only the RemoteSessionFactory is able to talk to a database on the same machine as itself. Since most clients are in a separate JVM this is not a big issue, but it certainly causes problems in some circumstances. More discussion with SR led me back to Monday's (now deleted) code which attempted to get a local session when an RmiSessionFactory constructor failed. However, that had failed on Monday, so now I needed to figure out how a RemoteSessionFactory was able to get these sessions for itself.

And that got me to the weekend (now that I know it's OK, it gives me some kind of perverse pleasure to start sentences with a conjunction. Not really sure why. Maybe it's because I know I'll be annoying someone out there who thinks that you can't do it). :-)

I was nowhere near as productive as I'd like, and I hate that. In particular, I'm frustrated that I haven't been able to return to working on OWL code. At least it will give me incentive to get this done as quickly as possible.

Friday, November 19, 2004

XMLC
I mentioned that I would talk about Barracuda and JDK 1.5, only I forgot.

The main problem is that Java now includes updated org.w3c.dom packages, which now implement DOM Level 3 (JDK 1.4 only did DOM Level 2). The new code includes numerous extra methods, and a few new subpackages. The subpackages don't affect Barracuda, but the new interface methods do, since Barracuda does not implement them.

At the moment I'm just putting all the method signatures into the correct places in Barracuda. I'm not doing any work in those methods (since no Kowari code will ever call them), but I'm putting in all the Javadoc, which is time consuming. The only thing these methods do is throw an appropriate exception (usually UnsupportedOperationException).
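For illustration, stubs of that kind look roughly like the sketch below. The class Level2OnlyNode is hypothetical and is not Barracuda code. It covers only a handful of the DOM Level 3 additions to org.w3c.dom.Node and is left abstract so that the Level 2 methods can stay wherever they already live.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;

// Hypothetical example class: a Level 2 node that declares some of the
// DOM Level 3 methods but refuses to do any work in them.
abstract class Level2OnlyNode implements Node {

    /** DOM Level 3. Not supported by this implementation. */
    @Override
    public String getTextContent() throws DOMException {
        throw new UnsupportedOperationException("DOM Level 3 getTextContent() is not implemented");
    }

    /** DOM Level 3. Not supported by this implementation. */
    @Override
    public void setTextContent(String textContent) throws DOMException {
        throw new UnsupportedOperationException("DOM Level 3 setTextContent() is not implemented");
    }

    /** DOM Level 3. Not supported by this implementation. */
    @Override
    public boolean isSameNode(Node other) {
        throw new UnsupportedOperationException("DOM Level 3 isSameNode() is not implemented");
    }

    /** DOM Level 3. Not supported by this implementation. */
    @Override
    public Object setUserData(String key, Object data, UserDataHandler handler) {
        throw new UnsupportedOperationException("DOM Level 3 setUserData() is not implemented");
    }

    /** DOM Level 3. Not supported by this implementation. */
    @Override
    public Object getUserData(String key) {
        throw new UnsupportedOperationException("DOM Level 3 getUserData() is not implemented");
    }
}
```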

Maybe if I submit these changes someone can use this as a template for a proper implementation.

Remote Server IDs
I don't want to spend long on this blog tonight, as I'm trying to upgrade Barracuda to JDK 1.5, and I want to get back to it.

I spent a little while talking with DM today about the new bug with remote server names. We are finally storing model names as relative URIs (well, it takes URIs, and if they happen to be URLs, then it accepts relative URLs). This is great, but relative URLs now means that we can refer to a model using numerous machine names, and the current code is expecting only the canonical machine name. The result is that if a query is issued against a model using a URL based on a non-canonical machine name, then the server doesn't recognise the model as being local. Instead it connects to the "remote" machine that it finds in the URL, and sends the query on to that machine.

An example might be a query on a model stored on a machine named mycomputer.domain.com. The model's URI could be rmi://mycomputer.domain.com/server1#model. If the model is instead referred to as rmi://mycomputer/server1#model then the query will go to the correct server, but the server will not recognise that the model is local, so it will make a connection to mycomputer and forward the query. This leads to an infinite loop.

Unfortunately, it is not possible for a computer to know absolutely every name that it can be known by. For instance, all it takes is a new entry in a DNS for a previously unknown name to come into effect. As a result, a client needs some way of telling that the server that it is connected to is actually the local server. This is easily accomplished by giving each server a unique identifier and checking for it with each new connection. I had considered some kind of UUID, but server URIs are really supposed to be unique, and they can be used to tell the user useful information in the event of a failure.

I had a chat with DM about the DatabaseSession.createModel() method today, but after thinking about it I realised that it operates a little differently to the other types of operations. For a start, it refers to models which should not exist, but which nevertheless represent valid URIs. It then makes the assumption that the model is local, and does a lot of its own work to create the model, rather than passing the work on to a resolver, like other operations do. However there is no reason that it can't handle non-local URLs, and pass the createModel request on to the appropriate server. It just needs a little extra checking, and then it can pass the request on to a RemoteResolver instance.

Another thing that makes this operation a little different is that it will probably never get called unless the model being referred to is local. This is because the iTQL client (and ItqlInterpreter class) will only connect to the correct machine to make this request. So you'd think I could just ignore this operation. However, there are other ways to call this method (e.g. JRDF does it) so it would just be asking for trouble if I didn't cover this case.

Finding the ID
In order to find the server URI, I had to start looking in EmbeddedKowariServer. SR has been complaining about this code for some time, comparing it to a post-apocalyptic wasteland where nomads have taken pot shots at various pieces of code as the need arose. After looking at the code I started to see his point.

For instance, there is a static value which holds the hostname, and another instance variable which also holds the hostname. There are two places where the instance hostname can be set, and the static hostname gets set on the following line. One page later appears the line:

EmbeddedKowariServer.setBoundHostname(this.getHost());

Of course, this just sets the static hostname to the instance hostname, which is what it was already set to. Someone wasn't paying attention here.

Now I was looking for the part of the code which would give me the server's URI, and I could do that by checking which parts of the code referred to the hostname. Unfortunately, silly things like I just described make it annoying to track down every part of the code which refers to the hostname.

Anyway, I finally found what I want in the constructor for RmiServerMBean (wow, I haven't seen MBeans since I implemented the JMX framework for Enhydra in 2000). However, there is no simple way to get the URI from this object, mostly because the code where I need it (RmiSessionFactory and RemoteSessionFactory) doesn't know anything about the server MBean that launched them. (Not a very clean MBean design either. These things are supposed to provide instrumentation and control for other objects, not actually implement a service).

I spoke with SR about this problem (among other things - he really hates EmbeddedKowariServer. I mean, who ever heard of a database that contains an HTTP server? It should be the other way around! Kowari would be a lot smaller and more modular then). We eventually agreed that since a server will have only one public URI, then it can be kept statically, and we can just put it in a central location (like EmbeddedKowariServer) to allow anyone to find it.

Exceptions
The other trick is working out where to do the test of the remote server (to check if it is local), and what to do in the event of a failure.

The thing that seems the most logical is to do the test in RmiSessionFactory just after it has obtained a RemoteSessionFactory (I alluded to this above). If it turns out that the RemoteSessionFactory is on the same server, then it should be closed, and an exception thrown. The SessionFactoryFinder.newSessionFactory() code which called the RmiSessionFactory can then catch this exception, log a warning that a non-standard name was used, and then use a local SessionFactory instead.
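The control flow described above can be sketched as follows. Every type here is a simplified stand-in written for the example: the Sketch* names are invented, the real session factory interfaces are much larger, and in the real code the remote server's URI would be asked of the far end of the RMI connection rather than passed in as an argument.

```java
import java.net.URI;
import java.util.logging.Logger;

/** Stand-in for the exception named in the post; the constructor is assumed. */
class NonRemoteSessionException extends Exception {
    NonRemoteSessionException(URI requested) {
        super("Server at " + requested + " is the local server");
    }
}

/** Minimal stand-in for a session factory. */
interface SketchSessionFactory {
    String kind();
}

final class SketchLocalFactory implements SketchSessionFactory {
    @Override
    public String kind() { return "local"; }
}

final class SketchRmiFactory implements SketchSessionFactory {
    SketchRmiFactory(URI requested, URI localServerUri, URI remoteServerUri)
            throws NonRemoteSessionException {
        // The real constructor would also close the remote factory before throwing.
        if (localServerUri.equals(remoteServerUri)) {
            throw new NonRemoteSessionException(requested);
        }
    }

    @Override
    public String kind() { return "rmi"; }
}

final class SketchFinder {
    private static final Logger LOG = Logger.getLogger(SketchFinder.class.getName());

    /** Tries RMI first and falls back to a local factory if the "remote" end is us. */
    static SketchSessionFactory newSessionFactory(URI requested, URI localServerUri,
                                                  URI remoteServerUri) {
        try {
            return new SketchRmiFactory(requested, localServerUri, remoteServerUri);
        } catch (NonRemoteSessionException e) {
            LOG.warning("Non-standard name used for the local server: " + requested);
            return new SketchLocalFactory();
        }
    }
}
```

Using a checked exception forces the finder to deal with the loop-back case explicitly, and the warning leaves a trace of which non-canonical name triggered the fallback.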

I made a start on this code, but it was Friday afternoon, so I wasn't going to pretend to myself that I wanted to finish this before Monday. :-)

Oh, and I looked a little at the documentation of iTQL for OWL predicates, but that was all in the morning.

Thursday, November 18, 2004

Documentation
Today has been spent reading yet more OWL documentation, and looking again at some Kowari docs. There are still a couple of OWL predicates for which there are no iTQL implementations, and I need to either provide them, or explain the current shortcomings of iTQL.

TJ has also set me a series of tasks, starting with solving a problem of a server knowing not to make a remote connection to itself when using a Remote Resolver. If a model is to be found on a particular server, then the name provided could be any one of several qualified or unqualified names, or even "localhost", not to mention numerous IPs. When this happens, the server has to know when to handle a query itself, and when to pass it on.

The only thing I can think of that would work reliably is for each server to generate a GUID on startup, and to set up a protocol to negotiate this with all connecting clients. That way if a server wants to pass it on, and the connection it makes happens to be to itself (via a name or IP that it didn't recognise) then it can drop the connection and handle the query locally. That way there can be no IPs or names that escape the server's attention (e.g. dropping and restarting a network connection could allocate a previously unknown IP to a server).
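A minimal sketch of the identifier side of that idea, using java.util.UUID. The class name ServerId is hypothetical, and the protocol for actually exchanging the value over a connection is left out.

```java
import java.util.UUID;

// Hypothetical helper (not Kowari code): one instance is created when the
// server starts, and its id is handed to every peer that connects.
final class ServerId {
    private final UUID id = UUID.randomUUID();

    /** The identifier this server reports to anyone who asks. */
    UUID id() {
        return id;
    }

    /** True when the id reported by the far end of a connection is our own. */
    boolean isSelf(UUID reportedByRemote) {
        return id.equals(reportedByRemote);
    }
}
```

A server that opens an outgoing connection would ask the far end for its id and, if isSelf returns true, drop the connection and handle the query locally.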

So I'll be spending my next few days working on this, plus the documentation mentioned above.

Other than that (and filing my tax return) it has been a very quiet day.

Firefox and Virtual Desktops
I've been trialing CodeTek Virtual Desktop, and overall I've been really happy with it. It's a little expensive (given that I only have Australian dollars to play with), but I think I'm using it enough to justify the expense when my free trial runs out. The only thing that makes me wonder if I should get it, is that Apple might try providing virtual desktops as a built in part of Tiger.

One problem with the CodeTek software has been that it doesn't play well with Firefox. I normally use Safari as it has a full Aqua GUI and is well integrated, but Firefox is the browser I use for Blogger so I get all the cute features in the editor. (Camino would seem to offer the best of both camps, but it does not properly render all pages, e.g. the Blogger editing page).

When I click on a link to create a new post in Blogger, or I change to another application and back, then I cannot get Firefox to accept any kind of keyboard input. Initially I was thinking that it was a Firefox problem, until I discovered that I can get the focus onto Firefox when I change virtual screens and back. Firefox is the only application I've found with this problem, so it seems to be an interaction bug. I suspect that it is because Firefox uses Carbon, while almost everything else I use is based on Cocoa (maybe I should check out IE?). Firefox is about to start moving over to Cocoa for OS X, so this problem might eventually go away even without any intervention from CodeTek.

Wednesday, November 17, 2004

More OWL
Absolutely nothing of interest to report on today.

I read more OWL documentation. I find it interesting that I can read one part of the documentation and not think much of it, but after I learn more about OWL and come back to the same document I discover something really significant that I didn't get the first time through.

As a result, I'm reading this stuff, and re-reading it. Once I work on it a bit more I'll probably have to come back and re-read it again.

One set of things I really need to search for are any papers which refer to applying DL to OWL, such that OWL DL was arrived at. Ian Horrocks probably wrote some of that, so I should go looking there.

Phone
In the midst of this, my new phone arrived. It's a Sony Ericsson K700i, and I got it on my current account for no extra money. Apparently I'm already being ripped-off adequately. :-)

I just wanted a phone that can do Bluetooth, so I can sync it to my PowerBook. Well it does that really well (I've been impressed at some of the connectivity features, both built-in, and through Salling Clicker), but the other features look very nice as well. It comes with video (no 3G unfortunately, so it only works with MMS) and a snapshot camera, has a very large screen, and runs Java programs really nicely (it seems to be implementing Java3D. Does this phone have hardware 3D support?). All of this comes in a small and light phone, which is another important consideration. Unfortunately, it has a sharpish edge which I find uncomfortable against my ear. Oh well, maybe I'll get used to that.

There are a couple of configuration issues that bother me. First, I can't change the volume of things like the camera snapshot sound (it's REALLY loud). I also wish I could change the volume of different ring tones. For instance, the main ring tone is just fine, while the MP3 I've recorded for Anne's ringtone is a bit soft. In fact, all recorded MP3s sound soft, so it would be nice to either modify the levels on the recording, or else just allow each item to be played back at an individually assigned volume.

One feature I like is that I also have individuals' photos come up whenever I get a phone call or message from them. I'm used to individual ringtones, but I quite like this feature (though I'll probably get over it soon enough).

Overall, I like the phone. I suppose I should start brushing up on my MIDP again.

Tuesday, November 16, 2004

CVS
I mentioned to AN how he had not called setRows on the "trans" and "walk" constraints when he added the code which called getRows on all constraint types. He insisted that he had, and so we looked at the CVS logs.

Sure enough, he had put this code in, almost verbatim to my own changes (he used an extra space when casting the constraint). CVS dutifully logged both of our changes, so it was plainly visible that we had both made this modification.

So how was I not able to see AN's code? I had done numerous CVS updates, and even checked in my own modifications (overwriting AN's), all without complaint from the system. There were no conflicts or updates to be seen on this file.

So I'd wasted Monday on a bug that should never have existed, and Friday seems to have been wasted on a similar problem (although the HybridTuples problem found while trying to print the constraint tree was an important fix). Similarly, I'd witnessed CVS failing for AM a couple of weeks ago. So what is the deal here?

I mentioned this problem to TJ at our morning meeting. Unfortunately, I have no idea why CVS might be behaving this way, nor any reasonable suggestions for an alternative file repository.

XA API
I went back to the RemoteResolver code to see what I needed to do with the getXAResource method. Looking at the javax.transaction.xa package just confused me, as I couldn't work out where the Xid was supposed to come from.

I finally started searching for examples. The most informative I could find was an IBM site. Once I got to see how client code uses this interface I got a much better idea of what was required... and it became clear very quickly that the Resolver interface is not built to be able to handle this API.

The main problem is that the transaction manager needs to associate an Xid with a session. A session will get associated with a RemoteResolver instance, so the Xid will really get mapped to a RemoteResolver instance (in fact, it will get mapped to a set of resolver instances). So when the Xid is first used (by calling XAResource.start(xid,...)) the mapping to the resolver needs to be set up. This would usually be done by creating the resolver in the start(xid,...) method. However, the getXAResource() method is currently a non-static member of Resolver. This means that the Resolver object has to be created before it can be mapped to an Xid, which could cause problems if it gets used before a transaction that it will become a part of. A bigger problem is that the XAResource will be associated with a single resolver session, and yet it is supposed to manage numerous sessions with different Xids. Since this last proposition is impossible, it becomes clear that getXAResource() cannot be attached to the Resolver class. The appropriate place is probably the ResolverFactory, though that will still restrict it to a single Resolver type when XAResource objects are supposed to be able to manage an entire transaction (involving multiple sessions, probably on multiple Resolver types).
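To make the mismatch concrete, here is a rough sketch of an XAResource that lives at the factory level and creates one resolver per Xid inside start(). The javax.transaction.xa types are real; SketchXid, SketchResolver and FactoryXAResource are hypothetical names invented for illustration and are not Kowari classes. The flags argument is ignored, so joins and resumes are not modelled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical Xid with value equality so it can be used as a map key.
final class SketchXid implements Xid {
    private final byte[] gtrid;
    SketchXid(byte[] gtrid) { this.gtrid = gtrid.clone(); }
    public int getFormatId() { return 1; }
    public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    public byte[] getBranchQualifier() { return new byte[0]; }
    @Override public boolean equals(Object o) {
        return o instanceof SketchXid && Arrays.equals(gtrid, ((SketchXid) o).gtrid);
    }
    @Override public int hashCode() { return Arrays.hashCode(gtrid); }
}

// Hypothetical stand-in for a resolver session; not the real Resolver.
final class SketchResolver {
    final Xid xid;
    SketchResolver(Xid xid) { this.xid = xid; }
}

// One XAResource, owned by a factory rather than by a resolver, which
// tracks many Xids and creates a resolver for each one in start().
final class FactoryXAResource implements XAResource {
    private final Map<Xid, SketchResolver> resolvers = new HashMap<>();

    public void start(Xid xid, int flags) throws XAException {
        if (resolvers.containsKey(xid)) throw new XAException(XAException.XAER_DUPID);
        resolvers.put(xid, new SketchResolver(xid)); // the Xid mapping is set up here
    }
    public void end(Xid xid, int flags) throws XAException { lookup(xid); }
    public int prepare(Xid xid) throws XAException { lookup(xid); return XA_OK; }
    public void commit(Xid xid, boolean onePhase) throws XAException {
        lookup(xid);
        resolvers.remove(xid);
    }
    public void rollback(Xid xid) throws XAException {
        lookup(xid);
        resolvers.remove(xid);
    }
    public void forget(Xid xid) { resolvers.remove(xid); }
    public Xid[] recover(int flag) { return new Xid[0]; }
    public boolean isSameRM(XAResource other) { return other == this; }
    public int getTransactionTimeout() { return 0; }
    public boolean setTransactionTimeout(int seconds) { return false; }

    int activeCount() { return resolvers.size(); }

    // Finds the resolver for a known Xid, or reports an unknown transaction.
    SketchResolver lookup(Xid xid) throws XAException {
        SketchResolver r = resolvers.get(xid);
        if (r == null) throw new XAException(XAException.XAER_NOTA);
        return r;
    }
}
```

Because the resource outlives any single resolver, it can serve several transactions at once, which is exactly what an instance method on Resolver cannot do. It still only knows about one resolver type, which is the remaining limitation described above.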

The upshot of all of this is that getXAResource() cannot be implemented on the current interface. Ideally we'd remove it before it got released in the next few days, but the DummyXAResource objects which are currently being returned are being used for some logging, so there would be significant changes needed, and we don't have the time.

OWL
I saw Bob again today, and we discussed OWL a little more. I'd actually done some work this time, so it all went smoothly. :-)

I still need to write up something about the differences (and reasons for those differences) between the OWL species, however I now have some new priorities. The first thing I need to do is to get hold of a copy of the Description Logic handbook. There is a copy in the library, but it is currently out. It is a relatively recent publication, so it is unlikely I will find it online (though I can find the chapters for it easily enough). While discussing this, Bob was able to explain some of the differences between Description Logic and Datalog. I knew some of it (and I still need to learn more), but it has been a little confusing for me as different people have referred to "DL" with some meaning Description Logic, and others meaning Datalog. I understand that DL is supposed to mean Description Logic, but not everyone seems to get that. At least I understand more of the differences now.

The other thing I need to do is start writing my confirmation. Theoretically I don't need to have it done until next September, but I want to do it sooner. Unfortunately, the head of the school will be away for some time, so anyone who wants theirs done before August will need to do it in February. This timetable gets squeezed as everyone (including Bob) will be away for the Southern Hemisphere summer. So if I want any feedback from Bob it needs to be in the next 3 weeks. After that I won't be seeing him again until February... which is just before the confirmation will be due. Consequently, I'm dropping everything else to work on the start of the confirmation over the next couple of weeks.

Fortunately, after working out that I couldn't do anything with the XA API, TJ asked what type of resolver I want to work on, and was more than happy with my suggestion of an OWL resolver to represent OWL Abstract Syntax. So while I have to work on my confirmation out of hours, I can continue to work on OWL at work. :-)

I spent the rest of the day (what was left of it) going over OWL documentation. I'll be doing the same tomorrow.

Monday, November 15, 2004

Trans and Walk
For some reason the trans and walk JXUnit tests started failing when I tried to check the new bug fixes from Friday.

For some reason, recent configuration changes have caused the output from JXUnit tests to no longer include stack traces of any exceptions which are thrown. This is really annoying, and it means that it can now take some time to duplicate the conditions of the error, just to see the trace.

Once I had the exception being thrown, I was able to look at the offending code and discover that the problem happened when the Tuples.getRows() method was called on a transitive constraint, or a walk constraint. This was a problem because the constraint had never had the row count set, which made calling getRows illegal.

For some reason, other people had this code working correctly for them. I did a CVS update (several times) with no effect, so I thought I was up to date. However, I could not work out how others could have this code passing when I couldn't.

I asked AM about this, and he explained that AN had needed a workaround for a problem, so he had re-introduced the setRows() and getRows() methods for use with a ConstraintExpression.getCanonicalForm() method. AM was not too happy about it, as it took a previously immutable object, and made it mutable.

So now that I knew where my problem was coming from, the solution (while AN's hack was still in place) was to set the rows for the "trans" and "walk" constraint types. Once this was done everything worked correctly, and I could check all the bug fixes in for the release that TJ was orchestrating for the day.

This left me free to pursue some documentation, which was boring, but necessary.

Friday, November 12, 2004

Debugging Joins
AM and I spent a lot of time today trying to debug the joins which occur when using a remote resolver.

The first part was trying to print out the constraint expression tree. Yesterday we discovered that this was causing a NullPointerException, and we spent some hours tracking it down.

The problem occurred during the sort of a Tuples. This created a new HybridTuples (via a factory method, so we didn't see immediately that the returned type was really a HybridTuples), and it was in the HybridTuples constructor that we discovered that the incoming Tuples object was being kept, but not being cloned. Consequently, when the tuples which had been given to this constructor was closed later, the HybridTuples object became invalid.
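The ownership bug is easy to reproduce in miniature. The sketch below uses made-up classes (SketchTuples, SharingWrapper, CloningWrapper), not the real Tuples or HybridTuples, and it signals a closed source with IllegalStateException rather than the NullPointerException we actually saw.

```java
// Hypothetical minimal stand-in for a Tuples: a closable, cloneable row source.
final class SketchTuples implements Cloneable {
    private long[] rows;
    SketchTuples(long... rows) { this.rows = rows.clone(); }
    long rowAt(int i) {
        if (rows == null) throw new IllegalStateException("tuples closed");
        return rows[i];
    }
    void close() { rows = null; }
    @Override public SketchTuples clone() {
        if (rows == null) throw new IllegalStateException("tuples closed");
        return new SketchTuples(rows); // independent copy with its own lifetime
    }
}

// Buggy pattern: keeps the caller's reference, so the caller's close() breaks it.
final class SharingWrapper {
    private final SketchTuples source;
    SharingWrapper(SketchTuples source) { this.source = source; }
    long first() { return source.rowAt(0); }
}

// Fixed pattern: clones in the constructor, so it owns what it reads from.
final class CloningWrapper {
    private final SketchTuples source;
    CloningWrapper(SketchTuples source) { this.source = source.clone(); }
    long first() { return source.rowAt(0); }
}
```

After the caller closes the original, SharingWrapper.first() fails while CloningWrapper.first() still returns the first row.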

Once this was solved, we were able to observe the constraint tree. It was immediately apparent there was a problem, as the entire tree was being duplicated, with the results of both sides being unioned (and duplicates removed). While this worked for some queries, it created a problem for others when the joining code expected the columns of the tuples to be ordered in a particular way.

AM had seen something like this before, and discovered the problem in a piece of code which builds up the constraints. It was a simple typo with the wrong constraint variable being used.

A concern here was that AM says that he has fixed this code before. This seems similar to a problem he had a couple of weeks ago where a "cvs update" was not getting the latest version of a file. We could look at the CVS log for the file, and everything seemed in order, but for some reason the latest version of the file would not come down for him. In the end he had to delete the file and get it again. In this case he was lucky that it was not a file he had modified.

Thursday, November 11, 2004

Working Exclusion
OK, I'm writing this one really late (by 5 days), but here goes...

This was the first day that I felt I was really productive for the release since last weekend. I got a bit done for the release before now, and I got a lot done on the paper with DW and TA, but I have obviously been tired from the Noosa tri. I'll make the effort to take time off after it next year.

This day was spent debugging the excludes code. I finished the port of the ConstraintNegationTuples class to StatementStoreInverseResolution, which was at least enough to make it run. Of course, it didn't work, which meant that I had to sit down and figure out exactly what it was doing. I'd love to put a full description in here, but since I'm having to go back a few days to remember, and because I'm currently a little short on time I'll just let the reader look at the source code and figure it out for themselves.

The first thing to recognise was that there are two situations. The first is if the Metanode (ie. the model) was fixed. When this happens then it is necessary to slice that model out of the statement store in order to do the inversion. This has consequences for the beforeFirst method, as the first integer will always need to match the given model. It also means that the iterator has to be limited to a single model.

The second situation is for a variable metanode. This ranges over the entire datastore, and the code is simpler. There were just a few tricks making sure it worked side-by-side with the fixed metanode case.
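For anyone who doesn't want to dig through the source, the two cases boil down to something like the following. This is a simplified sketch under my own assumptions (statements as four node ids, the constraint as a predicate, a null model meaning "variable"); the class and method names are invented and the real class works on the indexed statement store rather than a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class InverseResolutionSketch {
    // Each statement is {subject, predicate, object, model} as node ids.
    // fixedModel == null means the model position is a variable.
    static List<long[]> invert(List<long[]> store, Predicate<long[]> constraint, Long fixedModel) {
        List<long[]> result = new ArrayList<>();
        for (long[] st : store) {
            // Fixed metanode: slice the store down to that one model first.
            if (fixedModel != null && st[3] != fixedModel) continue;
            // Inversion: keep only the statements the constraint does NOT match.
            if (!constraint.test(st)) result.add(st);
        }
        return result;
    }
}
```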

Once I had it all running I put it up against the relevant JXUnit tests, and it all worked. I was very happy. :-)

Remote Joins
The next thing to work on was the Remote Resolver tests which were failing. According to the symptoms from ML, it sounded like the join code was failing. I don't know why a problem like that would only show up against this particular resolver, but since it does it falls to me to fix it.

Given that it appeared to be "join code" that was at fault, I asked AM for some assistance. He was happy to help, and immediately suggested a logging statement to observe the structure of the constraint expression. The system immediately failed with a NullPointerException. We spent a little time looking for the culprit, but came up with nothing before calling it a day.

Wednesday, November 10, 2004

Exclude
I got a lot more done today, but I didn't finish the exclude code as I'd expected. I started out by realizing that the ConstraintNegationTuples has a second constructor that I hadn't noticed before, and this one accepts the parameters I'm interested in. After that, it was a matter of only 2 methods and it would be a full Resolution object, so I copied it over my earlier work and modified it.

Of course things never go quite as smoothly as planned. The semantics of the statement store are slightly different, so I had to update a few things. Then when I ran the code I discovered that this inverted resolution class wasn't even being created. Quite a bit of debugging later, and I discovered that the Constraint object I was being given was not a ConstraintNegation. I was also able to log the fact that a ConstraintNegation was being created by the client, and received by the server. So somewhere after that I was losing the information that the constraint was inverted.

It doesn't sound like a lot, but the last two paragraphs were actually quite a bit of work. Certainly more effort than over the last two days.

Java 1.5
Tonight I tried to continue the debugging at home, but that led to its own problem. I've installed Java 1.5 at home, but this doesn't support Kowari yet. So I could either go back to the old compiler, setting up the environment variables, etc, or else I could try and get Kowari to compile under 1.5.

Since it was after hours, I figured I could do the more enjoyable thing and had a go at porting. :-)

The first class to have problems was PIErrorHandler. This includes the class WrappedRuntimeException, which is an internal class from the Apache XML support library in Java 1.4. Speaking with KA I've learned that he was intent on obtaining the source of all exceptions. Since this class can be thrown, it is necessary to pull out the wrapped exception to find out the real source of the problem. The best way to do this seemed to be to use reflection to find the getException method. This matched up nicely with a method of the same name in the SAXException class which can also be thrown.
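The reflection approach looks roughly like this. It is a sketch rather than the code that went into PIErrorHandler: WrappedCauseFinder and its methods are names I've made up here, and it treats any public no-argument getException() method as the unwrapping hook, so it never has to name the internal class at compile time.

```java
import java.lang.reflect.Method;

final class WrappedCauseFinder {
    // Returns the wrapped exception if t has a public no-arg getException()
    // method that yields a Throwable; otherwise returns null.
    static Throwable unwrap(Throwable t) {
        try {
            Method m = t.getClass().getMethod("getException");
            Object inner = m.invoke(t);
            return (inner instanceof Throwable) ? (Throwable) inner : null;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null; // no such method, or it could not be called
        }
    }

    // Follows getException() links until there is nothing left to unwrap.
    static Throwable rootSource(Throwable t) {
        Throwable current = t;
        Throwable next = unwrap(current);
        while (next != null && next != current) {
            current = next;
            next = unwrap(current);
        }
        return current;
    }
}
```

The same call handles org.xml.sax.SAXException, since it exposes a method with the same name.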

The next problem was with another "1.4 only" class called SAXSourceLocator. Again, this is an internal class for XML which is not included in Java 1.5. However, this time it meets a publicly available interface: javax.xml.transform.SourceLocator. There is no public implementation for this interface, but there is an identical interface, org.xml.sax.Locator which does have a public implementation called org.xml.sax.helpers.LocatorImpl. With identical methods, all I needed to do was merge the two into a class which extended LocatorImpl and implemented SourceLocator. I liked the way this worked, as it borrowed the implementing methods from LocatorImpl for the unrelated SourceLocator interface, and it required no code to do it. It made sense to me, as the interfaces were created for the same reasons. Maybe they should have been merged already.
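The whole trick fits in a class with an empty body. MergedLocator is a placeholder name for this sketch; the two JDK types are real.

```java
import javax.xml.transform.SourceLocator;
import org.xml.sax.helpers.LocatorImpl;

// LocatorImpl already supplies public getPublicId, getSystemId, getLineNumber
// and getColumnNumber methods, which are exactly the four that SourceLocator
// declares, so nothing needs to be written in the body.
class MergedLocator extends LocatorImpl implements SourceLocator {
}
```

An instance can be filled in through the LocatorImpl setters and then handed to anything that expects a SourceLocator.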

The next problems come from the changes to Xalan in the latest JVM. This is due to the need to implement XSLT extensions. Reading the online documentation, this requires the use of org.apache.xalan.extensions.XSLProcessorContext and org.apache.xalan.templates.ElemExtensionCall. Unfortunately, Java 1.4 never made these classes publicly available, and the version of Xalan that comes with 1.5 doesn't include them.

I've been looking online for the latest way to implement XSL element extensions, but the only examples I can find are for the previous version of Xalan. The latest version of Xalan includes the new XSLT Extensions library, which is an example of this, but I've yet to find any documentation on how it was done. At this point I'm thinking that I will need to find the source code for this library in order to find out how it is done. After that I'll be trying to work out how to do a portable implementation that will work with both versions.

Tuesday, November 09, 2004

More Paper
I spent today bouncing back and forth between coding and helping write the paper for the WWW 2005 conference, to the detriment of the code. I made some progress with the Resolution implementation, but it was pretty slow. It really needs to be based on a ConstraintNegationTuples, but this class has a lot of things in it that are irrelevant to the new code base, and the constructor takes parameters that don't really work in the new code base either. Unfortunately, rather than concentrating on it I kept getting interrupted to work on the paper.

As for the paper itself, it seems to be coming along well. Initially I was trying to make sure it would be ready for TA and DW before the due time, but then there was an extension to 5pm due to server problems. I find that funny: the WWW is having web server problems. :-)

Basically, the paper just describes the internal workings of Kowari, and some of the work we've been doing. TA likes it overall, but I'm not that satisfied. For a start, I haven't had the time to spend on it that I would like. To my eyes, this really comes out when reading through as there was not enough time to smoothly merge the text that DW and I have written. Consequently, both of us would refer to the same things, only in different contexts. This had the effect of making it feel less structured than I would like, however I may just be extra critical because I wrote some of it.

Through the course of the evening I got to do some more corrections, so quite a lot of time went into this paper today. Ideally I'd have liked to spend more time on it, but over the course of a week or two, particularly given that this is for an international conference. Oh well, hopefully it will get accepted.

Spelling
I've noticed that the spell checker on Blog Spot has died again. I know this should make me do a better job proof reading, but given limited time, I don't think I'll be bothering.

Monday, November 08, 2004

Inverse Resolution
I spent quite a bit of time catching up on miscellaneous paperwork today, so I was less obviously productive than usual. That's a little annoying, as I plan on having the exclude statement working by Tuesday afternoon.

I spent a little more time converting the StatementStoreResolution class into a StatementStoreInverseResolution to do exclusion for the XA statement store. I didn't get the time to finish it, so I will be picking it up again in the morning.

As for handling an inverted constraint, I've been wondering if there should be a method on the resolver interface to test for that capability, or if an exception should be thrown. It seems cleaner to have a method to tell the calling code that it can't pass in an inverted constraint, but the result of that would probably be to throw an exception. So maybe the calling code doesn't need to know ahead of time, and an exception can be thrown by the resolver. That's probably how it's working by default, but I'll check with SR that he is happy with that.

Paper
In the meantime I've been writing parts of a paper for the WWW conference in Japan next year. I got a little done on Friday night before I left, and I'll be working on more tonight. It's been a bit difficult with the timing of the triathlon, but now that is done I'll have to concentrate on work and study.

The paper is for Tucana, but hopefully DW will let me put UQ down next to my name as well. That way it is for work and for uni. I just wish I'd had more time to sit down and concentrate on it.

Friday, November 05, 2004

Exclusion
This is my second day of really brief notes.

According to AN I should have been able to take the exclude code from the old code base, and copy it into the correct position in the new. To do this I started searching for the appropriate code in both code bases. It took some time to confirm, but I eventually concluded that this was not possible.

I also got to see the full extent of the differences between constraint resolution in the old code base and the retro-version in the new (the newer version of this code is in the old code base). This made it very clear that we need to port the constraint resolution code into the new system as soon as possible. There are several reasons for this. The most important is that a new constraint type changes a lot of code with if statements in the new code base, but only needs new classes in the older code base (older system, but newer code). This makes the older system more modular and easier to understand and modify. We will want these changes made quickly, before much more work is done on the new code base, making the changes harder to do (a situation we found ourselves in when trying to move everything over to the current new resolver code base).

In terms of the "exclude" code, I've worked out where the changes need to be. In the old codebase it was wrapped around a normal constraint, and inverted the results as they came back. The new architecture does not allow for this. The best example of this is shown with a remote resolver. It is impossible to return all data except the result of a constraint because the only data available is that result. There is no way to get the rest of the data from the server (without constructing another query).

So each resolver will need to handle an "inverted" constraint and return the correct answers. For our purposes, we only need to be capable of doing this on the StatementStoreResolver to get our old functionality. However, in future we may need to consider mandating that all resolvers be capable of handling an inverted constraint.

In the meantime, the StatementStoreResolver code needs a new Resolution object (basically a Tuples) which can return all data except that which a constraint resolves to. This won't be too hard, but will take a little while to code.

Thursday, November 04, 2004

Other Stuff
With Noosa being run on Sunday the 7th I was a bit too distracted to sit down after work on the 4th and 5th to write about my day at work. So once more I'm writing after the fact. This is just a couple of quick notes to remind myself of what I was doing on those days.

TKS Tests
Most of the day was spent on TKS, so I could confirm the distributed query tests. I was led to believe that all our recent work had been moved over and would be running. In particular the Remote Resolver code appeared to be in there, so I figured it should be quite easy.

Instead I discovered that I couldn't build the code at all. After struggling with missing classes, tests I expected to see and couldn't, and configurations which didn't mesh with my understanding of the architecture, I discovered that the recent work had not been moved into TKS yet. TJ suggested I move onto something else.

One misleading aspect of the TKS code was the presence of the Remote Resolver classes. These have only just recently appeared in TKS, and I don't understand why they went in when nothing else has. Now that I think about it, I should look at the CVS log to see who did it, and then I can ask them.

In the meantime, I started looking at implementing exclude on the new codebase. In the course of this I've discovered that all the recent changes to clean up and modularize the constraint resolution code have been left behind in the old system. AN is rather frustrated at this (as am I), but I suppose we need to get the new codebase running and passing all tests before we start bringing forward features like this.

Wednesday, November 03, 2004

Debugging
The code from yesterday did pretty well, though there were a few errors. Overall the structure worked out, and it was just a few little things that wouldn't work. For instance, I had an array of SPLimit objects for each type, but SPObject (the parent class) will not accept a data type of TCID_FREE (0) so I had to leave that array entry at null.

The only significant issue was in a line which renames the variables of a tuples to those of the requesting constraint. Since the data has been sorted, it is contained in a HybridTuples object. It appears that the renameVariables method in this class expects to see variables which are named "Subject", "Predicate", "Object" or "Model". However, I had copied the Tuples returned from the XAStringPoolImpl which names the single column "gnodes".

While I could just fix this by renaming my single column to "Subject", I was at a loss as to why the method had previously worked on the Tuples returned from XAStringPoolImpl. I eventually realised that this was because the tuples was always renamed before being resorted into a HybridTuples, so the HybridTuples.renameVariables() method had never been called on data returned from XAStringPoolImpl before.

Looking at the code from various angles, I was able to determine that the name "gnodes" was just a convenient placeholder, and that "Subject" was not only more convenient, it also made semantic sense in the context of where this code appears.

Once it seemed to be running, I gave the code to ML to try with his code. Unfortunately there had been no unit tests for this code, and I was running a little short on time to build a comprehensive set, so letting it run in context was my best bet. There was a small bug, where I hadn't realised that the limit parameters could be set to null to indicate no limit, but I fixed this easily, and after that it worked as advertised.

Running everything in the unit tests showed up a slew of new failures in the JRDF tests. This ended up being because these tests required all new resolvers to be in the classpath, as it does reflection on each of them. Adding the nodetype resolver to its classpath fixed this. To prevent this in future, RT (who is maintaining that code) changed the build script to include the whole distribution in the classpath for that test.

Now I have to move the Remote Resolver from Kowari and into TKS, so that the distributed tests in TKS can validate this code. I've checked out TKS, and I've started exploring the paths (again). I'm not really sure what to do, but hopefully I'll figure it out early tomorrow.

Tuesday, November 02, 2004

In Memory String Pool
I spent the whole day on just two aspects of this class.

The first part was trying to create a new type of object which matches a given type of SPObject, and compares either larger or smaller than every instance of that type. I called this class SPLimit. I had initially hoped to create a kind of SPObject whose byte buffer would always compare larger or smaller, but on closer inspection I realised that the byte buffer could be translated into almost any kind of data type, and this plan would not work.

Instead I had to override the comparator to enforce my semantics. To avoid excessive usage of the expensive instanceof operator, I introduced a new method to SPObject called comparatorOverride which returns a boolean. The comparator calls this on its arguments before deciding to proceed normally or with my new comparison operation.
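In outline, the override works like the sketch below. The classes are cut-down stand-ins that I've invented for illustration (a type id plus a long value, instead of a byte buffer), so only the shape of the comparator matches the real code.

```java
import java.util.Comparator;

// Hypothetical simplified SPObject: a type id plus a comparable value.
class SketchSPObject {
    final int typeId;
    final long value;
    SketchSPObject(int typeId, long value) { this.typeId = typeId; this.value = value; }
    boolean comparatorOverride() { return false; }
}

// Sentinel that sorts below or above every real object of its type.
final class SketchSPLimit extends SketchSPObject {
    final boolean upper;
    SketchSPLimit(int typeId, boolean upper) { super(typeId, 0L); this.upper = upper; }
    @Override boolean comparatorOverride() { return true; }
}

final class SketchSPComparator implements Comparator<SketchSPObject> {
    public int compare(SketchSPObject a, SketchSPObject b) {
        if (a.typeId != b.typeId) return Integer.compare(a.typeId, b.typeId);
        boolean aLimit = a.comparatorOverride();
        boolean bLimit = b.comparatorOverride();
        // Normal path: no instanceof needed, just two cheap boolean calls.
        if (!aLimit && !bLimit) return Long.compare(a.value, b.value);
        return Integer.compare(rank(a, aLimit), rank(b, bLimit));
    }

    // -1 for a lower limit, 0 for ordinary objects, +1 for an upper limit.
    // In this sketch only SketchSPLimit returns true from comparatorOverride().
    private static int rank(SketchSPObject o, boolean isLimit) {
        if (!isLimit) return 0;
        return ((SketchSPLimit) o).upper ? 1 : -1;
    }
}
```

With a sorted set built on this comparator, passing a lower and an upper limit for one type to subSet selects every object of that type and nothing else.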

I also re-worked the existing lowValue-highValue version of the findGNodes method to make it a little simpler and more straightforward. I also noticed that I could probably avoid creating a new set for appending an SPObject if I play with the iterators a little more. I haven't done this yet, but it may be worthwhile, as it would allow subsets to be exclusively used, saving significantly on memory in some circumstances.

The other part of the code was a new Tuples object for returning from the two findGNodes methods. Internally I had a sorted set of the required SPObject, but the required data was a Tuples of local nodes (which are all long integers). This was a relatively simple wrapper which does a translation from SPObject to long as required. Unfortunately it took a long time to implement something so simple (and get it right), as Tuples has 21 methods other than the constructor.

I left the SetWrapperTuples as a private inner class of MemoryStringPoolImpl. This was partly to hide it, and also to provide access to the private maps that the MemoryStringPoolImpl holds, so it can do local node lookups on the fly.

This took me all day, and I only just got it compiling by the time I had to leave, so I won't be testing and debugging until tomorrow.

Thinking
I saw Bob again today, and discussed a couple of useful things. I hadn't done as much as I should have, but I figured I could wing it. However, toward the end he confirmed with me that after the triathlon this weekend I would start "thinking again".

Oops. Looks like I got caught out. :-)

While I'm at it, this blog has been suffering lately as well. Perhaps I'll start putting some more effort in again after the weekend. In the meantime, I'm just concentrating on exercise, food and rest, so my technical interests are taking a back seat.

Monday, November 01, 2004

Memory String Pool
After removing the exception that was thrown when the node type resolver is created with a canWrite parameter set to true, I found I was still getting the error. This was simply due to the file not being re-built, so I mentioned it to ML, who has been fixing these things. He told me later that it was a couple of items which were out of order in the Ant script. Later in the day I did an update, and a file which had implemented a new method wasn't rebuilt before a calling class was compiled against it - resulting in a compilation error. ML claimed that this problem could not be fixed, which I'm a little surprised at.

So for the moment, I'm still having to do clean rebuilds.

With the exception that I mentioned above removed, I was able to create models of the appropriate type. That just left me needing to perform queries against it. While looking at the code I realised that the ResolverSession implementations were returning sorted data from the findStringPool* methods. I was previously sorting the returned data so that I could append it, but since I had already appended the data from two string pools, it was already sorted, and I could use it in an append operation without further processing. I was worrying about the excessive use of new HybridTuples objects so this was pleasing to see.

I had a little minor debugging to get through before discovering that the in memory string pool does not implement either of the findGNodes methods. Looking internally, I discovered that this string pool is implemented with hash tables, making these methods impossible. ML asked me about this at almost the same moment I discovered it for myself, so I had to find a fix that would suit us both.

Fortunately, the SPObjects which are stored in the string pool are all comparable. This means that they can be stored in a SortedSet. While this will use a little more memory, the objects are already being stored, so the only space overhead will be that of the tree structure for the set. So I've added a new index of the SPObjects which just stores them in order.

The first findGNodes method was relatively straightforward. Using the SortedSet.subSet method did a lot of the work, though I needed to fiddle a little with the results in order to appropriately include or exclude the first or last item, according to the parameters of findGNodes. I still need to write a small wrapper class which can convert the resulting subset into a Tuples.
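The fiddling is needed because SortedSet.subSet(low, high) is half-open: it includes low and excludes high. A minimal sketch of the adjustment follows; RangeLookupSketch and its parameters are made-up names, and it copies the subset into a new TreeSet for simplicity, which is the allocation I would eventually like to avoid.

```java
import java.util.SortedSet;
import java.util.TreeSet;

final class RangeLookupSketch {
    // Returns the elements between low and high, with each end included or
    // excluded as requested. Requires low <= high in the set's ordering.
    static <T extends Comparable<T>> SortedSet<T> range(
            SortedSet<T> index, T low, boolean includeLow, T high, boolean includeHigh) {
        // subSet gives [low, high): low is in, high is out.
        SortedSet<T> result = new TreeSet<>(index.subSet(low, high));
        if (!includeLow) result.remove(low);                        // drop the first item
        if (includeHigh && index.contains(high)) result.add(high);  // add the last item
        return result;
    }
}
```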

The second findGNodes method needs to be able to find an entire data type. The most efficient way I can think of doing this would be to create a couple of SPObjects which are guaranteed to be the smallest and largest of a data type, and use the SortedSet.subSet method again. Hopefully that won't be too hard.
