Monday, November 29, 2004

Busy, busy, busy
Well I've just been through a rather full week, both at home, and at work. As a result, I didn't log on to blogger once. Darn.

So what did I do? Well if only I'd been blogging as I went I'd have an easy answer to that. Since I haven't, I'll just have to write a few loose and disconnected notes. It won't be a complete description, but it will give a vague idea of what I've been on about.

I've been pretty slack about this blog in the last few weeks. It is obviously a very busy time of the year socially, and I'm tending to go to bed earlier at the moment due to training (I do a lot of blogging at night).

Actually, the training is going quite well, and I had a great race on the 21st. I was aiming for an hour, and was initially disappointed that I got 1:00:18, but then I discovered that the bike leg was 1.8km too long (they wanted to put the turns at appropriate places in the road), so I'd have made the time if I'd done the advertised distance. I have a longer race this Sunday, and the last race has inspired me to work extra hard... which has led to me blogging less.

Sorry, I don't usually write much about training, but I've been really happy with my progress lately. Oh, and for anyone who needed proof I was at Noosa, you can find some pics at Supersport images if you look for BIB number 2426. (You probably can't see it, but my bike has a flat front tire - very frustrating).

I'll try and make more of an effort with blogging this week.

In the meantime, here are a few notes on the past week:

Monday - 22nd
TJ is away training a client this week, so things seem a little more relaxed today. I started the day looking at doing more of the "OWL inferencing with iTQL" documentation, but by the end of the day I was back onto the looping bug.

Just to describe this bug again...

DM has put some effort into divorcing model names from locations. Consequently, it is now possible to move models from one machine to another. However, we still don't have any way of separately describing locations and names, so model names still have their location in them. The difference is that the model can be moved and the new location in the name will reference it correctly. (In the past the whole name had to be the same, including location. This is why models couldn't move from one machine to another).

The problem with the current code is that it expects models to be restricted to a specific set of names on the current machine. Hence, a model which is called rmi://machine.domain.com/server1#foo must be referred to by that name. It can't be called rmi://machine/server1#foo even though that machine name resolves to the same address.

This problem manifests when a query is made on a name that the server does not recognise. If the name rmi://machine/server1#foo is queried, then the request will be sent to the address for "machine". Since this server does not recognise that particular name (it expects the names machine.domain.com or localhost) then it forwards the request on to the server named "machine". Not only do we get infinite recursion, we even do it through RMI! :-)

I had thought that I could try and compare the hostname in the model name. However, this isn't a complete solution, as it is always possible to miss an IP address on the current machine, or a DNS server could hold a name for a host which that computer has no knowledge of. As a failsafe, the host has to be queried for its "servername" and this will get compared to the servername on the connecting client.

The servername is found in the startup class EmbeddedKowariServer, which determines the name from the user configuration, or else calculates a default name. However, as the program starts the EmbeddedKowariServer class is unable to see the RmiSessionFactory class in its classpath, as it isn't until later that the appropriate class loaders are set up.

Similarly, the RmiSessionFactory class is unable to see EmbeddedKowariServer in its classpath while compiling. However, I experimented a bit with reflection, and discovered that EmbeddedKowariServer was available at runtime. That took me until the end of the day.
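For anyone who hasn't used reflection this way, here is a minimal sketch of the general trick: call a static method on a class known only by its name, so nothing has to be on the compile-time classpath. The helper `RuntimeLookup.callStatic` is hypothetical and is not the Kowari code; the post doesn't say which method on EmbeddedKowariServer exposes the server name, so none is assumed here.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

// Hypothetical helper, not part of Kowari.
final class RuntimeLookup {

  /**
   * Calls a public static no-argument method on a class named only by a
   * string. Returns empty if the class or method is missing, the method is
   * not static, or the call itself fails.
   */
  static Optional<Object> callStatic(String className, String methodName) {
    try {
      // Resolved at runtime, so no compile-time dependency is needed.
      Class<?> target = Class.forName(className);
      Method method = target.getMethod(methodName);
      if (!Modifier.isStatic(method.getModifiers())) {
        return Optional.empty();
      }
      return Optional.ofNullable(method.invoke(null));
    } catch (ReflectiveOperationException | LinkageError e) {
      // Class not visible to this class loader, or the call threw.
      return Optional.empty();
    }
  }
}
```

A call such as `RuntimeLookup.callStatic("java.lang.System", "lineSeparator")` returns the same value as calling the method directly, while an unknown class name simply gives back an empty result.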

Tuesday - 23rd
I started out by modifying the session interface so it could test both ends of an RMI connection in the constructor of RmiSessionFactory. If they compared equal, I initially expected that I could silently change the RmiSessionFactory to return new local sessions instead. However, I had difficulty getting access to a class which would do this, and after some discussion with DM and SR I changed tack. Incidentally, all of my suggestions on how to proceed were considered to either be inappropriate, or else impossible by both SR and DM. However, some of what I'd already done was working fine, which invalidated some of the "impossibility" arguments I was given. :-)

When the RmiSessionFactory constructor detects that the client and server are on the same machine it now throws a new exception called NonRemoteSessionException. I changed the SessionFactoryFinder to pick up this exception, and fall back to using a local connection. However, even though the code for a local session was all there, the class paths were not. Asking SR about it, he said that this code would never be run anyway, and that it should probably be excised (he was wrong, but I didn't discover that until the following Monday).
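The shape of that fallback can be sketched roughly as follows. Everything here is a simplified stand-in: the `Sketch*` types, the `kind()` method and the constructor arguments are assumptions made for illustration, and only the exception name comes from the real code. The sketch also assumes that two equal server names mean the client and server are the same server.

```java
import java.net.URI;

/** Stand-in for the exception named in the post. */
final class NonRemoteSessionException extends Exception {
  NonRemoteSessionException(String message) {
    super(message);
  }
}

/** Hypothetical, heavily reduced factory interface. */
interface SketchSessionFactory {
  String kind();
}

/** Refuses to be built when both ends report the same server name. */
final class SketchRmiSessionFactory implements SketchSessionFactory {
  private final URI server;

  SketchRmiSessionFactory(URI server, String clientServerName, String remoteServerName)
      throws NonRemoteSessionException {
    if (clientServerName.equals(remoteServerName)) {
      throw new NonRemoteSessionException("Not remote: " + server);
    }
    this.server = server;
  }

  @Override
  public String kind() {
    return "rmi";
  }
}

/** Placeholder for whatever provides in-JVM sessions. */
final class SketchLocalSessionFactory implements SketchSessionFactory {
  @Override
  public String kind() {
    return "local";
  }
}

/** Tries RMI first and falls back to a local factory on the exception. */
final class SketchSessionFactoryFinder {
  static SketchSessionFactory find(URI server, String clientName, String remoteName) {
    try {
      return new SketchRmiSessionFactory(server, clientName, remoteName);
    } catch (NonRemoteSessionException e) {
      return new SketchLocalSessionFactory();
    }
  }
}
```

Throwing from the constructor keeps the detection in one place, at the cost of making every caller that builds the factory directly deal with the exception.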

This led to a discussion which resulted in two things. The first is that I needed to try to recognise the name of the current machine, and change the requested model wherever possible. That means picking up all the known IP addresses, and comparing these to the results of a DNS lookup on the host name in the model name. This would catch almost all problems. The second part was to continue to test the client and server for their names, to prevent loops where the first method fails.
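The first part, matching a host name against the machine's own addresses, might look something like this using the standard `java.net` classes. The class and method names are hypothetical, and it is written in current Java rather than anything that would have compiled in 2004. Treating any loopback address as local is an extra assumption of the sketch.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Kowari.
final class LocalHostCheck {

  /** Collects every address bound to a network interface on this machine. */
  static Set<InetAddress> localAddresses() {
    Set<InetAddress> found = new HashSet<>();
    try {
      Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
      if (nics == null) {
        return found; // some platforms report no interfaces this way
      }
      for (NetworkInterface nic : Collections.list(nics)) {
        found.addAll(Collections.list(nic.getInetAddresses()));
      }
    } catch (SocketException e) {
      // Interfaces could not be enumerated; return whatever was collected.
    }
    return found;
  }

  /** True if any address the host name resolves to belongs to this machine. */
  static boolean isLocal(String host, Set<InetAddress> local) {
    try {
      for (InetAddress resolved : InetAddress.getAllByName(host)) {
        if (resolved.isLoopbackAddress() || local.contains(resolved)) {
          return true;
        }
      }
    } catch (UnknownHostException e) {
      // DNS has no entry for this name, so it cannot be matched locally.
    }
    return false;
  }
}
```

As the post says, this can still miss cases, such as a DNS name the machine has no knowledge of, which is why the server name comparison stays in as a failsafe.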

Wednesday - 24th
The first half of this day was spent at the WISE conference being held here in Brisbane. My school organised my attendance, but I could only afford a small amount of time at it. Fortunately, I got to see my supervisor delivering a paper on his current work, and it explained a couple of points for me. Most of the other presentations were a bit lacking.

The worst part was that many presenters were not there. In some cases they had been denied visas, but in quite a few cases presenters had shown up for registration on Monday, and had then spent the rest of the week at the Gold Coast instead. I have no objection to them not attending for most of the time, but it just seems rude to not show up for your own presentation.

I also learned something surprising about conferences. Bob wasn't impressed with a few of the presentations, and so I asked if the presenters were ever given feedback on their performance. I was surprised to learn that this never happens. The closest thing to feedback is the number of questions posed to the presenter, but this could indicate numerous different things, so it becomes hard to judge.

As a future presenter, this has its good points and bad. As a plus, it's nice to think that I'd gain credit from the school simply for presenting the paper, and I can be as bad as I like. Indeed, I can even leave a conference feeling really good about myself, no matter how I performed (who wants to have their ego deflated?). However, the negatives are probably bigger. Unless I have some brutally honest friends about I'm unlikely to get honest feedback, which will make it very hard to improve. Let's face it, no one is great on their first presentations, so I will be looking for every avenue for possible improvement. It's just a shame that the obvious one is not there.

After returning to the code I started on the methods for renaming models to a canonical form. I continued some of it in the evening, but it was not quite finished on this day.

Thursday - 25th
It took me a few hours, but by lunch I'd completed the methods to return canonical forms of a model name. Model names had to go back and forth between local and global space, which made it a little less straightforward, but it seemed to work well.
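To give an idea of what a canonical form means here, the sketch below rewrites only the host part of a model URI when that host is a known alias for the current machine. It is an illustration under assumptions, not the actual methods: `ModelNames.canonical`, the alias set and the canonical host are all made up for the example, and it ignores the translation between local and global space mentioned above.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Kowari.
final class ModelNames {

  /**
   * Returns the model URI with its host replaced by canonicalHost when the
   * host is one of the (lower-case) local aliases. Other URIs are returned
   * unchanged.
   */
  static URI canonical(URI model, Set<String> localAliases, String canonicalHost) {
    String host = model.getHost();
    if (host == null || !localAliases.contains(host.toLowerCase(Locale.ROOT))) {
      return model;
    }
    try {
      // Rebuild the URI, keeping everything except the host.
      return new URI(model.getScheme(), model.getUserInfo(), canonicalHost,
          model.getPort(), model.getPath(), model.getQuery(), model.getFragment());
    } catch (URISyntaxException e) {
      return model; // leave the name alone rather than guess
    }
  }
}
```

With aliases `machine` and `localhost` and a canonical host of `machine.domain.com`, the name `rmi://machine/server1#foo` becomes `rmi://machine.domain.com/server1#foo`, while a model on some other host passes through untouched.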

The next job was to find all points in the code which attempted to use a model name, and make sure it was replaced with the canonical version. This mostly happened in DatabaseSession but when I'd finished I did a little grepping and discovered that it was needed in some new classes called SetModelOperation, RemoveModelOperation and ModifyModelOperation. The appearance of these told me that each operation is starting to be factored out into its own class. Later in the afternoon I was having an implementation discussion with AM and discovered that he is the culprit. I'm expecting to see more operation classes soon.

By the end of the day I was reasonably confident that this bug was nailed. Everything appeared to be working, so all that remained was a full set of tests to be run in the morning.

Friday - 26th
The tests failed. Doh!

Unfortunately, our full set of tests now takes some time to run, so this stage can get very slow. I got to spend a bit of time reading documentation while tests ran. It can be very frustrating.

I finally tracked the problem down to the Kowari Descriptors. These are essentially clients which run within the server. When they try to create a model they explicitly ask for an RMI connection to the current server and send the request out on that. Since the client and the server are on the same machine, the RmiSessionFactory constructor throws an exception, and the test fails.

So what went wrong? Well it turned out that a lot of my code for handling the problem was intercepting requests at the Resolver layer. I was under the impression that all requests were now supposed to go through this layer. However, the ItqlInterpreter does not do this. Instead, it obtains a session factory directly, and asks for a new session from it. Since the model will always start with "rmi://", the factory will always be an RmiSessionFactory. I discussed the problem with SR, and he explained the code which did some of this work (to the annoyance of KA, who was trying to work in the same room. Sorry about that KA).

SR suggested not setting a session to a server, as this was supposed to leave the session connected to the local server. I was skeptical at this, as I couldn't see how the local server was ever set, but I gave it a go anyway. Unfortunately, this simply resulted in a lot of errors referring to a NullSessionException.

Closer inspection of this code revealed that ALL connections were being made with RMI, and only the RemoteSessionFactory is able to talk to a database on the same machine as itself. Since most clients are in a separate JVM this is not a big issue, but it certainly causes problems in some circumstances. More discussion with SR led me back to Monday's (now deleted) code which attempted to get a local session when an RmiSessionFactory constructor failed. However, that had failed on Monday, so now I needed to figure out how a RemoteSessionFactory was able to get these sessions for itself.

And that got me to the weekend (now that I know it's OK, it gives me some kind of perverse pleasure to start sentences with a conjunction. Not really sure why. Maybe it's because I know I'll be annoying someone out there who thinks that you can't do it). :-)

I was nowhere near as productive as I'd like, and I hate that. In particular, I'm frustrated that I haven't been able to return to working on OWL code. At least it will give me incentive to get this done as quickly as possible.
