Monday, April 25, 2005
Weeks
Wow, is it really that long since I last posted? Well, I have been busy. Still, this log can save me time if I'm careful, so I shouldn't continue to neglect it.
So why am I so busy?
Confirmation
Yup, this ugly beast has reared its head at last. It shouldn't be a big deal, but since I'm trying to work full time it has been almost impossible to find the time to do it. Consequently, I've been spending my evenings and weekends working on the report. That's the time I normally spend on this blog, which is why the blog hasn't been attended to.
Anyway, the report is now written, which is good, because I have to submit it today. I was not aware of the timetable for this, but I was informed on Friday that I really needed it in by Tuesday, and even that was late. I'd hoped to have it finished by then, but I really wanted some feedback before submitting it, and now it looks like I'll have to forgo that.
Now I need to work on the presentation. I know what I need to talk about (after all, I've been discussing it for months, and I've just written a report on it), but the slides will take me some time. It's been a while since I've had to do this.
While doing that, I have to be careful about my exercise and sleep, as I'll be at the Mooloolaba triathlon this weekend. My recent illnesses have taken their toll, so I won't be doing the whole thing. This time I'll be in a team, so I'll just be doing the 1500m ocean swim.
Somehow in amongst all of this, Anne and I are finally getting married in two weeks. What started out as a small event with immediate family and close friends has now grown. I'm looking forward to it, but I could do without the stress of all the new preparations that are required with a bigger event.
Collection Prefixes
I've now coded everything to match the collection prefixes. This went right down in the string pool, all the way to the AVL nodes. I hate going that far down for a high level concept, but it seemed necessary. There are well established interfaces to find the first of something in the string pool, but not the last of it. To simplify things, I decided to make the search only look for URI prefixes, rather than strings in general. If we want it working on strings, then the odds are good that we will want full string searching, so I didn't see the point in adding prefix searching in general.
This work took much longer than I thought, because of the sheer scope of the changes. Every StringPool and Resolver class needed modification to handle the new interfaces for searching by prefix. It turns out that there are a lot of them.
Once I thought I was finished, I discovered that I also needed to add support for the in-memory string pool. I'd forgotten about this class. I went in to see how it worked, and discovered that it was written by Andrew. So I set about trying to work out how the existing findGNodes code worked, with the hope of modifying it. 5 minutes later I realised that I was the one who'd written this method! I'm guessing that if I search back in this blog I'll even find an entry for it (that reminds me... I should add a Google search to this page).
OK, so I knew how this method worked. It's based on a sorted set (a TreeSet). But was there any way to find the last element which matches a prefix in a tree set? I'd need to create a new comparator again, but this was getting beyond the scope of what I'd wanted to do. After all, I only wanted to find all the strings that matched a prefix.
So at this point I tried a new tack. To find the last string which started with a prefix, I tried adding a new character to the end of the string. I just had to make sure that character was higher than any character that might legitimately be found there. I started out with ASCII, but then I remembered that URIs can be in Unicode (I really should learn another language; then I'd remember these things). So I went looking at the Character class, and discovered the constant Character.MAX_VALUE. So I added this to the end of the prefix and used this as my end point for searching. It worked fine.
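In plain Java terms, the trick looks something like this (just a sketch over a TreeSet, not the actual string pool code):

    import java.util.SortedSet;
    import java.util.TreeSet;

    public class PrefixSearch {
        /** All strings in the pool that start with the given prefix. */
        static SortedSet<String> matching(TreeSet<String> pool, String prefix) {
            // Anything starting with the prefix sorts at or after the prefix itself,
            // and strictly before the prefix with Character.MAX_VALUE appended.
            return pool.subSet(prefix, prefix + Character.MAX_VALUE);
        }
    }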
In fact, it worked too well. Now I'm wondering if I wasted a lot of time with the findPrefix methods in the string pools. Strictly speaking, these methods are more correct than using the MAX_VALUE character to find the last string, as it is theoretically possible to use this character in a URI. However, in practice I don't think it ever could be. It certainly won't be used in the context of the rdf:_ prefix.
ASTs for iTQL
Writing the RDF for a set of rules is a time consuming task. I have everything I need for RDFS, but when I get on to OWL it will get a lot more complex. Since these rules represent iTQL, a tool which can create the RDF from an iTQL query would make rule generation significantly faster and less error prone. I will really need a tool like this when I get to testing, for fast, reliable turnaround.
So I've spent a little time writing such a tool. This started by pulling out the SableCC parts from Kowari, and re-writing the semantics code. It meant learning more about SableCC than I thought I needed to, but I'm pleased to have picked it up.
In the process I discovered that it was going to take a little more work than I'd originally anticipated. Plus I'll need to consider how to pass in non-iTQL options (such as the URI for the name of the rule being generated). I'm considering using some magic tags in comments, like javadoc does, but I'm also putting that decision off while I can.
Anyway, it's a good intro for AST transformation into RDF, which I've been getting interested in recently. It's more work than I initially anticipated so I still have a lot to write on this. However, it's interesting, and necessary, so when I can I'll be spending time on it. That will be some time after the confirmation.
Thursday, April 14, 2005
Day Care
Once more, Luc brought something home from day care. It probably doesn't help that I've been working late most nights. :-)
So Wednesday and Thursday were very unproductive. I don't think I got a lot of programming done, but I did look at some debugging, particularly on the SOAP interface. This morning (Friday) I took a little while to get moving, but once I did I've been able to write quite a bit for the confirmation. Now that I've done that, I think I'd better use the weekend to make up some time on coding.
After dealing with CVS brokenness on Sourceforge, Richard helped me to find that the SOAP problem is due to machine names in model URIs. This was fixed last year, but I'm guessing that it was only fixed along the code path for RMI. I'll try to add some debugging (yet another attempt at configuring log4j) and see where the execution path goes. The machine renaming should either get moved into a common path (if possible) or duplicated into an appropriate area for SOAP.
However, I have enough to do at the moment that this kind of debugging will probably be procrastination. I guess I'll have to come back to it after I've confirmed the Masters.
Posted by Paula at Thursday, April 14, 2005 0 comments
Tuesday, April 12, 2005
Time to Write
At the moment I'm spending my days programming, and my evenings studying. That's not leaving a lot of time to blog! I'd love to write more, but it's just really difficult.
Evening study has become more pressing at the moment because I have my confirmation on the 29th of this month (that's Friday fortnight for those of you who understand the term). Because of this I'm thinking that I might spend my next few days "inverting" my working habits... study in the day, and programming at night. That's because I'm getting easily distracted from study in the evenings, but Anne thinks that when I write code I wouldn't notice a bomb going off. :-) Besides, I tend to be at my most productive when programming in the evenings. I always have.
Testing and Debugging
The subtraction operation testing went slowly, but smoothly. Once I got through the full suite of tests the first time, it didn't take long to discover that the test target I needed was test_tuples. That helped it all go faster.
I ended up with some problems getting the full set of JXUnit tests to work. While the simple case worked correctly, I tried some variations on the subtrahend, which led to some failures. The Difference was fine (the unit tests proved that), and the problem was due to sorting the arguments in the TuplesOperations.subtract() method. This ended up being due to an inverted test, but it took me a little while to find it.
The difficulty in finding the problem was that I could not modify the log4j configuration (again!). I thought I'd solved this problem months ago, but no matter which XML configuration files I modified, the system always picked up a file that was internal to the JAR file, and would not see my modifications to display debug info. Rather than spend another day on this, I just resorted to error messages, and removed them when I had fixed everything.
Whoever installed the log4j system needs to document exactly how to modify the configuration for this. As I said, I thought I'd worked it out some time ago, but obviously not. This time around I modified the log4j.conf file in the main directory and in the conf directory. I then did a "clean" and a full rebuild, and there was no effect. I also did a "find" to look for any other XML files that might affect the configuration, but couldn't find any.
The problem with my desire to get the installer of log4j to document it is that no one remembers who did it. I suspect Simon, but he doesn't remember. :-)
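In hindsight, a less painful workaround is probably to skip the config file hunt while debugging and set the level programmatically with the log4j 1.2 API (the "org.kowari" logger name here is just an assumption about the package being debugged):

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class DebugLogging {
        /** Force DEBUG output for one package, whichever XML config file won the race. */
        public static void enable() {
            Logger.getLogger("org.kowari").setLevel(Level.DEBUG);
        }
    }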
OWL Again
The next thing I worked on was the underlying semantics for OWL. Bijan has led me down a dark path, and now it seems I have to re-evaluate a lot of what I knew about OWL. It's strange and fortunate, but many of the changes in my understanding have not resulted in practical differences to my implementation, although a couple have. At least it has led me to the documentation I need for a more complete implementation of OWL inferencing.
I should also respond to a few people who have written in about doing a closed world assumption for cardinality testing. I can now say with complete confidence that this is a bad idea. For a start, it is not OWL, and will break legitimate inferencing. You can't just switch between open and closed world like that. I agree that the closed world is better in many ways, particularly when working with databases, but it can't be used here, else it will make OWL something that it is not.
For those people who'd like this kind of semantics on cardinality, perhaps it is time to create a "closed world" namespace. I don't know if it really works with an open world RDF, but if people are doing this sort of thing anyway, then it would be a good idea to do it in a carefully closed off area, rather than changing the meaning of OWL.
Resolvers
I could only spend a short while on that before getting back to the collection resolver. Well, it's just a sequence resolver for the moment, as I don't need to traverse the rdf:rest links yet. When I do need to handle those collections, I'll need to update trans to use variables bound to a set, instead of single-value bindings. This is proceeding smoothly, though it doesn't run yet, as I'm still in the row comparator for the string pool.
This comparator isn't as nice to use as the one from the type resolver, as it needs to look at the string pool entry from the block files, which means much more disk activity. I'll go with the obvious implementation for the moment (I don't want to prematurely optimise), but I'm wondering if there is a hint I can give to the index about where the rdf namespace starts and finishes.
In the best possible world I'd rebuild the string pool as a Patricia tree. However, I don't know how I can do that and still maintain our read/write phases. Maybe I could build a Patricia tree on the side, and update it periodically. It wouldn't be guaranteed to give perfect answers all the time, but that may still be useful. After all, the updatedb/locate database on many systems works like this. Even Google falls behind changes around the world, and yet no one would claim that it is not useful. Unfortunately, that is one of those pie in the sky ideas that I'll have to put aside until I get some time.
SOAP
I had a request from someone at UQ today about the SOAP interface (I'm linking to Apache instead of the canonical site, because that's what Kowari uses). He was using Ruby to talk to the interface, so Java's RMI was not an option for him. Unfortunately, SOAP didn't seem to work for him either.
That made me think about our external interfaces. If you want to use Kowari from an external package, you need some sort of IPC to talk to it. RMI is great if your code is in Java, but what about those projects which are not? For instance, how about Ruby? SOAP would seem like a good idea, but there are apparently problems.
The first problem came about when the server recognised that the wrong servername was being used to access the local database. I was concerned about this problem, as I fixed it in November. I looked at my own source code, and discovered that the reported line number in SessionFactoryFinder was for a blank line, meaning that the JAR file being used was at least as old as November. I looked into this, and discovered that the suspect JAR is the latest binary download from Kowari. We had better get the latest one out there. (That is a hint, Andrew!)
In the meantime, I asked this person to use a CVS checkout, only to discover that Sourceforge would not let him check out a copy! In the end, I had to use my own account to get a checkout. Bizarre.
While this is all a bit of a diversion from my real task, keeping this stuff running is still important. I'll see how far I can get with it tomorrow, and if it's taking too long I'll palm it off to Simon, since he wrote the SOAP code in the first place.
Again, the time is too short, so I'll have to leave it here. It's annoying, as I have so much more to write! :-(
Posted by Paula at Tuesday, April 12, 2005 0 comments
Friday, April 08, 2005
Differences
The other big thing this week has been the difference code. I believe that I'd already mentioned that I seemed to have it working, but sorting the tuples resulted in an empty set.
After wasting some time on debugging, I tried another tack. The result of a difference is essentially just the original minuend, filtered by the data in the subtrahend. Hence, all the method calls on the difference which ask about the structure of the tuples should return exactly the same information as the original minuend (with the exception of the row count).
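As a toy illustration of that filtering idea (nothing like the real Tuples machinery; the row layout and key column are invented for the example):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    class ToyDifference {
        /** Keep only the minuend rows whose value in the key column has no match in the subtrahend. */
        static List<long[]> subtract(List<long[]> minuend, Set<Long> subtrahendKeys, int keyColumn) {
            List<long[]> result = new ArrayList<long[]>();
            for (long[] row : minuend) {
                if (!subtrahendKeys.contains(row[keyColumn])) {
                    result.add(row);    // no matching binding in the subtrahend, so the row survives
                }
            }
            return result;
        }
    }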
So what was I returning that was different to the minuend? For almost every method I was passing the call straight through to the minuend. The exceptions were next(), isMaterialised(), getOperands(), close() and clone(). None of this seemed suspicious, so why did the sorting code think that the Difference class looked different to the minuend?
Thinking about this for a while made me realise that Difference does not implement Tuples, but rather, it extends AbstractTuples. This means that it was providing some default implementations of a couple of methods, rather than using the minuend's implementation. I don't remember the exact methods now, but I think that the default methods I was using were isUnconstrained() and getComparator(). I tracked these all down, and implemented them myself, or else passed them on to the minuend.
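In other words, the fix was to stop relying on the AbstractTuples defaults and mirror the minuend for anything structural. Roughly this shape, though the constructor and exception signatures here are guesses rather than the actual source:

    public class Difference extends AbstractTuples {
        private final Tuples minuend;
        private final Tuples subtrahend;

        Difference(Tuples minuend, Tuples subtrahend) {
            this.minuend = minuend;
            this.subtrahend = subtrahend;
        }

        // ... the lazy row filtering (next(), beforeFirst(), close(), etc.) is elided ...

        // Previously inherited from AbstractTuples; now mirrors the minuend, so the
        // sorting code sees exactly the structure it would see on the minuend itself.
        public boolean isUnconstrained() throws TuplesException {
            return minuend.isUnconstrained();
        }

        public RowComparator getComparator() {
            return minuend.getComparator();
        }
    }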
After that, I tried the unit tests again, and had a lot more success. I still had quite a few errors, but in almost every case this was due to the unit tests being incorrect, so it was easy (though time consuming) to track them all down.
The only unit tests I had trouble with were those which included unconstrained variables. It appears that beforeFirst(prefix) is not implemented correctly for cases where the prefix includes UNBOUND. This is not so bad if the unbound variable is at the end of the prefix, as I could try to detect that and truncate the length of the prefix, but it would make the method very non-general.
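The truncation I have in mind would be something like this hypothetical helper (the long[] prefix representation and the unbound sentinel are assumptions about the Tuples layer, and it only helps when all the unbound slots sit at the end):

    /** Return the longest leading run of the prefix that contains no unbound slots. */
    static long[] truncateAtUnbound(long[] prefix, long unbound) {
        int len = 0;
        while (len < prefix.length && prefix[len] != unbound) {
            len++;
        }
        long[] truncated = new long[len];
        System.arraycopy(prefix, 0, truncated, 0, len);
        return truncated;
    }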
Considering the general case, having unbound variables at the start of the prefix would require suffix matching, which is something that Andrae and Simon have pointed out is explicitly not supported yet. I'm not sure how I'd go about handling an unbound in the middle of a prefix, except by iterating or re-ordering the tuples. Neither of these are appealing options.
On reflection, I can't think of a use case where I need support for unbound variables in the difference, so I was more than happy to disable these tests. If I ever see a need for them I'll reconsider, but my current needs always result in bound variables.
From the passing unit tests I moved on to testing the iTQL. I'm pleased to say that this all worked perfectly on the first attempt. :-) I'm now putting together some JXUnit tests to script these iTQL tests. (JXUnit is certainly convenient, but it's frustrating how much effort goes into creating even a simple test).
I'll check it all in this evening.
Posted by Paula at Friday, April 08, 2005 0 comments
Thursday, April 07, 2005
Q & A on Cardinality
I'm not blogging as much these days, which is annoying me. Instead, I'm working longer hours, particularly in the evenings when I normally write the blog. I'm pretty happy about the amount of work I'm getting done, but it would be nice to be keeping track of it better.
Over the last 3 days I've spent some significant time working over some problems to do with cardinality. Anyone watching the RDF Logic mailing list will have seen me working through some of these issues. I'll confess that it's a little embarrassing having the world watch me blunder along as I learn something, but at least I'm coming to a better understanding. Thanks go to Bijan Parsia for this.
So what is the deal with owl:minCardinality? Well, there seem to be two broad cases to consider.
The first is that owl:minCardinality can conflict with another restriction on the domain of a predicate. For instance, there may be an owl:maxCardinality restriction which is for a lower number than the owl:minCardinality restriction. Or perhaps the range of the restricted predicate is a limited set of instances (from an owl:oneOf definition on the class of the range), and the size of the set of the range is less than the minimum cardinality. In cases like this, the class is unsatisfiable. That means that while the class can exist without conflict, any instances of the class will cause a conflict.
I had earlier thought that it would be illegal to have an unsatisfiable class, but this is not the case. The classic case of this is the owl:Nothing class. Allowing these classes does change things a bit.
So owl:minCardinality can be used to define (and detect violations of) unsatisfiable classes.
The second case is where owl:minCardinality is used to define a satisfiable class. In this case, it tells you that a predicate is in use, but it can't say anything else. If the knowledge base includes some usages of that predicate, then fine. If the knowledge base does not include the usage of that predicate, then you still know that the predicate is being used... but you can't know how it's being used.
From the perspective of supporting OWL minimum cardinality in a database (like Kowari), this means that the database can't do anything for you (except detect the use of an unsatisfiable class). It can't entail any new statements, nor can it detect an inconsistency that must be corrected.
For people who were hoping to use owl:minCardinality to restrict their models in some way, this will come as a disappointment.
I'm still thinking that it may be useful for some people if I allow for a "temporary" closed world check. That way, if they want to use minimum cardinality to check that a predicate is used at least that many times, then they can. This is an incorrect usage of OWL, but one that I think some people expect to have.
I'll work on the correct implementation first, and then I'll look at installing a switch that will allow for this incorrect (though potentially useful) behaviour. At that point I'll have to decide if I want to assume unique names or not. Yuck. :-)
Posted by Paula at Thursday, April 07, 2005 2 comments
Tuesday, April 05, 2005
Mach_override
The other day I read an interesting article by Jonathan Rentzsch, on the implementation of mach_override and mach_inject. I think it's interesting, because last year I wrote almost identical code for Windows. Reading this gave me a funny sense of déjà vu.
The main difference between Windows and OSX for this code is that the mechanisms for doing it in Windows are not properly documented. There are articles about doing it on the net, but they all suffer from problems that I had to overcome.
For a start, Windows does not document that fundamental DLL files (like kernel32.dll) are mapped into the address space of every process at the same location. This is essential, or else it would not be possible to know where the function to override could be found. Fortunately, it has been well established by people like Matt Pietrek that this can be relied upon, though future versions of Windows could theoretically break this assumption.
More importantly, the only established technique of code injection in Windows is to create a new thread in the remote process and have it load the code. Unfortunately, we discovered that this causes a race condition that can occasionally cause applications to crash. Since we were building commercial code I needed a better solution.
The solution here was to create the process with the primary thread already paused. Once the program was loaded I could allocate some extra memory (in the same way that mach_override uses vm_allocate) and write some of my own code into it. Then I could look for the entry point and overwrite the start of the program with a JMP instruction to go to my freshly created code.
This all sounds straightforward, but it isn't. There are a number of APIs for looking at a remote process's address space, but it turns out that none of them work properly when the process is started in a paused state. It seems that all of the information describing a process's memory is initialised at some point by the application itself.
Fortunately, the memory is partitioned into modules, even though they are not correctly labeled. The solution is to iterate through these modules, and look at the data in them. Since an executable program gets mapped into memory, I realised that one of the modules must start with the head of the file, so I went looking for a Windows executable file header. Once I found it, I was able to trace through the various headers and pointers to find the address of the program entry point. Adding this to the base address of the executable image gave me the entry point of the program.
So while all of the APIs which are supposed to help here were actually useless, it was still possible to find the entry point, and modify the code found there. The code that the program was instructed to jump to would load a DLL which did the rest of the work during its initialisation later on. The end of this code then replaced the original instructions at the program entry point, and jumped back to it. Then when the application was run, it would start with my injected code before doing anything else. It didn't require threads and was therefore free of the errors from the other technique. It also prevented a program from setting anything up which might preclude a function interceptor from working.
The initialisation of the DLL that was loaded ended up doing the work of overwriting any functions that we needed to intercept. This worked almost identically to the mach_override code with one minor exception. mach_override works on a RISC architecture, and so the instructions are guaranteed to be a particular width. Unfortunately, the CISC instructions on Intel chips need to be decoded in order to work out how many bytes wide they are. Luckily, there is a short list of opcodes which are used to start all the functions in Windows, and DavidM spent a few days tracking them all down. This meant that we only needed a small table to work out just how many bytes needed to be copied from the beginning of a function, so they could be replaced with a JMP to the replacement code. Fortunately, none of those instructions referred to relative addresses, so no re-writing of these opcodes was necessary.
The weirdest thing I discovered was when various memory pages started changing their permissions. By default, memory with executable code will generate a fault when it is written to, so its permissions have to be updated first. However, I was finding that between the loading of my DLL and its initialisation, some pages which I had set to writable had reverted back to being read-only/executable, but only for certain "Office 2000" applications. This was how I discovered that these applications contained statically linked DLLs (which were loaded by the OS after the program was started, but before the entry point of the process was reached) which were modifying the pages of the entry point as well. In other words, someone at Microsoft was also dynamically re-writing the binary at the start of the executable, just like I was.
This kind of dynamic re-writing is simply patching the code after compilation. Given that it comes from Microsoft, I'm surprised that the writer didn't just go and ask to modify the original code! People like me, outside of Redmond, have to perform hacks like this, but Microsoft shouldn't have to! :-)
So the details were very different, but the principles were the same. I always liked this code, and was looking forward to releasing it, but I never seemed to get around to it. DavidW asked that I write all of this up, and I've been very slack about doing it. I've written it down here mostly as a reminder that I need to get around to it one day. Maybe I'll do it while procrastinating about writing my thesis. :-)
Posted by Paula at Tuesday, April 05, 2005 0 comments
Container Prefixes
I realised that I've spent quite a bit of time working on the difference code, and I've been getting frustrated. The problem has not been writing my code, but rather the need to read and learn other people's code in the joining framework. My recent round of debugging was a good example: the Difference class appears to work correctly, but the sorting code does not want to recognise it. This means I have to learn the sorting code more carefully.
One of the problems of working on my own is that there is rarely anything to encourage me to poke my head up and look at the world around me. So I thought a little break might be good for my perspective, and I might improve my productivity while I was at it.
Consequently, today I've spent the day working on a new resolver for finding all collection predicates. These are the URIs which follow the pattern: rdf:_*.
The code to do this is easy and a lot of it is relatively boilerplate, so it hasn't been hard to do. It is very like the "Node Type" resolver, only it needs to find sections in the string pool using different mechanisms. NodeTypeResolver was easier as it could perform selections using type comparators which were already available in the string pool, but it should not be too difficult to write the new one that can select by prefix instead. However, it does make it more complex. I hope to finish it in the next few days, and then I can go back to debugging the sort.
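For reference, the URIs in question are just the RDF container membership properties. A purely illustrative check (class and method names invented for the example):

    import java.util.regex.Pattern;

    public class ContainerMembership {
        private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
        private static final Pattern MEMBER = Pattern.compile("_[1-9][0-9]*");

        /** True for rdf:_1, rdf:_2, rdf:_3, and so on. */
        public static boolean isMembershipProperty(String uri) {
            return uri.startsWith(RDF_NS)
                && MEMBER.matcher(uri.substring(RDF_NS.length())).matches();
        }
    }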
Posted by Paula at Tuesday, April 05, 2005 0 comments
Monday, April 04, 2005
Sorting
What is wrong with sorting?
Debugging the difference operation has been a frustrating process, not least because it appears to be working. To start with there were a few problems, but I tracked these down quickly, and resolved them all. But then the unit tests started to give me some strange results.
The first thing I found was that the result of a difference had a null set of variables. This didn't make sense, as it was just returning the variables of the minuend, and the difference operation would have thrown an exception if the minuend had no variables.
After trying to work out what was wrong with the Difference class for way too long, I suddenly realised that the object I was looking at was not the result of a difference, but the sorted result of a difference. So I looked at the unsorted result, and sure enough everything was correct. As far as I can tell, the Difference class is working perfectly well.
So where is the problem? Well, sorting does not use a RowComparator (like the lexical sorter does), but is based on how the variables line up with columns from the underlying indexes. This will not work with a Difference class, as defined by its interface, as this technique presupposes knowledge of the internal workings of the class.
So there are two choices here. Either I create a RowComparator and use the other sort method (the one used for lexical sorting of the output to the user), or I drop the principle of information hiding, and allow sort to have special knowledge of the Difference class. I fear that the latter is the way to go here, but it bothers me, as it will not work on a different (and more efficient) implementation of the difference algorithm.
For the moment, differences are calculated lazily by iterating on the minuend, and filtering based on what is found in the subtrahend (that's a simple description of a lot of work!). So structurally, the result should resemble the minuend in almost every respect. Many of the Difference methods reflect this, in that they simply delegate their work to the minuend.
Given this, I have not been able to work out why the sorting functionality does not just work on the minuend part of the Difference object. This won't work if the Difference implementation is ever updated to sort the minuend as well, but since that is not required in this implementation there should not be a problem.
A cursory glance at the sort didn't reveal anything, so I'll be going through it in detail tomorrow.
In the meantime, I altered the tests to check for the unsorted data first. So far everything has passed, which gives me confidence that it is all working (though I still need to check the unbound variables tests more carefully). Once "sorting" has been resolved I should be able to get out of this code.
Posted by Paula at Monday, April 04, 2005 0 comments
Saturday, April 02, 2005
Slow
To test the bad memory in this PowerBook, I pulled out my 512MB module, and tried re-running the Kowari tests. A successful run would mean that module has a problem, while a failed run would indicate an internal problem in the PowerBook. The tests passed.
So now I just need to replace that module, rather than give up the computer for several weeks. Whew.
The big problem now is the performance of this computer. When running the Kowari tests it became completely unusable. Indeed, doing anything beyond text editing is painfully slow. Based on this, and Kowari's need for lots of memory, I'm thinking that it may be time to upgrade to a full GB of RAM. I should also be able to get new memory faster than a warranty replacement on the old RAM, so it will have me up and running much sooner. I'll still get this other module fixed under warranty, as I may be able to sell it to help offset the price of the new memory.
Posted by Paula at Saturday, April 02, 2005 0 comments
Friday, April 01, 2005
Memory
For anyone concerned about Kowari tests failing on the Mac, I've just been able to confirm that it's my fault, not Kowari's. It seems that I may have some intermittently faulty memory.
Yay.
Hopefully it is my memory expansion module, and not the built-in RAM. When I have some time this weekend, I'll try taking the extra RAM out and doing another build. Hopefully it will pass, indicating a problem with the memory expansion. Fortunately, everything is under warranty, but I'd rather be without memory for a few days than without the whole computer while it gets fixed.
This may also explain why the system crashed last weekend.
Posted by Paula at Friday, April 01, 2005 0 comments