Monday, April 25, 2005

Wow, is it really that long since I last posted? Well, I have been busy. Still, this log can save me time if I'm careful, so I shouldn't continue to neglect it.

So why am I so busy?

Yup, this ugly beast has reared it's head at last. It shouldn't be a big deal, but since I'm trying to work full time it has been almost impossible to find the time to do it. Consequently, I've been spending my evenings and weekends working on the report. That's the time I normally spend on this blog, which is why the blog hasn't been attended to.

Anyway, the report is now written, which is good, because I have to submit it today. I was not aware of the timetable for this, but I was informed on Friday that I really needed it in by Tuesday, and even that was late. I'd hoped to have it finished by then, but I really wanted some feedback before submitting it, and now it looks like I'll have to forgo that.

Now I need to work on the presentation. I know what I need to talk about (after all, I've been discussing it for months, and I've just written a report on it), but the slides will take me some time. It's been a while since I've had to do this.

During that, I have to try to be careful of my exercise and sleep, as I'll be at the Mooloolaba triathlon this weekend. My recent illnesses have taken their toll, so I won't be doing the whole thing. This time I'll be in a team, so I'll just be doing the 1500m ocean swim.

Somehow in amongst all of this, Anne and I are finally getting married in two weeks. What started out as a small event with immediate family and close friends has now grown. I'm looking forward to it, but I could do without the stress of all the new preparations that are required with a bigger event.

Collection Prefixes
I've now coded everything to match the collection prefixes. This went right down in the string pool, all the way to the AVL nodes. I hate going that far down for a high level concept, but it seemed necessary. There are well established interfaces to find the first of something in the string pool, but not the last of it. To simplify things, I decided to make the search only look for URI prefixes, rather than strings in general. If we want it working on strings, then the odds are good that we will want full string searching, so I didn't see the point in adding prefix searching in general.

This work took much longer than I thought, because of the sheer scope of the changes. Every StringPool and Resolver class needed modification to handle the new interfaces for searching by prefix. It turns out that there are a lot of them.

Once I thought I was finished, I discovered that I also needed to add support for the in-memory string pool. I'd forgotten about this class. I went in to see how it worked, and discovered that it was written by Andrew. So I set about trying to work out how the existing findGNodes codes worked, with the hope of modifying it. 5 minutes later I realised that I was the one who'd written this method! I'm guessing that if I search back in this blog I'll even find an entry for it (that reminds me... I should add a Google search to this page).

OK, so I knew how this method worked. It's based on a sorted set (a TreeSet). But was there any way to find the last element which matches a prefix in a tree set? I'd need to create a new comparator again, but this was getting beyond the scope of what I'd wanted to do. After all, I only wanted to find all the strings that matched a prefix.

So at this point I tried a new tack. To find the last string which started with a prefix, I tried adding a new character to the end of the string. I just had to make sure that character was higher than any character this might be legitimately found here. I started out with ASCII, but then I remembered that URIs can be in Unicode (I really should learn another language, I'd remember these things). So I went looking at the Character class, and discovered a character called Character.MAX_VALUE. So I added this to the end of the prefix and used this as my end point for searching. It worked fine.

In fact, it worked too well. Now I'm wondering if I wasted a lot of time with the findPrefix methods in the string pools. Strictly speaking, these methods are more correct than using the MAX_VALUE character to find the last string, as it is theoretically possible to use this character in a URI. However, in practice I don't think it ever could be. It certainly won't be used in the context of the rdf:_ prefix.

ASTs for iTQL
Writing the RDF for a set of rules is a time consuming task. I have everything I need for RDFS, but when I get on to OWL it will get a lot more complex. Since these rules are representing iTQL then a tool which can create the RDF from an iTQL query would make rule generation significantly faster and less error prone. I will really need a tool like this when I get to testing, for fast, reliable turnaround.

So I've spent a little time writing such a tool. This started by pulling out the SableCC parts from Kowari, and re-writing the semantics code. It meant learning more about SableCC than I thought I needed to, but I'm pleased to have picked it up.

In the process I discovered that it was going to take a little more work than I'd originally anticipated. Plus I'll need to consider how to pass in non-iTQL options (such as the URI for the name of the rule being generated). I'm considering using some magic tags in comments, like javadoc does, but I'm also putting that decision off while I can.

Anyway, it's a good intro for AST transformation into RDF, which I've been getting interested in recently. It's more work than I initially anticipated so I still have a lot to write on this. However, it's interesting, and necessary, so when I can I'll be spending time on it. That will be some time after the confirmation.

No comments: