Thursday, April 29, 2004

More OWL reading.

Discussion with TJ and AN today revealed that AN isn't as enamoured of Jena as I thought. Basically, everything is still so immature that it's perfectly feasible to do it better... or at least do it differently in such a way as to allow Kowari and TKS to be accessed more efficiently than with any other existing API.

Got a better idea of OWL in my reading today. I wasn't quite getting how to apply an OWL ontology to real data. The whole OWL language is geared around describing data, with no mention of how to apply it. AN cleared that up. When data is loaded, the inferencing engine looks at the RDF and determines the classes for the data based on the ontology. So for the camera ontology, there would be no mention that a Pentax OptioS is a digital camera, but the fact that you don't look through the viewfinder, that it meets all the requirements of a camera, etc, means that the inferencing engine can work out for itself that we're talking about a camera, and that it's a digital camera. This is why some operations in OWL are not guaranteed to be computable.
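
To make that concrete, here's a rough sketch (not the actual tutorial code; the file names and namespace are placeholders) of what that classification step looks like through Jena's Ontology API:

    import java.util.Iterator;

    import com.hp.hpl.jena.ontology.Individual;
    import com.hp.hpl.jena.ontology.OntModel;
    import com.hp.hpl.jena.ontology.OntModelSpec;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class ClassifyCamera {
        // Placeholder namespace; the real camera ontology defines its own.
        private static final String NS = "http://example.org/camera#";

        public static void main(String[] args) {
            // A model backed by a rule reasoner: loading data triggers inferencing.
            OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_RULE_INF);
            model.read("file:camera.owl");     // class definitions only
            model.read("file:instances.rdf");  // instance data, with no explicit types

            // Ask which classes the reasoner has worked out for the OptioS.
            Individual optio = model.getIndividual(NS + "OptioS");
            for (Iterator i = optio.listRDFTypes(false); i.hasNext(); ) {
                System.out.println(i.next());  // should include Camera and Digital
            }
        }
    }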

One would be tempted to infer these statements (such as "OptioS is a Digital Camera") and store them alongside the other data, but that has its problems as well. For a start, not everything can be inferred, so where do you stop? It also makes changes to the data or the schema quite problematic, as many inferred statements will no longer apply, but it may not be possible to find them again.

Type inferencing in particular sounds a little intimidating. I'll have to spend more time on this. I was under the impression that we needed to enforce the rules, but now I understand that we need to look at the data and determine whether it follows the rules or not. That's a totally different problem. After going through the rest of the OWL docs I'll have to look at the Jena inferencing engine and see if I can learn anything from it.

Long weekend coming up again. I'll be spending it reading OWL, Fast Food Nation, and if I can I'll also get back to the USB design book that I'm also reading.

Back at work, but with a head full of cotton wool, so don't try and read too much into what I write here today... I'll probably refute it all tomorrow.

Early part of the morning was spent dealing with email and spam. It's about time I changed my IEEE address as it's starting to get out of hand, even with filters. Unfortunately I used that address for registering with MSDN for work last year, and within 48 hours I'd gone from about 2 spams a week up to nearly a hundred a day. Great security at MS.

Back to reading more of the Jena implementation of its Ontology API. I think I have a grasp of it all now. Both the general structure and seeing it run in the debugger left me unimpressed. The question is, can we re-implement Jena as an API for TKS/Kowari? Should we? This was a topic of discussion with TJ this afternoon.

The current implementation just provides a mapping between Jena and a Kowari datastore. It leaves Jena to do everything it would normally do, and only replaces the datastore with a Kowari one. This means that all of Jena's inefficiencies are still there, and it needs to use a map of Jena nodes to the URIs used internally by Kowari, and another map back again (this was the code I've had to debug since Easter). These maps mean that the system doesn't scale for size. The reliance on Jena code means that the system also doesn't scale for speed. It provides a compatibility layer, but it doesn't add anything to Jena at all.
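
The double bookkeeping looks something like this (a sketch with made-up names; Kowari's internal identifiers are stood in for by longs):

    import java.util.HashMap;
    import java.util.Map;

    import com.hp.hpl.jena.graph.Node;

    // Every Jena node passing through the layer lands in both maps, so
    // memory grows with the data - which is why it doesn't scale for size.
    class NodeMapSketch {
        private final Map jenaToKowari = new HashMap();  // Node -> Long
        private final Map kowariToJena = new HashMap();  // Long -> Node

        void register(Node jenaNode, long kowariNode) {
            Long id = new Long(kowariNode);
            jenaToKowari.put(jenaNode, id);
            kowariToJena.put(id, jenaNode);
        }

        Long toKowari(Node jenaNode) {
            return (Long) jenaToKowari.get(jenaNode);
        }

        Node toJena(long kowariNode) {
            return (Node) kowariToJena.get(new Long(kowariNode));
        }
    }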

Personally, I think we should properly implement the Jena API. This would let us slot in with numerous existing systems, and get us off the ground in a hurry. It would also probably earn us some real kudos in the community, as it would be one of the first Jena implementations that could handle large quantities of data, and do it quickly. Re-implementing the ontology API to directly access Kowari would also make a certain amount of sense, though the extra gain won't be as dramatic. This is because the Jena ontology API doesn't do a whole lot of work, leaving most of it up to Jena.

There are a lot of classes in the Jena API, so it will take some time to reimplement them directly on top of Kowari. I don't yet see anything insurmountable, nor anything that isn't scalable, so I'm confident that there isn't a huge risk in doing this. Even if we were to find an operation that would be grossly inefficient, a simplistic and inefficient implementation couldn't be any slower than what is offered by Jena now. The Resolver might prove to be cumbersome, but for the moment we could probably use the one found in Jena.

TJ has two major concerns with reimplementing the Jena API for Kowari. The first is that he's worried we might find something in the API which is just not efficient to put over the top of Kowari, creating a significant drain on our resources for no real benefit. As I mentioned in the last paragraph, I don't think this is a real risk. Jena currently performs most operations by iterating over the entire database, and in the worst case we can match that performance.

TJ's second concern is that we might be able to offer features and efficiencies that would be useful to developers that the Jena API simply can't take advantage of. Neither of us knows what these might be, but I do see where he's coming from. At this stage I think I'd need to be actually implementing this before I'd see it. Maybe AN could offer some insight here.

For this reason TJ wants to create an externally available API to access the Kowari query layer. It wouldn't have to differ too much from what we currently have, but it would clean things up a lot. This would offer an alternative (and more efficient) interface into Kowari/TKS rather than using an iTQL interpreter. TJ even suggested it could be done as an extension to JRDF (though I'm not convinced of that one). From a technical point of view I agree with the eventual need for this API.

TJ thinks that writing an API like this for a datastore that is as scalable and as fast as Kowari/TKS could entice many developers away from using the Jena API and into using ours. He knows of at least one company that would happily do so. I'm not so sure how widely accepted it would be though. It really depends on how much the developers out there have come to accept Jena as the standard API for this sort of thing. After all, there are viewers and other pieces of software out there that are already Jena aware. It's a shame AN wasn't here, as he would have quite a bit to contribute to the conversation. I'm sure DW has some definite ideas as well.

We both agreed that ideally we'd have both a Jena API, and our own - giving the best of both worlds. We can even implement the Jena API in terms of our own (since our own would be a reasonably thin wrapper anyway). Unfortunately, as always, it's a matter of time and money. I'll let wiser heads rule on this.

Until we have a decision on this, I need to get a better idea of OWL, since at the moment I really only know how Jena does OWL. That way I can see what is involved in providing our own support without involving Jena. Knowing what OWL does and doesn't need to do will help me work out whether Jena is doing it the best possible way, or if we can do it better.

So I'm back into the W3C documents and reading again. That requires a clear head. I really wish I didn't have this cold. :-}

BTW, does anyone know how I can get Mozilla to use the spell checker provided with Blogger? All I get is a popup that appears too briefly to see what it says.

Wednesday, April 28, 2004

Sick today. Yuck. Spent the day sleeping and reading "Fast Food Nation".

Tuesday, April 27, 2004

Well I've finally got this blog going. I'd been keeping a log for the last few weeks in my .plan file, so I was able to use that to seed this. AN tells me that he's linked to it from his own blog, so hello to anyone who's come here from there.

Yesterday was a public holiday for ANZAC day, hence no post.

Spent the morning working on why fixing Jena code seemed to have broken other parts of the system. Had AN going through it with me, which had the benefit of showing him the modifications I've made lately. Turned out that everything was going fine. The failed tests were all known errors, and had nothing to do with my modifications. However, I did spot a mistake where I always created missing JRDFFactories and JenaFactories even when the graph was already in a cache and no new factories were needed. This would not have caused a problem, but it was unnecessary. It was a trivial fix and all seems to be running fine.

Spent a lot of time running tests, and making tweaks to things like unclosed Answer objects as I heard others talk about them. Finally got it all checked in. This has left me free to get back to reading Ontology documentation, which is what I plan on doing over the next few days with AN. We'll be working out just where we're going with OWL and Jena support in Kowari. After seeing the unscalable mess in Jena, I'm looking forward to putting an efficient Jena API over the top of Kowari. OWL support will be much more efficient too, which is what has me really excited about this approach.

As for the long weekend... I finished "Ringworld" (by Larry Niven) which I've been meaning to read for years. Quite enjoyed it. The plot wasn't really inspiring, but the technical description of the world was really wonderful. Now I've moved onto "Fast Food Nation". This book is about the effect that fast food has had on American society, and not simply the effect on their waistlines. Of course, a lot of this applies equally well to Australia, hence my interest. Some of it is quite disturbing, but much of it is to be expected. It forms quite an indictment when you see it all in one place. I'm grateful that many of the comments on food safety (about how it is controlled by industry) are not quite as applicable in Australia. Most of it isn't really about fast food per se, but rather about how corporate giants in America have run roughshod over the general population. For the companies mentioned, this has been enabled due to the success of fast food as an industry. I've yet to read anything about the health effects of this food, but I'm sure it's there. It has me more interested to see the movie "Super Size Me" now.

Friday, April 23, 2004

DM's concern about using multiple graphs was based on the concern that each graph may be using different transactions. This was not the case, so I could go ahead with the fix as described yesterday.

Unfortunately, every insertion and deletion into a model has been creating a new graph on the current session with which to do it. Since graphs now need to have a JRDFFactory and JenaFactory on creation, this meant that I had to make sure each of the deletion and insertion methods was aware of these new objects. Every method which has Jena objects in its parameter list has been given a Jena and JRDF factory as well. There are still a couple of outstanding methods which are not aware of Jena objects, so I gambled and let them use null for their factories, on the assumption that since they are not Jena aware, any code which calls them will not be calling any methods which depend on either of these factories. This appears to have worked.

These changes involved quite a few dependencies, so it took some time to get it all through.

At the same time, I instituted the fixes and cleanup for the RMI objects as described yesterday. These all work fine, and hopefully they will help alleviate some of the out of memory problems experienced on the server by RT. They won't fix them all, as the errors are coming about due to unclosed file tuples, but it will certainly help. The presence of these errors seems to indicate that there are unclosed answers still in the system somewhere.

Unfortunately, some time was wasted by trying to test both fixes at the same time. I did the RMI tests in a separate directory, so I could build in a known clean environment. This let the tests be independent, but they couldn't be run at the same time, as the server for each test was going to bind to the same port. This led to some strange things failing, and I had to back out a few things before I worked out what was wrong. After running the tests one at a time everything seemed to work.

Again with the debugging of Jena unit tests.

Getting serious with the logging now, and have added copious print statements and stack traces to the source code. It didn't take too long to realise that the hashmap that JRDFFactory uses to map anonymous Jena nodes to anonymous Kowari nodes seems to be cleaned out between one line and the next. Also, the first set of anonymous nodes was based on positive numbers and the second on negative numbers.

AN noted that on the first line the nodes are being created, then on the second line the nodes are being queried. This explains the negative nodes: negative nodes are used for anonymous nodes that are created during queries. However, anonymous nodes are only created in a query if the system has not seen them yet, and these nodes should have been created already. Since the hashmap is empty on the second line, and it gets used for this query, the system doesn't realise that the nodes are in use, and creates some temporary ones.

The remaining question then is why the hashmap was cleared out. Some more verbose logging eventually showed that the hashmaps were different between the first line (when the nodes were inserted) and the second line (when they were queried). This meant a new JRDFFactory was being created. Logging a stack trace for each JRDFFactory creation finally showed up the problem. The GraphKowari class is the owner of the JRDFFactory object, and this class creates its JRDFFactory correctly. However, it also gets a "FilteredStatementStore", and this is the cause of the problem. In the method which gets this store for the GraphKowari, a whole new graph is created internally. A FilteredStatementStore is requested from this graph and this is returned to the GraphKowari object. Unfortunately, the new graph also creates its own JRDFFactory. We can't even get this factory from the graph, as we never see the graph, only the filtered statement store that it supplies. I'm not sure why, but when queries are performed it is this graph that gets used, though that seems strange, as the reference to the graph gets lost. I suspect that the FilteredStatementStore knows about the graph that it's attached to.

Essentially, we need to make sure that the same JRDFFactory gets used by both the GraphKowari and the graph used by its filtered statement store. Since we can't get the JRDFFactory back from the filtered statement store without a lot of messy restructuring, the logical approach seems to be to create the JRDFFactory first, and pass it on to session.getFilteredStatementStore(). However, I'm going to hold off until tomorrow when I can speak to AN first. This is because DM expressed concern over the use of two (or more) graphs, as their relationship to the session is unclear. It is only after I've confirmed that this structure is correct that I'll go ahead and make this change. However, I think that it should all work to an extent, and regardless of strict correctness, it may have to do until a significant refactor is permitted. Consequently, I think I'm making the change anyway... meaning that I may finally have it working!

Also spent some time converting my time log file into the Cuckoo timesheets. I'm making more use of the multiple-day entry, as only 2 entries per day is almost never enough. At least when it's adequate it works quite well, as it means that you don't have to wait for 2 minutes or so to move onto the next day. That is Cuckoo's major problem: it takes too long to do anything. This gets in the way of what you may need to do, making the whole user experience painful. Consequently I rarely use it. If it were simple to just log in, make a change and be done with it then I'd use it more often, but instead I find that even a single day's entry can be a significant undertaking. I asked Ben about the speed, and he explained that it appears to be due to the box that it runs on being badly underpowered. Perhaps I should ask for an upgrade there. The request may get ignored though, as several other people don't seem to mind the interface, given that they are always diligent about filling it in every day.

Quick question from AM made me look at RMI Answer objects again. He thinks that a finalize() for RemoteAnswerWrapperAnswer would help avoid running out of memory on the server when clients are careless and don't close their Answers. This also made me realise that even closed Answers are holding onto memory on the remote server. This is because the RMI server keeps a reference to the AnswerWrapperRemoteAnswer even after it's been closed. This should be fixable with a call to UnicastRemoteObject.unexportObject(). This is a static method which corresponds to UnicastRemoteObject.exportObject(), so I would have thought it should only apply to objects exported in this way, and not to objects which achieve RMI registration by extending UnicastRemoteObject. However, after looking at it more closely it looks like it should work fine. I'm sure there's some documentation out there that explains it clearly. Otherwise I might just give it a go when there's some time available.
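
If it does work, the change would look something like this (a sketch only; the real class does a lot more):

    import java.rmi.NoSuchObjectException;
    import java.rmi.RemoteException;
    import java.rmi.server.UnicastRemoteObject;

    class UnexportSketch extends UnicastRemoteObject {

        UnexportSketch() throws RemoteException {
            super();  // exports this object for RMI on construction
        }

        public void close() {
            try {
                // Static unexport; 'true' forces removal even with calls
                // in progress. It appears to apply to subclasses of
                // UnicastRemoteObject as well as to objects registered
                // via UnicastRemoteObject.exportObject().
                UnicastRemoteObject.unexportObject(this, true);
            } catch (NoSuchObjectException e) {
                // Already unexported; nothing to do.
            }
        }
    }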

Wednesday, April 21, 2004

Jena debugging most of the day.

Equality methods all seemed to be in order, with the exception that BlankNodeImpl did not have one. It turned out that it is only used in one place (unrelated to Jena) so implementing this didn't really help, though I did it anyway for completeness.

Did find one error in JenaFactory where convertSubjectToValue was not using the cache when building anonymous nodes. Unfortunately, this did not seem to improve anything. Also, the Jena unit tests insert anonymous nodes into the graph as predicates. This is something that RDF doesn't allow (predicates must be named resources), so convertPredicateToValue did not handle anonymous nodes. Again this did not seem to fix anything. Logging showed that these methods don't get called by the Jena tests, which showed me (with confirmation from AN) that all of these tests use JRDFFactory rather than JenaFactory. The JRDFFactory code all seems OK, but I'm starting to get suspicious that this may be the source of the problem. There are also other classes in these Kowari/Jena packages, and I need to check them as well.

Looking at the unit test errors I found there are two types. The first is the set of Jena graph tests. These tests insert a number of triples with simple structures into the graph, and then test for their presence. In each case they fail when looking for the existence of new anonymous nodes. Jena anonymous nodes are differentiated by an underscore character prefix. Tried to log the anonymous nodes which are created (which is how I found out that this happens in JRDFFactory, as mentioned above), and found that some nodes have a negative node number. According to AN this means that they are global nodes, but it somehow seems suspicious.

The second failing unit test was simply due to different representations of anonymous nodes having toString methods which returned URIs in differing formats. This was trivial to fix.

Also helped AM with an RMI problem. It turns out that the next() method on the Answer interface cannot be called after it returns false for the first time. This was a problem for AnswerPageImpl which would fill itself from an Answer object using the next method, and stop when next() returned false. The next AnswerPageImpl expected to be able to call next() again, and get false again, telling it that it had zero size.

I fixed this by adding an isLastPage method to AnswerPage, and changing the client (RemoteAnswerWrapperAnswer) to not ask for the next page if a page returned true for this method. This could be a little fragile if the code ever changes, so AM is changing Answer to be more robust. This means that he has made next() continue to return false after it gets to the end, and not throw any exceptions.
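
Roughly, the two fixes come down to this (a sketch with invented internals; only the method names come from the work described above):

    // Client side: stop paging as soon as a page says it's the last one.
    //   if (page.isLastPage()) { /* don't request another page */ }
    //
    // Server side: AM's hardened cursor - next() keeps returning false
    // at the end instead of throwing.
    class CursorSketch {
        private final Object[] rows;
        private int index = -1;

        CursorSketch(Object[] rows) {
            this.rows = rows;
        }

        boolean next() {
            if (index >= rows.length) {
                return false;  // already past the end: say "no more" again
            }
            index++;
            return index < rows.length;
        }
    }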

Tuesday, April 20, 2004

All day debugging the Jena Ontology API. Spent a lot of time documenting the route taken by the code. In many places the Jena code is deliberately obtuse, with over 6 method calls to a line: chained methods, nested methods, and so on.

It seems that the Jena code is doing the selection correctly, though the complexity makes it difficult to verify this. It has broken the camera ontology into 3 parts, and the result of a "find" is returning a 3 element list, which is transparently iterated over when needed. Each element of this list translates to a part of the database that Jena has broken up.

The problem seems to be in the iteration. While I missed the start of this operation (I'll check it tomorrow) I found that the iterator is storing the next item to be returned from a "next" method. At the time that I checked this value it was set to the Camera class. This is quite surprising, as the in-memory version of the Jena database does not normally return this value, though it is supposed to. However, when next() is called the node returned is "null", which shows up the same problem as before.

I tried to find out what next was returning for me, only to discover that the data is retrieved in hasNext(), which I'd already missed (this is why I missed it when it found SLR). After going over the correct answers I tried tracing hasNext() for the last item (the one that returns "null"). It was here that I discovered that the level of "iterator wrapping" is extreme, with 15 levels of hasNext() being called. Similarly, numerous levels of next() were being called in the same way. When I finally got to the code that executes the actual "next" I discovered a state machine! This is enormously complex given the task here.

Rather than try and decipher this state machine I took a step back, and realised that the reason the "next" method might not be working could be the equality method that AN mentioned all that time ago. Now that the anonymous node mapping bug has been fixed it may be possible to pursue this, so I'll try it that way.

As for the fixed anonymous mapping bug, I went through the code with AN, and he pointed out some possibly poor assumptions made about types, so we went through and fixed them. Also learned that the store tests are failing, but on inspection discovered that this might just be due to a problem with toString methods, and Ant failing to recompile all of the code. These should be a quick fix tomorrow.

Monday, April 19, 2004

Power went out over the weekend, so all context for debugging was lost. This is annoying, as it takes some time to set these things back up again, and without the results of previous work still present, it takes a while to remember everything needed just to get the system running again. However, I hadn't written in this log on Friday, so catching up has been helpful in remembering what is needed.

After running the program in the debugger I tried printing the contents of the anonymous hashmaps kept by Jena and JRDF, but both came up empty. Since this worked when the JRDF object belonged to AbstractDatabaseSession rather than the graph, I suspected that I may not have been re-using the correct objects. Before moving them back I tried logging all insertions into these hashmaps. Surprisingly, I discovered that insertions into JRDF were not also going into Jena. A stack trace showed that AbstractDatabaseSession does anonymous node insertions directly into JRDF, which avoided Jena. I fixed this, and included a small refactor so that Nodes are always passed to the JRDF methods for anonymous nodes, rather than the String for the URI of the node.

After the above was complete, the code again runs to completion, now with anonymous nodes being handled correctly. Unfortunately, this did not fix the erroneous results we received earlier. The only difference now is that the anonymous node which was shown before is now being shown as "null". So the bugs here were unrelated. It looks like I need to trace the inferencing to see why it includes the SLR intersection object for this query.

Friday, April 16, 2004

Writing this after the fact, so my memory may be a little faulty.

The first item for the day was the ongoing problem with RDFUnitTest. The timing problems here are quite frustrating, and offer little direction on what I should be trying in order to get it fixed. When DM suggested that he could look I was only too happy to let him. This let me get back onto the more interesting task of debugging the Jena Ontology API on Kowari.

Spent most of the day using ddd/jdb to trace through what is going on with the Ontology code. Unfortunately ddd for Fedora seems to be slightly broken in that it refuses to set the current execution position, either with glyphs (the arrow) or with text. This meant that I had to manually select the current line in the source code window, so that the correct source would be shown. I also typed "list" a lot, just to see the context of my current position. This was only marginally better than using jdb on its own.

The resolver in the Ontology code is failing when looking for a property called "first". This is the property used for the data in each element of a list. The problem occurs when "first" is supposed to return an anonymous node. In this case, no node is found at all.

Tracing through what happens, it became apparent that every single check for properties like this is performed by forming a triple pattern like <node, first, *>, where node is the node you want the property for. Jena then iterates through the entire database looking for all matches. This is really inefficient, and demonstrates enormous scope for improving scalability and performance. Indeed, it seems that once the property is found, it iterates through the entire database again, based on the results of the first find.
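
In Jena's Graph terms, each of these lookups amounts to something like the following (my sketch, not the actual Jena source):

    import com.hp.hpl.jena.graph.Graph;
    import com.hp.hpl.jena.graph.Node;
    import com.hp.hpl.jena.graph.Triple;
    import com.hp.hpl.jena.util.iterator.ExtendedIterator;
    import com.hp.hpl.jena.vocabulary.RDF;

    class FindFirstSketch {
        // Build a <node, rdf:first, *> pattern and let the graph scan for it.
        static Node firstElement(Graph graph, Node listNode) {
            ExtendedIterator it = graph.find(listNode, RDF.first.asNode(), Node.ANY);
            try {
                return it.hasNext() ? ((Triple) it.next()).getObject() : null;
            } finally {
                it.close();
            }
        }
    }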

The problem we are encountering is due to our mapping of nodes into the database. It seems that Jena is keeping its own RDF structure internally, which is how it knows the relationships between each node (though I've yet to completely confirm that). Jena finds the anonymous node that it needs, and then asks Kowari to give it some details on this node. Kowari then asks the JRDFFactory that it keeps for the details it stored on this node.

JRDF is the interface that AN wrote to let Jena communicate with Kowari more easily. It keeps a cache of the nodes which have passed through it, and AbstractDatabaseSession asks this interface for the nodes when Jena needs them. However, the nodes in question did not go through JRDF, and instead went through the Jena wrapper that AN wrote. After discussions with AN, it was determined that both the JRDF and Jena interfaces should be keeping their own mappings, as they each map nodes in different directions. To do this, the JRDFFactory was removed from the database session, and put in the current model along with a new non-static JenaFactory. Both of these factories are then passed to the database session whenever they are needed. The factories also inform each other of all of the nodes they have used, so that reverse mappings are always kept in sync. Unfortunately, an initial run of this was not successful. I will have to check the contents of each map to see if all the nodes are present. If so, then I'll need to determine why they weren't found when needed; otherwise I'll have to determine why they weren't put in there in the first place.
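
A hypothetical sketch of that cross-registration (the two class names are real, everything inside them is my invention):

    import java.util.HashMap;
    import java.util.Map;

    import com.hp.hpl.jena.graph.Node;

    // Each factory owns one direction of the mapping and pushes every new
    // entry to its partner, so the reverse map can never fall out of step.
    class JRDFFactorySketch {
        private final Map jenaToKowari = new HashMap();
        JenaFactorySketch partner;  // wire both partners together before use

        void addMapping(Node jenaNode, Long kowariNode) {
            jenaToKowari.put(jenaNode, kowariNode);
            partner.notifyMapping(kowariNode, jenaNode);  // keep reverse map in step
        }

        void notifyMapping(Node jenaNode, Long kowariNode) {
            jenaToKowari.put(jenaNode, kowariNode);  // called by the partner; no echo back
        }
    }

    class JenaFactorySketch {
        private final Map kowariToJena = new HashMap();
        JRDFFactorySketch partner;

        void addMapping(Long kowariNode, Node jenaNode) {
            kowariToJena.put(kowariNode, jenaNode);
            partner.notifyMapping(jenaNode, kowariNode);
        }

        void notifyMapping(Long kowariNode, Node jenaNode) {
            kowariToJena.put(kowariNode, jenaNode);
        }
    }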

Thursday, April 15, 2004

Beta release day. Fortunately the Ontology API was not required, but I was hoping to get it done anyway. However I'm still working on it.

It turns out that the Ontology API is iterating over the underlying dataset in order to do its inferencing. This is horribly inefficient, and now I understand why we want to implement OWL with Kowari queries. If we can make it work then we'll get a speed improvement of several orders of magnitude over Jena, as well as far better scalability.

I was able to determine the first part of the problem. When returning anonymous nodes to Jena we always wrapped them in URI resource objects, so Jena had no way of working out what was anonymous. That's been fixed, but it might cause problems for other parts of the system that expect to see URI objects. SR and AN agree that I should continue on even though it might break the existing system, as this is the more correct way to do it.
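
The gist of the fix, sketched (the surrounding plumbing is invented):

    import com.hp.hpl.jena.graph.Node;
    import com.hp.hpl.jena.rdf.model.AnonId;

    class NodeConversionSketch {
        static Node toJena(boolean isAnonymous, long kowariNode, String uri) {
            if (isAnonymous) {
                // Now: a genuine blank node, so Jena can see it's anonymous.
                return Node.createAnon(new AnonId("node" + kowariNode));
            }
            // Before the fix, anonymous nodes came back this way too,
            // wrapped up as if they were named resources.
            return Node.createURI(uri);
        }
    }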

Jena wants to use its own internal identifiers for anonymous objects, so I've been creating these objects and storing them in a map from Kowari nodes to anonymous objects. However, I've yet to find where Jena tries to map its internal identifiers back to the original anonymous nodes. This is a problem, as the very first query that the OWL example attempts finds an anonymous node that appears to have no properties, because it doesn't know how to map them back to the original nodes in Kowari. I'm tracing this in JDB to find the attempted mapping.

In the meantime, the old RDFLoadUnitTest bug showed up again today. It works fine on a faster Windows machine, and refuses to work on a slower box. I added some roll-your-own logging to one of the files, and saw some very strange behaviour. By adding a few print statements here and there I've caused one failing test to start passing, and another to stop working. Other times I've seen a log claim that a value was 5, but when junit tested the *same* value it reported 7. This sort of thing is normally caused by threads, but there are none in use here. The nature of the errors seems to indicate that there are some file flushing problems in Windows, but I'm not sure. I'm considering giving each test its own file, and trying to make them work that way, but I've tried this before and I'm not confident.

Wednesday, April 14, 2004

Moved off the Jena API and onto the Jena Ontology API. Implemented a few tests, which helped me to conceptualise what is going on with ontologies.

The camera ontology is a part of the Jena tutorial, so I've been working with this. It turned out that AN's test was based on this ontology and the code shown in the tutorial, so I was also able to start looking at the problems in our own code.

The test we're running is to ask for all classes which are Cameras. According to the tutorial, this should be equal to "Camera, LargeFormat, Digital". However, this is not what the Jena code returns (it only returns LargeFormat and Digital), so I've had to consider the ontology structure carefully.

The Camera query is supposed to return Camera because all classes are defined to be subclasses of themselves. This is apparently a failing in the Ontology layer's inferencing. It's a problem, but for the moment it's not *my* problem. Unfortunately, our code also returns two other nodes, "SLR" and "node96".

Now I understand the SLR, because even the tutorial states that SLR is a subclass of camera. However, it's *not* defined that way in the ontology. Instead, it's defined as an intersection of Camera and a restriction (the restriction is viewFinder = ThroughTheLens). An intersectionOf is supposed to define a new class, so I'd expect SLR to be a subclass of Camera, but apparently not. This ignores the fact that Figure 3 in the tutorial clearly shows that it is. However, the reference implementation does not return SLR, so I guess I have to live with that.

The node96 is more interesting. This is an anonymous node, and should not show up at all. In fact, it's the anonymous node that defines the restriction for SLRs. That might explain why it gets picked up by the ontology inferencing (in the same way that SLR gets picked up), but it doesn't explain why an anonymous node shows up at all. Still, I'm learning more as I go.

I tried to configure logging so that the AbstractDatabaseSession.query() method would show up, letting me see what the Ontology inferencing was trying to do. This was MUCH harder than anticipated, but I got it going in the end. It involved putting a logging configuration object into my test class, and lots of fiddling with the log4j configuration file. SR suggested that I wanted to define my logging config file on the command line with a URL, but it turned out that this was not the case. After all this I still didn't see the queries showing up in my output. Finally I resorted to throwing exceptions and calling System.exit() in an effort to see the query getting called, and it never happened. That's when I realised that the Ontology API does not use any kind of query on the database.

Decided that I need to change tack, and try looking at the nodes being traversed by Jena to see why anonymous nodes are showing up when they shouldn't be.

Tuesday, April 13, 2004

Still sick today, but I went to work. Now learning Jena in order to fix some problems we are having exposing anonymous nodes. I'm starting this by reading documentation.

Learned the basics of the Jena API. Makes sense, although some of it seems a little obtuse for the sake of flexibility. Not sure that the tradeoff is worth it, but experience will tell there. Now learning the Jena Ontology API.

Monday, April 12, 2004

OK, show and tell time. Over the weekend I did quite a bit of exercise, got sick (still am), saw Anne's cousins, went out for 2 breakfasts, played a lot with Luc, and watched the first few lectures in a series from the 80's by Gerald Jay Sussman (GJS) from MIT.

Luc is making progress by the day. I know every kid in the world does, but it's still amazing to watch. He's very happy too, so all those smiles make playing with him a lot of fun.

The GJS lectures were recommended by DM and AM, and I'm enjoying them. They were taught way back when computing was just starting to become mainstream. The language used is LISP, and it's certainly different. It's the first language I've ever seen which doesn't seem to have any looping constructs, instead doing everything with recursion. That makes me uncomfortable as recursion normally chews up stack space, but I'm starting to suspect that LISP doesn't suffer from that problem. I suppose I'll learn more about this as I go.
(Note: AM explained that LISP uses tail-recursion, which means this isn't an issue for iterative algorithms).

In the meantime I'm going to convert one program in the lecture into C, just so I can see how one part of it works. It's a Newton-Raphson method for finding square roots, and I want to have a closer look at the differential function. I can follow it reasonably well, but there was a comment about it not being executed on every iteration which didn't seem to be right. There have been some minor errors in the lectures, so I suspect that this might be a slight mis-statement of what was happening. I thought that by translating the code for myself I might get a clearer idea of the exact mechanism.
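
For reference, the simple form of the lecture's square-root procedure goes roughly like this (sketched in Java for now; the derivative-based version is the part I want to pick apart in C):

    public class NewtonSqrt {
        static final double TOLERANCE = 0.0001;

        static boolean goodEnough(double guess, double x) {
            return Math.abs(guess * guess - x) < TOLERANCE;
        }

        static double improve(double guess, double x) {
            // The Newton-Raphson step for f(g) = g*g - x reduces to averaging.
            return (guess + x / guess) / 2.0;
        }

        static double sqrtIter(double guess, double x) {
            // Tail-recursive in the lecture's LISP, so it runs as a loop
            // there; Java doesn't optimise tail calls, but convergence is
            // fast enough that the stack depth is trivial.
            return goodEnough(guess, x) ? guess : sqrtIter(improve(guess, x), x);
        }

        public static void main(String[] args) {
            System.out.println(sqrtIter(1.0, 2.0));  // ~1.4142
        }
    }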

Thursday, April 08, 2004

Much of today was taken up with meetings, starting with the morning meeting, and moving onto a teleconferenced meeting with Virginia. Not a lot of time for coding.

AN had some important things to say about coding standards and tests. I was feeling a little guilty until I realised that with the exception of 3 classes, I haven't written any new Java code now for 2 years. Those 3 classes were all written in the last few weeks for RMI, and all were compliant. I'll need to be diligent in future to prevent sliding.

Found all the Windows problems. RDFLoadUnitTest is using a log file which is re-used in each test. Some tests rely on this, but it left data in the file for one test, breaking it. Most of the tests have a finally clause which tests for the existence of this log file and deletes it. My initial solution was to delete the file in the "tearDown" method instead of at the end of most tests, but that was how I discovered that some tests rely on the file being left from a previous test. I disagree with this, but it seems to be necessary here, and I don't really have the right to change it. The problem was fixed when I deleted the log file at the start of the failing method.
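
The shape of the fix (class, method, and path names here are illustrative):

    import java.io.File;

    import junit.framework.TestCase;

    public class RDFLoadCleanupSketch extends TestCase {
        public void testLoadWithCleanLog() throws Exception {
            // This test must not see data left over from earlier tests,
            // so it removes the shared log file itself before starting.
            File logFile = new File("tmp", "rdfload-log.txt");
            if (logFile.exists() && !logFile.delete()) {
                fail("Couldn't remove stale log file: " + logFile);
            }
            // ... perform the load and assert against the fresh log ...
        }
    }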

There is one remaining test which fails, and that is IntFileTest. This fails due to the truncate method being called on a mapped file. DM has already said that he knows how he wants to address this, and it's been left for him.

Wednesday, April 07, 2004

It seemed that the problem with RDFLoadUnitTest could well be the parsing of newline characters, but it has been difficult to track this down. This is because the files in question only exist for the duration of the test. Keeping the file around indefinitely does not work, as it seems to interfere with other tests. I ended up giving each test its own file to work with, but this resulted in most of the tests failing. It would appear that several of the global logging objects in RDFLoadUnitTest are holding onto data regardless of the file they read from.

While all this testing was going on I was able to find and fix a problem with ItqlInterpreterBeanUnitTest. This was definitely due to problems with the EOL character, and it was a simple matter of using the line.separator system property.
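
The fix in miniature (the expected message is made up):

    public class EolSketch {
        public static void main(String[] args) {
            // Compare against the platform's end-of-line sequence rather
            // than a hard-coded "\n".
            String eol = System.getProperty("line.separator");
            String expected = "Loaded 5 statements." + eol;
            System.out.println(expected);
        }
    }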

Getting back to the RDFLoadUnitTest, the problem is coming from RDFLoadLog. I've added a new constructor that lets me log more data. Strangely, this simple addition has caused another test in the suite to fail! Logging in different places is causing things to change as well. It all seems remarkably brittle.

Tuesday, April 06, 2004

Another uneventful day. Everything seems to be running smoothly under 64 bit mode on Hamilton, with none of the "Unable to connect" problems occurring. There are still 3 tests which fail, all of which are unit tests for file-based tuples. In each case the test reports "Out of memory". Tate was happy enough with this, given that Hamilton is a very low end machine, but DM suggested increasing the memory. Adding -Xmx256m (a larger maximum heap) to the component tests did the trick, and these now pass.

Added some code to build.sh and build.xml to detect a sun4u processor and use the -d64 flag for the JVM. This is being passed in as a parameter called "arch.bits". There seems to be no other way to find out if the current architecture can execute Java in 64 bit mode. It is also annoying that only the Solaris JVM will accept the -d parameter. Consequently all other architectures are now defining "-Dnoop" instead of -d to act as a placeholder on the command line when invoking java.

AN would prefer it if the detection and setting of arch.bits could be done in build.xml, so that running the system without build.sh will also work (this only seems to be an issue for Anthill). However, given that this is only an issue on 64-bit Solaris, and all it does is default to the 32 bit version, there is no priority for it. I might look at it if I get time later, but for now build.sh does all the tests correctly, and running a 64 bit kowari (with assertions enabled) is just:
java -ea -d64 -jar dist/kowari-1.0.jar

Now testing Kowari on Windows, using the host "Carver".

For some reason JAVA_HOME was not set correctly. Added this to the kenv.bat file which was created for setting Kowari variables in order to run build.bat.

A clean build had a lot of unknown classes. This must be due to cvs not having -d by default on the Windows login. Now fixed.

RDF Load unit tests all failed due to being unable to delete C:\cygwin\home\pag\kowari\tmp\rdfload-log.txt. This was due to the file policy for testing being different for Windows, and not set correctly. There was no perceived need for restricting file access during tests, so this was just changed to ALL_FILES. Now have a single error in RDFLoadUnitTest where the logfile appears to have parsed 2 extra lines (read 7 instead of 5). Possibly due to newline characters being out of whack for Windows.

Monday, April 05, 2004

Working on problems with running in 64bit mode on an UltraSparc. Frustratingly ineffective day. The only UltraSparcs available are quite slow... not that I know their exact MHz. That's really unforgivable as I own one of them! :-)

Mostly works, but clients occasionally refuse to connect to a running RMI server. In trying to reproduce this problem I started using an iTQL command line, using the Swing GUI found in itql-1.0.2.jar, and connected directly to a Kowari server to run the JXUnit tests manually. In the process I discovered that attempting to load an RDF file into a non-existent model left the session in an unusable state. This was caused by an exception being thrown (due to the use of a non-existent model) before a try block was set up to catch it. As a result a temporary string pool file was left open. So I also discovered that DM now keeps track of all managed block files in a hashtable. That'll be handy to know.

Was prepared to fix this bug, but Tate preferred that it get addressed later. AN picked it up for me and fixed it. Back to the 64bit tests, but got held up for a while with broken SOAP libraries. Once that was fixed, ran the tests again and the day was over before they were done.

Also noted that JXUnit tests will never insert into a non-existent model (ironically, I coded this a couple of years ago, but forgot). The LoadDataJX class will always drop and re-create a model before loading a file into it.

While waiting on tests took a quick look at Joel Spolsky's User Interface book (indirectly got there after reading about Eric Raymond's rant on CUPS). Looks interesting, and I enjoyed the anecdotes. I'll have to make some time for it.