Friday, May 28, 2004

JXUnit Tests
This morning I discovered just why the JXUnit tests are taking so long.

The first time I ran the tests I discovered that they were running slowly, so I chose not to do a "clean" before running them each time. This was in the belief that doing a clean build would result in a lot of data being unnecessarily loaded again. Unfortunately the opposite is true. Each time the tests are run, they drop any models they need, and then load whatever test data they use. One of the tests drops and then reloads a Wordnet RDF file.

The current Kowari drops models by individually removing each statement, meaning that dropping large models is very time-consuming. In fact, since so much effort has been put into loading large amounts of data quickly, dropping a large model is significantly more time-consuming than loading it. This meant that dropping and reloading Wordnet took over 20 minutes. Since model dropping has come up as an issue, TJ is considering improving it, and DM has started considering what is needed to make it happen.
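To make the cost difference concrete, here is a toy sketch. It is not Kowari's storage code: it assumes a store that keeps every statement in several sorted indexes (three here, an arbitrary choice) and simply counts index operations for the two ways of dropping. All class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy illustration only: this is not Kowari's storage code, and every name is hypothetical.
final class ModelDropSketch {
    // Three orderings of the same statements, standing in for a store's indexes.
    private final List<TreeSet<String>> indexes = new ArrayList<>();
    long indexOperations = 0; // counts individual index updates

    ModelDropSketch() {
        for (int i = 0; i < 3; i++) {
            indexes.add(new TreeSet<>());
        }
    }

    // One key per index: subject-first, predicate-first and object-first orderings.
    private static String[] keys(String s, String p, String o) {
        return new String[] { s + " " + p + " " + o, p + " " + o + " " + s, o + " " + s + " " + p };
    }

    void add(String s, String p, String o) {
        String[] k = keys(s, p, o);
        for (int i = 0; i < k.length; i++) {
            indexes.get(i).add(k[i]);
            indexOperations++;
        }
    }

    /** Drop by removing each statement from every index, one at a time. */
    void dropStatementByStatement() {
        for (String spo : new ArrayList<>(indexes.get(0))) {
            String[] t = spo.split(" ");
            String[] k = keys(t[0], t[1], t[2]);
            for (int i = 0; i < k.length; i++) {
                indexes.get(i).remove(k[i]);
                indexOperations++;
            }
        }
    }

    /** Drop by discarding the contents of each index in one step. */
    void dropWholesale() {
        for (TreeSet<String> index : indexes) {
            index.clear();
            indexOperations++;
        }
    }

    int size() {
        return indexes.get(0).size();
    }

    /** Builds a model holding n distinct statements. */
    static ModelDropSketch filled(int n) {
        ModelDropSketch model = new ModelDropSketch();
        for (int i = 0; i < n; i++) {
            model.add("s" + i, "p", "o" + i);
        }
        return model;
    }

    public static void main(String[] args) {
        ModelDropSketch slow = filled(100_000);
        slow.indexOperations = 0;
        slow.dropStatementByStatement();
        System.out.println("statement by statement: " + slow.indexOperations + " index operations");

        ModelDropSketch fast = filled(100_000);
        fast.indexOperations = 0;
        fast.dropWholesale();
        System.out.println("wholesale:              " + fast.indexOperations + " index operations");
    }
}
```

With n statements the first approach performs 3n index removals, while the second touches each index once. The toy holds a single model; a store that shares its indexes between several models could not simply call clear(), so a real fix would need more thought than this.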

The way to get these tests running quickly was to blow the database away with a clean build, and then run the tests again. This eliminated the need to drop the Wordnet data, saving about 20 minutes on my computer.

Testing went well, and the work is now checked in. I've had to restrict the tests to what the trans() queries currently return. I need to confirm with AN whether I should add in non-inferred statements (allowing tautologies to be removed by the current duplicate removal mechanisms), or whether I should return only inferred statements and remove all tautological inferred statements.

The easiest way to remove tautologies would be to implement a difference operation. I believe that SR wants to implement this with an "AND NOT" combination, so it could be worth holding off until that is available. I'll talk to AN about this next week.
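As a rough sketch of the difference idea, the Java below computes a transitive closure over a handful of statements and then subtracts the original statements from it. It assumes that "tautological" here means an inferred statement that duplicates one already asserted, which is an interpretation rather than something settled above. The class, the method names and the sample data are all hypothetical; none of this is Kowari or TKS code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: these names are illustrative and are not Kowari or TKS APIs.
final class TransDifferenceSketch {

    /** A statement for one transitive predicate, held as [subject, object]. */
    static List<String> edge(String subject, String object) {
        return Arrays.asList(subject, object);
    }

    /** Every [subject, object] pair reachable by following one or more base edges. */
    static Set<List<String>> closure(Set<List<String>> base) {
        Set<List<String>> result = new LinkedHashSet<>();
        for (List<String> start : base) {
            String subject = start.get(0);
            Deque<String> pending = new ArrayDeque<>();
            pending.push(start.get(1));
            while (!pending.isEmpty()) {
                String node = pending.pop();
                if (!result.add(edge(subject, node))) {
                    continue; // already reached from this subject
                }
                for (List<String> e : base) {
                    if (e.get(0).equals(node)) {
                        pending.push(e.get(1));
                    }
                }
            }
        }
        return result;
    }

    /** Set difference: everything in left that is not in right ("left AND NOT right"). */
    static Set<List<String>> difference(Set<List<String>> left, Set<List<String>> right) {
        Set<List<String>> out = new LinkedHashSet<>(left);
        out.removeAll(right);
        return out;
    }

    public static void main(String[] args) {
        Set<List<String>> base = new LinkedHashSet<>(Arrays.asList(
                edge("a", "b"), edge("b", "c"), edge("c", "d")));
        Set<List<String>> all = closure(base);
        System.out.println("closure:       " + all);
        System.out.println("inferred only: " + difference(all, base));
    }
}
```

For the chain a, b, c, d the closure holds six pairs, and the difference leaves only the three that were not asserted directly: a to c, a to d, and b to d. Returning the full closure instead corresponds to the other option, where ordinary duplicate removal takes care of the overlap.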

TKS Beta 2
I sat in on a meeting on the progress of the next TKS Beta release. This is so I can keep up with the other areas of the system that I'm not currently working on. I walked out of that meeting having been assigned a role helping get distributed queries into the next release. That will teach me to attend meetings indiscriminately.

My role is going to be relatively minor, and for the moment it may just be restricted to testing. I am nonetheless pleased to get the chance to work on distributed queries, as some of the original plans for it were my own.

I spent some time today learning the new Resolver SPI, as this is the basis of the forthcoming distributed query engine. I still have a way to go.

UQ and QUT
I got in touch with academics from both institutions today. Looking at their web sites, there seems to be no emphasis on the Semantic Web, but I'm hopeful that someone has enough of an interest to be willing to supervise a prospective candidate.
