Monday, May 15, 2006

Subversion


For anyone not familiar with Subversion (I include myself in this group), the 1.2 version of the code can be retrieved with:
  svn co https://mulgara.org/svn/mulgara/trunk/kowari-1.2

I just thought I'd mention this in case anyone tries to get the whole repository (which includes the entire history of the project). The entire thing comes in at 2.6 GB. The latest branch is a more modest 204 MB.

Thursday, May 11, 2006

Mulgara

The two most important elements of Mulgara.org are now running: mailing lists and Subversion.

I've discovered that Yum is not the way to install Mailman on Fedora Core 4. It isn't integrated with any MTA, which means that it needs a lot of manual configuration. After getting it working with Exim, I discovered that it wouldn't send any messages due to a problem with Sendmail. Moving from Exim to Postfix fixed the problem, but meant reconfiguring Mailman. Overall, a frustrating experience. I'm very grateful to Jesse for his help.

Jesse is the official administrator of mulgara.org, but like me he has other work to get done. Since I'm the one who's actually developing for Mulgara, he was more than happy to give me sudo access and let me install the things I needed. The only problem is that he knows what he's doing, and I don't. :-)

Services

So where are we? We have:

Still to do:
  • The web page is currently just a placeholder. We need to come up with content, and we need to move all the documentation over to it.
  • From a development point of view, we need to move all the org.kowari packages over to org.mulgara. This would be easier with the refactoring tools in Eclipse, but they won't use svn mv, which we need in order to keep each file's history. Much of the first round will be a simple search/replace, so I may just be dusting off my Perl/sed skills. If anyone has suggestions on how to do this more intelligently, please get in touch!
  • The site needs a signed certificate. Thawte and GoDaddy both have certificates for under $100, but I should check with others about what we really need. I'm guessing we only need the bare minimum.
  • We need a Wiki and bug reporting system. Herzum Software has offered to provide licensed copies of Atlassian's Jira and Confluence for use by the project. I'll have to ask Jesse to install these, as I know nothing about either of them.
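For the org.kowari to org.mulgara move in the list above, here's roughly the shape of the search/replace I have in mind, written as a Java sketch rather than Perl. It assumes each module keeps its sources in a directory laid out by package (the `src` root is a placeholder, not Kowari's real layout), and the class name `PackageRename` is made up. It doesn't touch the working copy: it only builds the `svn mv` command, so Subversion keeps the history, and rewrites package and import references in a source string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for the org.kowari -> org.mulgara rename (a sketch, not project code). */
public class PackageRename {
    static final String OLD_PKG = "org.kowari";
    static final String NEW_PKG = "org.mulgara";

    // Match "org.kowari" only as a whole name prefix, so "org.kowarix" is left alone.
    static final Pattern OLD_PKG_PATTERN =
        Pattern.compile("\\b" + Pattern.quote(OLD_PKG) + "\\b");

    /** Rewrites every org.kowari reference in one Java source file. */
    static String rewriteSource(String source) {
        return OLD_PKG_PATTERN.matcher(source)
            .replaceAll(Matcher.quoteReplacement(NEW_PKG));
    }

    /** Builds the svn mv command that moves the package directory under srcRoot. */
    static String svnMoveCommand(String srcRoot) {
        String from = srcRoot + "/" + OLD_PKG.replace('.', '/');
        String to = srcRoot + "/" + NEW_PKG.replace('.', '/');
        return "svn mv " + from + " " + to;
    }

    public static void main(String[] args) {
        System.out.println(svnMoveCommand("src"));
        System.out.println(rewriteSource(
            "package org.kowari.store;\nimport org.kowari.query.Answer;"));
    }
}
```

The word boundaries keep the pattern from touching names that merely start with org.kowari. It will still rewrite matches inside comments and string literals, which is usually what we want, but anything that loads classes by name deserves a second look.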


Feel free to write to me with any questions, advice or complaints. I'll answer, accept, and listen, respectively.

Wednesday, May 10, 2006

Work, work, work

Yeach. Long hours in the last few weeks have left me feeling wrung out. I tried to make time for family through this (not always successfully), so I didn't have time for anything else. The first thing I tried to catch up on afterwards was exercise. This blog came way down in the list of priorities.

Bugs

I discovered a couple of problems in Kowari over the last few weeks. Unfortunately I had goals to push towards, so I haven't been able to look at them yet. I'll get back to them, but in the meantime I'll write them down here for others to look at.

Full Text Searching

One thing I wanted to do (fortunately I didn't need to do it at the time) was full-text indexing and querying. This is straightforward using a Lucene model. Unfortunately I couldn't get it to work! All my queries came back with empty results.

I figured that I must have been doing something wrong, so I went to the examples in the documentation. I couldn't get any results from those either, so now I think that something really is broken. I haven't had an opportunity to run the full test suite lately (I really only have a notebook, and it's slow enough without having to run the tests in the background), but I'll have to run it soon to make sure everything is still OK. If it is, then I'll be running Kowari through a debugger to figure out why I can't get any results from Lucene models.

Normally I'd have my big Linux box doing these tests for me, but it's still in a shipping container, and I've been told that I won't see it before June. Ask me if I'm frustrated.

There is one other machine that I could use, and that's the Mulgara server. I haven't discussed Mulgara here yet, so I'll say more about it later.

Datatypes

This bug doesn't affect me at all (at the moment), but it was surprising all the same. I discovered it when scanning a model which restricted a class using an owl:cardinality of "1". The data type for owl:cardinality is xsd:nonNegativeInteger, so that's what I used. So far, so good.

Then in some test code, I inserted a series of numbers into a Sequence. These were just 1, 2 and 3, and each of them was typed as xsd:int. However, when looking at the full contents of the model I was surprised to see the following numbers in the sequence:
  "3"^^<http://www.w3.org/2001/XMLSchema#int>
  "2"^^<http://www.w3.org/2001/XMLSchema#int>
  "1"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>
So if a number is already in the string pool as an xsd:nonNegativeInteger, the string pool just hands back that existing literal instead of inserting the new xsd:int one.
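I haven't dug into the string pool code yet, so I don't know the exact mechanism. One plausible way to get exactly this symptom is a pool that looks values up by their lexical form (or numeric value) without the data type. The two toy pools here are hypothetical and are not Kowari's implementation; they just contrast keying on the value alone with keying on value plus data type.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy string pools illustrating the symptom (hypothetical; not Kowari's code). */
public class PoolDemo {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    /** A typed literal: lexical form plus data type URI. */
    record Literal(String lexical, String datatype) {
        @Override
        public String toString() {
            return "\"" + lexical + "\"^^<" + datatype + ">";
        }
    }

    /** Keys on the lexical form alone, so whichever data type arrives first wins. */
    static class LexicalKeyPool {
        private final Map<String, Literal> byLexical = new HashMap<>();

        Literal intern(Literal lit) {
            return byLexical.computeIfAbsent(lit.lexical(), k -> lit);
        }
    }

    /** Keys on lexical form and data type together (record equality compares both). */
    static class TypedKeyPool {
        private final Map<Literal, Literal> byLiteral = new HashMap<>();

        Literal intern(Literal lit) {
            return byLiteral.computeIfAbsent(lit, k -> lit);
        }
    }

    public static void main(String[] args) {
        Literal cardinality = new Literal("1", XSD + "nonNegativeInteger");
        Literal one = new Literal("1", XSD + "int");

        LexicalKeyPool lexical = new LexicalKeyPool();
        lexical.intern(cardinality);
        System.out.println(lexical.intern(one)); // the nonNegativeInteger literal comes back

        TypedKeyPool typed = new TypedKeyPool();
        typed.intern(cardinality);
        System.out.println(typed.intern(one)); // the int literal is kept as given
    }
}
```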

This will be OK for simple cases like this, but it won't work in the general case. Looking at the hierarchy of XSD data types you can see that int and nonNegativeInteger are both subtypes of integer, but they are not in the same branch. So neither can be used in the place of the other, unless an application specifically calls for an integer (their closest common ancestor).

So having the string pool automatically convert one to the other is going to be a problem one day. I've already started checking literal data types, and need to make an exception for this case. I can certainly conceive of a system which would fail a necessary consistency check because of this.

Now I'm left with the option of fixing it now, or waiting to put it into the XA2 store. I still don't know when we'll have time to get to XA2, so maybe an interim fix is the way to go.

RMI Connections

By far my biggest problem is RMI connections. I've seen posts complaining about how slow Kowari is, which I find really frustrating. Queries in Kowari are nearly instantaneous, and insertions are pretty fast as well. So why do people think it's slow? The answer lies somewhere in the Session interface.

For some reason, when the first query is performed on a Kowari database, the Session can take several seconds to establish a connection. This happens whether the system is remote or local. I have yet to profile this delay, but it all seems to be in the RMI code. I can't imagine that RMI is that bad, so I'm guessing that it's doing a DNS lookup in there, though I may be wrong. Either way, it's an unacceptable delay. We need some way to speed this up, particularly when running on a local machine. I put a lot of work into making Kowari really, really fast, and the frustration I feel at these network-layer delays nearly drives me to tears.
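Before blaming DNS, it's worth measuring it. A quick check, assuming the delay really is a name lookup (which is only my guess), is to time the local-host lookups a JVM may perform. `LookupTimer` is a throwaway name; `InetAddress.getLocalHost` and `InetAddress.getByName` are standard `java.net` calls.

```java
import java.net.InetAddress;
import java.util.concurrent.Callable;

/** Times the local name lookups a JVM may perform (a diagnostic sketch). */
public class LookupTimer {
    /** Runs the action once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Callable<?> action) {
        long start = System.nanoTime();
        try {
            action.call();
        } catch (Exception e) {
            // A failed lookup is still timed: a slow failure is as telling as a slow success.
            System.out.println("  lookup failed: " + e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Guess to check: resolving this machine's own hostname may be where the seconds go.
        System.out.println("getLocalHost: "
            + timeMillis(InetAddress::getLocalHost) + " ms");
        System.out.println("getByName(\"localhost\"): "
            + timeMillis(() -> InetAddress.getByName("localhost")) + " ms");
    }
}
```

InetAddress caches successful lookups for a while, so it's the first call in a fresh JVM that matters. If getLocalHost alone takes seconds, that would point at the machine's name resolution rather than at Kowari itself.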

Another problem is what happens after an error from the ItqlInterpreterBean. It seems that a single error (often avoidable, as they are normally caused by programming bugs) can render the Session nearly useless. There is some kind of synchronization issue after an Exception is thrown that the Session can't seem to recover from, at least for a couple of operations afterwards. I'm not sure why this is, but it bears investigating.

A lot of these problems seem to come down to RMI. I don't know what the best replacement technology would be (SOAP will also have DNS issues, etc.), but we need to look at something. Whatever we choose, we will probably need to bypass it when accessing a local data store. At the moment we do this transparently, but perhaps we should consider something a little more optimized for local connections, since they make up the majority of the use cases I've seen.
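One way to bypass the remote layer for local stores is to pick the session implementation from the server URI. This is a sketch with hypothetical names (`Session`, `LocalSession`, `RemoteSession`, `open`), not the current Kowari API. It deliberately compares the host name as a string instead of resolving it, so the local path never touches DNS.

```java
import java.net.URI;

/** Sketch of choosing an in-process session for local URIs (all names hypothetical). */
public class SessionChooser {
    interface Session {
        String describe();
    }

    /** Talks to the store directly in the same JVM: no RMI, no name lookup. */
    record LocalSession(URI server) implements Session {
        public String describe() {
            return "local " + server;
        }
    }

    /** Goes through whatever remote protocol ends up replacing RMI. */
    record RemoteSession(URI server) implements Session {
        public String describe() {
            return "remote " + server;
        }
    }

    /** Treats a missing host or an explicit loopback name as local; everything else is remote. */
    static Session open(URI server) {
        String host = server.getHost();
        boolean local = host == null
            || host.equalsIgnoreCase("localhost")
            || host.equals("127.0.0.1");
        return local ? new LocalSession(server) : new RemoteSession(server);
    }

    public static void main(String[] args) {
        System.out.println(open(URI.create("rmi://localhost/server1")).describe());
        System.out.println(open(URI.create("rmi://example.org/server1")).describe());
    }
}
```

Recognizing the machine's real hostname as local would need a lookup too, but that could happen once at startup instead of on the first query.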

Mulgara

As David Wood pointed out, we are forking the Kowari Project. The new project will be called "Mulgara". We felt this was necessary to keep developing the RDF database when there was uncertainty due to the relationship between Kowari and Northrop Grumman.

Some of the problems appear to be caused by a misunderstanding of Open Source Software by Northrop. This has made me reluctant to discuss Mulgara until now, as I didn't want further misunderstandings to develop. Fortunately, we've had legal advice from several quarters confirming our right to fork the project (since the project is licensed under the MPL).

My employer, Herzum Software, has offered to host the new project, putting up a new Linux box for this purpose (running Fedora Core 4). Unfortunately, the administrator for the machine has been overseas for the last couple of weeks, and with my recent long hours neither of us has had time to configure all of the necessary services. I hope to change that this week. In particular, I've discovered that Yum is not a very good tool for installing Mailman (it didn't even notice that there was no MTA installed!)

Some of the services are already going, such as Subversion, but I still need to organize a certificate (at the moment it just provides the "localhost" certificate that comes standard with Fedora Core).

Check back soon and we should have it all running.