Multiple Blanks

It turned out that re-mapping the blanks was quite easy. It required a new getColumnValue method in the AnswerResolution class. This tests whether the node being returned from the remote server is blank, and if it is, creates a different type of blank node instead: a new class called ForeignBlankNode. While pretty much identical to existing blank nodes, it takes the URI of the server it was returned from as a construction parameter, and it re-implements toString(). These differences allow it to be seen as distinct from another blank node with the same internal ID. The localization process also recognizes the node as new (no longer a conventional blank node) and allocates a fresh entry in the temporary string pool.
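The shape of the idea can be sketched like this. This is not the actual Mulgara class, just a minimal illustration under assumed field names: the server URI joins the node's internal ID in equality checks and in toString(), so two blank nodes with the same ID from different servers no longer collide.

```java
import java.net.URI;
import java.util.Objects;

// Sketch only: a blank node that remembers which server it came from,
// so equal internal IDs from different servers stay distinct.
class ForeignBlankNode {
  final URI serverUri;  // server the node was returned from
  final long nodeId;    // the node's internal ID on that server

  ForeignBlankNode(URI serverUri, long nodeId) {
    this.serverUri = serverUri;
    this.nodeId = nodeId;
  }

  // Re-implemented toString() keeps the node distinct from a local
  // blank node with the same internal ID.
  @Override
  public String toString() {
    return "_node" + nodeId + "@" + serverUri;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ForeignBlankNode)) return false;
    ForeignBlankNode f = (ForeignBlankNode) o;
    return nodeId == f.nodeId && serverUri.equals(f.serverUri);
  }

  @Override
  public int hashCode() {
    return Objects.hash(serverUri, nodeId);
  }
}
```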
This worked perfectly, but the interaction with the temporary string pool reminded me of bug MGR-43, since that also involves unknown elements going into a string pool. So I thought I'd have a look at what is involved in an INSERT/SELECT command.
To save myself time, I just attempted an insert from a selection coming from a different server:

select $s $p $o
where $s $p $o

Immediately I got a response from the distributed resolver factory telling me that it doesn't support write operations. So that told me that the query went to the server with the model being read, and not the model being written to.
To support write operations on a distributed resolver I have to implement the modifyModel(...) method. This takes a Statements object, which is a cursor where each position holds a subject, predicate, and object. The remote session is happy to accept a Statements object, but sending it as-is would be inappropriate here: every Cursor.next() call would go over RMI, along with every call to getObject(). This would be horrible, even for a tiny set of statements.
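To make the chattiness concrete, here is a sketch of the interfaces described above (assumed names and signatures, not the exact Mulgara API). If the Statements object itself were exported over RMI, every one of these calls would be a network round trip:

```java
// Hypothetical cursor interface: each method call would be a remote
// round trip if this object were an RMI stub.
interface Statements {
  boolean next();     // advance the cursor to the next statement
  long getSubject();  // node ID of the subject at the current position
  long getPredicate();
  long getObject();
}

class NaiveConsumer {
  // Naive consumption: four remote calls per statement if the
  // Statements object lives on the other side of the network.
  static long countStatements(Statements s) {
    long n = 0;
    while (s.next()) {   // remote call
      s.getSubject();    // remote call
      s.getPredicate();  // remote call
      s.getObject();     // remote call
      n++;
    }
    return n;
  }
}
```

Even a few thousand statements would mean tens of thousands of round trips, which is why the object can't simply be passed along unchanged.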
There are two ways I could manage this situation.
The first is to leave the distributed resolver read-only, and instead send the original INSERT/SELECT to the server containing the INTO model (the model being written to). That server would then use the read-only distributed resolver to do all the querying, and the resulting Statements would be local. This would work, but it would mean doing resolutions for every constraint across the network, even when the entire SELECT part of the query comes from a single server. The whole query could be offloaded to the other server by the optimized query resolver, but that isn't written yet, and we'll be keeping it commercial for a while anyway. I want/need something that will work for everyone.
The second approach is to make the distributed resolver writable, while making it possible to send Statements across the network efficiently. For small statement sets we'd want a simple array of statements, and for large sets we'd want something that pages the statements, thereby minimizing the calls over the network while still allowing large amounts of data to be moved.
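The paging idea can be sketched as follows. All the names here are assumptions for illustration (the real Answer/Statements RMI classes differ): one remote call fetches a whole page of triples, and the cursor then iterates locally within that page.

```java
import java.util.List;

// Illustrative value type: one statement as three node IDs.
class Triple {
  final long subject, predicate, object;
  Triple(long s, long p, long o) { subject = s; predicate = p; object = o; }
}

// Hypothetical remote interface: one network call per page, not per item.
interface RemoteStatementSource {
  // Fetch up to pageSize triples; an empty list means the cursor is exhausted.
  List<Triple> nextPage(int pageSize);
}

// Client-side cursor: next() is a local call except at page boundaries,
// where a single remote fetch pulls in the next batch.
class PagedStatements {
  private final RemoteStatementSource source;
  private final int pageSize;
  private List<Triple> page = List.of();
  private int index = 0;
  private Triple current;

  PagedStatements(RemoteStatementSource source, int pageSize) {
    this.source = source;
    this.pageSize = pageSize;
  }

  boolean next() {
    if (index >= page.size()) {
      page = source.nextPage(pageSize);  // the only network call
      index = 0;
      if (page.isEmpty()) { current = null; return false; }
    }
    current = page.get(index++);
    return true;
  }

  long getSubject()   { return current.subject; }
  long getPredicate() { return current.predicate; }
  long getObject()    { return current.object; }
}
```

With a page size in the hundreds or thousands, the per-statement RMI overhead effectively disappears, while memory use stays bounded no matter how large the statement set is.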
This is exactly the solution I implemented for Answers a few years ago. In fact, I recently revisited the Answer code due to an unexpected interaction with transactions, so I feel familiar with it. While I'm reluctant to extend our usage of RMI even further, it's not going away any time soon, and this seems to be a good place for it.
I had to put the implementation aside a couple of weeks ago, as it was getting late. It's been difficult to get back to it, as I've needed to do some documentation at work. When I tried to set up my notebook computer (which I use in the office) to work with the latest checkout from Subversion, the poor little G4 PowerBook took so long to get anything done in Eclipse that I basically wasted the day. Since then I've had to look at other areas of the project (hmmm, this is starting to sound like my RLog project) and it's only today that I'm getting back to it.