I thought that I might take some time over the weekend to do some more work on either the N3 loader or moving Answer pages over RMI. Between family, DVDs and fitness it never happened. Oh well, I guess we're supposed to clear our minds on the weekend. At least I have a new first aid certificate from St John's to show for it.
Paging All Answers
I went back to the externalization of AnswerPageImpl today. I planned to spend the whole day making all of the components of an answer externalizable. That's externalizable, as opposed to serializable, in order to avoid the metadata overhead. This is possible because there are no real parent classes to consider for any of the components of an answer.
In many cases it's possible to represent a component of an answer as a string and a datatype (literals may require their own datatype). Representing components in this way has very little overhead, and should reduce the significant time we seem to be spending on the default serialization employed by RMI. At this stage I'm thinking of using the default serialization of java.lang.String, as there seems to be a lot of work in the JVM to handle this class, and I'm assuming that RMI is probably pretty good with strings as well. Besides, if I tried to do it myself with a char array and a length, I'd probably make some erroneous assumption about Unicode for non-Latin alphabets.
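A minimal sketch of what this might look like, assuming a class that writes each answer element as a type code plus its string form (the class name and type codes are my own invention, not the real answer classes):

```java
import java.io.*;

// Hypothetical sketch: an answer element externalized as a type code and
// its string representation, leaving the strings themselves to the
// default serialization mechanism.
public class ElementData implements Externalizable {
    static final byte URI = 0, LITERAL = 1, BLANK = 2;  // illustrative codes

    byte type;
    String value;
    String datatype;  // only meaningful for literals

    public ElementData() {}  // Externalizable requires a public no-arg constructor

    public ElementData(byte type, String value, String datatype) {
        this.type = type;
        this.value = value;
        this.datatype = datatype;
    }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(type);
        out.writeObject(value);                       // default String serialization
        if (type == LITERAL) out.writeObject(datatype);
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        type = in.readByte();
        value = (String) in.readObject();
        if (type == LITERAL) datatype = (String) in.readObject();
    }
}
```

Because no class metadata is written beyond the type code, each element costs little more than its string content on the wire.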
In the first instance I pulled apart the AnswerPageImpl at a higher level. Other than a few integers and booleans, the entire dataset is a two-dimensional array of Object, with each array being of fixed length. This was easy enough to encode as a pair of integer dimensions, followed by all the elements as an unstructured list. This didn't cover the expense of serializing each of those elements, but it did take two seconds off a 73 second transfer. Given that this only addresses the small amount of data in the wrapping objects, and does not touch the 3000 elements in each page of the answer, this modest improvement shows that I'm certainly on the right track.
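The encoding described above can be sketched like this (a hypothetical class, not the actual AnswerPageImpl code): write the two dimensions, then stream the elements out flat.

```java
import java.io.*;

// Illustrative sketch: a fixed-size two-dimensional array encoded as a
// pair of integer dimensions followed by its elements in row order.
public class Page implements Externalizable {
    Object[][] rows = new Object[0][0];

    public Page() {}  // required public no-arg constructor

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(rows.length);
        out.writeInt(rows.length == 0 ? 0 : rows[0].length);
        for (Object[] row : rows)
            for (Object element : row) out.writeObject(element);
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        int height = in.readInt();
        int width = in.readInt();
        rows = new Object[height][width];
        for (int i = 0; i < height; i++)
            for (int j = 0; j < width; j++) rows[i][j] = in.readObject();
    }
}
```

This avoids serializing the array objects themselves; only the elements still go through the default mechanism, which is exactly the remaining cost described above.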
However, testing this small change ended up being more difficult than expected. Because the tests involved a transfer between two computers, I've been using TJ's computer for a lot of the work. This is partly because TJ has been with me quite a bit on Friday and today to help out and see how I've been progressing. I checked the source code for differences against CVS this morning, but I didn't realise that TJ decided to update the source code shortly afterwards. In the process he pulled down a modification that AN made to an RMI class, which resulted in a need for a new serialization ID. The consequence was a class that wasn't being rebuilt, but which needed to see the new ID, and that led to lots of RMI failures.
Since I had been making changes to the serialization of objects which are sent over RMI, I naturally assumed that the bugs were my own. I spent an inordinate period of time adding logging to my code, and seeing it all fail without a single log message. Finally TJ asked how I was going, and when I told him of my consternation he explained that he'd updated the source. After removing all compiled classes that were dependent on AN's new modifications, a rebuild got it all working again.
Well, it worked right up to the NullPointerException that I caused! It was that old gotcha that I keep forgetting: readObject() does not run the default constructor (yes, I know that's a C++ism). Hence, the ArrayList was not instantiated yet. Easy to find, easy to fix.
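The gotcha is easy to reproduce in miniature (names here are illustrative): for a Serializable class, deserialization runs neither the constructor nor the field initializers, so any field not restored from the stream must be rebuilt by hand.

```java
import java.io.*;
import java.util.*;

// Minimal reproduction of the gotcha: the transient list is not in the
// stream, and neither the constructor nor the field initializer runs
// during deserialization, so it must be recreated in readObject().
public class Holder implements Serializable {
    transient List<String> items = new ArrayList<>();

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        items = new ArrayList<>();  // without this line, items is null afterwards
    }
}
```

Forgetting that one line is exactly the sort of thing that produces a NullPointerException on the first use of the list after deserialization.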
Once it was going it was easy to see that we'd gained a little in speed, and that I was on the right track. It is frustrating when these things take so much longer than they should though.
While I was restructuring the externalization code, the writeExternal methods started getting unwieldy, so I began pulling sections out into their own methods. While showing some of these to TJ, and describing parts of the compression/decompression code that he hadn't seen, I realised that I'd missed an obvious optimization.
As I mentioned in my last post, compression was slightly faster when it was performed on an entire data buffer at once, rather than streaming on an ObjectStream. On reflection, this made sense. Correspondingly, it made sense to decompress the entire buffer at once. This needed a length prepended to the byte stream, but otherwise didn't need too many changes.
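The sending side of this scheme might look like the following sketch (the method name is my own): compress the serialized page in one hit, then write the uncompressed length ahead of the compressed bytes so the receiver knows how big a buffer to inflate into.

```java
import java.io.*;
import java.util.zip.*;

public class PageWriter {
    // Hypothetical sender: deflate the whole serialized page at once and
    // prepend its uncompressed length to the stream.
    public static void writeCompressed(ObjectOutput out, byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DeflaterOutputStream deflater = new DeflaterOutputStream(buffer);
        deflater.write(raw);
        deflater.finish();           // flush all compressed data into the buffer
        out.writeInt(raw.length);    // uncompressed size, read first on the other end
        out.writeObject(buffer.toByteArray());
    }
}
```

The receiver then reads the int before the byte array, which is what makes single-shot decompression possible.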
The original code to get an ObjectInputStream for the incoming data looked like this:

    byte[] byteArray = (byte[])in.readObject();
    InputStream byteStream = new ByteArrayInputStream(byteArray);
    ObjectInputStream data = new ObjectInputStream(new InflaterInputStream(byteStream));

My first attempt to decompress the data in one hit appeared like this:

    int uncompressedSize = in.readInt();
    byte[] byteArray = (byte[])in.readObject();
    InputStream byteStream = new ByteArrayInputStream(byteArray);
    byteArray = new byte[uncompressedSize];
    (new InflaterInputStream(byteStream)).read(byteArray, 0, byteArray.length);
    byteStream = new ByteArrayInputStream(byteArray);
    ObjectInputStream data = new ObjectInputStream(byteStream);

This code threw an exception which described an unexpected termination of the data stream. Logging what was going on showed that the read method was only returning 611 bytes, when I expected 55620. Since I knew that the buffer size of the InflaterInputStream is 512 (I'm assuming this number is in bytes, but there's nothing to confirm this), it appeared that the read method is restricted to the current buffer page. So it was easy enough to overcome by looping on the read method until all the data was read.
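The looping fix can be sketched as follows (the method name is my own): keep calling read() until the whole uncompressed buffer has been filled, since a single call may stop short at the stream's internal buffer boundary.

```java
import java.io.*;
import java.util.zip.*;

public class PageReader {
    // Hypothetical receiver: inflate a compressed byte array into a
    // buffer of known size, looping because read() may return fewer
    // bytes than requested.
    public static byte[] inflateFully(byte[] compressed, int uncompressedSize) throws IOException {
        InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed));
        byte[] result = new byte[uncompressedSize];
        int offset = 0;
        while (offset < uncompressedSize) {
            int count = in.read(result, offset, uncompressedSize - offset);
            if (count == -1) throw new EOFException("compressed data ended early");
            offset += count;
        }
        return result;
    }
}
```

Note that read() returning less than the requested length is permitted by the InputStream contract in general, so the loop is the correct idiom regardless of the inflater's buffer size.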
The result of this was to knock 14 seconds off a compressed transfer that had taken 88 seconds with the previous implementation of compression. That brought it to within a few seconds of an uncompressed transfer.
Since it had become apparent that InflaterInputStream.read() needed to be called numerous times because of the buffer size in the InflaterInputStream, it seemed reasonable to revisit the issue of buffer size.
I initially thought that using the same buffer size on both ends would be appropriate, but it quickly became apparent that this was not the case. Increasing the compression buffer size had a very small effect, and only when it was increased significantly, to about 8kB. I also tried up to 16kB, but this was just a little slower, so I stayed at 8kB. The result was about a half second improvement, which was almost lost in the noise.
The decompression buffer size had a much more significant effect. Increasing from 512 bytes to 1024 took 2 seconds off straight away. Going up to 8kB or higher started to slow it down again. In the end I settled on 4kB. This may not be exactly the perfect size, but it provided the greatest improvement for the tests which I'd tried. Different shapes of data may result in slightly different buffer sizes being the most appropriate, so I saw little point in doing a binary search to look for the "perfect" buffer size.
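For reference, both buffer sizes are set through the three-argument constructors of the zip streams. A sketch with the settled values (the class and method names are my own, but the 8kB/4kB figures are the ones from the tests above):

```java
import java.io.*;
import java.util.zip.*;

public class BufferSizes {
    static final int DEFLATE_BUFFER = 8 * 1024;  // 8kB on the compressing end
    static final int INFLATE_BUFFER = 4 * 1024;  // 4kB on the decompressing end

    // The buffer size is the third argument of both stream constructors.
    public static DeflaterOutputStream deflating(OutputStream out) {
        return new DeflaterOutputStream(out, new Deflater(), DEFLATE_BUFFER);
    }

    public static InflaterInputStream inflating(InputStream in) {
        return new InflaterInputStream(in, new Inflater(), INFLATE_BUFFER);
    }
}
```

Keeping the two constants in one place makes it easy to retune them later if different data shapes favour different sizes.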
The final result is a compressed transfer which takes exactly the same length of time on our network as an uncompressed transfer. On slower or congested networks this should offer a good alternative. Hopefully I'll get even better speed improvements tomorrow with properly externalized answer elements.