Monday, November 01, 2004

Memory String Pool
After removing the exception that was thrown when the node type resolver is created with a canWrite parameter set to true, I found I was still getting the error. This was simply because the file had not been rebuilt, so I mentioned it to ML, who has been fixing these things. He later told me that a couple of items were out of order in the Ant script. Later in the day I did an update, and a file which implemented a new method wasn't rebuilt before a calling class was compiled against it, resulting in a compilation error. ML claimed that this problem could not be fixed, which surprises me a little.

So for the moment, I'm still having to do clean rebuilds.

With the exception that I mentioned above removed, I was able to create models of the appropriate type. That just left me needing to perform queries against them. While looking at the code I realised that the ResolverSession implementations were returning sorted data from the findStringPool* methods. I had been sorting the returned data so that I could append it, but since the data coming back from the two string pools was already sorted, I could use it in an append operation without any further processing. I had been worried about the excessive creation of new HybridTuples objects, so this was pleasing to see.
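Why already-sorted input can be appended without re-sorting can be sketched with a single linear merge. This is a minimal illustration only: plain `long` arrays stand in for the Tuples returned by the findStringPool* methods, the class and method names are hypothetical, and it assumes the append behaves like a union that keeps one copy of a node found in both inputs.

```java
import java.util.Arrays;

// Hypothetical illustration: long arrays stand in for sorted Tuples of gNodes.
class SortedAppend {
    // Merges two ascending arrays into one ascending union in a single pass.
    // Assumption: a node present in both inputs should appear only once.
    static long[] append(long[] a, long[] b) {
        long[] out = new long[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else if (a[i] > b[j]) {
                out[n++] = b[j++];
            } else {
                out[n++] = a[i++]; // same node in both inputs: keep one copy
                j++;
            }
        }
        while (i < a.length) out[n++] = a[i++]; // drain whichever input remains
        while (j < b.length) out[n++] = b[j++];
        return Arrays.copyOf(out, n); // trim the slots left unused by duplicates
    }

    public static void main(String[] args) {
        long[] merged = append(new long[] {1, 4, 9}, new long[] {2, 4, 10});
        System.out.println(Arrays.toString(merged)); // [1, 2, 4, 9, 10]
    }
}
```

Because each input is consumed in order exactly once, the output is sorted by construction, so no separate sort step (or extra sorted copy) is needed.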

I had some minor debugging to get through before discovering that the in-memory string pool does not implement either of the findGNodes methods. Looking at its internals, I discovered that this string pool is implemented with hash tables, which keep no ordering of their contents, making these methods impossible. ML asked me about this at almost the same moment I discovered it for myself, so I had to find a fix that would suit us both.

Fortunately, the SPObjects which are stored in the string pool are all comparable. This means that they can be stored in a SortedSet. While this will use a little more memory, the objects are already being stored, so the only space overhead will be that of the tree structure for the set. So I've added a new index of the SPObjects which just stores them in order.
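The extra ordered index might look something like the following minimal sketch. `String` stands in for SPObject (both are `Comparable`), and the class and method names are hypothetical rather than the actual string pool code. The `TreeSet` holds references to the same objects already held by the hash maps, so the added cost is just the tree's own nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for the in-memory string pool; String plays the role of SPObject.
class OrderedStringPool {
    private final Map<Long, String> nodeToValue = new HashMap<>();
    private final Map<String, Long> valueToNode = new HashMap<>();
    // Extra index: the same value objects, kept in sorted order.
    private final SortedSet<String> ordered = new TreeSet<>();

    // Assumes each gNode is mapped to a value only once.
    void put(long gNode, String value) {
        nodeToValue.put(gNode, value);
        valueToNode.put(value, gNode);
        ordered.add(value); // stores a reference to the existing object, not a copy
    }

    void remove(long gNode) {
        String value = nodeToValue.remove(gNode);
        if (value != null) {
            valueToNode.remove(value);
            ordered.remove(value); // keep the ordered index in step with the hash maps
        }
    }

    Long findGNode(String value) {
        return valueToNode.get(value);
    }

    SortedSet<String> orderedValues() {
        return ordered;
    }

    public static void main(String[] args) {
        OrderedStringPool pool = new OrderedStringPool();
        pool.put(7, "pear");
        pool.put(3, "apple");
        pool.put(5, "mango");
        System.out.println(pool.orderedValues()); // [apple, mango, pear]
    }
}
```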

The first findGNodes method was relatively straightforward. Using the SortedSet.subSet method did a lot of the work, though since subSet always includes its lower bound and excludes its upper bound, I needed to fiddle a little with the results in order to include or exclude the first or last item according to the parameters of findGNodes. I still need to write a small wrapper class which can convert the resulting subset into a Tuples.
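The bound adjustment on top of `SortedSet.subSet` could be sketched as below. This is a hypothetical helper, not the actual findGNodes code: it assumes non-null bounds, returns a `List` rather than a Tuples, and uses `Integer` in place of SPObject.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class RangeQuery {
    // SortedSet.subSet(low, high) is half-open: it includes low and excludes high.
    // These adjustments give independent inclusive/exclusive control of each bound.
    static <T extends Comparable<? super T>> List<T> range(
            SortedSet<T> set, T low, boolean lowInclusive, T high, boolean highInclusive) {
        List<T> result = new ArrayList<>();
        int cmp = low.compareTo(high);
        if (cmp > 0) {
            return result; // subSet would throw IllegalArgumentException here
        }
        if (cmp == 0) {
            // A single-point range is non-empty only when both ends are inclusive.
            if (lowInclusive && highInclusive && set.contains(low)) result.add(low);
            return result;
        }
        for (T item : set.subSet(low, high)) {
            // Only the first element can equal low; skip it when low is excluded.
            if (!lowInclusive && item.compareTo(low) == 0) continue;
            result.add(item);
        }
        // subSet never returns high itself, so add it back when it should be included.
        if (highInclusive && set.contains(high)) result.add(high);
        return result;
    }

    public static void main(String[] args) {
        SortedSet<Integer> set = new TreeSet<>(Arrays.asList(1, 3, 5, 7, 9));
        System.out.println(range(set, 3, false, 7, true)); // [5, 7]
        System.out.println(range(set, 3, true, 7, false)); // [3, 5]
    }
}
```

On Java 6 and later, `NavigableSet.subSet(fromElement, fromInclusive, toElement, toInclusive)` expresses these bounds directly, and `TreeSet` implements it; the manual adjustment is only needed when working against the plain `SortedSet` interface.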

The second findGNodes method needs to be able to find an entire data type. The most efficient way I can think of doing this is to create a couple of SPObjects which are guaranteed to be the smallest and largest values of a data type, and use the SortedSet.subSet method again. Hopefully that won't be too hard.
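The sentinel idea can be sketched with a hypothetical value class that orders first by data type and then by value. The `typeId` code, the `lowest`/`highest` factory methods, and the use of `String` values are all assumptions for illustration; the real SPObject ordering may differ.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for an SPObject: ordered by data type first, then by value.
final class TypedValue implements Comparable<TypedValue> {
    private static final int LOWEST = -1, NORMAL = 0, HIGHEST = 1;

    final int typeId;   // hypothetical numeric code for the data type
    final String value; // null for the two sentinels
    private final int rank;

    private TypedValue(int typeId, String value, int rank) {
        this.typeId = typeId;
        this.value = value;
        this.rank = rank;
    }

    static TypedValue of(int typeId, String value) { return new TypedValue(typeId, value, NORMAL); }

    // Sentinels that sort before / after every real value of the given type.
    static TypedValue lowest(int typeId) { return new TypedValue(typeId, null, LOWEST); }
    static TypedValue highest(int typeId) { return new TypedValue(typeId, null, HIGHEST); }

    @Override
    public int compareTo(TypedValue o) {
        if (typeId != o.typeId) return Integer.compare(typeId, o.typeId);
        if (rank != o.rank) return Integer.compare(rank, o.rank); // sentinels bracket the type
        return rank == NORMAL ? value.compareTo(o.value) : 0;      // only real values carry data
    }

    @Override
    public String toString() { return typeId + ":" + value; }

    // Every stored value of one data type, using the sentinels as subSet bounds.
    static SortedSet<TypedValue> allOfType(SortedSet<TypedValue> set, int typeId) {
        return set.subSet(lowest(typeId), highest(typeId));
    }

    public static void main(String[] args) {
        SortedSet<TypedValue> set = new TreeSet<>();
        set.add(of(1, "b"));
        set.add(of(2, "a"));
        set.add(of(1, "a"));
        set.add(of(3, "z"));
        System.out.println(allOfType(set, 1)); // [1:a, 1:b]
        System.out.println(allOfType(set, 2)); // [2:a]
    }
}
```

The sentinels never need to be stored in the set; they exist only as subSet bounds. Because the upper sentinel compares greater than every real value of its type and less than every value of the next type, the half-open subSet returns exactly that type's values.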
