Thursday, May 27, 2004

Testing
Reasonably simple day, which is just as well since Luc made last night a little difficult. Anne decided to try "controlled crying". If only she'd decided to do that on a Friday night! I could have survived a Saturday without sleep much more easily than a Thursday.

Today was just a matter of getting the JXUnit tests right. I had a short day today, so the tests took up most of it. While the tests were running I spent the downtime re-reading papers on TRIEs. DM also spent some time going over the indexing and file structure found in Lucene. He commented on the advantages of using skip lists, since these take advantage of the efficient nature of sequential reads from hard drives. He now wants to try a triplestore which uses skip list indexes. I'm going to stick to hashtables for my own plans. It will be good to have differing implementations to compare.

TRIEs
As I've already mentioned, I'm planning on using a TRIE as the basis of my stringpool. Almost every description of TRIEs refers to them being an index, rather than a data store. I have a couple of ideas for using a suffix TRIE for the storage of the strings, but I'm not sure how efficient they will be. I could always use a simple linear list of strings, with a TRIE index providing offsets into the file, but that will lead to problems with deletions. On the other hand, strings don't have to be removed from the stringpool, and cleaning out the TRIE during a deletion would be quite slow. I might get a system working that does insertions only, and then consider my options from there.
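To make the insert-only option concrete, here is a minimal sketch in Python of a linear list of strings with a TRIE index over it. Everything in it is an assumption for illustration: the StringPool name, the length-prefixed record layout, the in-memory bytearray standing in for the file, and the nested dictionaries standing in for TRIE nodes. It is not the design I have settled on.

```python
import struct

_END = ""  # marker key for "a string ends here"; never collides with a single character


class StringPool:
    """Insert-only pool: strings are appended to a linear store, a trie maps them to offsets."""

    def __init__(self) -> None:
        self.data = bytearray()  # stands in for the append-only file of strings
        self.root = {}           # trie index: nested dicts keyed by character

    def add(self, s: str) -> int:
        """Return the offset of s, appending it to the store if it is new."""
        node = self.root
        for ch in s:
            node = node.setdefault(ch, {})
        if _END in node:
            return node[_END]  # already stored, reuse the existing offset
        offset = len(self.data)
        encoded = s.encode("utf-8")
        # Record layout (assumed): 4-byte big-endian length, then the UTF-8 bytes.
        self.data += struct.pack(">I", len(encoded)) + encoded
        node[_END] = offset
        return offset

    def find(self, s: str):
        """Return the offset of s, or None if it was never added."""
        node = self.root
        for ch in s:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(_END)

    def read(self, offset: int) -> str:
        """Read back the string stored at the given offset."""
        (length,) = struct.unpack_from(">I", self.data, offset)
        start = offset + 4
        return bytes(self.data[start:start + length]).decode("utf-8")
```

Because nothing is ever removed, offsets stay valid forever and the index never needs cleaning out. Deletion is exactly the part this sketch leaves out.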

One hassle with using something like a TRIE is that each node has n children, where n varies from node to node. This can be done with a fixed number c of child references, plus a "next node" reference to form a linked list when the number of children exceeds c. It's not as efficient nor as fast as a fixed size tree node, but it is more flexible.
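The node layout just described could be sketched along these lines in Python. The value C = 4, the TrieNode class and the helper functions are all hypothetical names chosen for illustration, and this is only one reading of the idea: each node holds at most C child references, and further children spill into a chain of continuation nodes reached through the "next node" reference.

```python
from typing import List, Optional, Tuple

C = 4  # hypothetical fixed number of child references per node


class TrieNode:
    """Trie node with at most C child slots and a "next node" link for overflow."""

    def __init__(self) -> None:
        self.slots: List[Tuple[str, "TrieNode"]] = []  # (character, child), at most C entries
        self.next: Optional["TrieNode"] = None         # continuation node holding more children
        self.terminal = False                          # only meaningful on the head of a chain

    def find_child(self, ch: str) -> Optional["TrieNode"]:
        """Walk the chain of continuation nodes looking for the child keyed by ch."""
        node: Optional[TrieNode] = self
        while node is not None:
            for key, child in node.slots:
                if key == ch:
                    return child
            node = node.next
        return None

    def add_child(self, ch: str) -> "TrieNode":
        """Return the child for ch, creating it in the first node of the chain with a free slot."""
        existing = self.find_child(ch)
        if existing is not None:
            return existing
        node = self
        while len(node.slots) >= C:
            if node.next is None:
                node.next = TrieNode()  # chain grows only when all C slots are full
            node = node.next
        child = TrieNode()
        node.slots.append((ch, child))
        return child


def insert(root: TrieNode, word: str) -> None:
    node = root
    for ch in word:
        node = node.add_child(ch)
    node.terminal = True


def contains(root: TrieNode, word: str) -> bool:
    node: Optional[TrieNode] = root
    for ch in word:
        node = node.find_child(ch)
        if node is None:
            return False
    return node.terminal


def chain_length(node: TrieNode) -> int:
    """Number of fixed-size nodes used to hold this node's children."""
    count = 0
    current: Optional[TrieNode] = node
    while current is not None:
        count += 1
        current = current.next
    return count
```

The cost shows up in find_child: a node with many children needs a walk along the chain, where a fixed size node could index straight into an array. In exchange, every node is the same size, which matters if the nodes end up as fixed-length records in a file.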

Study
I've been considering getting back to study, after leaving for other things two years ago. My head isn't in physics at the moment, so I'll probably steer clear of that for now. Instead I've been thinking of the Masters courses at QUT. I did a couple of subjects there some years ago, and it's close to where I live.

Now the easy thing would be their coursework Masters degree. I've seen the subjects, and it wouldn't be difficult - just time consuming, and probably a distraction from doing real learning. On the other hand, the piece of paper would be handy, and could help me skip some odious requirements if and when I continue on with physics research.

A research Masters would be harder though much more worthwhile. The great thing about that is that I could do it on the Semantic Web, hopefully connecting it to my work. It would provide the same benefits as the coursework degree, and wouldn't be distracting me from what I really want to do. But then again, it would be harder. :-)

I think that typing this has convinced me that I want to do the research work. After all, what's the point if it's easy and I don't push myself? Why do I do these things to myself?
