Friday, February 03, 2006

Drowning
The visas finally arrived, and now I can do all the last minute things I have to get done before leaving Australia for good. The power will go off at the appointed time, as will gas, phone, internet, etc. Annoyingly, my mobile phone can't be scheduled the same way: I can't cancel it until the moment I want it cut off. I'll do that from overseas, as I'd like a bit of overlap so I can continue to make calls.

More frustrating is my health insurance. The Australian government gives a small subsidy to private health insurance but this is reduced if you spend any time after turning 30 without private cover. Because of this, I can only cancel my account without future penalties if I show my travel itinerary at an office. I have the itinerary now, but I'd rather not have to travel in.

My main problem at the moment is that I'm swamped with paperwork. Flights (obviously), visas (done, thank goodness), mail redirection (sorted), insurance (travel, landlords, shipping), mortgage (refinanced because the CBA would be a nightmare to deal with from the USA - they're bad enough in Australia), shipping forms, customs declarations, car sale, and rental documents. In the midst of this, our printer has decided to start mangling paper, so all those PDF documents I've been sent to fill in aren't helping.

Most of it takes a little time, but my "Supplemental Declaration for Unaccompanied Personal and Household Effects" form has one sticking point. It requires my "Resident Alien No.", but I know nothing about this. My visa isn't forthcoming on the matter, and I have no other documents.

The visa has a 14 digit "Control Number" on it, but that seems to be a different number altogether. I found a document on the web (an application for info form at Sandia) which asks for both the Visa Control Number and the Resident Alien No. There is also an 8 digit number in red on the visa, but it's completely unlabelled, so I don't know what it might be. Google was surprisingly unhelpful in this regard. The best I've been able to do is to find the same form provided by other shipping companies, with all of them saying that the questions should be "self-explanatory". Great.

While writing some of this, my brother has come online and has started explaining some of it. I knew that I have to apply for an SSN once I get there (why not make it a part of the visa process?), but I didn't know that I also need to get a "permanent residence card". He also told me that he didn't get his "Resident Alien Number" until he'd been living there for a few months.

I'm starting to get confused about how I can actually move to this country. Is it possible?

Work
Unsurprisingly, work has slowed right down now, though I'm still trying to keep my hand in.

Someone had a problem the other day about using decision trees. They weren't scaling all that well (they scale by log(n) obviously), and he was trying to pick up some scalability from somewhere. He thought of breaking the tree into several subtrees, which might make the structure more manageable, but still requires the same number of decisions.

I started thinking about the first decision, which determines the sub-tree to go to. It occurred to me that several layers of decisions could be merged into a single "hashcode", which then finds the tree through a hash map. This tries to trade memory for time (less time for decisions, but more memory in the hash table). It's possible to go this route, but it requires careful merging of the data into the hashcode. If each of the elements going into the code requires a test of some sort, then the number of tests to find the final data will not change. It's one of those things where you really need to see the shape of the data involved to see whether or not a particular optimization will work.
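To make the idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration (the field names `lang`, `kind` and `signed`, the subtree functions, and the categories they return); it is not the code from the actual problem. It assumes the first three layers of the tree each branch on one field that can be read directly, so the three branches collapse into a single dictionary lookup.

```python
# Hypothetical sketch: the top three layers of a decision tree each branch
# on one field of a document. Rather than walking three levels, pack the
# three fields into one key and find the subtree with a single hash lookup.

def subtree_plain_english(doc):
    # The remaining (unmerged) decisions still run as ordinary tests.
    return "note" if doc["words"] < 50 else "article"

def subtree_signed_english(doc):
    return "letter"

def subtree_fallback(doc):
    # Reached for any combination the table does not cover.
    return "unknown"

# Memory is spent here: one entry per combination of the merged fields.
SUBTREES = {
    ("en", "text", False): subtree_plain_english,
    ("en", "text", True): subtree_signed_english,
}

def merged_key(doc):
    # Only a win if these values can be read without further tests.
    # If each one needs its own test, the number of tests is unchanged.
    return (doc["lang"], doc["kind"], doc["signed"])

def classify(doc):
    subtree = SUBTREES.get(merged_key(doc), subtree_fallback)
    return subtree(doc)
```

Calling `classify({"lang": "en", "kind": "text", "signed": False, "words": 20})` goes straight to `subtree_plain_english` without any of the top-level branching. The fallback is there because a dictionary has no notion of "close enough": any key it hasn't seen needs explicit handling.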

Merging data into a single "hash" is kind of like representing each of the elements of the data as separate dimensions. The hash code then leads to a point in N-dimensional (N-D) space representing the data you have so far. Using the hashcode to find the decision tree means that these trees represent the work to be done for various regions of N-D space. This then brought me back to neural networks, as this is similar to how they are modeled.

This made me realize that in some ways a hash table can act like a neural network. The main difference here is that neural networks are forgiving for unexpected data. Hashtables can only be made to work this way if they can cover all allowable points in space. If the space isn't sparse, then that means that the hashtable might as well be an array - just so long as there is a way to map co-ordinates to something.
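As a rough illustration of the dense case, the sketch below maps N-D co-ordinates onto a flat array using row-major order. The dimension sizes and the stored values are assumptions picked for the example, not anything from a real data set.

```python
# Hypothetical sketch: when every point in a small N-D space is in use,
# the "hash table" can be a plain list indexed by a computed offset.

DIMS = (2, 3, 2)  # assumed size of each dimension

def flat_index(coords, dims=DIMS):
    """Map N-D co-ordinates to a row-major offset into a flat list."""
    if len(coords) != len(dims):
        raise IndexError("wrong number of co-ordinates")
    index = 0
    for c, size in zip(coords, dims):
        if not 0 <= c < size:
            # Unlike a neural network, there is no tolerance for
            # unexpected input: a point outside the space is an error.
            raise IndexError("co-ordinate out of range")
        index = index * size + c
    return index

# One slot for every allowable point in the space (2 * 3 * 2 = 12).
table = [None] * 12
table[flat_index((1, 2, 0))] = "subtree A"
```

Here `flat_index((1, 2, 0))` works out to `(1 * 3 + 2) * 2 + 0 = 10`, so the lookup is just arithmetic followed by an array access, with no hashing or collision handling at all.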

Anyway, it has me thinking that I might have to pull out some neural network algorithms, dust them off, and have a go at applying them to some of our problems in document categorization. I haven't used them in years, so it will be a real blast of retro for me.

RDF
In the meantime, it's about time that I look at RDF again. I still plan to use Kowari (since it's under the MPL, and that can't be changed), but while NGC are making things awkward I've been considering other routes. I'm tempted to use Sesame for a while, to learn more about it if nothing else. However, I'm not sure that I can make my thesis work in this framework, so I still need to look at other options.

I've been thinking of having a go at DavidM's skiplist code, to see if I can use it as the core to a new storage layer. If that works out, then I can start building some of the other layers on top, avoiding the problems that have had to stay in Kowari for historical reasons. Many of the top layers in Kowari were developed (as open source software) in the last year, so these can be transplanted easily.

I don't pretend that this would turn into a fully fledged RDF store (at least, not without a lot of time and help), but it could be a useful exercise. It would be enough for the OWL inferencing I want to do anyway. It might also go a bit quicker than Kowari, since I've already done a lot of this stuff once before! :-)

I'd rather work with Kowari (for the time being), but at least this way I don't have to worry about NGC interfering. It also starts introducing the skip list code that Kowari has needed for some time.

I'll see if I get any time for this in the coming week. It may be hard. I can't even do much on the plane, as we will be watching two young children. Still, it doesn't hurt to keep thinking these things over.
