Thursday, January 24, 2008

More Correspondence

Going back through my emails with Andy, I realize there's still a lot that might be of interest to include here. Hopefully the parts I choose to copy/paste won't appear too disjointed.

JRDF

Andy was confused about the role of JRDF in Mulgara, as well he might be. He was trying to work with a minimal set of classes, yet he kept needing JRDF even when he wasn't using the JRDF interfaces.

You can think of JRDF as having 2 faces. First, it provides definitions for classes that represent RDF nodes - URIResource, Literal, BlankNode. Second, there's an interface for inserting and querying for RDF statements. Initially, someone decided to use JRDF as the definition for nodes (I think it was Andrew, and I think he chose to use it since he'd already written this code while he was at home, and it made sense for him to reuse it). Some time later, Andrew decided that the interfaces for manipulating and querying for statements should also be implemented, since we were already using the JRDF code. (Now this is in my blog, I'm sure Andrew will clarify this point!)

So internally, yes we use JRDF. It's mostly for the interfaces and abstract classes associated with URIResource, Literal, and BlankNode. There are also interfaces for SubjectNode, PredicateNode, and ObjectNode, which are used when moving triples into and out of Mulgara. At the lower levels, Mulgara is 100% symmetric around all 4 nodes (it used to be 3 nodes, but as most people should know now, we moved it up to 4). However, when data gets pushed through these interfaces, certain type restrictions are imposed. This is why Mulgara won't let you use a literal as a subject, or a blank node as a predicate.
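
To make that concrete, here's a minimal sketch of the idea. It isn't the actual JRDF source (the names just follow the ones above, and the real interfaces have methods I've left out), but it shows how the typing rules out a literal subject or a blank node predicate at compile time:

  // Illustrative only: simplified versions of the JRDF node interfaces.
  interface Node {}
  interface SubjectNode extends Node {}
  interface PredicateNode extends Node {}
  interface ObjectNode extends Node {}

  interface URIResource extends SubjectNode, PredicateNode, ObjectNode {}
  interface BlankNode extends SubjectNode, ObjectNode {}   // never a predicate
  interface Literal extends ObjectNode {}                  // only ever an object

  interface Triple {
    SubjectNode getSubject();
    PredicateNode getPredicate();
    ObjectNode getObject();
  }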

For this reason, you'll need the JRDF classes, even if you never use the JRDF interfaces. Yes, I know it's annoying. One of the many reasons I want to reimplement a lot of Mulgara (another big reason is that I want to use a less restrictive licence - specifically Apache).

Blank Nodes

Blank nodes can have any label an implementor chooses to use, so long as it meets certain criteria. In Mulgara, they are shown as an underscore, colon, and then a number. Andy was trying to figure out the significance of the numbers shown here, and how they get allocated.

These numbers are actually the raw graph node identifiers (a 64 bit long), or gNodes. All gNodes are allocated from the Node Pool, which is just a Free List.

... describing free lists..... Oh boy.....

To start with, any new requests for gNodes just come from an incrementing long. However, if you ever delete all the statements that use a gNode, then that gNode will be "released", meaning that it's added to the FreeList. So now, whenever you ask for a new gNode, the FreeList will try to give you any released gNodes first before it returns the incremented internal long value.

However, that's a vast simplification. If you released a gNode in the current transaction, then these will be given back to you first (until exhausted). Next, it will try to give you any nodes released in old transactions that are not part of a currently "open" result set. Once all open resources that refer to a set of gNodes have been closed, then the FreeList is able to hand them out. Finally, it uses the incrementing long.
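
If the prose above is hard to follow, here's a toy version of that allocation order. The real FreeList is an on-disk, phase-aware structure, so treat this as nothing more than an illustration:

  import java.util.ArrayDeque;
  import java.util.Deque;

  // Toy version of the gNode allocation order. The real FreeList is on disk and
  // tracks phases; this only shows where the next gNode comes from.
  class GNodeAllocatorSketch {
    private long nextGNode = 1;                                // the incrementing counter
    private final Deque<Long> freedThisTransaction = new ArrayDeque<>();
    private final Deque<Long> freedInOldTransactions = new ArrayDeque<>(); // no open results refer to these

    long allocate() {
      if (!freedThisTransaction.isEmpty()) return freedThisTransaction.pop();
      if (!freedInOldTransactions.isEmpty()) return freedInOldTransactions.pop();
      return nextGNode++;
    }

    void release(long gNode) {   // called when the last statement using gNode is deleted
      freedThisTransaction.push(gNode);
    }
  }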

All of this reflects the 32 bit thinking that the system started with. There is little need to re-use gNode values when you have a 64 bit system (if you allocate a gNode every millisecond, then it will take you half a billion years to use them all up, so I think we're safe). We need to update it, but unfortunately, there are arrays which are indexed by the gNode ID, meaning we can't just increment the long all the time. With the 32 bit approach this was OK, since the ID values were packed from the bottom. But if we move to an incrementing number for gNodes (simplifying things greatly - and speeding them up) then we will need a new on-disk structure for this array.

OK, this isn't describing Mulgara now. It's really my recent musings on making it all faster.

The Server

Andy was waiting for the server to start up, and his TQL client appeared to be getting confused with the intermediate startup state. There wasn't a lot said here, but I want to reiterate it anyway.

IMHO, the server is WAY too heavy. I'm all for the services provided... but I think they need to be provided in an external framework, and let the database be a module that gets loaded by that framework. The fact that it starts so many services really bothers me. I'd fix this, if I had time.

Mind you, I'm being a bit harsh when I say "fix". It works. It's just I believe it needs to be made of smaller parts, which are either independent, or build on one another. The current server is monolithic.

Current Work

Since I can't do much at work during my "two weeks notice", I've been asked to stay at home this week. I'm still being paid, and have to be available if I'm needed, but in reality it's just a holiday. With the visa interview next week I'm not as relaxed as I'd like, but it's been a good week. I've enjoyed spending more time with Anne and the boys, along with my mother-in-law, who left here on Tuesday. But after having a few days to clear my head, I'm trying to get back to Mulgara. Unfortunately, my new computer has not arrived yet, so I'm back to my old G4 PowerBook in the meantime. It's fine for use with VIM and even Safari, but it's choking whenever I try to do real work on it.

I've spent a couple of days trying to catch up on email, and now I'm looking at getting back to actual coding. I should be doing SPARQL (and I'm looking forward to that), but I allowed myself to get side-tracked on some performance code.

String Pools

Indy offered to do some profiling of a large load, and immediately came back to show me that we spend most of our time reading the string pool. This makes sense, as every triple that gets inserted needs to be "localized" into internal graph nodes (gNodes). This means searching for the URI or literal, and getting back the associated gNode, or creating one if it doesn't exist.
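
In pseudo-Java the localize step looks something like the following. The interface and method names here are mine, not the real string pool API (which does a lot more than this):

  // Illustrative only: the real string pool and node pool APIs are richer than this.
  interface StringPoolSketch {
    Long findGNode(Object rdfValue);        // null when the value has never been stored
    void put(Object rdfValue, long gNode);
  }
  interface NodePoolSketch {
    long newNode();                         // allocated from the FreeList described above
  }

  class LocalizerSketch {
    private final StringPoolSketch stringPool;
    private final NodePoolSketch nodePool;

    LocalizerSketch(StringPoolSketch sp, NodePoolSketch np) { stringPool = sp; nodePool = np; }

    // Map a URI or literal to its gNode, allocating one if it has never been seen.
    long localize(Object rdfValue) {
      Long existing = stringPool.findGNode(rdfValue);
      if (existing != null) return existing;
      long gNode = nodePool.newNode();
      stringPool.put(rdfValue, gNode);
      return gNode;
    }
  }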

There are two ways to index something like a string and map it to something else. You can use a hashmap, or a tree. Hashmaps are much faster with constant complexity, but have a number of problems. They take up a lot of disk space, they can be expensive if they need to expand, and they provide no ordering, making it impossible (well, not impossible, but you have to do tricky things) to get ranges of values. Trees don't suffer from any of these problems, but they have logarithmic complexity, and require a lot of seeking around the disk.

For the moment, the string pool maps URIs and literals to gNodes by storing them in a tree. It's an AVL tree, to reduce write complexity to O(1), though we don't store multiple values per node (unlike the triple indices), meaning the tree is very deep.

The code has many, many possibilities for optimization. We'll be re-architecting this soon in XA2 (there's going to be a big meeting about it in SF next month), but for the moment, we're working with what we have.

The first thing that Indy noticed was some code in Block.get(int,ByteBuffer). This was iterating its way through copying bytes from one buffer to another. This seems ludicrous, especially when the documentation to ByteBuffer.put(ByteBuffer) explicitly describes how it is faster than doing the same thing in an iterative loop. A simple fix to this apparently sped up loads by 30%! I wasn't profiling this, so I can't vouch for it, but Indy seemed certain of the results.
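
For anyone who hasn't run into this before, the difference is roughly between the two methods below. This is just an illustration, not the actual Block code:

  import java.nio.ByteBuffer;

  class CopySketch {
    // The slow pattern: moving bytes one at a time in Java code.
    static void iterativeCopy(ByteBuffer src, ByteBuffer dst) {
      while (src.hasRemaining()) {
        dst.put(src.get());
      }
    }

    // The fix: one bulk transfer, which the JavaDoc for ByteBuffer.put(ByteBuffer)
    // says can be far more efficient than the loop above.
    static void bulkCopy(ByteBuffer src, ByteBuffer dst) {
      dst.put(src);
    }
  }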

Initially I had thought that this couldn't have been code from David and myself, but I checked back in old versions of Kowari, and it's there too. All I can think of is that one of us must have been sick, and the other absent. At least it's fixed. I've also noticed a couple of other places where iterative copies seem to be happening. I'd like to fix them, but there may be no point. Instead I'll let the profiler guide me.

After thinking about it for a while, I started wondering why one buffer was being copied into another in the first place. The AVL trees in particular are memory mapped, whenever possible, and memory mapping is explicitly supposed to avoid copying between buffers. Buffer copies may seem cheap compared to disk seeks, but these are regularly traversed indices, so the majority of the work will be done in cached memory.

A little bit of inspection showed me what was going on. The first part of a URI or string (or any kind of literal) is kept in the AVL tree, while anything that overflows 72 bytes is kept in another file. The code that does comparisons loads the first part of the data into a fresh buffer, and appends the remainder if it exists, before working with it. However, much of the time the first part is all that is needed. When this is the case there is no need to concatenate 2 separate buffers, meaning that the original (hopefully memory mapped) buffer can be used directly. I fixed this in one place, but I think it appears in other areas as well. I'll have to work through this, but again I shouldn't go down any paths that the profiler doesn't deem necessary.
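
The shape of the fix is roughly this. The 72 byte figure comes from the text above, while the method and type names are invented purely for the illustration:

  import java.nio.ByteBuffer;

  class PrefixCompareSketch {
    static final int PREFIX_SIZE = 72;      // bytes of each value kept in the AVL node payload

    interface OverflowFile {                // stand-in for the file holding the long tails
      ByteBuffer readFullValue(long blockId);
    }

    // Compare a search key against a stored value, touching the overflow file only
    // when the in-tree prefix cannot decide the comparison on its own.
    static int compare(ByteBuffer key, ByteBuffer prefix, boolean hasOverflow,
                       OverflowFile overflow, long blockId) {
      ByteBuffer keyHead = key.duplicate();
      boolean keyTruncated = keyHead.remaining() > PREFIX_SIZE;
      if (keyTruncated) keyHead.limit(keyHead.position() + PREFIX_SIZE);

      int c = keyHead.compareTo(prefix.duplicate());
      if (c != 0) return c;                        // the (possibly memory-mapped) prefix decided it
      if (!hasOverflow && !keyTruncated) return 0; // both values fit entirely in the prefix

      // Only now pay for reading the full stored value.
      ByteBuffer full = hasOverflow ? overflow.readFullValue(blockId) : prefix.duplicate();
      return key.duplicate().compareTo(full);
    }
  }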

Dublin Core PURL

After making my adjustments, I tried running the tests, and was upset to see that 5 had failed. This seemed odd, since I'd worked on such fundamental code that a failure in my implementation should have prevented anything from working at all, rather than stopping only 5 tests. I had to go out this morning, so it bothered me for hours until I could check the reason.

It turned out that the problem was coming from tests which load RDF from a URL: http://purl.org/dc/elements/1.1. Purl.org is the home of persistent URLs, so if a document ever changes location, the URL for it does not need to be changed. So using this URL in a test seems appropriate, providing you can assume an internet connection while testing. But unexpectedly, the contents of this file changed just last week, which led to the problems.

Given that this is a document describing a standard for Dublin Core, and given that it has a version associated with it, I am startled to see that the contents of the file changed. Shouldn't the version number increase? While I find it bizarre, at least I found it before people started complaining about it. The fix will be in the next release.

Moving On

Now that I've addressed the initial profiling black spots, I can move on to the things I ought to be doing, namely SPARQL (being an engineer I would prefer to be squeezing every last drop of performance out of this thing, but I have to manage priorities). I have to talk to a few people about Mulgara tomorrow, but aside from that, I'll be working in the JavaCC file most of the day.... I hope.

Sunday, January 13, 2008

Mulgara Correspondence

Recently I've been in a few email discussions with Andy Seaborne about the architecture of Mulgara. He's been looking at a new Jena-Mulgara bridge, but when he's had the time it appears he's been looking into how Mulgara works. There are certainly areas where Mulgara could be a lot better (distressingly so), so we will be changing a number of things in the not-too-distant future. But in the meantime I'm more than happy to explain how things currently work. It's been a worthwhile exchange, as Andy knows what he's on about, so he's given me some good ideas. It's also nice to talk about some of the issues with indexing with someone who understands the needs, and can see the trade-offs.

Since I wrote so much detail in some of these emails, I asked Andy (just before he suggested it himself) if he'd mind me posting some of the exchange up here. One could argue that if I hadn't been writing to him then I'd have had the time to write here, but the reality is that his questions got me moving whereas the self-motivation required for blogging has failed me of late.

There will be a lack of context from the emails, but hopefully I'll be able to edit it into submission. I should also issue a warning that what I wrote presumes you have some idea of what RDF is, and that you can look at the Mulgara code.

AVL Trees

If you want to keep read/write data on disk, and have it both ordered and efficient to search at the same time, then a tree is usually the best approach. There are other things you can do, but they all involve a tradeoff. Trees are usually considered the best thing to go with.

Databases usually have B-trees of some type (there are a few types). These work well, but Mulgara instead opted to go with AVL trees, with a sorted list associated with each node. This structure is much more efficient for writing data, but less efficient for deletion. This suits us well, as RDF is often loaded in bulk, and it gets updated regularly, but bulk deletions are less frequent. I mention the complexity of this later on.

Andy asked about our AVL trees, with comments showing that he was only looking at one of their uses. I think that understanding a particular application of this structure is easier if the general structure is understood first.

AVL trees are used in two places: The triple indexes (indices), and the "StringPool" (which is really an Object pool now).

The trees themselves don't hold large amounts of data. Instead each node holds a "payload" which is specific to the thing they are being used to index. In the case of the "triples" indexes, this payload includes:
  • The number of triples in the block being referenced.
  • The smallest triple stored in the block.
  • The largest triple stored in the block.
  • The ID of the 8K block where the triples are stored (up to 256 of them).
I'm only using the word "triple" because that's what we stored once upon a time (circa 2001). In reality, we store quads. On the first pass, the fourth value was a set of security values, but this quickly became a graph ID. Unfortunately, this happened back when everyone referred to graphs as "models", so you'll see the code uses the name "model" instead of "graph" everywhere. (I'd like to change this).
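
As a rough picture of that payload (the real thing is a region inside an on-disk block rather than a Java object, and the field names here are mine):

  // Purely illustrative: the payload carried by each triple-index AVL node.
  class TripleIndexNodePayloadSketch {
    int tupleCount;                         // number of quads in the referenced block (up to 256)
    long[] lowTuple  = new long[4];         // smallest quad in the block, as gNode IDs
    long[] highTuple = new long[4];         // largest quad in the block, as gNode IDs
    long blockId;                           // the 8K block where the quads themselves live
  }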

There is also some inefficiency, as we use a lot of 64 bit values, which means that there are a lot of bits set to zero. There are plans to change the on-disk storage this year to make things much more efficient. Fortunately, the storage is completely modular, so all we need to do to use a new storage mechanism is to enter the factory classes into an XML configuration file.

The code in org.mulgara.store.statement.xa.XAStatementStoreImpl shows that there are 6 indices. These are ordered according to "columns" 0, 1, 2, and 3, with the following patterns: 0123, 1203, 2013, 3012, 3120, 3201. The numbers here are just a mapping of: Subject=0, Predicate=1, Object=2, Model=3. Of course, using this set of indices lets you find the result of any "triple pattern" (in SPARQL parlance) as a range inside the index, with the bounds of the range being found with a pair of binary searches.
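
The reason those six orderings are enough is that for any combination of bound positions in a quad pattern, at least one of them has every bound column ahead of every unbound one, so the matching quads always form a single contiguous range. A little sketch of the selection, for illustration only:

  // Sketch of picking an index for a pattern. Column mapping, as above:
  // Subject=0, Predicate=1, Object=2, Model=3.
  class IndexChooserSketch {
    static final int[][] ORDERINGS = {
      {0, 1, 2, 3}, {1, 2, 0, 3}, {2, 0, 1, 3},
      {3, 0, 1, 2}, {3, 1, 2, 0}, {3, 2, 0, 1}
    };

    // bound[i] is true when column i is fixed in the pattern.
    static int[] chooseOrdering(boolean[] bound) {
      for (int[] ordering : ORDERINGS) {
        int i = 0;
        while (i < 4 && bound[ordering[i]]) i++;                   // consume the bound prefix
        boolean ok = true;
        for (; i < 4; i++) if (bound[ordering[i]]) ok = false;     // no bound column after an unbound one
        if (ok) return ordering;
      }
      throw new AssertionError("unreachable: some ordering always matches");
    }
  }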

We use AVL trees because they are faster for writing than B-Trees. This is because they have an O(1) complexity for write operations when doing insertions. They can have O(log(n)) complexity while deleting, but since RDF is supposed to be about asserting data rather than removing it, then the extra cost is usually OK. :-)

The other important thing to know about Mulgara AVL trees is that they are stored in phases. This means we have multiple roots for the trees, with each root representing a phase. All phases are read-only, except the most recent. The moment a phase updates a node, then it does a copy-on-write for that node, and all parents (transitively) up to a node that has already been copied for the current phase, or the root (whichever comes first). In this way, there can be multiple representations of the data on disk, meaning that old read operations are always valid, no matter what write operations have occurred since then. Results of a query are therefore referencing phases, the nodes of which can be reclaimed and reused when the result is closed, or garbage collected (we log a warning if the GC cleans up a phase).
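
In miniature, the copy-on-write rule looks like this. It's purely illustrative: the real nodes live in disk blocks, and a phase carries far more state than a number:

  // Purely illustrative sketch of copy-on-write up the tree.
  class PhaseSketch {
    static class Node {
      int phase;                            // the phase this copy of the node belongs to
      Node parent;
      Node(int phase, Node parent) { this.phase = phase; this.parent = parent; }
    }

    final int currentPhase;
    PhaseSketch(int currentPhase) { this.currentPhase = currentPhase; }

    // Return a copy of 'node' that is safe to modify in the current (writable) phase.
    // Copying recurses up through the parents until it hits a node already copied
    // in this phase, or runs off the top of the tree.
    Node writable(Node node) {
      if (node.phase == currentPhase) return node;                 // already copied this phase
      Node parentCopy = (node.parent == null) ? null : writable(node.parent);
      Node copy = new Node(currentPhase, parentCopy);              // copy-on-write
      // (the real code also repoints the parent's child reference at the copy)
      return copy;
    }
  }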

Because all reads and writes are done on phases, the methods inside TripleAVLFile are of less interest than the methods in the inner class TripleAVLFile.Phase. Here you will find the find methods that select a range out of an index, based on one, two, or three fixed values.

The String Pool also uses an AVL tree (just one), though it has a very different payload. However, the whole phase mechanism is still there.

Object Pool

Andy noted that the comments for the ObjectPool class say that it is for reducing constructor overhead, but a cursory inspection revealed that it does more than that.

There's some complexity to avoid pool contention between multiple threads. Each pool contains an array of "type pools" (see the inner type called ObjectStack), indexed by a (manually) enumerated type. If you want an object with type ID 5, you go to element 5 in that array, and you get a pool for just that type. This pool is an ObjectStack, which is just an array that is managed as a stack.

Whenever a new ObjectPool is created it is chained onto a singleton ObjectPool called the SHARED_POOL. To avoid a synchronization bottleneck, each thread uses the pool that it created, but will fall back to using the "next" pool in the chain (almost always the SHARED_POOL) if it has run out of objects for some reason. Since this is only a fallback, then there shouldn't be much waiting.
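
The rough shape of it is below. The names and the (lack of) synchronization are simplified; see ObjectPool and ObjectStack for the real thing:

  import java.util.ArrayDeque;
  import java.util.Deque;

  // Simplified sketch: per-type stacks inside each pool, a per-thread pool in front,
  // and a shared pool as the fallback at the end of the chain.
  class ObjectPoolSketch {
    static final int TYPE_COUNT = 16;                  // manually enumerated type IDs
    static final ObjectPoolSketch SHARED_POOL = new ObjectPoolSketch(null);

    private final ObjectPoolSketch next;               // fallback pool in the chain
    @SuppressWarnings("unchecked")
    private final Deque<Object>[] typePools = new Deque[TYPE_COUNT];

    ObjectPoolSketch(ObjectPoolSketch next) {
      this.next = next;
      for (int i = 0; i < TYPE_COUNT; i++) typePools[i] = new ArrayDeque<>();
    }

    // Take an object of the given type, falling back along the chain.
    Object get(int typeId) {
      Object o = typePools[typeId].poll();
      if (o != null) return o;
      return (next != null) ? next.get(typeId) : null; // null: the caller constructs a new one
    }

    void put(int typeId, Object o) {
      typePools[typeId].push(o);
    }
  }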

I know that some people will cringe at the thought of doing object pooling with modern JVMs. However, when Mulgara was first written (back when it was called TKS) this sort of optimization was essential for efficient operation. With more recent JVMs, we have been advised to drop this pooling, but there have been a few reasons to hold back on making this change. First, we have tried to maintain a level of portability into new versions of the JVM (this is not always possible, but we have tried nonetheless), and this change could destroy performance on an older JVM. Second, we do some level of caching of objects while pooling them. This means that we don't always have to initialize objects when they are retrieved. Since some of this initialization comes from disk, and we aren't always comfortable relying on the buffer cache having what we need, then this may have an impact. Finally, it would take some work to remove all of the pooling we do, and recent profiles have not indicated that it is a problem for the moment. I'd hate to do all that work only to find that it did nothing for us, or worse, that it slowed things down.

32/64 Bits and Loading

Andy was curious about our rate of loading data on 32 bit or 64 bit systems, given simple data of short literals and URIs (<100 characters or so). Unfortunately, I didn't have many answers here.

In 2004 Kowari could load 250,000,000 triples of the type described in under an hour on a 64 bit Linux box (lots of RAM, and a TB of striped disks). However, it seems that something happened in 2005 that slowed this down. I don't know for certain, but I've been really disappointed to see slow loads recently. However, I don't have a 64 bit Linux box to play with at the moment, so it's hard to compare apples with apples. After the SPARQL implementation is complete, profiling the loads will be my highest priority.

64 bit systems (excluding Apples, since they don't have a 64 bit JVM) operate differently to 32 bit systems. For a start, they memory map all their files (using an array of maps, since no single map can be larger than 2GB in Java). Also, I think that the "long" native type is really an architecturally native 64 bit value, instead of 2x32 bit values like they have to be on a 32 bit system. Since we do everything with 64 bit numbers, then this helps a lot.
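
The mapping trick is along these lines, though the real code does more bookkeeping and its region size may well differ:

  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.nio.MappedByteBuffer;
  import java.nio.channels.FileChannel;

  class MappedFileSketch {
    static final long REGION_SIZE = 1L << 30;   // 1GB regions; anything up to Integer.MAX_VALUE works

    // Map an arbitrarily large file as an array of regions, since a single
    // MappedByteBuffer is limited to Integer.MAX_VALUE bytes.
    static MappedByteBuffer[] mapWholeFile(RandomAccessFile file) throws IOException {
      FileChannel channel = file.getChannel();
      long size = channel.size();
      int regions = (int) ((size + REGION_SIZE - 1) / REGION_SIZE);
      MappedByteBuffer[] maps = new MappedByteBuffer[regions];
      for (int i = 0; i < regions; i++) {
        long offset = i * REGION_SIZE;
        long length = Math.min(REGION_SIZE, size - offset);
        maps[i] = channel.map(FileChannel.MapMode.READ_WRITE, offset, length);
      }
      return maps;
    }
  }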

After writing this, Inderbir was able to run a quick profile on a load of this type. He immediately found some heavily used code in org.mulgara.store.xa.Block where someone was doing a copy from one ByteBuffer to another by iterating over characters. I cannot imagine who would have done this, since only DavidM and I have had a need to ever be in there, and we certainly would not have done this. I also note that the operation involves copying the contents of ByteBuffers, but this doesn't make a lot of sense either, since the class was built as an abstraction to avoid exactly that (whenever possible).

I haven't seen the profile, but Inderbir said that a block copy gave him an immediate improvement of about 30%. I'd also like to check the stack trace to confirm if a block copy is really needed here anyway. Thinking about it, it might be needed for copying one part of a memory-mapped file to another, but it should be avoided for files that are being accessed with read/write operations.

Embedded Servers

Andy was also intrigued at the (sparse) documentation for Embedded Mulgara Servers. While on this track, he pointed out that LocalSession has "DO NOT USE" written across it. I've seen this comment, but don't know why it's there. I should look into what LocalSession was supposed to do. In the meantime, I recommended not worrying about it - I don't.

The Session implementation needed for local (non-client-server) access is org.mulgara.resolver.DatabaseSession. It should be fine, as this is what the server is using.

When doing RMI, you use RemoteSessionWrapperSession. I didn't name these things, but the standard here is that the part before "Wrapper" is the interface being wrapped, and the thing after "Wrapper" is the interface that is being presented. So RemoteSessionWrapperSession means that it's a Session that is a wrapper around a RemoteSession. The idea is to make the Session look completely local. The reason for wrapping is to pick up the RemoteExceptions needed for RMI and convert them into local exceptions. At the server end, you're presenting a SessionWrapperRemoteSession to RMI. This is wrapping a Session to look like a RemoteSession (meaning that all the methods declare that they throw RemoteException). Obviously, from the server's perspective, the Session being wrapped here must be local. And the session that is local for the server is DatabaseSession. So to "embed" a database in your code, you use a DatabaseSession.
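
Shown in miniature, with simplified stand-ins rather than the real Mulgara interfaces, and everything collapsed down to a single method:

  import java.rmi.RemoteException;

  interface SessionSketch {                          // "Session": local, no RemoteException
    void insert(String triples) throws QueryExceptionSketch;
  }

  interface RemoteSessionSketch {                    // "RemoteSession": what RMI exports
    void insert(String triples) throws RemoteException, QueryExceptionSketch;
  }

  class QueryExceptionSketch extends Exception {
    QueryExceptionSketch(String message, Throwable cause) { super(message, cause); }
  }

  // Client side: wraps a RemoteSession so it looks like a plain Session,
  // converting RemoteExceptions into local exceptions.
  class RemoteSessionWrapperSessionSketch implements SessionSketch {
    private final RemoteSessionSketch remote;
    RemoteSessionWrapperSessionSketch(RemoteSessionSketch remote) { this.remote = remote; }
    public void insert(String triples) throws QueryExceptionSketch {
      try {
        remote.insert(triples);
      } catch (RemoteException e) {
        throw new QueryExceptionSketch("Lost contact with the server", e);
      }
    }
  }

  // Server side: wraps a local Session (e.g. a DatabaseSession) so it looks like a RemoteSession.
  class SessionWrapperRemoteSessionSketch implements RemoteSessionSketch {
    private final SessionSketch local;
    SessionWrapperRemoteSessionSketch(SessionSketch local) { this.local = local; }
    public void insert(String triples) throws RemoteException, QueryExceptionSketch {
      local.insert(triples);
    }
  }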

The way to get one of these is to create an org.mulgara.resolver.Database, and call Database.newSession(). Databases need a lot of parameters, but most of them are just configuration parameters that are handled automatically by org.mulgara.resolver.DatabaseFactory. Look in this factory for the method:
 public static Database newDatabase(URI uri, File directory, MulgaraConfig config);
A MulgaraConfig is created using the URL of an XML configuration file. By default, we use the one found in conf/mulgara-x-config.xml, which is loaded into the jar:
 URL configUrl = ClassLoader.getSystemResource("conf/mulgara-x-config.xml");
 MulgaraConfig config = MulgaraConfig.unmarshal(new InputStreamReader(configUrl.openStream()));
 config.validate();
(configUrl has a default of: jar:file:/path/to/jar/file/mulgara-1.1.1.jar!/conf/mulgara-x-config.xml)
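
Putting those pieces together, an embedded setup looks something like the following. I've assembled this from the classes named above rather than from a compiled example, so treat it as a sketch: in particular, the import packages for MulgaraConfig and Session, and the close() calls at the end, are my assumptions:

  import java.io.File;
  import java.io.InputStreamReader;
  import java.net.URI;
  import java.net.URL;

  import org.mulgara.config.MulgaraConfig;
  import org.mulgara.resolver.Database;
  import org.mulgara.resolver.DatabaseFactory;
  import org.mulgara.server.Session;

  public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
      // Load the default configuration from the jar, as described above.
      URL configUrl = ClassLoader.getSystemResource("conf/mulgara-x-config.xml");
      MulgaraConfig config = MulgaraConfig.unmarshal(new InputStreamReader(configUrl.openStream()));
      config.validate();

      URI serverUri = URI.create("rmi://localhost/server1");   // just a name identifying the server
      File directory = new File("/tmp/mulgara-data");          // where the store files will live

      Database database = DatabaseFactory.newDatabase(serverUri, directory, config);
      Session session = database.newSession();                 // a local DatabaseSession
      try {
        // ... issue queries and insertions through the Session ...
      } finally {
        session.close();
        database.close();
      }
    }
  }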

As an aside, it's supposed to be possible to do all of this by creating an EmbeddedMulgaraServer with a ServerMBean parameter that isn't doing RMI. Unfortunately, there are no such ServerMBeans available. (Maybe I should write one?)

Also, I believe that the purpose of the embedded-dist Ant target is to create a Jar that has these classes along with all the supporting code, but without anything related to RMI. So the embedded Jar should be all you need for this, but I haven't used it myself, so I'm just making an educated guess. :-)

SPARQL

Since I had been working on this up until mid-December, it is worth noting where I am with it.

TQL includes graph names as a part of the query, and graph names have been URLs (not URIs) - meaning that they include information on how to find the server containing the graph. Unfortunately, the guys who wrote TQL integrated the session management into the query parsing (if that doesn't make you shake your head in disbelief then you're a more forgiving person than I am). I've successfully decoupled this, and now return an AST that a session manager can work with. This also means that graph names no longer have to describe the location of a server, meaning we can now support arbitrary URIs as graph names. This now puts the burden on the session manager to find a server, but that's easy enough to set up with configuration, a registry, or scanning the AST for graph names if we want backward compatibility.

The next part has been parsing SPARQL. (Something that Andy should be intimately familiar with, given that his name is all over the documents I reference).

With so many people talking about extensions to SPARQL, and after discussing this with a few other people, we decided to go with an LALR parser. This means I've had to write my own lexer/parser definition, instead of going with one of the available definitions, like the JavaCC definition written by Andy. We do have SableCC already in Mulgara, but everyone present agrees that this is a BAD LALR parser so I had to use something new. I chose Beaver/JFlex. It's going well, but I still have a lot of classes to write for the CST. The time taken to do this has me wondering if everyone is being a little too particular about the flexibility of an LALR solution, and maybe I should just go back to Andy's JavaCC definition. OTOH, I really like Beaver/JFlex and having an independent module that can do SPARQL using this parser may be a good thing.

Fortunately the SPARQL spec now has a pretty good grammar specification and terminals, though one or two elements seemed redundant, and I've jumped over them (such as PNAME_LN; instead I defined IRIref ::= IRI_REF | PNAME_NS PN_LOCAL?). I've been getting some simple CSTs out of it so far, but have a way to go yet.

Once I have it all parsing, of course, I have to transform the result into an AST. Fortunately, most of the SPARQL AST is compatible with Mulgara. The only exceptions are the FILTER operator (SPARQL's "constraints") and the OPTIONAL operator. I'm pretty sure I can handle OPTIONAL as a cross between a disjunction (which can leave unbound variables) and conjunctions (which matches variables between the left and the right). Filters should be easy, since all our resolutions are performed through nested, lazy evaluation of conjunctions and disjunctions. Handling the syntax of filters is another matter, but I expect it to be more time consuming than difficult.

Holdup

Since writing these comments about implementing SPARQL, I haven't had time to work on it again. Hopefully that will change soon with the new job. But in the meantime, the loss of time has me thinking that I should reconsider using a pre-built SPARQL definition for a less expressive parser, and come back to Beaver/JFlex at a later date.

I've heard that the Jena JavaCC grammar may be a little heavily geared towards Jena, but I've been given another definition by my friend Peter which is more general and apparently passes all the relevant tests. I suppose I should go and learn JavaCC now.

Saturday, January 12, 2008

Work

Between the realities of working hours, young children, the need for exercise, etc, I just don't make the time to blog that I used to. But the main reason I rarely blog now is because of my job. It can be hard to know what you can talk about when you work in a closed source world. However, there have been a few changes here lately.

In the latter part of last year, I was asked to write a SPARQL implementation for Mulgara. Two and a half years ago I was told I'd get to do a reasonable amount of Mulgara work, but when it came down to it I could only write for Mulgara in my evenings and weekends. I know that most open source developers are limited like this, but it's still not easy when you have small children. It was also frustrating, given that I had different expectations.

So I was pleased to be given this new task. SPARQL is sorely needed, and I was more than happy to do it during working hours. Since I was back to open source work, I could have blogged more, but I was trying to use all my spare moments on the computer to get ahead with the project. That isn't always as productive as it appears, as the process of blogging can really help with programming, but it can work for a short term push.

Then just before Christmas, a number of people left the company I work for, including the guy who authorized me to work on Mulgara. So I was asked to stop, while everything was worked out. With all of the Semantic Web staff leaving except for me, the company can't really continue in this area. The word I was getting over the break was that the owner of the company didn't "know what to do" with me. I can speculate on what might happen as a result, but I won't do that here. It certainly wouldn't involve Mulgara work.

Fedora Commons

There is a lot I want to accomplish on Mulgara in the near term, as I think an open source framework with the capabilities we are aiming for will enable a number of significant developments. If I can participate in getting Mulgara there, then perhaps I can play a part in what happens later. To this end, I have accepted a position with Fedora Commons to work on Mulgara full time.

Fedora have been building their code on top of Mulgara (and Kowari before that) for some years. They work with the Topaz Project and together these groups have provided technical infrastructure for the Public Library of Science (PLoS) (see the PLoS-ONE open access journal for an example of a deployment that uses Topaz and Fedora, along with Mulgara). I'm just starting to get a feel for the various relationships, so I'll leave the description there.

The important thing from my perspective is that both Topaz and Fedora Commons have a charter that supports the use and deployment of open source software. Also, PLoS is about making research material available to the entire community, enabling research to reach everyone who should see it. Despite commercial interests to the contrary, there are many people who think this needs to happen (a good interview on this is here), as even the US government has made moves in this direction. So this work not only fits in with my own goals, it also helps enable something I really believe in.

Role

In the course of negotiating this position with Fedora Commons, I realized that an exact statement of my roles and responsibilities had not been made. So I thought about exactly what I'd like to do, and proposed that. While I knew that our goals were aligned, I was still pleasantly surprised to have them come back and agree with me completely. I'm pretty happy about this... I've never been in a position to name exactly what I wanted to do before. :-)

So my work will basically come down to 3 things:
  • Mulgara development.
  • Consulting with Topaz and Fedora Commons on architecture and design.
  • Supporting and growing the Mulgara community.
Of course, all of these are to be done in alignment with Fedora Commons priorities, but this has already been the case for some time with my after hours work (Fedora Commons and Topaz are the heaviest users of Mulgara at the moment). The second point I put in because I am always happy to do this whenever asked, and I think it is important to keep my hand in when it comes to the bigger picture. And the last point? Well, that's what makes this new position so cool. :-)

Visa

Of course, I need to get a new visa for the new position. Since visas are only issued expeditiously by a US consulate, then I need to travel to Canada in order to get it (it takes 6 months if I don't want to travel). It ought to be easy, but if for some reason the application gets denied, then I'm not even allowed back in the USA in order to "settle my affairs". Of course, that would be a nightmare for Anne, having to pack up our house while looking after two children under 4. I understand the risk is small, but with such dire consequences I'm feeling a little nervous. I don't think I'll feel really happy about the new job until I have the visa paperwork that guarantees it for me.

Notice

So I've handed in my two weeks notice. I made sure I had met all my commitments before resigning, and my current boss is traveling overseas for 3 weeks starting tomorrow, so I have no idea what I'll be doing for the next 14 days. If I can, I'll start on SPARQL again. After all, the company I'm leaving is still using Mulgara in some of its projects, so they'll still benefit from the work. I guess I'll find out for sure on Friday.