Sunday, March 09, 2008

Review

You know you've been lax keeping up with your blog when your mother comments that you haven't updated it in a while.

Part of the reason for my silence has been a lot of changes going on for me lately, some of which I was obliged to keep quiet about at the time. More recently, I've been working hard on Mulgara, and when it's come to a choice between coding or blogging, coding has had the higher imperative. But today I find myself in SFO feeling too wrung out to code, so it seems like a good opportunity to play some catch-up on my blog.

Talis

Way back in the middle of 2007 I was contacted by Talis, who were wondering if I would be interested in working with them on semantic web systems, and possibly on Mulgara. My job at the time (with Herzum Software and the spin-off fourthcodex) was supposed to be based on Semantic Web technology, with a sizable proportion devoted to Mulgara. However, this had not happened in the 2 years I had been there, and so I was willing to consider this proposal. Also, I was getting great enjoyment and occasional inspiration from Paul Miller's Talking with Talis interviews (and even gaining an interest in libraries, courtesy of Richard Wallis's productions). I'd also met Ian Davis at SemTech earlier in the year, and had noted with interest that Danny Ayers had recently made the move as well.

So in August I took a few days from work and flew to England for an interview. I was really impressed with the guys in Birmingham, both technically and personally, and had a great time. While my understanding of the details has changed at various times, it seems that Talis have an approach of investing in Semantic Web technology without a requirement of immediate return. They are also providing support to a growing Semantic Web community with the expectation that this will lead to a data infrastructure on which they can layer semantic applications at a higher level than is possible today. To me this seems to be both very forward-thinking and of mutual benefit to themselves and the community at large. As an Australian I also found that the similarities in culture with the British gave me a level of comfort beyond what I usually have here in America.

Whether I would be working in semantics, or in the storage layer to enable semantic work by others, this really seemed like a place I'd enjoy working. However, the position would be telecommuting, and I need a visa sponsor while I live here in the USA. Talis were aware of this, and though they said they were in the process of setting up a legal entity over here, the delays this brought about have led to events overtaking this opportunity.

That said, I'm still trying to keep channels open with everyone there, and I'm hoping that I'll be able to work with them in the future, in whatever capacity that may be.

Google

Shortly before the trip to England, I found myself thinking of distributing immutable tree nodes (from Mulgara's internal storage) over a cluster, with the idea of improving the scalability of RDF storage in both speed and size. These thoughts led to ideas of leveraging a system like GFS or BigTable. Hadoop is also interesting in this regard, but not as advanced or scalable as the systems at Google. With this in mind, and being particularly frustrated at work, I checked out the Google jobs page, and discovered that they had engineering positions available in Chicago. So I filled in their online forms and sent them off. Disappointingly, the next day I received a form-reply email explaining that I wasn't what they were after.

A few weeks later I met Eric Olson at Tech Cocktail. Eric was still working at Google at the time, and said that he'd mention my name. I have no idea if he did or not, but a couple of weeks later, a Google recruiter in California rang me and asked if I would be available for a phone interview. This was delayed while I went to England, and then delayed further as that recruiter left and another took on my case, but it finally happened in September. It was very strange to do an interview again, when I've conducted so many in the last couple of years. I've also managed to avoid the "normal" interview process for most of the last decade, since I have usually been interviewed or offered positions by people who already knew me, either personally or by reputation.

All the same, this interview went well, as did the next phone interview. So Google organized tickets for me to fly out to Mountain View and interview on site. I hadn't seriously considered a job with them to this point, but I thought it would be interesting to follow the process through.

Visiting the Mountain View campus was quite an experience. It is vast, and has been gradually subsuming the surrounding business district in recent years. Getting around is often done by shuttle bus or bicycle. People bring their own bikes, but there are also a number of Google bikes parked around the place, with helmets available in large bins in the lobby of each building. Not having been given a building number to go to, I started at the central building, where I was quickly spotted and assisted by a security guard. Indeed, I was very impressed at the rapid and efficient response of on-campus security, especially as they were also very helpful and courteous.

The receptionist I was directed to was also helpful, showing where I needed to go, arranging a shuttle bus, providing a visitor's badge and directions, and a fruit juice (Google have large fridges full of Naked juice in every lobby I saw. They also have more exotic flavors available than I have seen anywhere before or since).

Passing the truck that had come to provide cheap haircuts to staff, I proceeded past a central courtyard which had a full-sized Tyrannosaurus Rex skeleton (with a pink flamingo in its mouth; several of its cousins were scattered across the lawn) and a large sign proclaiming that there would be a Farmers' Market there at 11am that day.

One bus trip later, I was where I needed to be, and being given a tour of the building. The variety of free coffee and other beverages was really impressive, as was the local version of Google's famous cafeterias. But the thing that really got me was seeing a projected list of Google's text searches scrolling up the wall. These are not done in real time (they would go by too fast) and have been filtered for inappropriate content (no searches for pornography, for instance), but they still served to drive home exactly where you were. This was ground zero. Those searches were resolved here.

The queries were also interesting to watch go by. There were questions on movies, Britney Spears, medical conditions, landmarks, and many questions in foreign languages, some of which were in foreign character sets, like Simplified Chinese. Watching these going by, it is immediately apparent where ideas like Google Zeitgeist came from.

I then went on to have my interviews. There were about 4 of them, with a break for lunch which I had with one of the people I'd had a phone interview with. While a few of the questions were more general, most of them were about how I'd solve programming problems, with an emphasis on doing things to a "Google level of scaling". Funnily enough, my last few years of Mulgara work were perfect for this. On a couple of occasions I even found myself describing code I had written, rather than describing an abstract answer. I also got the chance to ask more about how Google works, and what it's like to be there. I was impressed by everyone's enthusiasm for their work, and for the company culture in general. A couple of people I spoke with also had children, and while they admitted that in the past Google had not been very good at supporting people with young children, in recent years this had improved significantly. But the thing that everyone talked about the most was the "perks". These extend into areas you couldn't imagine, and they are constantly evolving. Unlike most companies, which occasionally institute a perk for their staff, possibly guided by a suggestion box, Google has a department whose sole mission is to identify and implement perks.

Finally the day came to an end, and I was able to head up to San Francisco. I had a very enjoyable evening with Peter and Trish, and the next day spent several hours having Mulgara discussions with Amit and Ronald at Topaz. I was very pleased to get in this last meeting, and had shuffled things around with Google to make sure it could happen.

As most of my friends know, a few weeks later Google made me an offer. While the base salary was simple enough, I was bemused at the complexity of the arrangements for paying bonuses, stock options, and common stock. It is the first job offer I've ever had that came with a set of equations attached. While not going into details, I will say that it was very lucrative - if you came close to meeting your goals. I hadn't really considered accepting an offer until this point, but an offer like that would make anyone seriously reconsider. Consequently I agonized over this for a couple of weeks, right up to the deadline that Google set. In the meantime, I visited the Chicago site (where I insisted I would want to work, despite being asked several times if I'd move to Mountain View), and again was impressed with their setup. In fact, I've had a few people suggest that the setup at Mountain View is getting a little out of control in some ways, but this was not an issue for Chicago at all.

I finally decided to turn Google down, and let them know as soon as I got back from Thanksgiving. I'd had advice from a few people, including some from inside Google, who all pointed out that my work in the Semantic Web would be totally subsumed by working at Google. I had thought to do something with the "20% projects" that Google is known for, but it was pointed out that because bonuses are based on meeting (and exceeding) goals, the option to use 20% of your time on something not related to your immediate work was often forgone. You also have to wonder how much of your bonuses, options, and common stock you'd get to see if you tried to keep a balanced lifestyle and didn't achieve your annual goals (apparently these are supposed to be set at a level that is challenging to achieve).

Another serious consideration was one I hadn't expected. Despite having signed an NDA, I learned nothing about Google that isn't already known to the public. Consequently, to an outsider it looked like the company was not doing anything really "interesting". I'm sure they are, but there was nothing inspiring about what they had to tell me. For most of the things I considered to be "cool" technology, I was told that those things were pretty much done, and the work they now do is in different areas altogether. In fact, the majority of the people I spoke to worked in AdWords and Billing. They were very enthusiastic about their work, and given the novelty of their service and the scale they have to work at, I'm sure it's challenging and interesting work, but it didn't inspire me at all.

Most of all, I've spent my career working with people who know a lot more than I do, to my enjoyment and benefit, and yet no one I spent time with really impressed me with their knowledge or skills. Don't get me wrong: they were all quite competent and intelligent people. But I really expect something special out of the people I work with, if they are to bring out the best in me. Now I know that Google has employed some of the brightest people in the industry, but the sheer size of the company convinced me that I'm unlikely to find myself working with those people.

For those not paying attention, these last few paragraphs are all a means of justifying to myself that I made the right choice. It wasn't an easy choice to make, since Google does seem like a cool company, the perks were huge, and the remuneration was potentially substantial. But I'm pretty sure I did the right thing, and as one friend said, he thinks it is much cooler to say that you've turned down a Google offer than to have accepted one. :-)

Fedora Commons

Coming up to Christmas, I was finally getting a chance to do some Mulgara work during office hours. This was a huge thing for me, as I had been getting more and more frustrated about it for the previous two years when I was supposed to be doing this. Then in the final days before Christmas my boss, and several others I worked with in fourthcodex, decided that they wanted to do something different in semantic technologies, and resigned. Without a team to work with, there wasn't a lot of scope for me to do semantic work any more, and I was told to stop working on Mulgara again. Sigh.

While some semantic options were being pursued, the fact remained that Herzum Software desperately needed some more senior coders, and it looked very much like I would end up on projects that were of little interest to me. A notable one here was a .Net project that would have me working on site in Pittsburgh. This was something that nobody wanted, including my family, and everyone I was working with on Mulgara.

Talis tried to help at this point (and I'm very grateful that they did), but their interim solution would have made it illegal for Anne to keep her new business running, and I couldn't do that to her. But then, Topaz and Fedora Commons came back to me with an offer to work for them (while they are distinct organizations, there is an administrative relationship between them, and both are contributing to the Public Library of Science). I've already written about my decision to accept this, which brings me up to today.

I've officially been working for Fedora Commons for about a month now. I've been dividing my time between the SPARQL implementation and responding to support and debugging requests. However, this week has been different. We got all the developers from Topaz and Fedora Commons together, to discuss our plans for the year, and how to manage the process. Mulgara has also been generating some more external interest again, and since we form the core of the active developers, we wanted to discuss ways in which we can work with the community, particularly developers.

Features

The most important features we are implementing in the coming year are SPARQL, multiple concurrent writers, and significantly greater scalability. We have been talking about the last one for a long time, but no one has had the time (or money) to do anything about it. This has now changed, and the work is commencing very soon. It's been a long time coming, so I'm quite inspired to get it done.

Andrae was present for the meeting, and presented some very impressive results from his research on transactionality for multiple writers on an RDF graph. Not only has he demonstrated a mathematically sound foundation for this work, but he has also included an impressive level of engineering for scalability in his designs.

In the meantime, I have come up with a new scheme for indexing RDF, which appears to have significantly better complexity results than what we currently do. Fortunately, the majority of this work is orthogonal to Andrae's designs, with the consequence that the improvement to scalability will be cumulative between both redesigns. I'm pretty chuffed at this. :-) I will be writing more on the indexing shortly, but I have been under some pressure to write this up as an academic paper as well, so that may take priority over my blog.

Significantly, we had James Leigh from Aduna at the meeting as well. Aduna are the company behind the Sesame RDF store, which has been one of the big open source alternatives to Mulgara. They are interested in merging our systems to a certain extent, to the benefit of both. After hearing James out, it sounds like a really good idea (though I may end up throwing away the SPARQL parsing that I've finished; sigh again). I'm not sure when it will happen, as everyone has a lot of immediate priorities to get through, but everyone has expressed support for implementing the SAIL API on Mulgara. This is very significant for us, as it will provide a host of new reasoning features, the ability for existing Sesame users to easily try Mulgara, and a SPARQL protocol interface (so far I'd only been working on the query language). In turn, I'm hoping that we can demonstrate these new levels of scalability and concurrency for Sesame.

A lot more came out of the meeting, but that was the crux of it. Rather than pre-empt some of the things that are still in motion, I'll let others explain their end of things.

I'm very happy to see this level of interest in Mulgara, and I'm excited to see all these new features starting to be realized at last.
