Thursday, June 07, 2007

Nova

I only heard about Radar Networks for the first time last year when conversing with Peter Royal. Being in "stealth mode" (I dislike this form of corporate speak, but it avoids pleonasms) there wasn't a lot to learn about them, except that they're working with various technologies tied up with the Semantic Web.

Nova Spivack is a name I only started to hear in more recent times, though I'm sure that if I'd been in the Valley (or even the USA), I'd have heard of him by the mid 90's. It was only during SemTech that I got to hear him speak for the first time, but since then I've started going back through his blog, and just now heard his interview with Paul Miller at Talis. If you want to know more about Nova, then I recommend the interview, just for the background list of projects he's been involved in during his career.

The fact that Nova had been at Thinking Machines caught my attention, for no other reason than the presence of Richard Feynman. I have to digress here, because Feynman is one of my greatest luminaries. The entire reason I studied physics (and would like to get back to it one day) can be traced back to Einstein and Feynman (with a little nudge from Shor). Danny Hillis's essay on Feynman has stayed with me for years. I had already read a lot about Feynman, but this essay gave him a characterization you don't usually find elsewhere. It shows him as a person who had his flaws, but someone who thought no more of himself, nor less of others. The two things that really stuck with me from Hillis's essay were Feynman's solution of a digital problem using PDEs (no wonder the man was my hero!), and a comment made about him by a recipient of his occasionally sexist behavior, when she said, "On the other hand, he is the only one who ever explained quantum mechanics to me as if I could understand it."

Thinking Machines did some really cool things, and even without Feynman (Sr) the company is worth mentioning purely on its own merit. So Nova's presence there impressed me.

But the real reason that I'm following what Nova says at the moment is because he makes so much sense. Which is to say, I agree with much of what he has to say. Then there are times, such as during the Talis interview, when I want to yell at my iPod. However, I was on a bus at the time, so I refrained, for fear of appearing to be a nutter.

All the same, anyone eliciting this sort of reaction from me is worthy of a comment.

Mulgara Optimizations

The reason I wanted to yell through my iPod came in the final 7 minutes of Nova's Talis interview.

Nova described Kowari as the "low end" version of TKS (Tucana Knowledge Server), which wasn't actually true. It was TKS, only it missed a couple of elements, such as the JAAS classes (security) and the distributed querying module. Nova explained that Radar Networks requires massive "federation" capabilities, which Kowari did not have, but then TKS didn't really have them either.

"Federation" in TKS was really just distributed querying. This meant you could perform joins and unions between data from different servers. It's a useful capability, and no one else was supporting that sort of thing at the time, so this was a reasonably significant feature. After all, a single query could pull in data from an arbitrary number of servers, and form connections efficiently across that data. But I wouldn't describe TKS as having "massive" federation capabilities.

The main problem with this arrangement was that all remote data would come back into the first server described in the query and get joined there. There was no attempt to optimize network traffic. I've already explained this issue when I wrote about the new Mulgara distributed resolver a short while ago. Then, as now, I knew how to solve the problem in the main, along with some heuristics for iterative optimizations once the main work has been done. Then, as now, I just didn't have the time to implement these optimizations.
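To make the traffic problem concrete, here is a minimal sketch of that "ship everything to the first server" strategy. It is not Mulgara or TKS code: the `Server` class, the `naive_distributed_join` function and the sample triples are all hypothetical, and the "network" is just a counter of rows that would have crossed the wire.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
# A pattern is a triple where None acts as a wildcard (hypothetical convention).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class Server:
    """Hypothetical stand-in for one RDF server holding some triples."""

    def __init__(self, name: str, triples: List[Triple]):
        self.name = name
        self.triples = list(triples)

    def match(self, pattern: Pattern) -> List[Triple]:
        """Return every local triple that matches the pattern."""
        return [t for t in self.triples
                if all(p is None or p == v for p, v in zip(pattern, t))]


def naive_distributed_join(servers: List[Server], left: Pattern, right: Pattern):
    """Join `left` and `right` where left's object equals right's subject.

    Every server sends ALL of its matches for both patterns to the first
    server in the list, which performs the join locally. Returns the joined
    rows and the number of rows shipped from remote servers.
    """
    coordinator = servers[0]
    shipped = 0
    left_rows: List[Triple] = []
    right_rows: List[Triple] = []
    for server in servers:
        l_matches = server.match(left)
        r_matches = server.match(right)
        if server is not coordinator:
            # Rows from remote servers cross the network whether or not
            # they end up contributing to the join result.
            shipped += len(l_matches) + len(r_matches)
        left_rows.extend(l_matches)
        right_rows.extend(r_matches)

    # Simple hash join on the coordinator.
    by_subject = {}
    for row in right_rows:
        by_subject.setdefault(row[0], []).append(row)
    joined = [(l, r) for l in left_rows for r in by_subject.get(l[2], [])]
    return joined, shipped


local = Server("local", [("alice", "knows", "bob")])
remote = Server("remote", [
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
    ("dave", "worksAt", "globex"),
])
joined, shipped = naive_distributed_join(
    [local, remote], (None, "knows", None), (None, "worksAt", None))
```

In this toy run three rows are shipped from the remote server but only one of them appears in the result. An optimized planner would try to avoid the other two, for example by sending the small local binding set to the remote server instead.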

All the same, if TKS was acceptable because of its distributed queries, then Mulgara should be too, since I've implemented it again. Only this time I've done a better job of it, and I've included more features!

Nova also mentioned that he's aware of Mulgara, and the fact that we're looking at significantly greater scalability through the implementation of the new XA2 storage layer. He sounded interested in that.... only there's so much more that he doesn't know about, which I wish I could have told him. I've been talking with David, Andrae and Amit about a new set of ideas which will take the scalability several steps further along. Unfortunately, I just haven't had the time to document it for Mulgara's wiki or in this blog. Incidentally, all the current docs are in Google Docs (which I write about below).

Essentially, there are 4 types of improvements:
  • Optimizations on the existing architecture. Small, quick hits with incremental improvements, but we lose the benefits when we go to XA2.
  • Rearrangement of the indexing to take advantage of parallelism in hardware. It was never done in the first place because we had less capable hardware in 2001. These changes are small, quick hits that apply to XA2 as well.
  • Restructuring of our systems to move into clustering. There are a remarkable number of opportunities for us here to take advantage of clustering. In particular, the phase-tree architecture lends itself to some interesting possibilities. A lot of this can also be applied to XA2.
  • A whole new design, architected on a colored phase-tree graph and built on top of a "Google File System" style of storage layer.
The first and second items I hope to do in the coming weeks. The third is a big deal, and would need help, since there's just no way I can do this after the kids are in bed, and before I need sleep myself (my only time to work at the moment).

The final one was an "Oh sh**" moment which has had me thinking about it constantly since I first worked it out. It's based on the fact that the phase-tree architecture gives us a mapping of 64-bit tree-node identifiers to immutable blocks of RDF statements, and the GFS lets you create mappings like this that scale out as you add commodity hardware. Believe me, we can make this sucker scale. :-)
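As a very rough illustration of why that mapping is attractive, here is a toy sketch. It is not the actual design, which still needs to be written up. The `ShardedBlockStore` class and its methods are hypothetical, in-memory dictionaries stand in for the GFS-style storage layer, and sharding by simple modulo is an assumption made only to keep the example short.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
MASK64 = (1 << 64) - 1  # node identifiers are treated as 64-bit values


class ShardedBlockStore:
    """Toy map from 64-bit node IDs to immutable blocks of statements.

    Each dict in `shards` stands in for one commodity storage machine.
    """

    def __init__(self, num_shards: int):
        self.shards: List[Dict[int, Tuple[Triple, ...]]] = [
            {} for _ in range(num_shards)
        ]
        self.next_id = 0

    def _shard_for(self, node_id: int) -> Dict[int, Tuple[Triple, ...]]:
        # Assumption: plain modulo placement; a real system would use
        # something that tolerates adding and removing machines.
        return self.shards[node_id % len(self.shards)]

    def put(self, statements: Iterable[Triple]) -> int:
        """Store a new immutable block and return its node ID."""
        node_id = self.next_id & MASK64
        self.next_id += 1
        self._shard_for(node_id)[node_id] = tuple(statements)
        return node_id

    def get(self, node_id: int) -> Tuple[Triple, ...]:
        return self._shard_for(node_id)[node_id]

    def with_added(self, node_id: int, statement: Triple) -> int:
        """Copy-on-write update: the old block is never modified."""
        return self.put(self.get(node_id) + (statement,))


store = ShardedBlockStore(num_shards=4)
old_id = store.put([("mulgara", "type", "RDFStore")])
new_id = store.with_added(old_id, ("mulgara", "license", "OSL"))
```

Because blocks are never changed once written, readers holding `old_id` keep seeing a consistent view while writers produce `new_id`, and blocks can be copied between machines without any locking. That property is what makes a scale-out storage layer such a natural fit.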

The challenge isn't technical, but working out the resources to make this happen. 10 years ago you'd do it by founding a .com startup, but you'd have been raked over the coals, and not actually been given the chance to release anything. Today's OSS community is a much better option, but building the kind of community that can accomplish this will be a herculean effort. The only way I can see forward is to lift the scalability of Mulgara by a couple of orders of magnitude, so we can develop the profile of the project. Then we might have a chance to make this happen.

I mean, who wouldn't want to see a Google-scale RDF store? :-)

Sigh. I just realized that I've now committed myself to writing a lot of detail about how this design works. There go my evenings.

Web 3.0

During his Talis interview, Nova suggested that perhaps we shouldn't even be using the name "Semantic Web". He likes the name "Web 3.0" as it implies that it's the same technology we always had, but we've just been building it up iteratively. Indeed, he even suggests that the 1.0, 2.0 and 3.0 monikers just be used to refer to the decades which have roughly distinguished the capabilities inherent in the web (1.0 for the 90s, 2.0 for the 2000s, and 3.0 for the coming decade).

I was impressed. Someone actually gets it. With all the hype that has surrounded Web 2.0 (AJAX, social networking, linking), the Semantic Web, RDF/OWL, RSS, etc, etc, you'd think that we'd been inventing entirely revolutionary concepts and ideas every few weeks. It's not that these things aren't great (they are), or that they don't enable some new functionality that we could never come up with before (they do). Instead, it's all been a gradual development of abstractions, each of which has allowed us to see just over the next ridge on our climb up the mountain. There have been some really great ideas along the way, but in essence, it's all come down to good ol' engineering, or as Andrae likes to say, "It's just a simple matter of coding."

I still have issues with the names Web 2.0 and Web 3.0. I see Nova's point that version numbers just indicate a progression, but my expectation is that a new major revision number (2.0 -> 3.0, as opposed to a minor revision of 2.0 -> 2.1) implies a revolution in approach and functionality. This is the opposite of what Nova said he's trying to imply. I suppose I also have a problem with the marketing-speak approach taken in adopting the ".0" style of naming.

Marketing-speak makes me cringe. The people using the words often don't know how much their phrase implies, or they mean more than the phrase really carries. Consequently, the meaning of these phrases can be poorly understood, and may evolve over time. Meanwhile the people using them are often using these phrases to sound knowledgeable in a field that they have an incomplete understanding of, or sometimes using them to make other people appear unknowledgeable.

Still, jargon can be really useful, and as I said at the start, it does avoid pleonasms.

Web OS

The concept of a web OS has left me cold for some time. The suggestion has been that the OS is obsolete, and that the entire desktop can be the browser.

I've always felt that this had limited application. It would completely leave multimedia and gaming enthusiasts in the cold. People like many of the graphics features that have been creeping into the OS in recent years (like texturing, alpha blending and 3D effects), and there is nothing in the pipeline yet to get the browser up to these standards. (note: is it time for the next version of VRML yet?) Graphic artists and other high-end users wouldn't have much support either.

On the other hand, most desktop usage doesn't need this kind of power. Back in the early 90's I was working in computer retail part time while studying engineering. Of course, I was always into the latest and greatest hardware (and still am), but I started to realize that many people were spending far too much money on computer hardware that was well beyond their needs. I would speak with people who had never owned a computer before, and wanted to get one for their small business. They typically had very modest needs that could be filled with a simple record system, a word processor, and a spreadsheet. WordPerfect and Lotus 1-2-3 (both of which ran on an XT!) would have been fine, and yet they were getting 486s with huge hard drives and maximum memory, and running Word 2.0 and Excel on Windows 3.1 (a less stable solution with fewer features, but it did look nice).

We now need absurd levels of processing power just to install a basic operating system and office suite, but 90% of users' needs can be met with what was available in DOS over 15 years ago (dare I say 20?). Seen from that perspective, maybe a web browser can fill the needs of most users.

I recently had to write a document (proposing significant scalability improvements for Mulgara - I hope to get the details up here soon) while on my notebook computer. However, I knew I'd shortly be on my desktop computer at home (a much nicer iMac), and I didn't want to go through the inconvenience of moving documents back and forth. The longer the document lived, the greater the chance that I'd forget to move it to where I'd need it later before shutting down the machine on which I'd composed the latest version.

One solution for me was to use Apple's iDisk. This came as part of the package when I bought some .Mac web space for Anne, and it works pretty well. It's just a WebDAV filesystem with automatic local replication, so it's fast and easy to use. Unfortunately for this purpose, the replication is infrequent, so the notebook probably hasn't uploaded to the server by the time I send it to sleep. Saving directly to WebDAV is slow and can lead to unnecessary pauses. The last option was to move the data up to the .Mac space manually, but then I'm back to where I started.

All of this is the perfect argument for the web-based desktop. So I decided to write the document in Google Documents. I'd already used Google's spreadsheets for similar reasons, though not in any serious way. I know there are competitors, and have heard that some of them are pretty good. On the other hand, I knew where to find Google, and I already have an account there, so it made sense to use theirs (I'm sure statements like that send a shudder through every competitor Google ever had).

The interface was OK, but navigating with the keyboard felt a little clunky. My main problem was with missing or limited features, such as styles (something the web taught me to use) and automatic numbering of headings. It's on the right track, but it has a long way to go. This was basically the same experience I had with the spreadsheet application a while ago.

The big advantage was that I didn't have to save the file, and it was there whenever I got onto a machine to look at it. After the reality of saving/copying files and not having access when you wanted it, this kind of 100% availability was cool. We all know that this is how it's advertised to work, but it's so much nicer when you actually use it. It's enough to make me overlook the UI issues in many cases. It also makes me pleased that Google have such a good track record of keeping their services up, but a little trepidatious all the same. Google Gears suddenly looks really nice, so I'm looking forward to when it enters the user experience.

So for the first time I was feeling receptive to the idea of a Web OS when Nova talked about it recently. I've only mentioned the 24/7 availability of my data from any device connected to the internet, but I am also aware of how far this could spread when data gets linked and integrated across the web, in the ways that Nova describes.

Some of the ideas are compelling, but it still felt restricted to applications like an office suite, rather than the more general computing paradigm implied by integration into the OS. Then, for the first time I saw someone address this issue when Nova said, "When native computation is needed it will take place via embedding and running scripts in the local browser to leverage local resources, rather than installing and running software locally on a permanent basis. Most applications will actually be hybrids, combining local and remote services in a seamless interface."

This statement nicely encapsulates all the issues I had with total integration, and proposes a reasonable way of dealing with them. While browsers don't have the capability for a lot of this yet, people are trying to build it. Firefox is continually expanding in functionality, and AJAX and Google Gears are going some way toward building the required infrastructure. I suppose I can now accept that the integration of these components, along with many others, will get us to the WebOS that Nova was talking about. I look forward to seeing it all come together.

2 comments:

Paul Miller said...

Glad to be of service! :-)

Mikael Bergkvist, XIN said...

You can test http://www.widgetplus.com to see it in action