Tuesday, May 29, 2007

SemTech Keynote

Day two had "Communities of Practice" sessions during breakfast. The Pragmatic Rule Standards may have been interesting, but I was more interested in breakfast, especially after the previous night. (Fortunately I didn't have a headache. Will wonders never cease?)

Straight after breakfast there was a keynote session with 5 presenters. I can't quite recall their order, but it started with Robert Shrimp from Oracle.


Robert made a presentation which basically said, "You should use semantic technologies". Maybe that's harsh, but I didn't find it very compelling. Someone next to me (I'd love to say who, as he's well known) asked if I recognized the slides, and indeed they looked just like the corporate presentations made by Oracle in the 80's and early 90's about how and why you should use relational databases. The only apparent difference was the labeling. I don't believe that Oracle have made great headway in the field of semantics (or even semantic stores), so it was always going to be hard for Robert to make this presentation.


Richard Mark Soley from the OMG made a point of saying that he wanted to "upset everyone" with his controversial statements. However, he wasn't that controversial. Instead he seemed to use this to cover his lack of confidence in his claims, which I think was a shame. After all, if his statements were to conflict with anyone's views then he had the fallback position of claiming that he'd intended just that, and had in fact given prior notice. I do the same thing (even in this blog) but it didn't seem an appropriate position for the Chairman and CEO of a group like the OMG.


Andrew Schain of NASA Headquarters spoke for a short while. I remember listening to what he said, but unfortunately none of it stuck. I fear that was more my fault than his.

One thing that I did note is that they are using a system called XACML-DL [PDF] for security descriptions. I've heard about using ontologies to describe security for a while, so I was quite interested in this. Of course, the paper I've linked to has Bijan as an author. That guy seems to be everywhere I stick my nose into (and sometimes chopping it off by pointing out my ignorance). Mindswap have done work for NASA in the past, so I'm not surprised to see that XACML-DL was a proposal from them.


The two that caught my attention in this session were Dave Beckett from Yahoo! and Tom Ilube of Garlik.

Dave Beckett is a name that has been around in the Semantic Web for a long time now. I'm sure anyone following this blog would have already heard of him. Personally, I associate him mostly with his Redland RDF library for storing and accessing RDF, the Raptor RDF syntax parser, and the Rasqal query library. Whenever I've gone long enough that I might have forgotten about Dave, a new version of one of these libraries shows up on the Semantic Web mailing list. The last one was an update of Redland and the Redland language bindings just 3 weeks ago. I've often considered using these libraries, but the fact that they are written in C is a problem when everything else you distribute is in Java. Still, it's supposed to be good stuff.

Dave's appearance was of interest to me, because of the position of Yahoo! vs. Google (I don't need to create a link for that second company, do I?). Google overwhelmingly have market share for web search and several other services, which will naturally force their competition to look for that competitive edge.

Interestingly, Google have always favored syntactic search, and have often denigrated the field of semantic search (especially the operation of semantic extraction). While they have a point that the field is in its infancy, I don't think it will stay that way forever. Google claims that they meet everything the market requires with their syntactic technologies, and from one perspective they do. However, the purely syntactic approach that Google employs is not suitable for every purpose. Links which are not on the first page of a Google search are almost never seen. I think the statistics claim that almost no-one ever goes beyond a few pages. Yet, I have seen many searches which return a lot of uninteresting results and the pages I know exist somewhere are impossible to find. It is only when I can remember (or guess) a word specific to the page in question that I can find these specific links, and even then they can be hidden in amongst many other unrelated pages.

People are happy to accept that picking the right search terms is the way to interact with Google, but I think that this is a result of being trained to think this way over the last 10 years. Certainly, 10 years ago Google was offering far and away the best search engine, with more relevant hits than anyone else. Over time, Google have done well to grow with the internet and deal with increasingly sophisticated and blatant attempts at manipulating their search results. However, computing capability has increased over time, and the internet has expanded exponentially. It seems reasonable that we should be able to consider new approaches.

While Google is offering an incredibly powerful service (one that I use all the time), it is still a service that would benefit from the inclusion of semantics. As the web continues to grow, the difficulty in finding relevant information will only get worse. Google have publicly been reluctant to work with the semantic web, which (at face value) appears naïve. On the other hand, Yahoo! have been willing to look at this area. Obviously the state of the art is still too immature for general deployment, but there are still a lot of useful things you can do using RDF and OWL. The title of Dave's presentation said it all: A Little Semantics Can Go a Long Way.

Dave's job as "Chief Semantic Yahoo" is to make a lot of these things happen for Yahoo! They have several things happening already, and these will expand over time while new systems are also developed and deployed. It sounds like a lot of challenging (read: fun) work. Dave finished his short talk with a droll invitation to apply for a job. For about half a second I thought it was just supposed to be amusing, before I realized that the size and scalability of Yahoo! must require a lot of good people in this area, and they're probably quite hard to find. Unfortunately for Dave, this conference had a low ratio of developers present.


All on its own, the title of Tom Ilube's talk was enough to make me pay attention: Billions of Triples and Counting. Geez, is everyone scaling this high now?

Tom's presentation didn't have a lot of technical detail to it, except to say that his business was reliant on having this much information, and after getting significant investment he'd managed to pull it off. The inspiring part for me was that Garlik was able to find that much venture capital in a field that until recently had not attracted much interest from the VC community. This and subsequent talks demonstrated that semantics are fast becoming a field that VCs find acceptable, and are even starting to actively pursue.

A point of interest was that Tom said that they'd raised 11 million pounds in funding, which just about knocked everyone off their chairs. I later heard that he misspoke, and that the figure was actually in dollars. Even so, that is an impressive accomplishment in a market that until recently knew nothing about semantic technology.

I was also interested to learn later that one of the reasons for the scalability of Garlik's triple store is that it is not indexed extensively. They only have a need for very simple queries, as opposed to the features of complex analysis that stores like Mulgara provide. With this in mind, the ability to scale that high makes a lot more sense.
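To make that trade-off concrete, here is a toy sketch in Python. It is emphatically not how Garlik's store or Mulgara are built; the class names, the choice of a single subject index, and the three-way index layout are all my own assumptions for illustration. It only shows why a store that answers simple "tell me about this subject" lookups can carry a third of the index entries of one that must answer arbitrary patterns quickly.

```python
from collections import defaultdict


class LightlyIndexedStore:
    """Hypothetical store with a single index, keyed on subject."""

    def __init__(self):
        self._by_subject = defaultdict(list)
        self.index_entries = 0  # rough proxy for storage and write cost

    def add(self, s, p, o):
        self._by_subject[s].append((p, o))
        self.index_entries += 1

    def about(self, s):
        # Cheap: one lookup in the only index.
        return list(self._by_subject.get(s, []))

    def find_by_object(self, o):
        # Expensive: no object index, so every triple is scanned.
        return [(s, p)
                for s, pairs in self._by_subject.items()
                for p, obj in pairs
                if obj == o]


class HeavilyIndexedStore:
    """Hypothetical store that keeps three orderings of every triple."""

    def __init__(self):
        self._by_subject = defaultdict(list)
        self._by_predicate = defaultdict(list)
        self._by_object = defaultdict(list)
        self.index_entries = 0

    def add(self, s, p, o):
        # Every write touches all three indexes.
        self._by_subject[s].append((p, o))
        self._by_predicate[p].append((o, s))
        self._by_object[o].append((s, p))
        self.index_entries += 3

    def about(self, s):
        return list(self._by_subject.get(s, []))

    def find_by_object(self, o):
        # Cheap: answered directly from the object index.
        return list(self._by_object.get(o, []))
```

Both stores give the same answers; the difference is that the lightly indexed one pays for `find_by_object` with a full scan, while the heavily indexed one pays up front with three index entries per triple. If your queries never leave the cheap path, the lighter design has far less to store and update.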

Just because Garlik doesn't use extensive indexing, it doesn't mean Mulgara can't scale the same way too. It appears that we can do just that in the foreseeable future, if we make appropriate use of clustering. I've been haranguing Brian with proposed architectures for clustering, and their scalability justifications. I hope to get all this together and blog about it in the coming days.

