After being aware of it for a long time, today I finally had a brief look at Oracle's RDF support in 10g. I was really hoping that Oracle would bring their experience in developing database structures to the RDF domain, creating something very fast and scalable.
Unfortunately, it appears that 10g implements RDF as a schema in a set of standard relational tables, and wraps access to the system within Sesame interfaces, including SAIL. There's nothing wrong with building a Sesame system with an Oracle backend, rather than MySQL, but this isn't the Oracle system that I was hoping for. It doesn't bring in the extra efficiencies needed to make RDF really move. After all, RDF data has a strict shape, while an RDBMS needs to handle data of all kinds of shapes. This is why Kowari has such fast load times (when configured correctly).
Interestingly, the paper which describes the details of the system references Kowari. I was surprised at this when I read marketing phrases like:
Oracle Spatial 10g release 2 introduces the industry's first open, scalable, secure and reliable data management platform for RDF-based applications.
Kowari was open, scalable, and reliable (to the best of my knowledge), and TKS was secure (one of the reasons for buying the commercial system over the Open Source Kowari).
Another quote says:
A key feature of RDF storage in Oracle is that nodes are stored only once - regardless of the number of times they participate in triples.
The wording here suggests that Oracle is unusual in this regard, but almost all the RDF systems I am aware of share this feature, including Kowari. Perhaps the multiple indexing in Kowari caused some confusion here?
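To make the "stored only once" idea concrete, here is a rough sketch of how a store can keep each node value a single time and let triples refer to it by number. This is only an illustration under my own assumptions: the class names `NodePool` and `TripleStoreSketch` are made up for this example, everything lives in memory, and it is not how Oracle or Kowari actually lay out their data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node pool: each distinct node value gets exactly one numeric id. */
class NodePool {
    private final Map<String, Long> idsByValue = new HashMap<>();
    private final List<String> valuesById = new ArrayList<>();

    /** Returns the existing id for a value, or assigns the next free id. */
    long idFor(String value) {
        Long existing = idsByValue.get(value);
        if (existing != null) {
            return existing;
        }
        long id = valuesById.size();
        idsByValue.put(value, id);
        valuesById.add(value);
        return id;
    }

    /** Number of distinct node values stored. */
    int size() {
        return valuesById.size();
    }
}

/** Hypothetical triple store: triples hold three ids, never the node values. */
class TripleStoreSketch {
    private final NodePool nodes = new NodePool();
    private final List<long[]> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new long[] {
            nodes.idFor(subject), nodes.idFor(predicate), nodes.idFor(object)
        });
    }

    int nodeCount() {
        return nodes.size();
    }

    int tripleCount() {
        return triples.size();
    }
}
```

Adding two triples that share a subject and a predicate leaves four stored nodes rather than six, since the shared values are only pooled once.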
It appears that RDF support by Oracle has been implemented at the highest layers by RDBMS programmers. While it undoubtedly works, I'm disappointed that they haven't implemented RDF at a lower level. Still, it would be interesting to see how many triples a second it can load.
Now that I've discussed the principle behind phases, the next step is to discuss the classes which support this process. At the lowest level this is the class called FreeList. This class manages resources which are allocated and/or released in various phases. It enables new resources to be created as needed, and freed resources to be re-used efficiently.
The kinds of resources managed by the FreeList class are all the fixed-length records within data files, and also the numeric identifiers used for RDF nodes. The name of the class is an historical holdover from when it was simply used to hold a list of items which had been allocated and then freed. It does a lot more now.
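For anyone who hasn't met the idea before, the original "list of freed items" behaviour can be sketched in a few lines. This is a deliberately minimal, in-memory illustration and not Kowari's FreeList: the class name `SimpleFreeList` and its methods are invented for this example, and it ignores phases entirely, so a freed item becomes reusable immediately.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical free list handing out numeric items (record or node ids). */
class SimpleFreeList {
    // Items that were allocated and later released, oldest first.
    private final Deque<Long> freed = new ArrayDeque<>();
    // The next never-before-used item.
    private long nextItem = 0;

    /** Re-uses a freed item if one exists, otherwise creates a new one. */
    long allocate() {
        Long reused = freed.pollFirst();
        return reused != null ? reused : nextItem++;
    }

    /** Releases an item for later re-use. No double-free detection in this sketch. */
    void free(long item) {
        if (item < 0 || item >= nextItem) {
            throw new IllegalArgumentException("Item was never allocated: " + item);
        }
        freed.addLast(item);
    }

    /** One more than the largest item ever handed out. */
    long highWaterMark() {
        return nextItem;
    }
}
```

The point of the structure is that the range of items only grows when nothing is available for re-use, which keeps files of fixed-length records compact.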
FreeList sits over the top of two other classes, one of them IntFile, both of which are relatively easy to describe. Unfortunately, FreeList itself will take me some time to write about, and a late night is not the time to start. So I'll get into it in my next entry.
Tuesday, February 14, 2006