Tuesday, July 27, 2004

N3 Parsing
AN pointed me to the class he'd been talking about: RDFSyntaxLoader. He'd given me the impression that I'd find a class that did a significant portion of the work, but when I saw the name I realised that this was not the case. Instead, it demonstrated the use of an IntFile as a map from anonymous node IDs to internal node IDs, and a StringToLongMap as a map from blank node names to internal node IDs. These are important because they can be disk-backed, so the maps aren't bound by available memory and can scale.
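To make the idea of a disk-backed map concrete, here is a minimal sketch of an array of longs stored in a file and indexed by a dense int ID, which is roughly the role an anonymous-node-to-internal-node map plays. `DiskLongArray` is a hypothetical class, not the actual `IntFile` API, and it assumes 0 can be reserved to mean "no internal node assigned yet".

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical sketch (not Kowari's IntFile API): a file-backed array that
 * maps a dense int ID, such as an anonymous node number, to a long internal
 * node ID. Values live in the file rather than on the Java heap, so the map
 * is not limited by heap size.
 */
final class DiskLongArray implements AutoCloseable {
    /** Assumption: 0 is reserved to mean "no internal node assigned yet". */
    static final long UNSET = 0L;

    private final RandomAccessFile file;

    DiskLongArray(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Stores value at index, growing the file and zero-filling any gap. */
    void put(int index, long value) throws IOException {
        long offset = slotOffset(index);
        long length = file.length();
        if (offset > length) {
            // Write explicit zeros so unwritten slots are guaranteed to read as UNSET.
            byte[] zeros = new byte[8192];
            file.seek(length);
            for (long remaining = offset - length; remaining > 0; ) {
                int n = (int) Math.min(zeros.length, remaining);
                file.write(zeros, 0, n);
                remaining -= n;
            }
        }
        file.seek(offset);
        file.writeLong(value);
    }

    /** Returns the value at index, or UNSET if nothing was ever stored there. */
    long get(int index) throws IOException {
        long offset = slotOffset(index);
        if (offset + Long.BYTES > file.length()) {
            return UNSET;
        }
        file.seek(offset);
        return file.readLong();
    }

    // Each slot is one 8-byte long, so slot i starts at byte i * 8.
    private static long slotOffset(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("negative index: " + index);
        }
        return (long) index * Long.BYTES;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

A map keyed by arbitrary blank node names, like StringToLongMap, needs more machinery (hashing strings into on-disk buckets), which this sketch doesn't attempt.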

If I only have to load anonymous nodes from our own N3 files, then I don't need the name mapping, though the IntFile map is still needed. If I'm loading more general N3 files, then I'll need to map by name, though this will also depend on whether I can recognise their anonymous node formats.
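The choice between the two maps could be made label by label. A minimal sketch, assuming our own files write anonymous nodes in a hypothetical `_:node<digits>` form: labels in that form go through a numeric map (the role IntFile plays), and anything else falls back to a by-name map (the role StringToLongMap plays). `BlankNodeResolver`, the label format, and the ID allocator are all made up for illustration, and in-memory `HashMap`s stand in for the disk-backed classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: map blank node labels to internal node IDs.
 * Labels in an assumed "own export" form, _:node<digits>, use a numeric map;
 * any other label falls back to a by-name map. HashMaps stand in for the
 * disk-backed IntFile and StringToLongMap.
 */
final class BlankNodeResolver {
    // No leading zeros (so _:node7 and _:node07 stay distinct) and at most
    // 9 digits (so the number always fits in an int).
    private static final Pattern OWN_FORMAT =
        Pattern.compile("_:node(0|[1-9]\\d{0,8})");

    private final Map<Integer, Long> byNumber = new HashMap<>();
    private final Map<String, Long> byName = new HashMap<>();
    private long nextNodeId = 1; // hypothetical allocator for fresh internal nodes

    /** True if the label is in the assumed own-export format. */
    static boolean isOwnFormat(String label) {
        return OWN_FORMAT.matcher(label).matches();
    }

    /** Returns the internal node for label, allocating a new one on first sight. */
    long resolve(String label) {
        Matcher m = OWN_FORMAT.matcher(label);
        if (m.matches()) {
            int number = Integer.parseInt(m.group(1));
            return byNumber.computeIfAbsent(number, k -> nextNodeId++);
        }
        return byName.computeIfAbsent(label, k -> nextNodeId++);
    }
}
```

When only our own files are loaded, the by-name map is never touched, which matches the observation that the name mapping is only needed for more general input.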

I'm just using a regex to parse the first part of each line. It's easy enough to pick out everything between the < and > characters. Unescaping text is reasonably straightforward as well, particularly as many escaped characters are only legal in literals, so they don't affect subjects and predicates. What I haven't handled yet is the last part of the line, which can be either a resource or a literal. It shouldn't be too hard, since the first and last characters tell you exactly which type it is.
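The line parsing described above can be sketched as follows, assuming one triple per line in the N-Triples subset of N3 (`<subject> <predicate> object .`), where the object is either a `<...>` resource or a plain `"..."` literal. Typed and language-tagged literals and blank nodes are deliberately not handled, and `TripleLineParser` is a hypothetical name, not the loader's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: parse one N-Triples style line. The object must be a
 * <...> resource or a plain "..." literal; typed and language-tagged literals
 * and blank nodes are not handled.
 */
final class TripleLineParser {
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final boolean objectIsLiteral;

        Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsLiteral = objectIsLiteral;
        }
    }

    // Groups: 1 = subject URI, 2 = predicate URI, 3 = raw object text before the final "."
    private static final Pattern LINE =
        Pattern.compile("^\\s*<([^>]*)>\\s+<([^>]*)>\\s+(.*?)\\s*\\.\\s*$");

    static Triple parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a triple line: " + line);
        }
        String obj = m.group(3);
        int n = obj.length();
        // The first and last characters decide whether the object is a resource or a literal.
        if (n >= 2 && obj.charAt(0) == '<' && obj.charAt(n - 1) == '>') {
            return new Triple(unescape(m.group(1)), unescape(m.group(2)),
                              unescape(obj.substring(1, n - 1)), false);
        }
        if (n >= 2 && obj.charAt(0) == '"' && obj.charAt(n - 1) == '"') {
            return new Triple(unescape(m.group(1)), unescape(m.group(2)),
                              unescape(obj.substring(1, n - 1)), true);
        }
        throw new IllegalArgumentException("unsupported object: " + obj);
    }

    /** Decodes tab, newline, return, quote and backslash escapes, plus 4- and 8-digit hex escapes. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (++i >= s.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = s.charAt(i);
            switch (e) {
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case '"': out.append('"'); break;
                case '\\': out.append('\\'); break;
                case 'u': out.append((char) hex(s, i + 1, 4)); i += 4; break;
                case 'U': out.appendCodePoint(hex(s, i + 1, 8)); i += 8; break;
                default: throw new IllegalArgumentException("bad escape: " + e);
            }
        }
        return out.toString();
    }

    // Reads len hex digits starting at start.
    private static int hex(String s, int start, int len) {
        if (start + len > s.length()) {
            throw new IllegalArgumentException("truncated escape");
        }
        int value = 0;
        for (int j = start; j < start + len; j++) {
            int d = Character.digit(s.charAt(j), 16);
            if (d < 0) {
                throw new IllegalArgumentException("bad hex digit in escape");
            }
            value = value * 16 + d;
        }
        return value;
    }
}
```

Splitting the work this way keeps the regex simple: it only has to find the two bracketed URIs and the trailing period, while the object's type is decided afterwards by its first and last characters.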

Syntax and Semantics
I spent a little time responding to suggestions from AN about how we should implement a NOT operator in iTQL. There's the semantic issue of negation vs. an inverse, and that is still being debated. SR pointed out that, depending on the semantics we choose, the word "not" may not even be appropriate.

There's also the syntactic issue of how to express the construct in iTQL. Should it be a unary operator preceding constraints in brackets, or should it be a "magical" predicate like <tucana:is>? Of course, different syntactic choices imply different semantics, so this question is not completely divorced from the first. Until the semantics are determined, the syntax can't be decided either.

You'd think this would mean the semantics should be settled first and the syntax decided afterwards, but these things are never quite that straightforward. People often like to conceptualise a semantic in terms of some expression, i.e. the syntax, so it can be very difficult to pin down exactly which semantic is to be used without expressing it in some kind of formal language. Rather than introduce something as formal as predicate logic, it makes more sense for most people to think of the semantics in terms of the iTQL that is likely to represent it. So the semantics and syntax end up being decided in parallel.

AN has lots of suggestions for both, and I spent a little time giving him feedback on them.

Short Entry
I know it's not much, but that's it. I didn't get as much done today as I'd have liked, because I had to spend some time with a physiotherapist treating an injury, and it took longer than I'd anticipated.
