Thursday, August 30, 2007

FOAF

Like David's post yesterday, there have been a number of discussions in recent months about the best practice for URIs that identify people. I've typically stayed out of the public debates, but have been involved in a number of offline conversations.

A popular approach to building these URIs is to configure an HTTP server such that when it receives a request for this URI it responds with an HTTP 303 (which means "See Other"). This lets the server respond with a document pertaining to that URI, but at the same time informs the client that this document is NOT the resolution of that URI. After all, the resolution of the URI is a person, and one can hardly respond with that (for a start, you'd need all that quantum state, and I haven't yet seen the internet protocols for quantum teleportation).
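For the curious, here is a minimal sketch of the 303 behaviour using Python's standard library. The paths, the `SEE_OTHER` table, and the names `PersonHandler`, `probe`, and `demo` are all made up for illustration, and a real deployment would more likely be a few lines of web server configuration. The point is only that the person's URI answers with "See Other" and a `Location`, while the document lives at a different URI.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping: person URI path -> path of a document about that person.
SEE_OTHER = {"/people/alice": "/docs/alice.rdf"}


class PersonHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = SEE_OTHER.get(self.path)
        if target is not None:
            # The URI names a person, so point at a document instead of answering 200.
            self.send_response(303)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path in SEE_OTHER.values():
            body = b"<!-- RDF describing the person would go here -->"
            self.send_response(200)
            self.send_header("Content-Type", "application/rdf+xml")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def probe(host, port, path):
    """GET a path WITHOUT following redirects; return (status, Location header)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def demo():
    """Start the server on a free local port and probe both URIs."""
    server = HTTPServer(("127.0.0.1", 0), PersonHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        person = probe("127.0.0.1", port, "/people/alice")
        document = probe("127.0.0.1", port, "/docs/alice.rdf")
        return person, document
    finally:
        server.shutdown()
        server.server_close()
```

Because `http.client` never follows redirects on its own, `probe` sees the 303 and its `Location` directly, which is exactly the information a browser hides when it silently redirects.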

Another approach is to simply use the URI of a document describing the person, and tack an anchor onto the end. Typically this anchor is #me. Like the 303 approach, you get a retrievable document that can be found from the person's URI, and again that document has a different URI to the URI of the person. The main problem cited with this second approach is that a #me anchor may exist in the document, meaning that the URI resolves to something other than the person. (While I recently learned that URI ambiguity is not strictly illegal, it is a really bad idea. After all, we usually rely on these things to identify a unique thing.) Other people suggest avoiding possible anchor ambiguity with a query (?key=value on the end of the URL). This is much less popular, and I'll let the public arguments against this stand for themselves.
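To make the anchor approach concrete, here is a small sketch in Python of how a client gets from the person's URI to the document's URI. The function name `split_person_uri` and the example.org address are placeholders of mine. The anchor is never sent to the server, so the client strips it before making the request.

```python
from urllib.parse import urldefrag


def split_person_uri(person_uri):
    """Split a person's URI into (document URI, anchor)."""
    document_uri, fragment = urldefrag(person_uri)
    return document_uri, fragment


# The person and the document describing them end up with different URIs.
document_uri, anchor = split_person_uri("http://example.org/foaf.rdf#me")
```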

While looking at the "303" approach the other day, I realized that both Safari and Firefox respond to a 303 as if it were a redirection. This makes sense in several ways. If a user has asked for "something" by address, then they'd like to see whatever data is associated with that address (as opposed to a response of "not here"). Also, the HTTP RFC says that this link should be followed. Even so, since the resulting document is NOT what was asked for, the user should at least be told that they are looking at the "Next Best Thing", rather than silently being redirected.

I came to all of this while updating my FOAF file the other day. While it is possible to describe all of your friends in minute detail, the normal practice is to include just enough information to uniquely identify them (plus a couple of things that are useful to keep locally, like the friend's name). Then when your FOAF file and your friend's are brought into the same store together, all that information will get linked up. This sounds great, until you realize that there is no defined way to find your friends' files. The various FOAF browsers, surfers, etc., that I've tried are all terrible at tracking down people's FOAFs, so whatever they're trying isn't working very well either.

Whether using anchor suffixes or 303s, the URI that people often use for themselves just happens to lead you to their own FOAF files. This would be the solution to the problem of finding your friends' files... if your friends happened to use this approach. While useful, it can't be relied upon for automatic FOAF file gathering. Because of this, I decided that I should try to put explicit links to all of my friends' FOAF URLs that I know about. This led me to tracking down the files of each of the people in my FOAF file (fortunately not many, as most of the people I know don't have a FOAF file), which had me following various 303 links, like the one to Tom's URI. I was using wget, which doesn't follow a "See Other" link automatically, and this was how I discovered that Tom was using a 303. I'm sure if I'd followed his URI with Firefox then I wouldn't have noticed the new address.

After following the links for all these people, I then wanted some way to describe the location of their FOAF in my own FOAF description of them. After some investigation of the FOAF namespace, I discovered that there is no specified way to do this. I suppose this is what led to the de facto standard that people have adopted where their person URI leads you (however indirectly) to their FOAF file. This actually makes perfect sense, as you don't want to invalidate people's links to you just because you chose to move the location of your file, but it's still annoying if you want to be able to link to other people's files. Perhaps everyone should get a PURL address?

The closest thing I could find to a property describing a FOAF file is the more general <foaf:homepage>. This property lets you link a resource (like a person) to some kind of document describing that resource. This meets the criteria of what I was looking for, but it is also more general than I was after, as it can also be used to point to non-FOAF pages, like a person's home page (the original intent of this property). All the same, I went with it, since it was a valid thing to do. At least it will help any applications that I write to look at my own file. It's a shame that it's so manual.

While thinking about how to automate this process, it occurred to me that I could try the following:
  • If a person's URI ends in an anchor, then strip it off, and follow the URI. If the returned document is RDF then treat it as FOAF data (identifying RDF as being FOAF or not FOAF is another problem).
  • Follow the person's URI, and if the result is a 303, then follow that URI. If the resulting document is RDF, then treat it as FOAF.
  • Iterate through each URI associated with the person (such as <foaf:homepage>) and if any of these return an RDF file then treat it as FOAF.
  • On each of the HTML pages returned from the previous iterations, check for <a href=...> tags to resources that don't end with .html, .jpg, .png, etc. If querying for any of these links returns an RDF file, then treat as FOAF.
Incidentally, Tom's FOAF file would only be picked up via the last step. You have to follow his URI to get a 303, which then leads you to his home page. Then on that page you'll find links to his FOAF file. Frankly, it was just easier to manually add a <foaf:homepage> tag to his file. :-)
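The four steps above can be sketched roughly as follows in Python. This is only an outline under some loud assumptions: `fetch` is a hypothetical callable you supply that takes a URI and returns `(status, headers, body)` without following redirects, "is RDF" is guessed purely from an `application/rdf+xml` Content-Type, and the names `resolve`, `find_foaf`, and `LinkCollector` are mine. It makes no attempt to decide whether the RDF is really FOAF.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Link targets that are not worth querying (step 4).
SKIP_SUFFIXES = (".html", ".htm", ".jpg", ".jpeg", ".png", ".gif")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def is_rdf(headers):
    ctype = headers.get("Content-Type", "")
    return ctype.split(";")[0].strip().lower() == "application/rdf+xml"


def resolve(uri, fetch, max_hops=5):
    """Strip any anchor (step 1), then fetch, following 303s (step 2)."""
    uri = urldefrag(uri)[0]
    for _ in range(max_hops):
        status, headers, body = fetch(uri)
        if status == 303 and headers.get("Location"):
            uri = urljoin(uri, headers["Location"])
            continue
        return uri, status, headers, body
    return uri, None, {}, ""  # gave up: too many hops


def find_foaf(person_uri, other_uris, fetch):
    """Return URIs of RDF documents reachable from a person's URIs."""
    found, html_pages = [], []
    # Steps 1-3: the person's own URI, then every other URI tied to them.
    for candidate in [person_uri, *other_uris]:
        uri, status, headers, body = resolve(candidate, fetch)
        if status != 200:
            continue
        if is_rdf(headers):
            if uri not in found:
                found.append(uri)
        elif headers.get("Content-Type", "").startswith("text/html"):
            html_pages.append((uri, body))
    # Step 4: scan the HTML pages for links that might be RDF.
    for page_uri, body in html_pages:
        collector = LinkCollector()
        collector.feed(body)
        for href in collector.hrefs:
            link = urldefrag(urljoin(page_uri, href))[0]
            if link.lower().endswith(SKIP_SUFFIXES):
                continue
            uri, status, headers, _ = resolve(link, fetch)
            if status == 200 and is_rdf(headers) and uri not in found:
                found.append(uri)
    return found
```

Passing `fetch` in keeps the sketch independent of any particular HTTP library, and makes it easy to try against a dictionary of canned responses before pointing it at real servers.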

Anachronism

During the various conversations I've had (mostly with Tom), it occurred to me that there is an underlying assumption that all URIs will be HTTP. This is particularly true for 303 responses, as this is an HTTP response code. However, nothing in RDF suggests that the protocol (or scheme, according to URI terminology) has to be HTTP. For instance, it isn't unheard of to find resources at the end of an ftp://... URL. It got me wondering how much it would break existing systems if the URIs used for and in a FOAF file were not in HTTP, but something different. If they handle anything else, then it's almost certain to be FTP (and possibly even HTTPS), so these weren't going to really test things. No, the protocol I chose was Gopher.

The GoFish server managed the details for me here, though it took me a bit of debugging to realize that it wasn't starting when it couldn't find a user/group of "gopher" on my system (Apple didn't retain that account on OS X. Go figure.) Once I'd found that problem, it then took me a few minutes to discover that addresses for text files in the root are prefixed with 00/. But once that was done I was off and running.
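For anyone wanting to fetch such a file without a gopher client, here is a rough sketch in Python. It assumes the standard gopher URL layout, where the first character of the path is the item type (0 for a text file) and the rest is the selector sent to the server; under that reading, one of the two zeros in 00/ is the item type. The host name below is a placeholder, and `parse_gopher_url` and `fetch_gopher` are names I made up.

```python
import socket
from urllib.parse import unquote, urlsplit


def parse_gopher_url(url):
    """Split a gopher URL into (host, port, item type, selector)."""
    parts = urlsplit(url)
    if parts.scheme != "gopher":
        raise ValueError("not a gopher URL: " + url)
    path = unquote(parts.path)
    item_type = path[1] if len(path) > 1 else "1"  # default to a menu
    selector = path[2:]
    return parts.hostname, parts.port or 70, item_type, selector


def fetch_gopher(url, timeout=10):
    """Send the selector and read until the server closes the connection."""
    host, port, _item_type, selector = parse_gopher_url(url)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Note: a trailing "." terminator line, if the server sends one, is not stripped.
    return b"".join(chunks)


# Placeholder address; substitute a real gopher host before calling fetch_gopher.
host, port, item_type, selector = parse_gopher_url("gopher://example.org/00/foaf.rdf")
```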

I'm not a huge fan of running services from my home PC, so I can't say that I'll keep it up for a long time. But at the same time, it gives me some perverse pleasure to hand out my FOAF file as a gopher address. :-)
