Friday, September 10, 2004

Horrifying Accounts
I just read the latest blurb from Telstra on their online services. One of them is online space to set up your own web page. If I want to set up a web page with them, I can get a space allocation ranging from 5MB for $2.95 a month up to 20MB for $19.95 a month.

To prevent people from putting up large files (like home movies) and having them downloaded all the time, Telstra have imposed limits on the amount of data that may be downloaded (100MB for the 5MB account, up to 300MB for the 20MB account). And in the fine print at the bottom of the page: "A per-MB fee will be charged if this allowance is exceeded."

Great. So if I don't like someone all I have to do is create a script to repeatedly download pages from their site. Give me long enough, and I could rack up a massive Telstra bill for someone.

I think that it is always a mistake to charge customers based on actions which they have no control over.

Kowari Tests
Everything was working smoothly today, so it was all checked in. I spent some time writing documentation for the code, and also working on my ideas for writing entailment Horn clauses as RDF. The more I think about the idea, the more I like it. A set of ontology inferences can be written as an RDF document and stored as a model. One of these ontology models can then be applied to a model of base statements to create a model of inferred statements. It all seems rather clean.
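
To make "applying an ontology model to a base model" concrete, here is a minimal Python sketch. It assumes the rules are held as plain data (a list of body patterns plus a head pattern, with variables written as `?x`) rather than as real RDF, and that the inferred model is simply the set of new triples reached by naive forward chaining until nothing changes. None of these names come from Kowari; they are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]
Rule = Tuple[List[Triple], Triple]  # (body patterns, head pattern); hypothetical encoding


def is_var(term: str) -> bool:
    # Assumption: variables are any term starting with "?".
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None
            result[p] = t
        elif p != t:
            return None
    return result


def solve(body: List[Triple], facts: Set[Triple]) -> List[Binding]:
    """Return every binding that satisfies the conjunction of body patterns."""
    bindings: List[Binding] = [{}]
    for pattern in body:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in facts:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def substitute(pattern: Triple, binding: Binding) -> Triple:
    # Head variables are assumed to also appear in the body (range-restricted rules).
    s, p, o = (binding.get(term, term) for term in pattern)
    return (s, p, o)


def infer(rules: List[Rule], base: Set[Triple]) -> Set[Triple]:
    """Apply rules to base until no new triples appear; return only the inferred ones."""
    known = set(base)
    while True:
        new: Set[Triple] = set()
        for body, head in rules:
            for binding in solve(body, known):
                fact = substitute(head, binding)
                if fact not in known:
                    new.add(fact)
        if not new:
            return known - base
        known |= new


# Two RDFS-style entailments written as Horn clauses.
rules: List[Rule] = [
    ([("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")],
     ("?a", "rdfs:subClassOf", "?c")),
    ([("?x", "rdf:type", "?a"), ("?a", "rdfs:subClassOf", "?b")],
     ("?x", "rdf:type", "?b")),
]
base: Set[Triple] = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
}
inferred = infer(rules, base)
```

Keeping the result separate from the base model mirrors the idea of writing inferences into their own model, so the base statements are never modified.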

The one thing I'm trying to decide is how much structure should be in the iTQL used to select the data, and how much should be in RDF. For instance, a constraint conjunction could be a collection of constraints with a "Conjunction" property, while a disjunction could be stored in a similar collection with a "Disjunction" property. The clause itself would be an anonymous node, which would describe its body using collections like these, and its head as a separate property.
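
One way the fully structured option might look is sketched below in Python. The `ex:body`, `ex:head`, `ex:Conjunction` and `ex:Disjunction` names are hypothetical vocabulary, and marking the collection with `rdf:type` is an assumption about how the "Conjunction" property would be attached. The body uses the standard RDF collection terms `rdf:first`, `rdf:rest` and `rdf:nil`, each constraint is reified with `rdf:subject`, `rdf:predicate` and `rdf:object`, and variables are just strings such as `?a`. It illustrates the trade-off; it is not a proposed Kowari format.

```python
from itertools import count
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def encode_clause(body: List[Triple], head: Triple,
                  kind: str = "ex:Conjunction") -> List[Triple]:
    """Encode a Horn clause as RDF triples using hypothetical ex: vocabulary."""
    ids: Iterator[int] = count()
    out: List[Triple] = []

    def bnode() -> str:
        # Anonymous (blank) nodes are numbered _:b0, _:b1, ...
        return f"_:b{next(ids)}"

    def constraint(pattern: Triple) -> str:
        # Each constraint becomes a reified statement node.
        node = bnode()
        s, p, o = pattern
        out.extend([(node, "rdf:subject", s),
                    (node, "rdf:predicate", p),
                    (node, "rdf:object", o)])
        return node

    clause = bnode()
    # The body is an rdf:first/rdf:rest collection whose first cell is typed by `kind`.
    cells = [bnode() for _ in body]
    out.append((clause, "ex:body", cells[0] if cells else "rdf:nil"))
    if cells:
        out.append((cells[0], "rdf:type", kind))
    for i, (cell, pattern) in enumerate(zip(cells, body)):
        out.append((cell, "rdf:first", constraint(pattern)))
        out.append((cell, "rdf:rest", cells[i + 1] if i + 1 < len(cells) else "rdf:nil"))
    # The head is a separate property of the clause node.
    out.append((clause, "ex:head", constraint(head)))
    return out


# Transitivity of rdfs:subClassOf as a two-constraint conjunction.
clause = encode_clause(
    body=[("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")],
    head=("?a", "rdfs:subClassOf", "?c"),
)
```

Passing `kind="ex:Disjunction"` gives the disjunctive form. Even this small clause takes sixteen triples, which shows how verbose the structured approach becomes compared with keeping the WHERE clause as a single iTQL literal.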

This would work fine, but we already have a parser which can construct WHERE clauses in this way from an iTQL string. Is there really a good reason to avoid storing the entire iTQL string as an RDF literal? The most compelling argument I have is that it is far from elegant to mix data types in this way. A file should hold one format of data only, and not embed another format within it. This is one of the reasons I dislike the current Drools file. It holds a significant amount of iTQL, which is in turn embedded in a string in some Java code, which is really part of an XML element. Nasty.

Back to TKS
I was waiting to talk to DM about distributed queries for TKS, but he was unavailable today. Since distributed queries are a TKS feature, I decided to run the full test suite for TKS to confirm that everything was at a stable point for me to move forward.

Unfortunately, the TKS tests are failing in all sorts of places, so I agreed to see whether there was anything in there I could fix.

Shortly before leaving this afternoon, I spotted something interesting. One of the exception traces complained about an I/O stream that could not be opened for a fake URL (http://mydomain.com/). It turned out that this URL is held in a Node from a statement, and retrieving this node causes the system to try to open the URL and read it in as an RDF file. SR likened this to the Lucene code, but this code is not in the Lucene areas.

I hope it won't take me too long to work it out on Monday.

SANE and TWAIN
Last night I managed to get SANE and the TWAIN-SANE driver going for Anne's and my Macs. I have SANE running on my Linux box with the scanner attached, so now we have network access to the scanner.

It wasn't as easy as I'd hoped. While SANE works fine on my Linux server, I was initially unable to see it from my Mac. I turned on extensive logging for the "net" driver, and could see that the server reported no scanners available when accessed over the network. Everything seemed to be set up correctly, so I started looking through the SANE FAQs. It turned out that saned needed permission to access the scanner. This meant changing the group and permissions on the scanner's device node to match those of saned, which also required an update to the /etc/hotplug/usb/libusbscanner file so that the correct group is set.
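
The underlying requirement is ordinary Unix permission logic: the account saned runs as must be able to read and write the scanner's device node, either as its owner or through the node's group. A small Python sketch of that check follows. The device path and the `saned` account name in the usage comment are assumptions about a typical setup, not details taken from this system.

```python
import grp
import os
import pwd
import stat
from typing import Iterable


def can_read_write(mode: int, owner_uid: int, owner_gid: int,
                   uid: int, gids: Iterable[int]) -> bool:
    """Apply the usual Unix owner/group/other rule for read+write access."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        # For the owner, only the owner bits count, even if group bits are wider.
        return bool(mode & stat.S_IRUSR) and bool(mode & stat.S_IWUSR)
    if owner_gid in set(gids):
        return bool(mode & stat.S_IRGRP) and bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IWOTH)


def user_can_access(path: str, user: str) -> bool:
    """Check whether the named account can read and write the file at path."""
    st = os.stat(path)
    pw = pwd.getpwnam(user)
    # Primary group plus every supplementary group that lists the user.
    gids = {pw.pw_gid} | {g.gr_gid for g in grp.getgrall() if user in g.gr_mem}
    return can_read_write(st.st_mode, st.st_uid, st.st_gid, pw.pw_uid, gids)


# Example (hypothetical device node and account name):
# user_can_access("/proc/bus/usb/001/002", "saned")
```

If this returns False for the saned account, changing the node's group to one saned belongs to and granting group read/write is the fix described above.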

Now it works, but there have been two problems. The first is that it is not quite stable. Because the shared library is loaded into another program (like Photoshop or Word), a failure in the driver causes the host program to quit, which has happened several times now. The second is the preview function. It lets me zoom in on a preview and set a selection, but it seems to be impossible to zoom back out. Even when the program is restarted, the zoomed-in area is remembered, and the larger picture cannot be seen.

I'm thinking of giving up and using a different front end. So far I haven't been able to get xsane to compile on OS X, so I'm considering trying Scanlite, since it is Java-based.

MythTV
In an effort to conserve hard drive space, I've been converting the videos recorded by MythTV to DivX format. This saves about 80% of the space, even with a very high bit rate set. Another advantage is that the files can be read by more decoders. I'm using Mencoder, which is my favourite transcoder.
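
For reference, here is a Python sketch (the actual script is Perl) of the kind of mencoder invocation involved. The flags used (`-ovc lavc`, `-lavcopts vcodec=mpeg4:vbitrate=N`, `-ffourcc DX50`, `-oac mp3lame`, `-o`) are standard mencoder options, but the default bit rate and the `.avi` output naming are assumptions, not the settings used here. The 80% figure is simply 1 - new size / original size.

```python
import os
from typing import List


def mencoder_command(src: str, vbitrate_kbps: int = 2400) -> List[str]:
    """Build a mencoder command that transcodes src to a DivX-compatible AVI.

    The default bit rate and output naming are illustrative assumptions.
    """
    dst = os.path.splitext(src)[0] + ".avi"
    return [
        "mencoder", src,
        "-ovc", "lavc",                                         # video via libavcodec
        "-lavcopts", f"vcodec=mpeg4:vbitrate={vbitrate_kbps}",  # MPEG-4 at N kbit/s
        "-ffourcc", "DX50",                                     # tag the stream as DivX 5
        "-oac", "mp3lame",                                      # audio as MP3
        "-o", dst,
    ]


def space_saving(original_bytes: int, new_bytes: int) -> float:
    """Fraction of disk space recovered by replacing the original with the transcode."""
    return 1 - new_bytes / original_bytes


# To actually run it: subprocess.run(mencoder_command(path), check=True)
```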

Since MythTV stores source files with names based on the channel and date, the MySQL database used by MythTV has to be queried for the details of each recording. Similarly, when I clear the original file off the disk, I need to remove its entry from the database.
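
The lookup step can be sketched as follows, again in Python rather than the Perl of the actual script. It assumes the older MythTV naming scheme `<chanid>_<start>_<end>.nuv` with timestamps written as `YYYYMMDDHHMMSS`, and a `recorded` table keyed on `chanid` and `starttime`. Both are assumptions; check them against your own MythTV version and schema before relying on them.

```python
import os
import re
from datetime import datetime
from typing import NamedTuple, Tuple


class Recording(NamedTuple):
    chanid: int
    start: datetime
    end: datetime


# Assumed file-name layout: <chanid>_<YYYYMMDDHHMMSS>_<YYYYMMDDHHMMSS>.nuv
NAME_RE = re.compile(r"^(\d+)_(\d{14})_(\d{14})\.nuv$")
STAMP = "%Y%m%d%H%M%S"

# Hypothetical queries against an assumed `recorded` table; %s placeholders
# follow the DB-API "format" paramstyle used by MySQL drivers.
LOOKUP_SQL = "SELECT title, subtitle FROM recorded WHERE chanid = %s AND starttime = %s"
DELETE_SQL = "DELETE FROM recorded WHERE chanid = %s AND starttime = %s"


def parse_recording_name(path: str) -> Recording:
    """Recover the channel and start/end times encoded in a recording's file name."""
    m = NAME_RE.match(os.path.basename(path))
    if m is None:
        raise ValueError(f"unrecognised recording name: {path}")
    return Recording(int(m.group(1)),
                     datetime.strptime(m.group(2), STAMP),
                     datetime.strptime(m.group(3), STAMP))


def query_params(rec: Recording) -> Tuple[int, str]:
    """Parameters for LOOKUP_SQL / DELETE_SQL, with the time in MySQL DATETIME form."""
    return (rec.chanid, rec.start.strftime("%Y-%m-%d %H:%M:%S"))
```

With any DB-API MySQL driver, `cursor.execute(LOOKUP_SQL, query_params(rec))` would fetch the programme details, and running `DELETE_SQL` with the same parameters would remove the row once the original file has been deleted.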

My code to do this is a major hack, written very specifically for my own system. But if anyone is interested, please feel free to use it. Just remember to make sure you have the Perl DBI module and its MySQL driver installed, and set up the transcoding line for your own system.
