Sunday, December 08, 2013

DDD on Mavericks

Despite everything else I'm trying to get done, I've been enjoying some of my time working on Gherkin. However, after merging in a new feature the other day, I caused a regression (dammit).


Until now, I've been using judiciously inserted calls to echo to trace what has been going on in Gherkin. That works, but it can introduce bugs by interfering with the return values of functions, it needs cleaning up afterwards, the code has to be modified each time something new needs inspecting, and it can produce a lot of unnecessary output before the required data shows up. Basically, once a project gets past a certain point, you really need a more scalable approach to debugging.

Luckily, Bash code like Gherkin can be debugged with bashdb. I'm not really familiar with bashdb, so I figured I should use DDD to drive it. As with most GNU projects, I usually try to install it with Fink, and sure enough, DDD is available. However, installation failed with an error about an ambiguous overloaded + operator. It turns out that this is due to an update to the C++ compiler in OS X Mavericks. The code fix is trivial, though the patch hasn't been integrated yet.

Downloading DDD directly and running configure got stuck on finding the X11 libraries. I could have specified them manually, but I wasn't sure which ones Fink likes to use, and the system has several available (some of them old). The correct one was /usr/X11R6/lib, but given DDD's other dependencies I preferred to use Fink. However, until the Fink package gets updated, it won't compile and install on its own. So I figured I should try to tweak Fink to apply the patch.

Increasing the verbosity level on Fink revealed a patch file that was already being applied:
/sw/fink/dists/stable/main/finkinfo/devel/ddd.patch

It looks like Fink packages all rely on the upstream package, with a patch applied in order to fit with Fink or OS X. So all it took was updating this file with the one-line patch that makes DDD compile. One caveat is that Fink package descriptions include checksums for various files, including the patch file. My first attempt at using the new patch reported both the expected checksum and the one that was found, so that made it easy to update the .info file.

If you found yourself here while trying to get Fink to install DDD, then just use these 2 files to replace the ones in your system:

If you have any sense, you'll check that the patch file does what I say it does. :)

Note that when you update your Fink repository, these files should get replaced.

Sunday, March 03, 2013

Clojurescript and Node.js

For a little while now I've been interested in JavaScript. JS was a strong theme at Strangeloop 2011, where it was being touted as the next big virtual machine. I'd only had a smattering of exposure to JS at that point, so I decided to pick up Douglas Crockford's JavaScript: The Good Parts, which I quite enjoyed. I practiced what I learned on Robusta, which is both a SPARQL client library for JS and a simple UI for talking to SPARQL endpoints. It was a good learning exercise and it's proven useful for talking to Jena. Since then, I've thought it might be interesting to write some projects in JS, but my interests are usually in data processing rather than the UI front end. So the browser didn't feel like the right platform for me.

When ClojureScript came out, I was interested, but again without a specific UI project in mind I kept putting off learning about it.

At this point I'd heard about Node.js, but had been misled into thinking it was just another way to build server-side web applications, which didn't seem that interesting when the server side already offered so many frameworks to work with (e.g. Rails).

This last Strangeloop had several talks about JS, and about the efficiency of the various engines, despite having to continue supporting some unfortunate decisions early in the language's design (or lack of design in some cases). I came away with a better understanding of how fast this system can be and started thinking about how I should try working with it.

But then came Clojure/conj last year and it all gelled.

When I first heard Chris Granger speak, I'll confess that I didn't immediately see what was going on. He was targeting Node.js for Light Table, but as I said, I didn't realize what Node really was. (Not to mention that the social aspects of the conj had me a little sleep deprived.) So it wasn't until after his talk that I commented to a colleague (@gtrakGT) that it'd be great if there was a JS engine that wasn't attached to a browser, but still had a library that let you do real-world things. In retrospect, I feel like an idiot.  :-)

So I started on Node.

ClojureScript with Node.js

While JS has some nice features, in reality I'd much rather write Clojure, meaning that I finally had a reason to try ClojureScript. My first attempts were a bit of a bumpy ride, since I had to learn how to target ClojureScript to Node, how to access functions for getting program arguments, how to read/write with Node functions. Most of all, I had to learn that if I accidentally used a filename ending in .clj instead of .cljs then the compiler would silently fail and the program would print bizarre errors.

All the same, I was impressed with the speed of starting up a ClojureScript program. I found myself wondering about how fast various operations were running, in comparison to Clojure on the JVM. This is a question I still haven't got back to, but it did set me off in some interesting directions.

While driving home after the conj, the same colleague who'd told me how wrong I'd been about Node.js asked me about doing a simple-yet-expensive calculation like large factorials. It was trivial in Clojure, so I tried it with ClojureScript, and immediately discovered that ClojureScript uses JavaScript's numbers. These are actually double-precision floats, which for integers means only 53 bits of precision. Clojure automatically expands large numbers into Java's BigInteger class, but JavaScript has nothing equivalent to work with.
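The limit is easy to demonstrate at a REPL (a sketch; 9007199254740992 is 2^53):

  ;; In ClojureScript, integers are really doubles, so the largest
  ;; exactly-representable integer is 2^53. Adding 1 to it is a no-op.
  (def max-exact 9007199254740992)
  (= max-exact (inc max-exact))   ; => true in ClojureScript
  ;; The same expression is false in Clojure on the JVM, where longs
  ;; (and then BigIntegers) keep full precision.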

At that point I should have searched for a JavaScript BigInteger out on the web somewhere, but I was curious about how BigIntegers are implemented, so I started looking at the code for java.math.BigInteger to see if it could be reproduced in Clojure (using protocols, so I can make objects that look a little like the original Java objects). The bit manipulations started getting tricky, and I put it aside for a while.

Then recently I had a need to pretty-print some JSON and EDN. I'd done it in Clojure before, but that meant starting a new process from the command line, and starting the JVM is just too painful for words. So I tried writing it in JS for Node, and was very happy with both the outcome and the speed. This led me back to trying the same thing in ClojureScript.

ClojureScript Hello World Process

The first thing to try with ClojureScript and Node is getting a Hello World program going. It had been months since I'd last done this, so I thought I should try again. Once again, the process did not work smoothly, so I thought I'd write it down here.

Clojure can be tricky to use without Leiningen, so this is typically my first port of call for any Clojure-based project. I started out with "lein new hello" to start the new project. This creates a "hello" directory, along with various standard sub-directories to get going.

The project that has been created is a Clojure project, rather than ClojureScript, so the project file needs to be updated to work with ClojureScript instead. This means opening up project.clj and updating it to include lein-cljsbuild along with the configuration for building the ClojureScript application.

The lein-cljsbuild build system is added as a plugin, along with a configuration in the project. I also like being able to run "lein compile", and this needs a "hook" to be added as well:

  :hooks [leiningen.cljsbuild]
  :plugins [[lein-cljsbuild "0.3.0"]]
  :cljsbuild {
    :builds [{
      :source-paths ["src/hello"]
      :compiler { :output-to "src/js/hello.js"
                  :target :nodejs
                  :optimizations :simple
                  :pretty-print true }}]}

Some things to note here:
  • The source path (a sequence of paths, but with just one element here) is where the ClojureScript code can be found.
  • The :output-to can be anywhere, but since it's source code and I wanted to look at it, I put it back into src, albeit in a different directory.
  • The :target has been set to :nodejs. Without this the system won't be able to refer to anything in Node.js.
  • The :optimizations are set to :simple. They can also be set to :advanced, but nothing else. (More on this later).
To start with, I ignored most of this and instead got it all running at the REPL. That meant setting up the dependencies in the project file to include cljs-noderepl:

  :dependencies [[org.clojure/clojure "1.5.0"]
                 [org.clojure/clojurescript "0.0-1586"]
                 [org.bodil/cljs-noderepl "0.1.7"]]

(Yes, Clojure 1.5.0 was released last Friday!)

Using the REPL requires that it be started with the appropriate classpath, which lein does for you when the dependencies are set up and you use the "lein repl" command. Once there, the ClojureScript compiler just needs to be "required" and then the compiler can be called directly, with similar options to those shown in the project file:

user=> (require 'cljs.closure)
nil
user=> (cljs.closure/build "src/hello" {:output-to "src/js/hello.js" :target :nodejs :optimizations :simple})
nil

Once you can do all of this, it's time to modify the source code:
  (ns hello.core)

  (defn -main []
    (println "Hello World"))

  (set! *main-cli-fn* -main)

The only trick here is setting *main-cli-fn* to the main function. This tells the compiler which function to run automatically.

The project starts with a core.clj file, which is what you'll end up editing. Compiling this appears to work fine, but when you run the resulting JavaScript you get an obscure error. The fix is to rename the source file to core.cljs. Once this has been done, the command-line compiler (called with "lein compile") will tell you which files were compiled (the compiler called from the REPL will remain silent). If the command-line compiler does not mention the files by name, then they weren't compiled.

I wanted to get a better sense of what was being created by the compiler, so I initially tried the optimization options :none and :whitespace, but then I got errors about undefined symbols when I tried to run the program. I reported this as a bug, but was told that it was a known issue with Google's Closure tool (which ClojureScript uses). The :simple optimizations seem to create semi-readable code though, so I think I can live with that.

Interestingly, compiling created a directory called "out" containing a lot of other code generated by the compiler. Inspection showed several files, including a core.cljs file that carries the Clojure core functions; I presume this gets fed into the ClojureScript compiler along with the user program. The remaining files are mostly glue for connecting things together. For instance, nodejscli.cljs contains:

(ns cljs.nodejscli
  (:require [cljs.nodejs :as nodejs]))

; Call the user's main function
(apply cljs.core/*main-cli-fn* (drop 2 (.-argv nodejs/process)))

This shows what ends up happening with the call to (set! *main-cli-fn* -main) that was required in the main program.

Where Now?

Since Node.js provides access to lots of system functions, I started to wonder just how far I could push this system. Browsing the Node.js API I found functions for I/O and networking, so there seems to be some decent scope in there. However, since performance is really important to V8 (which Node.js is built on), how about fast I/O operations like memory-mapping files?

I was disappointed to learn that there are in fact serious limits to Node.js's built-in API. However, since the engine is native code, there is nothing to prevent extending it to do anything you want, and that is exactly the sort of extension that "require" in JavaScript can load. It didn't take much to find a project that wraps mmap, though the author freely admits that it was an intellectual exercise and that users probably don't want to use it.

Using a library like mmap in ClojureScript was straightforward, but it showed up another little bug. Calling "require" from JavaScript returns an object containing the module's functions. Constant values in the module can be read using the usual interop operations. So to see the numeric value of PROT_READ, you can say:


(defn -main []
  (let [m (js/require "mmap")]
    (println "value: " m/PROT_READ)))


However, a module import like this seems to map more naturally to Clojure's require, so I tried a more "global" approach and def'ed the value instead:


(def m (js/require "mmap"))
(defn -main []
  (println "value: " m/PROT_READ))


However, this leads to an error in the final JavaScript. Fortunately, the . operator will work in this case. This also leads to one of the differences from Clojure: using the dot operator with fields requires that the field name have a leading dash:


(def m (js/require "mmap"))
(defn -main []
  (println "value: " (.-PROT_READ m)))


Finally, I was able to print the bytes out of a short test file with the following:

(ns pr.core
  (:use [cljs.nodejs :only [require]]))

(def u (require "util"))
(def fs (require "fs"))
(def mmap (js/require "mmap"))

(defn mapfile [filename]
  (let [fd (.openSync fs filename "r")
        sz (.-size (.fstatSync fs fd))]
    (.map mmap sz (.-PROT_READ mmap) (.-MAP_SHARED mmap) fd 0)))

(defn -main []
  (let [buffer (mapfile "data.bin")
        sz (alength buffer)]
   (println "size of file: " sz)
   (doseq [i (range sz)]
     (print " " (aget buffer i)))
   (println))) 

(set! *main-cli-fn* -main)

Now that I've seen it's possible, I'm inspired to think about reimplementing Mulgara's MappedBlockFile and AVLNode classes for Clojure and ClojureScript. That would also be a good opportunity to get a BTree implementation coded up.

I've been thinking of redoing these things for Clojure ever since starting on Datomic. Mulgara's phased trees are strikingly like Datomic's, with one important exception: in Mulgara we went to a lot of effort to reap old nodes for reuse. This is complex, and it had a performance cost. It was important 13 years ago when we first started, but things have changed since then. More recent work on parts of Mulgara recognized that we don't need to be so careful with disk space any more, but Rich had a clearer insight: not only can we keep older transactions, but we should. Once I realized that, I saw that it would be easy to improve performance dramatically in Mulgara. However, to make it worthwhile we'd have to expose the internal phase information in the same way that Datomic does.

Unfortunately, Mulgara is less interesting to me at the moment, since it's all in Java, which is why I'm moving to re-implement so much RDF work in Clojure at the moment. A start to that can be found in crg, crg-turtle, and kiara. Mulgara isn't going away... but it will get modernized.

More in ClojureScript

Family is calling, so I need to wrap this post up. Funnily enough, I didn't get to write about the thing that made me think to blog in the first place. That's bit manipulation.

You'll recall that I mentioned the lack of a BigInteger in ClojureScript (since it's a Java class). As an exercise in doing more in ClojureScript, and in learning how BigInteger is implemented, I started trying to port this class to Clojure (and ultimately, ClojureScript). The plumbing is easy enough, as I can just use a record and a protocol, but some of the internals have been trickier. That's because BigInteger packs the number into an array of bits, which is in turn represented as an int array. More significantly, BigInteger does lots of bit manipulation on these ints.
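The plumbing looks roughly like this (a sketch with hypothetical names, not the actual port):

  ;; A protocol standing in for a slice of BigInteger's interface,
  ;; and a record holding the sign and a vector of 32-bit int "limbs".
  (defprotocol IBigInteger
    (big-add [this other])
    (big-shift-left [this n]))

  (defrecord BigInt [sign magnitude])   ; magnitude is big-endian, like Java's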

Bit manipulation is supported in JavaScript and ClojureScript, but with some caveats. One problem is that ClojureScript doesn't expose JavaScript's non-sign-extending right-shift operation (>>>). I was surprised to learn that Clojure doesn't have it either (which seems strange to me, since it would be trivial to implement). The bigger problem is that numbers are stored as floating-point values. Fortunately, bit operations on numbers of 32 bits or smaller work as expected. However, integers can get larger than this limit, and when that happens, some of the bit operations stop working. It's possible to do bit manipulation on values up to 53 bits, but signed values won't work as expected in that range, so it's basically off the table.
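The 32-bit boundary is easy to trip over (a ClojureScript sketch, assuming JavaScript's bit-operation semantics):

  ;; JavaScript bit operations coerce operands to signed 32-bit integers,
  ;; so setting bit 31 produces a negative number...
  (bit-shift-left 1 31)             ; => -2147483648
  ;; ...and any bits above the 32nd are discarded before the operation runs.
  (bit-and 0x1FFFFFFFF 0xFFFFFFFF)  ; not what the full 33-bit value suggests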

Anyway, I have a lot to learn yet, and a long way to go before I can even compare a factorial in ClojureScript against the same operation in Clojure, but in the meantime, it's fun. Searching for Node.js modules has shown that a lot of early exploration is being done by people from all over. Some of the recent threading modules are a good example of this.

I have a feeling that Node.js will get more important as time goes on, and with it, ClojureScript.

Tuesday, August 28, 2012

Tnetstrings

A Forward Reference

Having had a couple of responses to the last post, I couldn't help but revisit the code. There's not much to report, but since I've had some feedback from a few people who are new to Clojure I thought that there were a couple of things that I could mention.

First (and you can see this in the comments to the previous post), I wrote my original code in a REPL and pasted it into the blog. Unfortunately, this caused me to miss a forward reference to the parse-t function. On my first iteration of the code, I wasn't trying to parse all the data types, so the map of parsers didn't need to recurse into the parse-t function. However, by the time I updated the map, the parse-t function had been fully defined in my REPL, so the references worked just fine.
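In a source file (as opposed to a REPL, where the var already existed), the usual fix is a forward declaration; here is a tiny stand-alone illustration with hypothetical names:

  ;; declare creates the var up front, so the parser map can refer to
  ;; parse-t before its definition appears later in the file.
  (declare parse-t)

  (def parsers {\] (fn [s] (parse-t s))})

  (defn parse-t [msg] msg)   ; stand-in body, for illustration only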

Testing

That brings me to my second and third points: testing and Leiningen. As is often the case, I found the issues by writing and running tests. Setting up an environment for tests can be annoying for some systems, particularly for such a simple function. However, using Leiningen makes it very easy. The entire project was built using Leiningen, and was set up with the simple command:
  lein new tnetstrings
I'll get on to Leiningen in a moment, but for now I'll stick with the tests.

Clojure tests are easy to set up and use. They are based on a DSL built out of a set of macros that are defined in clojure.test. The two main macros are deftest and is. deftest is used to define a test in the same way that defn is used to define a function, sans a parameter definition. In fact, a test is a function, and can be called directly (it takes no parameters). This is very useful to run an individual test from a REPL.

The other main macro is called "is" and is simply used to assert the truth of something. This macro is used inside a test.

My tests for tnetstrings are very simple:

(ns tnetstrings.test.core
  (:use [tnetstrings.core :only [parse-t]]
        [clojure.test]))

(deftest test-single
  (is (= ["hello" ""] (parse-t "5:hello,")))
  (is (= [42 ""] (parse-t "2:42#")))
  (is (= [3.14 ""] (parse-t "4:3.14^")))
  (is (= [true ""] (parse-t "4:true!")))
  (is (= [nil ""] (parse-t "0:~"))))

(deftest test-compound
  (is (= [["hello" 42] ""] (parse-t "13:5:hello,2:42#]")))
  (is (= [{"hello" 42} ""] (parse-t "13:5:hello,2:42#}")))
  (is (= [{"pi" 3.14, "hello" 42} ""] (parse-t "25:5:hello,2:42#2:pi,4:3.14^}"))))


Note that I've brought in the tnetstrings.core namespace (my source code), and only referenced the parse-t function. I always try to list the specific functions I want in a use clause, though I'm not usually so particular when writing test code. You'll also see clojure.test. As mentioned, this is necessary for the deftest and is macros. It is worth pointing out that both of these use clauses were automatically generated for me by Leiningen, along with the first deftest.

I could have created a convenience function that just extracted the first element out of the returned tuple, thereby making the tests more concise. However, I intentionally tested the entire tuple, to ensure that nothing was being left at the end. I ought to create a string with some garbage at the end as well, to see that being returned, but the array and map tests have this built in... and I was being lazy.
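For what it's worth, the test I skipped would look something like this (hypothetical data, assuming the parse-t from the previous post):

  (deftest test-trailing-garbage
    ;; anything after the parsed value should come back as the remainder
    (is (= ["hello" "extra"] (parse-t "5:hello,extra"))))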

Something else that caught me out was that when I parsed a floating-point number, I did it with java.lang.Float/parseFloat. This worked fine, but by default Clojure uses double values, and all floating-point literals are read that way. Consequently the tests around "4:3.14^" failed with messages like:

expected: (= [3.14 ""] (parse-t "4:3.14^"))
  actual: (not (= [3.14 ""] [3.14 ""]))

What isn't shown here is that the two values of 3.14 have different types (float vs. double). Since Clojure prefers double, I changed the parser to use java.lang.Double/parseDouble and the problem was fixed.
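The mismatch is easy to see at a REPL:

  ;; (float 3.14) widens to 3.140000104904175 when compared as a double
  (= 3.14 (float 3.14))                  ; => false
  (= 3.14 (Double/parseDouble "3.14"))   ; => true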

Leiningen

For anyone unfamiliar with Leiningen, here is a brief rundown of what it does. Running the new command sets up a directory structure and a number of stub files for a project. By default, two of these directories are src/ and test/. Under src/ you'll find a stub source file (complete with namespace definition) for the main source code, and under test/ you'll find a stub test file, again with the namespace defined, and with clojure.test already brought in for you. In my case, these two files were:

  • src/tnetstrings/core.clj
  • test/tnetstrings/test/core.clj

To get running, all you have to do is put your code into the src/ file, and put your tests into the test/ file. Once this is done, you use the command:
  lein test
to run the tests. Clojure gets compiled as it is run, so any problems in syntax and grammar can be found this way as well.

However, one of the biggest advantages of using this build environment is the ease of bringing in libraries. Using Leiningen can be similar to using Maven, without much of the pain, and indeed, Leiningen even offers a pom command to generate a Maven POM file. It automatically downloads packages from both Clojars and Maven repositories, so this feature alone makes it valuable.

Leiningen is configured with a file called project.clj which is autogenerated when a project is created. This file is relatively easy to configure for simple things, so rather than delving into it here, I'll let anyone new to the system go to the project page and sample file to learn more about it.

project.clj also works for some not-so-simple setups, but it gets more and more difficult the fancier it gets. It's relatively easy to update the source path, test path, etc, to mimic Maven directory structures, which can be useful, since the Maven structure allows different file types (e.g. Java sources, resources) to be stored in different directories. But since I always want this, it's annoying that I always have to manually configure it.

I'm also in the process of copying Alex Hall's setup for pre-compiling Antlr parser definitions so that I can do the same with Beaver. Again, it's great that I can do this with Leiningen, but it's annoying to do so. I shouldn't be too harsh though, as the way that extensions are done looks more like it derives from the flexibility of Clojure than from Leiningen itself.

Wednesday, August 22, 2012

Clojure DC

Tonight was the first night for Clojure DC, which is a meetup group for Clojure users. It's a bit of a hike for me to get up there, but I dread getting too isolated from other professionals, so I decided it was worth making the trip despite the distance and traffic. Luckily, I was not disappointed.

Although I was late (have you ever tried to cross the 14th Street Bridge at 6pm? Ugh) not much had happened beyond some pizza consumption. The organizers, Matt and Chris, had a great venue to work with, and did a good job of getting the ball rolling.

After introductions all around, Matt and Chris gave a description of what they're hoping to do with the group, prompted us with ideas for future meetings, and asked for feedback. They suggested perhaps doing some Clojure Koans, and in that spirit they provided new users with an introduction to writing Clojure code (not an intro to Clojure, but an intro to writing code), by embarking on a function to parse tnetstrings. I'd never heard of these, but they're a similar concept to JSON, only the encoding and parsing are even simpler.

This part of the presentation was fun, since Matt and Chris had a banter that was reminiscent of Daniel Friedman and William Byrd presenting miniKanren at Clojure/Conj last year. While writing the code they asked for feedback, and I was pleased to learn a few things from some of the more experienced developers who'd shown up (notably, Relevance employee Craig Andera, and ex-Clojure.core developer, and co-author of my favorite Clojure book, Michael Fogus). For instance, while I knew that maps operate as functions where they look up an argument in themselves, I did not know that they can optionally accept a "not-found" parameter like clojure.core/get does. I've always used "get" to handle this in the past, and it's nice to know I can skip it.
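For anyone else who missed this, the two forms are equivalent:

  ;; a map called as a function accepts an optional not-found value,
  ;; just like clojure.core/get
  ({:a 1} :b :missing)       ; => :missing
  (get {:a 1} :b :missing)   ; => :missing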

While watching what was going on, I decided that a regex would work nicely. So I ended up giving it a go myself. The organizers stopped after parsing a string and a number, but I ended up doing the lot, including maps and arrays. Interestingly, I decided I needed to return a tuple, and after I finished I perused the reference Python implementation and discovered that this returned the same tuple. Always nice to know when you're on the right track. :-)

Anyway, my attempt looked like:

(ns tnetstrings.core)

(declare parse-t) ; forward declaration: the map below refers back to parse-t

(def type-map {\, identity
               \# #(Integer/parseInt %)
               \^ #(Float/parseFloat %)
               \! #(Boolean/parseBoolean %)
               \~ (constantly nil)
               \} (fn [m] (loop [mp {} remainder m]
                            (if (empty? remainder)
                              mp
                              (let [[k r] (parse-t remainder)
                                    [v r] (parse-t r)]
                                (recur (assoc mp k v) r)))))
               \] (fn [m] (loop [array [] remainder m]
                            (if (empty? remainder)
                              array
                              (let [[a r] (parse-t remainder)]
                                (recur (conj array a) r)))))})

(defn parse-t [msg]
  (if-let [[header len] (re-find #"^([0-9]+):" msg)]
    (let [head-length (count header)
          data-length (Integer/parseInt len)
          end (+ data-length head-length)
          parser (type-map (nth msg end) identity)]
      [(parser (.substring msg head-length end)) (.substring msg (inc end))])))

There are lots of long names in here, but I wasn't trying to play "golf". The main reason I liked this was because of the if-let I introduced. It isn't perfect, but if the data doesn't start out correctly, then the function just returns nil without blowing up.

While this worked, it was bothering me that both the array and the map forms looked so similar. I thought about this in the car on the way home, and I recalled the handy equivalence:

(= (assoc m k v) (conj m [k v]))

So with this in hand, I had another go when I got home:

(ns tnetstrings.core)

(defn embedded [s f]
  (fn [m] (loop [data s remainder m]
            (if (empty? remainder)
              data
              (let [[d r] (f remainder)]
                (recur (conj data d) r))))))

(declare parse-t) ; forward declaration: the map below refers back to parse-t

(def type-map {\, identity
               \# #(Integer/parseInt %)
               \^ #(Float/parseFloat %)
               \! #(Boolean/parseBoolean %)
               \~ (constantly nil)
               \} (embedded {} (fn [m] (let [[k r] (parse-t m)
                                             [v r] (parse-t r)]
                                         [[k v] r])))
               \] (embedded [] (fn [m] (let [[a r] (parse-t m)]
                                         [a r])))})

(defn parse-t [msg]
  (if-let [[header len] (re-find #"^([0-9]+):" msg)]
    (let [head-length (count header)
          data-length (Integer/parseInt len)
          end (+ data-length head-length)
          parser (type-map (nth msg end) identity)]
      [(parser (.substring msg head-length end)) (.substring msg (inc end))])))

So now each of the embedded structures is based on a function returned from "embedded". This contains the general structure of:

  • Seeing if there is anything left to parse.
  • If not, then return the already parsed data.
  • If so, then parse it, and add the parsed data to the structure before repeating on the remaining string to be parsed.
In the case of the array, just one element is parsed by re-entering the main parsing function. The result is just the returned data. In the case of the map, the result is a key/value tuple, obtained by re-entering the parsing function twice. By wrapping the key/value like this we not only get to return it as a single "value", but it's also in the form required for the conj function that is used on the provided data structure (vector or map).

The result looks a little noisy (lots of brackets and parentheses), but I think it abstracts out the operations much better. Exercises like this are designed to help you think about problems the right way, so I think it was a great little exercise.

Other than this code, I also got the chance to chat with a few people, which was the whole point of the trip. It's getting late, so I won't go into those conversations now, but I was pleased to hear that many of them will be going to Clojure/Conj this year.

Tuesday, May 22, 2012

Clojure Lessons

Recently I've been working with Java code in a Spring framework. I'm not a big fan of Spring, since the bean approach means that everything has a similar public face, which means that the data types don't document the system very well. The bean approach also means that most types can be plugged into most places (kind of like Lego), but just because something can be connected doesn't mean it will do anything meaningful. It can make for a confusing system. As a result, I'm not really having much fun at work.

To get myself motivated again, I thought I'd try something fun and render a Mandelbrot set. I know these are easy, but it's something I've never done for myself. I also thought it might be fun to do something with graphics on the JVM, since I'm always working on server-side code. Turned out that it was fun, keeping me up much later than I ought to have been. Being tired tonight I may end up rambling a bit. It may also be why I've decided to spell "colour" the way I grew up with, rather than the US way (except in code. After all, I have to use the Color class, and it's just too obtuse to have two different spellings in the same program).

To get my feet wet, I started with a simple Java application, with a plan to move it into Clojure. My approach gave me a class called Complex that can do the basic arithmetic (trivial to write, but surprising that it's not already there), and an abstract class called Drawing that does all of the Window management and just expects the implementing class to implement paint(Graphics). With that done it was easy to write a pair of functions:

  • coord2Math to convert a canvas coordinate into a complex number.
  • mandelbrotColor to calculate a colour for a given complex number (using a logarithmic scale, since linear shows too many discontinuities in colour).
Drawing this onto a graphical context is easy in Java:
for (int x = 0; x < gWidth; x++) {
  for (int y = 0; y < gHeight; y++) {
    g.setColor(mandelbrotColor(coord2Math(x, y)));
    plot(g, x, y);
  }
}

(plot(Graphics,int,int) is a simple function that draws one pixel at the given location).

A small image (300x200 pixels) on this MacBookPro takes ~360ms. A big one (1397x856) took ~11500ms. Room for improvement, but it'll do. So with a working Java implementation in hand, I turned to writing the same thing in Clojure.
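For reference, the escape-time calculation at the heart of mandelbrotColor can be sketched like this (written in the Clojure style used below; the iteration limit, the colour scaling, and the use of the Complex operations here are my own simplified assumptions, not the actual code):

```clojure
(def max-iterations 512)

(defn escape-count
  "Iterate z -> z*z + c from zero, returning how many steps it takes
  for |z| to exceed 2 (or max-iterations if it never escapes)."
  [c]
  (loop [z (Complex. 0.0 0.0), i 0]
    (if (or (= i max-iterations) (> (abs-val z) 2.0))
      i
      (recur (plus (times z z) c) (inc i)))))

(defn mandelbrot-color
  "Map the escape count to a grey level on a logarithmic scale, since
  a linear scale shows too many discontinuities in colour."
  [c]
  (let [n (escape-count c)
        level (int (* 255 (/ (Math/log (inc n))
                             (Math/log (inc max-iterations)))))]
    (java.awt.Color. level level level)))
```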

Clojure Graphics

Initially I tried extending my Drawing class using proxy, with a plan of moving to an implementation completely in Clojure. However, after getting it working that way I realized that doing the entire thing in Clojure wasn't going to take much at all, so I did that straight away. The resulting code is reasonably simple and boilerplate:

(def window-name "Mandelbrot")
(def draw-fn)

(defn new-drawing-obj []
  (proxy [JPanel] []
    (paint [^Graphics graphics-context]
      (let [width (proxy-super getWidth)
            height (proxy-super getHeight)]
        (draw-fn graphics-context width height)))))

(defn show-window []
  (let [^JPanel drawing-obj (new-drawing-obj)
        frame (JFrame. window-name)]
    (.setPreferredSize drawing-obj (Dimension. default-width default-height))
    (.add (.getContentPane frame) drawing-obj)
    (doto frame
      (.setDefaultCloseOperation JFrame/EXIT_ON_CLOSE)
      (.pack)
      (.setBackground Color/WHITE)
      (.setVisible true))))

(defn start-window []
  (SwingUtilities/invokeLater #(show-window)))

Calling start-window sets off a thread that will run the event loop and then call the show-window function. That function uses new-drawing-obj to create a proxy object that handles the paint event. Then it sets the size of the panel, puts it into a frame (the main window), and sets up the frame for display.

The only thing that seems worth noting from a Clojure perspective is the proxy object returned by new-drawing-obj. This is a simple extension of javax.swing.JPanel that implements the paint(Graphics) method of that class. Almost every part of the drawing can be done in an external function (draw-fn here), but the width and height are obtained by calling getWidth() and getHeight() on the JPanel object. That object isn't directly available to the draw-fn function, nor is it available through a name like "this". The object is returned from the proxy function, but it's out of scope for the paint method to access. The only reasonable way to access methods that are inherited in the proxy is with the proxy-super function (I can think of some unreasonable ways as well, like setting a reference to the proxy, and using this reference in paint. But we won't talk about that kind of abuse).

While I haven't shown it here, I also wanted to close my window by pressing the "q" key. This takes just a couple of lines of code, whereby a proxy for KeyListener is created, and then added to the frame via (.addKeyListener the-key-listener-proxy). Compared to the equivalent code in Java, it's strikingly terse.
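As a sketch of what those couple of lines might look like (using a KeyAdapter proxy so that only one method needs implementing; the function name and the choice to dispose of the frame are my own assumptions):

```clojure
(defn quit-on-q
  "Returns a KeyListener that disposes of the frame when 'q' is pressed."
  [^JFrame frame]
  (proxy [java.awt.event.KeyAdapter] []
    (keyPressed [^java.awt.event.KeyEvent e]
      (when (= java.awt.event.KeyEvent/VK_Q (.getKeyCode e))
        (.dispose frame)))))

;; attached to the frame with:
;; (.addKeyListener frame (quit-on-q frame))
```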

Rendering

The Java code for rendering used a pair of nested loops to generate coordinates, and then calculated the colour for each coordinate as it went. However, this imperative style of coding is something to explicitly avoid in any kind of functional programming. So the question for me at this point, was how should I think about the problem?

Each time mandelbrotColor was called, it mapped a coordinate to a colour. This gave me my first hint. I needed to map coordinates to colours. This implies calling map on a seq of coordinates, and ending up with a seq of colours. (Actually, not a seq, but rather a reducible collection). However, what order are the colours in? Row-by-row? That would work, but it would involve keeping a count of the offset while working over the seq, which seems onerous, particularly when the required coordinates were available when the colour was calculated in the first place. So why not include the coordinates in the seq with the colour? Not only does that simplify processing, it makes the rendering of this map stateless, since any element of the seq could be rendered independently of any other.

Coordinates can be created as pairs of integers using a comprehension:

  (for [x (range width) y (range height)] [x y])

and the calculation can be done by mapping on a function that unpacks x and y and returns a triple of these two coordinates along with the calculated colour. I'll rename x and y to "a" and "b" in the mapping function to avoid ambiguity:

  (map (fn [[a b]] [a b (mandelbrot-color (coord-2-math a b))])
       (for [x (range width) y (range height)] [x y]))

So now we have a sequence of coordinates and colours, but how do these get turned into an image? Again, the form of the problem provides the solution. We have a sequence (of tuples), and we want to reduce it into a single value (an image). Reductions like this are done using reduce. The first parameter for the reduction function will be the image, the second will be the next tuple to draw, and the result will be a new image with the tuple drawn in it. The reduce function isn't really supposed to mutate its first parameter, but we don't want to keep the original image without the pixel to be drawn, so it works for us here. The result is the following reduction function (type hint provided to avoid reflection on the graphical context):

  (defn plot [^Graphics g [x y c]]
    (.setColor g c)
    (.fillRect g x y 1 1)
    g)

Note that the original graphics context is returned, since this is the "new" value that plot has created (i.e. the image with the pixel added to it). Also, note that the second parameter is a 3 element tuple, which is just unpacked into x, y, and c.

So now the entire render process can be given as:

(reduce plot g
  (map (fn [[a b]] [a b (mandelbrot-color (coord-2-math a b))])
       (for [x (range width) y (range height)] [x y])))

This works just fine, but there were performance issues, which was the part of this process that was most interesting. The full screen render (1397x856) took ~682 seconds (up from the 11.5 seconds it took Java). Obviously there were a few things to be fixed. There is still more to do, but I'll share what I came across so far.

Reflection

The first thing that @objcmdo suggested was to look for reflection. I planned on doing that, but thought I'd continue cleaning the program up first. The Complex class was still written in Java, so I embarked on rewriting that in Clojure.

The easiest way to do this was to implement a protocol that describes the actions (plus, minus, times, divide, absolute value), and to then define a record (of real/imaginary) that extends the protocol. It would have been nicer than the equivalent Java, but for one thing. Java allows method overloading based on parameter types, which means that a method like plus can be defined differently depending on whether it receives a double value, or another Complex number. My understanding is that Clojure only overloads functions based on the parameter count, meaning that different function names are required to redefine the same operation for different types. So for instance, the plus functions were written in Java as:

  public final Complex plus(Complex that) {
    return new Complex(real + that.real, imaginary + that.imaginary);
  }

  public final Complex plus(double that) {
    return new Complex(real + that, imaginary);
  }
But in Clojure I had to give them different names:
  (plus [this {that-real :real, that-imaginary :imaginary}]
        (Complex. (+ real that-real) (+ imaginary that-imaginary)))
  (plus-dbl [this that] (Complex. (+ real that) imaginary))

Not a big deal, but code like math manipulation looks prettier when function overloading is available.
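To show the shape of what I mean, the protocol and record look something like the following (the protocol name and the exact set of operations here are illustrative, not the real code):

```clojure
(defprotocol ComplexArithmetic
  (plus [this that])
  (plus-dbl [this d])
  (times [this that])
  (abs-val [this]))

(defrecord Complex [real imaginary]
  ComplexArithmetic
  (plus [this {that-real :real, that-imaginary :imaginary}]
    (Complex. (+ real that-real) (+ imaginary that-imaginary)))
  (plus-dbl [this that] (Complex. (+ real that) imaginary))
  (times [this {that-real :real, that-imaginary :imaginary}]
    (Complex. (- (* real that-real) (* imaginary that-imaginary))
              (+ (* real that-imaginary) (* imaginary that-real))))
  (abs-val [this] (Math/sqrt (+ (* real real) (* imaginary imaginary)))))
```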

It may be worth pointing out that I used the names of the operations (like "plus") instead of the symbolic operators ("+"). While the issue of function overloading would have made this awkward (+dbl is no clearer than plus-dbl) it has the bigger problem of clashing with functions of the same name in clojure.core. Some namespaces do this (the * character is a popular one to reuse), but I don't like it. You have to explicitly reject it from your current namespace, and then you need to refer to it by its full name if you do happen to need it. Given that Complex needs to manipulate internal numbers, these original operators are needed.

So I created my protocol containing all the operators, defined a Complex record to implement it, and then I replaced all use of the original Java Complex class. Once I was finished I ran it again just to make sure that I hadn't broken anything.

To my great surprise, the full screen render went from 682 seconds down to 112 seconds. Protocols are an efficient mechanism, but they shouldn't be that good. At that point I realised that I hadn't used type hints around the Complex class, and that as a consequence the Clojure code had to perform reflection on the complex numbers. Just as @objcmdo had suggested.

Wondering what other reflection I may have missed, I tried enabling the *warn-on-reflection* flag in the repl, but no warnings were forthcoming. I suspect that this was being subverted by the fact that the code is all being run by a thread that belongs to the Swing runtime. I tried adding some other type hints, but nothing I added had any effect, meaning that the Clojure compiler was already able to figure out the types involved (or else it just wasn't in a critical piece of code).
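For completeness, the flag itself is just a dynamic var that has to be set before the suspect code is compiled. A minimal illustration of what it catches:

```clojure
;; must be set before the code in question is loaded/compiled:
(set! *warn-on-reflection* true)

;; this form triggers a reflection warning, since the type of e is unknown:
(defn keycode [e] (.getKeyCode e))

;; while a type hint resolves the call at compile time, with no warning:
(defn keycode-hinted [^java.awt.event.KeyEvent e] (.getKeyCode e))
```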

Composable Abstractions

The next thing I wondered about was the map/reduce part of the algorithm. While it made for elegant programming, it was creating unnecessary tuples at every step of the way. Could these be having an impact?

Once you have a nice list comprehension, it's tough to break it out into an imperative-style loop. Aside from ruining the elegance of the original construct, once you've seen your way through to viewing a problem in such clear terms, it's difficult to reconceptualize it as a series of steps. Even when you do, how do you make Clojure work against itself?

Creating a loop without burning through resources can be done easily with tail recursion. Clojure doesn't do this automatically (since the JVM does not provide for it), but it can be emulated well with loop/recur. Since I want to loop between 0 (inclusive) and the width/height (exclusive), I decremented the upper limits for convenience. Also, the plot function is no longer constrained to just 2 arguments, so I changed the definition to accept all 4 arguments directly, thereby eliminating the need to construct that 3-tuple:

(let [dwidth (dec width)
      dheight (dec height)]
  (loop [x 0 y 0]
    (let [[next-x next-y] (if (= x dwidth)
                              (if (= y dheight)
                                  [-1 -1]      ;; signal to terminate
                                  [0 (inc y)])
                              [(inc x) y])]
      (plot g x y (mandelbrotColor (coord-2-math x y)))
      (if (= -1 next-x)
        :end    ;; anything can be returned here
        (recur next-x next-y)))))

My word, that's ugly. The let that assigns next-x and next-y has a nasty nested if construct that increments x and resets it at the end of each row. It also returns a flag (could be any invalid number, such as the keyword :end) to indicate that the loop should be terminated. The loop itself terminates by testing for the termination value and returning a value that will be ignored.
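(In hindsight, doseq over the same ranges expresses this side-effecting iteration with none of the index bookkeeping. A sketch of the same loop, assuming the 4-argument plot:)

```clojure
;; equivalent side-effecting iteration, without the next-x/next-y machinery:
(doseq [x (range width) y (range height)]
  (plot g x y (mandelbrot-color (coord-2-math x y))))
```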

But it all works as intended. Now instead of creating a tuple for every coordinate, it simply iterates through each coordinate and plots the point directly, just as the Java code did. So what's the performance difference here?

So far, the numbers I've provided are rounded to the nearest second. Repeated runs have usually taken a similar amount of time to the ones that I've reported here. However, there is always some jitter, sometimes by several seconds. Because of this, I was unable to see any difference whatsoever between using map/reduce on a for comprehension, versus using loop/recur.

That's an interesting result, since it shows that the Clojure compiler and JVM really are as clever as we're told: the better abstraction runs just as efficiently as the direct approach. It's all well and good for a language to make it easy to write powerful constructs, but when the more elegant code performs just as well as the direct, imperative version, the language is offering genuinely useful power.

Aside from the obvious clarity issues, the composability of the for/map/reduce makes an enormous difference. Because each element in the range being mapped is completely independent, we are free to use the pmap function instead of map. The documentation claims that this function is,

"Only useful for computationally intensive functions where the time of f dominates the coordination overhead."

Yup. That's us.

So how much does this change make for us? Using map on the current code, a full screen render takes 112 seconds. Changing map to pmap improves it to 75 seconds. That's a 33% improvement with no work, simply because the correct abstraction was applied. That's a very powerful abstraction.
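The change really is a single token. Assuming the map/reduce form from earlier, the parallel version is just:

```clojure
(reduce plot g
        (pmap (fn [[a b]] [a b (mandelbrot-color (coord-2-math a b))])
              (for [x (range width) y (range height)] [x y])))
```

Note that the reduce itself still runs on one thread; it's only the colour calculations (the expensive part) that get spread across cores.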

Future Work

(Hmmm, that makes this sound like an academic paper. Should I be drawing charts?)

The final result is still a long way short of the 11.5 seconds the naïve Java code renders at. The single threaded version is particularly bad, taking about 10 times as long. I don't expect Clojure to be as fast as Java, but a factor of 10 suggests that there are some obvious things that I've missed, most likely related to reflection. If I can get it down to the same order of magnitude as the Java code, then using pmap could make the Clojure version faster due to being multi-threaded. Of course, Java can be multi-threaded as well, but the effort and infrastructure for doing this would be significant.

Sunday, September 04, 2011

SPARQL JSON

After commenting the other day that Fuseki was ignoring my request for a JSON response, I was asked to submit a bug report. It's easy to cast unsubstantiated stones in a blog, but a bug report is a different story, so I went back to confirm everything. In the process, I looked at the SPARQL 1.1 Query Results JSON Format spec (I can't find a public copy of it, so no link, sorry. UPDATE: Found it.) and was chagrined to discover that it has its own MIME type of "application/sparql-results+json". I had been using the JSON type of "application/json", and this indeed does not work, but the corrected type does. I don't think it's a good idea to ignore "application/json" so I'll report that, but strictly speaking it's correct (at least, I think so. As Andy said to me in a back channel, I don't really know what the + is supposed to mean in the subtype). So Fuseki got it right. Sorry.

When I finally get around to implementing this for Mulgara I'll try to handle both. Which reminds me... I'd better get some more SPARQL 1.1 implemented. My day job at Revelytix needs me to do some Mulgara work for a short while, so that may help get the ball rolling for me.

separate

I commented the other day that I wanted a Clojure function that takes a predicate and a seq, and returns two seqs: one that matches the predicate and the other that doesn't match. I was thinking I'd build it with a loop construct, to avoid recursing through the seq twice.

The next day, Gary (at work) suggested the separate function from Clojure Contrib. I know there's some good stuff in contrib, but unfortunately I've never taken the time to fully audit it.

The implementation of this function is obvious:
(defn separate [f s]
  [(filter f s) (filter (complement f) s)])
I was disappointed to learn that this function iterates twice, but Gary pointed out that there had been a discussion on exactly this point, and the counter argument is that this is the only way to build the results lazily. That's a reasonable point, and it is usually one of the top considerations for Clojure implementations. I don't actually have cause for complaint anyway, since the seqs I'm using are always small (being built out of query structures).
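For contrast, a single traversal is easy to write with reduce, at the cost of that laziness (the name separate-eager is my own):

```clojure
(defn separate-eager
  "Walks s once, building both result vectors eagerly."
  [f s]
  (reduce (fn [[yes no] x]
            (if (f x)
              [(conj yes x) no]
              [yes (conj no x)]))
          [[] []]
          s))

;; (separate-eager odd? [1 2 3 4 5]) => [[1 3 5] [2 4]]
```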

This code was also a good reminder that (complement x) offers a concise replacement for the alternative code:
  #(not (x %))
By extension, it's a reminder to brush up on my idiomatic Clojure. I should finish the book The Joy of Clojure (which is a thoroughly enjoyable read, by the way).

Thursday, September 01, 2011

ASTs


After yesterday's post, I noticed that a number of references came to me via Twitter (using a t.co URL). After looking for it, I realized that I'm linked to from the Planet Clojure page. I'm not sure why, though I guess it's because I'm working for Revelytix, and we're one of the larger Clojure shops around. Only my post wasn't about Clojure - it was about JavaScript. So now I feel obligated to write something about Clojure. Besides, I'm out of practice with my writing, so it would do me good.

So the project I spend most of my time on (for the moment) is called Rex. It's a RIF based rules engine written entirely in Clojure. It does lots of things, but the basic workflow for running rules is:
  1. Parse the rules out of the rule file and into a Concrete Syntax Tree. We can parse the RIF XML format, the RIF Presentation Syntax, and the RIF RDF format, and we plan on doing others.
  2. Run a set of transformations on the CST to convert it into an appropriate Abstract Syntax Tree (AST). This can involve some tricky analysis, particularly for aggregates (which are a RIF extension).
  3. Transformation of the AST into SPARQL 1.1 query fragments.
  4. Execute the engine, by processing the rules to generate more data until the capacity to generate new data has been exhausted. (My... that's a lot of hand waving).
It's step 3 that I was interested in today.

The rest of this post is about how Rex processes a CST into an AST using Clojure, and about some subsequent refactoring that went on. You have been warned...

When I first wrote the CST to AST transformation step, it was to do a reasonably straight forward analysis of the CST. Most importantly, I needed to see the structure of the rule so that I could see what kind of data it depends on, thereby figuring out which other rules might need to be run once a given rule was executed. Since the AST is a tree structure, this made for relatively straight forward recursive functions.

Next, I had to start identifying some CST structures that needed to be changed in the AST. This is where it got more interesting. Again, I had to write recursive functions, but instead of simply analyzing the data, it had to be changed. It turns out that this is handled easily by having a different function for each type of node in the tree. In the normal case the function then recurses on all of its children, and constructs an identical node type using the new children. The leaf nodes then just return themselves. The "different function" for each type is actually accessed with the same name, but dispatches on the node type. In Java that would need a visitor pattern, or perhaps a map of types to functors, but in Clojure it's handled trivially with multimethods or protocols. Unfortunately, the online resources for describing the multi-dispatch aspects of Clojure protocols are not clear, but Luke VanderHart and Stuart Sierra's book Practical Clojure covers it nicely.

As an abstract example of what I mean, say I have an AST consisting of Conjunctions, Disjunctions and leaf nodes. Both Conjunctions and Disjunctions have a single field that contains a seq of the child nodes. These are declared with:
(defrecord Conjunction [children])

(defrecord Disjunction [children])
The transformation function can be called tx, and I'll define it with multiple dispatch on the node type using multimethods:

(defmulti tx class)

(defmethod tx Disjunction [{c :children}]
  (Disjunction. (map tx c)))

(defmethod tx Conjunction [{c :children}]
  (Conjunction. (map tx c)))

(defmethod tx :default [n] n)
This will reconstruct an identical tree to the original, though with all new nodes except the leaves. Now, the duplication in the Disjunction and Conjunction methods should be ringing alarm bells, but in real code the functions have more specific jobs to do. For instance, the Conjunction may want to group terms that meet a certain condition (call the test "special-type?") into a new node type (call it "Foo"):
;; A different definition for tx on Conjunction

(defmethod tx Conjunction [{c :children}]
  (let [new-children (map tx c)
        special-nodes (filter special-type? new-children)
        other-nodes (filter (comp not special-type?) new-children)]
    (Conjunction. (conj other-nodes (Foo. special-nodes)))))
Hmmm... while writing that example I realized that I regularly run into the pattern of filtering out everything that meets a test, and everything that fails the test. Other than having to test everything twice, it seems too verbose. What I need is a function that will take a seq and a predicate and return a tuple containing a seq of everything that matches the predicate, and a second seq of everything that fails the predicate. I'm not seeing anything like that right now, so that may be a job for the morning.

I should note, that there is no need for a function to return the same type that came into it. There are several occasions where Rex returns a different type. For example, a conjunction between a Basic Graph Pattern (BGP) and the negation of another BGP becomes a MINUS operation between the two BGPs (a Basic Graph Pattern comes from SPARQL and is just a triple pattern for matching against the subject/predicate/object of a triple in an RDF store).

Overall, this approach works very well for transforming the CST into the full AST. As I've needed to incorporate more features and optimizations over time, I found that I had two choices. Either I could expand the complexity of the operation for every type in the tree processing code, or I could perform different types of analysis on the entire tree, one after another. The latter makes the process far easier to understand, making the design more robust and debugging easier, so that's how Rex has been written. It makes analysis slightly slower, but analysis is orders of magnitude faster than actually running the rules, so that is not a consideration.

Threading

The first time through, analysis was simple:
(defn analyse [rules]
  (process rules))
Adding a new processing step was similarly easy:
Adding a new processing step was similarly easy:
(defn analyse [rules]
  (process2 (process1 rules)))
But once a third, then a fourth step appeared, it became obvious that I needed to use the Clojure threading macro:
(defn analyse [rules]
  (-> rules
      process1
      process2
      process3
      process4))
So now it's starting to look nice. Each step in the analysis process is a single function name implemented for various types. These names are then provided in a list of things to be applied to the rules, via the threading macro. There's a little more complexity (one of the steps picks up references to parts of the tree, and since each stage changes the tree, then these references will be pointing to an old and unused version of the tree. So that step has to be last), but it paints the general picture.

Partial

Another thing that I've glossed over is that each rule is actually a record that contains the AST as one of its members. The rules themselves are a seq which is in turn a member of a record containing a "Rule Program". So each process step actually ended up being like this:
(defn process1 [rule-program]
  (letfn [(process-rule [rule] (assoc rule :body (tx (:body rule))))]
    (assoc rule-program :rules (map process-rule (:rules rule-program)))))
Did you follow that?

The bottom line is replacing the :rules field with a new one. It's mapping process-rule onto the seq of rules, and storing the result in a new rules-program, which is what gets returned. The process-rule function is defined locally as associating the :body of a rule with the tx function applied to the existing body. This creates a new rule that has had tx applied to it.

This all looked fine to start with. A new rule program is created by transforming all the rules in the old program. A transformed rule is created by transforming the AST (the :body) in the old rule. But after the third analysis it became obvious that there was duplication going on. In fact, it was all being duplicated except for the tx step. But that was buried deep in the function. What was the best way to pull it out?

To start with, the embedded process-rule function came out. After all, it was just inside to hide it, and not because it had to pick up a closure anywhere. This function then accepts the kind of transformation that it needs to do as a parameter:
(defn convert-rule
  [convert-fn rule]
  (assoc rule :body (convert-fn (:body rule))))
Next, we want a general function for converting all the rules, which can accept a conversion function to pass on to convert-rule. It does all the rules, so I just pluralized the name:
(defn convert-rules
  [conversion-fn rule-prog]
  (assoc rule-prog :rules (map #(convert-rule conversion-fn %) (:rules rule-prog))))
That works, but now the function getting mapped is looking messy (and messy leads to mistakes). I could improve it by defining a new function, but I just factored a function out of this function. Fortunately, there is a simpler way to define this new function. It's a "partial" application of convert-rule. But I'll still move it into a let block for clarity:
(defn convert-rules
  [conversion-fn rule-prog]
  (let [conv-rule (partial convert-rule conversion-fn)]
    (assoc rule-prog :rules (map conv-rule (:rules rule-prog)))))
So now my original process1 definition becomes a simple:
(defn process1 [rule-program]
  (convert-rules tx rule-program))
That works, but the rule-program parameter is just sticking out like a sore thumb. Fortunately, we've already seen how to fix this:
(def process1 (partial convert-rules tx))
Indeed, all of the processing functions can be written this way:
(def process1 (partial convert-rules tx))
(def process2 (partial convert-rules tx2))
(def process3 (partial convert-rules tx3))
(def process4 (partial convert-rules tx4))
It may seem strange that a function is now being defined with a def instead of a defn, but it's really not an issue. It's worth remembering that defn is just a macro that uses def to attach a symbol to a call to (fn ...).
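That is, these two definitions are essentially equivalent (ignoring the metadata that defn also attaches):

```clojure
(defn square [x] (* x x))

;; is more or less the same as:
(def square (fn [x] (* x x)))
```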

Documenting

Of course, my functions don't appear as sterile as what I've been typing here. I do indeed use documentation. That means that the process1 function would look more like:
(defn process1
  "The documentation for process1"
  [rule-program]
  (convert-rules tx rule-program))
One of the nice features of the defn macro is the ease of writing documentation. This isn't as trivial with def since it's a special form, rather than a macro, but it's still not too hard to do. You just need to attach some metadata to the object, with a key of :doc. Unfortunately, I couldn't remember the exact syntax for this today, and rather than go trawling through books or existing code, Alex was kind enough to remind me:
(def ^{:doc "The documentation for process1"}
     process1 (partial convert-rules tx))


Framework

The upshot of all of this is a simpler framework for adding new steps to the existing analysis system. Adding a new analysis step just needs a new function thrown into the thread. I could put a call to (partial convert-rules ...) directly into the thread, but by using a def I get to name and document that step of the analysis. The only real work of the analysis is then done in the single multi-dispatch function, which is just as it should be.

So right now my evening "hobby" has been JavaScript, while my day job is Clojure. I have to tell you, the day job is much more fun.

Robusta

I was just posting this into Google+ when I realized I was typing more than I'd intended. It read more like a short blog post. Which reminded me.... "Oh yeah. I have a blog out there somewhere! Maybe I should write this in there."

After spending a lot of recent nights on JavaScript I think I'm starting to get a feel for it. I started with Douglas Crockford's "JavaScript: The Good Parts", and am now plowing through David Flanagan's "JavaScript: The Definitive Guide". (I met David when he came to Brisbane to run a short course on Java programming back in 1996. Nice guy. He said that he'd preferred to have called his book "Java in a Demitasse", but O'Reilly wanted it in their "Nutshell" series).

After using languages like Ruby, Erlang, Scala and Clojure, I'm finding it a little frustrating, but it's hard to argue with the ubiquity of the platform. Fortunately it has closures and first class functions, though the variable scoping is bizarre. I've been enjoying the callback approach to asynchronous function calls, though the syntax tends to make the resulting code confusing to read. I'm mostly sticking to Crockford's subset of the language, and this does make things a little more sensible. Flanagan's book has been filling in the gaps for me, but it's especially useful for documenting libraries like HTML5 Canvas and the File API.

As with all first attempts at using a language, mine is a little messy and inconsistent. However, it can't hurt to put it out there. I've built a simple tool for working with SPARQL endpoints (specifically aimed at Jena/Fuseki, but it should mostly work on others too). The important piece is the SPARQL connection object that it comes with (found in sparql.js). I'm hoping that this will be a useful object for more general application. It can even convert XML responses into a SPARQL-JSON structure (I wrote this after discovering Fuseki was ignoring my Content-type settings on queries).

Unfortunately, it's missing one important piece, which is the ability to upload a file from a browser. In general, it's possible to upload a file using a form submission, but that encodes all the parameters into the request body, and the SPARQL HTTP protocol requires that the graph URI appear as a parameter in the URL of the request. In an attempt to get the graph parameter out of the body and into the URL, I even tried dynamically constructing the URL for the form submission, but the browser "cleverly" saw what I was doing and pushed the parameter back into the body. So I can't use form submission. Alternatively, JavaScript makes it easy to submit an HTTP POST operation with everything set the way you want it. However, the only way to read a local file is through the form submission process, which means I still can't do a file upload. In the end, I just used Fuseki's file upload servlet, but this has the problem of being non-standard, and it also doesn't like URIs that aren't http (yes Andy, that's why I asked you about this restriction - though I'd already run into it at work).

The resulting system needed a name, so I called it Robusta. Everyone seems to be enamored with Arabica beans, but no one ever talks about Robusta beans. Don't get me wrong.... if I had to choose between the two I'd definitely go for the Arabica. But by blending in a small portion of Robusta beans you add a richness to the flavor of your coffee (it's also used as a cheap "filler" and promotes crema in espresso, but I like the flavor aspect). At the time, I came up with the name because I really needed some caffeine, and Arabica was too obvious. But in retrospect, I like the name, since adding a bit of SPARQL to your scripts can really enhance a system (OK, that's tacky. I probably need another one of those coffees). It wasn't until after I'd had some coffee that I thought to look for other projects with the same name, but by that point it was already up there.

Robusta is still a work in progress, and it's mostly a late night project that fits around everything else that I'm doing. But I'm using it at work, and it makes my life easier. I'd like to know if anyone has ideas for it, or can point out errors, inefficiencies, or potential improvements. It's posted at GitHub as a part of the Revelytix project group, at:
  http://github.com/revelytix/robusta