Sunday, March 03, 2013

Clojurescript and Node.js

For a little while now I've been interested in JavaScript. JS was a strong theme at Strangeloop 2011, where it was being touted as the next big Virtual Machine. I'd only had a smattering of exposure to JS at that point, so I decided to pick up Douglas Crockford's JavaScript: The Good Parts, which I quite enjoyed. I practiced what I learned on Robusta, which is both a SPARQL client library for JS, and simple UI for talking to SPARQL endpoints. It was a good learning exercise and it's proven to be useful for talking to Jena. Since then, I've thought that it might be interesting to write some projects in JS, but my interests are usually in data processing rather than on the UI front end. So the browser didn't feel like the right platform for me.

When ClojureScript came out, I was interested, but again without a specific UI project in mind I kept putting off learning about it.

At this point I'd heard about node.js, but had been misled into thinking that it was for server-side JS, which again didn't seem that interesting when the server offered so many interesting frameworks to work with (e.g. Rails).

This last Strangeloop had several talks about JS, and about the efficiency of the various engines, despite having to continue supporting some unfortunate decisions early in the language's design (or lack of design in some cases). I came away with a better understanding of how fast this system can be and started thinking about how I should try working with it.

But then came Clojure/conj last year and it all gelled.

When I first heard Chris Granger speaking I'll confess that I didn't immediately see what was going on. He was targeting Node.js for Light Table, but as I said, I didn't realize what Node really was. (Not to mention that the social aspects of the conj had me a little sleep deprived). So it wasn't until after his talk that I commented to a colleague (@gtrakGT) that it'd be great it there was a JS engine that wasn't attached to a browser, but still had a library that let you do real world things. In retrospect, I feel like an idiot.  :-)

So I started on Node.

ClojureScript with Node.js

While JS has some nice features, in reality I'd much rather write Clojure, meaning that I finally had a reason to try ClojureScript. My first attempts were a bit of a bumpy ride, since I had to learn how to target ClojureScript to Node, how to access functions for getting program arguments, how to read/write with Node functions. Most of all, I had to learn that if I accidentally used a filename ending in .clj instead of .cljs then the compiler would silently fail and the program would print bizarre errors.

All the same, I was impressed with the speed of starting up a ClojureScript program. I found myself wondering about how fast various operations were running, in comparison to Clojure on the JVM. This is a question I still haven't got back to, but it did set me off in some interesting directions.

While driving home after the conj, the same colleague who'd told me how wrong I'd been about Node.js asked me about doing a simple-yet-expensive calculation like large factorials. It was trivial in Clojure, so I tried it with ClojureScript, and immediately discovered that ClojureScript uses JavaScript's numbers. These are actually floats, but for integers it means that they support 53 bits. Clojure automatically expands large number into the BigInteger class from Java, but JavaScript doesn't have the same thing to work with.

At that point I should have considered searching for BigIntegers in JavaScript out there on the web somewhere, but I was curious about how BigIntegers were implemented, so I started looking the code for java.math.BigInteger and seeing if it would be reproduced in Clojure (using protocols, so I can make objects that look a little like the original Java objects). The bit manipulations started getting tricky, and I put it aside for a while.

Then recently I had a need to pretty-print some JSON and EDN. I'd done it in Clojure before, but I used the same process on the command line, and starting the JVM is just too painful for words. So I tried writing it in JS for node, and was very happy with both the outcome and the speed of it. This led me back to trying the same thing in ClojureScript.

ClojureScript Hello World Process

The first thing I try doing with ClojureScript and Node is getting a Hello World program going. It had been months since I'd tried this, so I thought I should try again. Once again, the process did not work smoothly so I thought I'd write it down here.

Clojure can be tricky to use without Leiningen, so this is typically my first port of call for any Clojure-based project. I started out with "lein new hello" to start the new project. This creates a "hello" directory, along with various standard sub-directories to get going.

The project that has been created is a Clojure project, rather than ClojureScript, so the project file needs to be updated to work with ClojureScript instead. This means opening up project.clj and updating it to include lein-cljsbuild along with the configuration for building the ClojureScript application.

The lein-cljsbuild build system is added by adding a plugin, and a configuration to the project. I also like being able to run "lein compile" and this needs a "hook" to be added as well:

  :hooks [leiningen.cljsbuild]
  :plugins [[lein-cljsbuild "0.3.0"]]
  :cljsbuild {
    :builds [{
      :source-paths ["src/hello"]
      :compiler { :output-to "src/js/hello.js"
                  :target :nodejs
                  :optimizations :simple
                  :pretty-print true }}]}

Some things to note here:
  • The source path (a sequence of paths, but with just one element here) is where the ClojureScript code can be found.
  • The :output-to can be anywhere, but since it's source code and I wanted to look at it, I put it back into src, albeit in a different directory.
  • The :target has been set to :nodejs. Without this the system won't be able to refer to anything in Node.js.
  • The :optimizations are set to :simple. They can also be set to :advanced, but nothing else. (More on this later).
To start with, I ignored most of this and instead got it all running at the REPL. That meant setting up the dependencies in the project file to include cljs-noderepl:

  :dependencies [[org.clojure/clojure "1.5.0"]
                 [org.clojure/clojurescript "0.0-1586"]
                 [org.bodil/cljs-noderepl "0.1.7"]]

(Yes, Clojure 1.5.0 was released last Friday!)

Using the REPL requires that it be started with the appropriate classpath, which lein does for you when the dependencies are set up and you use the "lein repl" command. Once there, the ClojureScript compiler just needs to be "required" and then the compiler can be called directly, with similar options to those shown in the project file:

user=> (require 'cljs.closure)
nil
user=> (cljs.closure/build "src/hello" {:output-to "src/js/hello.js" :target :nodejs :optimizations :simple})
nil

Once you can do all of this, it's time to modify the source code:
  (ns hello.core)

  (defn -main []
    (println "Hello World"))

  (set! *main-cli-fn* -main)

The only trick here is setting *main-cli-fn* to the main function. This tells the compiler which function to run automatically.

The project starts with a core.clj file, which is what you'll end up editing. Compiling this appears to work fine, but when you run the resulting javascript you get an obscure error. The fix for this is to change the source filename to core.cljs. Once this has been changed, the command line compiler (called with "lein compile" will tell you which files were compiled (the compiler called from the REPL will continue to be silent). If the command line compiler does not mention which files were compiled by name, then they weren't compiled.

I wanted to get a better sense of what was being created by the compiler, so I initially tried using optimization options of :none and :whitespace, but then I got errors of undefined symbols when I tried to run the program. I reported this as a bug, but was told that this was known and an issue with Google's Closure tool (which ClojureScript uses). The :simple optimizations seem to create semi-readable code though, so I think I can live with it.

Interestingly, compiling created a directory called "out" that contained a lot of other code generated by the compiler. Inspection showed several files, including a core.cljs file that carries the Clojure core functions. I'm presuming that this gets fed into the clojurescript compiler along with the user program. The remaining files are mostly just glue for connecting things together. For instance, nodejscli.cljs contains:

(ns cljs.nodejscli
  (:require [cljs.nodejs :as nodejs]))

; Call the user's main function
(apply cljs.core/*main-cli-fn* (drop 2 (.-argv nodejs/process)))

This shows what ends up happening with the call to (set! *main-cli-fn* -main) that was required in the main program.

Where Now?

Since Node.js provides access to lots of system functions, I started to wonder just how far I could push this system. Browsing the Node.js API I found functions for I/O and networking, so there seems to be some decent scope in there. However, since performance is really important to V8 (which Node.js is built from) then how about fast I/O operations like memory mapping files?

I was disappointed to learn that there are in fact serious limits to Node.js. However, since the engine is native code, there is nothing to prevent extending it to do anything you want. Indeed, using "require" in JavaScript will do just that. It didn't take much to find a project that wraps mmap, though the author freely admits that it was an intellectual exercise and that users probably don't want to use it.

Using a library like mmap in ClojureScript was straight forward, but showed up another little bug. Calling "require" from JavaScript returns a value for the object containing the module's functions. Constant values in the module can be read using the Java-interop operation. So to see the numeric value of PROT_READ, you can say:


(defn -main []
  (let [m (js/require "mmap")]
    (println "value: " m/PROT_READ)))


However, a module import like this seems to map more naturally to Clojure's require, so I tried a more "global" approach and def'ed the value instead:


(def m (js/require "mmap"))
(defn -main []
  (println "value: " m/PROT_READ))


However, this leads to an error in the final JavaScript. Fortunately, the . operator will work in this case. This also leads to one of the differences with Clojure. Using the dot operator with fields requires that the field name be referenced with a dash leader:


(def m (js/require "mmap"))
(defn -main []
  (println "value: " (.-PROT_READ m)))


Finally, I was able to print the bytes out of a short test file with the following:

(ns pr.core
  (:use [cljs.nodejs :only [require]]))

(def u (require "util"))
(def fs (require "fs"))
(def mmap (js/require "mmap"))

(defn mapfile [filename]
  (let [fd (.openSync fs filename "r")
        sz (.-size (.fstatSync fs fd))]
    (.map mmap sz (.-PROT_READ mmap) (.-MAP_SHARED mmap) fd 0)))

(defn -main []
  (let [buffer (mapfile "data.bin")
        sz (alength buffer)]
   (println "size of file: " sz)
   (doseq [i (range sz)]
     (print " " (aget buffer i)))
   (println))) 

(set! *main-cli-fn* -main)

Since I've now seen that it's possible, it's inspired me to think about reimplementing Mulgara's MappedBlockFile and AVLNode classes for Clojure and ClojureScript. That would be a good opportunity to get a BTree implementation coded up as well.

I've been thinking of redoing these things for Clojure ever since starting on Datomic. Mulgara's phased trees are strikingly like Datomic's, with one important exception - in Mulgara we went to a lot of effort to reap old nodes for reuse. This is complex, and had a performance cost. It was important 13 years ago when we first started on it, but things have changed since then. More recent implementations of some of Mulgara have recognized that we don't need to be so careful with disk space any more, but Rich had a clearer insight: not only can we keep older transactions, but we should. As soon as I realized that, then I realized it would be easy to improve performance dramatically in Mulgara. However, to make it worthwhile we'd have to expose the internal phase information in the same way that Datomic does.

Unfortunately, Mulgara is less interesting to me at the moment, since it's all in Java, which is why I'm moving to re-implement so much RDF work in Clojure at the moment. A start to that can be found in crg, crg-turtle, and kiara. Mulgara isn't going away... but it will get modernized.

More in ClojureScript

Family is calling, so I need to wrap this post up. Funnily enough, I didn't get to write about the thing that made me think to blog in the first place. That's bit manipulations.

You'll recall that I mentioned the lack of a BigInteger in ClojureScript (since it's a Java class). As an exercise in doing more in ClojureScript, and learning about how BigInteger is implemented, I started trying to port this class in Clojure (and ultimately, ClojureScript). The plumbing is easy enough, as I can just use a record and a protocol, but some of the internals have been trickier. That's because BigInteger packs the numbers into an array of bits, that is in turn represented with an int array. More significantly, BigInteger does lots of bit manipulation on these ints.

Bit manipulation is supported in JavaScript and ClojureScript, but with some caveats. One problem is that the non-sign-extending right-shift operation (>>>) is not supported. I was surprised to learn that it isn't supported in Clojure either (this seems strange to me, since it would be trivial to implement). The bigger problem is that numbers are stored as floating point values. Fortunately, numbers of 32 bits or smaller can be treated as unsigned integers. However, integers can get larger than this limit, and if that happens, then some of the bit operations stop working. It's possible to do bit manipulation up to 53 bits, but signed values won't work as expected in that range, so it's basically off the table.

Anyway, I have a lot to learn yet, and a long way to go to even compare how a factorial in ClojureScript will compare to the same operation in Clojure, but in the meantime, it's fun. Searching for Node.js modules has shown a lot of early exploration is being performed by people from all over. Some of the recent threading modules are a good example of this.

I have a feeling that Node.js will get more important as time goes on, and with it, ClojureScript.