Friday, July 13, 2007

CodePoints

I was just asked about the Code Point code that I discussed last year. This is essentially a fix for java.lang.Character: a Java char is only 16 bits wide, while Unicode code points can need up to 21 bits.
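To make the 16-bit problem concrete, here is a small sketch using only standard java.lang calls (it does not use the CodePoint class). The class name SurrogateDemo and its two helper methods are made up for illustration. It shows a single character outside the 16-bit range occupying two char values in a String:

```java
public class SurrogateDemo {
    // Number of 16-bit char units the string occupies.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of actual Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) does not fit in 16 bits,
        // so Java stores it as a surrogate pair of two chars.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(charUnits(clef));   // prints 2
        System.out.println(codePoints(clef));  // prints 1
    }
}
```

Any code that walks a String one char at a time will see that one character as two separate values, which is the gap a code point abstraction is meant to close.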

I've put the Java file up here. There's also an example class which takes a string on the command line, and returns the characters sorted, with duplicates removed (the original task that led me to write this class). For instance, if the command line is:
$ java SortUnicode bacdaffaabzr
Then the output is:
abcdfrz
Sure, this can be done relatively easily in Java. The advantage of the CodePoint class is that it provides a useful utility interface, and allows a more functional approach. In this case, it's possible to use a single (verbose) line:
CodePoint.toString(
    CodePoint.toArray(
        new TreeSet(
            CodePoint.toList(inputStr)
        )
    )
)
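For comparison, this is roughly what the same task looks like with only the standard library. It is a sketch, not the CodePoint implementation: the class name SortUnicodeSketch and the method sortUnique are hypothetical, and it assumes "sorted" means ordered by numeric code point value.

```java
import java.util.Set;
import java.util.TreeSet;

public class SortUnicodeSketch {
    // Returns the distinct code points of the input, in ascending order.
    static String sortUnique(String input) {
        // A TreeSet both removes duplicates and keeps its elements sorted.
        Set<Integer> seen = new TreeSet<Integer>();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            seen.add(cp);
            // Advance by 1 or 2 chars, depending on whether cp is a surrogate pair.
            i += Character.charCount(cp);
        }
        StringBuilder out = new StringBuilder();
        for (int cp : seen) {
            out.appendCodePoint(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "bacdaffaabzr";
        System.out.println(sortUnique(input)); // default input prints abcdfrz
    }
}
```

The explicit loop and the manual index arithmetic are the bookkeeping that the nested one-liner hides.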
There's nothing fancy here, but I hope it's useful.
