Thursday, August 31, 2006

Nothing

I didn't want to embarrass myself in public, so I looked at the whole owl:Thing vs. owl:Nothing question again. Something I'd forgotten is that owl:Nothing cannot be instantiated. So it's OK if owl:Nothing seems to be a contradiction (I think), since it is only the instances of a class which create problems, and you can't have an instance of owl:Nothing.

It's after midnight. I'd better get to bed before my head hurts any more.

Wednesday, August 30, 2006

OWL

I'm finally getting off my rear end and writing some code to do OWL in Mulgara. (I hear the gasps of disbelief from the peanut gallery).

So I needed to start with some OWL axioms. You know, owl:Class is an owl:Class, it's rdfs:subClassOf owl:Class, etc. So I went to a handy little file I have named owl.rdf. I distinctly recalled that the file was not originally named owl.rdf, but I wasn't sure why I'd renamed it. All the same, I've had it for some time, and it looks official, so I decided to work with it.

The easiest way to convert it into Krule axioms was to load this file into Mulgara, and then have an iTQL client build the RDF for me. I had a one-line bug (why is it always one line?) which took me a while to track down to a typo in a constructor where I effectively said:
  parameter = this.parameter;
But finally my output was as expected.
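
For the curious, the typo was the classic self-assignment in a constructor. A minimal sketch of the pattern (the class and field names here are invented, not the actual Mulgara code):
  // A made-up class for illustration only.
  class AxiomBuilder {
    private String parameter;

    AxiomBuilder(String parameter) {
      // The bug: this assigns the (still null) field to the local
      // variable, so the field itself is never set. The compiler
      // accepts it without a murmur.
      parameter = this.parameter;
      // What was meant: this.parameter = parameter;
    }
  }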

At this stage I noticed something odd in the axioms. Out of curiosity I went back to the owl.rdf file and discovered this:
  <Class rdf:ID="Thing">
    <rdfs:label>Thing</rdfs:label>
    <unionOf rdf:parseType="Collection">
      <Class rdf:about="#Nothing"/>
      <Class>
        <complementOf rdf:resource="#Nothing"/>
      </Class>
    </unionOf>
  </Class>

  <Class rdf:ID="Nothing">
    <rdfs:label>Nothing</rdfs:label>
    <complementOf rdf:resource="#Thing"/>
  </Class>
So on one hand it says that owl:Thing is the union of owl:Nothing and the complement of owl:Nothing (i.e. everything else), and on the other hand, owl:Nothing is the complement of owl:Thing. I can see a rationale behind both points of view, but they can't both be true at once. owl:Thing either includes owl:Nothing or it doesn't. (Or am I missing something really big here?)
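
To lay my reading of the two declarations out as set equations (the bar means complement with respect to the set of all individuals):
  \mathrm{Thing} = \mathrm{Nothing} \cup \overline{\mathrm{Nothing}}
  \mathrm{Nothing} = \overline{\mathrm{Thing}}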

So where did this file come from? Unfortunately, I picked it up with wget. If I'd used Safari, it would have stored the source URL in the file's metadata, which I could then have viewed with mdls. (Apple metadata support might make a nice patch to work on for wget.) Fortunately, when I went to the W3C to search for the file, I suddenly remembered where it came from.

RDF uses Uniform Resource Identifiers (URI) to identify resources. Note that these are Identifiers and not the more familiar Uniform Resource Locators (URL). URLs are a subset of URIs, but their purpose is for location rather than purely identification. A URI may look like a URL, but it may not be specifying the location of anything at all. This is often confusing for first-time users of RDF, who expect that the identifiers they see should be taking them somewhere useful.

Now just because RDF says that you don't need something at the address of a given URI, that doesn't mean you can't put something there. In fact, many people do, including the W3C. For this reason, one day I decided to put the URI for the OWL namespace into wget and see if I got anything. The result was this RDF file called owl. Once I remembered that, I also remembered changing the name to the more usable owl.rdf.

So here is this file sitting at the W3C, hiding in plain sight. I find it amusing that it can have such a contradiction in it.

Friday, August 25, 2006

Am I Getting Old?


This story has been written elsewhere, but I have my own contributions to make to it. Hopefully the catharsis will help me sleep better. :-)

I've been helping to interview several programmers recently, and I've really been surprised at the results.

To be clear, I do not have a degree in Computer Science. My first degree was in Computer Engineering. That means that I spent many long hours debugging breadboards with a logic analyzer, and the code I was writing in my final year was usually for device drivers to communicate with the hardware I'd built. That isn't to say that I didn't have subjects in computer science - I did.

I did the basic first year CS course, which taught us about records, how to build linked lists, recursive functions, simple sorting algorithms (bubble sort!), and the other essentials. I did subjects on networking, operating systems, concurrency (anyone care to dine with philosophers?), modular programming, complexity theory, and databases. Funnily enough, in my first database course we were not given access to computers, and never learnt SQL. Instead, Terry Halpin taught us how to model data using a system called NIAM. It seemed very theoretical at the time, but it ended up being a vital foundation for my later work in OWL. I also had Computer Engineering subjects which taught me Assembler (in several flavours) and C, how to use them to talk to hardware and/or the operating system, how to model hardware in software, sockets programming, and so on.

I never covered more advanced data structures or algorithms, but then I never did a degree in CS, did I? My physics degree gave me the opportunity to do a couple of extra subjects in programming, but these were vocational, and hardly worth the effort. Instead, I had to gain an understanding of professional programming in my own time. I had a very good mentor to start with, but his main role was in teaching me how to teach myself. Consequently, I now have a large bookshelf full of books covering topics from MFC and GTK, through to Knuth, Type Theory, and Category Theory. There are still holes in my knowledge, but I've covered a lot, and I'm still progressing. In fact, learning after university has meant that I've learnt better, as I learnt out of genuine interest, and not because some lecturer said I needed to learn something in order to pass an exam. For this reason, I have learnt to have less respect for certified courses, and a greater appreciation for enthusiastic amateurs.

The same willingness to keep learning also applies to most of my friends and colleagues. This is important in our industry, both because the field is too large to know it all, and because the systems and protocols implementing the theory are constantly expanding. Fortunately, we don't need to know it all. The fundamental principles in computer science always hold, and form a solid foundation for all future learning. Once you have these down, then any new language or paradigm can be understood and applied quite rapidly.

So when we interview candidates, I look to make sure that they have a grasp of the fundamentals. Instead I keep discovering people with years of experience in J2EE, who are completely unaware of the underlying principles of computer science. It is almost universal that these people have solid vocational training, but very little ability to work outside of established paradigms.

Why do I consider these things important? Isn't it acceptable to simply work with EJBs, SOAP and JDBC? Why should I expect more?

There are several answers to these questions.

To start with, staying within these narrow APIs has left many of these people unaware of significant APIs in Java proper. Most developers we have met do not even know what file mapping is, let alone that it is included in the java.nio package. Concurrency is a subject with subtle complexities, yet these developers are often unaware of the attributes which make up a Thread in Java, including the stack that each thread owns. How can they hope to correctly use a ThreadLocal object, or know how to debug a deadlock?
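
To make the file mapping point concrete: mapping a file in pure Java takes just a few lines. A minimal sketch (the file name is arbitrary):
  import java.io.RandomAccessFile;
  import java.nio.MappedByteBuffer;
  import java.nio.channels.FileChannel;

  public class MapExample {
    public static void main(String[] args) throws Exception {
      RandomAccessFile file = new RandomAccessFile("data.bin", "r");
      FileChannel channel = file.getChannel();
      // Map the whole file into the process address space. Reads then
      // go through the OS page cache, with no copying into the Java heap.
      MappedByteBuffer buffer =
          channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
      byte first = buffer.get(0);  // random access, no read() call
      System.out.println("first byte: " + first);
    }
  }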

Another problem with vocational training with a narrow focus is that almost all systems have a chance to interact with each other in some way. Not understanding the surrounding systems leads to an inability to solve some problems. I have yet to meet a candidate who understands what significance a 64 bit operating system has for a JVM, when compared to running on a 32 bit system. This is so important that Mulgara has a completely different configuration for file management when operating on 64 bit systems, with vastly different performance. How can programmers working on scalable "Enterprise" applications be effective at their jobs if they can't understand this difference? The level of ignorance is surprising, given the general interest in 64 bit chips and operating systems.
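
I won't reproduce Mulgara's actual configuration code here, but the sort of check involved is easy to sketch. On Sun JVMs the data model shows up as a system property:
  // Sketch only: not Mulgara's real configuration logic.
  public class ArchCheck {
    public static void main(String[] args) {
      // "sun.arch.data.model" is a Sun JVM property ("32" or "64");
      // other JVM implementations may not define it, hence the default.
      String model = System.getProperty("sun.arch.data.model", "32");
      boolean mapWholeFiles = "64".equals(model);
      // A 64 bit process can memory map files of practically any size;
      // a 32 bit process exhausts its 2-4GB address space quickly, so it
      // must map small windows of a file, or fall back to explicit reads.
      System.out.println(mapWholeFiles ? "map whole files" : "map in windows");
    }
  }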

This extends down to simpler issues as well. As with many languages, some understanding of 2's complement binary arithmetic is required for working with all of the numeric types in Java. For instance, for every signed integer type in Java, the magnitude of MIN_VALUE is always 1 greater than MAX_VALUE, which means the expression -MIN_VALUE won't fit in the type and silently overflows straight back to MIN_VALUE. Even when shown this fact, the people I've spoken to don't see the danger. This could easily cause problems in code that said:
  if (-x > 0) {...}
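
To see why, just try it (this is ordinary Java, nothing made up here):
  public class MinValue {
    public static void main(String[] args) {
      int x = Integer.MIN_VALUE;         // -2147483648
      System.out.println(-x);            // -2147483648: negation overflows
      System.out.println(Math.abs(x));   // -2147483648 as well
      System.out.println(-x > 0);        // false: the test above never fires
    }
  }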

We also find an amazing ignorance of complexity. How can a programmer write a scalable application if they believe that linear complexity is to be preferred over logarithmic? We even had one person think that polynomial was best!
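
The difference is anything but academic. A quick sketch with the standard collections (the size is picked arbitrarily):
  import java.util.ArrayList;
  import java.util.List;
  import java.util.TreeSet;

  public class Lookups {
    public static void main(String[] args) {
      // For n elements, List.contains() may inspect all n of them, while
      // TreeSet.contains() walks about log2(n) tree nodes. At n = 1000000
      // that is a million steps versus about 20.
      List<Integer> list = new ArrayList<Integer>();
      TreeSet<Integer> set = new TreeSet<Integer>();
      for (int i = 0; i < 1000000; i++) {
        list.add(i);
        set.add(i);
      }
      System.out.println(list.contains(999999));  // O(n) scan
      System.out.println(set.contains(999999));   // O(log n) descent
    }
  }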

The next issue is that Java will not be the dominant paradigm for the rest of our careers. Computing moves fast. Java may be around for some time yet (look at C), but something new is guaranteed to usurp it, and it will happen in the next decade (I feel uncomfortable making a statement with a guaranteed timeframe, but I have history on my side). For instance, many clients of mission critical systems are looking to developers who use functional languages, with their ability to mathematically prove the correctness of a system. I'm only mentioning it briefly, but the importance of functional languages in this respect cannot be overestimated. Proof trees make it possible to prove the correctness of an imperative program, but the results are messier than is practical in a commercial environment.

Finally, and this is one of the most important principles, other languages teach programmers to think differently. This allows programmers to write important functionality in Java that would not be possible with standard J2EE practices. It took a C programmer like David for Mulgara to end up with on-disk data structures as efficient as the ones it has. Standard J2EE techniques would not store anything directly to disk, but would use a relational database. Even if a programmer had chosen to write directly to disk, standard serialization techniques would have been used. Instead, Mulgara uses densely packed data structures that serialize and deserialize quickly, and are structured in complex ways that are kept cleanly opaque to the calling object-oriented code... and it is all done in Java. David's understanding of how operating systems use write-behind algorithms, which hard drive operations are atomic, and how best to interact with the operating system's buffer cache was vital to Mulgara's performance.
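
I obviously can't reproduce Mulgara's structures here, but the flavour of the technique is easy to sketch: fixed-width records packed into a ByteBuffer instead of java.io serialization. The record layout below is invented purely for illustration:
  import java.nio.ByteBuffer;

  public class TripleRecords {
    // Invented layout: three 8 byte node IDs per record. Packing and
    // unpacking is plain arithmetic: no object creation, no reflection,
    // unlike ObjectOutputStream.
    static final int RECORD_SIZE = 24;

    static void write(ByteBuffer buf, int index, long s, long p, long o) {
      int offset = index * RECORD_SIZE;
      buf.putLong(offset, s);       // absolute puts: no position state
      buf.putLong(offset + 8, p);
      buf.putLong(offset + 16, o);
    }

    static long subject(ByteBuffer buf, int index) {
      return buf.getLong(index * RECORD_SIZE);
    }

    public static void main(String[] args) {
      ByteBuffer buf = ByteBuffer.allocate(100 * RECORD_SIZE);
      write(buf, 0, 1L, 2L, 3L);
      System.out.println(subject(buf, 0));  // prints 1
    }
  }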

Google needed Lisp programmers to write MapReduce, which is one of the most important and fundamental tools in their system. It was Lisp programmers trying to get some semblance of the Lambda Calculus into C++ who introduced templates, and ultimately the STL. This in turn influenced generic classes and methods in Java, again highlighting the need to understand these other systems in order to better program in Java.

If we want to write important code that does anything beyond standard business logic for commercial applications, then we need people who understand computer science, and not people who have received vocational training in the popular language of the day. They need to understand the mathematics behind programming, the pros and cons of both functional and imperative languages, the features of underlying operating systems, along with many other things.

Finally, if they want to be really good, they need a desire to go out and learn some of this stuff for themselves, because no course can teach it all. It's just a shame that they don't seem to teach the basics any more. And it's the fact that I typed that last statement that makes me realise that I'm getting old. :-)

Thursday, August 24, 2006

One at a Time


I'd love to be writing more here, but instead I'm putting my Mulgara ideas up on the Mulgara Wiki. Anyone interested should look under Development/Indexes.

I can only really manage the time for writing in one place at a time. Not to say that I'm abandoning this blog, but it is more appropriate to put Mulgara design details on the Wiki.

Saturday, August 19, 2006

Mulgara v1.0.0


Mulgara v1.0.0 has been released!

All my spare time has gone into this recently, which should explain the dearth of blog posts. Now that the first release is out, I should have more time to:
  • Write some code.
  • Write more blog entries.
  • Get OWL working in Mulgara.
I hope that future releases will require less effort, since the project is off the ground, and we've established several procedures for this first release.

Thanks to the people who helped me get it out. In particular:
  • Brian, who coded and debugged.
  • David, who reviewed everything from legal files through to web pages, and without whom I'd find myself in a world of trouble.
  • Thomas, who volunteered out of nowhere to design the web pages for us.


For anyone missing my technical musings on the structure of Mulgara, have a look at the Mulgara Wiki. I've just started using this area so that others can contribute, and also because it allows me to structure some of the details.