Friday, August 25, 2006

Am I Getting Old?

This story has been written elsewhere, but I have my own contributions to make to it. Hopefully the catharsis will help me sleep better. :-)

I've been helping to interview several programmers recently, and I've really been surprised at the results.

To be clear, I do not have a degree in Computer Science. My first degree was in Computer Engineering. That means that I spent many long hours debugging breadboards with a logic analyzer, and the code I was writing in my final year was usually for device drivers to communicate with the hardware I'd built. That isn't to say that I didn't have subjects in computer science - I did.

I did the basic first year CS course, which taught us about records, how to build linked lists, recursive functions, simple sorting algorithms (bubble sort!), and the other essentials. I did subjects on networking, operating systems, concurrency (anyone care to dine with philosophers?), modular programming, complexity theory, and databases. Funnily enough, in my first database course we were not given access to computers, and never learnt SQL. Instead, Terry Halpin taught us how to model data using a system called NIAM. It seemed very theoretical at the time, but it ended up being a vital foundation for my later work in OWL. I also had Computer Engineering subjects which taught me Assembler (in several flavours) and C, how to use them to talk to hardware and/or the operating system, how to model hardware in software, sockets programming, and so on.

I never covered more advanced data structures or algorithms, but then I never did a degree in CS, did I? My physics degree gave me the opportunity to do a couple of extra subjects in programming, but these were vocational, and hardly worth the effort. Instead, I had to gain an understanding of professional programming in my own time. I had a very good mentor to start with, but his main role was in teaching me how to teach myself. Consequently, I now have a large bookshelf full of books covering topics from MFC and GTK, through to Knuth, Type Theory, and Category Theory. There are still holes in my knowledge, but I've covered a lot, and I'm still progressing. In fact, learning after university has meant that I've learnt better, as I learnt out of genuine interest, and not because some lecturer said I needed to learn something in order to pass an exam. For this reason, I have learnt to have less respect for certified courses, and a greater appreciation for enthusiastic amateurs.

The same willingness to keep learning also applies to most of my friends and colleagues. This is important in our industry, both because the field is too large to know it all, and because the systems and protocols implementing the theory are constantly expanding. Fortunately, we don't need to know it all. The fundamental principles in computer science always hold, and form a solid foundation for all future learning. Once you have these down, then any new language or paradigm can be understood and applied quite rapidly.

So when we interview candidates, I look to make sure that they have a grasp of the fundamentals. Instead I keep discovering people with years of experience in J2EE, who are completely unaware of the underlying principles of computer science. It is almost universal that these people have solid vocational training, but very little ability to work outside of established paradigms.

Why do I consider these things important? Isn't it acceptable to simply work with EJBs, SOAP and JDBC? Why should I expect more?

There are several answers to these questions.

To start with, staying within these narrow APIs has left many of these people unaware of significant APIs in Java proper. Most developers we have met do not even know what file mapping is, let alone the fact that it is included in the java.nio package. Concurrency is a subject with subtle complexities, yet these developers are often unaware of the attributes which make up a Thread in Java, including the stack that each thread owns. How can they hope to correctly use a ThreadLocal object, or know how to debug a deadlock?
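For anyone who hasn't seen file mapping, here is a minimal sketch of it using java.nio. The class name MappedFileSketch and the helper roundTrip are made up for illustration, and the temp file is just a stand-in for real data. It writes a long through one mapping of a file and reads it back through a second mapping of the same file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not taken from any real project.
public class MappedFileSketch {

    // Writes a long at offset 0 through a read/write mapping, then reads it
    // back through a separate read-only mapping of the same file.
    public static long roundTrip(long value) {
        try {
            Path path = Files.createTempFile("mapped-sketch", ".bin");
            path.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Give the file 8 bytes of content so the mapped region
                // lies entirely inside the file.
                channel.write(ByteBuffer.allocate(Long.BYTES));

                MappedByteBuffer out =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
                out.putLong(0, value);
                out.force(); // ask the OS to write the change to the device

                MappedByteBuffer in =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, Long.BYTES);
                return in.getLong(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L)); // prints 42
    }
}
```

Note that there is no explicit read or write call for the value itself: the buffer is a window onto the file, and the operating system does the paging.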

Another problem with vocational training with a narrow focus is that almost all systems have a chance to interact with each other in some way. Not understanding the surrounding systems leads to an inability to solve some problems. I have yet to meet a candidate who understands what significance a 64 bit operating system has for a JVM, when compared to running on a 32 bit system. This is so important that Mulgara has a completely different configuration for file management when operating on 64 bit systems, with vastly different performance. How can programmers working on scalable "Enterprise" applications be effective at their jobs if they can't understand this difference? The level of ignorance is surprising, given the general interest in 64 bit chips and operating systems.

This extends down to simpler issues as well. As with many languages, some understanding of 2's complement binary arithmetic is required for working with all of the numeric types in Java. For instance, for every signed integer type in Java, the magnitude of MIN_VALUE is always 1 greater than MAX_VALUE. Even when shown this fact, the people I've spoken to don't understand the significance of the expression -MIN_VALUE: for int and long the result cannot be represented, so it overflows back to MIN_VALUE and is still negative. This could easily cause problems in code that said:
  if (-x > 0) {...}
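
To make the problem concrete, here is a small sketch for int. The class and method names are mine, and Math.negateExact only exists in Java 8 and later, so treat that part as an assumption about your Java version.

```java
// Hypothetical illustration class showing the MIN_VALUE edge case for int.
public class MinValueSketch {

    // The naive test from the text: "is the negation of x positive?"
    public static boolean negationIsPositive(int x) {
        return -x > 0;
    }

    // Math.negateExact (Java 8+) throws ArithmeticException on overflow
    // instead of silently wrapping around.
    public static boolean negationOverflows(int x) {
        try {
            Math.negateExact(x);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int min = Integer.MIN_VALUE;
        System.out.println(-min == min);             // true
        System.out.println(Math.abs(min) == min);    // true
        System.out.println(negationIsPositive(-5));  // true
        System.out.println(negationIsPositive(min)); // false
        System.out.println(negationOverflows(min));  // true
    }
}
```

So a negative x does not always pass the test, and Math.abs can hand back a negative number. The same holds for long with Long.MIN_VALUE.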

We also find an amazing ignorance of complexity. How can a programmer write a scalable application if they believe that linear complexity is to be preferred over logarithmic? We even had one person think that polynomial was best!
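
To put rough numbers on the difference, here is a sketch that counts how many elements a linear scan and a binary search each examine when looking for the last entry in a sorted array. The class and method names are invented for illustration, and the step counts are a simple stand-in for running time.

```java
// Hypothetical illustration class comparing linear and logarithmic search cost.
public class SearchCostSketch {

    // Builds the sorted array 0, 1, ..., n-1.
    public static int[] range(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        return values;
    }

    // Linear scan: counts elements examined until the target is found.
    public static int linearSteps(int[] sorted, int target) {
        int steps = 0;
        for (int value : sorted) {
            steps++;
            if (value == target) {
                break;
            }
        }
        return steps;
    }

    // Binary search: counts probes, halving the candidate range each time.
    public static int binarySteps(int[] sorted, int target) {
        int steps = 0;
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] == target) {
                break;
            } else if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] sorted = range(n);
        System.out.println(linearSteps(sorted, n - 1)); // prints 1000000
        System.out.println(binarySteps(sorted, n - 1)); // at most 20 probes
    }
}
```

Double the data and the linear scan does twice the work, while the binary search needs one more probe.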

The next issue is that Java will not be the dominant paradigm for the rest of our careers. Computing moves fast. Java may be around for some time yet (look at C), but something new is guaranteed to usurp it, and it will happen in the next decade (I feel uncomfortable making a statement with a guaranteed timeframe, but I have history on my side). For instance, many clients of mission critical systems are looking to developers using functional languages with the ability to mathematically prove the correctness of a system. I'm only mentioning it briefly, but the importance of functional languages in this respect should not be underestimated. Proof trees make it possible to prove the correctness of an imperative program, but the results are messier than practical in a commercial environment.

Finally, and this is one of the most important principles, other languages teach programmers to think differently. This allows programmers to write important functionality in Java that would not be possible with standard J2EE practices. Mulgara needed a C programmer like David in order to have data structures on disk which are as efficient as the ones we have. Standard J2EE techniques would not store anything directly to disk, but would use a relational database. Even if a programmer had chosen to write directly to disk, standard serialization techniques would have been used. Instead, Mulgara uses densely packed data structures that serialize and deserialize quickly, and are structured in complex ways that are made cleanly opaque to the calling object oriented code... and it is all done in Java. David's understanding of how operating systems use write-behind algorithms, which hard drive operations are atomic, and how best to interact with the operating system's buffer cache was vital to Mulgara's performance.
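
As a rough idea of what "densely packed" means, here is a sketch of fixed-width records laid out back to back in a ByteBuffer. To be clear, this is not Mulgara's actual layout: the class name, the three-long record, and the accessor names are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; NOT Mulgara's real on-disk format.
// Each record is three longs stored back to back with no per-object overhead.
public class PackedTriplesSketch {

    public static final int RECORD_BYTES = 3 * Long.BYTES; // 24 bytes per record

    private final ByteBuffer buffer;
    private int count;

    public PackedTriplesSketch(int capacity) {
        buffer = ByteBuffer.allocate(capacity * RECORD_BYTES);
    }

    // Appends one record by writing its three fields at computed offsets.
    public void add(long subject, long predicate, long object) {
        int offset = count * RECORD_BYTES;
        buffer.putLong(offset, subject);
        buffer.putLong(offset + Long.BYTES, predicate);
        buffer.putLong(offset + 2 * Long.BYTES, object);
        count++;
    }

    // Field accessors hide the byte arithmetic from the calling code.
    public long subject(int index) {
        return buffer.getLong(index * RECORD_BYTES);
    }

    public long predicate(int index) {
        return buffer.getLong(index * RECORD_BYTES + Long.BYTES);
    }

    public long object(int index) {
        return buffer.getLong(index * RECORD_BYTES + 2 * Long.BYTES);
    }

    public int size() {
        return count;
    }

    public int sizeInBytes() {
        return count * RECORD_BYTES;
    }

    public static void main(String[] args) {
        PackedTriplesSketch triples = new PackedTriplesSketch(16);
        triples.add(1L, 2L, 3L);
        triples.add(4L, 5L, 6L);
        System.out.println(triples.predicate(1)); // prints 5
        System.out.println(triples.sizeInBytes()); // prints 48
    }
}
```

Reading a record is a matter of arithmetic on an offset, with nothing to parse and no objects to allocate, and the callers only ever see ordinary methods.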

Google needed Lisp programmers to write MapReduce, which is one of the most important and fundamental tools in their system. It was Lisp programmers trying to get some semblance of Lambda Calculus into C++ who introduced templates, and ultimately the STL. This then led to generic classes and methods in Java, again highlighting the need to understand these other systems in order to better program in Java.
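
For readers who haven't met the idea, here is a toy sketch of map and reduce written as a single generic Java method. It only shows the shape of the two functions borrowed from Lisp and has nothing to do with how Google's distributed system is built. The class and method names are mine, and the method references need Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical illustration class; a single-machine toy, not Google's MapReduce.
public class MapReduceSketch {

    // Applies mapper to every item, then folds the mapped values together
    // with reducer, starting from identity.
    public static <T, R> R mapReduce(List<T> items,
                                     Function<T, R> mapper,
                                     R identity,
                                     BinaryOperator<R> reducer) {
        R result = identity;
        for (T item : items) {
            result = reducer.apply(result, mapper.apply(item));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "lisp");
        // Map each word to its length, then reduce the lengths by summing.
        int totalLength = mapReduce(words, String::length, 0, Integer::sum);
        System.out.println(totalLength); // prints 13
    }
}
```

The type parameters T and R are what let one method serve any element type and any result type.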

If we want to write important code that does anything beyond standard business logic for commercial applications, then we need people who understand computer science, and not people who have received vocational training in the popular language of the day. They need to understand the mathematics behind programming, the pros and cons of both functional and imperative languages, the features of underlying operating systems, along with many other things.

Finally, if they want to be really good, they need a desire to go out and learn some of this stuff for themselves, because no course can teach it all. It's just a shame that they don't seem to teach the basics any more. And it's the fact that I typed that last statement that makes me realise that I'm getting old. :-)
