I recently wrote about the impending centenary of applied computing: a time when we could reflect on the first hundred years to make it easier for people to progress beyond our position into the second hundred years. This necessitates looking at the things we’ve tried, the things that succeeded and the things that failed. It involves recalling and describing the good ideas and the bad ideas.
So, did the bad ideas fail and the good ideas succeed? Can we declare that because something worked, it must have been a success? Is length of service a great proxy for quality of principle?
Let’s start by looking at the lifetime of some of the trappings of applied computing. I’m writing this on the smartphone shown in the picture below. It is, among the many computers I own that claim to be computers and could reasonably be described as modern, one of only two that are not running a recent variant of a minicomputer game-loading system.
Now is that a fair assessment? Certainly all the Macs, iOSes, Androids (and even routers and television streamy box things) in the house are based on Unix, and Unix is the thing of the 1970s minicomputer. I’ve even used that idea to explain why we still have to deal with PDP-8 problems in iPhones. But is it fair to assume that because the name has lasted, the idea has been preserved? Did Unix succeed, or has it been replaced by different things with the same name? That happens a lot; is today’s ethernet really the same ethernet that Bob Metcalfe and colleagues at PARC invented? Conversely, just because the name changed, is everything new? Does Windows NT really represent a clean break in 1993?
There’s certainly some core, a kernel (f’nar) of the modern Unix that, whether in code or philosophy, can be traced back to the original system (and indeed beyond). But is that there because it’s still a good idea, or because there’s no impetus to remove it? Or even because it’s a bad idea, but removing it would be expensive?
As we’re already talking about Unix, let’s talk about C. In his talk Null References: The Billion-Dollar Mistake, Tony Hoare describes his own mistake as being the introduction of the null reference. He then says that C’s mistake (C follows Algol in having null references, but it also lacks subscript bounds checking) is an order of magnitude worse. In fact, Hoare also identified a third problem: he says that it’s a good idea to permit a program failure to be diagnosed just from the error message and the high-level program source text. However, runtime failures in C usually end up with a core dump and/or a stack trace through the instructions of the target machine environment.
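To make the bounds-checking complaint concrete, here is a minimal sketch in C. The `int_span` type and the `unchecked_get` and `checked_get` functions are hypothetical names invented for this illustration; they are not part of C, of any standard library, or of Hoare’s talk. The sketch assumes the program carries the array length around by hand, because the language will not do it for you.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounds-carrying view of an int array. A plain C pointer
   records no length, so the length has to travel alongside it. */
typedef struct {
    const int *data;
    size_t len;
} int_span;

/* Unchecked access, which is how ordinary C subscripting behaves.
   An out-of-range index is undefined behaviour: nothing in the language
   diagnoses it or relates the failure back to the source text. */
int unchecked_get(int_span s, size_t i) {
    return s.data[i];
}

/* Checked access: refuses an out-of-range index instead of reading past
   the end. On success the element is stored through `out`; on failure
   `out` is left untouched and false is returned. */
bool checked_get(int_span s, size_t i, int *out) {
    assert(out != NULL); /* caller error, caught early in debug builds */
    if (s.data == NULL || i >= s.len) {
        return false;
    }
    *out = s.data[i];
    return true;
}
```

With a three-element array, `checked_get(s, 3, &out)` returns `false`, while `unchecked_get(s, 3)` compiles without complaint and does whatever the target machine happens to do. That gap is roughly the one Hoare is pointing at, though a real program would also have to decide what to do with the failure.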
We can easily wonder just how much (expensive) programmer time has been lost disassembling stack traces, matching up debugger symbols and interpreting core dumps, but without figures for that I’ll generously assume that it’s an order of magnitude smaller than the losses due to buffer overflows. Now that’s only a tens-of-billions-of-dollars value of mistake, and C is the substrate for trillions of dollars of value of industry. So do we say that on balance, C is 99% a Good Thing™? Is it a bad idea that nonetheless enabled plenty of good ones?
[Incidentally, and without wanting to derail the central thesis of this post, I disagree with Hoare’s numbers. Symantec is merely one of the largest companies in the information security sector, with annual revenue in their most recent report of $6.9B. That’s a small part of the total value sunk into that sector, which I’ll guess has an annual magnitude of multiple tens of billions. A large fraction of the problems addressed by infosec can be attributed to C’s lack of bounds checking, so something like ten billion dollars a year is probably being spent on working around that one problem. Assuming those businesses have sustainable revenues over multiple years, the integrated cost is well into the hundreds of billions. That only revises the estimated impact on the C software industry from ‘fractions of a per cent’ to ‘a per cent’ though.]
Perhaps it’s fair to say that C was a good idea when it arose, and that it’s since been found to have deficiencies that haven’t yet become expensive enough to warrant decommissioning it. There’s an assumption of rational action in there that I think it’s fair to question, though: am I assuming that C is not worth replacing just because it has not been replaced? Might there actually be other factors involved?
Yes, there might. It’s possible that there are organisations out there for whom C is more expensive than it’s worth, but where the sunk cost fallacy stops them from moving on. Or organisations who stick with C because their platform vendor gives them a C toolset, even where free or paid alternatives would be cheaper [in fact that would point to a difficulty with any holistic evaluation: that the cost to the people who provide development environments and the cost to the people who consume development environments depend on different factors, and the power in the market is biased towards a few large providers. Welcome to economics]. Or organisations who stick with C because of a perception of a large community of users, which is (perceived to be) more useful than striking out alone with better tools.
It’s also possible that moves in the other direction are based on non-rational factors: organisations that seek novelty rather than improvement, or who move away from C because a vendor convinces them that their alternative is better regardless of objective truth.
It turns out that the simple question we wanted to ask about applied computing, “What works?”, leads to such a complex and maybe even chaotic system of forces acting in multiple dimensions that answering it will be very difficult. This doesn’t mean that an answer should not be sought, but that finding the answer will combine expertise from many different fields. In particular, something that survives for a long time doesn’t necessarily work: it could just be that people are afraid of the alternatives, or haven’t really considered them.