### Floating Point Numbers

You can think of floating point numbe...
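To make the representation concrete, here is a minimal sketch (assuming IEEE 754 double precision; the example value −6.25 is my own choice, not from the text) that unpacks a Python float into its sign, exponent, and fraction fields and reassembles the value from them:

```python
import struct

# Reinterpret the 64 bits of an IEEE 754 double (example value: -6.25).
bits = struct.unpack(">Q", struct.pack(">d", -6.25))[0]

sign     = bits >> 63                  # 1 bit
exponent = (bits >> 52) & 0x7FF        # 11 bits, biased by 1023
fraction = bits & ((1 << 52) - 1)      # 52 fraction bits of the significand

# value = (-1)^sign * 1.fraction * 2^(exponent - 1023)
value = (-1) ** sign * (1 + fraction / 2 ** 52) * 2.0 ** (exponent - 1023)
```

For −6.25 the unbiased exponent comes out as 2, since 6.25 = 1.5625 × 2².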
Completed in 1951, the *Whirlwind I* was a Cold War-era vacuum tube...
The **Newton-Raphson** method is an iterative root-finding algorit...
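As a rough sketch of the idea (the function name, tolerance, and iteration cap below are my own choices, not from the text), each iteration replaces the current guess x with x − f(x)/f′(x):

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Refine x0 by the update x <- x - f(x)/df(x) until |f(x)| < tol."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x -= fx / df(x)
    raise RuntimeError("Newton-Raphson did not converge")

# Example: the positive root of x^2 - 2, i.e. sqrt(2).
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

When it converges, the method converges quadratically: the number of correct digits roughly doubles on each step.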
There are a few reasons why the **Newton-Raphson** method can fail...
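One classic failure mode can be demonstrated with a textbook example (the cubic x³ − 2x + 2 is my own illustration, not from the text): starting from x₀ = 0, the iterates cycle between 0 and 1 forever instead of converging to the real root near −1.77.

```python
f  = lambda x: x ** 3 - 2.0 * x + 2.0
df = lambda x: 3.0 * x ** 2 - 2.0

x = 0.0
iterates = []
for _ in range(6):
    x = x - f(x) / df(x)   # one Newton-Raphson step
    iterates.append(x)

# The iterates oscillate 1, 0, 1, 0, ... and never converge.
```

Other failure modes include a derivative of zero (or near zero) at an iterate, which sends the next guess far away, and starting points outside the basin of attraction of the desired root.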
This trend mostly continued until today: ![](https://i.imgur.com/f...
Here is a photo of Corbató himself with MIT's IBM 7090 in 1961. ![...
### Why are rocket trajectories intrinsically serial?

Usually, p...
To give you a sense of the price of a mainframe back in the 60s, th...
[Reminiscences on the history of time sharing](https://web.archive....
The [NBS DYSEAC](https://bit.ly/2lFxSa4) was the first computer to...
> To Fernando J. Corbató, for his work in organizing the concepts and leading the development of the general-purpose large-scale time-sharing and resource-sharing computer systems CTSS and MULTICS
" i
t is an honor and a pleasure to
accept the Alan Turing
Award. My own work has
been on computer systems,
and that will be my theme.
The essence of systems is that
they are integrating efforts, requir-
ing broad knowledge of the prob-
lem area to be addressed, and the
detailed knowledge required is
rarely held by one person. Thus the
work of systems is usually done by
teams. Hence I am accepting this
award on behalf of the many with
whom I have worked as much as for
myself. It is not practical to name all
the individuals who contributed.
Nevertheless, I would like to give
special mention to Marjorie Dag-
gett and Bob Daley for their parts
in the birth of CTSS and to Bob
Fano and the late Ted Glaser for
their critical contributions to the
development of the Multics System.
Let me turn now to the title of this talk: "On Building Systems That Will Fail." Of course the title I chose was a teaser. I considered and discarded some alternate titles: "On Building Messy Systems," but it seemed too frivolous and suggests there is no systematic approach. "On Mastering System Complexity" sounded like I have all the answers. The title that came closest, "On Building Systems That Are Likely to Have Failures," did not have the nuance of inevitability that I wanted to suggest.
What I am really trying to address is the class of systems that, for want of a better phrase, I will call "ambitious systems." It almost goes without saying that ambitious systems never quite work as expected. Things usually go wrong, sometimes in dramatic ways. And this leads me to my main thesis, namely, that the question to ask when designing such systems is not *if* something will go wrong, but *when* will it go wrong?
### Some Examples
Now, ambitious systems that fail are really much more common than we may realize. In fact, in some circumstances we strive for them, revelling in the excitement of the unexpected. For example, let me remind you of our national sport of football. The whole object of the game is for each team to play at the limit of its abilities. Besides the sheer physical skill required, one has the strategic intricacies, the ability to audibilize, and the quickness to react to the unexpected, all a deep part of the game. Of course, occasionally one team approaches perfection, all the plays work, and the game becomes dull.
Another example of a system that is too ambitious for perfection is military warfare. The same elements are there, with opposing sides having to constantly improvise and deal with the unexpected. In fact, we get from the military that wonderful acronym, SNAFU, which is politely translated as "situation normal, all fouled up." And if any of you are still doubtful, consider how rapidly the phrases "precision bombing" and "surgical strikes" are replaced by "the fog of war" and "casualties from friendly fire" as soon as hostilities begin.
On a somewhat more whimsical note, let me offer driving in Boston as an example of systems that fail. Automobile traffic is an excellent case of distributed control with a common set of protocols called traffic regulations. The Boston area is notorious for the free interpretations drivers make of these pesky regulations, and perhaps the epitome of it occurs in the arena of the traffic rotary. A case can be made for rotaries. They are efficient. There is no need to wait for sluggish traffic signals. They are direct. And they offer great opportunities for creative improvisation, thereby adding zest to the sport of driving.

One of the most effective strategies is for a driver approaching a rotary to rigidly fix his or her head, staring forward, of course, secretly using peripheral vision to the limit. It is even more effective if the driver, on entering the rotary, speeds up, and some drivers embellish this last step by adopting a look of maniacal glee. The effect is, of course, one of intimidation, and a pecking order quickly develops. The only reason there are not more accidents is that most drivers have a second component to the strategy, namely, they assume everyone else may be crazy (they are often correct) and every driver is really prepared to stop with inches to spare. Again we see an example of a system where ambitious tactics and prudent caution lead to an effective solution.
So far, the examples I have given may suggest that failures of ambitious systems come from the human element and that at least the technical parts of the system can be built correctly. In particular, turning to computer systems, it is only a matter of getting the code debugged. Some assume rigorous testing will do the job. Some put their hopes in proving program correctness. But unfortunately, there are many cases for which none of these techniques will always work [1]. Let me offer a modest example, illustrated in Figure 1.
Consider the case of an elaborate numerical calculation with a variable, f, representing some physical value, being calculated for a set of points over a range of a parameter, t. Now, the property of physical variables is that they normally do not exhibit abrupt changes or discontinuities. Yet the computed curve for f, shown in Figure 1, abruptly goes flat partway through the range of t.

So what has happened here? If we look at the expression for f, we see it is the result of a constant, k, added to the product of two other functions, g and h. Looking further, we see that the function g has a behavior that is exponentially increasing with t. The function h, on the other hand, is exponentially decreasing with t. The resultant product of g and h is almost constant with increasing t until an abrupt jump occurs and the curve for f goes flat.
*Figure 1. A Subtle Bug: f(t) = k + g(t)·h(t)*

What has gone wrong? The answer is that there has been floating-point underflow at the critical point in the curve, i.e., the representation of the negative exponent has exceeded the field size in the floating-point representation for this particular computer, and the hardware has automatically set the value of the function h to zero. Often this is reasonable, since small numbers are correctly approximated by zero, but not in this case, where our results are grossly wrong. Worse yet, since the computation of f might be internal, it is easy to imagine that the failure shown here would not be noticed.

Because correctly handling the pathology that this example represents is an extra engineering bother, it should not be surprising that the problem of underflow is frequently ignored. But the larger lesson to be learned from this example is that subtle mistakes are very difficult to avoid and to some extent are inevitable.
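This underflow pathology is easy to reproduce today. A minimal sketch follows (the constant k and the 1e-290 scale factor are my own choices, picked so that IEEE 754 doubles underflow within the sampled range): mathematically f(t) = k + g(t)·h(t) is constant in t, but once h(t) underflows to zero the computed f silently collapses to k.

```python
import math

# Hypothetical reconstruction of f(t) = k + g(t) * h(t).
k = 1e-290                               # constant term, same scale as the product
def g(t): return math.exp(t)             # exponentially increasing factor
def h(t): return 1e-290 * math.exp(-t)   # exponentially decreasing factor
def f(t): return k + g(t) * h(t)         # mathematically 2e-290 for every t

before = f(10.0)   # h is still representable: f is about 2e-290, as expected
after  = f(80.0)   # h(80.0) underflows to 0.0, so f silently collapses to k
```

With these constants the jump appears near t ≈ 78, where 1e-290·e^(−t) drops below the smallest subnormal double, and no exception is raised.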
I encountered my next example when I was a graduate student programming on the pioneering Whirlwind computer. One night, while awaiting my turn to use it, the graduate student before me began complaining of how "tough" some of his calculations were. He said he was computing the vibrational frequencies of a particular wing structure for a series of cases. In fact, his equations were cubics, and he was using the iterative Newton-Raphson method. For reasons he did not understand, his method was finding one of the roots, but not "converging" for the others. He was trying to fix this situation by changing his program so that, when he encountered one of these tough roots, the program would abandon the iteration after a fixed number of tries.

Now there were several things wrong: First, the coefficients to his cubic equations were based on ex-

*Figure. Debugging the Code: Nonconverging Iterative Method Caused by Poor Root Value*
*Figure. Performance of a Top-of-the-Line Computer by Decade*
September 1991/Vol.34, No.9/COMMUNICATIONS OF THE ACM