Seven Original Sins of K&R

by Philip J. Erdelsky

pje@acm.org

September 22, 1990

The creation of C approximately two decades ago was a wondrous event, even if it did not seem so at the time. Like all human creations, C was imperfect. I have identified seven Original Sins--minor flaws in C for which K&R will eventually have to answer, in this world or the next. I call them original sins because they were present when C originated, not because K&R were the first to commit them. Some of these sins have been purged from later versions of C, but others remain with us.

I am not the first to decry these sins, nor will I be the last. I am merely another in a long series of prophets crying in the wilderness.

I

The First Original Sin was pitifully weak typing. There is no Boolean type in C, so generations of programmers have erroneously written something like "if (x=5)" instead of "if (x==5)", only to wonder why x always seems to be 5, regardless of what has gone before. The "char" type was not specified as either signed or unsigned. This sin has probably wasted more CPU time than any other, as savvy programmers learn to put a defensive "&0xFF" after every "char" expression that needs to be unsigned. The default type for functions should have been "void", not "int", but there was originally no "void" type.
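
Both temptations fit in a few lines. The sketch below is only an illustration; the function names (is_five_buggy, is_five, byte_value) are invented for this example and come from no library.

```c
/* Intended to report whether x equals 5.
   The single "=" assigns 5 to x instead; 5 is nonzero, so the test
   always succeeds, whatever the caller passed in. */
int is_five_buggy(int x)
{
    if (x = 5)          /* assignment, not comparison */
        return 1;
    return 0;
}

/* The comparison that was meant. */
int is_five(int x)
{
    if (x == 5)
        return 1;
    return 0;
}

/* Whether plain "char" is signed is left to the compiler, so the
   defensive mask forces the result into the range 0..255 either way. */
int byte_value(char c)
{
    return c & 0xFF;
}
```

Called with 3, is_five_buggy reports success while is_five does not. Many compilers will warn about the first function, which is the partial redemption described next.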

Modern compilers have provided partial redemption from this sin, usually by issuing warning messages when the program appears to be tainted. But these warnings are often false alarms and go unheeded. There is still no Boolean type, and "char" may be either signed or unsigned. Even the new enumeration types are merely integers in disguise, just as willing to be mixed as matched.
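
How willingly enumerations mix can be seen in a short sketch. The types color and fruit and the function mix_enums are hypothetical, chosen only for illustration; some compilers may issue a warning here, but the code is accepted as C.

```c
enum color { RED, GREEN, BLUE };        /* RED = 0, GREEN = 1, BLUE = 2 */
enum fruit { APPLE, ORANGE, BANANA };   /* APPLE = 0, ORANGE = 1, BANANA = 2 */

int mix_enums(void)
{
    enum color c = ORANGE;   /* a fruit stored in a color: accepted */
    int n = c + BLUE;        /* enumerators add like ordinary integers: 1 + 2 */
    return n;
}
```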

II

The Second Original Sin was the failure to make "NULL" a keyword. Beginning C programmers wonder why you have to "#include <stdio.h>" in a program that doesn't use standard I/O, when the only thing they want from it is the definition of "NULL". Some compilers don't even object when you assign an integer constant to a pointer without a typecast, especially when the constant happens to be zero. Don't blame the compiler. The poor thing can't tell the difference between a zero integer constant and "NULL".

Redemption from this sin is on its way. Modern compilers define "NULL" as "(void *) 0", so there's at least some hope of distinguishing it from a plain old zero.
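
The confusion is easy to reproduce. In this sketch (is_null and zero_is_null are hypothetical names), a bare zero is assigned to a pointer with no cast, and the result compares equal to "NULL", which itself has to be fetched from a header such as <stddef.h> or <stdio.h>.

```c
#include <stddef.h>     /* NULL is a macro from a header, not a keyword */

int is_null(const char *p)
{
    return p == NULL;
}

int zero_is_null(void)
{
    char *p = 0;        /* plain zero constant, no cast, no complaint */
    return is_null(p);  /* the zero constant became a null pointer */
}
```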

III

The Third Original Sin was the use of the keyword "static" to mark a function or variable as local to a particular source file. This is really a trinity of sins. The word "static" doesn't mean local. It conflicts with the other use of the word "static"--to mark a variable inside a function as one that actually is static, in an accepted meaning of the word. Finally, even if the word "local" had been used instead, it would have been marking the wrong thing. The word "public", or some similar word, should have been used to mark the few functions and variables that must be made available to the code in other files. Other functions and variables should have been local by default. That's how it's done in assembly language and in other high-level languages, and the reason for it is obvious.
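
The two meanings of "static" can sit side by side in a single file. This is a minimal sketch; the names calls, helper, next_id and call_count are assumptions made up for the example.

```c
static int calls;           /* file scope: "static" means private to this file */

static int helper(void)     /* likewise invisible to every other source file */
{
    return ++calls;
}

int next_id(void)           /* no keyword at all: visible to all files by default */
{
    static int id = 100;    /* inside a function: "static" means the value
                               persists from one call to the next */
    helper();
    return id++;
}

int call_count(void)
{
    return calls;
}
```

Two successive calls to next_id return 100 and then 101, and call_count then returns 2. Only next_id and call_count can be reached from another file, and they are the ones that carry no marking.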

From this sin, however, no redemption is in sight.

IV

The Fourth Original Sin was the mandatory use of the "break" keyword to terminate a "case" clause in a "switch" statement. Omitting it is natural for beginning programmers, and sometimes even for experienced programmers who have been dabbling in more tightly structured languages. Of course, this causes control to fall through to the next case, which is occasionally useful but nearly always a mistake, like a double exposure in photography. But the evil goes even further. Often, the "switch" statement is enclosed in a "for" or "while" loop. You want to finish up a "case" clause by breaking out of the loop? You can't do it in C, not without breaking out of the "switch" statement first!
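
Both halves of this sin can be sketched briefly. The functions fall_through and sum_digits_until_dot are hypothetical examples; the flag variable done is one common workaround for leaving the loop, not the only one.

```c
/* A missing "break" after case 1 lets control run on into case 2. */
int fall_through(int n)
{
    int r = 0;

    switch (n) {
    case 1:
        r += 1;
        /* falls through */
    case 2:
        r += 2;
        break;
    default:
        r = -1;
        break;
    }
    return r;
}

/* Adds up the digits in s, stopping at the first '.'.
   Other characters are skipped. */
int sum_digits_until_dot(const char *s)
{
    int sum = 0;
    int done = 0;       /* needed because "break" cannot reach the loop */

    while (*s != '\0' && !done) {
        switch (*s) {
        case '.':
            done = 1;   /* ask the loop to stop */
            break;      /* this leaves only the switch */
        default:
            if (*s >= '0' && *s <= '9')
                sum += *s - '0';
            break;
        }
        s++;
    }
    return sum;
}
```

Here fall_through(1) returns 3 rather than 1, and sum_digits_until_dot("12a3.45") returns 6.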

The solution, not likely to be adopted even in C+++, would be to have the compiler put an implicit "break" at the end of every "case" clause, and reserve the "break" keyword for breaking out of loops, the way God intended.

V

The Fifth Original Sin was the way functions are defined. The entire parameter list has to be written twice. That's something no programmer should have to do unless it's absolutely necessary. And to compound the evil, an untyped parameter defaults to type "int". Most programmers have written something like "strcmp(s,t)", forgetting the declaration "char *s,*t;". What you wind up with in most cases is not a function that fails but something worse--a function that works as long as pointers and integers are the same size, and then fails when you try to port it. Fortunately, ANSI C permits prototype definitions, but the old way is still permitted, at least during a transitional period. Let's hope the transition is brief.
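
The two styles can be compared in a short sketch. The old form is shown only in a comment, since not every compiler still accepts it; my_strcmp is a hypothetical stand-in written for this example, not the library's own "strcmp".

```c
/*
 * Old style: the parameters are named in the list and then named
 * again, with their types, in the declarations that follow.
 *
 *     int my_strcmp(s, t)
 *     char *s, *t;     <- leave this line out and s and t become int
 *     {
 *         ...
 *     }
 */

/* Prototype style: each parameter is named and typed exactly once. */
int my_strcmp(const char *s, const char *t)
{
    while (*s != '\0' && *s == *t) {
        s++;
        t++;
    }
    /* Compare as unsigned so the result does not depend on
       whether plain char is signed. */
    return (unsigned char)*s - (unsigned char)*t;
}
```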

VI

The Sixth Original Sin was the way conflicts among the names of members of different structures were neither forbidden nor resolved. The original K&R said that different structures could have members with identical names as long as they had identical offsets. The way early compilers implemented this dictum varied. Some compilers would check to see that the offsets were indeed identical. Others simply generated erroneous code when they weren't. Most programmers took the safest course by including the structure name--usually abbreviated--in every member name.

Modern compilers have atoned for this sin completely by keeping a separate member list for each structure type. This resolves the conflicts, but a reminder of past iniquities persists in the awkward names of structure members in UNIX source code and other old C scriptures.
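
A small sketch shows what the separate member lists allow. The structures point and circle and the function sum_x are hypothetical; the members x and y deliberately sit at different offsets in the two structures, which the old rule would not have tolerated.

```c
struct point  { int x; int y; };
struct circle { int radius; int x; int y; };   /* x and y follow radius */

int sum_x(void)
{
    struct point  p = { 1, 2 };
    struct circle c = { 10, 3, 4 };

    /* Each "x" is looked up in the member list of its own structure. */
    return p.x + c.x;   /* 1 + 3 */
}
```

The older habit survives in member names such as "st_mode" in "struct stat" and "tm_year" in "struct tm", where the prefix echoes the structure name.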

VII

The Seventh Original Sin was the eight-character limit on distinguishable names, or even fewer than eight for externally defined names. Of course, some such limitation was required for efficient implementation, but eight characters are not enough. C was much better than Fortran, which allowed only six, but there are many pairs of English words with distinct meanings whose first eight letters are identical. The minimum number depends on the language, but for English about 20 should be sufficient. German programmers need more.
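
Such collisions are easy to write by accident. In this sketch the two variable names are invented for illustration and share their first eight characters, "interrup"; a compiler that kept only eight characters would see one name declared twice, while a compiler with a more generous limit keeps them apart.

```c
/* Two names that differ only after the eighth character. */
static int interrupt_enable_count  = 2;
static int interrupt_disable_count = 5;

int interrupt_total(void)
{
    return interrupt_enable_count + interrupt_disable_count;
}
```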

Most modern compilers do have a reasonable limit, but some compiler developers have apparently forgotten that virtue lies in moderation. One compiler allows several hundred characters, maybe more. That's too long. Compilers are supposed to compile, not test the limits of computability by allowing single labels to occupy practically the entire computer memory (and disk swap area). An unprintable name--one that won't fit on a single line--should also be uncompilable.

Epilogue

None of these sins is inconsistent with the philosophy of C. We needn't embrace heresies like Pascal, Modula-2 or Ada. But we must abandon the false god of 100% upward compatibility. We must tear down the old temple to build a new one. Then, and only then, will our redemption be at hand.

Note

This jeremiad is not copyrighted. You are welcome to copy it and pass it on. I only ask you to leave my name and account number on it. Let me take the credit--and the heat.