Those who cannot remember the past are condemned to repeat it.
--- George Santayana
Those who do not understand Unix are condemned to reinvent it, poorly.
--- Henry Spencer
Hershey Heaven
This month, we take you to Hershey Heaven.
In the flurry of irrelevancy with which the press
inundates Nobel Prize winners, Al Hershey, the 1969 Nobel
Laureate in Physiology or Medicine, was asked what he thought
heaven would be like. He answered that he thought in heaven he'd
finally get an experiment that worked, and be able to do it over
and over again.
In 1976, Brian Kernighan and P.J. Plauger published
Software Tools, (ISBN: 0-201-03669-X) a book we recommend
to everyone. Here's the description supplied by the second
author, Peter Plauger:
Describes a number of small programs made popular by the UNIX operating system. Contains complete source code of all the programs in Ratfor, a structured dialect of Fortran that strongly resembles C. This classic pioneered the term ``software tools.''
This book, probably the clearest exposition of the Unix
tools philosophy, provides the source code for complete
implementations of a several Unix tools, together with running
commentary on every point of their design and implementation.
But why in Ratfor--something that ``strongly resembles
C'' and not C itself? The publication date for the first edition
of Kernighan and Ritchie's The C Programming Language is
1978. At the time, the two universally available programming
languages were COBOL and FORTRAN-66 (often as the implementation
FORTRAN IV). Kernighan and Plauger had a message to get out
about how to program, to an audience who had never heard of Unix
or C, and who -- at least the way it looked then -- never would.
As a transition vehicle, they created a language that
looked like C, but could be pre-processed into FORTRAN-66. (Most
of you have never seen FORTRAN-66, but it lacks nearly everything
you take for granted in programming languages: ``if-then-else,''
data structures, ``while'' loops, character I/O, even strings.
The primary control flow structures are the ``logical if'' and
the ``goto.'')
Here's an example of relatively easy-to-read FORTRAN-66,
from page 1 of the original Bell Labs Ratfor documentation:
IF (X.LE.100) GOTO 10 CALL ERROR(5HX>100) ERR = 1 RETURN 10 ...
The equivalent in Ratfor?
if (X>100) {
call error("x>100"); err = 1;
return }
The authors' idea was to use a language that would
permit easy-to-read examples, in a book that showed folks how to
write code that improved their programming environments.
To close the loop, the final chapter of Software
Tools designs and implements an entire Ratfor-to-FORTRAN
preprocessor.
The book quickly spurred the formation of the Software
Tools User Group. Formed at Lawrence Berkeley Labs, this group
began distributing tapes that contained all the code from the
book, together with an ever-growing body of contributed tools,
all of which could be installed on any computer with a FORTRAN
compiler (which meant, at the time, pretty much any computer): a
Unix-like environment, not just for non-Unix systems, but for a
world that had never heard of Unix or C.
If you want to see what sorts of things were done, you
can still find the Software Tools on the web. One site is http://www.geocities.com/SiliconValley/Lab/9247/#compilers.
An advantage of the Unix ``one tool, one job''
philosophy is that you can attack each command separately, and
one at a time. Most are bite-sized; individual contributors can
write a useful tool in a few days -- or less -- and make a real
contribution to the larger whole. Yoked to this is the idea that
when you write a little program, instead of a giant, monolithic
one, you can really make it yours; you can get your arms around
it, and put in the work you need to get it just right.
And the reason we can recommend a twenty-five-year-old
book, full of code for programs that you won't need to write, in
a language that you'll never use, is that it remains the
clearest, best written, most entertaining and practical treatise
we know on how to get programs just right.
(Do not, by the way, be fooled into buying Software
Tools in PASCAL, by the same authors. Kernighan's 1981
technical report, ``Why PASCAL Is Not My Favorite Programming
Language,'' http://cm.bell-labs.com/cm/cs/cstr/100.ps.gz, gives great insights into both why this book
didn't turn out the way it could have, and why PASCAL, once a hot
contender for the programming language of choice, eventually lost
out to C.)
It's hard to imagine, nowadays, just how revolutionary
the book's approach was. It changed lives.
Most folks reading this column have probably never even
seen a punched card or written a FORTRAN program. To help put
things in context, try to imagine working as a programmer in a
world in which neither you nor anyone you know has ever heard of
a filter or a ``software tool,'' and the only tools available to
you as a programmer are a compiler, an assembler, and a linker.
(And a world where nothing is off-the-shelf. We know someone who
began his career writing a payroll-system in FORTRAN for a movie
studio.)
One chapter of Software Tools contains the
complete design and implementation of an editor. Not a screen
editor, mind you. After all, no one had cursor-addressable terminals.
What happened to STUG? One finds occasional fossils of
STUG, such as Usenix's Software Tools User Group award,
http://www.usenix.org/directory/stug.html, but the group
died of its own success. People who joined STUG learned about
Unix, help popularize Unix and the Unix philosophy, eventually
demanded Unix, and switched to Unix when it became available.
In the mid-1980s, the Unix-tool-set story was replayed
more than once, to the great advantage of a new generation of
computer users. Mortice-Kern Systems (MKS), a Canadian company
rewrote the entire basic Unix command set, from scratch, for MS-DOS, and later ported the same suite to a variety of legacy
systems.
In the same time frame, the Free Software Foundation
coordinated the contribution of an army of volunteers, who
created freely-redistributable versions of nearly all common Unix
tools, which eventually made up the bulk of the command-line
utilities for Linux.
A third great source of rewritten Unix tools are the BSD
releases, coordinated by The University of California at
Berkeley's Computer Science Research Group, and found in a wide
variety of freely-available BSD-based Unixes. As with STUG and
the FSF, CSRG's work was the coordinated effort of an unruly army
of individual volunteers.
Everyone eventually caught on to the software toolbox
approach.
Tom Christiansen Becomes
Irked
Well, everyone except the Windows world.
Those of you who read the comp.lang.perl.misc
newsgroup know that Tom Christiansen, co-author of many O'Reilly
perl books, is a frequent contributor.
When a question irks Tom, he speaks right up, often
chastising the questioner. Sometimes people don't like this, so
Tom is occasionally a source of discussion on the newsgroup in
his own right.
Like grains of sand in an oyster, though, these
irritants sometimes spur Tom to create something beautiful. (In
the distance, we hear groaning; to quote Jo Haemer, "A cheap shot
is a terrible thing to waste.")
Spurred on by irksome questions, Tom has written Perl
man pages, Perl FAQs, Perl tools, and even a series of Perl
FMTYEWTK (Far More Than You Ever Wanted To Know) essays.
Several months ago, Tom was going through a stretch of
irritation at people asking for complete Perl solutions to
problems that could be solved with simple calls to basic Unix
utilities. Too often, his pointing this out didn't help the
requester, who would turn out to be running on some Microsoft
platform that didn't have the basic utilities to solve the
problem.
For a while, Tom's reaction was to declare that such
lacunae were God's wrath visited upon anyone sinful enough to run
something other than Unix, and that we in the Unix community had
no obligation to help.
Those of you who've been reading the newsgroup, or this
column, for some time will even remember a parody posting, by Nat
Torkington -- Tom's co-author for The Perl Cookbook, ISBN
1-56592-243-3
Tim.Bunce@ig.co.uk (Tim Bunce) writes: > The problem is to find the full list of > names and the original order. You INSTALL a FULL SET OF TOOLS, like THE LORD GOD ALMIGHTY intended. REPENT, ye PRISONER of BILL! The DAY of JUDGEMENT is AT PERL! Your MESSENGERS are obviously just POOR substitutes for RELIABLE PIPE COMMUNICATION which you'd have if you had a REAL OPERATING SYSTEM and not a SCURRILOUS PIECE OF TOOL-CHALLENGED COPROPHILIA! Tom^WNat :-)
This ultimately led us to write a Perl version of
tsort(1) (see http://swexpert.com/C9/SE.C9.SEP.98.pdf
and http://swexpert.com/C9/SE.C9.OCT.98.pdf)
More recently, though, Tom seems to have decided that
the problem wasn't going to go away, and has organized a project
to rewrite all the basic Unix utilities in Perl, so that any
system with Perl can have the full, basic Unix command set for
free.
Note the word ``organized.'' Tom has written some
utilities himself, but what he's really doing is coordinating
contributions from all over the Perl world. Tom calls it the
``Perl Power Tools'' project.
We prefer ``Software Ptools'': the `P' is psilent.
An example:
asa(1)
This looked like enormous fun, and we jumped in with
both feet.
Both Jeffs have worked in the printer industry, so we
decided we'd chip in by contributing a traditional Unix utility
that no one else would be silly enough to write: asa(1), a
program that interprets traditional FORTRAN carriage-control
commands.
Here, a little history will help. A couple of decades
ago, printers were all impact line printers that produced great
stacks of accordion-folded, green-and-white-lined, 14-inch-wide
paper, which was actually 15 inches wide if you counted the tear-off strips on the sides, which were perforated so teeth on the
printer carriage could advance the paper.
These printers were simple; most couldn't even do
graphics. (Graphics output was provided by another kind of
printing device, called a ``plotter.'') Indeed, in addition to
printing alphabetic characters, almost the only thing they could
do was to move to the top of a new page, to backspace (in order
to underline) and to overprint lines, (to produce bold
characters).
But at this point in history, not even character sets
were portable -- non-IBM machines often used ASCII, but IBM
machines, which were in the majority, used EBCDIC -- which meant
that your programs couldn't assume that backspace was a ^H, form-feed was a ^F, or carriage return was a ^M.
A convention was born: all printers agreed to look at
the first character of each output line, and interpret a small
number of special characters as special ``carriage-control''
commands. For example, a `1' in the first column of an output
line told the printer to eject the current page and move to the
top of a new page.
These conventions were made a part of the American
Standards Association FORTRAN standard. (You read that
correctly: printer carriage controls were part of a programming
language standard.) ASA was later renamed ANSI.
In the C/Unix world, there are no such conventions.
Moreover, neither contemporary terminals nor newer printers, such
as laser printers, interpret output in this way. Old FORTRAN
programs, ported from other operating systems, began finding
themselves assuming these conventions on systems that didn't
recognize them.
To handle this, early Unix systems included a program called asa(1), which translated FORTRAN carriage-controls. Here is our implementation of asa(1).
1 #!/usr/local/bin/perl -w
2 # $Id: asa,v 1.1 1999/05/31 22:03:15 jsh Exp jsh $
3 use strict;
4 exit 1 if grep {!-r} @ARGV;# traditional
5 if (grep /-/, @ARGV) {
6 $0 =~ s(.*/)();
7 warn "usage: $0 [filename ...]\n";
8 exit 2;# traditional
9 }
10 while (<>) {
11 chomp;
12 s/^$/ /;
13 s/^[^10+-]/\n/;
14 s/^1/\f/;
15 s/^\+/\r/;
16 s/^0/\n\n/;
17 s/^-/\n\n\n/;
18 print
19 or exit 1;# traditional
20 }
21 =head1 NAME
22 asa - interpret ASA/FORTRAN carriage-controls
23 =head1 SYNOPSIS
24 asa [I<filename> ...]
25 =head1 DESCRIPTION
26 =over 2
27 Traditional FORTRAN programs put carriage-control characters
28 in the first columns of their output,
29 which were interpreted by older lineprinters
30 according to the ASA vertical format control standard.
31 (ASA was the American Standards Association -- now ANSI.)
32 Under this standard, the first character of each printable record (line)
33 determines vertical spacing, as follows:
34 =over 2
35 I<blank> carriage return
36 0 two carriage returns
37 1 Formfeed
38 + overprint
39 - three carriage returns (IBM extension)
40 =back
41 All other characters are discarded, and empty lines behave as though
42 they have a leading blank.
43 B<asa> interprets these characters.
44 =back
45 =head1 EXIT VALUES
46 =over 2
47 0 normal exit
48 1 inability to write on stdout or to read an input file
49 2 bad argument
50 Exit status values chosen from MKS toolkit.
51 =back
52 =head1 AUTHOR
53 Jeffrey S. Haemer
54 =head1 BUGS
55 Currently, B<asa> just looks at the readability of its input files
56 at startup time. It should really do it a file at a time,
57 but that makes the code look gross.
58 The carriage-control '-' is an IBM extension.
59 Perhaps the default should ignore it
60 and there should be a '-i' option to interpret it.
61 =head1 SEE ALSO
62 I<Communications of the ACM>, Vol 7, No. 10,
63 p. 606, October 1964.
64 NWG/RFC 189, Appendix C
65 =cut
And now, for our dramatic reading: The meat of the
program is 10 lines of code: lines 10-20. Everything else is
professionalism.
Lines 1-3 are our usual boilerplate. The shebang line
(1) invokes the Perl interpreter, and gives it the -w
flag, which questions a variety of questionable usages. The
third line requires the still more picky strict pragma.
As long as we're going to write a utility, we might as well catch
as many silly errors as we can.
The second line says we're keeping our code under
revision control.
Lines 4-9 do argument parsing. The comment
``traditional'' means that it's traditional for this command to
exit with an exit status of 2 if the arguments are mis-specified.
Lines 21-65 are documentation. Perl lets you keep your
documentation in the same file as your code, so they don't get
out of synch.
The meat of the program is the loop begun on line 10 and
finished on line 20, which reads and prints the file a line at a
time.
Carriage-control is specified entirely by the first
character in the line, so line 11 begins by removing any special,
ASCII carriage controls at the ends of the lines. The printer
will never see them.
The standard says that a line beginning with anything
except one of the special ASA carriage-control characters
should trigger a new line and a carriage return, and lines 12 and
13 give us that. Line 12 prints blank lines as blank lines.
Line 13 consumes any other character that begins a line, and
performs the default action: terminating the preceding line.
(Yes, that's right -- if you want a character at the beginning of
a line, you have to precede it with something else. The first
character is always interpreted as carriage control.)
Lines 14 through 17 interpret the ASA, beginning-of-line carriage-control codes:
There it is. A ten-line program. Just an oddly shaped
brick in the ziggurat of free Unix tools for the non-Unix world.
Want to chip in? It's fun. Go to http://language.perl.com/ppt/
and take a look at what's done and what's not.
Make the world a better place by spending a few hours in
Hershey Heaven.
Until next time, happy trails.