Work: Back to Basic(s)

Jeffreys Copeland & Haemer

(Server/Workstation Expert, February 2000)



Unix ... is not so much a product as it is a painstakingly compiled oral history of the hacker subculture. It is our Gilgamesh epic.
                        --- Neal Stephenson, In the Beginning was the Command Line, 1999


We have been doing UNIX so long that we sometimes take it too much for granted.

This column is for people who have been so immersed in non-UNIX systems that they haven't yet advanced to the 1970s.

Yes, we mean in systems like Windows and MVS.



Lottery Numbers

A couple of months ago, we got a note from a Romanian friend, who wanted to dip her toes into UNIX. She had spent most of her professional career working on DOS/Windows boxes, and was ready to try something new. She wrote, asking for advice.

A good place to start, we decided, was to attack a simple problem she'd already tried elsewhere, to get the flavor of the UNIX programming environment.

Why a simple problem? Kernighan and Ritchie explain, in the first section of the first chapter of The C Programming Language:


1.1 Getting Started

The only way to learn a new programming language is by writing programs in it. The first program to write is the same for all languages: Print the words
hello, world

This is the basic hurdle; to leap over it you have to be able to create the program text somewhere, compile it successfully, load it, run it, and find out where your output went. With these mechanical details mastered, everything else is comparatively easy.


And it's not even as easy as they make it sound. Our friend recounts an early experience:

I remember when I first met Visual C++, I only wanted to make a simple random lottery-number generator. My first steps into the IT world were FORTRAN, COBOL, assembly, all of them on a mainframe. After many years spent far from the IT world, I came across this PC with this idiotical Windows system. I had to learn C, so I thought of this little program. OK, I thought of it, read a book of C (Ritchie and Kernighan) -- of course I didn't read it all, I tried to follow the steps. Alas! By no means could I have guessed where the C compiler was. I had to first make a project, then change all kind of settings that were "Chinese" to me, or better said I was "Chinese" to them also. In order to get that beast to write them lottery numbers on the screen (ON A BLACK SCREEN!!!!), I had to waste days on end, helpless and clueless, thinking I must be the last idiot on the planet. Finally I managed: using BorlandC++ 3.0 of course, because there was no rand() function in VC++.

This problem was well-defined, so it seemed worth trying on UNIX, to show her what a UNIX-y solution would look like.

Our first question was, of course, ``What's a lottery ticket look like?'' Romanian lottery tickets, it turns out, contain three picks. Each pick is half a dozen unique, random numbers from the set {1, ..., 49}. Here's an example:

15 42 16 28 7 40
8 13 34 31 20 17
18 16 38 49 10 12
(We suspect that any single, matching pick is a winner, but since we're not going to actually buy any tickets, not having the Lei to spread around, we don't care if we're wrong about that.)



/bin/sh: UNIX's Integrated Development Environment

Here's what we really did, step-by-step:

Voila.

You need THREE sets? Sure. We just start the process again.


(Oh heck, we just looked back at the problem specification, and we only want 6 random numbers between 1 and 49, not 7. Whatever shall we do? We leave this as an exercise to even the least experienced of our readers.)


On-The-Fly Programming

We could do all this interactively, and nearly instantly, because the UNIX shell lets you recall and edit command lines. (Although POSIX only guarantees vi-like editing commands, the shells we've used also provide an emacs-like mode, in case you like that better.)

When this feature first became widely available, in the late 1980s, it quickly changed the way we interacted with the shell.

Our typical approach is now to do much of our shell-level programming on the fly. At each step, we recall and edit a previous command, mostly just appending a new filter to do something new to the data.

Because a good UNIX filter takes its input from stdin and writes to stdout, we often test what we write directly from the keyboard, and watch the results on the screen.

When we're finally satisfied, we capture what we've been doing, and turn it into an executable shell script.

(We use this same process in a lot of our perl programming.)

UNIX is full of tiny tools that filter and transform text. The sense we get when we're programming is often one of popping together little, existing tools, looking at the output, and then adding some new transformation to get to the next stage.
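
As a rough sketch of that build-up, here is how a single lottery pick might grow one filter at a time. This is our illustration, not a transcript of a real session: the function names keys and pick are made up, and we use awk's rand() instead of the shell's $RANDOM so that the sketch runs under any POSIX shell.

```shell
# Step 1: emit 49 random keys, each tagged with its index 1..49.
# srand() seeds from the clock, so two runs in the same second can match.
keys() {
    awk 'BEGIN { srand(); for (i = 1; i <= 49; i++) printf "%.9f %d\n", rand(), i }'
}

# Each later step is the previous command line, recalled, with one filter appended:
#   keys | sort -n
#   keys | sort -n | head -n 6
#   keys | sort -n | head -n 6 | awk '{ print $2 }'
pick() {
    keys |
    sort -n |               # shuffle the indices by sorting on the random keys
    head -n 6 |             # keep the first six
    awk '{ print $2 }' |    # throw away the sort key
    paste -s -d ' ' -       # join the six numbers onto one line
}

pick
```

Because every stage reads stdin and writes stdout, any prefix of the pipeline can be run alone to inspect the intermediate data before the next filter goes on.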


Production Code

But what if we wanted ``production'' code? The real answer is: we'd do the same thing. For a straightforward task like this, the most important factor to consider is development time. As Tom Christiansen says:

Q: What's the difference in speed between an application in Perl and an application in C++?

A: About three weeks. :-)

It is, however, worth enhancing our program a bit. Here's a production version that adds comments, does its own formatting, and takes an optional command line argument to specify how many picks to generate.

#!/bin/sh
# Romanian lottery program:
#    print lottery tickets with NPICK picks (default 3)
#    each pick is 6 random, non-repeated integers from 1..49
# $Id: lottery,v 1.6 1999/11/27 00:24:46 jsh Exp $

pick1() {
 RANDOM=$RANDOM          # reset the seed (needs a shell with $RANDOM, e.g. ksh or bash)
 for i in $(seq 49)      # 49 random numbers
 do
  echo $RANDOM
 done |
 nl |               # index the numbers
 sort -n -k 2 |     # randomize indices by sorting random data (field 2)
 head -6 |               # grab the first 6
 awk '{print $1}' |      # now throw away the sort key
 fmt                # and put all 6 on one line
}

case $# in
 0) N=3 ;;
 1) N=$1 ;;
 *) echo "usage: $0 [npicks]" 1>&2 ; exit 1 ;;
esac

for j in $(seq $N)
do
 pick1
done

exit

##############################################################

=head1 NAME

lottery - print "6 from 49" lottery picks

=head1 SYNOPSIS

B<lottery [npicks]>

=head1 DESCRIPTION

B<lottery> prints I<npicks> lottery picks, one per line (default: 3).

Each pick is six, non-repeated integers out of 1..49.

=head1 SEE ALSO

sh(1)

=head1 AUTHORS

 Jeffrey L. Copeland <copeland@alumni.caltech.edu>
 Jeffrey S. Haemer <jsh@usenix.org>


One noteworthy feature of this code is the internal documentation. We have simply co-opted Perl's ``pod'' format. The various pod tools encourage you to keep documentation and code in the same file, so they'll stay in sync, and they produce a wide variety of documentation formats from a single, well-defined input format. They work for our shell script, too: the exit statement at the end of our executable code prevents the shell from trying to interpret the documentation, while the pod tools -- pod2html, pod2latex, pod2man, pod2text, pod2usage, and podselect -- ignore everything before the first pod directive.
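
The exit trick is easy to check in isolation. The following is a minimal sketch, with a made-up helper name (run_demo) and a throwaway script file: it writes a tiny script whose code ends in exit, follows that with pod text, and runs it.

```shell
# run_demo: build a throwaway script that carries pod after its "exit", then run it.
run_demo() {
    demo=$(mktemp) || return 1
    cat > "$demo" <<'EOF'
#!/bin/sh
echo "code runs"
exit 0

=head1 NAME

demo - text the shell never reaches
EOF
    sh "$demo"              # the shell stops reading at "exit"
    status=$?
    rm -f "$demo"
    return $status
}

run_demo                    # prints: code runs
```

The shell reads and executes a script one command at a time, so nothing after exit is ever parsed. If Perl is installed, running pod2text on a script laid out this way should print only the documentation half.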


`A good FORTRAN programmer can write FORTRAN in any language'

If you've tried running this code, you may see that we've glossed over a step.

From the start, in our examples, we've used a utility called seq. This trivial, yet amazingly useful UNIX tool isn't found in the POSIX standard, or even on most UNIX distributions.

It should be clear, from the code above, what it does: seq just counts. We first saw seq used in Kernighan and Pike's The UNIX Programming Environment, but we now use it so much that we'd be lost without it.

In fact, we were, just the other day.

A few weeks ago, we were asked to help judge a practice ACM programming contest. While waiting for the first contestants' entries to be submitted, we did what we do compulsively: write code. We sat down with another of the judges, Dan Crawl, to explore how we might solve one of the problems in the shell (more on this anon), and immediately ran into a brick wall: the machine we were working on didn't have seq(1).

When you're hacking, getting there is all the fun. We shifted gears and wrote a seq.

Here's what we came up with.

#!/usr/bin/perl -w
# $Id: seq,v 1.3 1999/12/13 19:38:13 jeff Exp jeff $

use strict;

# always define the usage message; "my ... unless" has unreliable behavior
my $usage = "usage: $0 [start] end\n";
unshift @ARGV, 1 if @ARGV == 1;
die $usage unless @ARGV == 2;

foreach ($ARGV[0]..$ARGV[1]) {
 print "$_\n";
}

=head1 NAME

 seq - print a sequence

=head1 SYNOPSIS

 seq [start] end

=head1 DESCRIPTION

C<seq> generates sequential integers

By default, it starts counting at 1.
For example,

 $ seq 3 5
 3
 4
 5

=head1 BUGS

By little more than a lucky accident,
C<seq>
does odd, sometimes useful things
with non-integer arguments.
Not obvious that this is a bug.
Try this, for example:

 $ seq cat dog

=head1 AUTHORS

Just a re-implementation of something used
in Kernighan and Pike, but never written
in there.
Maybe it's part of Plan 9 or something.

 Dan Crawl <crawl@cs.colorado.edu>
 Jeffrey S. Haemer <jsh@usenix.org>
 Jeffrey L. Copeland
  <copeland@alumni.caltech.edu>

Here, too, we show our colors. For us, programming in UNIX feels like bolting together prefabricated parts from a brilliantly designed erector set. Many of our workaday tasks can be done in minutes with the pieces at hand. Occasionally, however, in the midst of our work, we discover we don't have something we need. Rather than go back to the drawing board to redesign a custom solution -- say, in Visual C++ -- we simply pause to manufacture a missing part, then pop it into place.
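
The missing part doesn't have to be Perl, either. As a sketch only, here is a stand-in written in plain POSIX shell; the name myseq is our invention, it handles integers only, and it leaves its loop variables in the caller's environment.

```shell
# myseq [start] end: print the integers from start (default 1) through end.
myseq() {
    case $# in
        1) i=1;  end=$1 ;;
        2) i=$1; end=$2 ;;
        *) echo "usage: myseq [start] end" >&2; return 1 ;;
    esac
    while [ "$i" -le "$end" ]
    do
        echo "$i"
        i=$((i + 1))        # POSIX shell arithmetic, no external tools needed
    done
}

myseq 3 5                   # prints 3, 4 and 5, one per line
```

Dropped into a script above its first use, a function like this would let a loop such as for j in $(myseq $N) run on a machine without seq.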

Occasionally, this approach pushes us over the edge. One of the problems in the programming contest was to count the atoms in a formula, such as tetraethyl lead, Pb(CH3CH2)4, and produce output something like this:

C: 8
H: 20
Pb: 1

Well. Our first reaction was to consider a pipeline with the following steps:

  1. Replace each distinct element with a distinct prime, turn the numbers into exponents, and everything else into multiplications, like this: 2*(3*5^3*3*5^2)^4
  2. Pipe the expression to bc for evaluation.
  3. Pipe the result to /usr/games/factor for prime factorization.
  4. Reverse the transformation in step 1.
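
For the curious, the four steps above can be sketched as follows. Several assumptions are ours: the function names to_expr and count_atoms are made up, the element-to-prime table covers only Pb, C and H (2, 3 and 5), and we call factor from the PATH rather than from /usr/games. A longer table would need to substitute two-letter symbols before one-letter ones, and a big enough molecule would produce a number too large for factor.

```shell
# Step 1: rewrite a formula as arithmetic.
# Hypothetical table: Pb -> 2, C -> 3, H -> 5.
to_expr() {
    printf '%s\n' "$1" |
    sed -e 's/[0-9][0-9]*/^&/g' \
        -e 's/Pb/*2/g' -e 's/C/*3/g' -e 's/H/*5/g' \
        -e 's/(/*(/g' -e 's/(\*/(/g' -e 's/^\*//'
}

# Steps 2-4: evaluate with bc, factor the result, count each prime.
count_atoms() {
    to_expr "$1" |
    bc |
    factor |
    awk '{ for (i = 2; i <= NF; i++) n[$i]++ }
         END { printf "C: %d\nH: %d\nPb: %d\n", n[3], n[5], n[2] }'
}

to_expr 'Pb(CH3CH2)4'       # prints: 2*(3*5^3*3*5^2)^4
count_atoms 'Pb(CH3CH2)4'   # with bc and factor installed: C: 8, H: 20, Pb: 1
```

The sed script marks the subscripts as exponents first, so that the primes it introduces afterwards are not mistaken for subscripts.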

That's deeply warped.

When drunk enough on Green Chartreuse, Haemer will confess to having written an entire object-oriented language in the shell [Jeffrey S. Haemer, ``A New Object-oriented Programming Language: sh,'' Proceedings of the USENIX Summer 1994 Technical Conference, Boston, Massachusetts, USA.]


Two (or More) Can Play at This Game

Here's a chance to try your hand.

Last week, we got a note from another Windows-bound friend, Michael Mendelson, who said he, too, wanted to start playing with UNIX. When we sent him our Romanian friend's lottery story, he responded with a similar story of his own, culminating with a 154-line C program that he'd written, early in his career, to randomize the lines of a file. In his note to us, he says:


I'm sure you could write this one in half
the lines from the shell.  I'd like to
learn some of those tools!

His program reads in a file, shuffles the lines into a random order, overwrites the original file with the result, and announces the file has been randomized.

How small a shell solution can you come up with? Our first cut took about fifteen lines. Let us know how much better you can do. We'll tell Michael.
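
To get you started, here is one possible shape for an answer. It is a sketch, not our fifteen-line first cut: the function name shuffle is made up, and the random keys come from awk's rand(), seeded from the clock. It reuses the trick from the lottery script: tag each line with a random key, sort on the key, and strip the key off again.

```shell
# shuffle FILE: rewrite FILE with its lines in random order.
shuffle() {
    tmp=$(mktemp) || return 1
    awk 'BEGIN { srand() } { printf "%.9f\t%s\n", rand(), $0 }' "$1" |
    sort -n |                   # order the lines by their random keys
    cut -f 2- > "$tmp" &&       # drop the key, keep the original line
    cat "$tmp" > "$1" &&        # overwrite in place, keeping the file's permissions
    rm -f "$tmp" &&
    echo "$1 has been randomized"
}
```

Calling shuffle with a file name should rewrite that file in place and then announce the result.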

Lastly, we had a note from an observant reader who points out that the anonymous ``A picture is worth a thousand words,'' with which we began our November and December columns, is actually a quote from Fred R. Barnard in Printers' Ink, March 10, 1927. We're always pleased when our humble efforts reach the desks of the literate and well-read.

Happy trails!