We will encourage you to develop the three great virtues of a programmer: laziness, impatience, and hubris.
--- Larry Wall
Work is the curse of the drinking classes.
--- Oscar Wilde
If you hate doing unnecessary work, then you should read
this month's column. We'll give you some tips on sloth. On the
other hand, if you like doing unnecessary work, then you should
read this month's column because you'll ignore our advice, and
therefore, reading it would be unnecessary work.
Bertrand Russell would be proud of us.
The quotation from Larry Wall that introduces this column is from Programming Perl, his book about a language designed to save us all a lot of work. Some programmers miss this point. Given an assignment by management to write a Perl script to format and print a bunch of files, one of our co-workers produced this:
#!/usr/bin/perl
# Allow files to be written over
system("unset noclobber");
# Setup some environment variables
$ENV{"PRINTER"} = "Viper";
# Create the files
system("groff file1 > /tmp/file1.ps");
system("groff file2 > /tmp/file2.ps");
system("groff file3 > /tmp/file3.ps");
system("groff file4 > /tmp/file4.ps");
system("groff file5 > /tmp/file5.ps");
system("groff file6 > /tmp/file6.ps");
system("groff file7 > /tmp/file7.ps");
# Print the files
system("lpr /tmp/file1.ps");
system("lpr /tmp/file2.ps");
system("lpr /tmp/file3.ps");
system("lpr /tmp/file4.ps");
system("lpr /tmp/file5.ps");
system("lpr /tmp/file6.ps");
system("lpr /tmp/file7.ps");
# Cleanup
system("/bin/rm /tmp/file1.ps");
system("/bin/rm /tmp/file2.ps");
system("/bin/rm /tmp/file3.ps");
system("/bin/rm /tmp/file4.ps");
system("/bin/rm /tmp/file5.ps");
system("/bin/rm /tmp/file6.ps");
system("/bin/rm /tmp/file7.ps");
What's wrong with this picture?
First, while Perl eschews many sacred Computer-Science
cattle -- nowhere, for example, can you find a Backus-Naur form
grammar for Perl -- loops and subroutines are both venerable
control structures that even Perl provides. All those nearly
identical system calls could have been replaced with this:
sub one_file {
$file = shift;
system("groff $file > /tmp/$file.ps");
system("lpr /tmp/$file.ps");
system("/bin/rm /tmp/$file.ps");
}
foreach $file (qw(file1 file2 file3 file4 file5 file6 file7)) {
one_file($file);
}
Indeed, though Perl lacks grammatical simplicity, a lot
of us use it because it has power and elegance of expression.
For example, many Perl programmers might at least pare the code
down to this:
sub one_file {
$file = shift;
system "groff $file > /tmp/$file.ps";
system "lpr /tmp/$file.ps";
system "/bin/rm /tmp/$file.ps";
}
foreach (1..7) {
one_file "file$_";
}
And how about making the program more flexible by
replacing the hard-wired list of files with a list supplied on the
command line:
while (@ARGV) {
one_file shift;
}
We Roll Up Our Sleeves to Do Even Less Work
Not enough, not enough. Each of those system()
calls forks a sub-shell. We could cut it down from 21 sub-shells
(7x3) to seven by making each sub-shell perform more than one
action.
sub one_file {
$file = shift;
system "
groff $file > /tmp/$file.ps &&
lpr /tmp/$file.ps
";
die "one_file($file) failed" if $? != 0;
unlink "/tmp/$file.ps" or die "can't unlink /tmp/$file.ps";
}
while (@ARGV) {
one_file shift;
}
Here, each call to one_file() creates a single
sub-shell, which performs both the groff and the
lpr for the source file to be printed.
We've also done a couple of pieces of noteworthy sanity
checking:
First, the double ampersand, &&. The sequence
$ command_1 && command_2
means ``do command_1; if that succeeds, then do command_2''.
This idiom can be a surprisingly useful safeguard. We recently helped a co-worker restore his home directory from tape after he did the following in his home directory:
$ cd bogus_directory_name; rm -rf *
bogus_directory_name not found
We recommend typing this instead:
$ cd bogus_directory_name && rm -rf *
which only does the rm -rf if the cd command succeeds.
Thus, in our re-written script, we only try to print the
file when our invocation of groff succeeds.
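For readers who would rather not test this idiom on their own home directories, here is a minimal sketch that replays the accident inside a throwaway directory. The names andand_demo, precious, and no_such_dir are invented for the demonstration, and we assume a POSIX-style /bin/sh.

```shell
#!/bin/sh
# Sketch: ';' versus '&&' after a cd that fails, run safely in a scratch directory.
# "andand_demo", "precious", and "no_such_dir" are names invented for this demo.
scratch=${TMPDIR:-/tmp}/andand_demo.$$
mkdir -p "$scratch"

# Version 1: ';' runs rm even though the second cd failed,
# so the file in the directory we were already in gets deleted.
touch "$scratch/precious"
(
    cd "$scratch" || exit 1
    cd no_such_dir 2>/dev/null; rm -f precious
)
if [ -e "$scratch/precious" ]; then semicolon_result=kept; else semicolon_result=deleted; fi

# Version 2: '&&' skips rm because cd returned a non-zero exit status.
touch "$scratch/precious"
(
    cd "$scratch" || exit 1
    cd no_such_dir 2>/dev/null && rm -f precious
)
if [ -e "$scratch/precious" ]; then andand_result=kept; else andand_result=deleted; fi

echo "with ';'  the file was $semicolon_result"
echo "with '&&' the file was $andand_result"

rm -rf "$scratch"
```

The parentheses run each experiment in its own subshell, so the failed cd never changes the directory of the shell running the demonstration.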
We've also taken the reasonable precaution of checking
the exit status of the system() call, which is held in
the predefined Perl variable, $?.
Moreover, instead of calling the shell-level rm
command, we use the Perl function, unlink().
As long as we're trying to be careful, we ought to turn
#!/usr/bin/perl
into
#!/usr/bin/perl -w
But when we do so, we get a sharp reprimand:
Can't exec "unset": No such file or directory at 1 line 4.
Here, what we're seeing is that unset is not an external executable command, but a shell builtin.
$ type unset
unset is a shell builtin
Just as a cd in one shell doesn't affect the
current working directory of another, unrelated shell,
set and unset change the values of shell
variables for a particular shell (and, potentially, its
children).
Perl is warning you that
system("unset noclobber");
is useless, because it only affects the value of
noclobber for the subshell invoked by that specific call
to system(); it has no effect on the subshells invoked
by any of the other system() calls.
Each system() call executes a miniature shell
script; the original program is nothing more than a shell script
broken into 22 little, one-line shell scripts, sequentially
invoked by the Perl interpreter.
Even experienced programmers occasionally forget that rules in a Makefile have this same property. The production
clean_tmp:
	cd /tmp
	rm *
will change directories to /tmp in one subshell and then, in a separate subshell, remove all the files in your current directory. The correct way to write this production is
clean_tmp:
	cd /tmp; rm *
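The isolation between subshells is easy to see from the command line. The following is a small sketch, not a transcript from our co-worker's machine: each sh -c stands in for one system() call or one line of a Makefile rule, and DEMO_VAR is a made-up variable name.

```shell
#!/bin/sh
# Sketch: each 'sh -c' is a separate child shell, much like each system() call
# or each line of a Makefile rule. DEMO_VAR is a made-up variable name.
unset DEMO_VAR

# Set and read the variable inside one child shell: the change is visible.
first=$(sh -c 'DEMO_VAR=changed; echo "$DEMO_VAR"')

# A second, unrelated child shell never saw the assignment.
second=$(sh -c 'echo "${DEMO_VAR:-unset}"')

# A cd inside a child shell leaves the parent's working directory alone.
start_dir=$(pwd)
sh -c 'cd /'
end_dir=$(pwd)

echo "first child sees:  $first"
echo "second child sees: $second"
[ "$start_dir" = "$end_dir" ] && echo "parent directory unchanged: $end_dir"
```

Anything that has to share a variable or a working directory must therefore run inside the same shell invocation.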
Lazier and lazier
So this brings us to the crux of the matter: why fork
all of these subshells?
Here's the more-or-less-equivalent shell script:
#!/bin/sh
# Allow files to be written over
unset noclobber;
# Setup some environment variables
PRINTER=Viper;
# Create the files
groff file1 > /tmp/file1.ps
groff file2 > /tmp/file2.ps
groff file3 > /tmp/file3.ps
groff file4 > /tmp/file4.ps
groff file5 > /tmp/file5.ps
groff file6 > /tmp/file6.ps
groff file7 > /tmp/file7.ps
# Print the files
lpr /tmp/file1.ps
lpr /tmp/file2.ps
lpr /tmp/file3.ps
lpr /tmp/file4.ps
lpr /tmp/file5.ps
lpr /tmp/file6.ps
lpr /tmp/file7.ps
# Cleanup
/bin/rm /tmp/file1.ps
/bin/rm /tmp/file2.ps
/bin/rm /tmp/file3.ps
/bin/rm /tmp/file4.ps
/bin/rm /tmp/file5.ps
/bin/rm /tmp/file6.ps
/bin/rm /tmp/file7.ps
If nothing else, it's a third less typing (639
characters instead of 900). But it isn't just ``nothing else.''
If we do timing tests, this version runs about 12% faster.
Of course, just as we did with the first script above, we can make the program more general and shorter by adding loops and subroutines.
#!/bin/sh
unset noclobber
PRINTER=Viper
one_file() {
groff $1 > /tmp/$1.ps
/usr/bin/lpr /tmp/$1.ps
/bin/rm /tmp/$1.ps
}
for i in file*
do
one_file $i;
done
Oh, but that's still far too much work. Since lpr and rm will both take more than one filename, we can drop their invocations to a single call apiece, like this:
#!/bin/sh
unset noclobber
for i in file*
do
groff $i > /tmp/$i.ps
done
/usr/bin/lpr -PViper /tmp/file*.ps
/bin/rm /tmp/file*.ps
We eliminate the extra variable assignment by using the -P argument to lpr.
We're down to one invocation of the shell, one of
lpr, one of rm, and a handful of
groffs.
Still, the restriction on filenames means we might have
to rename our input files to suit the script before invoking it.
Moreover, in all these versions we've been assuming that the
files are in the current directory, and that the only files named
file*.ps in /tmp are the ones we've put there.
We fix these problems by again pulling the filenames
from the command line, and by constructing our own temporary
directory for output, removing it when we're done.
In the end, we arrive at this 12-line script:
1: #!/bin/sh
2: PRINTDIR=/tmp/$$.dir
3: mkdir $PRINTDIR || exit 1
4:
5: for i in $*
6: do
7: OUTFILE=$PRINTDIR/${i##*/}.ps
8: groff $i > $OUTFILE || rm -f $OUTFILE
9: done
10: cd $PRINTDIR
11: lpr -P${PRINTER:-Viper} *
12: rm -r $PRINTDIR
Here's what it means:
Line 1 makes Unix pass the script to the correct shell,
even if it isn't the user's login shell.
Lines 2 and 3 create a directory for our temporary files,
using the process id, $$, to generate a unique name.
Lines 5 through 9 are a loop that formats all the input
files named on the command line. Line 7 strips any directory
information from the input filename, in case it was specified
with an absolute path, so that if the filename is
$HOME/project/printfiles/foo then $OUTFILE
becomes $PRINTDIR/foo.ps.
In older shells, this sort of manipulation was done with
the stand-alone executable /usr/bin/basename, but POSIX-conforming shells let users extract substrings of shell variables
with a built-in facility called parameter expansion.
In this case, the expression ${i##*/} means
``Give me the value of $i stripped of the longest
possible prefix that matches the shell glob pattern
*/'' -- in other words, the filename, but with any
directory name stripped off.
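To see what the doubled # buys us, here is a short sketch comparing the longest-prefix form with its single-# sibling and with the older basename command. The path is an invented example, and the results in the comments assume a POSIX-conforming shell.

```shell
#!/bin/sh
# Sketch: prefix and suffix stripping with parameter expansion.
# The path below is an invented example.
i=/home/user/project/printfiles/foo

longest=${i##*/}        # longest prefix matching */ removed:  foo
shortest=${i#*/}        # shortest prefix matching */ removed: home/user/project/printfiles/foo
directory=${i%/*}       # shortest suffix matching /* removed: /home/user/project/printfiles
external=$(basename "$i")   # the older approach, which costs an extra process

# A name with no slash in it passes through unchanged.
plain=foo.txt
plain_result=${plain##*/}   # foo.txt

echo "longest prefix stripped:  $longest"
echo "shortest prefix stripped: $shortest"
echo "directory part:           $directory"
echo "basename says:            $external"
echo "no slash at all:          $plain_result"
```

For ordinary paths like this one, the built-in form and basename agree; the difference is that the expansion happens inside the shell, with no fork.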
Line 8 formats an input file, taking care to remove any
unsuccessful attempts.
Lines 10 and 11 print all successful formatting efforts.
In line 11, we see another example of parameter expansion.
The expression ${PRINTER:-Viper} means ``use the value
of $PRINTER if it's set and non-empty (after all,
you might, someday, want to print to another printer). If it
isn't, use "Viper" as a default.''
(You can find a lot more information about other kinds
of parameter expansion in the shell manual page.)
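A quick sketch of how the default kicks in, assuming a POSIX-conforming shell. "Cobra" is a hypothetical second printer that we made up for the occasion.

```shell
#!/bin/sh
# Sketch: how ${PRINTER:-Viper} behaves when PRINTER is unset, empty, or set.
# "Cobra" is a hypothetical printer name used only for this demo.
unset PRINTER
when_unset=${PRINTER:-Viper}        # Viper: the variable is unset

PRINTER=
when_empty=${PRINTER:-Viper}        # Viper: with the colon, empty counts as unset
without_colon=${PRINTER-Viper}      # empty: without the colon, only unset triggers the default

PRINTER=Cobra
when_set=${PRINTER:-Viper}          # Cobra: a real value always wins

echo "unset:         $when_unset"
echo "empty:         $when_empty"
echo "empty, no ':': '$without_colon'"
echo "set:           $when_set"
```

Note that the :- form only substitutes a value; it never assigns to PRINTER itself.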
Line 12 removes all of our temporary files by removing
the whole temporary directory. (You may wonder whether you can
remove a directory while you're still in it. You can, and since
we're completely done, we do.)
In a more robust version, we might use a signal handler
to remove the temporary directory if the script terminates
prematurely, but that seems like overkill here.
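For the energetic, such a handler might look roughly like the sketch below. This is one possible arrangement, not part of our 12-line script: the cleanup function and the trapdemo directory name are our own inventions, and the child shell sends itself a TERM signal purely to simulate an interruption.

```shell
#!/bin/sh
# Sketch: remove the temporary directory even when the script is interrupted.
# The "trapdemo" name and the cleanup function are invented for this demo.
demo_dir=${TMPDIR:-/tmp}/trapdemo.$$.dir

# Run the would-be print script in a child shell so that we can interrupt it.
sh -c '
    PRINTDIR=$1
    mkdir "$PRINTDIR" || exit 1
    cleanup() { rm -rf "$PRINTDIR"; }
    trap cleanup EXIT             # runs on any exit, normal or not
    trap "exit 1" HUP INT TERM    # turn these signals into an ordinary exit
    kill -s TERM $$               # simulate premature termination
    echo "formatting files..."    # should not be reached
' sh "$demo_dir"
status=$?

if [ -d "$demo_dir" ]; then left_behind=yes; else left_behind=no; fi
echo "child exit status: $status"
echo "temporary directory left behind: $left_behind"
```

Routing the signals through exit means the cleanup code lives in exactly one place, the EXIT trap.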
Finally, since the call to unset noclobber seemed
questionable to us in the first place, we've eliminated the call
altogether.
Okay, where are we? We've gone from a 34-line Perl
script to a 12-line shell script that's both more robust and more
general. At the same time, we've sped it up by reducing an
invocation of the Perl compiler/interpreter plus 22 separate
invocations of the shell, to a single invocation of the shell.
Unfortunately, all of this presupposes that we can go to
our management and say, ``Please don't micro-manage me by
forcing me to use Perl for a job better done with a simple shell
script.''
If you can't do that, we suggest an easier, alternative approach:
#!/usr/bin/perl -w
$real = "/usr/local/real_code";
$0 =~ s(.*/)(); # basename
system "$real/$0 @ARGV"; # call real version
And now we rest.
That's all for this month. Next time, we'll eschew sloth and
take as the text for our sermon a sentence from the introduction
to The Art of Computer Programming, by Donald Knuth: ``The
process of preparing programs for a digital computer is
especially attractive, not only because it can be economically
and scientifically rewarding, but also because it can be an
aesthetic experience much like composing poetry or music.''
Until then, happy trails.