I beheld the wretch -- the miserable monster whom I created.
--- Mary Wollstonecraft Shelley, Frankenstein
How much easier it is to be critical than to be correct.
--- Benjamin Disraeli
Ah, May. We can't help but think of the late Bill
Rotsler's cartoon cat sitting in the window distracted by a
butterfly above the caption ``if cats had a longer attention
span, they could rule the world.'' Just so we don't compete with
the short attention span engendered by spring fever, we'll be
covering a set of topics we've had kicking around in the attic
for a while, none of which are enough to fill a complete column.
Thus, we present you with a Franken-column.
But first, we found your reaction to our February column
educational. (See ``Differences Among Women,'' SunExpert,
page 38, or http://swexpert.com/C9/SE.C9.FEB.99.pdf.)
Differences among correspondents.
Sometimes, life imitates that
simple harmonic motion experiment from freshman physics. When we
wrote our November column on technology and reading, we were
surprised that the first two notes we had about it were both from
women. We used this as a jumping-off point for our February
column. (As you know, there's sufficient publication offset that
our observations and counter-observations occur in waves with a
period of three months.)
The level of reader interest in the February column was
higher than we'd anticipated. We seem to have struck a nerve --
or a pair of nerves, as it turns out.
One reader, Pete Kernan, now has a web page about these
four-tuples, http://theory2.phys.cwru.edu/~pete/sequence.html. There is also a related
entry, A045794, in the ``On-Line Encyclopedia of Integer
Sequences,'' http://www.research.att.com/~njas/sequences/index.html
(look for ``Haemer,'' ``Copeland,'' or ``1 1 1 3 3 4 9'').
We promised to report on the sex ratio of the responses
to our column, and here it is: within a month, we got 61 pieces
of email from 34 unsolicited readers. Of these, nine respondents
were women (including Ann Janssen, one of our correspondents on
the November column), and 25 were men. The correspondents even
included the husband-and-wife pair of Shelly Shumway and Arthur
Smith. One (male) reader, Sal Mamone, sent us a pointer to some
statistics he'd gathered about sex differences among his computer
science students. (See ``Empirical Study of Motivation in an
Entry Level Programming Course,'' ACM SIGPLAN Notices,
March 1992.) We aren't sure Sal's statistics completely apply,
since he was teaching COBOL and we think that puts an entirely
different skew into the results, but they're interesting
nonetheless.
All the responses were interesting and gratifying, but
what jumped out at us was the sexual dimorphism. Women sent mail
saying, ``Interesting column, here's my opinion''; men sent mail
saying, ``interesting column, here's my code/math.'' We suspect
that we could write a perl script to sort the responses by sex.
One woman sent a technical response (containing math or
code); three men sent non-technical responses. The fraction of
cross-dressed mail for the two sexes is nearly identical: 1/9
versus 3/25, or 0.11 versus 0.12.
But we've still gotten no responses from Antarctica.
Monopolies and You.
It should be apparent by now that we're open-source bigots. We firmly believe in open systems, with commodity
hardware and, for the most part, with non-proprietary software.
But there are forces in the world that disagree with us. The
largest of those is currently (and probably still will be, by the
time you read this) on trial for violations of the anti-trust
laws. We speak, of course, about Microsoft.
We won't go into detail about the trial, because whatever we say will be out of date by the time this sees print, but we'll note some interesting reactions:
Off By One and Other Odd Calculations.
We've tripped over a variety of off-by-one errors in our time. In fact, we've complained about some
of these in this column before. How do they show up and how do
we prevent them? Some examples of obfuscated code, and the fixes
for them, may be instructive.
Taking our cue from Disraeli, we provided an example back in October, 1996, complete with fix, of the %U and %W specifiers to the date command and the strftime() interface. These two specifiers return the week number; in the case of %U, it's the number of weeks beginning on Sunday since January 1st of the current year (%W counts weeks beginning on Monday). In many (nay, most) implementations, these are calculated incorrectly. Given a populated tm structure, and the realization that the number of weeks since the beginning of the year is the same as the number of Sundays, it's pretty easy to calculate:
int
sun_week(const struct tm *tm)
{
    /* day of the year of the most recent Sunday; negative before the first one */
    int lastsun = tm->tm_yday - tm->tm_wday;

    return (lastsun+7)/7;
}
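Whether a particular C library agrees with this calculation is easy to check. The following is a minimal sketch, not part of the original fix: libc_week() and count_mismatches() are hypothetical helper names, and the sketch assumes that the library's strftime() derives %U from the tm_yday and tm_wday fields alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* The column's calculation: Sunday-based week of the year. */
int sun_week(const struct tm *tm)
{
    int lastsun;

    assert(tm->tm_wday >= 0 && tm->tm_wday <= 6);
    lastsun = tm->tm_yday - tm->tm_wday;
    return (lastsun + 7) / 7;
}

/* Hypothetical helper: the week number the C library prints for %U,
 * or -1 if strftime() produced nothing. */
int libc_week(const struct tm *tm)
{
    char buf[8];

    if (strftime(buf, sizeof buf, "%U", tm) == 0)
        return -1;
    return atoi(buf);
}

/* Hypothetical helper: count the days on which the two disagree, trying
 * every weekday for January 1st and every day of a 366-day year. */
int count_mismatches(void)
{
    int jan1, yday, bad = 0;

    for (jan1 = 0; jan1 < 7; jan1++) {
        for (yday = 0; yday < 366; yday++) {
            struct tm tm = {0};

            tm.tm_year = 99;                 /* arbitrary year */
            tm.tm_mday = 1;                  /* keep the unused fields in range */
            tm.tm_yday = yday;
            tm.tm_wday = (jan1 + yday) % 7;  /* weekday implied by January 1st */
            if (sun_week(&tm) != libc_week(&tm))
                bad++;
        }
    }
    return bad;
}
```

Calling count_mismatches() from a small main() and printing the result shows at a glance whether the local library needs the fix; a result of zero means the two calculations agree on every day tried.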
On the other hand, we've been known to get things wrong,
too. We built a routine to over-write a section of a file with
nuls a while back. Since the files could be large, we
wanted the program to print a status bar to tell us how far along
it was. Certainly, it could print a dot for each block it wrote,
but it would be far more effective to print a line of fixed
length, and then add a dot for each 5% of the write completed.
The code for writing the blocks is pretty obvious:
fprintf(stderr, "%-20s (%07ld) ",
        filename, size);
/* insert [set up for status bar] here */
while( size > 0L )
{
    if( size >= BUFSIZ )
        write(fp,nullbuf,BUFSIZ);
    else
        write(fp,nullbuf,size);
    size -= BUFSIZ;
    /* insert [show status] here */
}
But how do we print the status? Our first cut was something like:
#define REPORT 20
/* set up for status bar */
osize = size;
nn = size / REPORT;
cnt = nn * (REPORT-1);
...
/* show status */
while( size < cnt )
{
    cnt -= nn;
    fprintf(stderr, ".");
}
But this, of course, results in incorrect bar length if
size is less than 20 (nn is then zero, so cnt never changes),
or if integer truncation in size / REPORT throws off the initial
value of cnt. The correct code is more like:
#define REPORT 20
/* set up for status bar */
osize = size;
nn = REPORT;
...
/* show status */
while( nn > 0 && size < (osize*nn/REPORT) )
{
    nn--;
    fprintf(stderr,".");
}
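The corrected loop can be exercised without touching a file. The following is a minimal sketch under stated assumptions: dots_due() and simulate() are hypothetical names, the block size is passed in rather than taken from BUFSIZ, and a subtraction stands in for the write() call. It counts the dots instead of printing them, so the bar length can be checked for both tiny and large sizes.

```c
#include <assert.h>
#include <stddef.h>

#define REPORT 20   /* dots in a full bar, one per 5% */

/* Hypothetical helper: the corrected status test. Returns how many dots
 * are due now that `size` bytes remain of the original `osize`, and
 * lowers *nn by the same amount. */
static int dots_due(long osize, long size, int *nn)
{
    int dots = 0;

    assert(osize > 0 && nn != NULL);
    while (*nn > 0 && size < osize * *nn / REPORT) {
        (*nn)--;
        dots++;
    }
    return dots;
}

/* Hypothetical helper: walk through the write loop for a region of
 * `osize` bytes in blocks of `blk` bytes and return the bar length. */
int simulate(long osize, long blk)
{
    long size = osize;
    int nn = REPORT;
    int total = 0;

    assert(blk > 0);
    while (size > 0L) {
        size -= blk;                        /* stands in for write() */
        total += dots_due(osize, size, &nn);
    }
    return total;
}
```

For example, simulate(7, 1024) and simulate(1000000, 1024) both return 20, so the bar has the same length whether the region is smaller than one block or spans hundreds of them. Like the original, the sketch computes osize*nn in a long, so very large sizes would need a wider type.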
An equally odd calculation occurs in the TeX macros for Graham, Knuth and Patashnik's Concrete Mathematics. (Addison-Wesley, 1994, ISBN 0-201-55802-5.) TeX provides the time of day in minutes since midnight. (We'll leave alternate implementations as an exercise.) To convert that to traditional hours, colon, minutes format requires a bit of fiddling. Usually, we use code such as the following:
\newcount\hrs \newcount\mins
\def\formattedtime{\hrs = \time
  \divide \hrs by 60
  \mins = \time
  \divide \mins by 60
  \multiply \mins by -60
  \advance \mins by \time
  \number \hrs
  :\ifnum \mins < 10 0\fi\number \mins
}
On the other hand, we spent a bit of head scratching over the following fragment from the Concrete Mathematics macros before the inevitable ``aha!'':
\def\hours{\count0=\time
\divide\count0 by60 % find the o'clock
\multiply\count0 by40
\advance\count0\time % convert to hhmm
\advance\count0 10000
\expandafter\gobbleone\number\count0\relax
}
\def\gobbleone1{}
The calculation of time divided by 60 times 40 provides 40 times
the hours. Since the number of minutes since midnight already
contains the hour times 60, this has the effect of leaving the
hours multiplied by 100 in the result. Thus we are left with
hours times 100, plus minutes. Adding 10000 guarantees that
there is a leading zero, if necessary. Unfortunately, it's
preceded by a leading one; fortunately that character is eaten
by gobbleone in a bit of TeX macro legerdemain.
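The same arithmetic is easy to follow outside TeX. The following is a minimal sketch in C, not a translation of the macro package: hhmm() is a hypothetical name, and skipping the first character of the printed number plays the part of gobbleone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format minutes since midnight as the four digits
 * "hhmm" using the multiply-by-40 trick. `out` must hold 5 bytes. */
void hhmm(int minutes, char out[5])
{
    char digits[16];
    int n;

    assert(minutes >= 0 && minutes < 24 * 60);
    n = minutes / 60 * 40;      /* 40 times the hour */
    n += minutes;               /* 40h + (60h + m) = 100h + m */
    n += 10000;                 /* forces any leading zeros to be printed */
    snprintf(digits, sizeof digits, "%d", n);   /* always "1hhmm" */
    memcpy(out, digits + 1, 5); /* drop the leading 1, keep the NUL */
}
```

With 545 minutes (9:05 in the morning) the intermediate values are 360, 905 and 10905, and the result is "0905".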
HTML and troff.
Let's change gears now: By virtue of our being open-source
bigots, we're also in favor of open formats. This means that the
proprietary documents produced by the likes of Microsoft Word and
the Excel spreadsheet make us see various shades of red. (Okay,
they make Haemer see red: Copeland's color blind, so he just
sees a darker shade of gray.) It also means we really like
markup languages such as troff and HTML. In fact, we
generally write this column in the former, and then convert it to
the latter for later consumption.
There are a number of tricks we could use for this
conversion, including a variety of public domain tools.
But we do something that may not be as obvious: we
convert our troff source to HTML by running it through
nroff with a special macro package.
This all came to mind a few weeks ago when Softway
Systems colleague John McMullen was converting a variety of
troff documentation to on-line web pages, and asked for
some assistance. We won't show you the whole macro package, but
just some interesting pieces.
Our replacement for the -mm list macros had been the following:
.\" ===== LISTS
.de AL \" numbered list
.nr list_type 1
<OL>
..
.de BL \" bullet list
.nr list_type 2
<UL>
..
.de LE
.if \\n[list_type]=1 </OL>
.if \\n[list_type]=2 </UL>
.nr list_type 0
..
.de LI
.if \\n[list_type]=1 <LI>
.if \\n[list_type]=2 <LI>
..
John pointed out that we didn't support nested lists, and supplied the following replacement code, which you'll note actually has comments in it. (For ease of reading, @br is a macro that replaces troff's br directive; br itself becomes a macro that produces an HTML <BR> tag.)
.\" ===== LISTS
.\" When we enter a new list, we prepend the
.\" correct termination tag to the string
.\" list_end. When we end a list, we use that
.\" string as the argument list to the .LE
.\" macro, print the first argument and redefine
.\" the string. If the string length is zero,
.\" we know there's a problem.
.de AL \" numbered list
.@br
<OL>
.ds list_end "</OL> \\*[list_end]
..
.\" we could specify bullets versus dashes
.\" (HTML 3.2) but it's not a vital issue in my
.\" experience, but with .AL people care.
.de BL \" bullet list
.@br
<UL>
.ds list_end "</UL> \\*[list_end]
..
.de DL \" dash list
.BL
..
.de end_list
.ie \\n[.$]=0 \{\
. tm ".LE: List ending without being in a list
.\}
.el \{\
\\$1
.shift
.rm list_end
.ds list_end "\\$@
.\}
..
.de LE
.@br
.end_list \\*[list_end]
.if "\\$1"1" <P>
..
.de LI
.@br
<LI>
..
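The bookkeeping behind the nested-list macros is a stack of pending closing tags. The following is a minimal sketch of that idea in C, under stated assumptions: it uses a fixed-size array where the macros prepend to a string and use .shift, and begin_list(), end_list() and MAX_DEPTH are hypothetical names that do not appear in the macro package.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16   /* assumed nesting limit for this sketch */

static const char *list_end[MAX_DEPTH];  /* pending closing tags, innermost last */
static int depth = 0;

/* Like .AL (numbered) or .BL (bulleted): remember the closing tag and
 * return the opening tag to emit. */
const char *begin_list(int numbered)
{
    assert(depth < MAX_DEPTH);
    list_end[depth++] = numbered ? "</OL>" : "</UL>";
    return numbered ? "<OL>" : "<UL>";
}

/* Like .LE: return the closing tag of the innermost open list, or NULL
 * when no list is open (the case where the macro issues its warning). */
const char *end_list(void)
{
    if (depth == 0)
        return NULL;
    return list_end[--depth];
}
```

Opening a numbered list, then a bullet list inside it, and closing twice yields "</UL>" followed by "</OL>", which is the order the nested HTML needs; a third close returns NULL.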
It's just not possible to provide a macro to handle every eventuality in our text, so the HTML macros define
.ds HTML@Printing xx
Since groff provides a way to test the existence of a string -- .if d HTML@Printing... -- we can provide different coding for the troff and HTML versions. For example,
.ds rr re\*'sume\*'
.if d HTML@Printing .ds rr r&eacute;sum&eacute;
Since most uses of the HTML@Printing flag
are related to accents, we finally wrote an accent filter.
#! /usr/local/bin/perl -p
# Accent filter for -mm to HTML conversion.
# Note this only works for valid combinations.
s/([AEIOUaeiouy])\\\*:/\&$1uml;/g;
s/([AEIOUaeiouy])\\\*;/\&$1uml;/g;
s/([AEIOUaeiou])\\\*`/\&$1grave;/g;
s/([AEIOUYaeiouy])\\\*'/\&$1acute;/g;
s/([AEIOUaeiou])\\\*\^/\&$1circ;/g;
s/([ANOano])\\\*\~/\&$1tilde;/g;
s/([Cc])\\\*,/\&$1cedil;/g;
s/\\\(AE/\&AElig;/g;
s/\\\(ae/\&aelig;/g;
This nicely converts input such as
U\*:ber, u\*:ber, ha\*^t, nin\*~o, fac\*,ade, \(aeon.
into
&Uuml;ber, &uuml;ber, h&acirc;t, ni&ntilde;o, fa&ccedil;ade, &aelig;on.
for printing as ``Über, über, hât, niño, façade, æon.''
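For readers who would rather see the substitution spelled out as a loop, the following is a minimal sketch of the accent rules in C. It is an illustration under stated assumptions, not a replacement for the perl filter: suffix_for() and accent_filter() are hypothetical names, the sketch accepts any letter before an accent mark rather than only the valid combinations, and it leaves the \(AE and \(ae ligatures alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map the -mm accent mark that follows "\*" to the
 * tail of the HTML entity name, or NULL if the mark is not an accent. */
static const char *suffix_for(char mark)
{
    switch (mark) {
    case ':': case ';': return "uml";
    case '`':           return "grave";
    case '\'':          return "acute";
    case '^':           return "circ";
    case '~':           return "tilde";
    case ',':           return "cedil";
    default:            return NULL;
    }
}

/* Hypothetical helper: copy `in` to `out`, rewriting letter\*mark as
 * &lettersuffix; and stopping early if `out` would overflow. */
void accent_filter(const char *in, char *out, size_t outsize)
{
    size_t used = 0;

    assert(outsize > 0);
    out[0] = '\0';
    while (*in != '\0') {
        const char *suffix = NULL;
        int n;

        if (isalpha((unsigned char)in[0]) && in[1] == '\\' && in[2] == '*')
            suffix = suffix_for(in[3]);
        if (suffix != NULL) {
            n = snprintf(out + used, outsize - used, "&%c%s;", in[0], suffix);
            in += 4;    /* letter, backslash, star, mark */
        } else {
            n = snprintf(out + used, outsize - used, "%c", in[0]);
            in += 1;
        }
        if (n < 0 || (size_t)n >= outsize - used)
            break;      /* output buffer is full */
        used += (size_t)n;
    }
}
```

Feeding it the troff text for niño (nin\*~o) produces ni&ntilde;o, the same result the perl substitution for the tilde gives.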
We leave it as an exercise to fill in the other
interesting troff special characters with HTML/8859-1
escape sequences, such as inverted exclamation points, and the
common fractions.
Finishing up.
Next time, we'll write a review of I18N tricks and techniques.
By the time you read that, the Microsoft trial may be in appeal,
all of your off-by-one bugs may be gone, and you may have
finished converting all your troff documents to HTML.
Until then, happy trails.