Work: Odds and Ends

Jeffreys Copeland & Haemer

(Server/Workstation Expert, May 1999)



I beheld the wretch -- the miserable monster whom I created.
                        --- Mary Wollstonecraft Shelley, Frankenstein


How much easier it is to be critical than to be correct.
                        --- Benjamin Disraeli


Ah, May. We can't help but think of the late Bill Rotsler's cartoon cat sitting in the window distracted by a butterfly above the caption ``if cats had a longer attention span, they could rule the world.'' Just so we don't compete with the short attention span engendered by spring fever, we'll be covering a set of topics we've had kicking around in the attic for a while, none of which are enough to fill a complete column. Thus, we present you with a Franken-column.

But first, we found your reaction to our February column educational. (See ``Differences Among Women,'' SunExpert, page 38, or http://swexpert.com/C9/SE.C9.FEB.99.pdf.)



Differences among correspondents.
Sometimes, life imitates that simple harmonic motion experiment from freshman physics. When we wrote our November column on technology and reading, we were surprised that the first two notes we had about it were both from women. We used this as a jumping-off point for our February column. (As you know, there's sufficient publication offset that our observations and counter-observations occur in waves with a period of three months.)

The level of reader interest in the February column was higher than we'd anticipated. We seem to have struck a nerve -- or a pair of nerves, as it turns out.

One reader, Pete Kernan, now has a web page about these four-tuples, http://theory2.phys.cwru.edu/~pete/sequence.html. There is also a related entry, A045794, in the ``On-Line Encyclopedia of Integer Sequences,'' http://www.research.att.com/~njas/sequences/index.html (look for ``Haemer,'' ``Copeland,'' or ``1 1 1 3 3 4 9'').

We promised to report on the sex ratio of the responses to our column, and here it is: within a month, we got 61 pieces of email from 34 unsolicited readers. Of these, nine respondents were women (including Ann Janssen, one of our correspondents on the November column), and 25 were men. The correspondents even included the husband-and-wife pair of Shelly Shumway and Arthur Smith. One (male) reader, Sal Mamone, sent us a pointer to some statistics he'd gathered about sex differences among his computer science students. (See ``Empirical Study of Motivation in a Entry Level Programming Course,'' ACM SIGPLAN Notices, March 1992.) We aren't sure Sal's statistics completely apply, since he was teaching COBOL and we think that puts an entirely different skew into the results, but they're interesting nonetheless.

All the responses were interesting and gratifying, but what jumped out at us was the sexual dimorphism. Women sent mail saying, ``Interesting column, here's my opinion''; men sent mail saying, ``Interesting column, here's my code/math.'' We suspect that we could write a Perl script to sort the responses by sex.

One woman sent a technical response (containing math or code); three men sent non-technical responses. The fraction of cross-dressed mail for the two sexes is identical to two decimal places.

But we've still gotten no responses from Antarctica.



Monopolies and You.
It should be apparent by now that we're open-source bigots. We firmly believe in open systems, with commodity hardware and, for the most part, with non-proprietary software. But there are forces in the world that disagree with us. The largest of those is currently (and probably still will be, by the time you read this) on trial for violations of the anti-trust laws. We speak, of course, about Microsoft.

We won't go into detail about the trial, because whatever we say will be out of date by the time this sees print, but we'll note some interesting reactions:




Off By One and Other Odd Calculations.
We've tripped over a variety of off-by-one errors in our time. In fact, we've complained about some of these in this column before. How do they show up and how do we prevent them? Some examples of obfuscated code, and the fixes for them, may be instructive.

Taking our cue from Disraeli, we provided an example back in October, 1996, complete with fix, of the %U and %W specifiers to the date command and the strftime() interface. These two specifiers return the week number; in the case of %U, it's the number of weeks beginning on Sunday since January 1st of the current year (%W does the same for weeks beginning on Monday). In many (nay, most) implementations, these are calculated incorrectly. Given a populated tm structure, and the realization that the number of weeks since the beginning of the year is the same as the number of Sundays, it's pretty easy to calculate:

sun_week (tm)
 struct tm *tm;
{
 int lastsun = tm->tm_yday -
  tm->tm_wday;
 return (lastsun+7)/7;
}

On the other hand, we've been known to get things wrong, too. We built a routine to over-write a section of a file with nuls a while back. Since the files could be large, we wanted the program to print a status bar to tell us how far along it was. Certainly, it could print a dot for each block it wrote, but it would be far more effective to print a line of fixed length, and then add a dot for each 5% of the write completed.

The code for writing the blocks is pretty obvious:

fprintf(stderr, "%-20s (%07ld) ",
 filename, size);
/* insert [set up for status bar] here */
while( size > 0L )
{
 if( size >= BUFSIZ )
  write(fp,nullbuf,BUFSIZ);
 else
  write(fp,nullbuf,size);
 size -= BUFSIZ;
 /* insert [show status] here */
}

But how do we print the status? Our first cut was something like:

#define REPORT 20
/* set up for status bar */
osize = size;
nn = size / REPORT;
cnt = nn * (REPORT-1);

  ...

/* show status */
while( size < cnt )
{
 cnt -= nn;
 fprintf(stderr, ".");
}

But this, of course, results in incorrect bar length if size is less than 20, or if rounding makes the initial value of cnt odd. The correct code is more like:

#define REPORT 20
/* set up for status bar */
osize = size;
nn = REPORT;

  ...

/* show status */
while( nn > 0  && size < (osize*nn/REPORT) )
{
 nn--;
 fprintf(stderr,".");
}

An equally odd calculation occurs in the TeX macros for Graham, Knuth and Patashnik's Concrete Mathematics. (Addison-Wesley, 1994, ISBN 0-201-55802-5.) TeX provides the time of day in minutes since midnight. (We'll leave alternate implementations as an exercise.) To convert that to traditional hours, colon, minutes format requires a bit of fiddling. Usually, we use code such as the following:

\def\formattedtime{\hrs = \time
 \divide \hrs by 60
 \mins = \time
 \divide \mins by 60
 \multiply \mins by -60
 \advance \mins by \time
 \number \hrs
  :\ifnum \mins < 10 0\fi\number \mins
}

On the other hand, we spent a bit of head scratching over the following fragment from the Concrete Mathematics macros before the inevitable ``aha!'':

\def\hours{\count0=\time
  \divide\count0 by60 % find the o'clock
  \multiply\count0 by40
  \advance\count0\time % convert to hhmm
  \advance\count0 10000
  \expandafter\gobbleone\number\count0\relax
}
\def\gobbleone1{}

The calculation of time divided by 60 times 40 provides 40 times the hours. Since the number of minutes since midnight already contains the hour times 60, adding it in leaves the hours multiplied by 100 in the result. Thus we are left with hours times 100, plus minutes. Adding 10000 guarantees that there is a leading zero, if necessary. Unfortunately, it's preceded by a leading one; fortunately, that character is eaten by \gobbleone in a bit of TeX macro legerdemain.



HTML and troff.
Let's change gears now: By virtue of our being open-source bigots, we're also in favor of open formats. This means that the proprietary documents produced by the likes of Microsoft Word and the Excel spreadsheet make us see various shades of red. (Okay, they make Haemer see red: Copeland's color blind, so he just sees a darker shade of gray.) It also means we really like markup languages such as troff and HTML. In fact, we generally write this column in the first, and then convert it to the other for later consumption.

There are a number of tricks we could use for this conversion, including a variety of public domain tools. But we do something that may not be as obvious: we convert our troff source to HTML by running it through nroff with a special macro package.

This all came to mind a few weeks ago when Softway Systems colleague John McMullen was converting a variety of troff documentation to on-line web pages, and asked for some assistance. We won't show you the whole macro package, but just some interesting pieces.

Our replacement for the -mm list macros had been the following:

.\" ===== LISTS
.de AL \" numbered list
.nr list_type 1
<OL>
..
.de BL \" bullet list
.nr list_type 2
<UL>
..
.de LE
.if \\n[list_type]=1 </OL>
.if \\n[list_type]=2 </UL>
.nr list_type 0
..
.de LI
.if \\n[list_type]=1 <LI>
.if \\n[list_type]=2 <LI>
..

John pointed out that we didn't support nested lists, and supplied the following replacement code, which you'll note actually has comments in it. (For ease of reading, @br is a macro that replaces troff's br directive; br itself becomes a macro that produces an HTML <BR> tag.)

.\" =====  LISTS
.\" When we enter a new list, we prepend the
.\" correct termination tag to the string
.\" list_end.  When we end a list, we use that
.\" string as the argument list to the .LE
.\" macro, print the first argument and redefine
.\" the string.  If the string length is zero,
.\" we know there's a problem.
.de AL \" numbered list
.@br
<OL>
.ds list_end "</OL> \\*[list_end]
..
.\" we could specify bullets versus dashes
.\" (HTML 3.2) but it's not a vital issue in my
.\" experience, but with .AL people care.
.de BL \" bullet list
.@br
<UL>
.ds list_end "</UL> \\*[list_end]
..
.de DL \" dash list
.BL
..
.de end_list
.ie \\n[.$]=0 \{\
. tm ".LE: List ending without being in a list
.\}
.el \{\
\\$1
.shift
.rm list_end
.ds list_end "\\$@
.\}
..
.de LE
.@br
.end_list \\*[list_end]
.if "\\$1"1" <P>
..
.de LI
.@br
<LI>
..

It's just not possible to provide a macro to handle every eventuality in our text, so the HTML macros define

.ds HTML@Printing xx

Since groff provides a way to test the existence of a string -- .if d HTML@Printing... -- we can provide different coding for the troff and HTML versions. For example,

.ds rr re\*'sume\*'
.if d HTML@Printing .ds rr r&eacute;sum&eacute;

Since most uses of the HTML@Printing flag are related to accents, we finally wrote an accent filter.

#! /usr/local/bin/perl -p
#  Accent filter for -mm to HTML conversion.
#  Note this only works for valid combinations.

s/([AEIOUaeiouy])\\\*:/\&$1uml;/g;
s/([AEIOUaeiouy])\\\*;/\&$1uml;/g;
s/([AEIOUaeiou])\\\*`/\&$1grave;/g;
s/([AEIOUYaeiouy])\\\*'/\&$1acute;/g;
s/([AEIOUaeiou])\\\*\^/\&$1circ;/g;
s/([ANOano])\\\*\~/\&$1tilde;/g;
s/([Cc])\\\*,/\&$1cedil;/g;
s/\\\(AE/\&AElig;/g;
s/\\\(ae/\&aelig;/g;

This nicely converts input such as

U\*:ber, u\*:ber,
ha\*^t, nin\*~o,
fac\*,ade, \(aeon.

into

&Uuml;ber, &uuml;ber,
h&acirc;t, ni&ntilde;o,
fa&ccedil;ade, &aelig;on.

for printing as ``Über, über, hât, niño, façade, æon.''

We leave it as an exercise to fill in the other interesting troff special characters with HTML/8859-1 escape sequences, such as inverted exclamation points, and the common fractions.



Finishing up.
Next time, we'll write a review of I18N tricks and techniques. By the time you read that, the Microsoft trial may be in appeal, all of your off-by-one bugs may be gone, and you may have finished converting all your troff documents to HTML.

Until then, happy trails.