It's not the voting that's democracy; it's the counting.
--- Tom Stoppard
He sighed wearily, ``I just accidentally wrote an S-F novel, okay? I didn't mean to apply for citizenship in the Twilight Zone.''
``I don't think you can apply, Jay. I think fandom takes hostages.''
--- Zombies of the Gene Pool, by Sharyn McCrumb
For a long time, one of the ideals of western
civilization has been the democratic election. And the ideal of
the democratic election has been the majority winner. But what
happens to your election if you have more than two candidates?
How can you ensure a majority winner? And how can you ensure
that the winner really reflects the wishes of the voters?
We've talked before in these pages about different types
of elections. Our final article for RS/Magazine and our
first for SunExpert were a pair of columns about using a
CGI script to tally United States electoral college votes.
(February and April 1997.)
As you may recall from your high school civics class,
each state in the US has a number of electoral votes equal to the
number of its federal legislators. Today, in most states, all
those votes are cast for whoever wins the plurality of votes in
the state. The new president is the one who wins a majority of
the 538 electoral votes. Unfortunately, this means that to
become President of the United States one merely has to win a
plurality of the votes in the eleven most populous states, and
not one single additional vote.
Is there a better solution? The problem is old, well-studied, and answered. That answer is ``No.''
In 1951, Nobel-Prize winning economist Kenneth Arrow
proved, in his Ph.D. dissertation, that no ``best'' voting scheme
exists, as soon as there's more than one voter. (You can always
satisfy a dictator perfectly.) Arrow's Impossibility Theorem had
roughly the same impact on political science that Godel's
Incompleteness Theorem had on mathematics.
Basically, the thing to do is pick a scheme that's
mostly fair, and get on with politics as usual.
That fact doesn't stop people from coming up with
interesting and useful alternatives. For illustrations, let's
consider two examples from studies by Jean Borda and the Marquis
de Condorcet for the French Academy in the eighteenth century,
Borda's is probably the more obvious method. The first-ranked candidate on an n-person ballot is given n
points, the second-ranked candidate n-1 points, and so on,
with the last ranked candidate being given one point. The points
for each candidate are totaled, and the winner is the one with a
plurality. (This may be familiar to the sports-literate among
our readers: Borda's is the method used to determine the annual
ranking of college football teams by poll of sports writers and
coaches. Why this method is used instead of simply seeing who
wins the most games is a mystery to Copeland, who is a football
anti-fan.)
Condorcet came up with a different technique. He
suggested using the rankings on the ballot to figure out the
results of elections between each pair of candidates. The winner
of the election is the ``most preferred'' candidate in these
pair-wise tallies.
Some election activists suggest that changing voting
schemes -- say, to this sort of preferential balloting -- would
break up our two-party system and decrease the polarization of
our political process. Our counter-examples would be Israel and
Italy. Both are parliamentary democracies with multi-party
elections. However, in the former, the political process is
polarized to the point of preventing a peace settlement; in the
latter, the political process is so unstable that the average
lifetime of a ruling coalition since the end of the second world
war has been less than eighteen months.
Enough about geo-politics, let's change gears for a
moment.
A Practical
Application.
Time flies like Ken Arrow.
Fruit flies like a bananna.
--- Anonymous
At the eleventh World Science Fiction Convention in
1953, the membership first voted on what has become a staple of
the science fiction world: the Hugo awards. The awards are named
after the Luxembourg-born writer and editor Hugo Gernsback, who
founded Amazing Stories, the first of the pulp SF
magazines. That first year, the winners included Alfred Bester
for his novel The Demolished Man and Phillip José
Farmer as best new SF author. Nearly half-a-century later, there
are thirteen categories of Hugos, and the ``Best New Author''
award has been institutionalized as The John W. Campbell Award,
named for the editor of Astounding, and the man who shaped
short SF for the middle part of the century.
We remembered the Hugos a couple of weeks ago when the
nominees for the 1999 awards were announced, and we found our
friend Guy Lillian on the ballot for his fanzine,
Challenger. This all reminded us that a column on
tallying the Hugo ballots has been rattling around in our topic
drawer for a while.
(In yet another sign of how far in advance these columns
are written, the nominees were announced in mid-April, but the
final voting deadline is July 31st. This leaves a month before
the World Science Fiction Convention -- in Chicago this year;
http://www.chicon.org
for details -- for Mike Nelson, the Hugo administrator, to tally
the ballots.)
In any case, this brings us to the Hare method, which
dates to the middle of the nineteenth century. Because Hare's
method is used in Australia, it's sometimes referred to as an
Australian ballot, even though it's more properly referred to as
a preferential ballot. (The Australian ballot is in contrast to
the Austrian ballot, in which you hold an election and the Holy
Roman Emperor gets as many votes as he wants.)
In a Hare count, the first-ranked choices on each ballot
are tallied. If no candidate has a majority, the candidate with
the lowest vote count is eliminated, and ballots for that
candidate are redistributed based on their second-ranked choices.
We continue until one candidate has a majority of the votes
currently available.
Let's begin by positing a file of ballots looking something like this:
We have a list of the candidates, a separator, and then a line containing each ballot as a list of ranks. The first voter ranked Pride and Prejudice first, followed by War and Peace, Tom Sawyer, Prisoner of Zenda, and No Award. ``No award'' is a special case: for Hugo voting, it's provided in case the voter feels nothing deserves the award that year.Moby Dick Prisoner of Zenda Pride and Prejudice Tom Sawyer War and Peace No Award /* 0 4 1 3 2 5 0 0 0 0 0 1 0 0 0 0 0 1 4 3 2 5 1 0 0 2 1 3 0 0 1 4 3 6 2 5 0 0 1 0 2 0 0 0 1 0 2 0
We can begin a Perl script to count the ballots in the usual way:
Because we've used strict, we need to declare variables. We need a hash of the candidates who have not yet been eliminated, candidates. We'll need a similar hash, by candidate, of the votes at each elimination round, tally. There are two interesting features of note here. First, we haven't said what's stored against the candidate's name in the hash candidates. It doesn't matter: Once the candidate is eliminated, we delete the name from the hash, so the presence of a candidate means it's still active. Second, tally is actually a hash of arrays, or more accurately a hash of pointers to arrays, or ``arrayrefs'' in Perl-speak. How we add and extract data from this structure will become apparent as we go along.#! /usr/local/bin/perl -w # tally a Hugo-like preferential ballot # $Id: tally,v 1.12 2000/05/10 17:09:58 jeff Exp $ use strict; my %candidates; # hash of candidates names still alive my %tally; # hash of arrayrefs of votes per # candidate for each runoff round
We're assuming the input file is on our standard input, even though we could have provided a named file on the command line instead. We need to grab the contents of the file nonetheless. We could build a very C-like loop to grab do this, such as:
# read candidate names
while( <> ) {
chomp;
last if( /\/\*/ );
push(@candidates, $_);
}
# now read the ballots themselves
$count = 0;
while( <> ) {
chomp;
$i = 0;
while( s/\s*(\d)(.*)/$2/ ) {
$ballots{$candidates[$i++]}[$count] = $1;
}
$count++;
}
Instead, we'd rather do this in a Perl-like way. For example, to
read the candidates' names, we use:
my @candidates;
# ordered list of all candidates
# grab the candidate list...
{
local $/ = "/*\n";
chomp ($_ = <>);
@candidates = split "\n";
}
We redefine the record separator, $/, to be the marker
between the candidates and the ballots. Once we've read the
whole candidate list -- chomping the record separator in
the process -- we split the list into our
candidates array. We do this whole operation inside a
block to insulate the $/ redefinition. Notice that
we've defined an array @candidates which we shouldn't
confuse with the hash %candidates. Perl keeps them
distinct by the way they're referred to; so can we.
We then invoke a similar loop to read the ballots themselves from the file.
my @ballots;
# ordered list of all ballots
# now grab the ballots
while(<>) {
my %ballot;
@ballot{@candidates} = split;
# use the hash slice
push @ballots, {%ballot};
}
We use a hash slice to split up each line of ballot into its
component rankings. What's a hash slice? In effect, the line
@ballot{@candidates} = split; is a Perl multiple
assignment that says
($ballot{$candidate[0]},
$ballot{$candidate[1]}, ... ) =
split($_, / /);
This means that we have values like $ballot{'Moby Dick'} =
4. We end the loop by pushing the hash %ballot
onto the array @ballots.
We talked about the hash %candidates earlier. We now need to populate it for the candidates still active -- at this point, all of them. Again we use a hash slice:
@candidates{@candidates} = @candidates;
Again, this is a multiple assignment that says
($candidates{$candidates[0]},
$candidates{$candidates[1]}, ...) =
($candidates[0], $candidates[1], ...);
(Notice how we can tell when we're referring to the hash
%candidates and when we really mean the array by
checking what brackets we use.)
Counting the votes is a simple matter as we outlined above. We count the first-place votes, and if there is no candidate with a majority, we eliminate the lowest-ranked candidate, redistribute its second-place votes to the remaining candidates, and repeat. Reduced to code, we say:
while (1) {
my %results = one_round @ballots;
if ($results{Winner}) {
print "We have a winner! ",
"$results{Winner}\n";
last;
}
warn "We have a loser! $results{Loser}\n";
foreach (@ballots) {
$_->{$results{Loser}} = 0;
# throw away votes for eliminated
# candidate: make them "don't rank"
}
delete $candidates{$results{Loser}};
# keeps @candidates the list
# of ACTIVE candidates
die "No more candidates!"
unless %candidates;
}
This infinite loop tallies each round of ballots. We'll look at
the subroutine that does that in a moment. For the moment, you
need to know only that one_round() returns a hash
containing two possible key values, either ``Winner'' or
``Loser''. If there was a majority winner, it is contained in
$results{Winner}. If there was not, the lowest vote-getter is $results{Loser}. As a side-effect,
one_round() also populates the %tally array, as
we'll see anon.
If we actually have a winner, we print a message and
break out of the infinite loop. If not, we print the name of the
candidate to be eliminated, and then we set that candidate's rank
to zero on each ballot. In other words, we assert that no one
voted for the candidate. Then, we eliminate the candidate in our
hash %candidates by using delete. If there are
no more candidates for the next round, we have a real problem,
and we take a fatal exit.
The loop is fairly simple in concept, and made simple in
expression by our notation, but a lot of the action is hidden.
For example, what magic happens in one_round? Quite a
bit, it turns out.
Magic Routines and
Results.
The one_round routine is also deceptively simple, once we've looked at it:
# return a winner or loser
sub one_round {
my @pref_list = pref_list @_;
my $min = @pref_list;
my $winner;
my $loser = "No losers!";
foreach my $candidate (keys %candidates) {
my @votes =
grep {$_ eq "$candidate"} @pref_list;
# pref_list is a list of
# highest-ranked active
# candidates from each ballot;
# we "grep" to get just
# the current candidate.
$winner = $candidate
if(@votes > @pref_list/2);
($loser, $min) =
($candidate, scalar @votes)
if @votes < $min;
# "scalar" ensures that count
# is used
push(@{$tally{$candidate}}, scalar @votes);
}
return $winner ?
(Winner=>$winner) : (Loser=>$loser);
}
The largest magic, which we'll defer examining, happens in the
first line, where we use the routine pref_list to create
a similarly-named array containing an array of the highest-ranked
candidate on each ballot. That is, @pref_list contains
something like ``Moby Dick, Tom Sawyer, Moby Dick, No Award, War
and Peace, War and Peace, ...'' We set the minimum number of
votes -- that is the lowest tally we've seen so far -- to the
length of that array. We declare local variables for winner and
loser.
We need to determine the tally for each remaining candidate, that is any candidate which still has a key in %candidates. If we had @pref_list, in a file, we would say something like
In Perl, we do the same thing with the grep line in the code fragment. The count of entries in the resulting @votes array gives us the number of votes received. If that number of votes is greater than the half the length of @pref_list, that is, the number of ballots still active, we have a winner. Why not just return here? Because we really need to know the vote tallies for the remaining candidates in this runoff round. If the count of votes for this candidate is less than the previous lowest tally, we save that data with a multiple assignment. Notice that we're explicitly saying scalar @votes which gives us the number of entries in the array here: if we didn't, we'd just get the first entry in that array assigned to $min.grep "Moby Dick" pref_list | wc -l
Lastly, we add the count of votes to the tally
hash entry for the candidate. The odd bit of syntax
@{$tally{$candidate}} is the array referred to by the
hash entry. Remember that tally is actually a hash
containing arrayrefs, which we need to dereference before adding
the new votes.
Finally, falling out of that loop, we return one of
Winner or Loser.
We finish up our subroutines by looking at what happens under the covers of pref_list.
# make a preference list
sub pref_list {
my @ballots = @_;
my @pref_list;
# an array of ballots
# each, an array of active candidate
# names, in preference order
foreach (@ballots) {
my %rank = reverse(%$_);
delete $rank{0};
# delete any "don't rank"s
push @pref_list, $rank{(sort keys %rank)[0]}
if(%rank);
}
@pref_list;
}
We begin with the array of ballots passed as an argument.
Remember that this is actually an array of hashes indexed by
candidate. For each ballot in the list we are presented, we
invert the sense of the hash using reverse(%$_), which
gives us a list of candidates keyed by rank. In other words, we
started with entries in ballots like $ballot{'Moby
Dick'} = 4. which we've turned into entries like
$rank{4} = 'Moby Dick'. In the inversion, any candidate
with a rank of zero -- that is, one who received no vote or was
eliminated -- is overlaid into $rank{0}, which we
delete. Then if there are still entries in the hash
%rank, we add the entry with the lowest-sorting key --
the highest-ranked candidate -- to the end of pref_list.
It's pref_list which we return once we've examined all
the ballots.
This highest-ranked business bears a moment's
examination. In the normal course of events, when a candidate is
eliminated, we want to promote the ranks of all the other
candidates. If Moby Dick were eliminated, we'd promote
everything ranked lower by one position, even if Moby Dick
weren't in first place. But that doesn't really matter. All we
really need to do is change Moby Dick to ``unvoted,''
since we're using the highest remaining rank in the
pref_list subroutine. Put another way, the absolute
rank is unimportant, it's the relative rank remaining that
matters.
But what about the results? Even though we've printed out the name of the winner in our main while loop, it would be nice to see the totals at each runoff. We could use a simple loop like,
foreach my $name (@candidates) {
print $name;
while( my $n = pop(@${tally{$name}}) ) {
print $n;
}
print "\n";
}
There are two problems with that scheme. It's in candidate order
rather than in some order related to the vote totals, and it
doesn't format the tally grid in a very appealing way.
We can solve the first problem by sorting the tally hash. But we want to sort the hash by the last entry in the array, that is, the last vote total each candidate received before being eliminated. We can use a convenient bit of Perl syntax, and say @{$tally{$x}}[-1] to get the last vote count for candidate $x. Alternately, we can use a different bit of syntactic sugar and say $tally{$x}->[-1] as we've actually done here. We want to sort the tally list in reverse order of that entry. In code,
sort {
$tally{$b}->[-1] <=> $tally{$a}->[-1];
} keys %tally
This uses a slightly more complicated comparison than the normal
Perl sort example,
sort { $a <=> $b } @array
To print out the full results, then, we say,
my $sort =
sub { $tally{$b}->[-1] <=> $tally{$a}->[-1] };
foreach (sort $sort keys %tally) {
my @a = @{$tally{$_}};
printf "%-20s" . "%5d"x@a . "\n", $_, @a;
}
We could have printed the vote counts by shifting them
out of the array, but this shows them printed in a single
statement. Note how we've constructed the printf format
specifier out of pieces, the central one of which (which prints
the numbers) depends on the size of the array @a. We
also extract the sort into an anonymous subroutine just
to keep our foreach line uncluttered.
This gives us output like:
Pride and Prejudice 355 359 408 455 573 Prisoner of Zenda 235 236 264 321 414 War and Peace 178 179 215 272 Tom Sawyer 197 202 211 Moby Dick 147 152 No Award 56
Notice that we're counting on the array's being in
chronological order, which we got by using the earlier
push().
Notice also that Tom Sawyer, which finished well-ahead of War and Peace in the first two rounds of voting,
was eliminated by the very next round. Despite our good
intentions and careful programming, it's possible to concoct
input data that produce even more egregious outcomes: Arrow's
Impossibility Theorem in action. And -- now that you have the
code -- we leave it to you to convince yourself of this.
Finishing Up.
The program we've presented is fully functional as far as it goes. Using idiomatic Perl tightens up the notation over the more complicated code we would have used in a C or Pascal version. In fact, there are roughly 75 lines of Perl code here, not counting didactic comments, in contrast with about 900 in the corresponding C program. However, we've deliberately left some things out which take up space in the C version. We present them to you now as exercises.
We've deliberately simplified both our discussion of
electoral theory and the specific history of the Hugo Awards,
just touching enough of the high points to set up the problem.
(This is a problem we have a bit of experience with. Copeland
and his wife, Liz, have actually administered the Hugo Awards,
and wrote the code that does the real tallying.)
On the issue of the history of voting, there are some good references out there if you want to do further study. We can suggest:
In our coding efforts, we owe quite a bit to Tom
Christiansen and Nat Torkington's Perl Cookbook (O'Reilly,
1998, ISBN 1-56592-243-3). Tom and Nat's book provides more than
700 pages of interesting problems with copious discussion of the
hows-and-whys of the Perl tricks used to solve them.
On the other hand, if you're interested in the past
history of the Hugos and other literary awards, check out the
AwardWeb web page at http://dpsinfo.com/awardweb. If
you're a rules lawyer, you can find the codified process for Hugo
nomination and voting in the constitution of the World Science
Fiction Society, at http://www.worldcon.org/rules.html. Or for a more
up-to-date version, see http://www.chicon.org/wsfs/constit.htm.
Next month, as always, we'll explore a new and different problem that momentarily distracted us. Until then, happy trails.