Therefore is the name of it called Babel; because the Lord did there confound the language of all the earth:
--- Genesis 11:9
The Babel fish is small, yellow and leechlike, and probably the oddest thing in the Universe. ... The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language.
--- The Hitchhiker's Guide to the Galaxy
Sehr geehrter Herr Dr. Haemer ! Herzlichen Dank fuer Ihre Einladung am Donnerstag, die ich dankend annehme. Wie Sie vorschlagen, werde ich die Kunstwerke an Ihren Waenden studieren. Dann koennen wir uns auch weiter ueber solche Themen wie Religion, Literatur und Philosophie unterhalten. Ich bringe Salat. Bitte schicken Sie mir directions to your Haus. Herzlich, Ana
For years, we have been writing, here and elsewhere,
about internationalization: how to write your programs so that
they are multi-lingual.
Unfortunately, we are not yet as multi-lingual as our
software. Haemer recently found the note reproduced above in his
incoming email. Haemer speaks no German. What to do, what to
do?
Speaking with forked tongue
If you've never clicked on the little [Translate] link
in AltaVista, this is the time to try it out. Doing so takes you
to http://babelfish.altavista.com/,
home of a server that will mechanically translate some natural
languages into others. Babelfish offers machine translation
between English and a handful of common, European languages:
French, German, Italian, Portuguese, and Spanish.
Machine translation has a long, mixed history. The
bottom line is that the translations aren't perfect, and some of
them are downright silly. What impresses us after a little use
is how often babelfish's translations are good enough.
Here's what babelfish did with Haemer's message:
Dear Dr. Haemer! Cordial thanks for your invitation on Thursday, which I assume thanking. As you suggest, I will study the works of art at your walls. Then we can converse also further about such topics such as religion, literature and philosophy. I bring salad. Please you send directions to to me your house. Cordially, Ana
Imperfect, but completely intelligible. It would be a
pity to miss such wonderful email either because of our
linguistic limitations or because we demand perfect translations.
A joint venture of Systran and AltaVista, babelfish is
named after Douglas Adams' Babel fish, from The Hitchhiker's Guide
to the Galaxy, ISBN: 0-517-54209-9. It's mostly used to make
sense of web pages that would otherwise be gibberish, but it's a
general-purpose tool.
Babelfish: not just for browsers any more
Unfortunately but unsurprisingly, the interface
AltaVista provides is tied to a browser, which limits you to
typing at it. If you're interested in both programming and human
languages -- which we are -- there are a lot of fun things you
can imagine trying with a programmatic interface to a tool like
this. However, writing web clients from scratch can require a
lot of ad hoc, trial-and-error work. (See our April, 1999,
column, http://swexpert.com/C9/SE.C9.APR.99.pdf.)
Luckily, there is now a Perl module on the CPAN, http://www.perl.com/CPAN/, to help. WWW::Babelfish is a module specifically designed to let you write web clients for the babelfish server. It took us only a few minutes' work to produce a working, general-purpose translation script. A few more iterations brought us to this:
#!/usr/local/bin/perl -w
use strict;
use lib ".";    # hack, cough
use WWW::Babelfish;
use Getopt::Std;

my $options = "[-i input_language | -o output_language] [filename ...]";
my $usage = "usage: $0 $options\n";
use vars qw($opt_o $opt_i);

# Parse -i/-o. At most one may be given; the other end defaults to English.
sub get_langs {
    getopts "i:o:" or die $usage;
    die $usage if ($opt_o && $opt_i);
    return ($opt_i || "English", $opt_o || "English");
}

my ($in, $out) = get_langs;
my $obj = new WWW::Babelfish( 'agent' => 'Mozilla/8.0' );
die "Babelfish server unavailable\n" unless defined $obj;

# Compare language names exactly; a pattern match would accept fragments.
my @languages = $obj->languages;
die "source language $in must be in @languages\n"
    unless grep { $_ eq $in } @languages;
die "destination language $out must be in @languages\n"
    unless grep { $_ eq $out } @languages;

# Slurp all the input into one string, so translate() sees a single value.
my $text = do { local $/; join '', <> };
my $translation = $obj->translate(source=>$in, destination=>$out, text=>$text);
die "Could not translate: " . $obj->error . "\n" unless defined $translation;
print "$translation\n";
=head1 NAME

ana - Simple Babelfish client, for notes from Ana

=head1 SYNOPSIS

ana [-i input_language | -o output_language] [files]

=head1 DESCRIPTION

=over 2

B<ana> uses babelfish to translate from
one language to another.
Default language for each is English.

=back

=head1 OPTIONS AND ARGUMENTS

=over 8

=item I<-i>

input language

=item I<-o>

output language

=item I<filename ...>

files to translate (default: stdin)

=back

=head1 AUTHOR

Jeffrey S. Haemer <jsh@usenix.org>,
Jeffrey Copeland <copeland@alumni.caltech.edu>

=head1 SEE ALSO

perl(1) WWW::Babelfish(3)

=cut
We'll mostly let the code speak for itself. The
documentation that comes with Babelfish is very clear, and this
column has so much code that we're cramped for space.
Line 03 (use lib ".") points the script at our own, local version of WWW::Babelfish. We had to add the line
$ua->proxy(['http', 'ftp'] => $ENV{http_proxy});
to let us get to the web through proxy servers.
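Because ana refuses to take both -i and -o, and babelfish only translates to or from English, getting from French to German, say, means chaining two runs. Here's a minimal sketch, assuming ana is installed somewhere on your PATH; the pivot function is our own invention for illustration, not part of ana or WWW::Babelfish:

```shell
#!/bin/sh
# pivot FROM TO: translate stdin from one non-English language to
# another by way of English.  Assumes the ana script above is on
# your PATH; the name "pivot" is made up for this sketch.
pivot() {
    ana -i "$1" | ana -o "$2"
}
```

You would call it as pivot French German < lettre.txt (the file name is hypothetical). Expect two translations' worth of noise in the result.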
We play telephone
So, let's try it out.
In the game of ``telephone,'' a phrase is whispered
around a circle; when it completes a circuit, the original phrase
is compared with what it has turned into.
Imagine a game of telephone at the European Economic
Commission, in Brussels, where each player has a different native
language and most of the transmission noise is translation error.
We can simulate this by setting up a pipeline of translation
programs. Since babelfish only provides translations to and from
English, every other speaker will have to be an Anglophone.
(Alternatively, you can imagine that all the whispering is done
in English, but that each speaker must translate the phrase that
comes in his or her right ear, first from English to his or her
native language and then back to English, before passing it on to
the next person to the left.)
Here's an example:
#!/bin/sh
# Continental telephone
# $Id: teleEurope,v 1.3 1999/08/05 21:16:20 jsh Exp $
whisper() {
    # English -> $1 -> English; tee shows each intermediate result
    ana -o "$1" | ana -i "$1" | tee /dev/stderr
}
echo my hovercraft is full of eels |
    whisper English |
    whisper French |
    whisper Portuguese |
    whisper Italian |
    whisper German |
    whisper Spanish
And here's its output:
my hovercraft is full of eels
my hovercraft is full with eels
mine hovercraft is full with conger-eels
Hovercraft of the mine is full with the gronghi
Air cushion vehicle of the pit is full with gronghi
The vehicle of pneumatic shock absorber of the hollow is full with gronghi
(``My hovercraft is full of eels.'' is used in the
WWW::Babelfish documentation, and is taken from Monty
Python's Hungarian Phrasebook sketch.)
But why stop there? Looking for babelfish-related news articles on DejaNews, we found the following wonderful post, from David Chess at IBM Research:
> Subject: Babelfish invariance
> From: chess@us.ibm.com (David M. Chess)
> Date: 1999/05/27
> Newsgroups: alt.hackers
> The first thing everyone does with a translator like Babelfish
> (http://babelfish.altavista.digital.com/) is to translate
> something from one's native tongue into some other language,
> and then back again, to see what happens. It's only a slight
> stretch to *continue* this process until you get to a fixed
> point of the transform (the resulting string is the same as
> the last one you put in), or a cycle (the resulting string is
> the same as the one you put in N steps back). A string in
> language A which, when translated into language B by Babelfish
> and the result translated back into A, yields A again, is said
> to be "Babelfish invariant".
> [...]
David says, further on in the posting, that he hand-crafted a client to play with this idea. Lazily, we tried our hand at the same thing with WWW::Babelfish:
#!/usr/local/bin/perl -w
# $Id: telephone,v 1.3 1999/09/01 20:13:52 jeff Exp jeff $
use strict;
use lib ".";    # hack, cough
use WWW::Babelfish;
use Getopt::Std;

my ($obj, $in, $phrase);
my $optionsa = "[-c] [-v]";
my $optionsb1 = "[-s language_spoken ";
my $optionsb2 = "| -t language_of_thought]";
my $optionsb = $optionsb1 . $optionsb2;
my $optionsc1 = "[-n cycles] ";
my $optionsc2 = "[filename | -e expression]";
my $optionsc = $optionsc1 . $optionsc2;
my $usage = "usage: $0 $optionsa $optionsb $optionsc\n";
use vars qw($opt_s $opt_t $opt_n $opt_v $opt_e $opt_c);

# Parse the command line; returns (spoken language, thought language, cycles).
sub parse_args {
    getopts "s:t:n:e:vc" or die $usage;
    die $usage if ($opt_s && $opt_t);
    my ($speak, $think) = ($opt_s || "English", $opt_t || "English");
    $speak = ucfirst lc $speak;
    $think = ucfirst lc $think;
    my $n = $opt_n || 10;
    die $usage unless $n =~ /^\d+$/;
    return ($speak, $think, $n);
}

# Translate $in from language $s to language $d.
sub xform {
    my ($s, $d, $in) = @_;
    warn "in = $in\n" if ($opt_v);
    my $out = $obj->translate(source=>$s, destination=>$d, text=>$in);
    die "Could not translate from $s: " . $obj->error . "\n"
        unless defined $out;
    warn "out = $out\n" if ($opt_v);
    chomp $out;
    return $out;
}

my ($speak, $think, $n) = parse_args;
$obj = new WWW::Babelfish( 'agent' => 'Mozilla/8.0' );
die "Babelfish server unavailable\n" unless defined $obj;

# Compare language names exactly; a pattern match would accept fragments.
my @languages = $obj->languages;
die "Spoken language ($speak) must be in: @languages.\n"
    unless grep { $_ eq $speak } @languages;
die "Language of thought ($think) must be in: @languages.\n"
    unless grep { $_ eq $think } @languages;

if (defined $opt_e) {
    $phrase = $opt_e;
    die $usage if @ARGV;
} else {
    local $/ = undef;       # slurp every input file into one string
    $phrase = join '', <>;
}

$in = $phrase;
foreach my $t (1..$n) {
    # One trip around: spoken -> thought -> spoken.
    my $out = xform $speak, $think, $in;
    $out = xform $think, $speak, $out;
    if (lc($in) eq lc($out)) {      # unchanged: babelfish invariant
        chomp $in;
        print "$t\t" if $opt_c;
        print "$in\n";
        exit;
    }
    $in = $out;
}
die qq("$phrase"\n\thas become\n"$in"\n);
=head1 NAME

telephone - simulates the game of "telephone"

=head1 SYNOPSIS

telephone [-c] [-nI<n>] [-v] [files | -e expression]
[-t thought language | -s spoken language]

=head1 DESCRIPTION

=over 2

B<telephone> simulates the game of telephone.

(In the game of telephone,
participants sit in a big circle.
One person whispers a phrase to the person next to him.
That person then whispers what he thought he heard
to the person on the other side,
and this continues around the circle
until it gets back to the originator.
The point of the game is to see
how much the phrase changes in transit.)

In this program, each simulated participant
"thinks" in one language (say, German)
and "whispers" in a second (say, English).
The changes are generated by one cycle
of translating from English to German and back again.
Translation is performed by babelfish.

This process continues through a series of
English->German->English cycles
until the phrase has either become
"babelfish invariant" (stable)
or gone around the circle.

=back

=head1 OPTIONS AND ARGUMENTS

=over 8

=item I<-v>

verbose

=item I<-t>

language of thought

=item I<-s>

language of speech

=item I<-n>n

number of participants (default: 10)

=item I<-c>

count iterations until stability

=item I<-e>

word or expression to translate

=item I<filename ...>

files to translate (default: stdin)

=back

=head1 AUTHOR

Jeffrey S. Haemer <jsh@usenix.org> and
Jeffrey Copeland <copeland@alumni.caltech.edu>,
from a suggestion in alt.hackers
by David M. Chess <chess@us.ibm.com>

=head1 SEE ALSO

perl(1) WWW::Babelfish(3)

=cut
David says that French is rumored to be the best-supported language. To test this, we wrote a little shell script
that plays telephone in several languages:
#!/bin/sh
# $Id: multi-tel,v 1.3 1999/08/05 19:50:04 jsh Exp $
# comparison of languages for "telephone"
for i in English French German Italian Portuguese Spanish
do
    echo == $i
    telephone -c -t $i -e "$*"
done
and another to exercise it:
#!/bin/sh
# $Id: mtest,v 1.4 1999/08/05 23:29:22 jsh Exp $
# demo of multi-tel
multi-tel My hovercraft is full of eels.
echo
multi-tel Out of sight, out of mind.
echo
multi-tel CITRAN blows dead aardvarks.
Here's what we found when we ran it.
== English
1	My hovercraft is full of eels.
== French
2	My hovercraft is full with eels.
== German
5	My air cushion, machine pulls up, is full from the Aalen.
== Italian
2	My Hovercraft is full of the eels.
== Portuguese
3	Hovercraft of the mine is full of conger-eels.
== Spanish
1	My hovercraft is full of eels.

== English
1	Out of sight, out of mind.
== French
3	Out of the sight, spirit.
== German
4	Understand over the sight from out.
== Italian
2	From sight, the mind.
== Portuguese
3	Except of the sight, it is of the mind.
== Spanish
2	Outside Vista, the mind.

== English
1	CITRAN blows dead aardvarks.
== French
2	CITRAN blows the dead aardvarks.
== German
"CITRAN blows dead aardvarks."
	has become
"CITRAN burns continuous aardvarks one of the dead ones of one."
== Italian
"CITRAN blows dead aardvarks."
	has become
"CITRAN jumps the aardvarks has put put out of order put put put put put put put."
== Portuguese
2	Inoperative CITRAN establish aardvarks.
== Spanish
"CITRAN blows dead aardvarks."
	has become
"Aardvarks died to the blowing of CITRAN."
We like reading these out loud in thick, stage accents.
The numbers at the beginning of each line are the number
of steps to babelfish invariance for that language. If no stable
phrase has been found after 10 steps, the beginning and ending
phrases are printed, as with the German, Italian, and Spanish
translations of the third phrase.
It looks to us like the Spanish translations may be
better than the French; however, the German translations are
certainly the worst. Since we started this out by showing how
useful the German translations are, the Spanish translations must
be very good indeed.
We'll leave you with a few questions.
Reader Quiz 1: Obviously, different input words can
produce different translations, but does babelfish take
punctuation into account, or just pass it through, unchanged?
Also noteworthy is the final phrase's failure to
stabilize in several languages. In his post, David notices that
some strings fail to stabilize because the translation goes into
an infinite regression. Try this:
telephone -v -t french -e pizza
Reader Quiz 2: can anyone offer a word or phrase that
puts telephone into an infinite loop by alternating
between two (or more) translations?
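As written, telephone only notices a fixed point: a phrase that alternated between two translations would simply use up its -n cycles and be reported as unstable. Spotting a cycle takes a record of every phrase seen so far. Here's a minimal sketch of that bookkeeping in shell. It makes no network calls: round_trip is a toy lookup table standing in for a real round trip through ana, and both function names are our inventions for illustration.

```shell
#!/bin/sh
# Toy stand-in for one round trip (English -> other language -> English).
# A real version might be:  echo "$1" | ana -o French | ana -i French
round_trip() {
    case $1 in
        "full of eels") echo "full with eels" ;;
        ping)           echo pong ;;
        pong)           echo ping ;;
        *)              echo "$1" ;;    # everything else is already stable
    esac
}

# settle PHRASE [MAX]: repeat round trips until the phrase stops changing
# (a fixed point), repeats an earlier phrase (a cycle), or MAX trips pass.
# Single-line phrases only: the history holds one phrase per line.
settle() {
    phrase=$1
    max=${2:-10}
    seen=
    step=0
    while [ "$step" -lt "$max" ]; do
        next=$(round_trip "$phrase")
        step=$((step + 1))
        if [ "$next" = "$phrase" ]; then
            echo "fixed point after $step: $phrase"
            return 0
        fi
        # An exact, whole-line match against the history means a cycle.
        if printf '%s' "$seen" | grep -F -q -x -e "$next"; then
            echo "cycle after $step: $next"
            return 0
        fi
        seen="$seen$phrase
"
        phrase=$next
    done
    echo "unstable after $max: $phrase"
    return 1
}
```

With the toy table, settle ping prints "cycle after 2: ping", while settle "full of eels" prints "fixed point after 2: full with eels". The step count follows the same convention as telephone -c, which counts the trip that confirms the phrase has stopped changing.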
Oh, and the two other new phrases? ``Out of sight, out
of mind.'' has a venerable history in machine translation. It
is said that an early attempt to translate this phrase to Russian
and back returned ``Invisible idiot.''
Reader Quiz 3: can anyone out there provide a real
citation for this oft-recited, possibly apocryphal story?
And finally,
Reader Quiz 4: can anyone out there tell us what CITRAN
was, why it blew dead aardvarks, and who originally pointed this
out?
Summary
Just after we sent this article off to Lisa and Mike,
National Public Radio ran an item about the Consortium for Speech
Translation Advanced Research, or C-STAR, headquartered at
Carnegie Mellon University. They've developed a prototype
machine translation system that, by operating in restricted
domains -- their example is a travel agency -- can do
nearly-simultaneous translation from voice input. If you have
the RealAudio plug-in for your browser, you can listen to it from
the NPR web page for the July 22nd edition of ``All Things
Considered,'' at http://www.npr.org.
In this column, we tried to tie together three topics
that we're interested in: programming for the fun of it, the web,
and internationalization.
But what about the art on Haemer's walls? If you want
to come see it, send him email in English, French, German,
Italian, Portuguese, or Spanish.
Plan to bring salad.
Until next time, happy trails.