Okay, now you know how to use system and backticks(``),
but that's not always the right approach. There
are more robust ways to do this. If you don't care about tracking the
process, these are fine ways of doing something, but they do have
weaknesses.
For one, if you use the list context, you cannot send something into the background, and you cannot track the process ID. A beginner doesn't need to worry about this, and it's still okay for quickie scripts (I still use it...), but if you're setting up pipeline scripts that invoke processes, then you should be more concerned about tracking process IDs.
So it's kind of a running theme with me that invoking a shell
is a bad thing. It means that your program will start another process
as the shell, then the shell will transfer control to the command you asked
for. That's an extra level of process invocation that is completely
unnecessary. Also, it makes you dependent on having a sh on your system,
which is okay on UNIX, but might not be the case on other platforms.
Actually, it might not be okay in UNIX. I've been having a lot of problems
lately with sh meaning different things on different flavors of
UNIX...
I should mention that Randal Schwartz discusses a
lot of these concepts in his column
[SCHW99-10]
which I only discovered recently. Also, there is a pretty good discussion
in Programming Perl
[WALL00]
in the description of open. (The 2nd edition is pretty good,
but the 3rd edition is more complete.) Finally, the Perl Cookbook
[CHRI98]
also has a lot of examples of how to launch processes in chapter 16
(Process Management and Communication - especially Recipes 16.1-16.5,
and 16.10). These are the sources I've learned from.
The trick we'll use is to open a filehandle with "-|". This starts a child process and lets you read its output through the filehandle. Now, in the 3rd edition of Programming Perl [WALL00], there is a note that there is a 3-or-more arg version of "open." This isn't available in earlier versions of Perl, though. It looks very useful, but I would wait a little while before using it. I think it's only in 5.6 (5.006) or maybe even higher. I think IRIX (SGI) still ships with version 5.004 or even 5.003.
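For readers on a newer Perl, here is a minimal sketch of what that list form of open looks like. It assumes Perl 5.8 or later on a platform with a real fork. The helper name read_from_command is made up for illustration, and $^X (the path of the running perl) just stands in for a real command.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return its output lines.
# Assumes Perl 5.8+, where open accepts a command list after the "-|" mode.
sub read_from_command {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Can't run $cmd[0]: $!";
    my @lines = <$fh>;
    close($fh);    # waits for the child; its exit status lands in $?
    return @lines;
}

# $^X is the perl that is running this script, used here as a stand-in command.
my @lines = read_from_command($^X, '-e', 'print "line $_\n" for 1 .. 3');
print "got ", scalar(@lines), " lines\n";
```

Because the command and its arguments are separate list elements, nothing in them is ever interpreted by a shell.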
Basically, I would suggest using fork, exec and
open instead of system and backticks(``) for
really important production/pipeline stuff. If you're just writing a hacky
little script, then system and backticks are okay.
Additionally, when using system and exec, I would
suggest using the array context and avoiding shell metacharacters to avoid
unnecessarily invoking a shell.
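As a rough illustration of that last point, here is a small sketch of system called with a list. The helper name run_list is hypothetical, and the child is just another perl (via $^X) so the example is self-contained. The argument full of shell metacharacters arrives in the child as one literal string.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form and return its exit code.
sub run_list {
    my @cmd = @_;
    my $status = system(@cmd);     # list form: the arguments go straight to the program
    return -1 if $status == -1;    # the program could not be started at all
    return $status >> 8;           # the exit code lives in the high byte
}

# The child exits 0 only if it received exactly one argument,
# i.e. the ";" and "|" were not interpreted by anything.
my $code = run_list($^X, '-e', 'exit(@ARGV == 1 ? 0 : 1)', 'a b; c | d');
print "exit code: $code\n";
```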
Maybe I should take a moment and explain what system and backticks really do. Or at least close approximations (my guesses).
What does system really do? Let's start from the
outside looking in. What do we see as a user? The first thing you see is
that you execute another program, using a shell (if necessary) to evaluate
special shell metacharacters. Now I'm spewing gibberish. Let's just start
by saying "execute another program."
This is where fork and exec come in. If you are going to make
a lot of pipeline scripts, you really should learn this, and it's not too
difficult. A good reference is UNIX Network Programming
[STEV90]
and there's a bit of discussion in the function reference on fork
in Programming Perl.
[WALL00]
I'll try and summarize what I have gathered about UNIX boxes. I really have
no idea how Windows NT works though.
Basically, every script, every program, even the shell itself
running on your UNIX box is a "process," with a unique process identification (process id, or pid).
Often in shell scripting (and Perl for that matter), you can actually access
this with $$, but I digress.
In the beginning, there was nothing... well, not quite. When
your system boots up it will always run a program called
init. This in turn calls a lot of programs,
which in turn call other programs. Among others are the programs that you
interact with, such as the shell. Now, when I say that a program calls
other programs, there's a little more to it than that. Except for "init,"
each process has a parent, so as far as the system is concerned, there's
really one big hierarchy of processes. On most (System V and POSIX
compliant) UNIX systems (except BSD), you can view all the processes running
with:
ps -ef
and on BSD, you can get close to that with:
ps -aux
and the things to look for are the columns PID and PPID. Note that "init" should have a really low number (normally 1). This is the first process that was executed. You may see a bunch of daemons running like "lpd," "httpd," or "ftpd." Actually, I'm a little behind on this. I think a lot of the internet-related processes are now handled by "inetd" (internet daemon), which calls the others on an as-needed basis. (I could be wrong about that though.)
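The same two numbers are visible from inside a Perl script. This is only a tiny sketch for a UNIX-like system: $$ holds the current process ID and getppid returns the parent's.

```perl
use strict;
use warnings;

my $pid  = $$;           # this process's ID (the PID column in ps)
my $ppid = getppid();    # the parent's ID (the PPID column in ps)

print "my pid is $pid, my parent's pid is $ppid\n";
```

If you run it from a shell prompt, the parent pid should match the shell's own entry in the ps output.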
This is all building up to the mechanism by which processes spawn
other processes. It's quite simple, and usually referred to as "fork and
exec" (they go together hand in hand). Basically, with fork,
a process creates a copy of itself (the child). When it does this, a new
process ID is created for the child. This was actually difficult for me to
grasp, but here's my interpretation: There are now 2 copies of the current
process. But I think at first, they are probably sharing code segments in
memory. My best guess is that data segments are copied. That is, the parent
and child follow the same list of instructions, but each one has its own set
of variables/data.
As you can
probably tell, I still don't fully comprehend it. But I think you get the
idea. You're running through the code, and suddenly, at the point of the
fork, think of the program running twice simultaneously, as the
original(parent) process and new(child) process.
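Here is a minimal sketch of that "running twice" idea, assuming a UNIX-like system. The name fork_demo is made up. It peeks ahead slightly by using the value fork returns, but only so that the child can stop and the parent can wait for it.

```perl
use strict;
use warnings;

$| = 1;    # flush output immediately so buffered text is not duplicated by fork

# Hypothetical demo: returns the child's pid (in the parent).
sub fork_demo {
    my $pid = fork();
    die "Can't fork: $!\n" unless defined $pid;

    # From here on, two processes are running this same code.
    print "after the fork, I am process $$\n";

    exit 0 if $pid == 0;    # the child stops here
    waitpid($pid, 0);       # the parent waits for the child to finish
    return $pid;
}

fork_demo();
```

The message is printed twice, once by each process, with two different process IDs.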
The tricky part now is that you have these 2 identical processes
running, but you want them to do different things.
So when you're coding, how do you program something that is to be
interpreted differently depending on whether you're in the parent or child process?
That's just how fork works. fork returns a pid.
If you are in the parent, then fork returns the pid of the
child process. But if you are in the child process, you
get a 0. Then there's additional stuff to know if things go wrong, but
we'll deal with that in a minute. Anyway, for now, we'll say:
Make a tar file.
In this case, I'll take a simple example of creating a tar file, and honestly, you'd probably just run "tar" directly. But you could use this to fire off a process to the queue. Also, the quickie way to do this would be:
$dirName = $ARGV[0];
$status = system("tar","cvf", $dirName. '.tar', $dirName);
And honestly (as you'll see later), this is how I would do it for this task.
But since this heading was "what system really does," then let's see what
this looks like under the hood. Just to clarify, system calls
C code that does this. I'm using Perl itself to describe the equivalent steps
to what is going on under the hood.
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5
6 $dirName = $ARGV[0];
7
8 # This is pretty much lifted directly
9 # from "Programming Perl" by Wall et al.
10 if ($pid=fork) { # assign result of fork to $pid,
11 # see if it is non-zero.
12 # Parent process here
13 # Child pid is in $pid
14 } else {
15 # neglecting to test if fork even worked is actually pretty sloppy.
16
17 # Child process here
18 # parent process pid is available with getppid
19
20 # exec will transfer control to the tar process, and will
21 # finish (exit) when the tar is done.
22 exec("tar", "cvf", $dirName . ".tar", $dirName);
23 }
24
25 # wait for the tar to complete and get the status (like system does)
26 waitpid($pid,0);
27 $status = $?;
28
Listing 4.1.1 for code_untested/makeTar-1.pl
That's a little sloppy, but hopefully it gets the point across. The flaw here is that I ignore the possibility of an error when fork-ing the new process. If there was a problem forking, fork returns undef, which is false just like the child's 0, so the parent process itself will transfer control to "tar". Now, honestly, this is how I usually do it, but it's not that good, especially since Programming Perl tells you how to do it better, being much more careful about error conditions:
I guess I threw exec in there assuming that I'd have
covered it in an earlier page. It's basically a lot like system, except it
never returns. That is, it transfers control from the current process to the
new program, and lets go of its previous list of instructions (code segment).
For that matter, it also lets go of its copy of the parent's data segment.
If you pass any arguments in the exec call, that list is given to the new
process. (Now would be a good time to read UNIX Network Programming
[STEV90]
by the way...) There are also options
to pass down the environment too, but that's not worth worrying about at the
moment.
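To make the "never returns" part concrete, here is a small sketch, again assuming a UNIX-like system. The helper name exec_in_child is hypothetical, and the program being exec'd is just another perl ($^X) so the example stands alone. Anything written after exec only runs if exec itself failed.

```perl
use strict;
use warnings;

$| = 1;    # flush output immediately so buffered text is not duplicated by fork

# Hypothetical helper: run a command in a child via exec, return its exit code.
sub exec_in_child {
    my @cmd = @_;
    my $pid = fork();
    die "Can't fork: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: on success exec does not return, because this process
        # becomes the new program.
        exec { $cmd[0] } @cmd
            or warn "exec of $cmd[0] failed: $!\n";
        exit 127;    # only reached if exec failed
    }

    waitpid($pid, 0);    # parent: wait for the child
    return $? >> 8;      # exit code of whatever the child became
}

print "child exited with ", exec_in_child($^X, '-e', 'exit 7'), "\n";
```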
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5 use Errno qw(EAGAIN); # defines the EAGAIN constant tested below
6 $dirName = $ARGV[0];
7
8 # This is pretty much lifted directly
9 # from "Programming Perl" by Wall et al.
10 FORK: {
11 if ($pid=fork) { # assign result of fork to $pid,
12 # see if it is non-zero.
13 # Parent process here
14 # Child pid is in $pid
15 } elsif (defined($pid)) {
16 # Child process here
17 # parent process pid is available with getppid
18
19 # exec will transfer control to the tar process, and will
20 # finish (exit) when the tar is done.
21 exec("tar", "cvf", $dirName . '.tar', $dirName);
22 } elsif ($! == EAGAIN) {
23 # EAGAIN is the supposedly recoverable fork error
24 sleep 5;
25 redo FORK;
26 } else {
27 #weird fork error
28 die "Can't fork: $!\n";
29 }
30 }
31
32 # wait for the tar to complete and return status (like system does)
33 waitpid($pid,0);
34 $status = $?;
35
36 # This part would be executed by both parent and child, except
37 # we used "exec" in the child, so effectively, this is only the parent.
38
Listing 4.1.2 for code_untested/makeTar.pl
At least that's the thorough, complete way of doing it. And I would encourage you to do it that way.
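The $status value captured in the listings above is the raw wait status, which packs a few things into one number. Here is a small sketch of how to pull it apart. The helper name decode_status is made up; the bit layout is the one Perl documents for $?.

```perl
use strict;
use warnings;

# Hypothetical helper: split a wait status (the value of $?) into its parts.
sub decode_status {
    my ($status) = @_;
    return {
        exit_code   => $status >> 8,               # value the program passed to exit
        signal      => $status & 127,              # signal that killed it, or 0
        core_dumped => ($status & 128) ? 1 : 0,    # whether a core file was written
    };
}

# system returns the same kind of value that waitpid leaves in $?.
my $status = system($^X, '-e', 'exit 3');
my $info   = decode_status($status);
print "exit code $info->{exit_code}, signal $info->{signal}\n";
```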
Rather than running tar, this example runs perl -c.
It also integrates the
earlier notes on File::Find.
Not a big deal, but it is something that I have used on the code repository
for this site itself.
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5
6 use File::Find;
7 use Errno qw(EAGAIN); # defines the EAGAIN constant tested below
8 # This is pretty much lifted directly
9 # from "Programming Perl" by Wall et al.
10
11 sub usePerlDashC {
12 my $filename = shift;
13 FORK: {
14 if ($pid=fork) { # assign result of fork to $pid,
15 # see if it is non-zero.
16 # Parent process here
17 # Child pid is in $pid
18 } elsif (defined($pid)) {
19 # Child process here
20 # parent process pid is available with getppid
21
22 # exec will transfer control to the perl -c process,
23 # and will finish (exit) when the perl -c is done.
24 exec("perl", "-c", $filename);
25 } elsif ($! == EAGAIN) {
26 # EAGAIN is the supposedly recoverable fork error
27 sleep 5;
28 redo FORK;
29 } else {
30 #weird fork error
31 die "Can't fork: $!\n";
32 }
33 }
34
35 # wait for the perl -c to complete and return status (like system does)
36 waitpid($pid,0);
37 $status = $?;
38
39 return $status;
40 }
41
42
43 &File::Find::find( sub {
44 if ($_ =~ /\.p[lm]$/) { # name ends with .pl or .pm
45 print "checking [$_]\n";
46 &usePerlDashC($_);
47 }
48 }, "."
49 );
50
Listing 4.1.3 for code_untested/checkPerl.pl
Fundamentally, backticks really are a similar deal to
system.
Let's take another simple project:
Read a file into an array. If the file is gzipped, then unzip the file before reading it in.
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5 $filename = $ARGV[0];
6
7 if ($filename =~ /\.gz$/) { # File ends with ".gz"
8 @contents = `gzip -f -c -d ${filename}`;
9 } else {
10 @contents = `cat ${filename}`;
11 }
Listing 4.1.4 for code_untested/readFile-1.pl
What an atrocious piece of code that is. Why don't I like it?
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5 $filename = $ARGV[0];
6
7 if ($filename =~ /\.gz$/) { # File ends with ".gz"
8 open(INPUTFILE, "gzip -f -c -d ${filename} |")
9 or die "couldn't open $filename: $!";
10 } else {
11 open(INPUTFILE, $filename)
12 or die "couldn't open $filename: $!";
13 }
14
15 while (<INPUTFILE>) {
16 my $line = $_;
17 # Here, we could just process the file line by line, but
18 # for this example, we'll just push it onto an array.
19 push @contents, $line;
20 }
21 close(INPUTFILE);
Listing 4.1.5 for code_untested/readFile-2.pl
Well, this was a little better. We're at least using file handles, and this should address the last 2 issues from example 1. I also like this because we can work on the input one line at a time.
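But there's still the fundamental theme of this note: the gzipped case still hands a single command string to "open" in line 8. Perl passes a string like that to a shell whenever it contains shell metacharacters, and since the file name is interpolated into it, an awkward file name is all it takes.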
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5 $filename = $ARGV[0];
6
7 if ($filename =~ /\.gz$/) { # File ends with ".gz"
8 my $pid;
9 if (not defined($pid = open(INPUTFILE, "-|"))) {
10 die "can't fork: $!";
11 }
12 if ($pid) {
13 # parent process - do nothing
14 } else {
15 # child process
16 system( "gzip", "-f", "-c", "-d", $filename);
17 exit 0;
18 }
19 } else {
20 open(INPUTFILE, $filename)
21 or die "couldn't open $filename: $!";
22 }
23
24 while (<INPUTFILE>) {
25 my $line = $_;
26 # Here, we could just process the file line by line, but
27 # for this example, we'll just push it onto an array.
28 push @contents, $line;
29 }
30 close(INPUTFILE);
Listing 4.1.6 for code_untested/readFile-3.pl
Now, we're getting a little more complicated. But basically, in the
case of gzipped file, think of this as a safer way to start up another
program (using open with "-|" instead of system or backtick) and listening
to its output. This is just explicitly opening that other process and
keeping track of the new process ID (pid). Above, I had an extensive
description of fork and exec. Well, using
open with "-|" (read from handle) or "|-" (write to handle)
is just using fork under the hood. But it has the additional
bonus that "-|" will take the STDOUT(output) from the child process and read
it in through the filehandle. Similarly, if you use "|-" then writing to
the filehandle would write to the STDIN(input) of the child process.
As a note, I've tried to do this in the ActiveState version of Perl 5.6, but it wouldn't allow me to use the 2 arg form of "|-". Basically, what this means is: stick to UNIX and Linux. Stay away from NT. Actually, you can probably get away with the 3+ arg form of open for most things you try to do. But I actually regard this as a bug, because the shipped pages imply that it can handle the 2 arg form too.
Backticks just basically use this mechanism, reading from the STDOUT of a child process, and waiting for the process to complete.
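To make the two directions concrete, here is a rough sketch of both, assuming a UNIX-like system. The names capture and count_in_child are made up for illustration. capture is a backtick stand-in built on "-|", and count_in_child uses "|-" to feed lines to a child that reports the line count through its exit code.

```perl
use strict;
use warnings;

$| = 1;    # flush output immediately so buffered text is not duplicated by fork

# Hypothetical backtick stand-in: run @cmd with no shell, return its output lines.
sub capture {
    my @cmd = @_;
    my $pid = open(my $from_child, '-|');    # implicit fork
    die "Can't fork: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: STDOUT is now connected to the parent's filehandle.
        exec { $cmd[0] } @cmd or warn "exec of $cmd[0] failed: $!\n";
        exit 127;    # only reached if exec failed
    }
    my @output = <$from_child>;
    close($from_child);    # waits for the child and sets $?
    return @output;
}

# Hypothetical "|-" demo: whatever the parent prints arrives on the child's STDIN.
sub count_in_child {
    my @lines = @_;
    my $pid = open(my $to_child, '|-');      # implicit fork
    die "Can't fork: $!\n" unless defined $pid;
    if ($pid == 0) {
        my $count = () = <STDIN>;            # child: count the incoming lines
        exit $count;
    }
    print {$to_child} @lines;
    close($to_child);      # sends end-of-file, waits for the child, sets $?
    return $? >> 8;
}

my @out = capture($^X, '-e', 'print "hello from the child\n"');
print "captured: @out";
print "child counted ", count_in_child("a\n", "b\n"), " lines\n";
```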
What more could a programmer want? Well, I still have a couple nit-picky issues with this.
The /\.gz$/ business. That's just such an
unbelievably crappy way to determine if the file is gzipped. It
imposes this file naming convention on the user, and sometimes it
is beyond their control. For instance, if they were using MTOR to
generate binary gzipped RIB files, their files would end with
".rib".
The way that I'll address that last one will be to cheat. Basically, I will open up the file twice. The first time, I will just look at the first few characters to decide if it has the gzip magic number. This is something that is automatically put in by gzip, and is unlikely to occur by chance in another file.
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5 $filename = $ARGV[0];
6
7 $magicNumber_GZIP = pack("C*", 0x1f, 0x8b); # gzip ID bytes; the header bytes after these vary
8
9 # Just peek at the first 2 bytes in the file and
10 # close it again.
11 open(INPUTFILE, $filename) or die "Couldn't open file $filename: $!";
12 binmode(INPUTFILE);
13 read(INPUTFILE, $magicNumber, 2);
14 close(INPUTFILE);
15
16 if ($magicNumber eq $magicNumber_GZIP) { # file starts w/gzip magic number
17 my $pid;
18 if (not defined($pid = open(INPUTFILE, "-|"))) {
19 die "can't fork: $!";
20 }
21 if ($pid) {
22 # parent process - do nothing
23 } else {
24 # child process
25 exec( "gzip", "-f", "-c", "-d", $filename);
26 }
27 } else {
28 open(INPUTFILE, $filename)
29 or die "couldn't open $filename: $!";
30 }
31
32 while (<INPUTFILE>) {
33 my $line = $_;
34 # Here, we could just process the file line by line, but
35 # for this example, we'll just push it onto an array.
36 push @contents, $line;
37 }
38 close(INPUTFILE);
Listing 4.1.7 for code_untested/readFile.pl
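Sorry about that. That magic-number line with
pack("C*", ...)
just came out of nowhere, didn't it? I originally found it by running
"od" (octal dump) on a few gzipped files and seeing that they all began with
the same bytes. For the record, the gzip file format (RFC 1952) defines the first two bytes, 0x1f 0x8b, as the magic number. The third byte is the compression method (0x08 for deflate) and the fourth is a flags byte that varies from file to file (for example, it differs when gzip compresses a stream rather than a named file), so only the first two bytes are safe to test.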
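If you don't have "od" handy, the same peek can be done from Perl. This is just a sketch: the helper name first_bytes_hex is made up, and the temporary file below is a stand-in that I fill with a fake gzip-style header, not a real gzipped file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the first $count bytes of a file as a hex string,
# roughly what you would read off an "od" dump.
sub first_bytes_hex {
    my ($path, $count) = @_;
    open(my $fh, '<', $path) or die "Couldn't open $path: $!";
    binmode($fh);
    my $got = read($fh, my $bytes, $count);
    die "Couldn't read $path: $!" unless defined $got;
    close($fh);
    return unpack('H*', $bytes);
}

# Stand-in file: gzip ID bytes, deflate method, an empty flags byte, then filler.
my ($tmp, $tmpName) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} pack('C*', 0x1f, 0x8b, 0x08, 0x00), 'rest of the data';
close($tmp);

print first_bytes_hex($tmpName, 4), "\n";
```

Pointing it at a few real gzipped files is a quick way to see which leading bytes stay constant and which ones change.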
Anyway, this is pretty much the way I think this task should be
done. We no longer have any extra shells open, and we check for errors in
most cases, so we're opening files safely. And no "system" calls or
back ticks in sight.
Well, I should say that "system" and backticks have their place, especially for quickie scripts, or scripts where you really don't care about the exit status or return values from the child process, but just want to invoke something. However, if you are working on a core system or pipeline script, you should make your script a little more robust.
Really, I'm not trying to hide anything from you. Right now, I'm just not very familiar with signal handling. Basically, this is handling what happens when processes die or get killed. For example, if you hit Ctrl-C, does it hit the child process, the parent process, or both? Well, I'm not really good at that stuff yet, so this discussion will be incorporated into the above examples once I learn it.
But what I'm trying to say here is that the above doesn't tell you exactly what system and backticks do, but it comes pretty close.
To demonstrate some of these concepts, I'll use the following:
Make a gzipped tar file (commonly referred to as a "tarball"). Do not use any temporary files (use streams).
Now, for anyone on a Linux box, you realize you can already do this on the UNIX prompt by:
tar zcvf directoryName.tar.gz directoryName
And if you're on a Sun or IRIX box or something, you know you can do this in 2 steps:
tar cvf directoryName.tar directoryName
gzip directoryName.tar
or even a single step with piping between the 2 processes:
tar cvf - directoryName | gzip -f -c > directoryName.tar.gz
and you could always make an alias or something for that...
No trade secrets there... But it will be a good demonstration of the concepts I'm going to go over. Also, since I work a lot on IRIX boxes, I do wind up using tarball.pl quite a bit.
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5
6 $dirName = $ARGV[0];
7
8 system "tar cvf - ${dirName} | gzip -f -c > ${dirName}.tar.gz";
Listing 4.1.8 for code_untested/tarball-1.pl
This is a very simple implementation, but not very clean. The main thing is that our string contains shell metacharacters (the | and the >), so we're invoking a shell. By now, I shouldn't have to tell you how much I dislike invoking a shell. But now, it's a little trickier, and it actually took me about three days to figure out how to do this, and it's still not quite polished.
Now, my gut instinct would be to fork/exec a process
and send everything to that. But we cannot simply say:
exec( "tar", "cvf", "-", $dirName, "|", "gzip", "-f", "-c", ">", "$dirName.tar.gz");
The problem is that those | and > in there are shell metacharacters, and in the list context Perl never hands anything to a shell. So tar would just receive "|", "gzip", ">" and the rest as literal arguments and go looking for files with those names. There's got to be an additional trick here.
The trick here is to redirect the STDOUT, and there is a
description of how to do this in
Programming Perl
[WALL00]
in the description of open.
The strategy here will be to invoke the two processes, and make them talk to each other. Easier said than done. I basically work backwards, redirecting STDOUT for each process.
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5 # Perl script tarball-2.pl
6
7 # make a gzipped tar file out of a directory
8
9 $| = 1;
10
11 $inFile = $ARGV[0];
12 $outFile = $inFile . '.tar.gz';
13
14 if ($gzippid = open(PIPE_TO_GZIP, "|-")) {
15 # parent
16 } else {
17 # child
18 open(STDOUT,">$outFile");
19 binmode(STDOUT);
20 select(STDOUT);
21 $| = 1;
22 exec('gzip','-f','-c');
23 }
24 binmode(PIPE_TO_GZIP);
25
26 open(SAVEOUT,">&STDOUT");
27 open(STDOUT,">&PIPE_TO_GZIP");
28 binmode(STDOUT);
29 select(STDOUT);
30 $| = 1;
31 system('tar','cvf','-',$inFile);
32 open(STDOUT,">&SAVEOUT");
33 print "Now I'm showing off STDOUT being restored...\n";
Listing 4.1.9 for code_untested/tarball-2.pl
Again, working backwards, we first start in line 14, opening a pipe to our gzip process. Normally with gzip, you would say:
gzip filename.tar
but that's when you're passing in the name of a file. In our case, we're actually receiving our file piped into STDIN (of the gzip). In this context, gzip will want to pipe out to its STDOUT.
We are activating the gzip inside a child process though, so we can actually redirect the STDOUT without affecting the parent process (the main program). In this case (line 18), I redirect the STDOUT of the process to the output file. So any output that comes from this child process will write to the output file. Then in line 22, I transfer control to the gzip. I do not give it an input file, so it is just waiting for something to come in through STDIN.
The business in lines 19-21 and 28-30, I don't know if these are really necessary. But since I just redirected STDOUT, I thought it would be safe. $| just tells Perl not to buffer its output.
So now we have an open filehandle, PIPE_TO_GZIP that is waiting for some input.
Next, I get ready to do the tar. In this form of tar (line 31), we can output the tar file to STDOUT rather than to a file. (The key there is passing in the "-" as an argument.) And using the same technique as the gzip above, we redirect the STDOUT in line 27. In this case, we will invoke a process that will spew output to a file handle, and conveniently enough, we also have a filehandle(PIPE_TO_GZIP) just waiting for input. So in line 27, we get ready to send all of our STDOUT to the PIPE_TO_GZIP.
Though it's unnecessary, just for grins, I was playing around
with saving the STDOUT filehandle and restoring it (lines 26 and 32, as
described in Programming Perl
[WALL00]
). In this case, it's unnecessary because after the tar is done, the program
exits. But I just wanted to have that code sitting around someplace,
because you never know when you might want to do that.
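Since that save-and-restore trick is the part worth keeping around, here is a small standalone sketch of it. It uses lexical filehandles and the 3-arg form of open, so it assumes a newer Perl than the listing above does. The helper name with_stdout_to and the temporary file are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run $code with STDOUT pointed at $path, then restore it.
sub with_stdout_to {
    my ($path, $code) = @_;
    open(my $saved, '>&', \*STDOUT) or die "Can't save STDOUT: $!";
    open(STDOUT, '>', $path)        or die "Can't redirect STDOUT: $!";
    $code->();
    close(STDOUT)                   or die "Can't close $path: $!";
    open(STDOUT, '>&', $saved)      or die "Can't restore STDOUT: $!";
    close($saved);
    return;
}

my ($tmp, $tmpName) = tempfile(UNLINK => 1);
close($tmp);

with_stdout_to($tmpName, sub { print "this line goes to the file\n" });
print "STDOUT is back\n";
```

Anything started with system or exec while STDOUT is redirected inherits the redirected handle, which is what the tar step in the listing relies on.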
Now, observant people might have noticed that after this
extremely long explanation about how to avoid system,
I used one right there in line 31. Well, you may also recall I said it was
silly to do all this extra stuff for a process as simple as that. Also,
now that I look it over, I'm very sloppy about the $gzippid in line 14. I
should really check to see if it is defined first.
Honestly, that's the version that I've been using in production now for months, but if you're really picky, we can clean it up a little bit.
By the way, notice that in this case, we are piping out to gzip with "|-", whereas in our previous example, we were reading a file through gzip using "-|".
1 #!/bin/sh
2 #! -*- perl -*-
3 eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
4 if 0;
5 # Perl script tarball.pl
6
7 # make a gzipped tar file out of a directory
8
9 $| = 1;
10
11 $inFile = $ARGV[0];
12 $outFile = $inFile . '.tar.gz';
13
14 if (not defined($gzippid = open(PIPE_TO_GZIP, "|-"))) {
15 die "Can't fork: $!\n";
16 } elsif ($gzippid == 0) {
17 # child
18 open(STDOUT,">$outFile");
19 binmode(STDOUT);
20 select(STDOUT);
21 $| = 1;
22 exec('gzip','-f','-c');
23 }
24 binmode(PIPE_TO_GZIP);
25
26 open(SAVEOUT,">&STDOUT");
27 open(STDOUT,">&PIPE_TO_GZIP");
28 binmode(STDOUT);
29 select(STDOUT);
30 $| = 1;
31
32 if (not defined($tarpid=fork)) {
33 die "Can't fork: $!\n";
34 } elsif ($tarpid == 0) { # child process
35 exec("tar", "cvf", "-", $inFile);
36 }
37
38 # wait for the tar to complete and return status (like system does)
39 waitpid($tarpid, 0);
40 $status = $?;
41
42 open(STDOUT,">&SAVEOUT");
43 print "Now I'm showing off STDOUT being restored...\n";
Listing 4.1.10 for code_untested/tarball.pl
So hopefully, you've found that a little helpful in understanding
how processes work in Perl (and UNIX, for that matter). Though this example
is a very simple one, hopefully you can see how you could generalize it
to string an arbitrary number of processes together. Or perhaps you could even
generalize the open section to take in a filehandle to redirect
to (see the fork section for saving or redirecting handles), give it an
array or array reference to pass into the exec, and make your
own generalized system/backtick function to give yourself the
best of both worlds: the ability to use array context to avoid the shell
and the ability to read the output of the process.
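As a rough starting point for that kind of generalization, here is a sketch that connects two commands with a pipe and no shell. It is not the tarball script itself: the name run_pipeline is hypothetical, it uses pipe and lexical filehandles (so it assumes a newer Perl on a UNIX-like system), and the two commands in the usage lines are just perl one-liners standing in for tar and gzip.

```perl
use strict;
use warnings;

$| = 1;    # flush output immediately so buffered text is not duplicated by fork

# Hypothetical helper: run "producer | consumer" without a shell.
# Each argument is an array reference holding a command and its arguments.
# Returns the two wait statuses (producer first).
sub run_pipeline {
    my ($producer, $consumer) = @_;
    pipe(my $reader, my $writer) or die "Can't create pipe: $!\n";

    my $pid1 = fork();
    die "Can't fork: $!\n" unless defined $pid1;
    if ($pid1 == 0) {
        # First child: send STDOUT into the pipe, then become the producer.
        close($reader);
        open(STDOUT, '>&', $writer) or die "Can't redirect STDOUT: $!\n";
        exec { $producer->[0] } @$producer or warn "exec failed: $!\n";
        exit 127;
    }

    my $pid2 = fork();
    die "Can't fork: $!\n" unless defined $pid2;
    if ($pid2 == 0) {
        # Second child: take STDIN from the pipe, then become the consumer.
        close($writer);
        open(STDIN, '<&', $reader) or die "Can't redirect STDIN: $!\n";
        exec { $consumer->[0] } @$consumer or warn "exec failed: $!\n";
        exit 127;
    }

    # Parent: close both ends, or the consumer never sees end-of-file.
    close($reader);
    close($writer);

    waitpid($pid1, 0);
    my $status1 = $?;
    waitpid($pid2, 0);
    my $status2 = $?;
    return ($status1, $status2);
}

my @statuses = run_pipeline(
    [ $^X, '-e', 'print "a\nb\nc\n"' ],                   # stand-in for tar
    [ $^X, '-e', 'my @l = <STDIN>; exit scalar(@l)' ],    # stand-in for gzip
);
print "consumer saw ", $statuses[1] >> 8, " lines\n";
```

Sending the consumer's STDOUT to a file, as the tarball script does, would be one more open(STDOUT, ...) in the second child before the exec.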