DISCLAIMER: THESE PAGES ARE STILL UNDER CONSTRUCTION. NO CODE EXAMPLE HAS BEEN TESTED YET.

Perl - my idiosyncrasies

Be more explicit than you have to - variables, scalar context, using return, and precedence



One of the great things about Perl is that it is a hacker's license. It evolved from one programmer's need to get work done, and it just grew. It's designed for people who like to just get things done. And to that end, there are a lot of shortcuts you can take and a lot of things you can omit.

I think that a lot of the shortcuts make code more difficult to read. So I tend to go out of my way to be explicit and to clarify what's going on. I've been programming long enough that I've had to go back to code that I wrote a long time ago, and I've gotten confused by it. So I've realized the importance of clarity in coding.

That being said, I'm a hacker at heart, so I'd say: experiment a little bit. Figure out which shortcuts work for you, and which don't. Different people have different degrees of hacker in them, and different ideas of what makes code readable.

These are not meant to be rules, or God forbid, standards of coding practices. But these are just an outline of what has worked for me. And if you're trying to read my code, this is an outline of what to expect.

Name your iterator - foreach

In Perl, you don't have to name your iterator, but I believe that you always should. Let's start with an example I don't like:

EX 3.1.1: Binding a material to a patch

	#!/bin/sh
	#! -*- perl -*-
	eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
	 if 0;

	# Assume $material was already defined

	$materialExp = $materialExpHash{$material};

	foreach (@patches) {
		next if (! /$materialExp$/);
		&bindMaterial($_, $material);
	}

Listing 3.1.1 for code_untested/bindMaterial-1.pl
I don't like this because it uses the $_ variable a lot. Don't get me wrong. I'm not saying to ban this special variable. I use it myself. However, many beginners are a little put off when they see $_, roll their eyes, and complain about how incomprehensible Perl is. Well, it doesn't have to be.

In the above example, we skip every patch whose name does not end with the expression stored for the material. Now, in our regular expression test, we just have a regular expression comparison sitting there. In this case, since we don't define an iterating variable, the name of each patch is stored in the special variable $_. Many Perl commands will operate on $_ if no other variable is specified. This includes the regular expression match operator. Finally, we send $_ to bindMaterial. While this is done explicitly, it still puts people off to just see the $_.

Perl allows you to name the iterating variable. A foreach variable is always local to its loop, but I still declare it with my, which makes it a lexical variable that is visible only inside the loop. As long as I'm explicitly declaring my iterators, I'll also iterate over the materials:

	#!/bin/sh
	#! -*- perl -*-
	eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
	 if 0;

	foreach my $material (@materials) {
		my $materialExp = $materialExpHash{$material};
		foreach my $patch (@patches) {
			if ($patch =~ /$materialExp$/) {
				&bindMaterial($patch, $material);
			}
		}
	}

Listing 3.1.2 for code_untested/bindMaterial-2.pl
Here, we've given the name $patch to the iterator variable. I think this looks more readable. Also, note that since we defined an iterator variable, we need to use it explicitly in the regular expression test. I think this is a good thing. Since we're using nested loops, I think it is especially good to declare the iterating variable.
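
If you want to convince yourself that a named iterator stays inside its loop, here is a minimal sketch. The patch names and the outer $patch variable are made up for this illustration; they are not part of the listings above.

```perl
use strict;
use warnings;

# Hypothetical sample data, invented for this sketch.
my @patches = ('L_arm_skin', 'R_arm_skin', 'L_eye_glass');
my $patch   = 'outer value';

my @seen;
foreach my $patch (@patches) {
	# Inside the loop, $patch is the iterator, not the outer variable.
	push @seen, $patch;
}

# The outer $patch was never touched by the loop.
print "after the loop: $patch\n";
print "visited: @seen\n";
```

The loop walks all three names, and the outer $patch still holds its original string afterwards, so a named iterator cannot leak into the surrounding code.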


Name your iterator - while

As you read from a file handle, it is normal to do something like:

EX 3.1.2: Finding the named coordinate systems in a RIB file

	#!/bin/sh
	#! -*- perl -*-
	eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
	 if 0;

	# Suppose RIBFILE names coordinate systems by using
	# CoordinateSystem "coordSysName"

	$ribFileName = $ARGV[0];
	@namedCoordinateSystems = ();

	open(RIBFILE, $ribFileName) or die "can't open $ribFileName!";
	while (<RIBFILE>) {
		next if (! /^CoordinateSystem/);
		my(@tokens) = split(/\s+/);
		push @namedCoordinateSystems, $tokens[1];
	}
	close(RIBFILE);

	print "I have the named coord systems [@namedCoordinateSystems]\n";

Listing 3.1.3 for code_untested/getRibCoords-1.pl

Again, we are overusing $_, mostly implicitly. When a read from the filehandle RIBFILE is the only thing in the condition of the while, Perl stores the current line in the variable $_. Also, the regular expression and the split will operate on $_ by default. I don't like having this extra layer of stuff to think about. I like to name this variable just to remind myself what I was doing with the current line.

	#!/bin/sh
	#! -*- perl -*-
	eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
	 if 0;

	# Suppose RIBFILE names coordinate systems by using
	# CoordinateSystem "coordSysName"

	$ribFileName = $ARGV[0];
	@namedCoordinateSystems = ();

	open(RIBFILE, $ribFileName) or die "can't open $ribFileName!";
	while (<RIBFILE>) {
		my $currline = $_;

		next if ($currline !~ /^CoordinateSystem/);
		my(@tokens) = split(/\s+/, $currline);
		push @namedCoordinateSystems, $tokens[1];
	}
	close(RIBFILE);

	print "I have the named coord systems [@namedCoordinateSystems]\n";

Listing 3.1.4 for code_untested/getRibCoords-2.pl
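
Another way to name the line, if you prefer, is to declare the variable right in the while condition. Perl treats this form as a test for a defined value, so the loop still stops at end of file. This is only a sketch of that variation: the RIB text is made up, and it is read through an in-memory filehandle (my assumption, so the example needs no file on disk).

```perl
use strict;
use warnings;

# Hypothetical RIB text, invented for this sketch, read through an
# in-memory filehandle so that no file on disk is needed.
my $ribText = <<'END_RIB';
CoordinateSystem "shadowCam"
Sphere 1 -1 1 360
CoordinateSystem "projector"
END_RIB

my @namedCoordinateSystems = ();

open(my $ribFile, '<', \$ribText) or die "can't open in-memory RIB: $!";
while (my $currline = <$ribFile>) {
	# Skip every line that does not name a coordinate system.
	next if ($currline !~ /^CoordinateSystem/);
	my @tokens = split(/\s+/, $currline);
	push @namedCoordinateSystems, $tokens[1];
}
close($ribFile);

print "I have the named coord systems [@namedCoordinateSystems]\n";
```

Note that the second token still carries its double quotes, exactly as it would in the listings above. Stripping them would be a separate step.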


Explicitly say scalar

When dealing with an array, you can get the number of elements by evaluating the array in scalar context. That is, we could say:

	while (@remainingActions) {
		my $currAction = shift @remainingActions;

		# do something with the action
		#
	}
Inside the while, the array gets implicitly evaluated as a scalar. When we shift off the last action, there will be zero elements left in the array, so we hit the termination condition of the while loop. This is a compact, valid way to do this. Personally, I like bringing attention to the fact that there is a conversion to a scalar, so I do it explicitly. Also, I like to make it explicit that the terminating condition of the loop is that there are 0 elements left. So you will see that a lot of my while loops look like:
	while (scalar(@remainingActions)>0) {
		my $currAction = shift @remainingActions;

		# do something with the action
		#
	}
This chapter isn't called "My Idiosyncrasies" for nothing.
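
To see why I like spelling out the context, here is a small sketch. The action names are invented for the example. The same array gives a count or its first element depending only on how it is assigned, and scalar() removes the guesswork.

```perl
use strict;
use warnings;

# Hypothetical action names, invented for this sketch.
my @remainingActions = ('load', 'bind', 'render');

my $implicitCount = @remainingActions;          # scalar context: element count
my ($firstAction) = @remainingActions;          # list context: first element
my $explicitCount = scalar(@remainingActions);  # same count, spelled out

print "implicit count: $implicitCount\n";
print "first action:   $firstAction\n";
print "explicit count: $explicitCount\n";

my @done;
while (scalar(@remainingActions) > 0) {
	my $currAction = shift @remainingActions;
	push @done, $currAction;
}
print "done: @done\n";
```

The only difference between the first two assignments is a pair of parentheses, which is exactly the kind of thing I miss when I reread old code.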

Explicitly say return

In Perl, you don't really need to explicitly return out of a subroutine. Often I forget to. But I don't like myself for that. When possible, I try to use the return statement. If you forget the return, then the subroutine will just return the value of the last expression evaluated in the code block, which is not always desirable. When you use:

	return;
this will return an undef in scalar context and the empty list in list context, so you can actually use this to trap for errors. In The Perl Cookbook [CHRI98], they bring up the good point that if your subroutine is returning an array, then you don't want to say return undef. The bare return above already does the right thing, or you can spell the empty list out:
	return ();
If you're wondering why, the Cookbook explains this. But basically, you are probably going to receive the return value into an array. However, if you return undef, you hand back a list holding one value. An undef is a valid array element. So you will actually get an array with one element, namely undef. Generally, it will be more productive to use the empty list as an error or "no data" condition than a 1-element array with an undef.

I just prefer not to implicitly return values either. I think it makes it easier to read the code if I can go back to the subroutine and just look for the return.
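
Here is a short sketch of the three ways a subroutine might report "no data" to a caller that stores the result in an array. The subroutine names are made up for the illustration. A bare return and return () both give an empty array, while return undef gives an array holding one undef.

```perl
use strict;
use warnings;

# Hypothetical lookups, invented for this sketch. Each one reports
# "no data" in a different way.
sub findNothingBare  { return; }
sub findNothingList  { return (); }
sub findNothingUndef { return undef; }

my @bare  = findNothingBare();
my @list  = findNothingList();
my @undef = findNothingUndef();

print "bare return:   ", scalar(@bare),  " element(s)\n";
print "return ():     ", scalar(@list),  " element(s)\n";
print "return undef:  ", scalar(@undef), " element(s)\n";

# In scalar context the bare return still hands back undef.
my $scalarResult = findNothingBare();
print "scalar context: ", (defined($scalarResult) ? "defined" : "undef"), "\n";
```

A test like if (@undef) would be true here even though the subroutine found nothing, which is the trap to avoid.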


Parens to avoid precedence issues, and other formatting comments

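It's useful to know the order of precedence of Perl operators. You can read the man pages and see a table. But let's say you have an if block that you want to evaluate if one of 3 conditions is satisfied: the hour is less than 10 and it is pm, the guest's name is Jennifer Love Hewitt, or the cash box holds less than 200.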

We could implement this with:
	if ( $hr < 10 && $ampm eq 'pm' || $name1 eq 'Jennifer' && $name2
			eq 'Love' && $name3 eq 'Hewitt' || $cashBox < 200) {
		&admitGuest();
	}
Okay, maybe this really wasn't a production example. I just felt like using the name "Jennifer Love Hewitt" in an example.

Now, I just think that's a mess. Consult the precedence tables, and you'll see that it's okay: and (&&) binds tighter than or (||). But I can never remember which of the two has precedence, and it's hard for me to group terms together by eye. Since I can never remember precedence, and I know I won't remember the precedence rules when I debug the code later (yes, my code will usually have bugs), I prefer to see:

	if ( (($hr < 10) && ($ampm eq 'pm'))
			|| (($name1 eq 'Jennifer') && ($name2 eq 'Love')
						&& ($name3 eq 'Hewitt'))
			|| ($cashBox < 200)) {
		&admitGuest();
	}
Potayto, potahto. But I like throwing a lot of parens in there so I can keep track of what is bound to what. It doesn't hurt to spread things out with multiple lines, indents, and intelligently applied whitespace.
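
As a quick check that the parens only document what Perl already does, here is a sketch with made-up values for the hour and the cash box. The bare expression and the fully parenthesized one agree, while a different grouping gives a different answer.

```perl
use strict;
use warnings;

# Hypothetical values, invented for this sketch.
my ($hr, $ampm) = (11, 'pm');
my $cashBox     = 150;

# Bare form: Perl groups the && terms first, then applies ||.
my $bare = ($hr < 10 && $ampm eq 'pm' || $cashBox < 200) ? 1 : 0;

# The same grouping, written out with parens.
my $grouped = ((($hr < 10) && ($ampm eq 'pm')) || ($cashBox < 200)) ? 1 : 0;

# A different grouping is a different test.
my $regrouped = (($hr < 10) && (($ampm eq 'pm') || ($cashBox < 200))) ? 1 : 0;

print "bare:      $bare\n";
print "grouped:   $grouped\n";
print "regrouped: $regrouped\n";
```

With these values the first two tests pass because the cash box is low, and the regrouped one fails because the hour check now gates everything. If I had meant the third form, only the parens could say so.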

I would also make a note about multiline expressions. I personally like to start each continuation line with an operator, as a reminder that I should look back at the previous line to see the beginning of the expression. It also reminds me that the line is not starting a new statement. I should mention that this goes against the general recommendation of perlstyle. In this case (and, or), the perlstyle man page (I think it was written by Tom Christiansen) actually agrees with me. But it says that in most other cases you should break the line after the operator, leaving the operator at the end of the previous line. I personally prefer to start lines with the operators. Christiansen actually writes a lot of intelligent stuff, so you may want to consider listening to him. But everyone has their own style.

Actually, one other thing - indents. Christiansen recommends 4 space indents. I like to just hit the tab key. 4 spaces makes sense. You will be able to fit more code on a line, and you can nest deeper more easily. But if it gets to be an issue, I tend to start breaking things out into subroutines. I don't say the tab indent is a good thing. That's just what I do, and it is honestly a product of laziness (of course, Larry Wall, the inventor of Perl, names the 3 virtues of a programmer as laziness, impatience, and hubris).


Use more variables than you have to

I often will use more variables than is necessary. Because of the way perl is set up, you often don't need a lot of temporary variables. You can just plug command into command into command. For instance:
	foreach my $leftPatch (sort {$a->{'patchName'} cmp $b->{'patchName'}}
					grep($_->{'patchName'} =~ /^L_/,
						@{$scene->{'patchList'}})) {
		# do something with left patches
	}
Well, that's an awful lot going on. I tend to be rather loose about making new variables. I'm not designing operating systems. I usually write scripts that do something and get out, so quite frankly, even if I blow 100 Meg of RAM, I don't really care much. That's usually still small enough to land on a desktop machine on the queue anyway. So, with that in mind, I would tend to really write the above code as:
	my @patchList = @{$scene->{'patchList'}};
	my @leftPatches = grep( $_->{'patchName'} =~ /^L_/, @patchList);
	foreach my $leftPatch (sort
				{$a->{'patchName'} cmp $b->{'patchName'}}
				@leftPatches) {
		# do something with left patches
	}
Memory is cheap again, and so variables are too. I like making these variables even if I just use them once, because it breaks down my thought process and kind of provides self documentation in the code. Some people might even take that one step further and separate the sort criterion into a subroutine by itself:
	sub byPatchName {
		# Okay, okay, so I don't ALWAYS say 'return'.  Sort routines
		# are a notable exception.

		$a->{'patchName'} cmp $b->{'patchName'};
	}

	my @patchList = @{$scene->{'patchList'}};
	my @leftPatches = grep( $_->{'patchName'} =~ /^L_/, @patchList);
	# I don't think I'm allowed to use & down here otherwise I would.
	foreach my $leftPatch (sort byPatchName @leftPatches) {
		# do something with left patches
	}
This is a good idea, though usually I'm too lazy to do this. It's good from a code re-use standpoint too. One thing though: if you do this, be careful about where you define the subroutine. You might not get as much re-use as you hoped, especially if the sort routine uses hash values defined inside the code block you're executing the sort from. I think they call these closures, especially if the routine uses a variable lexically defined in that block. Yeah, yeah, that probably sounds like spouting gibberish. Try not to worry about it.
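
To make the pieces above concrete, here is a self-contained sketch with a tiny made-up scene. The patch names and the layout of the $scene hash are my assumptions for the illustration. Every intermediate result gets its own variable, and the sort criterion lives in a named subroutine.

```perl
use strict;
use warnings;

# Hypothetical scene data, invented for this sketch.
my $scene = {
	'patchList' => [
		{ 'patchName' => 'R_hand' },
		{ 'patchName' => 'L_leg'  },
		{ 'patchName' => 'L_arm'  },
	],
};

# Sort criterion: compare two patches by name. Perl's sort supplies
# the two patches being compared in $a and $b.
sub byPatchName {
	return $a->{'patchName'} cmp $b->{'patchName'};
}

my @patchList   = @{ $scene->{'patchList'} };
my @leftPatches = grep { $_->{'patchName'} =~ /^L_/ } @patchList;
my @sortedNames = map { $_->{'patchName'} } sort byPatchName @leftPatches;

print "left patches in order: @sortedNames\n";
```

Each line now does one job, so when something goes wrong I can print @patchList or @leftPatches on its own and see which step misbehaved.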

Using ${} inside quotes

When I expand variables inside double quotes, I like to use the curly braces. It just explicitly says to me "This is a perl expansion." This helps a lot (to me) when I stick MEL generators inside of Perl. Since I can have variables in MEL (and I usually do) that also have the form, $variableName, it helps me distinguish a perl expansion since only Perl variables can have the form ${variableName}.

Also, suppose I'm generating some texture maps that have the form, patchName_color.tx for the base color. I would say:

	$textureName = "${patchName}_color.tx";
Note that there is a problem with:
	# BAD
	$textureName = "$patchName_color.tx";
Since underscore (_) is a valid part of variable names in Perl, Perl will think the variable name is $patchName_color. Since I really wanted $patchName, and $patchName_color probably doesn't even exist, it will probably evaluate to the empty string. But using the curly braces protects us from this. So I see 2 advantages in using the curly braces: they mark a Perl expansion clearly, even in the middle of generated MEL, and they keep the characters that follow from being absorbed into the variable name. So I wind up using curly braces for variable expansion a lot. But again, not always.
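
Here is a small sketch of the difference, using a made-up patch name. I turn on use strict in it, which the snippets above do not. With strict enabled, the BAD form is not silently expanded to an empty string. It fails to compile, because $patchName_color was never declared. The string eval is only there so that the sketch can show the failure and keep running.

```perl
use strict;
use warnings;

# Hypothetical patch name, invented for this sketch.
my $patchName = 'L_arm';

# Braces end the variable name exactly where I want it to end.
my $withBraces = "${patchName}_color.tx";

# Concatenation is another way to keep the name and the suffix apart.
my $withDot = $patchName . '_color.tx';

# Under "use strict" the BAD form refers to the undeclared variable
# $patchName_color, so this string eval fails to compile.
my $badFormCompiled = eval q{ my $t = "$patchName_color.tx"; 1 };

print "$withBraces\n";
print "$withDot\n";
print "bad form compiled? ", ($badFormCompiled ? "yes" : "no"), "\n";
```
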
© 2001 Steve Hwan, hostname: @pacbell.net, username: svhwan
You should probably use the word "PERL" in the subject line to get my attention.
Last Modified: Sun Dec 2 19:00:17 2001