DISCLAIMER: THESE PAGES ARE STILL UNDER CONSTRUCTION. NO CODE EXAMPLE HAS BEEN TESTED YET.

Perl - my idiosyncrasies

Scoping and objects



You'll rapidly discover that I have radically different ideas about what makes software readable, usable, and debuggable than a lot of the rest of the world. Still, what I'm trying to share here are my recommendations, and what has worked for me, not the rest of the world. My differences mostly stem from my philosophy: Expose the implementation and make things hackable. You'll hear the phrase "Object Oriented Programming" (OOP) tossed around a lot. I don't like this methodology, though a lot of the rest of the world does. The idea behind OOP (and you can see this a lot in C++ classes) is that everything should be a black box. When presented with a class, you will be given the methods the programmer intended for you to use to access what they think you should access, and you're not supposed to question (or care) how they implemented it under the hood.

I have found that this just doesn't work for me. I like knowing what is under the hood, because it helps me decide on the outside what the best way for me to call the routines is. It also helps me make an informed decision as to whether or not the class is right for my implementation. My problem with C++ is that hierarchies, virtual functions, and overloading all support the idea of hiding functionality and burying the implementation under a mess of "good design." In C, there is usually a pretty direct correlation between what you write and what is happening at the processor level. In C++, there are more layers of abstraction that you have to think through.

My feeling is that if I am programming, I should know how I want to implement everything, but I might not have the time or desire to program it. So I may want to buy an off-the-shelf toolkit. If it is open, then I can check to see if they implemented it the way I think they should implement it. A toolkit should serve you and provide shortcuts that save you time. It should not dictate your workflow or methodologies.

But many programmers feel that when they are creating an object/class, they know better than the entire rest of the world for all of eternity, and that they should close their design and limit the way others use their software. Programming is such a thankless job most of the time, and I guess they need to seize their moments/power when they can get it.

In C++, there is a tendency to want to use overloading to make everything look the same, like it is straight C++, even though there is a lot of special stuff happening under the hood. I dislike this practice because I think it makes it harder to debug code. As I'm reading code and I see a + in something written by an OOP-er, I don't know for sure if it really means +, or whether this is overloaded. Or worse, a hierarchy or virtual function for overloading the +...

So what I'm saying is that I know that I sometimes have radical views in this area, and you may disagree with me. In fact, you'll find many programmers who disagree with me.


Using my to scope in codeblocks

I use my all over the place. In examples on other pages, you've probably seen my use of it, even though I haven't explicitly said anything. my (and the now-hardly-ever-used-for-anything-but-filehandles local) are ways to scope variables.

I might address the differences between my and local later, but both Programming Perl [WALL00] and Advanced Perl Programming [SRIM97] discuss this. Consider the subtle differences to be an advanced topic.

Using my tells Perl that you are only using a variable in a limited scope. That is, you're only going to use a variable inside a certain subroutine or inside a certain for-loop(like an iterator variable) or something. Let's say you were in the main part of your program and had a variable $filename, but inside your foreach loop, you also have a $filename. The most recent my will take precedence:

	$filename = $defaultFilename;
	foreach (@filenames) {
		my $filename = $_;
		# $filename will hold the name of every individual filename
		# in the list.  But since it is scoped inside the foreach,
		# it will not affect the global $filename outside of this
		# loop.
		print $filename,"\n";
	}
	# But here, we've exited the scope of the foreach, so $filename is
	# the global $filename again, which we had set to the $defaultFilename
	print $filename,"\n";
By contrast, if we had not used my:
	$filename = $defaultFilename;
	foreach $_ (@filenames) {
		$filename = $_;
		# $filename will hold the name of every individual filename
		# in the list.  But this sets the global.
		print $filename,"\n";
	}
	# Here, we've exited the scope of the foreach, but since we didn't
	# "my" the $filename inside the foreach, we've been setting the
	# global filename all along.  So unless we had an empty list of
	# @filenames, the default has been overridden.
	print $filename,"\n";
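The two listings above can be combined into one runnable sketch. The sample values and the use of our to declare the globals under use strict are assumptions made for this illustration, not part of the original listings.

```perl
use strict;
use warnings;

# Hypothetical sample data; the names mirror the listings above.
our $defaultFilename = 'default.txt';
our @filenames       = ('a.txt', 'b.txt');
our $filename        = $defaultFilename;

foreach (@filenames) {
    # Lexical copy: hides the global $filename inside this block only.
    my $filename = $_;
    print "inside my loop:   $filename\n";
}
# The global was never touched by the loop above.
print "after my loop:    $filename\n";

foreach (@filenames) {
    # No my here, so this assigns to the global.
    $filename = $_;
}
# The global now holds the last element of @filenames.
print "after plain loop: $filename\n";
```

After the first loop the global still holds the default; after the second it holds the last filename in the list.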
I like to think of using my as declaring something to be a temporary variable. In fact, my general rule for using my is:
Don't use my unless you can clearly see the point in the code where you can undef the variable before entering a new subroutine/execution scope unless you are already inside a subroutine/execution scope.
Actually, that's a little too general (and there is a notable exception for custom sort, grep, and map routines...). Maybe I should just say:
Only use my inside code blocks, but not for member variables of a class.
because that's what I really mean.

Member variables of a class? What am I talking about? Typically, in Perl, you implement your objects/classes (C++ lingo) in separate files, and you use package to designate the class name. I use this all the time for my modules. It's actually very convenient to split up the namespace. In C++, you could designate variables inside a class (OOP-ers like to call these "member variables") and you can declare them to be "public" or "private" (or "protected"). When you declare something as "private" in C++, you are restricting access to this variable so that only functions declared in that class can access it. Why would someone do this? So they can feel comforted that everyone is seeing the world their way, and so they are guaranteed that no one on the outside is setting something in their routine that could screw up one of their modules. I personally think this should just be documented, and whoever's using the code should have the option to use it however they want, whether or not it fits into the original programmer's paradigm. It is much friendlier to simply document and suggest that someone leave a variable alone, rather than mandate that someone leave a variable alone. If you had a reason for wanting the convenience of access to that variable outside of a single subroutine, it's possible that an outside user could want that variable too.

Some of my concerns were alleviated when I read a trick somewhere (I don't remember where):

	// This is just a quick diversion to C++
	#define private public
	#define protected public

That all being said, if you're creating an object/class in Perl, you can use my inside the file and that will scope the variable to the file. This means that if you call the variable outside of the module, it will look undefined to the outside program, but all the subroutines inside of the file can treat it like a global variable - i.e. they do not need to declare the variable inside of the subroutine itself.

I very passionately disapprove of this practice that effectively makes "private variables" (See my author bio), so I'm not going to give any explicit examples anywhere on these pages about how to do this. I will, however, mention that Programming Perl and The Perl Cookbook both discuss this, and I think Advanced Perl Programming probably does too. There are many resources you can go to if it is important to you to have a private variable.

My recommendation though is:

Don't use private member variables.

Don't use functions implicitly. Always declare scope.

Often, a module will export symbols into your code. That is, when you use or require standard modules, sometimes you get new functions within your current scope. In fact, you can make your own modules that do this with the Exporter module.

For example, in the Cwd module, you get the cwd symbol. Even the Programming Perl [WALL00] reference suggests you do the following:

	use Cwd;

	$currWorkDir = cwd;
Now that is on the advice of the big guns (Larry Wall, Tom Christiansen) and I have a lot of respect for them. Still, I don't like to do things this way. Cwd is a standard module, but it is not intrinsic Perl. I have another page against barewords. So I think this should be clarified to be the function that it really is:
	use Cwd;

	$currWorkDir = &cwd();
But that's not quite enough for me. cwd is actually defined in another scope, though the cwd symbol is exported and accessible. I prefer to make this explicit though to make it clear that this is not part of the intrinsic Perl language, and that the details of this function are not in this scope. So I would rather say:
	#!/bin/sh
	#! -*- perl -*-
	eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
	 if 0;

	use Cwd;
	$currWorkDir = &Cwd::cwd();

Listing 3.4.1 for code_untested/useCwd.pl
This exposes the fact that it is a subroutine that is not part of the native Perl language. As you get more experienced, you may start thinking "Duh. I know the Perl language and of course the cwd came from the Cwd module. Why do I care how easy it is to trace?" If you hand off your code to someone else, they might not have your level of experience, and something like cwd looks very much like it could be part of the intrinsic language, so they may be tempted to look for it in the functions section of the Perl reference. Or worse, if they are unfamiliar with it, they may think you just have a bare word, or have some call to the UNIX prompt (like backticks) that they are unfamiliar with.
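As a minimal sketch of taking this one step further: loading the module with an empty import list stops cwd from being exported at all, so the fully scoped name is the only way to reach it. The empty-list form of use is standard Perl; the check with defined is only there for illustration.

```perl
use strict;
use warnings;

# Empty import list: load Cwd but export nothing into this package.
use Cwd ();

# The fully scoped call works whether or not anything was exported.
my $currWorkDir = &Cwd::cwd();
print "$currWorkDir\n";

# Illustration only: confirm that no cwd symbol landed in package main.
if (defined(&main::cwd)) {
    print "cwd was imported into main\n";
} else {
    print "cwd was not imported into main\n";
}
```

With this form, a bare cwd can never creep into the code by accident, because the symbol is not available in the current scope.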


Don't use method functions implicitly. Always declare scope.

Perl does have plenty of object oriented functionality as of Perl 5. I don't necessarily think this is a good thing. I take advantage of some of it, but I intentionally avoid a lot of it because I think it makes code harder to read.

Randal Schwartz has a couple articles about how to make hierarchies in an object oriented way in Linux Magazine [SCHW00-4] [SCHW00-5] . He has pretty good coverage of it there. Certainly better than I can do (or want to do). Don't get me wrong. As you read my pages, it should be clear just from the number of citations to his work that I have a lot of respect for Mr. Schwartz.

But here, I just happen to disagree about the use of object oriented stuff. I'm not going to reiterate Schwartz's examples. Read his articles for that. But what I will say is that the Object Oriented methodology with hierarchies allows you to subclass modules off each other. Schwartz's example deals with barnyard animals, but I'll make an example from Maya and have a whole lot of dependency nodes. Of these dependency nodes, we can have operators or DAG(Directed Acyclic Graph) nodes. An example of an operator might be the actual operation of a lattice deformation or inverse kinematics solver. An example of DAG nodes might be groups/transforms or geometry - basically things that you would see in the DAG version of the hypergraph. The following shows the hierarchy of how these nodes are subclassed from each other. This is not a scene(DAG) graph in itself. Each branch just indicates that one node inherits all of the properties of its parent.

                    DependencyNode
                   /              \
          Operator                  DAGNode
         /        \           /     |      \       \
Deformation    Solvers   Geometry  Camera  Light    Group
     |                    /    \           /   \
  Lattice             NURBS  PolyMesh   Spot   Distant

Of course, these aren't all the nodes available in Maya, and I don't think they have "Operator", but this is enough to just have an example. Now, all of the Nodes listed above are Dependency Nodes, whether it be a group node, a lattice, or a NURBS patch.

By the way, you might be laughing at me for the above diagram. See the "How is this?" section of the introduction for an explanation of why I choose to do ascii diagrams most of the time.

Now, the DependencyNode might have a subroutine/method to do something like 'getName' or 'setName' since every node should have a name. All lights might have a color and intensity with get/set methods (which I'm also opposed to, but I'll use for this example).

Now if you import, export, and @ISA things appropriately, you can actually call getName directly from a Spot light or a Group node. You may be able to get something like a getIntensity directly from a Distant light, and you may have a Spot light that has information about barn doors or something.
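For readers who have not seen it, here is a minimal sketch of what that inherited lookup looks like. The package bodies, the hash fields, and the sample values are assumptions made up for this illustration; only the package and method names come from the discussion above.

```perl
use strict;
use warnings;

package DependencyNode;
# Hypothetical implementation: the name lives in a hash field.
sub getName { my $r_node = shift; return $r_node->{name}; }

package DAGNode;
our @ISA = ('DependencyNode');

package Light;
our @ISA = ('DAGNode');
# Hypothetical implementation: so does the intensity.
sub getIntensity { my $r_node = shift; return $r_node->{intensity}; }

package Distant;
our @ISA = ('Light');

package main;

my $r_light = bless({ name => 'sun', intensity => 0.8 }, 'Distant');

# Neither method is defined in Distant. Perl walks @ISA upward until it
# finds one: getName in DependencyNode, getIntensity in Light.
print $r_light->getName(), "\n";
print $r_light->getIntensity(), "\n";
```

Nothing in package Distant says where either method lives, which is exactly the property the following paragraphs object to.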

But I don't like to do that. You should sense a theme here from me that I like to have things mapped out explicitly. When you're reading the Distant module, and you see getName, you may want to read the code for it and figure out if that's the source of your problems. But if the inheritance is done properly, the getName could be defined in any of Distant, Light, DAGNode, and DependencyNode. The trouble is that as you are tracing through it, you probably don't have the full map above. You generally only know the immediate parent. So you have to go one module at a time. And each stage may or may not define the function. This means that if you don't find the function, you look for the parents and keep searching there. But there's always that little bit of insecurity. If you do find the function, of course you're happy. But if you don't find the function, you may be unsure if the function really wasn't there, or if you just didn't look hard enough.

For this reason, I think it is important to explicitly define all subroutines/methods that a given package uses, even if you could have gotten it for free. You may still need to trace through each one, but you get explicit directions where to look. Also, you can bypass the hierarchy and just jump to the base class. So in this case, in Distant, I would just say:

	package Distant;

	use DependencyNode;

	sub getName {
		return &DependencyNode::getName(@_);
	}
This will call the base class method for getName and pass in all of the arguments that were passed in at this level. With the return, it will just return exactly what the base class returns.

An alternative that I also consider is closer to the object oriented method, but still has it explicit:

	package Distant;
	use Light;

	sub getName {
		return &Light::getName(@_);
	}
and
	package Light;
	use DAGNode;

	sub getName {
		return &DAGNode::getName(@_);
	}
and of course,
	package DAGNode;
	use DependencyNode;

	sub getName {
		return &DependencyNode::getName(@_);
	}
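The chained variant can be tried in a single file. This is only a sketch: the body of DependencyNode::getName and the hash layout of the node are assumptions, since the original listings do not show them.

```perl
use strict;
use warnings;

package DependencyNode;
# Hypothetical base implementation: the name lives in a hash field.
sub getName {
    my $r_node = shift;
    return $r_node->{name};
}

package DAGNode;
# Explicit forwarding, no @ISA anywhere in this file.
sub getName { return &DependencyNode::getName(@_); }

package Light;
sub getName { return &DAGNode::getName(@_); }

package Distant;
sub getName { return &Light::getName(@_); }

package main;

my $r_light = bless({ name => 'keyLight' }, 'Distant');

# Each hop names the next package to read, so the trace is
# Distant -> Light -> DAGNode -> DependencyNode.
print &Distant::getName($r_light), "\n";
```

Because nothing is inherited here, a routine that is not forwarded explicitly is simply absent from the package, so a missing routine shows up as an error rather than resolving somewhere up the tree.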

Don't use the method call (->). Declare scope.

When you are creating an "object" in Perl, you will make some subroutines within the scope of that object/package. For instance, you might have something like:

	package DependencyNode;

	sub getFieldValue {
		my $r_dependencyNode	= shift;
		my $fieldName		= shift;

		if (not exists($r_dependencyNode->{$fieldName})) {
			return;
		}
		# Yes, I really do mean to dereference here.  I found that
		# when doing scene graph stuff, it's good to be able to name
		# your data types, which is done with "bless."  But "bless"
		# requires a pointer, so if my field values are really named
		# (blessed) references, I should deref here for simplicity.

		return ${$r_dependencyNode->{$fieldName}};
	}
Now the details of that really aren't important; I'm just saying I have a package/object named DependencyNode with a subroutine, getFieldValue. If I were to call this routine from a main program, I would say:
	use DependencyNode;
	# pretend I already have a reference to a dependency node...

	# I like this way
	my $fieldValue = &DependencyNode::getFieldValue($r_depNode, 'tx');
That is, explicitly use & to designate the subroutine call, and use :: to indicate that the function getFieldValue is really in the scope of DependencyNode.

Now, I should probably mention that there is a C++ like way of calling this routine that looks like a shorthand. Provided $r_depNode has been blessed into the DependencyNode package, this is equivalent to the above call:

	use DependencyNode;
	# pretend I already have a reference to a dependency node...

	# I don't like this way
	my $fieldValue = $r_depNode->getFieldValue('tx');
Some people like this because it feels more "object oriented" to call the function as a property of the variable itself. Note that if you use this form, it implicitly passes in the variable (in this case, $r_depNode) itself as the first argument. So from getFieldValue's point of view, it sees the arguments $r_depNode and 'tx' in @_. By now, you should know how little respect I have for trying to make something more "object oriented." There are three reasons why I do not like to access package methods this way:
  1. There is no & character, so there is nothing to clearly indicate that there is a subroutine being called here. At least, it is harder to determine since the -> operator is used for dereferencing array ref elements and hash ref elements.
  2. There is no indication of which package this came from. You might use several modules, and the use declarations are usually near the top of your code. Down here, you won't have a clear indication of where to look when your fieldValue isn't right, and you're trying to wade through this code to debug it.
  3. -> is the same symbol I use for hash and array element dereferencing. (more on that later.)
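The two call forms can be compared side by side in a small sketch. The node built here is hypothetical: the field name 'tx' comes from the listings above, but the value and the type name 'Float' used with bless are made up for the illustration.

```perl
use strict;
use warnings;

package DependencyNode;

sub getFieldValue {
    my $r_dependencyNode = shift;
    my $fieldName        = shift;

    return if not exists($r_dependencyNode->{$fieldName});
    # Field values are blessed scalar references, so dereference here.
    return ${ $r_dependencyNode->{$fieldName} };
}

package main;

# Hypothetical node with one field; 'Float' is an invented type name.
my $tx        = 2.5;
my $r_depNode = bless({ tx => bless(\$tx, 'Float') }, 'DependencyNode');

# Explicitly scoped call: the package and the & are both visible.
my $explicit = &DependencyNode::getFieldValue($r_depNode, 'tx');

# Method call: $r_depNode is passed implicitly as the first argument,
# and the package is found from what $r_depNode was blessed into.
my $implicit = $r_depNode->getFieldValue('tx');

print "$explicit $implicit\n";
```

Both calls reach the same subroutine with the same @_, so the difference is purely in how much the calling line tells the reader.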

That being said, there is one exception to this. I kind of like using this form for a class constructor, but nothing else. This is especially true if the class constructor is named new. Of course there is more to it than this. In this form of calling the method, suppose that you do not already have an object created (like a constructor). We can still call the method with this form by using the name of the package itself:

	use DependencyNode;
	# here I do not have a reference to a dependency node yet...

	# This is the one place I tolerate this form
	my $r_depNode = DependencyNode->new();

This still passes in something implicitly as the first argument. But since we didn't invoke it with an object (we used the name of the class itself), it will pass in the name of the package as a plain string, 'DependencyNode' in this case. You can actually take advantage of this by testing whether the first argument is a reference (with ref) to decide if you have a copy or a constructor... (I know some of the Perl books discuss this.) The alternatives are:
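A minimal sketch of the call forms, assuming a hypothetical constructor named new that the original pages do not show. It illustrates what the method call passes implicitly, the explicitly scoped equivalent, and the ref test for telling a copy from a fresh construction.

```perl
use strict;
use warnings;

package DependencyNode;

# Hypothetical constructor. The first argument is whatever stood to the
# left of ->: a package name (plain string) or an existing object.
sub new {
    my $proto  = shift;
    my $class  = ref($proto) || $proto;   # object: its package; string: itself
    my $r_node = {};

    if (ref($proto)) {
        # Invoked on an existing object: treat it as a shallow copy.
        %{$r_node} = %{$proto};
    }
    return bless($r_node, $class);
}

package main;

# Method-call form: 'DependencyNode' is passed implicitly as $_[0].
my $r_first = DependencyNode->new();

# Explicitly scoped form: the package name has to be passed by hand.
my $r_second = &DependencyNode::new('DependencyNode');

# Invoked on an object: ref($proto) is true, so the fields are copied.
$r_first->{tx} = 5;
my $r_copy = &DependencyNode::new($r_first);

print ref($r_first), " ", ref($r_second), " ", $r_copy->{tx}, "\n";
```

The explicitly scoped form repeats the package name, which is the price of spelling out what the arrow does silently.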

In short, it is my recommendation to always use the explicitly scoped subroutine call with the &. (But it's forgivable to use -> for constructors.)


© 2001 Steve Hwan, hostname: @pacbell.net, username: svhwan
You should probably use the word "PERL" in the subject line to get my attention.
Last Modified: Mon Apr 23 08:48:25 2001