In case I haven't mentioned it before, read Learning Perl [SCHW97].
In Perl, fundamentally, there are scalars, arrays, hashes (associative arrays), and code. These are accessed through the following symbols:
| $foo | scalar named foo |
| @foo | array of scalars named foo |
| %foo | hash of scalars named foo |
| &foo | subroutine named foo |
Okay, technically, there are a couple more: typeglobs and IO handles.
*foo is the typeglob for foo. It contains all of the above plus IO handles, but we can discuss those later...
One of the things that I really like in Perl is that you do not need to predeclare your variables. We'll really take advantage of this later with complex data structures. But for now, just know that you can use a variable without any declaration or warning, as long as you haven't turned on use strict, which makes Perl insist on declarations. (This is unlike C and C++, where you have to declare your variables ahead of time.)
Scalars are basically integers, floats, strings, or references (pointers). Well, actually, in Perl you rarely deal with integers as a separate type, except through one of the standard modules (the integer pragma), but don't worry about that. I'll try to discuss references later. So for now, I'm just going to mention floats and strings.
Quite simply, a float is just a number, such as:
$answerLUE = 42;
$pi = 3.14159265;
$light_v = 3e10;
That last one is scientific notation for 30,000,000,000. Don't use commas in Perl numbers, though (underscores are allowed instead, as in 30_000_000_000).
A string is a series of characters, often input or output to/from a program, or a word or something. You can even read in an entire file into a single string. One of the nice things in Perl is that you don't need to predeclare the size of your string. It can be arbitrarily huge, and you are really only limited by your hardware and operating system. You can designate strings in several ways:
$director = "Cameron Crowe";
$help = "Try executing:\n\t$script $filename\n";
$codeForLater = '$help = "Try executing:\n\t$script $filename\n";';
The first is the easiest to understand. I'm simply assigning the string Cameron Crowe to the scalar variable $director.
To designate the string, you enclose it in quotes (' or "). The second line is a little more complex, and demonstrates what I think are the two most common escape sequences. These are really simple, though:
\n means "newline"
\t means "tab," usually indenting to the next 8-character block
The second line also contains the variables $script and $filename inside the double quotes ("). One of the really convenient things in Perl is that you can expand variables inside double quotes. This makes it much easier to write out reports. Note that C/C++ does not have this capability. You either need to use printf/sprintf or separate everything with the << operator for cout. (csh, sh, awk, and probably python all have this capability too. It's pretty common in languages where compilation is not an explicit step.)
Finally, in the third example, note that I use single quotes ('). This explicitly means "do not expand variables or escape sequences." So the string itself has the $, \n, and \t inside it rather than the variable expansions, newlines, or tabs. (The only backslash sequences that single quotes recognize are \' and \\.)
To me, the single quotes mean literal, and what
you see is what you get. In general, I prefer to use single quotes when I can,
only using double quotes if I need to expand something. So in the
above example, I would usually have used single quotes for
$director.
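Here is a minimal sketch that puts the two kinds of quotes side by side. The values of $script and $filename are made up for illustration. Unlike the snippets on this page, the code runs under use strict, so its variables are declared with my.

```perl
use strict;
use warnings;

# Hypothetical values, only for illustration
my $script   = 'renderAll.pl';
my $filename = 'shot01.mb';

# Double quotes: $script, $filename, \n and \t are all expanded
my $expanded = "Try executing:\n\t$script $filename\n";

# Single quotes: every character is kept literally
my $literal = 'Try executing:\n\t$script $filename\n';

print $expanded;
print $literal, "\n";
```

The first print gives two lines, with the real file names on the indented second line. The second print shows the dollar signs and backslashes exactly as typed.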
In Perl, though, there is another way to express quotes. You may see this a fair amount from advanced Perl programmers. Sometimes it makes code a little hard to read, especially for beginners. I know it confused me for a while. You can use q() to indicate single quotes and qq() to indicate double quotes. This is useful if you expect to be using the " or ' symbol a lot inside your string. If you have a long string, though, I would suggest learning about HERE documents.
$director = qq(Cameron Crowe);
$help = qq(Try executing:\n\t$script $filename\n);
$codeForLater = q($help = "Try executing:\n\t$script $filename\n";);
Note that even though you use the () symbols, you probably shouldn't think of this as a subroutine call. That's an exception to the otherwise golden rules. Actually, even those () characters are up for debate, since q and qq accept other delimiters as well. Read Programming Perl [WALL00] for more details on that. I think using the qq/q notation for strings generally makes things harder to read, without much payoff.
I would like to introduce the concept of concatenation now. It's pretty simple. This is how you build strings from other strings:
$firstName = 'Cameron';
$lastName = 'Crowe';
$wholeName = $firstName . ' ' . $lastName;
# $wholeName now holds 'Cameron Crowe'
That is, the . operator concatenates 2 strings together. It is worth noting that concatenating a string to a number will automatically convert the number to a string, so it is easy to say:
$tmpFileCounter = 83292;
$fileName = '/tmp/melScript';
$fileName .= '.' . $tmpFileCounter;
# $fileName now holds '/tmp/melScript.83292'
That is, I just concatenated the counter to the end of $fileName. It is convenient to do this with a simple operator. Note that I also slipped in another version of the . operator: .= just means "concatenate to the end of the string on the left."
Incidentally, I really like the writer/director Cameron Crowe. Every movie he writes is golden. Go and watch Fast Times at Ridgemont High, Say Anything, Singles, Jerry Maguire, and Almost Famous - all great movies.
If you used C, you probably would have had to use the sprintf routine. If you used C++, you have a string class, and it probably overloads operator +, so you get similar functionality to the above. The trouble in C++ is that everyone seems to think they are a genius who can bring something new to the idea of strings, so everyone seems to have their own string class. So wherever you go, there's probably another implementation of strings in some proprietary library and a different set of string operators.
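If you are coming from C, it may help to know that Perl has sprintf too, and also join. This sketch builds the same temporary file name three ways. The base path and counter are the ones from the example above.

```perl
use strict;
use warnings;

my $tmpFileCounter = 83292;
my $base           = '/tmp/melScript';

# Concatenation, as in the example above
my $byConcat = $base . '.' . $tmpFileCounter;

# The C-style way: Perl has sprintf as well
my $bySprintf = sprintf('%s.%d', $base, $tmpFileCounter);

# join() glues any number of pieces together with one separator
my $byJoin = join('.', $base, $tmpFileCounter);

print "$byConcat\n$bySprintf\n$byJoin\n";
```

All three produce /tmp/melScript.83292. Concatenation is usually the most readable choice for two or three pieces, and sprintf is the better fit when you need formatting such as zero-padded frame numbers.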
Incidentally, it also impresses me how many sets of software think they have something new to offer in representing a color (a set of 3 numbers) or a point (a set of 3 or 4 numbers, depending on if you are into the homogeneous thing or not).
I'll wait until the discussion about arbitrary structures before I talk about references/pointers.
An array or list is a collection of scalars in a certain order. It is worth noting that the order is preserved and can be accessed. However, it can only be accessed numerically. For example, to define an array, we can say:
# One scalar first
$specialFrame = 5000;
# arrays
@operations = ('generateRIB', 'renderRIB', 'cleanupRIB');
@renderFramesA = ( 1, 2, 3, 10, 15, 33, 105);
@renderFramesB = (108, 110, 111, 112);
@renderFramesAll = (@renderFramesA, @renderFramesB, $specialFrame);
@nextFrames = ( (5001, 5002), (5003, 5004), (5005, 5006));
@renderFramesCopy = @renderFramesAll;
All of the above lines will define an array/list of scalars. The first couple are pretty obvious. But the fourth one may confuse people. Some people new to Perl think that this line makes @renderFramesAll an array with 3 elements: two arrays and a scalar. This would be wrong. It actually concatenates the two arrays and the scalar together and puts them into @renderFramesAll, which will contain the values (1, 2, 3, 10, 15, 33, 105, 108, 110, 111, 112, 5000). So it is actually pretty easy to tack on more values.
Similarly, in the next line, many beginners think this is the way to make an array of arrays. And again, this is wrong. @nextFrames actually gets the values (5001, 5002, 5003, 5004, 5005, 5006) as a flat list/array.
Also, it is simple to assign/copy one array to the other. Note that when you do this (like the last line above), it actually copies the elements from one array to the other, so you should be careful if you have really big lists. Then again, I sling around lists of 13,000 names without much thought, so it's not that big a deal.
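The following sketch checks both claims using the frame lists from the example. The nested lists flatten, and assignment makes an independent copy. It relies on the fact that an array in scalar context gives its number of elements.

```perl
use strict;
use warnings;

my $specialFrame    = 5000;
my @renderFramesA   = (1, 2, 3, 10, 15, 33, 105);
my @renderFramesB   = (108, 110, 111, 112);
my @renderFramesAll = (@renderFramesA, @renderFramesB, $specialFrame);
my @nextFrames      = ((5001, 5002), (5003, 5004), (5005, 5006));

# scalar(@array) is the element count: no nesting survives
print scalar(@renderFramesAll), " frames: @renderFramesAll\n";
print scalar(@nextFrames), " frames: @nextFrames\n";

# Assignment copies the elements, so changing the copy leaves the original alone
my @renderFramesCopy = @renderFramesAll;
$renderFramesCopy[0] = 999;
print "original starts with $renderFramesAll[0], copy starts with $renderFramesCopy[0]\n";
```

This prints 12 frames for the combined list, 6 for @nextFrames, and shows that the original still starts with 1 after the copy was changed.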
So if the above examples do not make an array of arrays, how do you make an array of arrays? Well, I'll get to that in a later page. In the meantime, just be aware that that isn't the way to do it. In Perl 4, there actually wasn't a way to get an array of arrays, so it was very natural to know that the above were just flat lists, and that this was a convenient way to concatenate lists together. When Perl 5 came up with a way to get arbitrary data structures (like arrays of arrays), the above syntax was already taken, so they had to make another way. I'm just used to it now. I'm not going to defend the way they implemented it, but the quicker you get over it, the quicker you can get to doing something useful with it.
Now in the scalars section, I talked about q and qq as alternate
ways of defining a string. But I didn't like them much for that. This was
really to transition to the array version with qw(quoted word). Now, with
arrays, we can actually define a list of values separated by whitespace,
and we can even omit the commas and quotes. For instance, the equivalent to
@operations above could be:
@operations = qw(generateRIB renderRIB cleanupRIB);
Sometimes, that will just be more convenient. Now again, the qw notation isn't used much by beginners, so it looks a little confusing, and you'll probably see it a lot in obfuscated Perl. I actually do like using this notation, though, just because it saves me the trouble of quotes and commas. For long lists, I think it actually tends to make code easier to read. You can use any mixture of whitespace (spaces, tabs, newlines) to separate tokens. Personal preference.
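A quick check that qw() really gives the same list as the quoted, comma-separated version, even when the words are spread over several lines. One caution: with warnings enabled, Perl complains if you put commas or # comments inside qw(), because those would become part of the words.

```perl
use strict;
use warnings;

my @withQuotes = ('generateRIB', 'renderRIB', 'cleanupRIB');

# qw() splits on any whitespace, including newlines
my @withQw = qw(
    generateRIB
    renderRIB   cleanupRIB
);

print "same list\n" if join(',', @withQuotes) eq join(',', @withQw);
print scalar(@withQw), " words\n";
```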
Have I mentioned Learning Perl [SCHW97] yet? You should really read that book. It'll tell you more about arrays and slices and stuff. I don't use slices much, so I'm not going to discuss them.
Now, how do you access the elements in the array? Oh, there are so many ways, and that is one of the many wonderful things about Perl. First off, if we want to grab one of the scalars in our array, we can say:
$firstOperation = $operations[0];
$secondOperation = $operations[1];
$thirdOperation = $operations[2];
That's pretty simple, isn't it? On the left, we see scalar variables. On the right, we see that we also use a $ to access @operations, even though @operations is an array. This is because we are accessing a scalar element of the array, that is, accessing it in a scalar context. The indices of @operations start with 0, unless you do some really bizarre stuff that you shouldn't be doing in the first place. Also, note the square brackets []. This is how we access elements of arrays/lists. Square brackets are a pretty good indicator that we're dealing with something related to an array.
But you could do that in any language. Even C++. Heck, even
in MEL. But in C/C++, you would often use arrays to create stacks, FIFOs,
buffers, and stuff. Realizing that these are common uses of arrays, Perl
offers more array access methods. The important ones are really
push, pop, shift, unshift. For instance, if the operations
were implemented as a FIFO buffer (that is First In, First Out), where you
want one part of your program to add operations to a list and another part
of the program to read out the operations in the same order they were added,
you could use:
@operations = (); # empty list
# Section to create operations list
# Use push to add a new element to the end of the list
push @operations, 'generateRIB';
push @operations, 'renderRIB';
push @operations, 'cleanupRIB';
# Section to read an operation
# Use shift to take an element off the beginning of the list.
$currOp = shift @operations;
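Here is a sketch of the complete FIFO round trip. The consumer keeps shifting until the queue is empty, so the operations come out in the same order they were pushed. The @ran array only exists here to record that order.

```perl
use strict;
use warnings;

my @operations = ();    # empty queue

# Producer side: push adds to the end of the list
push @operations, 'generateRIB';
push @operations, 'renderRIB';
push @operations, 'cleanupRIB';

# Consumer side: shift takes from the front (first in, first out)
my @ran;
while (@operations) {          # an array in boolean context is true while non-empty
    my $currOp = shift @operations;
    print "running $currOp\n";
    push @ran, $currOp;
}
```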
Or you could make a stack. Think of a stack of trays in a cafeteria. You can push them on one at a time, then remove them one at a time. Note that with this, the last tray you added is the first that you remove. When would you encounter this in production? Well, this is actually fundamental to most rendering, especially RenderMan.
This is apparent if you read a RIB file. The RIB describes your scene. At some point, you'll probably see a lot of Rotates, Translates, Scales, and NuPatches. This describes the hierarchy of how the scene is put together. In Maya, you know this as parenting. Each of the operations like Rotate can be thought of as a "Transform." You can have layers of transforms, and eventually, deep down, there is a NuPatch. Now how do you describe this? Well, you need to indicate which transforms affect which NuPatches. But transforms could affect other transforms, so you will need to indicate a block of Begin and End for each transform. So in your RIB, you will see something like:
TransformBegin
  Scale 2 2 2
  TransformBegin
    Translate 1 2 3
    # Patch A
    NuPatch blah blah blah
  TransformEnd
  TransformBegin
    Translate 4 5 6
    # Patch B
    NuPatch blah blah blah
  TransformEnd
TransformEnd
This means that the Scale is applied to both patches, but the Translates are
applied to only 1 patch each because they are inside the TransformBegin and
TransformEnd blocks. But try to conceptualize how you would implement this.
As you read this file, you want to create a scene graph, creating children
as you go along. This is actually similar to the lunch trays. Each time
you see a TransformBegin, you will want to push a transform
onto your stack. Each time you see the NuPatch, you create
a child (The NURBS Patch). Each time you see a TransformEnd,
you pop a transform off the stack.
Suppose now you are writing a parser for RIB files. This is actually pretty common. I would guess that most people don't write a real formal parser, but just do quickie regular expression matching, and that's good enough in general. As you parse the RIB, you want to keep track of the transforms. I won't get into how you get the name attribute, but you may have code like:
# Found a TransformBegin
push @transformStack, $currentTransformName;
# Operate on new transform
# ...
# Found a Translate, Rotate, or Scale
# incorporate the transform into the current transform
# ...
# Found a NuPatch
# Output the current transformation and the patch to the renderer
# ...
# Found a TransformEnd
$currentTransformName = pop @transformStack;
# Now, I'm back up at the parent.
Now, you may be thinking, "If I were going to build a renderer, it has heavy matrix crunching, so I would use C/C++." Well, first off, I would never choose C++ anymore, except for glue logic to other APIs where I'm forced to use it. But I'd also say it's not as far-fetched as you may think. For instance, my friends at Steamboat Software actually do use Perl to parse RIB files to pass information on to jig in their program hpartojig (or is that hpar2jig?). Incidentally, you can apply the same principles to Inventor files, mostly with { and } indicating the transform begin and end. That's probably a pretty long diversion just to describe the principle of Last-In, First-Out (LIFO, or a stack).
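Below is a runnable toy version of the stack idea. It makes a few assumptions. The "RIB" is the fragment above with each NuPatch line reduced to a hypothetical patch name, so it carries no real NURBS data. Instead of pushing transform names as in the outline, it pushes a copy of the list of transforms in effect at each TransformBegin. It also uses array references ([...] and @{...}), which this page hasn't covered yet.

```perl
use strict;
use warnings;

# Simplified RIB: "NuPatch A" stands in for a real NuPatch call (hypothetical)
my @rib = (
    'TransformBegin',
    'Scale 2 2 2',
    'TransformBegin',
    'Translate 1 2 3',
    'NuPatch A',
    'TransformEnd',
    'TransformBegin',
    'Translate 4 5 6',
    'NuPatch B',
    'TransformEnd',
    'TransformEnd',
);

my @transformStack;   # saved copies of the parent's transform list
my @current = ();     # transforms in effect at this point in the file
my %patchTransforms;  # patch name => transforms that apply to it

for my $line (@rib) {
    if ($line eq 'TransformBegin') {
        push @transformStack, [@current];            # save the parent state
    } elsif ($line eq 'TransformEnd') {
        @current = @{ pop @transformStack };         # back up at the parent
    } elsif ($line =~ /^(Scale|Translate|Rotate)\b/) {
        push @current, $line;                        # incorporate into current transform
    } elsif ($line =~ /^NuPatch\s+(\S+)/) {
        $patchTransforms{$1} = join(', ', @current); # what the renderer would get
    }
}

for my $patch (sort keys %patchTransforms) {
    print "Patch $patch: $patchTransforms{$patch}\n";
}
```

Patch A ends up with the Scale plus the first Translate, and Patch B with the Scale plus the second Translate. That is the same result you get by reading the RIB by hand. This sketch assumes the Begin/End pairs are balanced. A real parser would need to report an error when pop finds an empty stack.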
The point is that in Perl, you can implement these data
structures extremely easily with commands intrinsic to the language.
Note however that all of these operations, push, pop, shift,
unshift actually alter the array, either adding or removing elements,
where the $operations[0] just reads them. Of course, there are other
wonderful array operations like grep, map, foreach, and I'll
probably get to them on a later page.
You might have noticed that I make a distinction between arrays and lists, and you may wonder, "What's the difference?" Well, they are very, very similar. And to this day, I still have trouble telling them apart. This is one of the unfortunate things in Perl. If you're disciplined about receiving scalar return values of subroutines into scalar variables and array/list return values of subroutines into arrays, you will stay out of trouble. The trouble occurs when you try to implicitly cast array and list return values to scalars.
This is discussed in Programming Perl [WALL00] in the 3rd edition, pp 72-74. But here's where it gets confusing:
# LIST in scalar context - the comma operator evaluates each
# element and keeps only the last one
$operation = ('generateRIB', 'renderRIB', 'cleanupRIB');
# got 'cleanupRIB'
# ARRAY in scalar context - $operation gets the array evaluated in scalar context.
# This will actually return the number of elements in the array
# instead of giving you any value in the array. So $operation
# will be 3 ($operation didn't get an operation at all)
@operations = ('generateRIB', 'renderRIB', 'cleanupRIB');
$operation = @operations;
# got 3
# List assignment - As if this wasn't confusing enough, if the
# left hand side of the assignment is a list, the values will be
# assigned in the order listed. So...
($operation) = ('generateRIB', 'renderRIB', 'cleanupRIB');
# got 'generateRIB'
@operations = ('generateRIB', 'renderRIB', 'cleanupRIB');
($operation) = @operations;
# also got 'generateRIB'
You can probably understand why I find this confusing. But as we see in the last example, if we receive the array or list into a list context, then we are pretty safe, and we know that we are receiving the first value into the scalar. This is actually pretty useful in receiving parameters in a subroutine. If you return values from a subroutine, you should receive the same type you return. That is, receive scalars into scalars and arrays/lists into lists.
There's actually one more convenience. On the left hand side, if you include an array as the last element in the list, it will absorb all the rest of the elements:
@operations = ('generateRIB', 'renderRIB', 'cleanupRIB');
($operation, @remainingOperations) = @operations;
# $operation got 'generateRIB'
# @remainingOperations got ('renderRIB', 'cleanupRIB')
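This is how that list assignment typically shows up when receiving subroutine parameters. The sketch below also puts the "safe" ways of pulling a scalar out of an array next to each other. The subroutine queueShot and the shot name are hypothetical. Negative subscripts count from the end, so $operations[-1] is the last element.

```perl
use strict;
use warnings;

# Hypothetical subroutine: first argument is a shot name, the rest are operations
sub queueShot {
    my ($shot, @ops) = @_;    # $shot gets the first value, @ops absorbs the rest
    return scalar(@ops);      # return a scalar on purpose, and say so
}

my @operations = ('generateRIB', 'renderRIB', 'cleanupRIB');

my $count   = queueShot('shot01', @operations);   # scalar returned, scalar received
my ($first) = @operations;                        # list context: first element
my $size    = @operations;                        # scalar context: element count
my $last    = $operations[-1];                    # explicit index: last element

print "$count $first $size $last\n";
```

This prints "3 generateRIB 3 cleanupRIB". Writing scalar(...) and explicit subscripts keeps it obvious which behavior you meant.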
I guess the only other thing I would add is: live with it. I honestly don't get stung by this one much myself. But I do try to avoid it by not doing any implicit casting from an array/list to a scalar.
Now that I've gone over that, I'm going to refer to arrays and lists interchangeably, usually as "arrays."
An array is a very simple structure. If you have elements 0-11,
then your array is 12 elements long. You access your first one as
$ARGV[0] and your 12th as $ARGV[11]. And the
storage is also very simple. Somewhere in memory, you are just
allocating enough space for 12 elements, and storing them there. Now,
suppose you have 1,000,000 elements. Well, that's also pretty simple.
You just allocate enough space for 1,000,000 elements and access them
with indices 0-999,999. And that's all fine and good.
Now, suppose we want to keep some notes about jobs we sent to our render queue. Suppose we're a major effects house or something, we've been in business a long time, and we've sent a lot of jobs to the queue. Suppose each job on the queue has an ID number ranging from 0 to 65535 or something. In a day, we might submit 100 jobs to the queue (probably more, but stick with me for a minute here), so we might see all of our jobs in a day have numbers like 9782, 9783, 9784, ... Now, think back to how we store this data. Well, we could just stick this into an array of information indexed on the job ID. However, we could wind up allocating a block of 9785 elements (or, in the worst case, 65536 in this example), when we really only have 3 elements with high numbers. That means we've just wasted over 9500 entries! It is easy to conceive of a data structure that would hold 1k of data (output logs, usage statistics), so these 9500 entries could easily translate into 9,500,000 bytes (over 9 Meg of memory!), when the used portion is really only 3 of those entries, or 3,000 bytes. That could be a waste.
We want a more efficient way of storing data. The one that immediately comes to mind would be to keep an array of pointers to our information data structure. So we initially allocate a block of 65536 pointers (pointers are only about 4 bytes each), which comes to about 256K (65536 * 4 = 262,144 bytes). Then we only allocate the information data structure as we need it, so we will only use our 3,000 bytes for the information. So in this case, we only use a little over 256K. But that 256K is pure overhead, compared to the 3k that we are really using. Again, this has a lot of waste.
Supposing though that we had an algorithm that could make a 2 digit number out of all the keys. Think of 9782, 9783, 9784, as keys into this structure. And supposing we could think of a function that could magically transform 9782 into 82, 9783 into 83, and 9784 into 84. Then we could just allocate 100 pointers (400 bytes) and then as we need an info block, we allocate the 1000 bytes for it. For the 3 jobs I keep talking about, they will still take 3000 bytes. So now we're talking about a total of 3400 bytes, 3.4k to hold our data. That's a lot less overhead. But now, we think about this "algorithm" and realize that it's not perfect. After all, our job IDs could range from 0 to 65535, and we're only mapping this into 100 locations, so there's bound to be some duplication if we have a huge amount of jobs. Well, under the hood, clever programmers just keep a list at each location of all the jobs that wound up in that bin. I won't address how they do that, but let's just assume that it is very low overhead, and really not that difficult.
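To make the "magic function plus a list per bin" idea concrete, here is a toy sketch. It assumes the magic function is just "keep the last two digits" (jobId % 100), which maps 9782 to 82. Each of the 100 bins holds a list of [jobId, info] pairs, so colliding jobs simply share a bin. This illustrates the scheme described above and is not how Perl's own hashes are implemented. It also uses array references, which this page hasn't covered yet. The function names and log strings are hypothetical.

```perl
use strict;
use warnings;

my $NUM_BINS = 100;

# Each bin is a list of [jobId, info] pairs; several job IDs may share a bin
my @bins;
$bins[$_] = [] for 0 .. $NUM_BINS - 1;

# The "magic" function: keep the last two digits (9782 -> 82)
sub binFor {
    my ($jobId) = @_;
    return $jobId % $NUM_BINS;
}

sub storeJob {
    my ($jobId, $info) = @_;
    my $bin = $bins[ binFor($jobId) ];
    for my $entry (@$bin) {
        if ($entry->[0] == $jobId) {    # already stored: replace the info
            $entry->[1] = $info;
            return;
        }
    }
    push @$bin, [ $jobId, $info ];      # new job, possibly sharing the bin
}

sub fetchJob {
    my ($jobId) = @_;
    for my $entry (@{ $bins[ binFor($jobId) ] }) {
        return $entry->[1] if $entry->[0] == $jobId;
    }
    return;                             # not found
}

storeJob(9782, 'log for 9782');
storeJob(9783, 'log for 9783');
storeJob(182,  'log for 182');          # also lands in bin 82
my $found = fetchJob(182);
print "$found\n";
print scalar(@{ $bins[82] }), " jobs share bin 82\n";
```

A lookup only scans the short list in one bin. That is why a function that spreads keys evenly across the bins matters so much.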
Now, let's think about an alternative to this. We could just keep a list of the keys and a list of the data structures. Instead of using 9782, 9783, 9784, ... as the keys directly, I will instead keep this as just another piece of data. So as I get this info, I will allocate 2 arrays of 3 elements. One array will just hold the numbers 9782, 9783, 9784, and the other will hold the information data structures. Now we're looking at about 1006 bytes per element, or a total of 3018 bytes. Our data structures are shrinking! When we want to access the data for job 9782, we read through our first list until we find the key 9782. Then we look over at the other array and get our information block. That's nice and efficient for storage, but realize that we might not be adding job IDs in numerical order, so our search for the key 9782 might not be very efficient, especially for large lists.
Let's amend this idea. Suppose that in our first array, instead of just storing the job ID itself, we also store a pointer to the elements in the other array. This means that the first array will now take up 6 bytes per element: 2 for the number 0-65535, and 4 for the pointer to the other array's information elements. The second array will still just hold the data blocks about the job. Actually, it may hold an array of pointers to these blocks, so let's say it takes 1004 bytes per element, so between our 2 arrays, we are looking at 1010 bytes per element. And to store information about 3 jobs, we're looking at 3030 bytes. Okay, our data structure just got a little bit bigger. But now, suppose we sort the keys in our first array. This means that we can run a binary search through the keys (which is the fastest search method through an ordered list) when we're looking for the information about job 9782. Then, once we find it, we get a pointer to the data in the other array. (Actually, the other array is not necessary at this level, but it may make garbage cleanup easier.)
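A minimal sketch of the sorted-keys scheme follows, with hypothetical job data. @data holds the info blocks in arrival order. @index plays the role of the "first array": one [jobId, position in @data] pair per job, sorted by job ID. The findJob routine is a plain binary search that halves the range of candidate keys on every step.

```perl
use strict;
use warnings;

# Info blocks in the order the jobs arrived (hypothetical data)
my @ids  = (9784, 9782, 9783);
my @data = ('log for 9784', 'log for 9782', 'log for 9783');

# The "first array": [jobId, position in @data], sorted by jobId
my @index;
for my $i (0 .. $#ids) {
    push @index, [ $ids[$i], $i ];
}
@index = sort { $a->[0] <=> $b->[0] } @index;

# Binary search over the sorted keys
sub findJob {
    my ($jobId) = @_;
    my ($lo, $hi) = (0, $#index);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $key = $index[$mid][0];
        if    ($key == $jobId) { return $data[ $index[$mid][1] ]; }
        elsif ($key <  $jobId) { $lo = $mid + 1; }
        else                   { $hi = $mid - 1; }
    }
    return;    # not found
}

my $found = findJob(9782);
print "$found\n";
```

The cost the text goes on to describe shows up here as well. Every new job means keeping @index sorted, which gets slower as the list grows.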
Yes, I know I should have some diagrams. Maybe I'll add some later. But don't hold your breath.
Anyway, this last idea has some reasonable compromises. We store our data with very little overhead. We know we have 3000 bytes of data regardless of method. But this last scheme has only 30 bytes of excess, rather than the first scheme which had a whopping 9.5 million bytes of waste. One of the areas it still needs help with is in resizing the array, as you add elements. It actually makes sense to allocate both arrays of pointers a little bigger than you need (say 100 elements). And maybe use one of the concepts from an earlier scheme to store everything in this smaller array, and have some low overhead coding under the hood that will keep track of duplicate entries.
That earlier scheme has another benefit. Given a key, we had an algorithm that transformed it directly into a location in our small array. This means that we immediately know where to look for the data related to this key. The disadvantage of the idea with the two arrays (sorted key list with pointers to the data) is that as you add keys, you have to keep the list sorted. As your list grows, it will take longer and longer to figure out where to put the new key and to rearrange the other keys accordingly. If we hash the keys directly into locations, we save that time. (But we need to count on someone to manage the duplicate entries.)
In that earlier scheme, I described a rather trivial algorithm that would translate 9782 into 82. That's just a simplification. Sometimes, what we want is to use a string as a key. That is, instead of indexing off a job number, perhaps we have some information that we would like to key off something like 'animate', 'render', or 'composite'. Now, think about sorting the list of keys and doing a search. To determine whether a key matches something like 'composite', which has 9 characters, there have to be as many as 9 character comparisons. But we have to do these comparisons for each key we check (though we'll be able to reject most of them after the first character). Back when we were dealing with job IDs, all we had to do was one comparison for each key. This gives us an incentive to come up with a way to translate 'animate' into something like 82. There are some simplistic ways to do this. For example, each character can be thought of as a number from 0-255 (I'm not even going to think about Unicode right now). Perhaps we can take the first two characters and come up with a number like:
(first character code) * 256 + (second character code)
This will give us a number from 0-65535. Just as before, you can imagine we will have a lot of duplication (like 'animate' and 'antitrust'), but again, all I can say is: have faith that a clever programmer can come up with a way to maintain a list of the duplicates, with a way to access it, without a lot of overhead.
The trick here is that although we expect to have low-overhead ways of handling 2 words that wind up with a duplicate key, we still want to avoid landing in the same bucket as much as we can. We might try to be more clever about this. For instance, there probably will be a high correlation in the first 2 letters (like 'th' or 'qu'). And we might even speculate that there will be a high frequency of vowels (a, e, i, o, u) in the third letter. If we think that most keys will be at least 4 characters long, we might want to modify our key to use the first and fourth characters, using 0 if there are fewer than 4 characters. Again, that is not a perfect algorithm, and you'll get some duplicates. Ideally, we want to evenly fill our buckets with the fewest collisions. Now, there's a whole field of Electrical Engineering/Mathematics/Computer Science called "Information Theory" that discusses this a lot. If you are interested in following up on this, try reading up on Encryption or Compression and look for phrases like "high entropy". I'm not being facetious. This really is an interesting topic. I took a few quarters of it myself in college. But it is pretty hardcore math theory. You might want to look into basic statistics before going into it.
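The two toy key functions just described can be sketched directly. Both assume one byte per character, as the text does, so the result stays in 0-65535. The function names are made up for illustration. With the first two characters, 'animate' and 'antitrust' collide. Switching to the first and fourth characters separates them.

```perl
use strict;
use warnings;

# Character code at position $pos, or 0 if the word is too short
sub charCode {
    my ($word, $pos) = @_;
    return $pos < length($word) ? ord(substr($word, $pos, 1)) : 0;
}

# (first character code) * 256 + (second character code)
sub firstTwoKey {
    my ($word) = @_;
    return charCode($word, 0) * 256 + charCode($word, 1);
}

# Variation: first and fourth characters, 0 if there are fewer than 4
sub firstFourthKey {
    my ($word) = @_;
    return charCode($word, 0) * 256 + charCode($word, 3);
}

for my $word (qw(animate antitrust render composite)) {
    printf "%-10s %6d %6d\n", $word, firstTwoKey($word), firstFourthKey($word);
}
```

Both 'animate' and 'antitrust' get 24942 from firstTwoKey. firstFourthKey gives them 24941 and 24937.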
The points I'm getting to are:
One of the things Perl has that many of the other languages I've used don't is hashes intrinsic to the language. These are also known as associative arrays. I cannot do justice to how useful these things are. One could argue that in C++ there are maps in the C++ STL (Standard Template Library). That may be true, but for one thing, you need to include the STL, you probably have to consult a reference manual every time you want to figure out how to access them, and of course, it's been my observation that any C++ code that I use templates in seems to take twice as long to compile, and my compile logs are twice as big. (You probably notice a strong theme here that I no longer like C much, and I loathe and despise C++.)
So now what are these wonderful things? The simple way on the surface to think of them is that they are arrays. But instead of having to access something with an index, you can access them with a string. That is,
%programTable = (
'animate' , 'maya.bin',
'render' , 'render',
'composite' , 'shake',
);
# Equivalent to:
$programTable{'animate'} = 'maya.bin';
$programTable{'render'} = 'render';
$programTable{'composite'} = 'shake';
Now, admittedly, I have not read the Perl source code, so I don't really
know how hashes work. But if I had to guess, I would think they would use a
lot of the ideas discussed above in the diversion about data structures.
I don't think anyone would seriously use the hash scheme I outlined. That
was just to illustrate what you have to think about when making a hash. But
they probably did use some hash scheme, and they probably do have some
low-overhead scheme to index all the keys. In other words, the prior
discussion doesn't describe the tactics Perl designers took, but it probably
is raising similar design concerns.
Before I go any further, I want to make a few notes about the assignment schemes above.
In the first scheme, I have a hash (%programTable) on the left, but a list on the right. I have an even number of elements in the list. This is because I'm giving a list of keys and values, as demonstrated in the second assignment scheme. Other than the key/value constraint, there is really nothing special about the list on the right. You can assign any array on the right.
In the second scheme, I'm assigning values one at a time to keys of the %programTable hash. The string inside of the {} is the key, and the value is on the right of the = sign.
There is also an operator, =>, which acts like a comma (,). But people (including me) use it a lot in hash assignments because it makes it look like we're making a correlation between the keys and values. Really, the , and => are interchangeable, but the => is used for clarity:
%programTable = (
    'animate'   => 'maya.bin',
    'render'    => 'render',
    'composite' => 'shake',
);
Actually, there is one difference between => and ,.
If you use the => operator, Perl knows that this is usually used with hash
assignments, so the value to the left of it is probably the key to a hash.
In this special case, if it sees a bareword, then it assumes it is a string.
So we can take the shortcut:
%programTable = (
    animate   => 'maya.bin',
    render    => 'render',
    composite => 'shake',
);
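This sketch checks that the four ways of filling %programTable produce the same hash. Those four are commas, => with quoted keys, => with barewords, and one key at a time. A simple word inside the {} of a hash subscript is also quoted automatically. The helper describe is hypothetical. It sorts the keys only so the hashes can be compared as strings.

```perl
use strict;
use warnings;

my %withCommas    = ('animate', 'maya.bin', 'render', 'render', 'composite', 'shake');
my %withArrows    = ('animate' => 'maya.bin', 'render' => 'render', 'composite' => 'shake');
my %withBarewords = (animate => 'maya.bin', render => 'render', composite => 'shake');

my %oneAtATime;
$oneAtATime{'animate'} = 'maya.bin';
$oneAtATime{'render'}  = 'render';
$oneAtATime{composite} = 'shake';    # bareword key inside {} is quoted too

# Flatten a hash into a sorted "key=value" string so hashes can be compared
sub describe {
    my (%hash) = @_;
    return join(' ', map { "$_=$hash{$_}" } sort keys %hash);
}

my $summary = describe(%withCommas);
print "$summary\n";
```

This prints "animate=maya.bin composite=shake render=render". The other three hashes give the same string.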
The question now is: when do we use a hash and when do we use an array? Hashes are really wonderful things. Generally, if you want to access the values of an array by string keys, you almost always want to use a hash. If you are using numbers to access, you probably want an array. But you might consider using a hash if you have a sparse array.
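The sparse-array case is easy to see in Perl itself. In this sketch, storing info for the three job IDs from the earlier diversion stretches an array out to every index up to 9784. A hash holds only the three keys actually used. The empty array slots are undef rather than 1000-byte records, so the waste is smaller than in the C-style estimate, but the array still spans every index. The log strings are made up.

```perl
use strict;
use warnings;

# Assigning to a high index makes Perl extend the array up to that index
my @infoByArray;
$infoByArray[9782] = 'log for 9782';
$infoByArray[9783] = 'log for 9783';
$infoByArray[9784] = 'log for 9784';

# A hash only holds the keys you actually use
my %infoByHash;
$infoByHash{9782} = 'log for 9782';
$infoByHash{9783} = 'log for 9783';
$infoByHash{9784} = 'log for 9784';

my $arraySlots = scalar(@infoByArray);
my $hashKeys   = scalar(keys %infoByHash);
print "array slots: $arraySlots, hash keys: $hashKeys\n";
```

This prints "array slots: 9785, hash keys: 3".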
As much as I love hashes, there is one thing that they are not
good at: maintaining order. If you care about the order of the elements in
the array (like in a stack or a list of operations), then you probably want
to maintain an array. Now, there is something up in CPAN called Tie::IxHash
that actually does maintain the order. But I'm not going to get into that
here. I still have not used the module yet. Usually with hashes, you just
access the list of keys using keys %programTable, and don't
count on the order being anything that makes any sense.
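In practice, that advice comes down to two habits, shown in this sketch. Sort the keys when you need a repeatable listing. When the order itself means something, keep the order in an array and use the hash only for lookups. The @pipeline order here is an illustrative assumption.

```perl
use strict;
use warnings;

my %programTable = (animate => 'maya.bin', render => 'render', composite => 'shake');

# keys() comes back in no useful order, so sort when you need a stable listing
my @alphabetical = sort keys %programTable;

# When order matters, keep it in an array and use the hash for lookups
my @pipeline = qw(animate render composite);
my @commands = map { $programTable{$_} } @pipeline;

print "@alphabetical\n";
print "@commands\n";
```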
Actually, I take that back. Let's consider the idea that under the hood, they have a hash scheme that converts the strings into some random looking number to fit into a small number of bins. The more random the number the better. So internally, it is probably keeping track of the elements in the array based on this hash key. Perl won't tell you what that hash key is, but if you're looking for some rationale in the order keys come back to you, well, that's it.
One of the other things to be aware of under the hood is that though it is low overhead, there is overhead nonetheless when you use a hash vs. an array. In a hash, it is probably maintaining 2 arrays internally, one for the key and one for the value. There is also some additional storage space for the hash keys, but not much. I would never actually make the storage overhead a consideration when deciding whether or not to use a hash.
You can write a subroutine and access it like:
sub helloWorld {
    print "hello world\n";
    return;
}
&helloWorld();
This created a subroutine called helloWorld, and I called it by using the
&. That's pretty simple. I give my subroutines a list of
arguments using (). A subroutine is really just a code block that you can
access by name. Sometimes it returns a value; sometimes it doesn't. In Perl,
there is no separate notion of a function versus a procedure: a single
routine may sometimes exit without returning a value and other times return
one. You don't need to pre-declare anything, which I find convenient. I also
like being able to pass in a variable number of arguments and put the
intelligence into the subroutine to figure out what to do.
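Here's a small sketch of that variable-arguments style. average is a hypothetical example routine: it pulls however many arguments were passed out of @_, and it returns without a value when it gets none, so the same routine sometimes returns a value and sometimes doesn't.

```perl
use strict;
use warnings;

# Hypothetical example: average any number of arguments.
sub average {
    return unless @_;          # nothing passed in: exit without a value
    my $sum = 0;
    $sum += $_ foreach @_;     # arguments arrive in @_
    return $sum / scalar(@_);
}

my $avgOfTwo  = &average(2, 4);           # 3
my $avgOfFive = &average(1, 2, 3, 4, 5);  # 3
my $none      = &average();               # undef in scalar context
print defined($none) ? "got $none\n" : "no value\n";   # prints "no value"
```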
Anyway, I don't have a lot to say about subroutines. Learn about them. Read Learning Perl [SCHW97] and Programming Perl [WALL00].
$
means that you are getting a scalar access to something,
whether it be a scalar, array, or hash. That is, the name that
follows may not be a scalar; you are getting a scalar out of it, for
instance one of the elements of an array or hash. For example:
$foo, $foo[1], $foo{'key'}
@
means that you are getting an array (list) access to an
array. Now that sounds kind of silly, but it is important to note the
distinction between $foo[1], which is a scalar access
to array element number 1, and @foo[1], which actually
returns a list (a one-element slice of the array). Some functions behave
differently depending on whether you pass in an array or a scalar, so it
is important to make sure you are accessing the array in the way you
really intended.
You can extend this to references. For example,
@{$r_myArray} means to take the reference $r_myArray
and access the array that it points to.
%
means that you are getting a hash access to a hash. There
isn't much ambiguity about hashes. For example, %foo,
or through a reference, %{$r_hash}.
&
means that you are accessing a subroutine.
[]
means that you are doing something related to an array.
On this page, I just described simple array access. But you will
find out later that this is also used for array referencing and
anonymous arrays. But the bottom line is that [] means you're
dealing with an array.
{}
means either hash access or a code block. If you see it with a variable name, it is probably being used to access a hash. If you see it with a control block (like if, for, foreach, while) or an intrinsic command (like sort, grep, map, ...), then it is probably being used as a code block. It actually gets tricky, though, to determine from context whether you have an anonymous hash or a code block. I wish I could give you a golden rule, but it really is a little tricky. Learn to live with it.
()
shows up when you call a subroutine, as in &subName($arg1, $arg2);.
Well, to unify everything, you can actually just think of that as passing
a list to a subroutine, so once again we can simplify and say that putting
something in parentheses means passing a list around in some form or
another. However, realize that parentheses are also used in
numerical expressions, such as
($oldAngle+$delta)*$PI/180.0, and it doesn't add
anything to our intuition to try to contrive that into a list. So
just realize that parens could be a list or a grouping in an
expression.
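Here's a sketch that pulls the golden rules together in one place. All the variable names ($foo, $r_myArray, $r_hash, $oldAngle, and so on) are just for illustration. @foo[1] is legal, but under use warnings Perl suggests writing it as $foo[1], so the example uses a two-element slice to show that @ gives you a list.

```perl
use strict;
use warnings;

# $foo, @foo and %foo are three separate variables that happen to share a name.
my $foo = 'just a scalar';
my @foo = ('zero', 'one', 'two');
my %foo = (key => 'value');

my $elem  = $foo[1];       # $ : one scalar out of the array -> 'one'
my @slice = @foo[1, 2];    # @ : a list (slice) of the array -> ('one', 'two')
my $val   = $foo{'key'};   # $ : one scalar out of the hash  -> 'value'

# The same sigils work through references.
my $r_myArray = \@foo;
my $r_hash    = \%foo;
my $count     = scalar(@{$r_myArray});   # 3
my @hashKeys  = keys %{$r_hash};         # ('key')

# [] builds an anonymous array and {} an anonymous hash ...
my $r_anonArray = [1, 2, 3];
my $r_anonHash  = { animate => 'maya.bin' };

# ... but {} after grep, map, if, foreach, etc. is a code block.
my @bigOnes = grep { $_ > 1 } @{$r_anonArray};   # (2, 3)

# () passes a list to a subroutine ...
sub countArgs { return scalar(@_) }
my $n = &countArgs('a', 'b');   # 2

# ... or just groups terms in an arithmetic expression.
my $PI       = 3.14159265;
my $oldAngle = 80;
my $delta    = 10;
my $radians  = ($oldAngle + $delta) * $PI / 180.0;
```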
$_
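Perl has many other special variables written with these symbols.
Generally, I recommend against using most of them because they make code
harder to read. But a few you are likely to encounter are listed below.
(The long English names, such as $ARG, only work after you put
use English; in your script.)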
$_ or $ARG - the "current argument," also known as the default
variable. It holds the current element in a foreach, grep, or map, and
the line just read in a while (<FILEHANDLE>) loop.
Looking at the golden rules, we know that it is a scalar.
@_ or @ARG - current array. I see and use this mostly at the
start of subroutines, where it is an array of the scalar arguments
that were passed into the sub.
$& or $MATCH - the text matched by the last regular
expression match. I'm a little shaky about using this one. What
worried me was this: if I have 2 matches in a row and the second one
is unsuccessful, will $& contain the first match or the empty string?
According to the perlvar documentation, $& holds the string matched by
the last successful match, so it still contains the first one, which I
find confusing in itself. (On Perls older than 5.20, using $& anywhere
in a program also slowed down every regex match.)
Honestly, I usually stay away from this one.
Usually, not always.
$@ or $EVAL_ERROR - this one is weird because it has an
@ in it, though it has nothing to do with arrays. Learn it.
Live with it. You'll soon see that eval is a very useful part of the
Perl language. If the code you eval fails, either because it is
invalid or because it calls die, this variable holds the error
message; after a successful eval it is empty.
$? or $CHILD_ERROR - if you just ran system, backticks,
or any other child process, its status is in this variable. The
status is encoded, so the child's actual exit code is $? >> 8.
%ENV - environment variables. In the shell, you
had environment variables like PATH or LIB32_PATH that you set
with setenv (in csh) and read with $PATH. There is no
setenv in Perl, but Perl has a better abstraction: all
of these variables are accessible in a hash, so you access its
elements the same way you would any other hash's, with
$ENV{'PATH'} or $ENV{'LIB32_PATH'}. Assigning to an element
changes the environment that child processes (for example, anything
started with system) will inherit.
@ARGV - If you've programmed C or C++, you should
be familiar with argc and argv as the mechanism for your program to
receive arguments from the command line. Perl just has the
equivalent of the argv array: @ARGV is an array holding the
command-line args. Unlike C's argv, it does not include the program
name; $0 holds that instead. $ARGV[0] holds the first arg
passed in to the program. If you want argc (the number of args), use
scalar(@ARGV), keeping in mind that it is one less than C's argc
because the program name isn't counted.
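Here's a sketch exercising the special variables above. firstArg and MY_RENDER_QUEUE are made-up names for illustration, and $^X (the path to the perl binary running the script) is used so that the child process for $? is just another perl rather than something platform specific.

```perl
use strict;
use warnings;

# $_ : the current element in foreach (and in grep/map).
my @seen;
foreach (qw(animate render)) {
    push @seen, uc($_);
}

# @_ : the arguments inside a subroutine.
sub firstArg { return $_[0] }

# $& : text matched by the last *successful* match.
my $lastMatch;
if ('render.bin' =~ /render/) {
    'composite' =~ /nomatch/;     # this match fails ...
    $lastMatch = $&;              # ... so $& still holds 'render'
}

# $@ : the error from the most recent eval.
my $result = eval { die "bad frame range\n"; 1 };
my $error  = $@;                  # "bad frame range\n"

# $? : status of the last child process; the exit code is $? >> 8.
system($^X, '-e', 'exit 3');
my $exitCode = $? >> 8;           # 3

# %ENV : environment variables, inherited by child processes.
$ENV{'MY_RENDER_QUEUE'} = 'fast';

# @ARGV and $0 : command-line arguments and the program name.
my $argc = scalar(@ARGV);         # C's argc minus one
print "$0 got $argc argument(s)\n";
```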
So now, some of you might be curious what that expression meant at the top of this section:
%{$_}=($&=>[@_,@{$_},'friggin']);#heck do all these symbols mean?
I consider this obfuscated Perl, and I'm actually using it as an example of
bad coding. But let's break it down anyway. First off, anything after the
pound sign (#) is a comment, so we can ignore it. Also, let's put in some
whitespace to clarify things a little.
%{$_} = (
$& => [@_, @{$_}, 'friggin']
);
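Okay, this is already less intimidating. I'm cheating a little bit here:
I'm using symbolic dereferencing. Pretend that I'm foreaching through a
list of variable names. In Perl, I can get at the variable whose name is
held in a string by dereferencing that string (this only works on package
variables, and only when use strict 'refs' is not in effect):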
$procName = 'prman';
%{$procName.'Configs'} = ( 'ShadingRate' => .25 );
# Equivalent to:
%prmanConfigs = ( 'ShadingRate' => .25 );
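Here's a runnable version of that snippet. The assumptions are worth spelling out: under use strict, symbolic references are an error unless you turn them back on with no strict 'refs', and they can only reach package variables (declared with our, or left undeclared), not my variables. The %configs hash at the end is a hypothetical alternative that gets the same run-time naming with an ordinary hash of hashes and no symbolic references at all.

```perl
use strict;
use warnings;

# Symbolic references can only reach package (global) variables.
our %prmanConfigs;

{
    no strict 'refs';   # use strict would reject the symbolic reference below
    my $procName = 'prman';
    %{$procName . 'Configs'} = ( 'ShadingRate' => .25 );   # fills %main::prmanConfigs
}

print "$prmanConfigs{'ShadingRate'}\n";   # prints 0.25

# Hypothetical alternative without symbolic references: key one hash
# by the name you would otherwise have built at run time.
my %configs;
my $procName = 'prman';
$configs{$procName}{'ShadingRate'} = .25;
print "$configs{$procName}{'ShadingRate'}\n";   # prints 0.25
```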
Already, we can see benefits over strictly compiled languages (C, C++). I
can reference variables without knowing their names ahead of time. This
means, for instance, that I can let a user tell me what variable name they
want to change without having a huge if/case block. Actually, you can't
use strings (character arrays) as case labels in C/C++, so there you're
really stuck with a huge if/else-if block.
Now, with dereferencing in mind, we can see that %{$_} and
@{$_} are simply symbolic dereferences to whatever variable
name is held in $_, the current arg, most likely from a
surrounding foreach. Note that the @{} and
%{} notations are also used for normal references, so
$_ could actually hold a reference (pointer). However, I am
accessing it with both @{$_} and %{$_}, and it would be
invalid to hash-dereference an array ref or to array-dereference a hash
ref, so I'm going to say that's not what's happening above. Instead,
let's just say that it's a symbolic dereference, and I'm just providing
the name of a variable. So suppose $_ held the string
'prmanConfig'. Then the big expression would be equivalent to:
%prmanConfig = ( $& => [@_, @prmanConfig, 'friggin'] );
Referring back to my notes on special variables, realize that @_ is
just the current array and $& is just the text matched by the last
successful pattern match. So we see that all we are really doing here is
initializing the hash %prmanConfig with a single key/value pair.
We take whatever was last matched in a regular expression and use it as a key, and the value it points to is actually a reference to an array. Recall that one of the golden rules is that [] refers to array activity. In this case, we are creating an anonymous (no variable name) array.
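For anyone who wants to watch the obfuscated line actually run, here's a sketch that sets up the pieces it assumes: $_ holding the name 'prmanConfig', a successful match to set $&, and arguments in @_. The sub name obfuscated, the match against 'ShadingRate=0.25', and the initial contents of @prmanConfig are all made up for the demo.

```perl
use strict;
use warnings;

our @prmanConfig = ('old');   # hypothetical existing contents
our %prmanConfig;

sub obfuscated {
    no strict 'refs';                    # the line relies on symbolic references
    foreach ('prmanConfig') {            # $_ holds a variable *name*
        'ShadingRate=0.25' =~ /\w+/;     # successful match: $& is 'ShadingRate'
        %{$_} = ($& => [@_, @{$_}, 'friggin']);
    }
    return;
}

&obfuscated('arg1', 'arg2');
# %prmanConfig is now ( ShadingRate => ['arg1', 'arg2', 'old', 'friggin'] )
```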
Well, that was pretty contrived, and really a useless block of code. But I thought some people may wonder what that really expanded to.