In my page about regular expressions,
I went over a way of assigning components of a regular expression to variables
by using () inside the expression:
# RIGHT - GOOD
my($basename, $sep1, $frame, $lastPart, $sep2, $ext) = $filename =~ /(.*)([\.\-])(-?\d+)(([\.\-])(\w+))?$/;

There's something very subtle to notice here that doesn't jump out at most beginners. The parentheses after my put the whole assignment in list context. A regular expression match only hands back its captured groups when it is evaluated in list context, and then it returns them as a list, one element per set of parentheses.
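To make that concrete, here is a small sketch of the same extraction run on a sample name. The filename "shot.0042.exr" is made up for illustration; it is not from the regular expressions page.

```perl
use strict;
use warnings;

# Hypothetical filename, chosen only for illustration.
my $filename = "shot.0042.exr";

# Six sets of parentheses in the pattern, six variables on the left.
my ($basename, $sep1, $frame, $lastPart, $sep2, $ext) =
    $filename =~ /(.*)([\.\-])(-?\d+)(([\.\-])(\w+))?$/;

print "basename=$basename sep1=$sep1 frame=$frame ",
      "lastPart=$lastPart sep2=$sep2 ext=$ext\n";
# prints: basename=shot sep1=. frame=0042 lastPart=.exr sep2=. ext=exr
```

Each variable lines up with one opening parenthesis in the pattern, counted from left to right.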
Suppose we're only interested in the basename. We could write an expression that looks like:

my($basename) = $filename =~ /(.*)[\.\-]-?\d+([\.\-]\w+)?$/;

Note that here, we still have 2 sets of parentheses inside the regexp. In list context, the match returns a list of the basename and (basically) the extension. However, we are assigning this to the basename only. The variable on the left will just take whatever the first list element was (in this case, the basename), and the rest of the elements (the extension) will just be ignored. So alternatively, we can actually say:
# RIGHT - GOOD
my $basename;
($basename) = $filename =~ /(.*)[\.\-]-?\d+([\.\-]\w+)?$/;
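If you want to see the extra element being thrown away, a quick sketch like this one shows it. Again, "shot.0042.exr" is just a made-up name, and @captures is a helper array I'm adding only for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical filename, chosen only for illustration.
my $filename = "shot.0042.exr";

# Collect every capture first, to see what the match hands back.
my @captures = $filename =~ /(.*)[\.\-]-?\d+([\.\-]\w+)?$/;
print scalar(@captures), " captures: @captures\n";
# prints: 2 captures: shot .exr

# One variable on the left takes the first capture; the rest is dropped.
my ($basename) = $filename =~ /(.*)[\.\-]-?\d+([\.\-]\w+)?$/;
print "basename=$basename\n";
# prints: basename=shot
```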
This is all fine and good. But it is very tempting when you only have one variable on the left to do something like:
# WRONG WRONG WRONG
my $basename = $filename =~ /(.*)[\.\-]-?\d+([\.\-]\w+)?$/;

or

# WRONG WRONG WRONG
my($basename);
$basename = $filename =~ /(.*)[\.\-]-?\d+([\.\-]\w+)?$/;

Well, in both of these the match is being assigned to a plain scalar, so it is evaluated in scalar context. In scalar context, a match does not return the captured groups at all. It only reports whether the pattern matched: true (1) if it did, false (the empty string) if it did not. Therefore, for any filename that matches, $basename will end up with the value 1. Generally, this is not what you want.
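Here is a rough side-by-side sketch of the two contexts, so you can see the difference for yourself. The names $wrong, $right, $pattern and $no_match are mine, and both sample filenames are made up for illustration.

```perl
use strict;
use warnings;

my $filename = "shot.0042.exr";   # hypothetical filename
my $pattern  = qr/(.*)[\.\-]-?\d+([\.\-]\w+)?$/;

my $wrong   = $filename =~ $pattern;   # scalar context: success flag only
my ($right) = $filename =~ $pattern;   # list context: first capture

print "wrong=$wrong right=$right\n";
# prints: wrong=1 right=shot

# A name with no frame number does not match at all.
my $no_match = "README" =~ $pattern;
print "no match gives an empty string\n" if $no_match eq "";
```

The only difference between the two assignments is the pair of parentheses around the variable on the left.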
So all I can say is: be careful. Always assign the regular expression extraction to a list, with parentheses around the variables on the left, even when there is only one variable.