Erik Tjong Kim Sang,
Jakub Zavrel,
Guy De Pauw and
Walter Daelemans
CNTS - Computational Linguistics,
University of Antwerp
http://lcg-www.uia.ac.be/~erikt/perl/
In this section we will take a look at lists and arrays. You will need these if you want to manipulate larger quantities of variables that are related, such as words, database records etc. A list is an ordered set of scalar data. This means that each data item has a fixed position, or index, in the list. An array is a type of variable that holds a list. Arrays are always written with an at sign (@) before their name, instead of the usual dollar ($) for scalar variables. Perl will allow you to have two completely unrelated variables with the same name, but one is a scalar and the other is an array (e.g. $line and @line), so beware, because this can cause a lot of confusion in your code.
Unlike many other programming languages, Perl manages the memory for your arrays, so you need not bother about it yourself. Arrays automatically grow and shrink as you put data into them or take data out. An array variable that has not yet been initialized holds the empty list. If you, for example, store something at the sixth position (index 5) of an empty array, Perl automatically enlarges the array to six places. The intervening positions, which have not been initialized yet, have the value undef.
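The automatic growth described above can be sketched as follows. This is only an illustration; the array name @places and the helper variables are made up for the example.

```perl
use strict;
use warnings;

my @places;                      # not yet initialized: the empty list
my $before = scalar @places;     # 0

$places[5] = "six";              # store something at the sixth position (index 5)
my $after = scalar @places;      # 6

# Positions 0..4 were never assigned, so they read back as undef
my @undefined = grep { !defined $places[$_] } 0 .. $#places;

print "size before: $before, size after: $after\n";   # size before: 0, size after: 6
print "undefined positions: @undefined\n";            # undefined positions: 0 1 2 3 4
```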
A list literal is written as comma-separated values between parentheses:

    @a = ();                             # empty list
    @b = (1,2,3);                        # three numbers
    @c = ("Jan","Piet","Marie");         # three strings
    @d = ("Dirk",1.92,46,"20-03-1977");  # a mixed list

Variables and sublists are interpolated in a list. So, assuming $a=1, you can also write the above lists as follows (the empty list disappears as a list member after interpolation):

    @b = ($a,$a+1,$a+2);                    # variable interpolation
    @c = ("Jan",("Piet","Marie"));          # list interpolation
    @d = ("Dirk",1.92,46,(),"20-03-1977");  # empty list interpolation
    @e = ( @b, @c );                        # same as (1,2,3,"Jan","Piet","Marie")

There are a number of practical construction operators for lists:
    @x = (1..6);           # same as (1, 2, 3, 4, 5, 6)
    @y = (1.2..4.2);       # same as (1, 2, 3, 4): fractional end points are truncated to integers
    @z = (2..5,8,11..13);  # same as (2,3,4,5,8,11,12,13)
The split function cuts a string into a list of fields, using a regular expression as the separator:

    $string = "Jan Piet\nMarie \tDirk";
    @list = split /\s+/, $string;      # yields ( "Jan","Piet","Marie","Dirk" )

    $string = " Jan Piet\nMarie \tDirk\n";
    # watch out, empty string at the beginning!
    # (empty fields at the end are dropped by default)
    @list = split /\s+/, $string;      # yields ( "", "Jan","Piet","Marie","Dirk" )

    $string = "Jan:Piet;Marie---Dirk"; # use any regular expression...
    @list = split /[:;]|---/, $string; # yields ( "Jan","Piet","Marie","Dirk" )

    $string = "Jan Piet";              # use an empty regular expression to split into characters
    @letters = split //, $string;      # yields ( "J","a","n"," ","P","i","e","t" )

This will turn out to be very useful for processing lines of text and their text fields using all kinds of field separators. split can do a lot of common parsing tasks for us. Those of you who are familiar with the language AWK know how practical this is.
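When a string may start with whitespace, one way to avoid the empty first field is a special case of split: a separator given as the single-space string ' ' skips leading whitespace before splitting. The following is a small sketch of the difference; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $string = " Jan Piet\nMarie \tDirk\n";

# A regular expression separator keeps the empty leading field
my @with_regex = split /\s+/, $string;   # ("", "Jan", "Piet", "Marie", "Dirk")

# The string ' ' is a special case: leading whitespace is skipped first
my @with_space = split ' ', $string;     # ("Jan", "Piet", "Marie", "Dirk")

print scalar(@with_regex), " fields with a regular expression\n";   # 5 fields
print scalar(@with_space), " fields with a single space\n";         # 4 fields
```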
Not only can an array hold a list of scalars, it can also give you access to each of the elements individually through their position in the list. For this we use the subscripting operator "[]". Numbering starts at zero, so that $array[0] gives the first item in the list @array. Negative numbers count backwards from the last item in the list (i.e. $array[-1] refers to the last item). You can manipulate list items directly, e.g. $array[0] = 5 or $array[0]++. Note that we use a dollar for this, because we are pointing to a single scalar value. We can also access selections of a list by a list of indices, such as @array[1,5,3], or @array[1..4]. This is called a slice. Now we do not use a dollar, because the result is itself a list.
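The subscripts and slices described above could look like this in practice. The six-item list is made up for the example so that index 5 exists.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk", "evelien", "frank");

my $first = $array[0];          # first item: "an"
my $last  = $array[-1];         # last item: "frank"

$array[0] = "anna";             # replace a single item in place

my @picked = @array[1, 5, 3];   # slice by a list of indices
my @range  = @array[1 .. 4];    # slice by a range of indices

print "$first $last\n";         # prints: an frank
print "@picked\n";              # prints: bert frank dirk
print "@range\n";               # prints: bert cindy dirk evelien
```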
There are more dualities between lists and scalars, as shown by the following example:
    @array = ("an","bert","cindy","dirk");
    $length = @array;   # $length now has the value 4

Because @array is interpreted in a scalar context instead of a list context, it behaves differently: it yields the number of items in the array. As Larry Wall et al. say in Programming Perl: "You will be miserable until you learn the difference between scalar and list context...". Read more about it in a reference book! (Note that you can always force a scalar interpretation with the scalar() function.)
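To make the difference between the two contexts concrete, here is a short sketch. It assumes nothing beyond core Perl; the variable names are illustrative.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

my $count   = @array;     # scalar context: the number of items
my ($first) = @array;     # list context (note the parentheses): the first item

# print supplies list context to its arguments,
# so scalar() is needed here to get the count
print "items: ", scalar(@array), "\n";   # prints: items: 4
print "first: $first\n";                 # prints: first: an
```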
Another way to learn about the number of items in a list is through the special notation $#array. This gives the index value of the last element of @array. Thus:
    @array = ("an","bert","cindy","dirk");
    $length = @array;
    print $length;           # prints 4
    print $#array;           # prints 3
    print $array[$#array];   # prints "dirk"

You can also assign to lists (i.e. use lists of things on the left-hand side of an assignment). For example:
    ($a, $b) = ("one","two");
    ($onething, @manythings) = (1,2,3,4,5,6);       # now $onething equals 1
                                                    # and @manythings = (2,3,4,5,6)
    ($array[0],$array[1]) = ($array[1],$array[0]);  # swap the first two

Note that an assignment first evaluates its right-hand side and then makes a copy of the result of that evaluation. This holds for arrays too, even extremely large ones, so later you will need ways to work around this. So, after the following statements:
    @array = ("an","bert","cindy","dirk");
    @copyarray = @array;      # makes a copy
    $copyarray[2] = "XXXXX";

@array will still hold the original list, whereas @copyarray will be changed to ("an","bert","XXXXX","dirk").
In this section we will introduce a number of important functions for manipulating lists. Perl has many more functions than we can list here; the fastest way to find out about them is to read through the function section of Programming Perl and look at anything you don't recognize that sounds interesting. And keep the Perl philosophy in mind: "There is more than one way to do it" (TM).
push ARRAY LIST appends the list to the end of the array. If the second argument is a scalar rather than a list, it is appended as the last item of the array. The array grows as needed. For example:

    @array = ("an","bert","cindy","dirk");
    @brray = ("evelien","frank");
    push @array, @brray;     # @array is ("an","bert","cindy","dirk","evelien","frank")
    push @brray, "gerben";   # @brray is ("evelien","frank","gerben")
pop ARRAY does the opposite of push. It removes the last item of its argument array and returns it. If the array is empty it returns undef.

    @array = ("an","bert","cindy","dirk");
    $item = pop @array;      # $item is "dirk" and @array is ("an","bert","cindy")
shift ARRAY works on the left end of the array, but is otherwise the same as pop.
unshift ARRAY LIST puts items on the left side of the array, just as push does for the right side.
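A brief sketch of shift and unshift, the two functions that operate on the left end of an array. The names added to the list are made up for the example.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

# shift removes the first item and returns it
my $item = shift @array;          # $item is "an"
print "$item | @array\n";         # prints: an | bert cindy dirk

# unshift puts a whole list in front, keeping the order of that list
unshift @array, "zoe", "yves";
print "@array\n";                 # prints: zoe yves bert cindy dirk
```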
splice ARRAY POS NUMBER LIST lets you remove or replace entire ranges of items. splice removes number elements from the array, starting at position pos (that is, the elements from pos to pos+number-1), and returns them in a list. If the LIST argument is provided, the removed range is replaced with the list. The array shrinks or grows as necessary.

    @array = ("an","bert","cindy","dirk");
    @cut = splice @array, 1, 2;          # @cut is ("bert","cindy") and @array is ("an","dirk")

    @array = ("an","bert","cindy","dirk");
    @brray = ("evelien","frank");
    @cut = splice @array, 1, 2, @brray;  # @cut is still ("bert","cindy")
                                         # but @array is now ("an","evelien","frank","dirk")
reverse LIST reverses the order of the elements of its argument, returning the resulting list. Note that it works on a copy and does not change the argument itself.
sort LIST sorts the elements of its argument in string comparison order (alphabetical order for plain lowercase words), returning the resulting list. This function also works on a copy and does no harm to the original.
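The following sketch shows reverse and sort at work. It also uses a comparison block, { $a <=> $b }, which is not covered in the text above: it is included here only to illustrate that the default string order is usually not what you want for numbers.

```perl
use strict;
use warnings;

my @names   = ("dirk", "an", "cindy", "bert");
my @numbers = (10, 9, 100, 1);

my @sorted_names = sort @names;                  # default: string comparison
my @backwards    = reverse @sorted_names;
my @as_strings   = sort @numbers;                # numbers compared as strings
my @as_numbers   = sort { $a <=> $b } @numbers;  # numeric comparison block

print "@sorted_names\n";   # prints: an bert cindy dirk
print "@backwards\n";      # prints: dirk cindy bert an
print "@as_strings\n";     # prints: 1 10 100 9
print "@as_numbers\n";     # prints: 1 9 10 100
print "@names\n";          # prints: dirk an cindy bert (the original is untouched)
```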
If we want to represent the contents of an array as a string, or print it, we have several methods at our disposal.
Array variables are interpolated in double quoted strings just like normal scalar variables. The following code:
    @array = ("an","bert","cindy","dirk");
    print "The array contains $array[0] $array[1] $array[2] $array[3]";

prints:

    The array contains an bert cindy dirk

We can also directly interpolate the whole array:

    print "The array contains @array";

This gives the same result as above (array items are automatically separated by spaces). However, we can also glue the items in a list together with other separators using the function join STRING LIST. For example:

    $string = join ":", @array;   # $string now has the value "an:bert:cindy:dirk"

Note that the glue string is not a regular expression, but a normal string of zero or more characters. If you need to get glue ahead of every item instead of just between items, a simple cheat suffices:

    $string = join "+", "", @array;   # $string now has the value "+an+bert+cindy+dirk"

Here, the extra "" is treated as an empty element, to be glued together with the first data element of @array. This change results in glue ahead of every element. Similarly, you can get trailing glue with an empty element at the end of the list, like this:

    $string = join "\n", @array, "";   # $string now has the value "an\nbert\ncindy\ndirk\n"
It often occurs that we want to perform some operation on all items in an array or list. Knowing that we can access each item by its index, the most straightforward, but not necessarily most elegant method, is to make a for loop over the array:
    for( $i=0 ; $i<=$#array; $i++){
        $item = $array[$i];
        $item =~ tr/a-z/A-Z/;
        print "$item ";
    }

However, there is a special type of loop over lists, the foreach loop, and it makes the notation a bit more concise:

    foreach $item (@array){
        $item =~ tr/a-z/A-Z/;
        print "$item ";   # prints an uppercased version of each item
    }

Since the iterator variable ($item) is an alias for the original element in the array, rather than a copy of it, we must take into account that this loop changes the original array. This can be avoided by making a temporary copy first.
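The effect of the foreach iterator on the original array, and the copy that avoids it, can be sketched as follows. The names @upper and $copy are only illustrative.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

# Safe variant: change a copy of each item, leave @array alone
my @upper;
foreach my $item (@array) {
    my $copy = $item;
    $copy =~ tr/a-z/A-Z/;
    push @upper, $copy;
}
print "@upper\n";    # prints: AN BERT CINDY DIRK
print "@array\n";    # prints: an bert cindy dirk

# In-place variant: $item stands for the array element itself
foreach my $item (@array) {
    $item =~ tr/a-z/A-Z/;
}
print "@array\n";    # prints: AN BERT CINDY DIRK
```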
Two more specialized iteration constructs should also be mentioned: grep, and map.
grep CONDITION LIST returns a list of all items from LIST that satisfy the condition. For example:

    @large = grep $_ > 10, (1,2,4,8,16,25);   # returns (16,25)
    @i_names = grep /i/, @array;              # returns ("cindy","dirk")

map OPERATION LIST is a generalization of grep: it performs an arbitrary operation on each element of a list and returns the list of results. For example:

    @more = map $_ + 3, (1,2,4,8,16,25);       # returns (4,5,7,11,19,28)
    @initials = map substr($_,0,1), @array;    # returns ("a","b","c","d")
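grep and map can also be written with a block in braces instead of a bare expression, and they can be chained because each returns a list. The following is a sketch of that style, not the only way to write it; the variable names are illustrative.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

# First keep the names containing an "i", then take their uppercased initials
my @i_initials = map { uc substr($_, 0, 1) } grep { /i/ } @array;

# The length of every name
my @lengths = map { length $_ } @array;

print "@i_initials\n";   # prints: C D
print "@lengths\n";      # prints: 2 4 5 4
```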
Suppose we have a text file containing people's personalia, addresses, and programming skills:
    Bert:Perlemand:12-10-1953:Bladersteeg 1a:3581 XE:Utrecht:Smalltalk:+31-30-4565738
    Evelien:Nieuwkluijs:20-03-1977:Albertheijnlaan 75:5036 EE:Tilburg:Basic,Java:+31-13-5354622
    Cindy:Thompson:23-05-1969:Keizerlei 203:2000:Antwerpen:Perl,C++,Python:+32-3-2781256
    Dirk:Diggler:01-04-1961:Sint-Krispijnstraat 7:8900:Ieper:Java,AWK,Perl:+32-57-229440
    Frank:Schillebeeckx:12-02-1970:Ceciliastraat 13:2800:Mechelen:C++,Java,C,AWK,Perl,Python,PHP,Basic:+32-7-9052782
    An:De Wilde:25-02-1975:Van Ostadestraat 182 III:1021 CF:Amsterdam:Visual Basic:+31-20-6777871

and we want to print an alphabetically sorted list of the names, phone numbers, and number of programming languages of those people who are competent Perl programmers. A first analysis of the task reveals that:
    #!/usr/local/bin/perl
    # (the line above is only needed on Unix systems)
    # example.4.6: perform task for programming example 4.6
    # usage: example.4.6
    # 2000-22-02 zavrel@uia.ua.ac.be

    # read all lines in the input
    while(defined($line = <>)){
        # cut off the newline
        chomp $line;
        # and put the fields in an array
        @fields = split /:/, $line;
        # look at the programming field
        # (remember, it starts at zero!)
        $programmingskills = $fields[6];
        if($programmingskills =~ /Perl/){
            # compute the number of languages
            @languages = split /\,/ , $programmingskills;
            $number = @languages;
            # put the first, second and last field together in the original format
            $name_phone_number = join ":", @fields[0,1,$#fields], $number;
            push @selection, $name_phone_number;
        }
    }
    print map $_ .= "\n", sort @selection;
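For experimenting without an input file, the same steps can be wrapped in a subroutine that works on a list of lines held in memory. This is a sketch under stated assumptions: the subroutine name select_perl_programmers is hypothetical, and the three records are copied from the sample data above. Note that Bert's surname contains "Perl", which is why the test looks at the programming field only.

```perl
use strict;
use warnings;

# Hypothetical helper: takes record lines and returns the sorted
# "firstname:lastname:phone:count" strings of the Perl programmers
sub select_perl_programmers {
    my @lines = @_;
    my @selection;
    foreach my $line (@lines) {
        chomp $line;
        my @fields = split /:/, $line;
        my $skills = $fields[6];                  # the programming field
        next unless defined $skills && $skills =~ /Perl/;
        my @languages = split /,/, $skills;       # one item per language
        push @selection, join ":", @fields[0, 1, $#fields], scalar @languages;
    }
    return sort @selection;
}

my @records = (
    "Bert:Perlemand:12-10-1953:Bladersteeg 1a:3581 XE:Utrecht:Smalltalk:+31-30-4565738",
    "Dirk:Diggler:01-04-1961:Sint-Krispijnstraat 7:8900:Ieper:Java,AWK,Perl:+32-57-229440",
    "Cindy:Thompson:23-05-1969:Keizerlei 203:2000:Antwerpen:Perl,C++,Python:+32-3-2781256",
);

my @result = select_perl_programmers(@records);
print "$_\n" for @result;
# prints:
# Cindy:Thompson:+32-3-2781256:3
# Dirk:Diggler:+32-57-229440:3
```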