A Shortcut to Perl

Erik Tjong Kim Sang, Jakub Zavrel, Guy De Pauw and Walter Daelemans
CNTS - Computational Linguistics, University of Antwerp
http://lcg-www.uia.ac.be/~erikt/perl/


This text is part of the lecture notes for a Perl course taught at CNTS - Computational Linguistics at the University of Antwerp.

4. Lists and arrays

In this section we will take a look at lists and arrays. You will need these if you want to manipulate larger collections of related values, such as words or database records. A list is an ordered set of scalar data. This means that each data item has a fixed position, or index, in the list. An array is a type of variable that holds a list. Arrays are always written with an at sign (@) before their name, instead of the usual dollar sign ($) for scalar variables. Perl allows you to have two completely unrelated variables with the same name, where one is a scalar and the other is an array (e.g. $line and @line). Beware, because this can cause a lot of confusion in your code.

Unlike many other programming languages, Perl takes care of the memory for your arrays, so you need not bother about it yourself. Arrays automatically grow and shrink as you put data into them. An array variable that has not yet been initialized holds the empty list. If you, for example, assign something to the sixth position of an empty array (index 5, since numbering starts at zero), Perl automatically enlarges the array to six places. The intervening positions that were not yet initialized have the value undef.
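
The automatic growth described above can be tried out with a small sketch. The variable names are our own, chosen only for illustration:

```perl
use strict; use warnings;   # optional safety checks; they require "my" declarations

my @words;                        # not yet initialized: the empty list
my $before = scalar(@words);      # number of items: 0

$words[5] = "sixth";              # assign to the sixth position (index 5)
my $after = scalar(@words);       # the array now has 6 places

# The positions 0..4 exist but were never filled, so they are undef.
my $first_is_defined = defined($words[0]) ? "yes" : "no";

print "before: $before, after: $after, first defined: $first_is_defined\n";
# prints: before: 0, after: 6, first defined: no
```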

4.1. How to write a list?

Lists are represented in your program by a comma-separated list enclosed in parentheses. The empty list is represented by an empty pair of parentheses. For example, the following expressions are lists:

@a = ();                             # empty list
@b = (1,2,3);                        # three numbers
@c = ("Jan","Piet","Marie");         # three strings
@d = ("Dirk",1.92,46,"20-03-1977");  # a mixed list
Variables and sublists are interpolated in a list. So, assuming $a=1, you can also write the above lists as follows (the empty list disappears as a list member after interpolation):
@b = ($a,$a+1,$a+2);                    # variable interpolation
@c = ("Jan",("Piet","Marie"));          # list interpolation
@d = ("Dirk",1.92,46,(),"20-03-1977");  # empty list interpolation
@e = ( @b, @c );                        # same as (1,2,3,"Jan","Piet","Marie") 
There are a number of practical construction operators for lists:
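
Two construction operators that standard Perl offers are the range operator (..) and the quote-word operator qw. That these are the operators meant here is an assumption on our part. A minimal sketch:

```perl
use strict; use warnings;   # optional safety checks; they require "my" declarations

my @digits  = (1 .. 5);            # same as (1,2,3,4,5)
my @letters = ("a" .. "e");        # same as ("a","b","c","d","e")
my @names   = qw(Jan Piet Marie);  # same as ("Jan","Piet","Marie")

print "@digits\n";    # prints 1 2 3 4 5
print "@letters\n";   # prints a b c d e
print "@names\n";     # prints Jan Piet Marie
```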

4.2. More about array variables

Not only can an array hold a list of scalars, it can also give you access to each of the elements individually through their position in the list. For this we use the subscripting operator "[]". Numbering starts at zero, so that $array[0] gives the first item in the list @array. Negative numbers count backwards from the last item in the list (i.e. $array[-1] refers to the last item). You can manipulate list items directly, e.g. $array[0] = 5 or $array[0]++. Note that we use a dollar for this, because we are pointing to a single scalar value. We can also access selections of a list by a list of indices, such as @array[1,5,3], or @array[1..4]. This is called a slice. Now we do not use a dollar, because the result is itself a list.
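
The following sketch puts subscripts and slices side by side. The list is extended with two made-up names so that the indices used in the text above all exist:

```perl
use strict; use warnings;   # optional safety checks; they require "my" declarations

my @array = ("an", "bert", "cindy", "dirk", "eva", "frank");

my $first = $array[0];         # "an": numbering starts at zero
my $last  = $array[-1];        # "frank": negative indices count from the end

$array[0] = "anna";            # change a single item in place

my @picked = @array[1,5,3];    # slice by a list of indices
my @range  = @array[1..4];     # slice by a range of indices

print "$first $last\n";        # prints an frank
print "@picked\n";             # prints bert frank dirk
print "@range\n";              # prints bert cindy dirk eva
```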

There are more dualities between lists and scalars, as shown by the following example:

@array  = ("an","bert","cindy","dirk");
$length = @array;             # $length now has the value 4
Because @array is interpreted in a scalar context instead of a list context, it behaves differently. In fact, it represents the number of items in the array. As Larry Wall et al. say in Programming Perl: "You will be miserable until you learn the difference between scalar and list context...". Read more about it in a reference book! (Note that you can always force a scalar interpretation with the scalar() function.)
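
As a small illustration of the two contexts (a sketch with variable names of our own), compare what the same array yields in each of them:

```perl
use strict; use warnings;   # optional safety checks; they require "my" declarations

my @array = ("an", "bert", "cindy", "dirk");

my $count   = @array;          # scalar context: the number of items, 4
my ($first) = @array;          # list context: the first item, "an"
my $forced  = scalar(@array);  # explicitly forced scalar context: 4

print "Items: ", scalar(@array), "\n";   # prints Items: 4
print "Items: ", @array, "\n";           # list context: prints Items: anbertcindydirk
```

The parentheses around $first make the left-hand side a list, which is what puts @array in list context there.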

Another way to learn about the number of items in a list is through the special notation $#array. This gives the index value of the last element of @array. Thus:

@array  = ("an","bert","cindy","dirk");
$length = @array;
print $length;                # prints 4
print $#array;                # prints 3
print $array[$#array];        # prints "dirk"
You can also assign to lists (i.e. use lists of things on the left-hand side of an assignment). For example:
($a, $b) = ("one","two");
($onething, @manythings) = (1,2,3,4,5,6); # now $onething equals 1
                                          # and @manythings = (2,3,4,5,6)
($array[0],$array[1]) = ($array[1],$array[0]); # swap the first two
Note that an assignment first evaluates the right-hand side of the expression and then stores a copy of the result in the variable on the left. This also holds for arrays: assigning one array to another copies every item, even if the array is extremely large (later you will need ways to work around this). So, after the following statements:
@array  = ("an","bert","cindy","dirk");
@copyarray = @array;         # makes a copy
$copyarray[2] = "XXXXX";
@array will still hold the original list, whereas @copyarray will be changed to ("an","bert","XXXXX","dirk").

4.3. Manipulating lists and their elements

In this section we will introduce a number of important functions for manipulating lists. Yes, Perl has a lot of functions. We are not going to list all of them here, because the fastest way to find out about them is to read through the function section of Programming Perl and look at anything you don't recognize that sounds interesting. And keep the Perl philosophy in mind: "There is more than one way to do it" (TM).
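
As a starting point, here is a sketch of a few built-in list functions that are standard Perl: push, pop, shift, unshift, reverse and sort. This selection is our own and is not meant as a complete overview:

```perl
use strict; use warnings;   # optional safety checks; they require "my" declarations

my @array = ("bert", "cindy");

push    @array, "dirk";           # add at the end:        bert cindy dirk
unshift @array, "an";             # add at the front:      an bert cindy dirk
my $last  = pop   @array;         # remove from the end:   "dirk"
my $first = shift @array;         # remove from the front: "an"

my @backwards = reverse @array;   # ("cindy","bert"); @array itself is unchanged

my @names  = ("piet", "jan", "marie");
my @sorted = sort @names;         # ("jan","marie","piet"); @names is unchanged

print "@array\n";       # prints bert cindy
print "@backwards\n";   # prints cindy bert
print "@sorted\n";      # prints jan marie piet
```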

4.4. Converting lists to strings

If we want to represent the contents of an array as a string, or print it, we have several methods at our disposal.

Array variables are interpolated in double quoted strings just like normal scalar variables. The following code:

@array = ("an","bert","cindy","dirk");
print "The array contains $array[0] $array[1] $array[2] $array[3]";
prints:
The array contains an bert cindy dirk
We can also directly interpolate the whole array:
print "The array contains @array";
This gives the same result as above (array items are automatically separated by spaces). However, we can also glue the items in a list together with other separators using the function join STRING, LIST. For example:
$string = join ":", @array;  # $string now has the value "an:bert:cindy:dirk"
Note that the glue string is not a regular expression, but a normal string of zero or more characters. If you need to get glue ahead of every item instead of just between items, a simple cheat suffices:
$string = join "+", "", @array; # $string now has the value "+an+bert+cindy+dirk"
Here, the extra "" is treated as an empty element, to be glued together with the first data element of @array. This change results in glue ahead of every element. Similarly, you can get trailing glue with an empty element at the end of the list, like this:
$string = join "\n", @array, ""; # $string now has the value "an\nbert\ncindy\ndirk\n"

4.5. Iteration over lists

It often occurs that we want to perform some operation on all items in an array or list. Knowing that we can access each item by its index, the most straightforward, but not necessarily most elegant method, is to make a for loop over the array:

for( $i=0 ; $i<=$#array; $i++){
   $item = $array[$i];
   $item =~ tr/a-z/A-Z/;
   print "$item ";
}
However, there is a special type of loop over lists, the foreach loop, and it makes the notation a bit more concise:
foreach $item (@array){
   $item =~ tr/a-z/A-Z/;
   print "$item ";        # prints an uppercase version of each item
}
Since the iterator variable ($item) is an alias for the original item in the array rather than a copy of it, we must take into account that this loop changes the original array. This can be avoided by looping over a temporary copy of the array.
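
The difference between looping over a copy and looping over the array itself can be seen in this sketch (the name @copy is our own):

```perl
use strict; use warnings;   # optional safety checks; they require "my" declarations

my @array = ("an", "bert");

# Loop over a temporary copy: @array keeps its original contents.
my @copy = @array;
foreach my $item (@copy) {
    $item =~ tr/a-z/A-Z/;
}
print "@array\n";   # prints an bert
print "@copy\n";    # prints AN BERT

# Loop over the array itself: changing $item now changes @array.
foreach my $item (@array) {
    $item =~ tr/a-z/A-Z/;
}
print "@array\n";   # prints AN BERT
```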

Two more specialized iteration constructs should also be mentioned: grep, and map.
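
A minimal sketch of both constructs follows. Inside the braces, the special variable $_ holds the current item. The example lists are our own:

```perl
use strict; use warnings;   # optional safety checks; they require "my" declarations

my @array = ("an", "bert", "cindy", "dirk");

# grep returns the items for which the test is true.
my @with_i = grep { /i/ } @array;          # ("cindy","dirk")

# map returns a new list built from the result of the expression for each item.
my @lengths = map { length($_) } @array;   # (2,4,5,4)
my @upper   = map { uc($_) } @array;       # ("AN","BERT","CINDY","DIRK")

print "@with_i\n";    # prints cindy dirk
print "@lengths\n";   # prints 2 4 5 4
print "@upper\n";     # prints AN BERT CINDY DIRK
```

Neither call changes @array here, because the blocks only read $_ and do not assign to it.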

4.6. Programming example

Suppose we have a text file containing people's personalia, addresses, and programming skills:

Bert:Perlemand:12-10-1953:Bladersteeg 1a:3581 XE:Utrecht:Smalltalk:+31-30-4565738
Evelien:Nieuwkluijs:20-03-1977:Albertheijnlaan 75:5036 EE:Tilburg:Basic,Java:+31-13-5354622
Cindy:Thompson:23-05-1969:Keizerlei 203:2000:Antwerpen:Perl,C++,Python:+32-3-2781256
Dirk:Diggler:01-04-1961:Sint-Krispijnstraat 7:8900:Ieper:Java,AWK,Perl:+32-57-229440
Frank:Schillebeeckx:12-02-1970:Ceciliastraat 13:2800:Mechelen:C++,Java,C,AWK,Perl,Python,PHP,Basic:+32-7-9052782
An:De Wilde:25-02-1975:Van Ostadestraat 182 III:1021 CF:Amsterdam:Visual Basic:+31-20-6777871
and we want to print an alphabetically sorted list of the names, phone numbers, and number of programming languages of those people who are competent Perl programmers. A first analysis of the task reveals that:
  1. we must read all lines into our program before we can sort anything.
  2. each line contains one record, whose fields are separated by colons (":").
  3. we must select only those lines which contain Perl in the programming skills field (the seventh field).
  4. we must count the number of items in the seventh field.
  5. we must reduce the information in the records to the name, phone number, and the above count.
  6. the remaining information must be sorted and printed.
Now we can convert the specification to a real Perl program:
#!/usr/local/bin/perl
# (the line above is only needed on Unix systems)

# example.4.6: perform task for programming example 4.6
# usage: example.4.6
# 2000-22-02 zavrel@uia.ua.ac.be

# read all lines in the input
#
while(defined($line = <>)){                 
    
    # cut off the newline
    #
    chomp $line;
    
    # and put the fields in an array
    #
    @fields = split /:/, $line;             

    # look at the programming field
    # (remember, it starts at zero!)
    # 
    $programmingskills = $fields[6];         
    
    if($programmingskills =~ /Perl/){
	
        # compute the number of languages
        #
        @languages = split /,/, $programmingskills;
        $number = @languages;

        # put the first, second and last field together in the original format
        #
        $name_phone_number = join ":", @fields[0,1,$#fields], $number;
        push @selection, $name_phone_number;
    }
}

print map "$_\n", sort @selection;
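
To check the logic without an input file, the same steps can be run on a few of the sample records placed directly in a list. This is only a sketch: the array @lines and the variable $output are our own additions, and the test for an exact "Perl" entry is a variation on the pattern match used above (which would also accept any language name that merely contains "Perl").

```perl
use strict; use warnings;   # optional safety checks; they require "my" declarations

# Three of the sample records from above, in place of a real input file.
my @lines = (
    "Bert:Perlemand:12-10-1953:Bladersteeg 1a:3581 XE:Utrecht:Smalltalk:+31-30-4565738",
    "Frank:Schillebeeckx:12-02-1970:Ceciliastraat 13:2800:Mechelen:C++,Java,C,AWK,Perl,Python,PHP,Basic:+32-7-9052782",
    "Cindy:Thompson:23-05-1969:Keizerlei 203:2000:Antwerpen:Perl,C++,Python:+32-3-2781256",
);

my @selection;
foreach my $line (@lines) {
    my @fields    = split /:/, $line;
    my @languages = split /,/, $fields[6];

    # keep the record only if one of the languages is exactly "Perl"
    next unless grep { $_ eq "Perl" } @languages;

    # name, phone number and number of languages, in the original format
    my $number = @languages;
    push @selection, join ":", @fields[0,1,$#fields], $number;
}

my $output = join "", map "$_\n", sort @selection;
print $output;
# prints:
# Cindy:Thompson:+32-3-2781256:3
# Frank:Schillebeeckx:+32-7-9052782:8
```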


Last update: February 24, 2000. zavrel@uia.ua.ac.be