
Perl 2007: Lesson 5

This text is part of the lecture notes for a programming course taught at University of Tilburg, The Netherlands.

5. Subroutines

This section presents subroutines and related concepts such as variable scope and program arguments.

5.1. The basic facts

Subroutines are named blocks of code. The fact that they have a name enables us to execute their body of code from anywhere in the program by calling their name. Calls to subroutines can often be recognized by the special character & in front of the subroutine name (the & may be omitted when the call is written with parentheses). Here is an example:

   sub askForInput {
      print "Please enter something: ";
   }
   # other code is inserted here
   &askForInput();

When the subroutine askForInput is called at the end of the program, its body will be executed and the request for input will be printed. Note that the subroutine body will only be executed when the subroutine is called. So if the subroutine is located at the start of the program, the code in the body will just be skipped when the program is read.
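
The order of events described above can be made visible with a small sketch. The names @log, greet and later are made up for this illustration; the program only records which statements run, and in which order:

```perl
use strict;
use warnings;

my @log;                       # records the order of events

push(@log, "start");
sub greet {                    # definition: the body is not run here
   push(@log, "greet called");
}
push(@log, "after definition");
&greet();                      # the body of greet runs now
&later();                      # works although later is defined further down
push(@log, "end");

sub later {
   push(@log, "later called");
}

print join("\n", @log), "\n";
```

The lines are printed in the order start, after definition, greet called, later called, end. The position of a subroutine definition in the file therefore does not determine when its body runs.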

Subroutines will perform small and useful tasks. After you have been programming for some months, you will have written many subroutines. Some of these will be used in different programs. You do not want to have to copy a subroutine to a program file every time you need it. There is a convenient work-around for this: put related subroutines in a file and include the file with the command require:

   # files with subroutines are stored here
   use lib "/home/wwwppl/erikt/perllib"; # directory
   # we will use this file
   require "nlp.pm"; # file: /home/wwwppl/erikt/perllib/nlp.pm

Here we start by defining the directory in which the files with subroutines are stored. Then we read the file we want to use. Files included like this are called packages, modules or libraries. It is customary to give them a name with extension .pm. When the file name is given as a quoted string, as in this example, the extension must be included; when the name is given as a bare word (require nlp;), Perl adds .pm itself. In both cases we are including the file nlp.pm. When you write your own packages, you should be aware of the fact that Perl requires packages to return a true value. This can be enforced by letting the packages end with a one followed by a semicolon (1;):

   # example .pm file with one subroutine
   sub askForInput {
      print "Please enter something: ";
   }
   # to avoid "did not return a true value" error line
   1;

5.2. Variable scope

Suppose we have written a program with one subroutine. A variable $a is used both in the subroutine and in the main part of the program. Here is an overview of the program:

   $a = 0;
   print "$a\n";
   sub changeA { $a = 1; }
   print "$a\n";
   &changeA();
   print "$a\n";

The value of $a will be printed three times and the question is what values will be printed. The first time the value is printed, the value 0 has been put in $a so this value will be printed. This will also happen the second time because the body of the subroutine definition will be skipped. The third time, the value 1 will be printed because $a has been changed by the subroutine call on the previous line. In this example, the variable $a can be accessed throughout the program: both in the main part and in the subroutine. We say that $a is a global variable.

Suppose that we have written a subroutine which is included in some package. This subroutine may modify many variables. When we use this subroutine in some large program, we do not want to have to look inside the subroutine to see which variable names might clash with the variables of the main program. We want the subroutine to hide its variables from the rest of the program. This can be done by declaring the variables with my:

   my $a = 0;
   print "$a\n";
   sub changeA { my $a = 1; }
   print "$a\n";
   &changeA();
   print "$a\n";

Because we have put my before the first use of $a in the subroutine, changeA obtained its own variable $a which is not related to the variable used in the main program. This means that the value of the main program variable will not be changed and the program will print 0 three times. The my construct influences the scope of a variable: the part of the program in which it can be used. It restricts the scope to the part starting after the variable definition and ending at the end of the enclosing block, here the body of the subroutine in which the variable is defined. A variable declared with my outside any block can be used up to the end of the file.
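
The following sketch puts a hidden and a shared variable side by side. The names $count, localCopy and sharedCopy are made up for this illustration:

```perl
use strict;
use warnings;

my $count = 10;                # visible in the rest of the file

sub localCopy {
   my $count = 1;              # a new variable that hides the outer $count
   $count++;
   return($count);
}

sub sharedCopy {
   $count++;                   # no my: this is the outer $count
   return($count);
}

my $inner = &localCopy();      # 2, the outer $count is still 10
my $outer = &sharedCopy();     # 11, the outer $count has changed
{
   my $count = 99;             # only exists inside this block
}
print "$inner $outer $count\n";   # prints: 2 11 11
```

Only sharedCopy changes the variable of the main program. The variable declared inside the bare block disappears at the closing brace, so the final print shows the outer $count again.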

5.3. Communication between subroutines and programs

Subroutines communicate with other parts of programs by exchanging variable values. Input of a subroutine can be specified by providing the input as arguments of the subroutine call, for example: &doSomething(2,"a",$abc). A peculiarity of Perl subroutines is that they convert their input variables to a flat list. This means that &doSomething((2,"a"),$abc) will result in the same as the earlier example. Inside the subroutine the argument values can be accessed via the special list @_. So the first argument (here 2) will be put in $_[0], the second (here "a") in $_[1] and the third (here $abc) in $_[2].
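
A short sketch of this flattening, with the hypothetical subroutines countArgs and showArgs that only report what they find in @_:

```perl
use strict;
use warnings;

sub countArgs {
   my $n = @_;                 # an array in scalar context gives its number of elements
   return($n);
}
sub showArgs {
   return(join("|", @_));      # all arguments as one string
}

my @pair = (2, "a");
my $abc = "xyz";
print &countArgs(2, "a", $abc), "\n";     # 3
print &countArgs((2, "a"), $abc), "\n";   # 3
print &countArgs(@pair, $abc), "\n";      # 3
print &showArgs(@pair, $abc), "\n";       # 2|a|xyz
```

In all three calls the subroutine sees three separate values. It cannot tell whether the first two were written as a list, stored in an array or given as separate arguments.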

When a variable is used as argument of a subroutine, then a modification of the @_ location corresponding with the argument will result in a modification of the variable. So, if the subroutine doSomething modifies $_[2], after the example call in the previous paragraph, then $abc will be modified. It is more difficult to globally change the contents of list arguments. This requires using references which we will not deal with here (see chapter 4 of the book Programming Perl).
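
The difference between writing to @_ and working on a copy can be sketched as follows. The subroutine names doubleFirst and doubleCopy are made up for this illustration:

```perl
use strict;
use warnings;

sub doubleFirst {
   $_[0] = $_[0] * 2;          # writes through to the variable of the caller
}
sub doubleCopy {
   my ($n) = @_;               # copy of the first argument: changes stay local
   $n = $n * 2;
   return($n);
}

my $abc = 5;
&doubleFirst($abc);
print "$abc\n";                # 10
my $result = &doubleCopy($abc);
print "$abc $result\n";        # 10 20
```

Copying the arguments into my variables at the start of a subroutine, as doubleCopy does, is a common way to avoid changing the variables of the caller by accident.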

Just like a subroutine uses a list for its input, it also uses a list as output. In general the return value of a subroutine is equal to the return value of its final command. We can enforce a specific return value by specifying it as final part of the subroutine. For example, if we end a subroutine with a line containing (1,2), or more explicitly return(1,2), then it will return the list (1,2). These return values can be intercepted by using the subroutine as the right-hand side of an assignment, for example: ($a,$b) = &subr(). The return values of the subroutine will be stored in the left-hand side variables.
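
Both ways of returning values are shown in this sketch. The subroutines divide and addOne are hypothetical examples:

```perl
use strict;
use warnings;

sub divide {
   my ($n, $d) = @_;
   my $rest = $n % $d;
   return(($n - $rest) / $d, $rest);   # explicit list of two values
}
sub addOne {
   my ($x) = @_;
   $x + 1;                     # last expression evaluated: the return value
}

my ($quotient, $remainder) = &divide(17, 5);
my $next = &addOne(41);
print "$quotient $remainder $next\n";   # 3 2 42
```

An explicit return makes it easier to see what a subroutine hands back, especially when the subroutine is longer than a few lines.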

The concept of communication explained here does not only apply to subroutines but also to complete programs. A Perl program can return a status value to the operating system by calling exit, for example exit(0) on its last line, as you may have seen in some of the example programs. By convention 0 signals success and any other value signals a problem; a program that runs to its end without calling exit returns 0. These values are only interesting for programs that communicate with other programs or the operating system. More useful is the concept of program arguments. These are the strings specified on the command line after the program name. Perl programs can be called with arguments, as in tstprg abc 123. These arguments will be stored in the special variable @ARGV (here $ARGV[0]="abc" and $ARGV[1]="123").
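
The sketch below lists the program arguments. So that it produces the same output however it is started, it overwrites @ARGV with the two example arguments; this assignment is for illustration only and would be left out of a real program:

```perl
use strict;
use warnings;

# illustration only: pretend the program was called as  tstprg abc 123
@ARGV = ("abc", "123");

my $count = @ARGV;             # number of arguments
print "number of arguments: $count\n";
my $i;
foreach $i (0 .. $#ARGV) {     # $#ARGV is the index of the last argument
   print "ARGV[$i] = $ARGV[$i]\n";
}
```

The program name itself is not part of @ARGV, so $ARGV[0] is the first argument after the name.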

5.4. Programming example

A translation program is a program which translates text from one language to another language. In this programming example we will construct a translation program with subroutines. We will also enable the user to influence the behavior of the program by using command line arguments. The program will translate Dutch to English or English to Dutch and the user can use the arguments d-e and e-d to enforce one of the translation directions. We start with making a top-level description of the program:

   # determine translation mode
   # repeat forever
      # read text
      # translate
      # print result

This program contains a loop and four tasks. Each of the tasks can be put in a subroutine. However, the reading and printing bits are simple so we will only create subroutines for mode determination and translation. The translation memory will consist of a hash with Dutch words as keys and English words as values. Translation consists of word lookup, and when translation from English to Dutch is required we will swap the keys with the values (reverse in Perl). We will start with filling the translation memory and determining the translation mode:

   use strict; # make definition of variables compulsory

   my %dict = qw(Jan John 
                 en and
                 Marie Mary
                 gingen went
                 naar to
                 het the);

   # determine translation direction from first argument
   sub detTrMod {
      if (defined $ARGV[0] and $ARGV[0] eq "e-d") {
         # English - Dutch required: reverse dictionary
         %dict = reverse(%dict);
      } elsif (not defined $ARGV[0] or $ARGV[0] ne "d-e") {
         # argument neither e-d nor d-e
         print "usage: perl -w translate.pl d-e|e-d\n";
         exit(1);
      }
      # remove direction from argument list
      shift(@ARGV);
   }

The subroutine detTrMod does not use arguments and returns no values. It checks the argument of the program and reverses the translation dictionary when English to Dutch translation is required.
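
The effect of reverse on a hash can be seen in isolation in the following sketch. It uses a shortened copy of the dictionary; the name %reversed is made up for this illustration:

```perl
use strict;
use warnings;

my %dict = qw(Jan John en and Marie Mary);
my %reversed = reverse(%dict);   # keys become values and values become keys

print "$dict{Marie}\n";          # Mary
print "$reversed{Mary}\n";       # Marie
my $size = keys(%reversed);      # number of entries
print "$size\n";                 # 3
```

This only works well when every value occurs once. If two Dutch words had the same English translation, the reversed hash would keep only one of them.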

The translation part is more complex than the subroutine detTrMod. The text will be received in a string. It needs to be converted to a list, be translated and be converted back to a string. Since we will translate by lookup in the dictionary, we need to perform some cleaning up as well by removing punctuation marks. We will keep the translation subroutine restricted to processing a clean list of words:

   sub translate {
      my @translation = ();
      my $word;
      foreach $word (@_) {
         if (defined($dict{$word})) {
            # known word: store translation
            push(@translation,$dict{$word});
         } else {
            # unknown word (e.g. restaurant): just copy it
            push(@translation,$word);
         }
      }
      return(@translation);
   }

translate receives a list of words as its arguments. It looks up each of these words in the dictionary and adds the translation of each word to the list of translated words. When a word is not specified in the dictionary, the word itself will be added to the translation. After having processed all words, translate returns the list of translated words.

The translation subroutine works by iteration: it contains a loop which processes one word after another. It is also possible to work by recursion: translate one word and leave the rest of the translation to an embedded call of translate. Here is an example:

   sub translate {
      if (not @_) { return(); } # nothing to translate
      else { 
         my ($word,@rest) = @_;
         if (defined($dict{$word})) {
            # known word: store translation
            return($dict{$word},&translate(@rest));
         } else {
            # unknown word (e.g. restaurant): just copy it
            return($word,&translate(@rest));
         }
      }
   }

This version starts by checking if the subroutine was called with an empty argument list. If that is the case we return the empty list since there was nothing to translate. Otherwise, we take the first word from the list and return its translation, if one exists, and the translation of the rest. For the latter part we trust translate to be able to translate the rest of the text.

The translation task can be solved both by iteration and by recursion. For some other tasks recursion is the best solution and that is why we showed you an example of recursion right here.

Now we only have to specify the main part of the program:

   my $text = "start text; will be ignored";
   my @text = ();   # input text
   my @translated;  # translated text
   my $translated;  # translated text

   &detTrMod(); # determine translation direction
   while (defined($text) and $text ne "") {
      print "> ";
      $text = <STDIN>;
      if (defined $text) { chomp($text); }
      if (defined($text) and $text ne "") {
         # remove non-word characters
         $text =~ tr/a-zA-Z0-9 //cd;
         # convert string $text to list @text
         @text = split(/\s+/,$text);
         # translate words
         @translated = &translate(@text);
         # convert translated word list to string
         $translated = join(" ",@translated);
         print "$translated\n";
      }
   }

After initializing two variables the program determines the translation direction and enters a loop. In the loop, a text is read and all characters except letters, digits and spaces are removed from the text. Next, the text is converted to a list, translated, converted back to a string and printed. The loop ends when the user enters an empty line or when the input ends. The complete program is capable of translating the sentence John and Mary went to the restaurant both from English to Dutch and the other way around.
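
The steps inside the loop can be tried without typing input by wrapping them in one subroutine. The sketch below does this with a hypothetical helper translateLine, which is not part of the program above; it repeats the dictionary so that it can be run on its own:

```perl
use strict;
use warnings;

my %dict = qw(Jan John en and Marie Mary gingen went naar to het the);

# hypothetical helper: clean, split, translate and join one line of text
sub translateLine {
   my ($text) = @_;
   $text =~ tr/a-zA-Z0-9 //cd;         # delete everything except letters, digits, spaces
   my @translation = ();
   my $word;
   foreach $word (split(/\s+/, $text)) {
      if (defined($dict{$word})) { push(@translation, $dict{$word}); }
      else { push(@translation, $word); }   # unknown word: just copy it
   }
   return(join(" ", @translation));
}

my $english = &translateLine("Jan en Marie gingen naar het restaurant.");
print "$english\n";            # John and Mary went to the restaurant
%dict = reverse(%dict);        # switch to English - Dutch
my $dutch = &translateLine("John and Mary went to the restaurant!");
print "$dutch\n";              # Jan en Marie gingen naar het restaurant
```

The punctuation marks are removed before the lookup and the unknown word restaurant is copied unchanged in both directions.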


Last update: October 09, 2007. erikt(at)science.uva.nl