A Shortcut to Perl

Erik Tjong Kim Sang, Jakub Zavrel, Guy De Pauw and Walter Daelemans
CNTS - Language Technology Group, University of Antwerp
http://lcg-www.uia.ac.be/~erikt/perl/


This text is part of the lecture notes for a Perl course taught by the CNTS - Language Technology Group at the University of Antwerp.

6. Programming

This section contains some background information on programming. We will look at the semantics of programs, examine debugging strategies and give some general programming tips.

6.1. Semantics of programs

A Perl program is a list of statements. During its execution, the program proceeds through a number of states. The states can be identified by the contents of the variables and the current execution location. Here is an example:

   # state A: {}
   $x = 1;
   # B: {$x==1}
   $end = 9;
   # C: {$x==1 and $end==9}
   while ($x <= $end) {
      # D: {1<=$x<=9 and $end==9}
      if (2*int($x/2) == $x) {
         # E: {1<=$x<=9 and $end==9 and 2*int($x/2) == $x}
         print "$x\n";
         # F: E
      }
      # G: D
      $x++;
      # H: {1<$x<=10 and $end==9}
   }
   # I: {$x==10 and $end==9}

This program prints the even numbers in the range 1-9. Before it starts, no variable has a value, so the initial state (A) is empty. We give $x the value 1 and this equality is part of the next state (B). Next, we initialize $end, which adds a condition to the following state (C). Inside the while loop, the exact value of $x is not known, but the loop test guarantees that it lies in the range 1-9 (D). We then test whether $x is even. If the test succeeds, the state (E) is the same as state D except that $x is known to be even, and printing $x does not change it (F). After the if statement we are back in state D (G), because the test may or may not have succeeded. We then increase $x, which is reflected in the next state (H) by putting the variable in the range 2-10. After the loop $x must be 10 (I). Note that the value of $end did not change during the loop. In other words, the state relation $end==9 is an invariant of this loop.

The example program is small and it is possible to write down all conditions for each state. Most real programs are too big for checking all state conditions. However, being aware that a program statement can be seen as a method for changing run-time states is essential for being able to debug programs. In the next subsection we will use this knowledge.
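The state conditions of the example can also be checked while the program runs. The following is a minimal sketch of that idea, not part of the original course material: the helper check_state and the function even_numbers are hypothetical names, and the checks assume that $end is a whole number of at least 0.

```perl
use strict;
use warnings;

# Hypothetical helper: stop the program when a state condition does not hold.
sub check_state {
    my ($label, $condition) = @_;
    die "state $label violated\n" unless $condition;
}

# Same loop as in the example, but the even numbers are collected in a list
# and the conditions of states D, E, H and I are tested at run time.
# Assumption: $end is a whole number >= 0.
sub even_numbers {
    my ($end) = @_;
    my @found;
    my $x = 1;
    while ($x <= $end) {
        check_state("D", 1 <= $x && $x <= $end);
        if (2 * int($x / 2) == $x) {
            check_state("E", $x % 2 == 0);
            push @found, $x;
        }
        $x++;
        check_state("H", 1 < $x && $x <= $end + 1);
    }
    check_state("I", $x == $end + 1);
    return @found;
}

print join(" ", even_numbers(9)), "\n";   # prints "2 4 6 8"
```

If one of the conditions were wrong, the program would stop with a message naming the state, which points directly at the statement that produced the unexpected state.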

6.2. Debugging

Finding errors in programs by running them is called debugging. Together with testing, debugging is an important and time-consuming part of software development. In order to discover an error in a program, you must understand the program. Therefore it is important to write programs as clearly as possible. If reading a program does not reveal the error, you need to run it and find out where it goes wrong. We will try to do this with the following program:

   $x = 1;
   $end = 9;
   while ($x <= $end) {
      if (2*int($x) == $x) {
         print "$x\n";
      }
      $x++;
   }

This program should print the even numbers in the range 1-9. We test it and find out that it does not print anything. Something is wrong. The program contains two tests: one for entering the while loop and one in the if statement. Our first concern is that the program does not enter the loop. We test this by adding the statement print "DEBUG: $x\n"; as the first statement in the loop and running the program. We know from our state analysis that at that position $x will have the values 1-9. The program should print these values and this is exactly what it does. We conclude that the loop is correct.

Since the program does not print the even values of $x, something must be wrong in the if part. We test this by replacing the extra print statement with print "DEBUG (", 2*int($x), ",$x)\n";. We expect that for the first three values of $x, the program will print the pairs (0,1), (2,2) and (2,3). However, it prints (2,1), (4,2) and (6,3). This helps us to find the error: something is wrong with the part 2*int($x): we forgot to divide $x by 2 before computing the integer value. We correct this error, run the program again and find out that it performs correctly. Then we remove the extra print statement.
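The corrected loop, together with a debug switch that makes it unnecessary to add and remove the extra print statement by hand, could look as follows. This is only a sketch: the function run_loop and the $debug argument are hypothetical, and the output lines are collected in a list so that they can be inspected before they are printed.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the corrected loop. When $debug is true, the
# intermediate values are included in the output as DEBUG lines.
sub run_loop {
    my ($end, $debug) = @_;
    my @out;
    my $x = 1;
    while ($x <= $end) {
        push @out, "DEBUG (" . (2 * int($x / 2)) . ",$x)" if $debug;
        if (2 * int($x / 2) == $x) {   # corrected test: divide by 2 first
            push @out, $x;
        }
        $x++;
    }
    return @out;
}

print "$_\n" for run_loop(9, 0);   # prints 2, 4, 6 and 8 on separate lines
```

Calling run_loop(3, 1) shows the pairs (0,1), (2,2) and (2,3) that we expected to see during the debugging session.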

Instead of adding debug print lines to the program we can also use the Perl debugger for checking the program. It can be started by running Perl with option -d. Here is an example run for our test program:

   $ perl -d test.pl  
   main::(test.pl:1):    $x = 1;
     DB<1> n
   main::(test.pl:2):    $end = 9;
     DB<1> 
   main::(test.pl:8):    }
     DB<1> 
   main::(test.pl:4):       if (2*int($x) == $x) {
     DB<1> 
   main::(test.pl:7):       $x++;
     DB<1> 
   main::(test.pl:3):    while ($x <= $end) {
     DB<1> 
   main::(test.pl:4):       if (2*int($x) == $x) {
     DB<1> 
   main::(test.pl:7):       $x++;
     DB<1> p $x
   2
     DB<2> p 2*int($x)
   4
     DB<3> q

Here we execute the program statement by statement. The debugger shows each statement before it is executed. Together with the statement, it displays the package name (main), the file name and the current line number. The step-by-step execution is started by entering the command n (next) and continued by pressing Enter, which repeats the previous command. When the debugger shows the eighth statement, something unexpected happens. The program should print $x for $x==2, but it skips the print statement and signals that it is about to execute $x++;. This means that the if condition has failed. We can find out what went wrong by printing the expressions involved with the command p. This reveals the problem: 2*int($x) is 4 while $x is 2. We leave the debugger with the command q.

6.3. Miscellaneous

A program is called robust when it is impossible to get it into an unforeseen state. Such a state is bound to lead to unpredictable and erroneous behavior. A big problem for programs is data that is entered by users or submitted by other programs. We should take care that the format of the data is checked and that unexpected formats or data values are handled in a reasonable way. This often requires a lot of extra code (see the programming example in section 6.4).
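As a small illustration of checking the format of input data, the sketch below accepts a value only when it is a whole number within a given range. The function parse_count and the range 1-100 are hypothetical choices made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only a whole number between $min and $max.
# Returns the number, or undef when the input is not acceptable.
sub parse_count {
    my ($input, $min, $max) = @_;
    return undef unless defined $input;         # nothing was read
    $input =~ s/^\s+|\s+$//g;                   # strip surrounding white space
    return undef unless $input =~ /^[0-9]+$/;   # digits only
    return undef if $input < $min or $input > $max;
    return $input + 0;
}

for my $input ("7", " 12 ", "abc", "-3", "", "1000") {
    my $count = parse_count($input, 1, 100);
    print defined $count ? "ok: $count\n" : "rejected: '$input'\n";
}
```

Every value that passes the check is known to be a number in the expected range, so the rest of the program cannot reach an unforeseen state because of this input.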

During programming you will inevitably make some Perl syntax errors. The Perl interpreter responds to the errors that it recognizes by printing an error message. Detecting errors is hard and classifying them is even harder, so error messages can be inaccurate or confusing. One programming error may trigger many messages. Therefore, when many error messages are printed, start by correcting the first one rather than the last. Remember that an error message may have been caused by something on an earlier line (an omitted quote, bracket or semicolon). And when you run Perl, always run it with the option -w. It makes Perl report as many potential programming problems as it can detect and helps you to find errors at an early stage.
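The effect of warnings can be seen in the following sketch. It assumes Perl 5.6 or later, where the line use warnings; enables warnings inside the program file in the same spirit as the option -w on the command line. The warning is caught with a __WARN__ handler only so that the example can show it was issued. The variable names are hypothetical.

```perl
use strict;
use warnings;   # assumption: Perl 5.6 or later

# Collect warnings in a list instead of printing them to standard error.
my @messages;
$SIG{__WARN__} = sub { push @messages, $_[0] };

my $total;                # declared but never given a value
my $sum = $total + 1;     # triggers a "Use of uninitialized value" warning

print "sum: $sum\n";
print scalar(@messages), " warning(s) caught\n";
print $messages[0] if @messages;
```

Without warnings the program would silently treat the missing value as 0, and the forgotten initialization could go unnoticed.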

6.4. Programming example

We will construct a program for manipulating text. The actions performed by the program will be specified by the commands entered by a user. This means that the program has a command line interface. Text will be stored in variables. The user can specify on which texts the commands should operate.

The program should read lines with commands until a quit command is entered. For each line that is entered, it should extract the command and its arguments. Then the command should be executed and the next command should be read. Here is a first version of the program texttool:

   # texttool: process text
   # usage: texttool
   # 2000-03-08 erikt@uia.ua.ac.be

   %text = ();
   $quit = 0;
   while (not $quit) {
      # read command
      print "> ";
      $commandLine = <>;
      chomp($commandLine);
      $commandLine =~ s/^\s*//;
      ($command,@args) = split(/\s+/,$commandLine);
      # execute command
      if ($command eq "quit") { $quit = 1; }
      elsif ($command eq "print") { print $text{$args[0]}; }
      elsif ($command eq "read") {
         $text{$args[0]} = "";
         while (<>) {  $text{$args[0]} .= $_; }
      } else { print "unknown command $command\n"; }
   }
   exit(0);

The program stores the texts in a hash, using the name of each text as the key. It works fine for valid commands. However, the user might make errors when entering commands, so we need to define how the program should behave for errors in the input. Therefore we add extra code for testing the format of the commands. It should be inserted before the command execution code:

   $error[1] = "incorrect number of arguments";
   $error[2] = "text variable does not exist";
   ...
   # test command format
   if (defined($command) and $command ne "") {
      $error = 0;
      if ($command eq "quit" and @args != 0) { $error = 1; }
      if ($command eq "print" and @args != 1) { $error = 1; }
      if ($command eq "read" and @args != 1) { $error = 1; }
      if ($command eq "print" and @args == 1 and 
          not(defined($text{$args[0]}))) { $error = 2; }
      if ($error > 0) {
         print "$error[$error]\n";
      } else {
         # execute command
         ...
      }
   }

The error checking code covers five problems: an empty input line, commands with an incorrect number of arguments (three cases) and print applied to an undefined text. This is a modest number, but the error checking part is still almost as long (14 lines) as the original program (16 lines). This is no exception. In many programs a large part of the code deals with error detection, error handling and the user interface.
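The format tests can also be collected in one function, which makes them easy to try out separately from the rest of texttool. The sketch below is one possible arrangement and not the version used in the course: the function check_command and the hash %arity are hypothetical, and only the two error messages shown above are included.

```perl
use strict;
use warnings;

my @error = ("", "incorrect number of arguments",
                 "text variable does not exist");

# Expected number of arguments for each known command (hypothetical table).
my %arity = (quit => 0, print => 1, read => 1);

# Returns 0 when the command passes the format tests, otherwise an index
# into @error. Unknown commands pass here, as in the tests above, and are
# left to the command execution code.
sub check_command {
    my ($text, $command, @args) = @_;
    return 0 unless exists $arity{$command};
    return 1 if @args != $arity{$command};
    return 2 if $command eq "print" and not defined $text->{$args[0]};
    return 0;
}

my %text = (intro => "Hello\n");
for my $line ("print intro", "print", "print missing", "quit now", "read a") {
    my ($command, @args) = split(/\s+/, $line);
    my $error = check_command(\%text, $command, @args);
    print "$line: ", ($error ? $error[$error] : "ok"), "\n";
}
```

Keeping the expected number of arguments in a table means that a new command needs one extra table entry rather than one extra if statement.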

In the exercises we will expand texttool with other useful functions.


Last update: March 10, 2000. erikt@uia.ua.ac.be