

A Shortcut to Perl

Erik Tjong Kim Sang, Jakub Zavrel, Guy De Pauw and Walter Daelemans
CNTS - Language Technology Group, University of Antwerp

This text is part of the lecture notes for a Perl course taught by the CNTS - Language Technology Group at the University of Antwerp.

7. Programming 2

This section repeats some of the problematic Perl facts presented in the previous six sessions.

7.1. Data structures

Perl contains three data structure types: scalars, lists and hashes. In programs they can be recognized by their first character: a $ for scalars, an @ for lists and a % for hashes. Scalar variables may contain both numbers and strings. Lists and hashes are both collections of scalar values, but list elements are accessed with numeric keys while hash elements can be accessed with both numbers and strings as keys. The interpretation of a variable depends on its context:

   $scalar = 123;      # (1) lines (1) and (2)
   $scalar = "123";    # (2) give the same result
   $t = $scalar . " "; # $scalar used as string
   $t = $scalar + 1;   # $scalar used as number
   @t = $scalar;       # $scalar used as list
   $t = @list;         # @list used as number
   @t = @list;         # @list used as list
   %t = @list;         # @list used as hash
   @t = %hash;         # %hash used as list
   %t = %hash;         # %hash used as hash
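The effect of these contexts can be checked by printing the results. The following is only a small sketch: the contents of @list are made up for illustration, and the use strict, use warnings and my declarations are additions that the examples in this text do not use.

```perl
use strict;
use warnings;

my @list   = ("a", 1, "b", 2);   # illustrative data
my $scalar = "123";

my $count  = @list;              # @list used as number: its length
my %hash   = @list;              # @list used as hash: key/value pairs
my @flat   = %hash;              # %hash used as list: keys and values, in no fixed order
my @single = $scalar;            # $scalar used as list: one element
my $sum    = $scalar + 1;        # $scalar used as number
my $string = $scalar . " ";      # $scalar used as string

print "count:  $count\n";                      # count:  4
print "hash:   a=$hash{a} b=$hash{b}\n";       # hash:   a=1 b=2
print "flat:   ", scalar(@flat), " items\n";   # flat:   4 items
print "single: ", scalar(@single), " item\n";  # single: 1 item
print "sum:    $sum\n";                        # sum:    124
print "string: '$string'\n";                   # string: '123 '
```

Note that a list used as a hash is read as alternating keys and values, so it should contain an even number of elements.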

Operations are built by applying operators recursively. For example, $x + 1 . "y" first adds 1 to $x and then concatenates a y to the result. Operations are usually performed from left to right, but some operators take precedence over others and change the left-to-right evaluation. For example, 123+4*5 results in 143 (the multiplication is performed first) and not in 635.
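The two examples from this paragraph can be tried out directly. In this sketch the value 5 for $x is an arbitrary choice, and the use strict, use warnings and my declarations are additions that the examples in this text do not use.

```perl
use strict;
use warnings;

my $x = 5;                          # arbitrary illustrative value

my $joined = $x + 1 . "y";          # + and . have equal precedence: (5 + 1) . "y"
my $plain  = 123 + 4 * 5;           # * takes precedence over +: 123 + 20
my $forced = (123 + 4) * 5;         # parentheses force the addition first

print "$joined\n";                  # 6y
print "$plain\n";                   # 143
print "$forced\n";                  # 635
```

When in doubt about the evaluation order, adding parentheses makes the intended order explicit.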

7.2. Control structures

The conditional structure if can be used for conditionally executing some commands. Perl contains three iterative structures for repeatedly executing a command block: for, foreach and while. The three are more or less equivalent. The structures if, for and while all use a condition to determine which commands to execute. The condition consists of a comparison of numbers or strings. It can also contain a test to see if a string matches a regular expression.
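To illustrate that the three iterative structures are more or less equivalent, here is a sketch that copies the same list three times, once with each structure. The list @words and the names of the result lists are made up for this example, and the use strict, use warnings and my declarations are additions that the examples in this text do not use.

```perl
use strict;
use warnings;

my @words = ("one", "two", "three");   # illustrative data
my (@with_for, @with_foreach, @with_while);

# for: start value, condition and increment in the loop header
for (my $i = 0; $i < @words; $i++) {
   push(@with_for, $words[$i]);
}

# foreach: visits each element of the list in turn
foreach my $word (@words) {
   push(@with_foreach, $word);
}

# while: only the condition is part of the loop header
my $j = 0;
while ($j < @words) {
   push(@with_while, $words[$j]);
   $j++;
}

print "@with_for\n";       # one two three
print "@with_foreach\n";   # one two three
print "@with_while\n";     # one two three
```

In the conditions $i < @words and $j < @words the list is used as a number, so the comparison is made with the number of elements in the list.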

The control structures can be nested. This means that you can say something like: if some condition is true then while some other condition is true execute these commands. This nesting may contain many levels. When you design your programs you should view your programs as containing code for performing tasks at several levels. Here is an example:

Your task is to print the characters of a text which is stored in a list in which each element is a line. Each character should be printed on a separate line. This task contains three levels. The bottom level prints a single character. The next level prints a line and the top level prints the text. The two highest levels use the code of the level immediately under them. The code will contain two loops: one for printing all characters on a line (intermediate level) and one for printing all lines (top level). Here is an example program:

   # text is stored in @lines
   foreach $line (@lines) {     # top level loop
      @chars = split(//,$line); # put chars in list
      foreach $char (@chars) {  # level 2 loop
         print "$char\n";       # bottom level task
      }
   }

This short piece of code might be part of a larger program. Whenever you need something for printing a text stored as a list of lines, you can use this Perl code. You only have to make sure that the text is stored in a list called @lines and that the variables that are modified here ($line, @chars and $char) do not mess up the values of variables with the same name that might appear in some other part of the program. In the next session we will present a programming construct which makes it possible to re-use code without having to worry about variable names.

7.3. Regular expressions

A regular expression is a description of a set of strings. We say that a regular expression matches a string if the string is part of the set that the regular expression describes. These expressions can be tested with conditional structures: if string =~ /regexp/ succeeds then the regular expression regexp matches the string string. If the expression is a simple sequence of characters like xxx then it will match this sequence anywhere in the target string. However, the expression can be expanded to match the sequence only at the start of a string (^xxx), only at the end (xxx$) or only when it makes up the complete string (^xxx$).
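The difference between the four forms can be seen by testing them on a few strings. The test strings and the hash %matches in this sketch are made up for illustration, and the use strict, use warnings and my declarations are additions that the examples in this text do not use.

```perl
use strict;
use warnings;

my %matches;   # for each test string: the names of the patterns that matched

foreach my $string ("xxx", "xxxab", "abxxx", "abxxxab") {
   my @hits;
   push(@hits, "anywhere") if $string =~ /xxx/;
   push(@hits, "start")    if $string =~ /^xxx/;
   push(@hits, "end")      if $string =~ /xxx$/;
   push(@hits, "complete") if $string =~ /^xxx$/;
   $matches{$string} = "@hits";
   print "$string: @hits\n";
}
# prints:
# xxx: anywhere start end complete
# xxxab: anywhere start
# abxxx: anywhere end
# abxxxab: anywhere
```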

Regular expressions are useful because they contain character class names and quantifiers. The class names can be used as an abbreviation for character sets, for example \w stands for [a-zA-Z_0-9]. Quantifiers can be used for matching reoccurring patterns. For example, x+ matches a sequence of one or more x's. A quantifier operates on the character just before it. If you want to match reoccurring sequences with a basic element longer than one character, you need to enclose the basic element in parentheses: (ha)+ matches a sequence of one or more ha's.
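The sketch below compares a quantifier applied to a single character with a quantifier applied to a sequence in parentheses. All patterns are anchored with ^ and $ so that they must match the complete string. The test strings and the hash %matched are made up for illustration, and the use strict, use warnings and my declarations are additions that the examples in this text do not use.

```perl
use strict;
use warnings;

my %matched;   # for each test string: the patterns that matched it completely

foreach my $string ("xxxx", "ha", "hahaha", "haa", "ha ha") {
   my @hits;
   push(@hits, 'x+')    if $string =~ /^x+$/;      # one or more x's
   push(@hits, 'ha+')   if $string =~ /^ha+$/;     # an h and one or more a's
   push(@hits, '(ha)+') if $string =~ /^(ha)+$/;   # one or more ha's
   push(@hits, '\w+')   if $string =~ /^\w+$/;     # one or more word characters
   $matched{$string} = "@hits";
   print "$string: @hits\n";
}
# prints:
# xxxx: x+ \w+
# ha: ha+ (ha)+ \w+
# hahaha: (ha)+ \w+
# haa: ha+ \w+
# ha ha:
```

The last string matches none of the patterns because the space is not part of the \w class.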

Last update: March 16, 2000.