
 

A Shortcut to Perl

Erik Tjong Kim Sang, Jakub Zavrel, Guy De Pauw and Walter Daelemans
CNTS - Language Technology Group, University of Antwerp
http://lcg-www.uia.ac.be/~erikt/perl/


This text is part of the lecture notes for a Perl course taught by the CNTS - Language Technology Group at the University of Antwerp.

10. Programming (3)

In this section we will start by looking at some small but useful topics which have not been discussed in the previous sections. After that we will briefly pay some attention to memory issues. We will close the course by looking ahead to how you can use Perl in your daily work.

10.1. Some useful topics

One operator which we have missed in the previous sections is the question mark operator, also called the conditional operator. It selects one of two expressions depending on a condition. For example: $a = $b ? $c : $d means: if $b is true then put the value of $c in $a, else put the value of $d in $a. This operator is useful when you increment a hash entry which might be undefined: $h{$a} = $h{$a} ? $h{$a}+1 : 1 will increment $h{$a} when it already holds a true value and will otherwise store 1 in it. An undefined entry counts as false, so the first occurrence of a key stores 1.
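The following is a minimal sketch of the counting idiom described above. The word list and the variable names @words and %count are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: a short list of words to count.
my @words = ("the", "cat", "saw", "the", "dog", "the");
my %count;

foreach my $word (@words) {
    # If the entry already holds a true value, add one; otherwise start at 1.
    $count{$word} = $count{$word} ? $count{$word} + 1 : 1;
}

# The operator can also select between two strings inside a larger expression.
foreach my $word (sort keys %count) {
    my $times = $count{$word} == 1 ? "time" : "times";
    print "$word occurs $count{$word} $times\n";
}
```

The second loop shows that the operator returns a value, so its result can be stored in a variable or used directly in an expression.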

In the previous section, we have seen that <STDIN> is equivalent to <>. However, the latter behaves differently from the former when the program parameter list @ARGV is non-empty. In that case, <> will interpret @ARGV as a list of file names and will read lines from these files, starting with the first one. So when a program uses command line parameters while it reads from standard input (the keyboard), <STDIN> should be preferred over <>.
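The difference can be made visible with a small experiment. The sketch below simulates a command line parameter by filling @ARGV itself with the name of a temporary file. The temporary file (created with the standard File::Temp module) and the variable names are assumptions made for this illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file to stand in for a file named on the command line.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "first line\nsecond line\n";
close($tmp_fh);

# Simulate running the program with one file name as parameter.
@ARGV = ($tmp_name);

# Because @ARGV is non-empty, <> reads from the named file, not from the keyboard.
my @from_files;
while (my $line = <>) {
    chomp($line);
    push(@from_files, $line);
}
print "read from file: $_\n" foreach @from_files;

# <STDIN> would read from standard input, whatever @ARGV contains:
# my $answer = <STDIN>;
```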

There are two special variables in Perl which should be mentioned. The first is the list separator $" (default value: a space), which defines the separator that is put between list elements when a list is interpolated in a string. For example: $" = "|"; print "@a"; will print the list @a with pipe symbols between the elements. The second variable is the input record separator $/ (default value: a newline), which contains the separator between records in input files. Normally when you read an item from a file or from the keyboard with <FILE>, you will read a string of characters up to and including the next newline. You can change this boundary by changing the $/ variable.
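Both variables can be tried out in a few lines. This is a sketch under some assumptions: the input is read from a string instead of a real file so that the example is self-contained, and the changes are made with local inside a block so that the default values return afterwards. The data and variable names are invented for the example.

```perl
use strict;
use warnings;

my @a = ("red", "green", "blue");
my $joined;
{
    # local restricts the change to this block; the default returns afterwards.
    local $" = "|";
    $joined = "@a";
}
print "$joined\n";    # prints red|green|blue

# Hypothetical input held in a string so that the example is self-contained.
my $text = "line one\nline two\n\nline three\n";
open(my $fh, "<", \$text) or die "cannot read from string: $!";
my @records;
{
    # Records now end at an empty line instead of at each newline.
    local $/ = "\n\n";
    @records = <$fh>;
}
close($fh);
print scalar(@records), " records\n";    # prints 2 records
```

With the default value of $/ the same input would have been split into four records, one for each newline.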

When a program is generating output and needs to register when this output was generated, it needs access to the current time. For this purpose, it can use the built-in function time. However, this function returns the current time as the number of seconds since a fixed starting point (January 1, 1970 on most systems), which is not very helpful for humans. Instead, call localtime. In list context this will return a list of nine numbers which represent the current second, minute, hour, day of the month, month, year, day of the week, day of the year and the daylight saving time situation, respectively. Note that all of these are represented with numbers, so you must convert the month number (0-11) and the week day number (0-6, starting with Sunday) to names yourself. The year is counted from 1900, so you must add 1900 to obtain the calendar year.
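One possible way to turn the numbers into a readable time stamp is sketched below. The lookup lists @month_names and @day_names and the output layout are choices made for this example, not something prescribed by Perl.

```perl
use strict;
use warnings;

# Lookup tables for the conversions that localtime leaves to the programmer.
my @month_names = qw(January February March April May June July
                     August September October November December);
my @day_names = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);

# In list context localtime returns nine numbers for the current time.
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime(time);

# The year is counted from 1900, so add 1900 to obtain the calendar year.
my $stamp = sprintf("%s %d %s %d, %02d:%02d:%02d",
    $day_names[$wday], $mday, $month_names[$mon], $year + 1900,
    $hour, $min, $sec);
print "Generated on $stamp\n";
```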

A final phrase which we should mention is the #!/usr/bin/perl -w line which you may encounter at the top of some Perl programs. This line makes it possible to run a Perl program as a separate program, that is as program.pl instead of as perl -w program.pl. The #! characters must be the first two characters of the file. The part of the line that follows them, up to and including perl, specifies the location of Perl on your system. Instead of -w any Perl parameter can be specified. As far as we know, this construction only works on Unix and Linux systems, not on PCs and Macs.
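A complete program with such a first line could look like the sketch below. The file name hello.pl is made up, and the example assumes that Perl is installed as /usr/bin/perl, which may be different on your system.

```perl
#!/usr/bin/perl -w
# hello.pl: a hypothetical file name chosen for this example.
# On Unix or Linux, make the file executable once with: chmod +x hello.pl
# Afterwards it can be started as ./hello.pl instead of perl -w hello.pl
use strict;

my $message = "Hello from Perl";
print "$message\n";
```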

10.2. Memory management

Every variable you use in Perl will occupy some memory space. The memory of your computer is limited, so when you work with a lot of data you should consider the amount of memory that your program is using. Let's look at a program which reads a file and writes the first word of each line to another file:

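   use strict;
   # read the complete input file into the list @lines
   open(IN,"in") or die "cannot open in: $!";
   my @lines = <IN>;
   close(IN);
   # write the first word of every line to the output file
   open(OUT,">out") or die "cannot open out: $!";
   foreach (@lines) { print OUT m/([^\s]+)/,"\n"; }
   close(OUT);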

This program reads the input file and stores it in the list @lines. This is no problem for the files we have worked with up until now. However, suppose you use this program for processing a file of 100 megabytes. In that case the list in the program will require at least 100 megabytes of memory. If you try this on a computer with less memory available, then your program will run out of memory and perhaps crash.

This problem does not mean that you cannot process big files on computers with little memory. The trick here is to divide the file into sections, process the file section by section and store each section on disk as soon as you are finished with it. The most natural parts to divide text files into are lines. The task we are trying to solve here can be solved perfectly well by a program which processes a file line by line. This leads us to a second, improved version:

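   use strict;
   open(IN,"in") or die "cannot open in: $!";
   open(OUT,">out") or die "cannot open out: $!";
   # read one line at a time and write its first word immediately
   while (<IN>) { print OUT m/([^\s]+)/,"\n"; }
   close(OUT);
   close(IN);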

This version uses one scalar variable ($_) which contains one line at a time. The program therefore needs little more memory than the longest line in the file occupies. This is a big improvement with respect to the previous version. Natural language processing software regularly uses big data files, so try to be aware of how your files are processed when you are writing language processing software.
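The same line-by-line approach can be wrapped in a subroutine so that it can be applied to any pair of files. The sketch below is one possible variant, not the only way to do it: the name first_words is invented for this example, and the code uses lexical file handles and the three-argument form of open, which differ from the style of the programs above.

```perl
use strict;
use warnings;

# first_words is a hypothetical helper name. It copies the first word of every
# line of $in_path to $out_path and returns the number of lines read.
sub first_words {
    my ($in_path, $out_path) = @_;
    open(my $in,  "<", $in_path)  or die "cannot open $in_path: $!";
    open(my $out, ">", $out_path) or die "cannot open $out_path: $!";
    my $lines = 0;
    while (my $line = <$in>) {
        # Only the current line is held in memory.
        my ($word) = $line =~ m/(\S+)/;
        # A line without any word produces an empty output line.
        my $first = defined($word) ? $word : "";
        print {$out} "$first\n";
        $lines++;
    }
    close($out) or die "cannot close $out_path: $!";
    close($in);
    return $lines;
}
```

Calling first_words("in", "out") would process the same files as the programs above.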

10.3. The final words...

This is the final section of these lecture notes. Over the past ten weeks we have tried to teach you Perl. We hope you have acquired sufficient knowledge about this programming language to be able to apply it in your daily work. Ten weeks is insufficient to master a programming language, let alone to become a good programmer. We have not paid much attention to program design, standard algorithms like searching and sorting, complex data structures like trees, program testing and documentation, all of which are essential to full-time programmers. There is much left to be learned.

When you continue with Perl after this course, you will undoubtedly experience some moments where you are uncertain about how to solve particular problems in this programming language. There are three basic ways to overcome these moments. Firstly, you can consult a Perl book. We have not attempted to cover every aspect of Perl in these lecture notes, so if you continue with Perl we advise you to buy a good Perl reference book: it will prove to be useful! Secondly, you can consult online Perl resources like www.perl.com. Apart from background knowledge about Perl, they often have pointers to online communities of helpful user groups. And thirdly, you can ask your colleagues or your local Perl guru for tips which could help you to solve your programming problem.

Remember that programming is a skill which you will only master by programming a lot. Good luck!


Last update: April 07, 2000. erikt@uia.ua.ac.be