Previous | Home | Lecture Notes | Exercises

 

Perl Solutions (8)


These exercise solutions are part of a Perl course taught at CNTS - Language Technology Group at the University of Antwerp.

The programs corresponding with these exercises can be found in the appendix.

Exercise 8.1

Write two error checking routines for the texttool program: checkError1 checks if @args has the correct number of arguments and checkError2 checks if these arguments are valid texts.

The subroutines used default processing: they return an error code larger than zero if the error is present. Otherwise they reach the end of the subroutine and return 0 (no error). Both check their own arguments for inconsistencies. The program was tested with the example provided in the exercise and it performed exactly as was required: output 0\n1\n0\n2\n".

Exercise 8.2

Write a program that reads a text that the user types in at the keyboard and prints each line in the text together with some context.

The program checks possible problems with its arguments and then applies the subroutine printText to the output of readText. Example run:

   $ perl -w expand 1                              
   she
   saw
   sad
   sly
   sip
   spa
   ^Z
   ??? she saw
   she saw sad
   saw sad sly
   sad sly sip
   sly sip spa
   sip spa ???

Exercise 8.3

Write a recursive subroutine which takes a non-negative integer as input and returns the corresponding Fibonacci value.

If we use f(n)=f(n-1)+f(n-2) in our program then the computation of f(n) will require two computations of f(n-2): the one mentioned in the formula plus another one via the f(n-1) call. They will invoke three computations of f(n-3), four of f(n-4) and so on until n-1 computations of f(1) are performed. This will get out of hand for large n. We would like every f(n) to be computed only once.

My subroutine f is able to use only one computation for each n by using another subroutine fibonacci which returns two values: the n-th Fibonacci value and the (n-1)-th Fibonacci value. The next pair in the sequence is (f(n)+f(n-1),f(n)). The second subroutine checks if the 0-th or the first number in the sequence is required and returns the correct pair. The first subroutine calls the second with the same argument and returns the first of the two values it gets back.

We have tested the program for computing f(10) and f(50) and the results were 55 and 12586269025.

Exercise 8.4*

Write a simple hyphenation program.

I have written a program which hyphenates Dutch words. It uses the following rules (test word is palindroom):

  1. Put a hyphen between each vowel (a,e,i,o,u,ij) and a successive non-vowel. This generates pa-li-ndroo-m.
  2. Append word-final syllable without vowels to previous syllable. Combined with the previous rules this generates: pa-li-ndroom.
  3. Whenever a syllable ends with a vowel while the next starts with two or more non-vowels, move the first consonant from the second to the first syllable. Combined with the previous rules this generates: pa-lin-droom (correct).

The program treats two combinations of characters as units: "ij" and "ch". This means that no hyphens can appear between the characters in the units. It contains one subroutine for each rule and words are hyphenated by applying rule 3 to the output of rule 2, which is applied to the output of rule 1. Here are some examples:

   Correct            Wrong
   ach-ter-blij-ver   *an-gstschreeuw (angst-schreeuw)
   ka-ra-vaan         *bijec-tie (bi-jec-tie)
   voet-bal-spel      *zeeen (zee-en)

So the program has problems with long vowel sequences (zee-en), exceptional unit breakups (bi-jec-tie) and codas with more than one consonant at the end (angst-schreeuw).

Exercise 8.5*

Write a program that reads a text and converts it to a version in which every syllable has been spelled backwards.

My solution uses all subroutines of exercise 8.4* and the tokenize code of exercise 3.5*. I have recognized two unsolved problems with the program apart from the fact that the hyphenation code is far from perfect. First, it deletes all characters outside of the range a-zA-Z0-9-, which means that punctuation marks are removed. Second, it does not deal correctly with hyphens in the words, that is when a word contains a hyphen, its hyphenation is almost certain to be wrong. Because of these two problems, the program cannot be used for converting the text back to the original. However, even if this program was perfect, it would not be able to convert text back because the hyphens in the inverted text will not be put in the same positions as in the original text: absent -> batnes -> tabsen. Example run:

   $ cipher
   Can you accommodate the absent-minded professor?
   ^Z
   nac uoy camocomadet eht banesnim-tded orpsefros

Appendix

Exercise 8.1

sub checkError1 {
   if (not(defined($_[0]))) { print "cannot happen\n"; exit(1); }
   if (@args != $_[0]) { return(1); }
   return(0);
}

sub checkError2 {
   my $i;
   foreach $i (@_) { 
      if ($i !~ /^[0-9]+$/) { print "cannot happen\n"; exit(1); }
      if (not(defined($text{$args[$i]}))) { return(2); }
   }
   return(0);
}

Exercise 8.2

# exercise82: print text with context
# usage: exercise82
# 2000-03-29 erikt@uia.ua.ac.be

use strict;

# error checking
if (not(defined($ARGV[0]))) { 
   print "usage: expand number\n"; 
   exit(1); 
}
if ($ARGV[0] !~ /^[0-9]+$/) { 
   print "argument is not a number\n"; 
   exit(1); 
}

# variable initialization
my $context = shift(@ARGV);

# main program
&printText(($context,&readText()));
exit(0);

# subroutines
sub readText {
   my @text = ();
   while (<>) { chomp(); push(@text,$_); }
   return(@text);
}

sub printText {
   my ($i,$j,$context,@text);
   ($context,@text) = @_;
   for ($i=0;$i<@text;$i++) {
      for ($j=$i-$context;$j<=$i+$context;$j++) {
         if ($j >= 0 and $j < @text) { print "$text[$j]"; }
         else { print "???"; }
         print " " if ($j != $i+$context);
      }
      print "\n";
   }
}

Exercise 8.3

# f(x): returns the x-th fibonacci number
sub f {
   # initialization
   my ($f1,$f2);
   # error checking
   if (not(defined($_[0]))) { print "cannot happen\n"; exit(1); }
   if ($_[0] !~ /^[0-9]+$/) { print "cannot happen\n"; exit(1); }
   # computation
   ($f1,$f2) = &fibonacci($_[0]);
   return($f1);
}

# fibonacci(x): returns the x-th and the (x-1)-th fibonacci number
sub fibonacci {
   # initialization
   my ($f1,$f2);
   # error checking
   if ($_[0] < 0) { print "cannot happen\n"; exit(1); }
   # computation
   if ($_[0] == 0) { return(0,0); }
   if ($_[0] == 1) { return(1,0); }
   ($f1,$f2) = &fibonacci($_[0]-1);
   return($f1+$f2,$f1);
}

Exercise 8.4*

# exercise84: hyphenate words
# usage: exercise84
# 2000-03-29 erikt@uia.ua.ac.be

use strict;

# initialize
my $true = 1;
my $false = 0;
my $word = "";
# process words
while (defined($word)) {
   print "> ";
   $word = <>
   if ($word) {
     &printText(&rule3(&rule2(&rule1(&makeUnits(&cleanUp($word))))));
   }
}
exit(0);

sub rule1 {
   my (@chars,@syllables,$i);
   my $s = 0;
   my $lastChar = "";
   @chars = split(//,$_[0]);
   $syllables[$s] = "";
   for ($i=0;$i<@chars;$i++) {
      # hyphenate between vowels followed by consonants
      if (&vowel($lastChar) and not &vowel($chars[$i])) {
         $s++;
         $syllables[$s] = "";
      }
      $syllables[$s] .= $chars[$i];
      $lastChar =  $chars[$i];
   }
   return(@syllables);
}

sub rule2 {
   my @chars;
   my @syllables = @_;
   # do not allow final syllable without vowels
   if ($#syllables > 0) {
      @chars = split(//,$syllables[$#syllables]);
      if (not(&vowel($chars[$#chars]))) {
         $syllables[$#syllables-1] .= pop(@syllables);
      }
   }
   return(@syllables);
}

sub rule3 {
   my (@chars,@syllables,$s);
   @syllables = @_;
   for ($s=1;$s<@syllables;$s++) {
      @chars = split(//,$syllables[$s]);
      # move first of syllable-initial consonant to previous syllable 
      if (@chars >= 2 and not(&vowel($chars[0])) and not(&vowel($chars[1]))) {
         $syllables[$s-1] .= shift(@chars);
      }
      $syllables[$s] = join('',@chars);
   }
   return(@syllables);
}

sub makeUnits {
   my $word = $_[0];
   $word =~ s/ij/Y/g;
   $word =~ s/ch/G/g;
   return($word);
}

sub breakUnits {
   my $word = $_[0];
   $word =~ s/Y/ij/g;
   $word =~ s/G/ch/g;
   return($word);
}

sub cleanUp {
   my $word = $_[0];
   chomp($word);
   $word =~ tr/a-zA-Z-//cd;
   $word =~ tr/A-Z/a-z/;
   return($word);
}

sub printText {
   my $i;
   for ($i=0;$i<@_;$i++) {
      print &breakUnits($_[$i]);
      if ($i < -1+@_) { print "-"; }
   }
   print "\n";
}

sub vowel {
   return($true) if ($_[0] eq "a");
   return($true) if ($_[0] eq "e");
   return($true) if ($_[0] eq "i");
   return($true) if ($_[0] eq "o");
   return($true) if ($_[0] eq "u");
   return($true) if ($_[0] eq "Y");
   return($false);
}

Exercise 8.5*

# exercise85: make cipher format of text
# usage: exercise85
# 2000-03-30 erikt@uia.ua.ac.be

use strict;

# initialize
my $true = 1;
my $false = 0;
my $text = "";
my $word;
# read text
while (<>) { $text .= $_; }
foreach $word (&tokenize($text)) {
   &printText(&rule3(&rule2(&rule1(&makeUnits(&cleanUp($word))))));
}
print "\n";
exit(0);

sub tokenize {
   $_ = $_[0];
   s/\s+/\n/g;
   s/^\n//;
   s/$/\n/;
   s/([.,!?:;,])\n/\n$1\n/g;
   s/\n(["'`])([^\n])/\n$1\n$2/g;
   s/([^\n])(["'`])\n/$1\n$2\n/g;
   s/([^\n])([.,])\n/$1\n$2\n/g;
   s/\n([A-Z])\n\./\n$1./g;
   s/\n\.\n([^"A-Z])/\.\n$1/g;
   s/(\.[A-Z]+)\n\.\n/$1.\n/g;
   s/([^\n])'s\n/$1\n's\n/g;
   s/([^\n])n't\n/$1\nn't\n/g;
   s/([^\n])'re\n/$1\n're\n/g;
   s/\n\$([^\n])/\n\$\n$1/g;
   s/([^\n])%\n/$1\n%\n/g;
   s/Mr\n\.\n/Mr.\n/g;
   return(split(/\n/,$_));
}

sub printText {
   my $i;
   for ($i=0;$i<@_;$i++) {
      print join('',reverse(split(//,&breakUnits($_[$i]))));
   }
   print " ";
}

# other subroutines are the same as in exercise 8.4*


Previous | Home | Lecture Notes | Exercises
Last update: May 30, 2000. erikt@uia.ua.ac.be