Perl Solutions (6)


These exercise solutions are part of a Perl course taught at CNTS - Language Technology Group at the University of Antwerp.

The code belonging to these exercises can be found in the appendix. Each exercise adds some extra functionality to the program texttool.

Exercise 6.1

count x: counts the lines in text x and shows the result.

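The number of lines in a text can be obtained by counting the newline characters it contains. An earlier solution split the text into lines and took the number of elements in the result, but that approach ignores trailing empty lines. Example run: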

> read test
abc
def
> count test
2
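
The difference between the two counting approaches can be seen in a minimal stand-alone sketch. The sub names count_by_split and count_by_newlines are illustrative assumptions and are not part of texttool.

```perl
use strict;
use warnings;

# Count lines by splitting on newlines. split drops trailing empty
# fields, so trailing empty lines are not counted.
sub count_by_split {
    my ($text) = @_;
    my @lines = split(/\n/, $text);
    return scalar(@lines);
}

# Count lines by counting newline characters. Empty lines are included.
sub count_by_newlines {
    my ($text) = @_;
    my $count = () = $text =~ /\n/g;
    return $count;
}

print count_by_split("abc\ndef\n"), "\n";      # 2
print count_by_newlines("abc\ndef\n"), "\n";   # 2
print count_by_split("\n\n\n"), "\n";          # 0
print count_by_newlines("\n\n\n"), "\n";       # 3
```

Both subs agree on ordinary text; they only differ when the text ends in empty lines.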

Exercise 6.2

grep string x y: selects all lines containing the string string from text x and puts these in text y. The string may not contain white space.

This function looks like read but it reads lines from another text and only adds a line if it contains the specified string. Example run:

> read test
abc
def
> grep e test result
> print result
def
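
As an alternative to a pattern match, the selection step can be sketched with the built-in index function, which looks for a literal substring. The sub name select_lines is a hypothetical helper, not part of texttool.

```perl
use strict;
use warnings;

# Return the lines of $text that contain $string as a literal substring.
sub select_lines {
    my ($string, $text) = @_;
    my $result = "";
    foreach my $line (split(/\n/, $text)) {
        # index returns -1 when $string does not occur in $line
        $result .= "$line\n" if index($line, $string) >= 0;
    }
    return $result;
}

print select_lines("e", "abc\ndef\n");   # def
print select_lines(".", "abc\nd.f\n");   # d.f
```

Because index does not interpret regular expression characters, a search for "." only selects lines that really contain a dot.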

Exercise 6.3

cat x y: appends text x to text y. The result is stored in text y.

We have allowed the second text to be undefined. In that case cat x y means copy x to y. Example run:

> read first
abc
def
> read last
ghi
jkl
> cat first last
> print last
ghi
jkl
abc
def

Exercise 6.4

chars x y: divides text x into characters and puts these in text y with each character on a separate line. The white space characters in x should be replaced by hash symbols (#).

We store the first text in a temporary variable, replace white space by hashes, split it to a character list and join the list with newline characters as separators. After adding a final newline, we obtain the required result. Example run:

> read test
ab c
> chars test result
> print result
a
b
#
c
#
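
The steps described above can be sketched as a stand-alone sub. The name text_to_chars is an assumption made for this illustration.

```perl
use strict;
use warnings;

# Replace white space by hashes, split into characters and
# put each character on its own line.
sub text_to_chars {
    my ($text) = @_;
    (my $tmp = $text) =~ s/\s/#/g;      # work on a copy of the text
    my @chars = split(//, $tmp);        # empty pattern: one element per character
    return join("\n", @chars) . "\n";   # join adds no final newline, so append one
}

print text_to_chars("ab c\n");
```

For the input of the example run this prints a, b, #, c and # on separate lines; the final # is the newline that ended the input line.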

Exercise 6.5

replace string1 string2 x y: replaces all occurrences of string string1 by string string2 in text x and puts the result in text y. The strings may not contain white space.

We copy the first text to the second and replace the strings. Example run:

> read test
abc
dbf
> replace b e test result
> print result
aec
def
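
A minimal sketch of the copy-then-replace step, assuming that string1 should be taken literally. The sub name replace_all is hypothetical; the \Q...\E escape is standard Perl and switches off the special meaning of regular expression characters in the interpolated string.

```perl
use strict;
use warnings;

# Return a copy of $text in which every occurrence of the
# literal string $from has been replaced by $to.
sub replace_all {
    my ($from, $to, $text) = @_;
    (my $result = $text) =~ s/\Q$from\E/$to/g;
    return $result;
}

print replace_all("b", "e", "abc\ndbf\n");   # aec and def
print replace_all(".", "!", "a.c\n");        # a!c
```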

Exercise 6.6

delete x: deletes text x.

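The Perl function undef can be used to make a text variable undefined. The error-checking code tests texts with defined, so texttool reports afterwards that the text does not exist. Example run: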

> read test
abc
> delete test
> print test
text variable does not exist
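
Perl distinguishes between a hash entry whose value is undefined and a key that is absent. The sketch below shows the difference between undef and the built-in delete; entry_state is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Describe the state of $key in the hash referenced by $hash.
sub entry_state {
    my ($hash, $key) = @_;
    return "no such key"           unless exists $hash->{$key};
    return "key exists, undefined" unless defined $hash->{$key};
    return "defined";
}

my %text = (test => "abc\n");
print entry_state(\%text, "test"), "\n";   # defined
undef($text{test});                        # value becomes undefined, key stays
print entry_state(\%text, "test"), "\n";   # key exists, undefined
delete $text{test};                        # removes the key itself
print entry_state(\%text, "test"), "\n";   # no such key
```

For texttool either approach works, as long as texts are tested with defined.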

Exercise 6.7

paste x y z: puts the lines of text y next to those of text x and places the result in text z. The lines should be separated by a single space.

The two source texts are converted to lists of lines and the lines are added to a text until the lists are empty. In case one of the texts is shorter than the other, the final lines of the new text will only contain elements of the longer source text. Example run:

> read test1
abc
def
> read test2
ghi
jkl
> paste test1 test2 result
> print result
abc ghi
def jkl
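
The case of two texts of unequal length can be sketched as follows. The sub name paste_texts is an assumption; like the solution in the appendix, it always writes the separating space, so a line without a right-hand part ends in a space.

```perl
use strict;
use warnings;

# Put the lines of $y next to those of $x, separated by a single space.
sub paste_texts {
    my ($x, $y) = @_;
    my @lines0 = split(/\n/, $x);
    my @lines1 = split(/\n/, $y);
    # continue until the longer of the two lists is exhausted
    my $n = @lines0 > @lines1 ? @lines0 : @lines1;
    my $result = "";
    for my $i (0 .. $n - 1) {
        my $left  = $i < @lines0 ? $lines0[$i] : "";
        my $right = $i < @lines1 ? $lines1[$i] : "";
        $result .= "$left $right\n";
    }
    return $result;
}

print paste_texts("abc\ndef\n", "ghi\njkl\n");   # abc ghi and def jkl
print paste_texts("abc\ndef\n", "ghi\n");        # abc ghi and "def "
```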

Exercise 6.8

tail number x y: copies the lines of text x to text y starting from the line specified with number.

This is a variant of grep, but lines are selected on the basis of their position in the text rather than their content. The error-checking code required extra attention because of the number argument: we have to make sure that the argument contains a number and that the number is not larger than the number of lines in the text. Example run:

> read test
abc
def
ghi
> tail 2 test result
> print result
def
ghi
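
The two checks on the number argument can be sketched in a single sub that returns either the selected lines or an error message. The name tail_text and the two-element return value are assumptions made for this illustration; the messages are those used in the appendix. The sketch also rejects 0, on the assumption that line numbering starts at 1.

```perl
use strict;
use warnings;

# Return (result, "") on success or (undef, message) on failure.
sub tail_text {
    my ($number, $text) = @_;
    return (undef, "expected number argument is not a positive integer")
        if $number !~ /^[0-9]+$/ or $number < 1;
    my @lines = split(/\n/, $text);
    return (undef, "number argument exceeds maximum value")
        if $number > @lines;
    # line $number has index $number-1 in the list
    my $result = join("", map { "$_\n" } @lines[$number - 1 .. $#lines]);
    return ($result, "");
}

my ($result, $error) = tail_text(2, "abc\ndef\nghi\n");
print $result;                       # def and ghi
($result, $error) = tail_text("x", "abc\n");
print "$error\n";                    # expected number argument is not a positive integer
```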

Exercise 6.9

tokenize x y: divides text x into tokens and puts the result in text y, one token per line.

We have used the tokenize code provided by the teacher with the solution of exercise 3.5*. Example run:

> read test
Oh no! Mr. John's parrot died?
> tokenize test result
> print result
Oh
no
!
Mr.
John
's
parrot
died
?

Exercise 6.10

uniq x: counts how often lines occur in text x and prints the result, with the most frequent lines first.

We need three loops. The first counts the number of occurrences of each line. The second prints each distinct line together with the number of times it occurred. The third is nested in the second: it selects the most frequent line among the lines that have not been printed yet. Example run:

> read test
abc
def
abc
> chars test result
> uniq result
3 #
2 a
2 b
2 c
1 d
1 e
1 f
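
As an alternative to the nested selection loop, the ordering can be left to Perl's built-in sort. This is a different technique from the three-loop solution in the appendix, shown here only for comparison. The sub name line_frequencies is an assumption, and ties are broken alphabetically to get a reproducible order.

```perl
use strict;
use warnings;

# Return one "count line" entry per distinct line, most frequent first.
sub line_frequencies {
    my ($text) = @_;
    my %freq;
    $freq{$_}++ foreach split(/\n/, $text);
    # sort by descending count, then alphabetically for equal counts
    my @ordered = sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq;
    return join("", map { "$freq{$_} $_\n" } @ordered);
}

print line_frequencies("a\nb\nc\n#\nd\ne\nf\n#\na\nb\nc\n#\n");
```

For the text produced by chars in the example run, this prints the same seven lines as shown above.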

Exercise 6.11*

The commands in texttool require at least one text argument. Modify the program in such a way that when commands are entered without this text argument, a default text is processed.

We have allowed any text argument to be omitted in any command. Before the error-checking code, we have inserted code that checks whether a text variable has been omitted. Every missing text variable is replaced by a default text name, for which we have chosen the empty name, since this name will never occur. The remainder of the program has been left unchanged. Example run:

> print
text variable does not exist
> read
abcd
> paste
> chars
> count
10
> print
a
b
c
d
#
a
b
c
d
#
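
The filling in of missing text arguments can also be sketched with two lookup tables instead of a chain of tests. The names fill_defaults, %expected and %required are hypothetical; the argument counts are taken from the command descriptions above.

```perl
use strict;
use warnings;

my $defaultTextName = "";

# total number of arguments each command expects
my %expected = (
    read => 1, print => 1, count => 1, delete => 1, uniq => 1,
    cat => 2, chars => 2, tokenize => 2,
    grep => 3, paste => 3, tail => 3,
    replace => 4,
);
# leading string or number arguments that cannot be defaulted
my %required = (grep => 1, tail => 1, replace => 2);

# Return @args with every missing text argument filled in.
sub fill_defaults {
    my ($command, @args) = @_;
    return @args unless exists $expected{$command};
    my $needed = exists $required{$command} ? $required{$command} : 0;
    if (@args >= $needed) {
        push(@args, $defaultTextName) while @args < $expected{$command};
    }
    return @args;
}

my @paste = fill_defaults("paste");
print scalar(@paste), "\n";          # 3
my @grep = fill_defaults("grep");
print scalar(@grep), "\n";           # 0
```

A grep without its string argument is left alone, so the normal error checking still reports the missing arguments.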

Exercise 6.12*

alias command1 command2: creates an alias for command command1. From that moment on command1 can be executed by entering command2.

We have created a hash containing all the commands as keys, each with the same command as its value, for example $alias{"cat"} = "cat". The alias command binds a new name to a known command, for example $alias{"concatenate"} = "cat". In the previous version, texttool would read a command and attempt to execute it. In this version, between reading and executing a command, the program looks the command up in the alias table and executes the command found there. Because of this, the code written for the earlier exercises could be reused without changes. Example run:

> read test
abc
> alias alias rename
> rename print show
> rename show display
> display test
abc
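
The lookup between reading and executing a command can be sketched in isolation. The sub names resolve and add_alias are assumptions for this illustration; the table itself follows the description above.

```perl
use strict;
use warnings;

# initial alias table: every command is an alias for itself
my %alias = map { $_ => $_ } qw(cat grep read tail uniq alias chars
                                count paste print delete replace tokenize);

# Translate a command name to the real command; unknown names are unchanged.
sub resolve {
    my ($command) = @_;
    return defined($alias{$command}) ? $alias{$command} : $command;
}

# Bind $new to the command behind $known; returns 0 if $known is unknown.
sub add_alias {
    my ($known, $new) = @_;
    return 0 unless defined($alias{$known});
    $alias{$new} = $alias{$known};
    return 1;
}

add_alias("alias", "rename");
add_alias("print", "show");
add_alias("show", "display");
print resolve("display"), "\n";   # print
print resolve("rename"), "\n";    # alias
```

Because a new alias stores the real command rather than the name it was derived from, chains such as display, show, print are resolved in a single lookup.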

Appendix

Exercise 6.1

# added error checking code for count
if ($command eq "count" and @args != 1) { $errorNbr = 1; }
if ($command eq "count" and @args == 1 and 
    not(defined($text{$args[0]}))) { $errorNbr = 2; }

# added processing code for count
elsif ($command eq "count") {
   $tmp = 0;
   while ($text{$args[0]} =~ /\n/g) { $tmp++; }
   print "$tmp\n";
   # the earlier solution below ignored trailing empty lines:
   # @lines = split(/\n/,$text{$args[0]});
   # print $#lines+1,"\n"; 
   # it returns 0 for a text consisting only of empty lines
}

Exercise 6.2

# added error checking code for grep
if ($command eq "grep" and @args != 3) { $errorNbr = 1; }
if ($command eq "grep" and @args == 3 and 
    not(defined($text{$args[1]}))) { $errorNbr = 2; }

# added processing code for grep
elsif ($command eq "grep") {
   @lines = split(/\n/,$text{$args[1]});
   $text{$args[2]} = "";
   for ($i=0;$i<@lines;$i++) {
      # \Q...\E makes the match literal: the argument is a string, not a pattern
      if ($lines[$i] =~ /\Q$args[0]\E/) { 
         $text{$args[2]} .= $lines[$i] . "\n";
      }
   }
}

Exercise 6.3

# added error checking code for cat
if ($command eq "cat" and @args != 2) { $errorNbr = 1; }
if ($command eq "cat" and @args == 2 and 
    not(defined($text{$args[0]}))) { $errorNbr = 2; }
# we allow the second text of cat to be undefined

# added processing code for cat
elsif ($command eq "cat") {
   if (defined($text{$args[1]})) { 
      $text{$args[1]} .= $text{$args[0]}; 
   } else { $text{$args[1]} = $text{$args[0]}; }
}

Exercise 6.4

# added error checking code for chars
if ($command eq "chars" and @args != 2) { $errorNbr = 1; }
if ($command eq "chars" and @args == 2 and 
   not(defined($text{$args[0]}))) { $errorNbr = 2; }

# added processing code for chars
elsif ($command eq "chars") {
   $tmpText = $text{$args[0]};
   $tmpText =~ s/\s/#/g;
   @chars = split(//,$tmpText);
   $text{$args[1]} = join("\n",@chars) . "\n";
}

Exercise 6.5

# added error checking code for replace
if ($command eq "replace" and @args != 4) { $errorNbr = 1; }
if ($command eq "replace" and @args == 4 and 
    not(defined($text{$args[2]}))) { $errorNbr = 2; }

# added processing code for replace
elsif ($command eq "replace") {
   $text{$args[3]} = $text{$args[2]};
   # \Q...\E makes the match literal: string1 is a string, not a pattern
   $text{$args[3]} =~ s/\Q$args[0]\E/$args[1]/g;
}

Exercise 6.6

# added error checking code for delete
if ($command eq "delete" and @args != 1) { $errorNbr = 1; }
if ($command eq "delete" and @args == 1 and 
    not(defined($text{$args[0]}))) { $errorNbr = 2; }

# added processing code for delete
elsif ($command eq "delete") { undef($text{$args[0]}); }

Exercise 6.7

# added error checking code for paste
if ($command eq "paste" and @args != 3) { $errorNbr = 1; }
if ($command eq "paste" and @args == 3 and 
    (not(defined($text{$args[0]})) or
     not(defined($text{$args[1]})))) { $errorNbr = 2; }

# added processing code for paste
elsif ($command eq "paste") {
   @lines0 = split(/\n/,$text{$args[0]});
   @lines1 = split(/\n/,$text{$args[1]});
   $tmpText = "";
   $i = 0;
   while ($i < @lines0 or $i < @lines1) {
      if ($i < @lines0) { $tmpText .= $lines0[$i]; }
      $tmpText .= " ";
      if ($i < @lines1) { $tmpText .= $lines1[$i]; }
      $tmpText .= "\n";
      $i++;
   }
   $text{$args[2]} = $tmpText;
}

Exercise 6.8

# added error checking code for tail
$errorMsg[3] = "expected number argument is not a positive integer";
$errorMsg[4] = "number argument exceeds maximum value";
if ($command eq "tail" and @args != 3) { $errorNbr = 1; }
if ($command eq "tail" and @args == 3 and 
    not(defined($text{$args[1]}))) { $errorNbr = 2; }
if ($command eq "tail" and @args == 3 and defined($text{$args[1]})) {
   # reject 0 as well: index -1 would select the last line of the text
   if ($args[0] !~ /^[0-9]+$/ or $args[0] < 1) { $errorNbr = 3; }
   else {
      @lines = split(/\n/,$text{$args[1]});
      if ($args[0] > @lines) { $errorNbr = 4; }
   }
}

# added processing code for tail
elsif ($command eq "tail") {
   @lines = split(/\n/,$text{$args[1]});
   $text{$args[2]} = "";
   for ($i=$args[0]-1;$i<@lines;$i++) {
      $text{$args[2]} .= $lines[$i] . "\n";
   }
}

Exercise 6.9

# added error checking code for tokenize
if ($command eq "tokenize" and @args != 2) { $errorNbr = 1; }
if ($command eq "tokenize" and @args == 2 and 
    not(defined($text{$args[0]}))) { $errorNbr = 2; }

# added processing code for tokenize
elsif ($command eq "tokenize") {
   $_ = $text{$args[0]};
   # tokenize code from exercise 3.5* by erikt
   s/\s+/\n/g;
   s/^\n//;
   s/([.,!?:;,])\n/\n$1\n/g;
   s/\n(["'`])([^\n])/\n$1\n$2/g;
   s/([^\n])(["'`])\n/$1\n$2\n/g;
   s/([^\n])([.,])\n/$1\n$2\n/g;
   s/\n([A-Z])\n\./\n$1./g;
   s/\n\.\n([^"A-Z])/\.\n$1/g;
   s/(\.[A-Z]+)\n\.\n/$1.\n/g;
   s/([^\n])'s\n/$1\n's\n/g;
   s/([^\n])n't\n/$1\nn't\n/g;
   s/([^\n])'re\n/$1\n're\n/g;
   s/\n\$([^\n])/\n\$\n$1/g;
   s/([^\n])%\n/$1\n%\n/g;
   s/Mr\n\.\n/Mr.\n/g;
   # end of tokenize code
   $text{$args[1]} = $_;
}

Exercise 6.10

# added error checking code for uniq
if ($command eq "uniq" and @args != 1) { $errorNbr = 1; }
if ($command eq "uniq" and @args == 1 and 
    not(defined($text{$args[0]}))) { $errorNbr = 2; }

# added processing code for uniq
elsif ($command eq "uniq") {
   @lines = split(/\n/,$text{$args[0]});
   %freq = ();
   $differentLines = 0;
   # count the occurrences of the lines and store results in %freq
   foreach $line (@lines) {
      if (defined($freq{$line})) { $freq{$line}++; }
      else { 
         $freq{$line} = 1;
         $differentLines++;
      }
   }
   for ($i=0;$i<$differentLines;$i++) {
      $freqMostFrequent = 0;
      $mostFrequent = "";
      # select most frequent line
      foreach $line (keys %freq) {
         if (defined($freq{$line}) and 
             $freq{$line} > $freqMostFrequent) {
            $freqMostFrequent = $freq{$line};
            $mostFrequent = $line;
         }
      }
      # print it and remove it
      print "$freq{$mostFrequent} $mostFrequent\n";
      undef($freq{$mostFrequent});
   }
}

Exercise 6.11*

# no extra error checking code was required

# added processing code for default processing
$defaultTextName = "";
if (@args == 0 and 
    ($command eq "read"  or $command eq "print" or 
     $command eq "count" or $command eq "delete" or 
     $command eq "cat"   or $command eq "chars" or
     $command eq "paste" or $command eq "tokenize" or
     $command eq "uniq")) {
   $args[0] = $defaultTextName; 
}
if (@args == 1 and
    ($command eq "cat"  or $command eq "chars" or 
     $command eq "grep" or $command eq "paste" or
     $command eq "tail" or $command eq "tokenize")) {
   $args[1] = $defaultTextName;
}
if (@args == 2 and
    ($command eq "replace" or $command eq "grep" or
     $command eq "paste" or $command eq "tail")) {
   $args[2] = $defaultTextName;
}
if (@args == 3 and
    ($command eq "replace")) {
   $args[3] = $defaultTextName;
}

Exercise 6.12*

# added error checking code for alias
$errorMsg[5] = "cannot make alias for an unknown command";
if ($command eq "alias" and @args != 2) { $errorNbr = 1; }
if ($command eq "alias" and @args == 2 and 
    not(defined($alias{$args[0]}))) { $errorNbr = 5; }

# initial alias table
$alias{"cat"} = "cat";
$alias{"grep"} = "grep";
$alias{"read"} = "read";
$alias{"tail"} = "tail";
$alias{"uniq"} = "uniq";
$alias{"alias"} = "alias";
$alias{"chars"} = "chars";
$alias{"count"} = "count";
$alias{"paste"} = "paste";
$alias{"print"} = "print";
$alias{"delete"} = "delete";
$alias{"replace"} = "replace";
$alias{"tokenize"} = "tokenize";

# converting alias to real command
$command = $alias{$command} if (defined($alias{$command}));
 
# added processing code for alias
elsif ($command eq "alias") { $alias{$args[1]} = $alias{$args[0]}; }


Last update: March 22, 2000. erikt@uia.ua.ac.be