Perl Exercises (Final)


These exercises are part of a Perl course taught at CNTS - Language Technology Group at the University of Antwerp.

If you are a course participant and you want to submit your answers, please send the solution to one of these exercises to erikt@uia.ua.ac.be on or before Monday May 8, 2000. Note that these exercises require more work than the previous ones, and therefore you only have to do one of them. Please submit the title of the exercise that you want to do to erikt@uia.ua.ac.be as soon as possible, but no later than Wednesday April 12, 2000. When you submit your results, please include the Perl code you have written, the result of at least one test and the answers to the questions mentioned in the exercise, if there are any. Your program may not contain shell escapes (system or related) to programs which perform the exercise for you.

Final Exercise 1: Qubic

You only have to hand in a solution for one of these five exercises. The programs you construct for these exercises must contain the line use strict at the top.

Qubic is a three-dimensional version of tic-tac-toe. It is played on a field of four planes, each with four columns and four rows. The goal of this game is to occupy four squares in a row in any direction. Write a Perl program that is able to play Qubic against a human opponent. The program must be able to detect that one of the players has won and it must use a reasonable strategy for choosing its moves.
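Win detection can be sketched as follows. This is only one possible approach, not a required design: it enumerates every straight line of four squares once (a 4x4x4 board has 76 of them) and then checks whether one player owns a whole line. The names winning_lines and winner, and the board layout $board->[plane][row][column], are assumptions made for this illustration. The move strategy is left out.

```perl
use strict;
use warnings;

my $SIZE = 4;    # 4 planes x 4 rows x 4 columns

# Build every winning line as a list of four [plane, row, column] cells.
sub winning_lines {
    my @lines;
    for my $dp (-1 .. 1) {
        for my $dr (-1 .. 1) {
            for my $dc (-1 .. 1) {
                # Skip (0,0,0) and keep one of each pair of opposite
                # directions: the first non-zero step must be +1.
                next unless ($dp || $dr || $dc) == 1;
                for my $p (0 .. $SIZE - 1) {
                    for my $r (0 .. $SIZE - 1) {
                        for my $c (0 .. $SIZE - 1) {
                            # The line fits if its last cell is on the board.
                            my $n   = $SIZE - 1;
                            my @end = ($p + $n * $dp, $r + $n * $dr, $c + $n * $dc);
                            next if grep { $_ < 0 || $_ >= $SIZE } @end;
                            push @lines, [
                                map { [ $p + $_ * $dp, $r + $_ * $dr, $c + $_ * $dc ] }
                                    0 .. $n
                            ];
                        }
                    }
                }
            }
        }
    }
    return @lines;
}

# Return 'X' or 'O' if that player fills a complete line, '' otherwise.
# Empty squares are undef in the board.
sub winner {
    my ($board, $lines) = @_;
    for my $line (@$lines) {
        my @marks = map { $board->[ $_->[0] ][ $_->[1] ][ $_->[2] ] // '' } @$line;
        next if $marks[0] eq '';
        return $marks[0] unless grep { $_ ne $marks[0] } @marks;
    }
    return '';
}

my @lines = winning_lines();
my @board;
$board[$_][$_][$_] = 'X' for 0 .. 3;    # fill one space diagonal
print scalar(@lines), " lines, winner: ", winner(\@board, \@lines), "\n";
```

The same list of lines may also be useful for a simple strategy, for example by counting how many squares each player already holds on every line.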

Final Exercise 2: Text generation

Create a program that generates text. The program uses a training file, for example the file oliver.txt from the previous session, and stores all trigrams of tokens. The generation process consists of randomly choosing one of the trigrams that starts a sentence and selecting the next token by randomly choosing a token that can follow the two final tokens that were generated. Present a sample of the generated text in your results. Is the text reasonable? If you want to improve your text, you can try working with a larger training text or use a larger model than a trigram model (4-gram or 5-gram) and choose tokens based on the previous 3 or 4 tokens.
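The trigram process described above could be sketched like this. The tokenizer, the rule that a sentence starts at the beginning of the text or after ".", "!" or "?", and the names tokenize, build_model and generate are assumptions for this illustration. A real solution would read its tokens from the training file.

```perl
use strict;
use warnings;

# Split text into word tokens and single punctuation tokens.
sub tokenize {
    my ($text) = @_;
    return $text =~ /\w+|[^\w\s]/g;
}

# Store all trigrams: map "w1 w2" to the list of tokens seen after that
# pair, and remember the trigrams that start a sentence.
sub build_model {
    my @tokens = @_;
    my (%next, @starts);
    for my $i (0 .. $#tokens - 2) {
        my ($w1, $w2, $w3) = @tokens[ $i .. $i + 2 ];
        push @{ $next{"$w1 $w2"} }, $w3;
        push @starts, [ $w1, $w2, $w3 ]
            if $i == 0 || $tokens[ $i - 1 ] =~ /^[.!?]$/;
    }
    return (\%next, \@starts);
}

# Pick a random sentence-initial trigram, then keep choosing a random
# token that followed the last two generated tokens in the training text.
sub generate {
    my ($next, $starts, $max) = @_;
    return () unless @$starts;
    my @out = @{ $starts->[ rand @$starts ] };
    while (@out < $max) {
        my $followers = $next->{"$out[-2] $out[-1]"} or last;
        push @out, $followers->[ rand @$followers ];
    }
    return @out;
}

my @tokens = tokenize("The cat sat. The cat ran. The dog sat.");
my ($next, $starts) = build_model(@tokens);
print join(' ', generate($next, $starts, 30)), "\n";
```

Because followers are stored with repetitions, frequent continuations are chosen more often. Extending the key from two tokens to three or four gives the larger models mentioned above.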

Final Exercise 3: Spelling correction

Write a simple spelling correction program. It should read a text, either from the keyboard or from a file, and output a list of words which are not in a predefined dictionary. Your program should do something reasonable with capital characters: normally words may contain any number of capital characters but some words, like names, must start with a capital character. The dictionary should be read from a file which contains one word per line. Here are some example word lists: Dutch (178429 words, no accents), English (26879) and French (138257, ISO 8859-1 accents). You are allowed to add words to the list and, alternatively, to work with your own toy word list.
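One way to treat capital characters is sketched below, under stated assumptions: a dictionary entry in lower case accepts any capitalization, while a capitalized entry (a name) only accepts words that start with a capital. The sketch handles plain ASCII letters only, so accented word lists would need a wider character class. The names load_dictionary, is_known and unknown_words are invented for this example.

```perl
use strict;
use warnings;

# Read one word per line from a filehandle into a hash for fast lookup.
sub load_dictionary {
    my ($fh) = @_;
    my %dict;
    while (my $word = <$fh>) {
        chomp $word;
        $dict{$word} = 1 if length $word;
    }
    return \%dict;
}

sub is_known {
    my ($dict, $word) = @_;
    return 1 if $dict->{$word};         # exact match
    return 1 if $dict->{ lc $word };    # lower-case entry: capitals are free
    # Capitalized entry such as a name: the word must start with a capital.
    return 1 if $word =~ /^[A-Z]/ && $dict->{ ucfirst lc $word };
    return 0;
}

# Return each unknown word once, in order of first appearance.
sub unknown_words {
    my ($dict, $text) = @_;
    my %seen;
    return grep { !is_known($dict, $_) && !$seen{$_}++ } $text =~ /[A-Za-z]+/g;
}

# A toy word list in an in-memory file stands in for a real dictionary file.
my $words = "the\ncat\nsat\nin\nnot\nAntwerp\n";
open my $fh, '<', \$words or die "cannot open in-memory file: $!";
my $dict = load_dictionary($fh);
close $fh;

# prints "caat antwerp"
print join(' ', unknown_words($dict, "The caat sat in antwerp, not ANTWERP.")), "\n";
```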

Final Exercise 4: HTML to text conversion

Write a program that converts an HTML file to a text file which is a reasonable representation of the text in the HTML file. Your program should remove the HTML tags, convert some of the codes for accented characters, such as &eacute; for é, to the correct character, and perform some basic formatting on the resulting text to fit it in a window 80 characters wide. The output of your program should contain all the text present in the original file, unless the text is stored in an image, but it should not contain HTML formatting code. When you are in doubt about what the output should look like, you can check what your favorite web browser produces when it saves an HTML file as text.
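The three steps (removing tags, decoding character codes and wrapping) might be sketched as follows. The regular expressions are deliberately naive: they do not handle comments, scripts or a ">" inside an attribute value. The entity table is a small sample that maps to ISO 8859-1 characters, and the name html_to_text is invented for this example. Text::Wrap is a standard Perl module; with $Text::Wrap::columns set to 80 it produces lines of at most 79 characters.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# A small sample of character codes; a real program needs a longer table.
my %entity = (
    eacute => "\xE9", egrave => "\xE8", agrave => "\xE0", uuml => "\xFC",
    amp    => '&',    lt     => '<',    gt     => '>',    quot => '"',
    nbsp   => ' ',
);

sub html_to_text {
    my ($html) = @_;
    # Treat a few block-level tags as paragraph breaks.
    $html =~ s/<\/?(?:p|br|div|h[1-6]|li|tr)\b[^>]*>/\n\n/gi;
    # Remove all remaining tags (naive, see the note above).
    $html =~ s/<[^>]*>//g;
    # Decode known codes in a single pass and leave unknown ones alone.
    $html =~ s/&(\w+);/exists $entity{$1} ? $entity{$1} : "&$1;"/ge;

    local $Text::Wrap::columns = 80;
    my @paragraphs;
    for my $p (split /\n\s*\n/, $html) {
        $p =~ s/\s+/ /g;      # collapse runs of white space
        $p =~ s/^ | $//g;     # trim both ends
        next if $p eq '';
        push @paragraphs, wrap('', '', $p);
    }
    return join("\n\n", @paragraphs) . "\n";
}

print html_to_text('<h1>Menu</h1><p>Caf&eacute; <b>au</b> lait &amp; croissants</p>');
```

Decoding the character codes after removing the tags keeps a decoded "<" from being mistaken for the start of a tag.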

Final Exercise 5: Database interface

Create a database query system with an interface similar to that of the texttool program. The program should be able to process tables which are stored in texts that look like:

   snbr  sname  city
   s1    Smith  London
   s2    Jones  Paris
   s3    Blake  Paris
   s4    Clark  London
   s5    Adams  Athens

They contain one row per line with the first row specifying the names of the columns. Internally the columns should be separated by a special character which you may choose yourself (other table storage formats are allowed). Your program should contain three commands: one for reading tables (either from the keyboard or from a file), one for printing tables in some pretty way and one for manipulating tables. The table manipulation command should be called select and it has the following structure:

   a = select b , c from d where e = 'f' , g = 'h'

This means: select from table d the rows where attribute (= column) e has value f and attribute g has value h, and from these rows put columns b and c in table a. Several parts of this command are optional. First, a = can be omitted, in which case the result should be printed on screen. Second, b , c can be replaced by a *, which means that all the columns should be in the result. Third, the part starting at where can be omitted, which means that all the rows should be used. Here is an example which uses the supplier table:

   > readfile sup.db supplier
   > select sname from supplier
   +-------+
   | sname |
   +-------+
   | Smith |
   | Jones |
   | Blake |
   | Clark |
   | Adams |
   +-------+
   > eng = select * from supplier where city = 'London'
   > print eng
   +------+-------+--------+
   | snbr | sname | city   |
   +------+-------+--------+
   | s1   | Smith | London |
   | s4   | Clark | London |
   +------+-------+--------+
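The core of such a system could be sketched as below. The sketch stores each table as a hash with a list of column names and a list of rows, which is one of the allowed alternative storage formats. It assumes lower-case keywords and values without commas or quotes, and it leaves out the command loop and the readfile and print commands. The names parse_table, run_select and format_table are invented for this example.

```perl
use strict;
use warnings;

my %tables;    # table name => { cols => [names], rows => [ [values], ... ] }

# Parse whitespace-separated text; the first line holds the column names.
sub parse_table {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;
    return { cols => [], rows => [] } unless @lines;
    my @cols = split ' ', shift @lines;
    return { cols => \@cols, rows => [ map { [ split ' ', $_ ] } @lines ] };
}

# Execute one select command; store the result if "a =" was given.
sub run_select {
    my ($command) = @_;
    my ($target, $what, $from, $where) = $command =~ m{
        ^\s* (?: (\w+) \s* = \s* )?                  # optional target table
        select \s+ ( \* | \w+ (?: \s*,\s* \w+ )* )   # star or column list
        \s+ from \s+ (\w+)                           # source table
        (?: \s+ where \s+ (.+?) )? \s* $             # optional conditions
    }x or die "cannot parse: $command\n";
    my $t = $tables{$from} or die "unknown table: $from\n";

    my %index;
    @index{ @{ $t->{cols} } } = 0 .. $#{ $t->{cols} };
    my @cols = $what eq '*' ? @{ $t->{cols} } : split /\s*,\s*/, $what;
    my @conds;
    for my $c (defined $where ? split /\s*,\s*/, $where : ()) {
        my ($col, $val) = $c =~ /^(\w+)\s*=\s*'([^']*)'$/
            or die "bad condition: $c\n";
        push @conds, [ $col, $val ];
    }
    for my $name (@cols, map { $_->[0] } @conds) {
        die "unknown column: $name\n" unless exists $index{$name};
    }

    my @rows;
    ROW: for my $r (@{ $t->{rows} }) {
        for my $c (@conds) {
            next ROW unless $r->[ $index{ $c->[0] } ] eq $c->[1];
        }
        push @rows, [ @{$r}[ @index{@cols} ] ];
    }
    my $result = { cols => \@cols, rows => \@rows };
    $tables{$target} = $result if defined $target;
    return $result;
}

# Render a table with +---+ rules and left-aligned, padded columns.
sub format_table {
    my ($t) = @_;
    my @width = map { length($_) } @{ $t->{cols} };
    for my $r (@{ $t->{rows} }) {
        for my $i (0 .. $#$r) {
            $width[$i] = length($r->[$i]) if length($r->[$i]) > $width[$i];
        }
    }
    my $rule = '+' . join('+', map { '-' x ($_ + 2) } @width) . "+\n";
    my $fmt  = '|' . join('|', map { " %-${_}s " } @width) . "|\n";
    return join '', $rule, sprintf($fmt, @{ $t->{cols} }), $rule,
        (map { sprintf($fmt, @$_) } @{ $t->{rows} }), $rule;
}

$tables{supplier} = parse_table(<<'END');
snbr  sname  city
s1    Smith  London
s2    Jones  Paris
s3    Blake  Paris
s4    Clark  London
s5    Adams  Athens
END
run_select("eng = select * from supplier where city = 'London'");
print format_table($tables{eng});
```

With the supplier table loaded, the final two statements are meant to reproduce the second table of the example above, without the leading indentation.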


Last update: April 10, 2000. erikt@uia.ua.ac.be