Dokumenthantering VT98:12

These are the exercises for the fourth lab session of the course Dokumenthantering VT98. There are two exercises this week, one of which is obligatory. The obligatory exercise is marked with an asterisk (*).

Write a report on the obligatory exercise. It must meet the same requirements as the report for the first lab. The deadline for this week's report is Wednesday, February 25, 1998.


Exercises Lab 4

  1. Our AIX workstations have three compression programs available: pack (creates .z files), compress (.Z) and gzip (.gz). Browse through the manual pages (e.g. man 1 pack for pack) to find out which algorithms these programs use. Then compress the file /corpora/Press65/UnixAscii/p65.001 with each program and compute the compression factors. Which method compresses best? Which one is fastest? Note that gzip can be run at several levels with different compression factors (options -1 to -9; see the manual for details).

  2. * Design and implement a lossless compression method in Perl that can compress Swedish texts containing ISO 8859-1 characters. Apply your program to the file from exercise 1. It should produce a smaller file that decompresses to exactly the original file. What compression rate did you obtain? Your method does not need to produce a smaller output for every possible input file.
    Tip: use a dictionary method with a small static lexicon.
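
For exercise 1, the compression factors can be computed with a few lines of script. The sketch below is in Python rather than shell and only covers gzip: Python's standard gzip module writes the same file format as the gzip program but has no counterpart for pack or compress, which would still have to be run and timed from the command line (e.g. with the shell's time command). It assumes the compression factor is defined as original size divided by compressed size; the course may define it differently, for instance as a percentage of the original size. The function names are hypothetical.

```python
import gzip
import time


def compression_factor(original_size: int, compressed_size: int) -> float:
    """Assumed definition: original size / compressed size (> 1 means the data shrank)."""
    return original_size / compressed_size


def compare_gzip_levels(data: bytes) -> list:
    """Compress data at gzip levels 1..9.

    Returns a list of (level, compressed_size, factor, seconds) tuples.
    """
    results = []
    for level in range(1, 10):
        start = time.perf_counter()
        packed = gzip.compress(data, compresslevel=level)
        seconds = time.perf_counter() - start
        # gzip is lossless, so the round trip must reproduce the input exactly.
        if gzip.decompress(packed) != data:
            raise RuntimeError(f"round trip failed at level {level}")
        factor = compression_factor(len(data), len(packed))
        results.append((level, len(packed), factor, seconds))
    return results


def report(path: str) -> None:
    """Print one line per gzip level for the file at path."""
    with open(path, "rb") as f:
        data = f.read()
    print(f"{path}: {len(data)} bytes")
    for level, size, factor, seconds in compare_gzip_levels(data):
        print(f"  level {level}: {size:8d} bytes, factor {factor:5.2f}, {seconds * 1000:7.1f} ms")
```

Calling report("/corpora/Press65/UnixAscii/p65.001") on the lab machines would print the size, factor and time for each level. Timings from a single run are noisy, so repeating the measurement a few times gives a fairer comparison.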

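For exercise 2, the tip about a dictionary method with a small static lexicon can be illustrated with a minimal sketch. It is written in Python for illustration only (the exercise asks for Perl) and rests on stated assumptions: the lexicon is a hypothetical handful of frequent Swedish function words, not derived from the corpus, and the byte values 0x80 to 0x9F (the C1 control range of ISO 8859-1, which normally does not occur in running text) are reused as one-byte codes for lexicon entries. Any literal byte in that range is preceded by an escape byte, so every input round-trips exactly, even though inputs full of such bytes will grow rather than shrink.

```python
# Hypothetical static lexicon: frequent Swedish function words, each with a
# leading space so that matches fall on word boundaries. Illustrative only.
LEXICON = [w.encode("latin-1") for w in [
    " och", " att", " det", " som", " en", " i", " på", " är", " för", " med",
    " har", " till", " av", " den", " inte", " om", " var", " de", " ett", " kan",
]]

ESCAPE = 0x80                 # "the next byte is a literal"
FIRST_CODE = 0x81             # LEXICON[k] is written as the byte FIRST_CODE + k
RESERVED = range(0x80, 0xA0)  # assumed free: C1 controls in ISO 8859-1
if FIRST_CODE + len(LEXICON) > 0xA0:
    raise ValueError("lexicon too large for the reserved byte range")

# Try longer entries first so that e.g. " inte" wins over " i" (greedy longest match).
_BY_LENGTH = sorted(enumerate(LEXICON), key=lambda item: -len(item[1]))


def compress(data: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(data):
        for k, word in _BY_LENGTH:
            if data.startswith(word, pos):
                out.append(FIRST_CODE + k)  # whole word becomes one byte
                pos += len(word)
                break
        else:
            byte = data[pos]
            if byte in RESERVED:
                out.append(ESCAPE)  # protect literals that collide with codes
            out.append(byte)
            pos += 1
    return bytes(out)


def decompress(packed: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(packed):
        byte = packed[pos]
        if byte == ESCAPE:
            out.append(packed[pos + 1])  # escaped literal byte
            pos += 2
        elif byte in RESERVED:
            out += LEXICON[byte - FIRST_CODE]  # expand a lexicon code
            pos += 1
        else:
            out.append(byte)  # ordinary literal byte
            pos += 1
    return bytes(out)
```

The compression rate for a file is then len(compress(data)) / len(data), and decompress(compress(data)) == data checks losslessness. A larger lexicon tuned to word frequencies in the corpus would likely save more, at the cost of needing more code values than the 31 this reserved range provides.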

References Week 4

Ian H. Witten, Alistair Moffat and Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, 1994.
A book about compressing and indexing large collections of documents and images.


Last update: February 26, 1998. erikt@stp.ling.uu.se