Dokumenthantering VT98:12
These are the exercises for the fourth lab session of the course
Dokumenthantering VT98.
There are 2 exercises this week and 1 of them is obligatory.
The obligatory exercise has been marked with a *.
Write a report about the obligatory exercise.
The report should fulfill the same requirements as the report for the
first lab.
The deadline for handing in the report for this week's exercises is
Wednesday February 25, 1998.
1.
Our AIX workstations have three compression programs available:
pack (creates .z files),
compress (.Z) and
gzip (.gz).
Read the manual pages (for example, man 1 pack for pack) to find out which
algorithm each of these compression programs uses.
Then compress the file
/corpora/Press65/UnixAscii/p65.001
with each compression method and compute the compression factors.
Which compression method is the best?
Which one is the fastest?
Note that gzip can be run at several compression levels (from -1, the
fastest, to -9, the best compression), which give different compression
factors (see the manual for more information).
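The measurements for gzip can be sketched as follows. This is only an illustration in Python using the standard gzip module, not a replacement for running pack, compress and gzip on the workstations. It assumes that the compression factor means original size divided by compressed size (the exercise does not fix a definition), and the names compression_factor and gzip_report are made up for this sketch.

```python
import gzip
import time


def compression_factor(original_size, compressed_size):
    """Assumed definition: original size divided by compressed size."""
    return original_size / compressed_size


def gzip_report(data, levels=(1, 6, 9)):
    """Compress data at each gzip level.

    Returns a list of (level, compressed_size, factor, seconds) tuples.
    """
    rows = []
    for level in levels:
        start = time.perf_counter()
        packed = gzip.compress(data, compresslevel=level)
        elapsed = time.perf_counter() - start
        # Check that decompression restores the input exactly (lossless).
        assert gzip.decompress(packed) == data
        factor = compression_factor(len(data), len(packed))
        rows.append((level, len(packed), factor, elapsed))
    return rows
```

Calling gzip_report on the bytes of p65.001 (read with open(path, "rb")) gives one row per level. pack and compress have no counterpart in the Python standard library, so their figures have to come from the sizes of the .z and .Z files they produce, for example as listed by ls -l.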
2. *
Design and implement a lossless compression method in Perl which can
compress Swedish texts with ISO 8859-1 characters.
Apply your program to the file from exercise 1.
It should generate a smaller file which can be decompressed to exactly
the same file.
What compression rate did you obtain?
Your compression method does not need to generate smaller versions for
every possible file.
Tip: use a dictionary method with a small static lexicon.
[answer example]
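The tip about a dictionary method with a small static lexicon can be sketched as follows. The sketch is in Python and only illustrates the idea; the exercise itself still asks for a Perl program. It relies on one assumption: ordinary ISO 8859-1 text does not use the byte values 0x80 to 0x9F, so these can serve as one-byte codes for lexicon entries. The word list, the escape byte and the function names are invented for this example and are not taken from the course material.

```python
ESCAPE = 0x80       # marks a literal byte from the range 0x80-0x9F
FIRST_CODE = 0x81   # first byte value used as a lexicon code
LAST_CODE = 0x9F    # last byte value available for lexicon codes

# Hypothetical static lexicon: frequent Swedish words plus a trailing space.
WORDS = ["och ", "att ", "det ", "som ", "för ", "med ", "den ", "till ",
         "inte ", "har ", "är ", "en ", "på ", "av ", "i "]
LEXICON = [word.encode("latin-1") for word in WORDS]
assert len(LEXICON) <= LAST_CODE - FIRST_CODE + 1

# Try longer entries first so that matching is greedy.
BY_LENGTH = sorted(enumerate(LEXICON), key=lambda item: -len(item[1]))


def compress(data):
    """Replace lexicon entries by one-byte codes; escape reserved bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        for index, entry in BY_LENGTH:
            if data.startswith(entry, i):
                out.append(FIRST_CODE + index)
                i += len(entry)
                break
        else:
            byte = data[i]
            if ESCAPE <= byte <= LAST_CODE:
                out.append(ESCAPE)  # keeps the method lossless for any input
            out.append(byte)
            i += 1
    return bytes(out)


def decompress(packed):
    """Invert compress(): expand codes and remove escapes."""
    out = bytearray()
    i = 0
    while i < len(packed):
        byte = packed[i]
        if byte == ESCAPE:
            out.append(packed[i + 1])
            i += 2
        elif FIRST_CODE <= byte <= LAST_CODE:
            out += LEXICON[byte - FIRST_CODE]
            i += 1
        else:
            out.append(byte)
            i += 1
    return bytes(out)
```

With this scheme every matched entry shrinks to one byte, while a file full of bytes in the range 0x80 to 0x9F grows, which the exercise explicitly allows. One common way to report the result is the size of compress(data) divided by the size of data; a larger lexicon or codes for frequent letter pairs would be natural next steps.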
-
Ian H. Witten, Alistair Moffat and Timothy C. Bell.
Managing Gigabytes: Compressing and Indexing Documents and Images,
Van Nostrand Reinhold, 1994.
-
A book about document processing.
Last update: February 26, 1998.
erikt@stp.ling.uu.se