
Dokumenthantering VT98:15

This is the exercise for the fifth lab session of the course Dokumenthantering VT98. There is one obligatory exercise this week.

Write a report about the obligatory exercise. The report should fulfill the same requirements as the report for the first lab. The deadline for handing in the report for this week's exercises is Wednesday March 4, 1998.


Exercises Lab 5

  1. * Convert the ispell word list for Swedish to a lexicon in which every entry holds 30 characters for the word, 4 frequency bytes and 4 inverted file bytes. The word list can be found in the file:

    /home/staff/erikt/P/st97/lrtlab/words.swedish

    The frequency bytes and the inverted file bytes may be filled with anything you want.

    Now encode this lexicon with front coding, using word groups of size four, but keep the eight extra bytes in each lexicon entry. Compare the size of the resulting lexicon with the size of the intermediate lexicon, that is, the one with the extra bytes but without front coding.

    Tip: perform your tests with a fraction of the word list.
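
The two lexicon layouts in the exercise can be sketched as follows. This is a minimal illustration, not the course's answer example, and several details are assumptions rather than requirements: the byte layout of a front-coded entry (one length byte for the first word of a group, a prefix-length byte plus a suffix-length byte for the other three), the Latin-1 encoding, zero-filled extra bytes, and all function names. The word list is assumed to be sorted, since front coding only pays off when neighbouring words share prefixes.

```python
import struct

WORD_WIDTH = 30  # characters per word in the intermediate lexicon
EXTRA = 8        # 4 frequency bytes + 4 inverted file bytes
GROUP = 4        # word group size for front coding


def encode_word(word):
    # Assumption: Latin-1 text, truncated to the 30-character field width.
    return word.encode("latin-1")[:WORD_WIDTH]


def fixed_width_lexicon(words):
    """Intermediate lexicon: 30 padded word bytes + 8 extra bytes per entry."""
    out = bytearray()
    for word in words:
        out += encode_word(word).ljust(WORD_WIDTH, b"\0")
        out += struct.pack("<II", 0, 0)  # placeholder frequency and pointer
    return bytes(out)


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_coded_lexicon(words):
    """Front-coded lexicon that keeps the 8 extra bytes in every entry."""
    out = bytearray()
    prev = b""
    for i, word in enumerate(words):
        raw = encode_word(word)
        if i % GROUP == 0:
            # First word of a group is stored whole: length byte + word.
            out += bytes([len(raw)]) + raw
        else:
            # Other words store only what differs from the previous word.
            p = common_prefix_len(prev, raw)
            out += bytes([p, len(raw) - p]) + raw[p:]
        out += struct.pack("<II", 0, 0)
        prev = raw
    return bytes(out)


def decode_front_coded(data):
    """Inverse of front_coded_lexicon, used here only to check the encoding."""
    words, pos, prev, i = [], 0, b"", 0
    while pos < len(data):
        if i % GROUP == 0:
            n = data[pos]
            pos += 1
            raw = data[pos:pos + n]
            pos += n
        else:
            p, s = data[pos], data[pos + 1]
            pos += 2
            raw = prev[:p] + data[pos:pos + s]
            pos += s
        pos += EXTRA  # skip the frequency and inverted file bytes
        words.append(raw.decode("latin-1"))
        prev = raw
        i += 1
    return words


if __name__ == "__main__":
    # Hypothetical sample; the real input would be a fraction of words.swedish.
    sample = sorted(["bil", "bilar", "bilarna", "bilen", "bok"])
    print("fixed width:", len(fixed_width_lexicon(sample)), "bytes")
    print("front coded:", len(front_coded_lexicon(sample)), "bytes")
```

Because the eight extra bytes are kept in every entry, they put a floor under the front-coded size; the saving comes only from the word field, which is worth keeping in mind when comparing the two sizes.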


References Week 5

Ian H. Witten, Alistair Moffat & Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, 1994.
A book about document processing.


Last update: March 05, 1998. erikt@stp.ling.uu.se