
Dokumenthantering VT97:07

These are the exercises and references for the seventh class of the course Dokumenthantering.


Exercises

The results of the exercises marked with * have to be handed in. The exercises marked with ? are partly obligatory: you only have to hand in the results of one of them.

  1. * Change your sentence/word tokenizer from class four into a program that generates an inverted file for bare, case-insensitive words, with pointers to sentence numbers or to intervals of sentence numbers. Bare words are words without punctuation marks or other non-word characters.

  2. * Apply the program of the first exercise to the text "Om Uppsala universitet", which can be found in the file

    /usr/users/staff/erikt/html/dh97/uppsala.txt

    Estimate the size of the output if you store it using a minimal number of bits (five or six per character and per number). Is it smaller than the original file?
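As an illustration of exercise 1, here is a minimal Python sketch, not the course's reference solution. It assumes a crude regular-expression sentence splitter as a stand-in for your class-four tokenizer, treats a bare word as a maximal run of letters and digits, numbers sentences from 1, and writes runs of consecutive sentence numbers as intervals such as `1-3`. The names `split_sentences`, `compress` and `inverted_file`, and the one-line-per-word output format, are hypothetical.

```python
import re
from collections import defaultdict

# ASSUMPTION: a crude stand-in for the class-four sentence splitter; a sentence
# ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A bare word: a maximal run of letters and digits (no underscore, no punctuation).
# Python 3 regexes are Unicode-aware, so å, ä and ö count as letters.
BARE_WORD = re.compile(r"[^\W_]+")


def split_sentences(text):
    """Split text into sentences with the heuristic above."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def compress(numbers):
    """Write sorted sentence numbers as intervals: [1, 2, 3, 7] -> '1-3,7'."""
    parts = []
    start = prev = numbers[0]
    for n in numbers[1:] + [None]:  # the None sentinel flushes the last interval
        if n is not None and n == prev + 1:
            prev = n
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    return ",".join(parts)


def inverted_file(text):
    """Map each lower-cased bare word to the sentences (numbered from 1) it occurs in."""
    occurrences = defaultdict(list)
    for number, sentence in enumerate(split_sentences(text), start=1):
        # set(): a word occurring twice in one sentence gets a single pointer
        for word in set(BARE_WORD.findall(sentence.lower())):
            occurrences[word].append(number)  # sentences arrive in order, so lists stay sorted
    return {word: compress(nums) for word, nums in sorted(occurrences.items())}


if __name__ == "__main__":
    sample = "The cat sat. The cat ran! A dog ran? The end."
    for word, pointers in inverted_file(sample).items():
        print(word, pointers)
```

For the sample text the script prints `a 3`, `cat 1-2`, `dog 3`, `end 4`, `ran 2-3`, `sat 1` and `the 1-2,4`, one per line. If uppsala.txt is Latin-1 encoded (an assumption), read it with `open(path, encoding="latin-1")` before passing its contents to `inverted_file`.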
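For exercise 2, the estimate can be computed instead of counted by hand. The sketch below reads the exercise as comparing the inverted file from exercise 1 with the original text (an interpretation), and assumes that every letter, every separator (space, comma, hyphen, newline) and every whole number costs one symbol of five or six bits, while the original file costs eight bits per byte. The function names `estimated_bits` and `compare` and the one-line-per-word layout are hypothetical.

```python
import re

# ASSUMPTION: one symbol is a single non-digit character (letter, space,
# comma, hyphen, newline) or a whole run of digits (one sentence number).
SYMBOL = re.compile(r"\d+|\D")


def estimated_bits(inverted_text, bits_per_symbol=6):
    """Estimated size in bits if every symbol is stored in bits_per_symbol bits."""
    return len(SYMBOL.findall(inverted_text)) * bits_per_symbol


def compare(original_bytes, inverted_text, bits_per_symbol=6):
    """Return (original size at 8 bits per byte, estimated compact size), both in bits."""
    return original_bytes * 8, estimated_bits(inverted_text, bits_per_symbol)


if __name__ == "__main__":
    # Toy data: a four-sentence text and its inverted file, one word per line.
    original = "The cat sat. The cat ran! A dog ran? The end."
    inverted = "a 3\ncat 1-2\ndog 3\nend 4\nran 2-3\nsat 1\nthe 1-2,4\n"
    for bits in (5, 6):
        before, after = compare(len(original.encode("latin-1")), inverted, bits)
        print(f"{bits} bits/symbol: {before} bits -> {after} bits")
```

The demo prints `5 bits/symbol: 360 bits -> 240 bits` and `6 bits/symbol: 360 bits -> 288 bits`. For the real text, pass `os.path.getsize` of uppsala.txt as `original_bytes`. Keep in mind that an inverted file drops word order and repeated occurrences within a sentence, so a smaller size does not mean the original text can be reconstructed from it.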


References


Last update: April 16, 1997. erikt@stp.ling.uu.se