Hints practical exercise 2

Hints assignment 2.3

These are the hints for assignment 2.3. Not all these hints are necessary for successfully completing the assignment.

  1. Modify the counting part of the Perl code so that the $nbrOfChars variable is incremented only when the current character is one of the problematic characters.

  2. The third hint of assignment 2.6 deals with working with sets of characters.
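
As a cross-check for your modified Perl count, the same number can be obtained with standard Unix tools. The sketch below assumes, purely for illustration, that the problematic characters are a and e. The function name count_problematic is hypothetical as well.

```shell
# Count how many characters on standard input belong to a given set.
# $1: the set of characters to count, in tr notation. The set 'ae' used
#     below is an assumption, not the set from the assignment.
count_problematic() {
    # tr -cd deletes every character that is NOT in the set;
    # wc -c counts what is left; the last tr strips padding that some
    # versions of wc add around the number.
    tr -cd "$1" | wc -c | tr -d ' '
}

# Example: 'banana bee' contains three a's and two e's.
printf 'banana bee\n' | count_problematic 'ae'    # prints 5
```

If the number printed here differs from the value of $nbrOfChars, the condition in the Perl code probably does not cover the whole set.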

Hint assignment 2.4

This is the hint for assignment 2.4. It is possible to complete the assignment without using this hint.

  1. In this case a unigram model is nothing more than a character frequency list. You already created a frequency list for words in assignment 1.1. You can use the same commands again once every character is on a separate line. Commands for doing this can be found in the count script.
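
The hint above can be sketched as a single pipeline. The sketch assumes plain ASCII text and uses fold to put every character on its own line; the count script may do this differently, and the function name char_unigrams is hypothetical.

```shell
# Build a character frequency list from standard input.
# Assumption: plain ASCII text. fold -w 1 is only one possible way of
# putting every character on its own line.
char_unigrams() {
    fold -w 1 |      # one character per line
        sort |       # bring identical characters together
        uniq -c |    # count each distinct character
        sort -rn     # most frequent character first
}

# Example: print the frequency list for a short test string.
printf 'banana bee\n' | char_unigrams
```

Note that the space between the two words is counted as a character too.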

Hint assignment 2.5

This is the hint for assignment 2.5. It is possible to complete the assignment without using this hint.

  1. A unigram corrector replaces every character in a set of characters that could have been exchanged with the most frequent character of that set.

  2. You can use the messUp script as a unigram corrector. You only have to remove code from it.

  3. An alternative unigram corrector can be made by using the Unix command tr.
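
A minimal sketch of the tr alternative follows. It assumes, for illustration only, that a and e are the characters that could have been exchanged and that e is the more frequent of the two in your unigram model. The function name unigram_correct is hypothetical.

```shell
# A unigram corrector built on tr.
# Assumption: a and e could have been exchanged, and e is the more
# frequent of the two. Replace these with the characters and frequencies
# from your own unigram model.
unigram_correct() {
    # Replace every a by e; e and all other characters pass through unchanged.
    tr 'a' 'e'
}

# Example: every a in the input comes out as e.
printf 'banana bee\n' | unigram_correct    # prints benene bee
```

With more than one set of exchangeable characters, both arguments of tr can list several characters, one replacement per position.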

Hints assignment 2.6

These are the hints for assignment 2.6. Not all these hints are necessary for successfully completing the assignment.

  1. In this case a bigram model is nothing more than a count of the character bigrams in the text. You already created a frequency list for word bigrams in assignment 1.2. Apply the method from that assignment to the file containing character unigrams from assignment 2.4 to obtain a file with character bigrams.

  2. You can use the unigram corrector from the previous assignment as a starting point for a bigram corrector. Add code to the script that stores the previous character in a variable. Then add if-statements so that the character printed for a messed-up character depends on the previous character.

  3. Many extra if-statements may be necessary. You can use the following structure to make your script more readable:

    # if $prevChar is in the set {q,w,e,r,t,y} then print 'a'
    if ( $prevChar =~ /[qwerty]/ ) { print 'a'; }
    # else print 'e'
    else { print 'e'; }
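
For the first hint above, one common way to turn a file with one character per line into character bigrams is to paste the file next to a copy of itself that starts one line later. Whether this matches the method of assignment 1.2 is an assumption, and the function name char_bigrams is hypothetical.

```shell
# Count character bigrams.
# $1: a file with one character per line.
# Assumption: the shift-and-paste method shown here may differ from the
# method used in assignment 1.2.
char_bigrams() {
    tail -n +2 "$1" |     # the same characters, starting at the second line
        paste "$1" - |    # line i becomes: character i, tab, character i+1
        sed '$d' |        # drop the last line, which has no successor
        sort | uniq -c |  # count each distinct bigram
        sort -rn          # most frequent bigram first
}

# Example with a small temporary input file holding the characters a b a b.
tmp=$(mktemp)
printf 'a\nb\na\nb\n' > "$tmp"
char_bigrams "$tmp"
rm -f "$tmp"
```

The two characters of each bigram are separated by a tab in the output, which keeps bigrams that contain a space readable.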


Last update: November 26, 1996. erikt@stp.ling.uu.se