Hints for practical exercise 2
These are the hints for assignment 2.3.
Not all of these hints are necessary to complete the assignment
successfully.
- Try to modify the Perl code that does the counting so that the
$nbrOfChars variable is only increased if the current
character is one of the problematic characters.
- The third hint for assignment 2.6 shows a way of working with
sets of characters.
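One way to check the number your modified script reports is to count the same characters with standard Unix commands. The sketch below assumes, purely for illustration, that the problematic characters are a and e; replace the set with the characters used in your assignment. The function name count_problem_chars is hypothetical.

```shell
# Count only the characters in the (assumed) problematic set.
# 'ae' is an illustrative assumption; substitute your own set.
count_problem_chars() {
    # tr -cd deletes every character NOT in the set; wc -c counts what is left.
    # The final tr removes the padding some wc versions add.
    tr -cd 'ae' | wc -c | tr -d ' '
}

# Small demo on a fixed string: three a's and two e's.
printf 'banana bee\n' | count_problem_chars   # prints 5
```

This only cross-checks the total; the Perl script itself still has to be changed as described in the first hint.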
This is the hint for assignment 2.4.
It is possible to complete the assignment without using this hint.
- In this case a unigram model is nothing more than a character
frequency list.
You already created a frequency list for words in assignment 1.1.
You can use the same commands again if you manage to put every
character on a separate line.
Commands for doing this can be found in the count script.
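The idea in this hint can be sketched with standard Unix commands. This is an assumption about what such a pipeline might look like, not the contents of the count script: fold -w 1 is used here to put each character on its own line, and char_freq is a hypothetical name. The sketch is meant for plain single-byte text.

```shell
# Character frequency list: one character per line, then count and rank.
char_freq() {
    # fold -w 1 breaks the input into lines of one character each;
    # sort | uniq -c counts identical lines;
    # sort -rn puts the highest count first.
    fold -w 1 | sort | uniq -c | sort -rn
}

# Small demo: 'a' occurs twice, 'b' and 'c' once each.
printf 'abca\n' | char_freq
```

On your own data you would feed the text file into char_freq instead of the printf demo.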
This is the hint for assignment 2.5.
It is possible to complete the assignment without using this hint.
- For each set of characters which could have been exchanged, a
unigram corrector outputs the most frequent character of that set.
- You can use the messUp script as a unigram corrector.
You only have to remove code from it.
- An alternative unigram corrector can be made by using
the Unix command tr.
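A minimal sketch of the tr alternative follows. It assumes, for illustration only, that the exchangeable set is {a, e} and that e turned out to be the more frequent of the two in your frequency list; check both against your own data. The function name unigram_correct is hypothetical.

```shell
# Unigram corrector sketch: always output the (assumed) most frequent
# character of the exchangeable set {a, e}.
unigram_correct() {
    # Map every 'a' to 'e'; all other characters pass through unchanged.
    tr 'a' 'e'
}

# Small demo on a fixed string.
printf 'banana bee\n' | unigram_correct   # prints benene bee
```

If there are several exchangeable sets, tr can take longer argument strings, one target character per source character.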
These are the hints for assignment 2.6.
Not all of these hints are necessary to complete the assignment
successfully.
- In this case a bigram model is nothing more than a count of
the character bigrams in the text.
You already created a frequency list for word bigrams in
assignment 1.2.
Apply the method used in that assignment to obtain a file
with character bigrams from the file containing character unigrams
from assignment 2.4.
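One common way to obtain bigrams from a file with one item per line is to paste the file next to a copy of itself shifted by one line. Whether assignment 1.2 used exactly this method is an assumption, and so is the reading that the unigram file holds one character per line. The names char_bigrams and demo are hypothetical.

```shell
# Character bigram counts from a file with one character per line.
char_bigrams() {
    # $1: file with one character per line (assumed input format).
    # tail -n +2 gives the same list shifted by one line;
    # paste -d '\0' joins line i and line i+1 with no separator;
    # sed '$d' drops the final line, which holds only the last
    # character and has no successor.
    tail -n +2 "$1" | paste -d '\0' "$1" - | sed '$d' |
        sort | uniq -c | sort -rn
}

# Demo on a tiny hypothetical unigram file containing a, b, a, b.
demo=$(mktemp)
printf 'a\nb\na\nb\n' > "$demo"
char_bigrams "$demo"
rm -f "$demo"
```

In the demo the bigram ab occurs twice and ba once, so ab is listed first.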
- You can use the unigram corrector from the previous assignment
as a starting point for a bigram corrector. Add code to the script
that stores the previous character in a variable. Then add
if-statements so that the character printed for a messed-up
character depends on the previous character.
- Many extra if-statements may be necessary. You can use the
following structure to make your script more readable:
# if $prevChar is in the set {q,w,e,r,t,y} then print 'a'
if ( $prevChar =~ /[qwerty]/ ) { print 'a'; }
# else print 'e'
else { print 'e'; }
Last update: November 26, 1996.
erikt@stp.ling.uu.se