Språkstatistik HT95:03
Class: 03
Date: 950912
Topic: Exercises 1
Homework exercises 3
We will use the second paragraph of Church's 1988 paper
"A Stochastic Parts Program and Noun Phrase Parser for
Unrestricted Text".
This paragraph contains 206 words.
-
Use the paragraph for computing the following probabilities:
P("the"),
P("noun"),
P("part of"),
P("speech"),
P("speech"|"the"),
P("noun"|"the") and
P("speech"|"part of").
Furthermore, use two different ways of computing P("the"|"NOUN"), in
which NOUN is an arbitrary noun and "the" is the word preceding this
noun.
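As a starting point, the relative-frequency estimates asked for above can be sketched in Python. This is only one possible reading of the exercise: the helper names (tokenize, ngram_counts, prob, cond_prob) and the toy sample string are made up for illustration and are not the Church paragraph, and the code assumes that P of a two-word phrase is normalised by the number of two-word sequences in the text.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    tokens = [w.strip('.,;:!?"()') for w in text.lower().split()]
    return [t for t in tokens if t]

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def prob(tokens, phrase):
    """Relative frequency of a phrase among all n-grams of the same length."""
    target = tuple(phrase.lower().split())
    counts = ngram_counts(tokens, len(target))
    total = sum(counts.values())
    return counts[target] / total if total else 0.0

def cond_prob(tokens, word, context):
    """Estimate P(word | context) as count(context + word) / count(context)."""
    ctx = tuple(context.lower().split())
    full = ctx + tuple(word.lower().split())
    ctx_count = ngram_counts(tokens, len(ctx))[ctx]
    full_count = ngram_counts(tokens, len(full))[full]
    return full_count / ctx_count if ctx_count else 0.0

# Hypothetical toy text, NOT the paragraph from the paper.
sample = "the part of speech of the noun and the part of the verb"
tokens = tokenize(sample)
print(prob(tokens, "the"))                     # P("the")
print(cond_prob(tokens, "speech", "part of"))  # P("speech"|"part of")
```

Other normalisations are defensible, for example dividing the count of "part of" by the number of words rather than the number of word pairs; whichever is chosen should be stated with the answer.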
-
-
Find the trigrams "i? " in the paragraph.
Here ? stands for an arbitrary character.
-
Can you use these trigrams for computing the probability of characters
appearing between 'i' and ' '?
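A minimal sketch of how such character trigrams could be collected and turned into relative frequencies for the middle character. The function names and the sample string are hypothetical, and the code assumes the paragraph is available as a single string in which words are separated by single spaces.

```python
from collections import Counter

def i_gap_trigrams(text):
    """Return every three-character substring that starts with 'i' and ends with ' '."""
    return [text[k:k + 3] for k in range(len(text) - 2)
            if text[k] == "i" and text[k + 2] == " "]

def middle_char_probs(trigrams):
    """Relative frequency of the middle character over the given trigrams."""
    counts = Counter(t[1] for t in trigrams)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

# Hypothetical toy text, NOT the paragraph from the paper.
sample = "this is in it if id is"
trigrams = i_gap_trigrams(sample)
print(trigrams)
print(middle_char_probs(trigrams))
```

Note that the final "is" in the toy text is not counted, because it is not followed by a space. How reliable the resulting estimates are depends on how many such trigrams the paragraph actually contains.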
-
Suppose that the text was typed in by someone who often mistypes the
's' as a 'd' and vice versa.
A statistical corrector program would correct these mistakes by
replacing every occurrence of 's' or 'd' by the more frequent of the two.
Would this introduce errors in these trigrams?
And what score would the corrector achieve for these trigrams?
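One way to explore these two questions is to simulate the corrector on the middle characters of the trigrams. The sketch below rests on assumptions that the exercise does not spell out: the corrector picks the more frequent of 's' and 'd' in the typed text, replaces both characters by it, and its score is the fraction of trigrams whose middle character then matches the intended one. The function name and the two character lists are invented for illustration.

```python
from collections import Counter

def majority_correct(typed, intended):
    """Replace every 's' or 'd' in typed by the more frequent of the two
    and return the corrected list with the fraction matching intended."""
    counts = Counter(c for c in typed if c in ("s", "d"))
    if not counts:
        # Nothing to correct; score the typed text as it is.
        hits = sum(c == t for c, t in zip(typed, intended))
        return list(typed), hits / len(intended)
    majority = counts.most_common(1)[0][0]
    corrected = [majority if c in ("s", "d") else c for c in typed]
    hits = sum(c == t for c, t in zip(corrected, intended))
    return corrected, hits / len(intended)

# Hypothetical middle characters of "i? " trigrams, NOT taken from the paper.
intended = ["s", "s", "s", "d", "n"]   # what the author meant
typed    = ["s", "d", "s", "s", "n"]   # what was typed, with two slips
corrected, score = majority_correct(typed, intended)
print(corrected, score)
```

In this toy case the corrector repairs the mistyped "id " but also destroys the one genuine "id ", which illustrates how a majority rule can introduce errors of its own.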
Last update: December 13, 1995
erikt@strindberg.ling.uu.se