Statistical NLP: Exercise 1
This is the first of a series of exercises on statistical natural
language processing.
In this exercise you will learn to compute basic statistical features
of texts.
For this purpose you can use an online search program
that processes the novel Dracula by Bram Stoker.
This exercise has been created by Erik Tjong Kim Sang,
University of Antwerp, Campus Drie Eiken, room J0.07, phone 03-8202793,
e-mail erikt@uia.ua.ac.be
Assignments
Use the online search program
to complete the following assignments:
- Find 5 words with a frequency of 1000 or more.
- Find 3 word pairs with a frequency of 200 or more.
- Find 1 word trigram with a frequency of 10 or more.
- Choose one word w2 and three different words w1, and compute the
conditional probability P(w2|previous word is w1) for each pair (w1, w2).
Choose the words so that the probability is
greater than zero for at least two of the three pairs.
- Suppose we pick a word at random from the corpus.
What is the probability that the word is "godalming"?
And what is the probability that the word is "godalming"
given that the previous word is "lord"?
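The quantities asked for in the assignments can also be computed offline. The sketch below is not the online search program; it is a minimal illustration on a toy sentence. It assumes the usual relative-frequency estimates, P(w) = count(w) / N and P(w2|previous word is w1) = count(w1 w2) / count(w1), and a simplistic tokenization (lowercase, letters only) that may differ from what the search program uses, so counts on the real novel may not match exactly. All function names are hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters (an assumed, simplistic scheme)."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens; keys are tuples of length n."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def unigram_probability(tokens, word):
    """Relative-frequency estimate of P(word): count(word) / number of tokens."""
    return tokens.count(word) / len(tokens) if tokens else 0.0


def conditional_probability(tokens, w1, w2):
    """Estimate P(w2 | previous word is w1) as count(w1 w2) / count(w1)."""
    bigrams = ngram_counts(tokens, 2)
    # Count w1 only where it is followed by another word,
    # so the final token of the text is excluded.
    w1_count = sum(1 for t in tokens[:-1] if t == w1)
    if w1_count == 0:
        return 0.0
    return bigrams[(w1, w2)] / w1_count


# Toy text for illustration only; it is not taken from the novel.
sample = "the count saw the lord and the lord saw the count"
tokens = tokenize(sample)

print(ngram_counts(tokens, 1).most_common(3))        # most frequent words
print(ngram_counts(tokens, 2).most_common(3))        # most frequent word pairs
print(unigram_probability(tokens, "lord"))           # P("lord")
print(conditional_probability(tokens, "the", "lord"))  # P("lord" | previous word is "the")
```

To apply the same functions to the novel, pass the full text of a plain-text copy to tokenize instead of the toy sentence.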
Last update: November 23, 2003.