Statistical NLP: Exercise 2
This is the second of a series of exercises on statistical natural
language processing.
The exercise was created by Erik Tjong Kim Sang,
University of Antwerpen, Campus Drie Eiken, room J0.07, phone 03-8202793,
e-mail erikt@uia.ua.ac.be.
General
In this exercise, you will use a statistical context-free grammar to
correct text.
The exercise is based on an EACL tutorial by Mark Liberman and
Yves Schabes.
A reference to this work can be found at the bottom of this page.
You do not need to have read it to be able to do the exercise.
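
As a concrete illustration (not the grammar distributed with the exercise), the sketch below defines a toy probabilistic context-free grammar over the characters x, f, +, *, [ and ], and uses NLTK's ViterbiParser to find the most probable parse of the expression x*f[x] from the first question, together with its probability. The rules and probabilities are invented for this example.

# A minimal sketch, assuming NLTK is installed; the rules and
# probabilities below are invented for illustration and are NOT
# the grammar distributed with the exercise.
from nltk import PCFG
from nltk.parse import ViterbiParser

toy_grammar = PCFG.fromstring("""
    E -> E '+' E [0.15]
    E -> E '*' E [0.15]
    E -> F '[' E ']' [0.20]
    E -> 'x' [0.50]
    F -> 'f' [1.0]
""")

parser = ViterbiParser(toy_grammar)
tokens = list("x*f[x]")            # treat every character as a token
for tree in parser.parse(tokens):  # yields the most probable parse, if any
    tree.pretty_print()
    print("probability:", tree.prob())

The probability of a tree is the product of the probabilities of all rules used in it, which is presumably also how the most probable analysis asked for in the fourth question is scored.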
The exercise contains six assignments:
- Parsing with a statistical grammar
- Generating with a statistical grammar (see the sketch after this list)
- Modifying the grammar
- Parsing with the modified grammar
- Correction with a statistical grammar
- Correction with a unigram model
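
One way to generate expressions from a statistical grammar, as asked in the second assignment, is to expand nonterminals top-down, choosing each production according to its probability, so that frequent rules produce frequent expressions. The sketch below reuses the invented toy grammar from the earlier example; it only shows the mechanism, not the grammar of the exercise.

import random
from nltk import PCFG
from nltk.grammar import Nonterminal

# The same invented toy grammar as in the earlier sketch (an assumption,
# not the grammar handed out with the exercise).
toy_grammar = PCFG.fromstring("""
    E -> E '+' E [0.15]
    E -> E '*' E [0.15]
    E -> F '[' E ']' [0.20]
    E -> 'x' [0.50]
    F -> 'f' [1.0]
""")

def generate(grammar, symbol):
    """Expand symbol top-down, sampling each production by its probability."""
    if not isinstance(symbol, Nonterminal):
        return str(symbol)                      # terminal: emit it as-is
    productions = grammar.productions(lhs=symbol)
    weights = [p.prob() for p in productions]
    chosen = random.choices(productions, weights=weights)[0]
    return "".join(generate(grammar, s) for s in chosen.rhs())

for _ in range(10):
    print(generate(toy_grammar, toy_grammar.start()))

Running the loop gives expressions such as x, x*x or f[x+x] (the exact output varies because of the random sampling); the ten different expressions asked for in the second question can be collected the same way with the exercise grammar.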
A question is associated with each assignment:
- Draw a parse tree for processing x*f[x] with the original grammar.
- Give ten different expressions that can be
generated by the original grammar.
- What are the rules that you have added to the grammar?
- What is the probability of the most probable analysis of
f+x-f] by the new grammar?
- Repeat the correction experiment with the statistical grammar
until at least one error is made.
Which expressions have been processed incorrectly?
Show which characters are wrong in the incorrect expressions.
- What characters did you replace in the unigram model?
Can you motivate your choices? (See the unigram sketch after this list.)
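
A simple way to use a character unigram model for correction, sketched below with invented probabilities, is to score an expression as the product of its character probabilities and to try single-character replacements, keeping the highest-scoring candidate. This is only an illustration of the idea, not the procedure prescribed by the exercise.

from math import prod

# Invented character probabilities (an assumption; in the exercise they
# would be estimated from the example expressions).
unigram = {"x": 0.30, "f": 0.20, "*": 0.10, "+": 0.10,
           "-": 0.05, "[": 0.125, "]": 0.125}

def score(expr):
    """Unigram probability of an expression: product of character probabilities."""
    return prod(unigram.get(c, 0.0) for c in expr)

def correct(expr):
    """Try every single-character replacement and keep the highest-scoring result."""
    best, best_score = expr, score(expr)
    for i in range(len(expr)):
        for c in unigram:
            candidate = expr[:i] + c + expr[i + 1:]
            if score(candidate) > best_score:
                best, best_score = candidate, score(candidate)
    return best

print(correct("f+x-f]"))   # a unigram model simply favours frequent characters

Because a unigram model ignores the structure of the expression, it can only prefer frequent characters over infrequent ones; this limitation is worth keeping in mind when you motivate your replacement choices in the last question.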
References
- Eugene Charniak (1997),
Statistical Techniques for Natural Language Parsing,
AI Magazine, volume 18, issue 4, pages 33-44.
- Mark Liberman and Yves Schabes (1993),
Statistical Methods in Natural Language Processing,
tutorial handout, EACL conference, Utrecht.
- Erik Tjong Kim Sang (1997),
handout for the course Språkstatistik, Uppsala University.
Last update: November 23, 2003.
erikt@uia.ua.ac.be