
Statistical NLP: Exercise 2

This is the second in a series of exercises on statistical natural language processing. The exercise was created by Erik Tjong Kim Sang, University of Antwerpen, Campus Drie Eiken, room J0.07, phone 03-8202793, e-mail erikt@uia.ua.ac.be.

General

In this exercise, you will use a statistical context-free grammar to correct text. The exercise is based on an EACL tutorial by Mark Liberman and Yves Schabes; a reference to this work can be found at the bottom of this page. You do not need to read it to complete the exercise.
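The exercise grammar is not reproduced on this page, so the following is only a minimal sketch under stated assumptions: a hypothetical toy probabilistic grammar in Chomsky normal form (the symbols E, R, B, C, F, O, L, Rb and all probabilities are invented for illustration) for expressions such as x*f[x]. It shows how the most probable analysis of a string and its probability can be found with Viterbi CKY. With `noisy=True` it adds an assumed noisy-channel model (the hypothetical `channel` function and its `p_keep` parameter), letting each character stand for any terminal, so the yield of the best tree is a corrected string.

```python
# Hypothetical toy PCFG in Chomsky normal form for expressions such as
# x*f[x]. It is NOT the exercise's grammar; all symbols and probabilities
# are illustrative assumptions.
BINARY = {  # parent -> [(left child, right child, probability)]
    "E": [("E", "R", 0.3), ("F", "B", 0.2)],
    "R": [("O", "E", 1.0)],   # operator followed by an expression
    "B": [("L", "C", 1.0)],   # "[" followed by "E ]"
    "C": [("E", "Rb", 1.0)],  # expression followed by "]"
}
LEXICAL = {  # parent -> [(terminal, probability)]
    "E": [("x", 0.5)],
    "F": [("f", 1.0)],
    "O": [("+", 1 / 3), ("-", 1 / 3), ("*", 1 / 3)],
    "L": [("[", 1.0)],
    "Rb": [("]", 1.0)],
}
TERMINALS = sorted({t for rules in LEXICAL.values() for t, _ in rules})


def channel(observed, intended, p_keep=0.9):
    """Hypothetical noise model P(observed | intended): a character is kept
    with probability p_keep, otherwise replaced uniformly by another terminal."""
    if observed == intended:
        return p_keep
    return (1 - p_keep) / (len(TERMINALS) - 1)


def viterbi_cky(text, start="E", noisy=False):
    """Return (probability, tree) of the most probable analysis of text,
    or (0.0, None) if there is none. With noisy=True every character may
    stand for any terminal, so the best tree's yield is a corrected text."""
    n = len(text)
    # best[i][j][A] = (probability, tree) of the best A spanning text[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, c in enumerate(text):
        for parent, rules in LEXICAL.items():
            for term, p in rules:
                score = p * (channel(c, term) if noisy else float(c == term))
                if score > best[i][i + 1].get(parent, (0.0, None))[0]:
                    best[i][i + 1][parent] = (score, (parent, term))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for parent, rules in BINARY.items():
                    for left, right, p in rules:
                        if left in best[i][k] and right in best[k][j]:
                            score = p * best[i][k][left][0] * best[k][j][right][0]
                            if score > best[i][j].get(parent, (0.0, None))[0]:
                                tree = (parent, best[i][k][left][1], best[k][j][right][1])
                                best[i][j][parent] = (score, tree)
    return best[0][n].get(start, (0.0, None))


def leaves(tree):
    """Concatenate the terminals of a tree, i.e. the (corrected) string."""
    if len(tree) == 2:  # lexical node: (parent, terminal)
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


prob, tree = viterbi_cky("x*f[x]")
print(round(prob, 6))                                 # 0.005
print(leaves(viterbi_cky("x*f[x[", noisy=True)[1]))  # x*f[x]
```

Multiplying many probabilities underflows on long inputs; summing log probabilities instead is the usual remedy, left out here to keep the sketch short.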

The exercise contains six assignments, and each assignment has an associated question:

1. Draw a parse tree for x*f[x] under the original grammar.
2. Give ten different expressions that can be generated by the original grammar.
3. Which rules have you added to the grammar?
4. What is the probability of the most probable analysis of f+x-f] under the new grammar?
5. Repeat the correction experiment with the statistical grammar until at least one error is made. Which expressions were processed incorrectly? Show which characters are wrong in the incorrect expressions.
6. Which characters did you replace in the unigram model? Can you justify your choices?
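One way to find expressions that a grammar can generate is to sample from it top-down. The sketch below assumes a hypothetical toy probabilistic grammar for expressions such as x*f[x]; it is not the exercise's original grammar, and its rules and probabilities are illustrative. The function `sample` expands nonterminals with `random.Random.choices`; past the assumed `max_depth` cut-off it keeps only rules whose right-hand side consists of terminals, so generation always stops (at the cost of slightly distorting the distribution).

```python
import random

# Hypothetical toy PCFG (not the exercise's original grammar): each
# nonterminal maps to a list of (right-hand side, probability) pairs.
GRAMMAR = {
    "E": [(["E", "O", "E"], 0.3), (["f", "[", "E", "]"], 0.2), (["x"], 0.5)],
    "O": [(["+"], 1 / 3), (["-"], 1 / 3), (["*"], 1 / 3)],
}


def sample(symbol, rng, depth=0, max_depth=20):
    """Expand symbol top-down into a random string of terminals."""
    if symbol not in GRAMMAR:  # terminal symbol
        return symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the cut-off, keep only rules whose right-hand side is all terminals.
        rules = [r for r in rules if all(s not in GRAMMAR for s in r[0])]
    options = [rhs for rhs, _ in rules]
    weights = [p for _, p in rules]
    rhs = rng.choices(options, weights=weights)[0]
    return "".join(sample(s, rng, depth + 1, max_depth) for s in rhs)


def distinct_samples(k=10, seed=0):
    """Draw samples until k different expressions have been found."""
    rng = random.Random(seed)
    found = []
    while len(found) < k:
        expr = sample("E", rng)
        if expr not in found:
            found.append(expr)
    return found


for expr in distinct_samples():
    print(expr)
```

Under these assumed probabilities each expansion of E produces on average 0.3 · 2 + 0.2 · 1 = 0.8 new E symbols, which is below 1, so sampling almost always terminates well before the depth cut-off is reached.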

References

Eugene Charniak 1997
Statistical Techniques for Natural Language Parsing. AI Magazine, volume 18, issue 4, pages 33-44, 1997.
Mark Liberman and Yves Schabes 1993
Statistical Methods in Natural Language Processing. Tutorial handout, EACL conference, Utrecht, 1993.
Erik Tjong Kim Sang 1997
Course handout, Språkstatistik, Uppsala University, 1997.

Last update: November 23, 2003. erikt@uia.ua.ac.be