Statistical NLP: Exercise 2

This is the second in a series of exercises on statistical natural language processing. The exercise was created by Erik Tjong Kim Sang, University of Antwerpen, Campus Drie Eiken, room J0.07, phone 03-8202793, e-mail erikt@uia.ua.ac.be


General

In this exercise, you will use a statistical context-free grammar to correct text. The exercise is based on an EACL tutorial by Mark Liberman and Yves Schabes. A reference to this work can be found at the bottom of this page. You do not need to have read it to complete the exercise.
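As a rough illustration of what a statistical context-free grammar is, the sketch below attaches a probability to every rule, generates strings top-down, and scores a derivation as the product of the probabilities of the rules it uses. The grammar `TOY_GRAMMAR`, its probabilities, and the function names are hypothetical and chosen only for illustration; they are not the grammar or the software used in this exercise.

```python
import random

# Hypothetical toy grammar, NOT the grammar used in the exercise.
# Each nonterminal maps to a list of (right-hand side, probability) pairs;
# the probabilities listed for one nonterminal sum to 1.
TOY_GRAMMAR = {
    "E": [(("T",), 0.6), (("T", "+", "E"), 0.2), (("T", "-", "E"), 0.2)],
    "T": [(("x",), 0.5), (("f",), 0.3), (("f", "[", "E", "]"), 0.2)],
}


def generate(symbol, rng, grammar=TOY_GRAMMAR):
    """Expand `symbol` top-down; return (string, probability of the derivation)."""
    if symbol not in grammar:
        # Terminal symbol: emitted as-is, contributes a factor of 1.
        return symbol, 1.0
    rules = grammar[symbol]
    # Pick one rule for this nonterminal according to the rule probabilities.
    rhs, prob = rng.choices(rules, weights=[p for _, p in rules])[0]
    text = ""
    for child in rhs:
        child_text, child_prob = generate(child, rng, grammar)
        text += child_text
        prob *= child_prob
    return text, prob


def derivation_probability(rules_used, grammar=TOY_GRAMMAR):
    """Multiply the probabilities of the (lhs, rhs) rules used in a derivation."""
    prob = 1.0
    for lhs, rhs in rules_used:
        prob *= dict(grammar[lhs])[rhs]
    return prob


if __name__ == "__main__":
    rng = random.Random(2003)
    for _ in range(5):
        print(generate("E", rng))
```

Under this toy grammar, the string x has a single derivation (E to T, then T to x), so `derivation_probability([("E", ("T",)), ("T", ("x",))])` multiplies 0.6 by 0.5. A parser for a statistical grammar works in the other direction: given a string, it searches for the derivation with the highest such product.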

The exercise contains six assignments:

  1. Parsing with a statistical grammar
  2. Generating with a statistical grammar
  3. Modifying the grammar
  4. Parsing with the modified grammar
  5. Correction with a statistical grammar
  6. Correction with a unigram model

A question is associated with each assignment:

  1. Draw a parse tree for x*f[x] using the original grammar.
  2. Give ten different expressions that can be generated by the original grammar.
  3. What are the rules that you have added to the grammar?
  4. What is the probability of the most probable analysis of f+x-f] under the new grammar?
  5. Repeat the correction experiment with the statistical grammar until at least one error is made. Which expressions have been processed incorrectly? Show which characters are wrong in the incorrect expressions.
  6. Which characters did you replace in the unigram model? Can you justify your choices?


References


Last update: November 23, 2003. erikt@uia.ua.ac.be