NP Chunking

Dividing sentences into non-overlapping phrases is called text chunking. NP chunking deals with a part of this task: it involves recognizing the chunks that consist of noun phrases (NPs). A standard data set for this task was put forward by Lance Ramshaw and Mitch Marcus in their 1995 WVLC paper [RM95]. The data has been divided into two parts: training data and test data. The goal is to train a machine learning algorithm on the training data and to evaluate its performance on the test data.

The performance of the algorithm is measured with two scores: precision and recall. Precision is the percentage of NPs found by the algorithm that are correct, and recall is the percentage of NPs defined in the corpus that were found by the chunking program. The two rates can be combined in one measure: the F rate, in which F = 2*precision*recall / (recall+precision) [Rij79].
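The three measures can be sketched in a few lines of Python. This is a minimal illustration of the definitions above, not the evaluation software used in the cited work; the function name and the example counts are invented for illustration.

```python
def evaluate(found_correct, found_total, corpus_total):
    """Compute precision, recall and the F rate from chunk counts.

    found_correct: NPs found by the chunker that match the corpus
    found_total:   all NPs found by the chunker
    corpus_total:  all NPs defined in the corpus
    """
    precision = found_correct / found_total   # fraction of found NPs that are correct
    recall = found_correct / corpus_total     # fraction of corpus NPs that were found
    f_rate = 2 * precision * recall / (recall + precision)
    return precision, recall, f_rate

# Example: 90 correct chunks among 100 found, with 110 NPs in the corpus.
p, r, f = evaluate(90, 100, 110)
print(f"precision={p:.2%} recall={r:.2%} F={f:.4f}")
```

Note that the F rate is the harmonic mean of precision and recall, so it rewards systems that keep the two scores in balance.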

The standard data set put forward by Ramshaw and Marcus consists of sections 15-18 of the Wall Street Journal corpus as training material and section 20 of that corpus as test material. Here are the published results for this data set:

              +-----------+-----------++-----------++
              | precision |   recall  ||     F     ||
   +----------+-----------+-----------++-----------++
   | [KM01]   |   94.15%  |   94.29%  ||   94.22   ||
   | [TDD+00] |   94.18%  |   93.55%  ||   93.86   ||
   | [TKS00]  |   93.63%  |   92.89%  ||   93.26   ||
   | [MPRZ99] |   92.4%   |   93.1%   ||   92.8    ||
   | [XTAG99] |   91.8%   |   93.0%   ||   92.4    ||
   | [TV99]   |   92.50%  |   92.25%  ||   92.37   || 
   | [RM95]   |   91.80%  |   92.27%  ||   92.03   || 
   | [ADK99]  |   91.6%   |   91.6%   ||   91.6    || 
   | [Vee98]  |   89.0%   |   94.3%   ||   91.6    || 
   | [CP98]   |   90.7%   |   91.1%   ||   90.9    || 
   | [CP99]   |   89.0%   |   90.9%   ||   89.9    || 
   +----------+-----------+-----------++-----------++
   | baseline |   78.20%  |   81.87%  ||   79.99   ||
   +----------+-----------+-----------++-----------++

The results of [ADK99], [CP98] and [CP99] have been obtained without using lexical information, that is, with part-of-speech tags only. The baseline results were produced by a system that assigned the most frequent chunk tag to each part-of-speech tag.
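The baseline system can be sketched as follows. The training pairs below are invented toy data (the real baseline was trained on WSJ sections 15-18), and the I/O chunk tags shown here are only an illustration of the kind of per-token tagging used in this task.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_tokens):
    """Map each part-of-speech tag to its most frequent chunk tag."""
    counts = defaultdict(Counter)
    for pos, chunk in tagged_tokens:
        counts[pos][chunk] += 1
    # For each POS tag, keep the chunk tag it co-occurred with most often.
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

# Toy (POS tag, chunk tag) pairs: "I" = inside an NP, "O" = outside.
training = [("DT", "I"), ("NN", "I"), ("VBD", "O"), ("DT", "I"),
            ("JJ", "I"), ("NN", "I"), ("IN", "O"), ("NN", "I")]
model = train_baseline(training)

# Tagging a new sentence then amounts to one dictionary lookup per token.
print([model[pos] for pos in ["DT", "JJ", "NN", "VBD"]])
```

Because the baseline ignores the words themselves and all context, it cannot resolve POS tags that occur both inside and outside NPs, which explains the large gap between its scores and those of the learning systems in the table.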

[RM95] has also reported work on a larger task: using sections 02-21 of the WSJ corpus as training material and section 00 for testing. Learning algorithms achieve better performance on this task than on the previous one because of the larger amount of training data. The published results for this data set are:

              +-----------+-----------++-----------++
              | precision |  recall   ||     F     ||
   +----------+-----------+-----------++-----------++
   | [KM01]   |   95.62%  |   95.93%  ||   95.77   ||
   | [TKS00]  |   95.04%  |   94.75%  ||   94.90   ||
   | [TV99]   |   93.71%  |   93.90%  ||   93.81   ||
   | [RM95]   |   93.1%   |   93.5%   ||   93.3    ||
   +----------+-----------+-----------++-----------++

Other languages: [KK99] have reported NP chunking results for Swedish. [SB99] have published results for German. [ZH98] have presented a model for analyzing Chinese.



Last update: April 13, 2005. erikt@uia.ua.ac.be