BaseNP Combination Experiments

This page contains data, software and other information about base noun phrase (baseNP) recognition with machine learning techniques. We have applied different learning methods to one data set. Postprocessing their output with system combination techniques has yielded better results than the best individual learning method. We are interested in including other learning systems in this experiment. If you use a baseNP recognition method that we have not used already, you can join this experiment by sending your results to erikt@uia.ua.ac.be

Data
Software
Results
Participants
Relevant papers

Data

The data sets supplied here have been extracted from the Penn Treebank-2. Although a large part of these baseNP data sets has been publicly available since 1995, only organizations that have paid for the Penn Treebank should use them. If neither you nor your organization has a license for the Treebank corpus, then you probably do not want to download these data sets.

The lines in the files contain three items: a word, its part-of-speech tag as generated by the Brill tagger and a chunk tag (IOB) extracted from the Treebank. Empty lines denote sentence boundaries. The files have been compressed with gzip.
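The file layout described above can be read with a few lines of Python. This is only a minimal sketch: the function names, the sample lines and the latin-1 encoding are assumptions made for illustration, not part of the distributed software.

```python
import gzip


def read_chunk_sentences(lines):
    """Group 'word POS chunk' lines into sentences; an empty line ends a sentence."""
    sentences, current = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            # Empty line: sentence boundary.
            if current:
                sentences.append(current)
                current = []
            continue
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
        current.append(tuple(fields))
    if current:
        sentences.append(current)
    return sentences


def read_chunk_file(path):
    """Read a gzip-compressed data file (the encoding is an assumption)."""
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        return read_chunk_sentences(handle)


# Made-up sample in the three-column format: word, POS tag, chunk tag.
sample = "He PRP I\nreckons VBZ O\n\nthe DT I\ndeficit NN I\n"
sentences = read_chunk_sentences(sample.splitlines())
print(len(sentences), "sentences read")
```

Each sentence comes back as a list of (word, POS tag, chunk tag) tuples, so the sentence boundaries given by the empty lines are preserved.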

Experiment 1: train data and test data: These data sets are used for tuning the learning system parameters and for selecting the best combination method. The training data consists of 90% of the Ramshaw and Marcus 1995 training data (198597 lines). The test data consists of two parts: the remaining 10% of the RM95 training data (tuning, 22066 lines) and WSJ section 21 of the Penn Treebank (test, 41710 lines).
Experiment 2: train data and test data: This experiment consists of the small Ramshaw and Marcus 1995 data sets. However, 10% of the original train data has been moved to the test data. This 10% will be used by the combination algorithm as training data. The data sets should be processed with the same parameters as you have used in experiment 1. The best method found for experiment 1 will be used for combining the results.

Software

changeRepr
Perl script for converting chunk files to different output representations. The input files should contain lines with three fields separated by spaces: the current word, its POS tag and its IOB tag.
pairwise, vote
Perl scripts for combining results by performing voting by majority, accuracy, precision, precision-recall and pairs. The data files should contain lines with classifier results separated by spaces. The final item on each line is regarded as the correct result: this one will not be used in the combination process.
TiMBL
The memory-based learner that we have used for the stacked classifier experiments. We have used version 2.0 of this system; the current version is newer.
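As an illustration of what a conversion between output representations can look like, here is a small Python sketch. It is not the changeRepr script. It assumes the input uses the I/O/B tags in the style where B only marks a baseNP that directly follows another baseNP (often called IOB1), and it rewrites them so that every baseNP starts with B (often called IOB2). The function name is hypothetical.

```python
def iob1_to_iob2(tags):
    """Rewrite the chunk tags of one sentence so that every chunk starts with B.

    Assumes IOB1-style input: I = inside a chunk, O = outside,
    B = first word of a chunk that immediately follows another chunk.
    """
    converted = []
    previous = "O"  # a sentence starts outside any chunk
    for tag in tags:
        if tag == "I" and previous == "O":
            # An I directly after O opens a new chunk.
            converted.append("B")
        else:
            converted.append(tag)
        previous = tag
    return converted


example = ["I", "I", "O", "I", "B", "I", "O"]
converted_example = iob1_to_iob2(example)
print(converted_example)
```

The conversion should be run per sentence, so that a chunk at the start of a sentence is never treated as a continuation of the previous sentence.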

Results

The following results are available for experiment 1:

+--------------------+--------+--------+--------+--------+-------+
| EXPERIMENT 1       | tuning | tuning |  test  |  test  |  test |
| Method (site)      | openb  | closeb | openb  | closeb |  FB1  |
+--------------------+--------+--------+--------+--------+-------+
| ALLiS (Tuebingen)  | 98.37% | 98.37% | 97.87% | 98.08% | 92.15 |
| C5 (Antwerp)       | 97.51% | 97.71% | 97.05% | 97.76% | 89.97 |
| IGTree (Antwerp)   | 98.16% | 98.30% | 97.70% | 97.99% | 91.92 |
| MaxEnt (Cambridge) | 98.54% | 98.47% | 97.94% | 98.24% | 92.60 |
| MBL (Antwerp)      | 98.52% | 98.47% | 98.04% | 98.20% | 92.82 |
| MBSL (Bar-Ilan)    | 97.78% | 97.90% | 97.27% | 97.66% | 90.71 |
| SNoW (Illinois)    | 98.49% | 98.20% | 97.78% | 97.68% | 91.87 |
| TBL (UPenn)        | 98.74% | 98.68% | 97.67% | 97.93% | 91.36 |
+--------------------+--------+--------+--------+--------+-------+
| Best combination   |   -    |   -    | 98.22% | 98.31% | 93.44 |
+--------------------+--------+--------+--------+--------+-------+

Notes:

In the TBL data of UPenn (RM95) the tuning data results are unreliable since the TBL learner used this data as training material as well.
The results for MBL, Maccent and IGTree have been obtained by combining five output representations.
The best combination was obtained by applying majority voting to the output of the five best systems.
The results of the individual classifiers are available in a tar file.
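Majority voting over classifier outputs can be sketched as follows. The input format follows the description of the voting scripts in the Software section: one line per word, classifier results separated by spaces, with the final item being the correct tag, which is not used for voting. This is a simplified illustration in Python, not the Perl scripts themselves. The function names, the sample lines and the tie-breaking rule (the first listed classifier among the tied tags wins) are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent tag; ties go to the earliest listed classifier."""
    counts = Counter(predictions)
    best = max(counts.values())
    for tag in predictions:
        if counts[tag] == best:
            return tag


def combine_lines(lines):
    """Vote per line; the last field is the correct tag and is left out of the vote."""
    combined, correct_tags = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines (sentence boundaries)
        if len(fields) < 2:
            raise ValueError(f"need at least one prediction and the correct tag: {line!r}")
        *predictions, correct = fields
        combined.append(majority_vote(predictions))
        correct_tags.append(correct)
    return combined, correct_tags


# Made-up sample: five classifier outputs followed by the correct tag.
sample_lines = ["I I O I I I", "O O O I B O", "B I B I B B"]
combined, correct_tags = combine_lines(sample_lines)
accuracy = sum(c == g for c, g in zip(combined, correct_tags)) / len(correct_tags)
print(combined, accuracy)
```

With an odd number of classifiers and only two competing tags a tie cannot occur, but with three possible tags (I, O, B) it can, which is why the sketch makes the tie-breaking rule explicit.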

The following results are available for experiment 2:

+--------------------+--------+--------+--------+--------+-------+
| EXPERIMENT 2       | tuning | tuning |  test  |  test  |  test |
| Method (site)      | openb  | closeb | openb  | closeb |  FB1  |
+--------------------+--------+--------+--------+--------+-------+
| ALLiS (Tuebingen)  | 98.38% | 93.36% | 97.97% | 98.16% | 92.59 |
| C5 (Antwerp)       | 97.51% | 97.71% | 97.20% | 97.73% | 90.12 |
| IGTree (Antwerp)   | 98.16% | 98.30% | 97.78% | 98.02% | 91.96 |
| MaxEnt (Cambridge) | 98.54% | 98.47% | 98.02% | 98.28% | 93.15 |
| MBL (Antwerp)      | 98.52% | 98.47% | 98.12% | 98.29% | 93.25 |
| MBSL (Bar-Ilan)    | 97.79% | 97.90% | 97.49% | 97.92% | 91.63 |
| SNoW (Illinois)    | 98.49% | 98.20% | 98.03% | 97.89% | 92.80 |
| TBL (UPenn)        | 98.74% | 98.68% | 97.89% | 98.03% | 92.03 |
+--------------------+--------+--------+--------+--------+-------+
| System combination |   -    |   -    | 98.32% | 98.41% | 93.86 |
+--------------------+--------+--------+--------+--------+-------+

Notes:

The combination results have been obtained by applying majority voting to the output of the five best systems.
The results of the individual classifiers are available in a tar file.
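The FB1 column is not defined on this page. The sketch below assumes it is the F-measure with beta = 1 over baseNPs, that is, the harmonic mean of precision and recall, where a predicted baseNP counts as correct only if both of its boundaries match the Treebank chunk. The function names and the sample tags are hypothetical.

```python
def extract_chunks(tags):
    """Return the set of (start, end) spans, end exclusive, for I/O/B chunk tags."""
    chunks, start = set(), None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new chunk starts here; close the open one first, if any.
            if start is not None:
                chunks.add((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                chunks.add((start, i))
            start = None
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def f_beta(correct_tags, predicted_tags, beta=1.0):
    """Return (precision, recall, F) computed over complete chunks."""
    gold = extract_chunks(correct_tags)
    predicted = extract_chunks(predicted_tags)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f


# Made-up sample: three correct baseNPs, two predicted, one of them matching.
gold_example = ["I", "I", "O", "I", "B", "I", "O"]
pred_example = ["I", "I", "O", "I", "I", "I", "O"]
precision, recall, f1 = f_beta(gold_example, pred_example)
print(f"precision={precision:.2f} recall={recall:.2f} FB1={100 * f1:.2f}")
```

The tables report the score on a 0-100 scale, so the value is multiplied by 100 when printed. For a whole data file, either score each sentence separately and sum the counts, or insert an O tag at every sentence boundary so that no chunk crosses it.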

Participants

Walter Daelemans, daelem@uia.ua.ac.be (C5.0)
Hervé Déjean, dejean@sfs.nphil.uni-tuebingen.de (ALLiS)
Rob Koeling, koeling@cam.sri.com (MaxEnt)
Yuval Krymolowski, yuvalk@macs.biu.ac.il (MBSL)
Vasin Punyakanok, punyakan@cs.uiuc.edu (SNoW)
Dan Roth, danr@cs.uiuc.edu (SNoW)
Erik Tjong Kim Sang, erikt@uia.ua.ac.be (IGTree, MBL)

Relevant papers

More references to papers about baseNP recognition can be found on a separate NP chunking page.

Hans van Halteren, Jakub Zavrel, and Walter Daelemans, Improving data driven wordclass tagging by system combination. In Proceedings of COLING-ACL'98, Association for Computational Linguistics, 1998.
Erik F. Tjong Kim Sang, Walter Daelemans, Hervé Déjean, Rob Koeling, Yuval Krymolowski, Vasin Punyakanok and Dan Roth, Applying System Combination to Base Noun Phrase Identification, In Proceedings of Coling 2000, 2000.

Last update: August 17, 2000. erikt@uia.ua.ac.be