This page contains data, software and other information about base noun phrase (baseNP) recognition with machine learning techniques. We have applied different learning methods to one data set. Combining their output with system combination techniques has yielded a better result than that of the best individual learning method. We are interested in including other learning systems in this experiment. If you use a baseNP recognition method that we have not yet tried, you can join this experiment by sending your results to erikt@uia.ua.ac.be
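System combination can be illustrated with majority voting, one common way of combining the outputs of several taggers. This page does not say which combination techniques produced the combined results in the tables, so the following is only a sketch. It assumes that every system tags exactly the same tokens with chunk tags such as `B-NP`, `I-NP` and `O`. The function name `majority_vote` is hypothetical, and it breaks ties in favour of the first system listed, which is an arbitrary choice.

```python
from collections import Counter


def majority_vote(system_outputs: list[list[str]]) -> list[str]:
    """Combine the chunk tags of several systems for one token sequence.

    Each inner list holds one system's tags. Ties go to the tag proposed by
    the earliest system in the list (an arbitrary choice for this sketch).
    """
    if not system_outputs:
        raise ValueError("need at least one system output")
    length = len(system_outputs[0])
    if any(len(tags) != length for tags in system_outputs):
        raise ValueError("all systems must tag the same tokens")

    combined = []
    for position in range(length):
        votes = [tags[position] for tags in system_outputs]
        counts = Counter(votes)
        best = max(counts.values())
        # Scan votes in system order so the first tied winner is chosen.
        combined.append(next(tag for tag in votes if counts[tag] == best))
    return combined


systems = [
    ["B-NP", "I-NP", "O"],
    ["B-NP", "O", "O"],
    ["B-NP", "I-NP", "B-NP"],
]
print(majority_vote(systems))  # ['B-NP', 'I-NP', 'O']
```

Voting per token keeps the sketch simple. More elaborate schemes, such as weighting each system by its accuracy on tuning data, would replace the plain counts.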
The data sets supplied here have been extracted from the Penn Treebank-2. Although a large part of these baseNP data sets has been publicly available since 1995, only organizations that have paid for the Penn Treebank should use them. If neither you nor your organization has a license for the Treebank corpus, then you probably do not want to download these data sets.
The lines in the files contain three items: a word, its part-of-speech tag as generated by the Brill tagger and a chunk tag (IOB) extracted from the Treebank. Empty lines denote sentence boundaries. The files have been compressed with gzip.
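The following is a minimal sketch of reading one of these files into sentences. It assumes that the three items on a line are separated by whitespace and that the text is ASCII or Latin-1 (Latin-1 is used because it decodes any byte). The function name `read_chunk_file` is hypothetical.

```python
import gzip
from typing import Iterator


def read_chunk_file(path: str) -> Iterator[list[tuple[str, str, str]]]:
    """Yield sentences as lists of (word, pos_tag, chunk_tag) triples."""
    sentence = []
    # Assumption: ASCII/Latin-1 text; Latin-1 never fails to decode a byte.
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        for line in handle:
            fields = line.split()
            if not fields:  # an empty line marks a sentence boundary
                if sentence:
                    yield sentence
                    sentence = []
                continue
            if len(fields) != 3:
                raise ValueError(f"expected 3 items per line, got {line!r}")
            word, pos, chunk = fields
            sentence.append((word, pos, chunk))
    if sentence:  # the file may not end with an empty line
        yield sentence
```

Because `gzip.open` also accepts an open binary file object, the same function can read from an in-memory buffer, which is convenient for testing.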
The following results are available for experiment 1:
+--------------------+--------+--------+--------+--------+-------+
| EXPERIMENT 1       | tuning | tuning | test   | test   | test  |
| Method (site)      | openb  | closeb | openb  | closeb | FB1   |
+--------------------+--------+--------+--------+--------+-------+
| ALLiS (Tuebingen)  | 98.37% | 98.37% | 97.87% | 98.08% | 92.15 |
| C5 (Antwerp)       | 97.51% | 97.71% | 97.05% | 97.76% | 89.97 |
| IGTree (Antwerp)   | 98.16% | 98.30% | 97.70% | 97.99% | 91.92 |
| MaxEnt (Cambridge) | 98.54% | 98.47% | 97.94% | 98.24% | 92.60 |
| MBL (Antwerp)      | 98.52% | 98.47% | 98.04% | 98.20% | 92.82 |
| MBSL (Bar-Ilan)    | 97.78% | 97.90% | 97.27% | 97.66% | 90.71 |
| SNoW (Illinois)    | 98.49% | 98.20% | 97.78% | 97.68% | 91.87 |
| TBL (UPenn)        | 98.74% | 98.68% | 97.67% | 97.93% | 91.36 |
+--------------------+--------+--------+--------+--------+-------+
| Best combination   | -      | -      | 98.22% | 98.31% | 93.44 |
+--------------------+--------+--------+--------+--------+-------+
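Assuming the usual definition, the FB1 column is the chunk-level F-measure with β = 1: FB1 = 2·P·R / (P + R), where precision P is the fraction of predicted baseNPs that exactly match a Treebank baseNP and recall R is the fraction of Treebank baseNPs that were found. The sketch below computes this score from gold and predicted chunk tags, sentence by sentence. It is not the evaluation program used for these tables, and the helper names `chunks` and `fb1` are hypothetical. The sketch accepts both IOB variants: a chunk starts at a B tag or at an I tag that does not continue a chunk of the same type.

```python
def chunks(tags: list[str]) -> set[tuple[int, int, str]]:
    """Return the chunks in one sentence as (start, end, type) spans, end exclusive."""
    spans = set()
    start, current = 0, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final chunk
        if tag == "O":
            prefix, ctype = "O", None
        else:
            prefix, _, ctype = tag.partition("-")
        # A chunk ends when the next tag does not continue it.
        if current is not None and (prefix != "I" or ctype != current):
            spans.add((start, i, current))
            current = None
        # A chunk starts at a B tag, or at an I tag that continues nothing.
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, ctype
    return spans


def fb1(gold_sentences: list[list[str]], pred_sentences: list[list[str]]) -> float:
    """Chunk-level F-measure (beta = 1) in percent over a list of sentences."""
    if len(gold_sentences) != len(pred_sentences):
        raise ValueError("gold and predicted data differ in sentence count")
    correct = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold_sentences, pred_sentences):
        gold, pred = chunks(gold_tags), chunks(pred_tags)
        correct += len(gold & pred)  # exact span and type match
        n_gold += len(gold)
        n_pred += len(pred)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)


# Precision 1/1, recall 1/2:
print(round(fb1([["B-NP", "I-NP", "O", "B-NP"]], [["B-NP", "I-NP", "O", "O"]]), 2))  # 66.67
```

Chunks are extracted per sentence so that a baseNP at the end of one sentence can never merge with one at the start of the next.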
Notes:
The following results are available for experiment 2:
+--------------------+--------+--------+--------+--------+-------+
| EXPERIMENT 2       | tuning | tuning | test   | test   | test  |
| Method (site)      | openb  | closeb | openb  | closeb | FB1   |
+--------------------+--------+--------+--------+--------+-------+
| ALLiS (Tuebingen)  | 98.38% | 93.36% | 97.97% | 98.16% | 92.59 |
| C5 (Antwerp)       | 97.51% | 97.71% | 97.20% | 97.73% | 90.12 |
| IGTree (Antwerp)   | 98.16% | 98.30% | 97.78% | 98.02% | 91.96 |
| MaxEnt (Cambridge) | 98.54% | 98.47% | 98.02% | 98.28% | 93.15 |
| MBL (Antwerp)      | 98.52% | 98.47% | 98.12% | 98.29% | 93.25 |
| MBSL (Bar-Ilan)    | 97.79% | 97.90% | 97.49% | 97.92% | 91.63 |
| SNoW (Illinois)    | 98.49% | 98.20% | 98.03% | 97.89% | 92.80 |
| TBL (UPenn)        | 98.74% | 98.68% | 97.89% | 98.03% | 92.03 |
+--------------------+--------+--------+--------+--------+-------+
| System combination | -      | -      | 98.32% | 98.41% | 93.86 |
+--------------------+--------+--------+--------+--------+-------+
Note:
More references to papers about baseNP recognition can be found on a separate NP chunking page.