This is one of the suggested topics for the course Language Technology Project 2005.
A named entity recognizer is a system that identifies names in text and classifies them into broad categories such as person, organization and location. For some applications, such as question answering, this coarse classification is insufficient and a more fine-grained categorization is required. For example, answering "Which river flows through Cologne?" requires knowing that a name is not just a location but a river.
Develop and build a fine-grained named entity recognizer for a language other than English (preferably Dutch) that identifies names in text and assigns each of them one or more specific categories.
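One way to picture the required output is a minimal sketch in which each recognized name carries a set of fine-grained categories, each of which refines one of the broad ones. The category names, `FINE_TO_BROAD` and `NamedEntity` are hypothetical illustrations, not part of the assignment.

```python
from dataclasses import dataclass, field

# Hypothetical fine-grained categories, each mapped to the broad
# category (person, organization, location) that it refines.
FINE_TO_BROAD = {
    "politician": "person",
    "footballer": "person",
    "football_club": "organization",
    "political_party": "organization",
    "city": "location",
    "river": "location",
}


@dataclass
class NamedEntity:
    """A recognized name with one or more fine-grained categories."""
    text: str
    start: int  # character offset where the name starts
    end: int    # character offset just past the name
    categories: set[str] = field(default_factory=set)

    def broad_categories(self) -> set[str]:
        """Collapse the fine-grained labels onto the broad categories."""
        return {FINE_TO_BROAD[c] for c in self.categories if c in FINE_TO_BROAD}


ent = NamedEntity("Amsterdam", 0, 9, {"city"})
print(ent.broad_categories())  # {'location'}
```

Keeping the broad category derivable from the fine-grained one means a fine-grained recognizer can still be compared with a conventional person/organization/location tagger.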
The suggested corpus for extracting information is the free online encyclopedia Wikipedia. The proposed implementation method for the named entity recognition system is table lookup combined with hand-crafted or learned patterns.
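The combination of table lookup and patterns can be sketched as follows. This is a minimal illustration under stated assumptions: the tiny `GAZETTEER` stands in for a table that would in practice be extracted from Wikipedia, and the two Dutch context patterns and the `recognize` function are hypothetical examples of hand-crafted patterns, not a prescribed design.

```python
import re

# Hypothetical lookup table: name -> fine-grained categories.
# In the project this table would be filled from Wikipedia.
GAZETTEER = {
    "Amsterdam": {"city"},
    "Ajax": {"football_club"},
}

# A capitalized, possibly multi-word name, e.g. "Utrecht" or "Den Haag".
NAME = r"[A-Z][\w-]*(?: [A-Z][\w-]*)*"

# Hypothetical hand-crafted Dutch context patterns; group "name" marks the name.
PATTERNS = [
    (re.compile(rf"\bburgemeester van (?P<name>{NAME})"), "city"),
    (re.compile(rf"\bde rivier de (?P<name>{NAME})"), "river"),
]


def recognize(text):
    """Return sorted (start, end, name, categories) tuples."""
    found = {}
    # 1. Table lookup: every whole-word occurrence of a known name.
    for name, cats in GAZETTEER.items():
        for m in re.finditer(rf"\b{re.escape(name)}\b", text):
            found.setdefault((m.start(), m.end()), set()).update(cats)
    # 2. Patterns: catch names that are missing from the table.
    for pattern, cat in PATTERNS:
        for m in pattern.finditer(text):
            span = (m.start("name"), m.end("name"))
            found.setdefault(span, set()).add(cat)
    return [(s, e, text[s:e], cats) for (s, e), cats in sorted(found.items())]


for hit in recognize("De burgemeester van Utrecht bezocht Ajax in Amsterdam."):
    print(hit)
```

When lookup and a pattern return the same span, their categories are merged, which allows a name to receive more than one category. Overlapping but different spans are not resolved in this sketch, and learned patterns would replace or extend the hand-written `PATTERNS` list.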
The named entity recognition system will contain several modules, which will be implemented by different people in the project groups:
The system will be evaluated on a text supplied by the teachers in the final week of the project.