These are the large assignments that you can complete to fulfill the requirements of this course.
In this assignment you will develop a corpus access tool with an X Window System interface. You can use the TSSA program by Bengt Dahlqvist as a basis for the tool. Developing the complete tool is too large a task for this course, so the assignment has been split into several parts:
The STP education plan consists of a Word file maintained by Lars Borin. There are plans to put this file on the web in HTML format. Your task is to find out what methods are available for converting Word or RTF to HTML, whether Word itself or other conversion programs. Test these programs and evaluate the results. If the programs do not handle particular features, you can try to write a Perl program that covers them. The goal is a working system that can convert future education plans to HTML without too much effort.
The alignment software used in our department is not perfect, so alignment results have to be checked and corrected. Write a Perl tool that supports this checking and correction. The corpora used in our department are encoded in SGML, but you can work on an intermediate format so that you do not have to worry about parsing SGML.
If you have a good suggestion for a reasonable final project for this course, you can also propose it. The project should take approximately two and a half weeks of full-time work.