Dokumenthantering VT97:11

These are the large assignments you can do to fulfil the requirements of this course.


1. Developing an X-windows Corpus Access Tool

In this assignment you will develop a corpus access tool with an X-windows interface. You can use the TSSA program by Bengt Dahlqvist as a basis for the tool. Developing the complete tool is too large a task for this course, so the assignment has been split into several parts:

  1. Designing a modular corpus access tool
    This task concentrates on the design of the access functions of the tool and the communication protocol between the user, the main part of the tool and the different modules.
  2. Writing an X-windows interface in Perl
    This task concentrates on implementing the tool's interface. It is best combined with the first task and done by two people who work together on both tasks.
  3. Developing modules for the corpus tool
    Different people can develop different Perl modules for the corpus tool; the modules should be easy to integrate into the main part. Possible module tasks are text segmentation, concordance building, text conversion (to an indexed file, for example) and searching.
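
To make the idea of pluggable modules more concrete, here is a minimal sketch of how a main program might call modules through a common registry. It is an illustration only: the assignment asks for Perl, whereas this sketch uses Python, and the names `MODULES`, `register`, `run`, `segment` and `concordance` are hypothetical and not taken from TSSA or any existing tool. The segmentation is deliberately naive (runs of word characters).

```python
import re

# Hypothetical registry: maps a module name to the function implementing it.
MODULES = {}


def register(name):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func):
        MODULES[name] = func
        return func
    return decorator


@register("segment")
def segment(text):
    """Naive text segmentation: split the text into word tokens."""
    return re.findall(r"\w+", text)


@register("concordance")
def concordance(tokens, keyword, width=3):
    """Return (left context, hit, right context) for each match of keyword."""
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


def run(name, *args, **kwargs):
    """The main part of the tool calls every module the same way."""
    return MODULES[name](*args, **kwargs)


if __name__ == "__main__":
    tokens = run("segment", "The cat sat on the mat. The dog saw the cat.")
    for left, hit, right in run("concordance", tokens, "cat", width=2):
        print(f"{left:>10} [{hit}] {right}")
```

Because every module is reached through `run`, a new module (searching, for example) could be added without changing the main part, which is one possible reading of "easy to integrate".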


2. Converting the STP Utbildningsplan to HTML

The STP education plan consists of a Word file maintained by Lars Borin. There are plans to put this file on the web in HTML format. Your task is to find out which methods are available for converting Word or RTF to HTML (Word itself or other conversion programs). Test these programs and evaluate the results. If the programs do not handle particular features, you can try to write a Perl program that covers them. The goal is a working system that can convert future education plans to HTML without too much effort.


3. Writing an Alignment Correction Support Tool

The alignment software used in our department is not perfect, so alignment results have to be checked and corrected. Write a Perl tool that supports this work. The corpora used in our department are encoded in SGML, but you can work on an intermediate format so that you do not have to worry about parsing SGML.
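
As an illustration of what working on an intermediate format could look like, the sketch below assumes a purely hypothetical format with one aligned pair per line, source and target segment separated by a tab. It shows one typical correction, merging a pair with the following one when the aligner has split a sentence wrongly. The assignment asks for Perl; Python is used here only to show the idea, and the function names `parse_alignment` and `merge_with_next` are invented for this example.

```python
def parse_alignment(lines):
    """Read 'source<TAB>target' lines into a list of (source, target) pairs.

    The tab-separated layout is an assumption made for this sketch, not the
    department's actual intermediate format.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        source, target = line.split("\t", 1)
        pairs.append((source, target))
    return pairs


def merge_with_next(pairs, index):
    """Return a new list in which pair `index` is joined with the next pair."""
    if not 0 <= index < len(pairs) - 1:
        raise IndexError("no following pair to merge with")
    (s1, t1), (s2, t2) = pairs[index], pairs[index + 1]
    merged = (f"{s1} {s2}".strip(), f"{t1} {t2}".strip())
    return pairs[:index] + [merged] + pairs[index + 2:]


if __name__ == "__main__":
    raw = ["Hello.\tHej.\n", "How are\tHur mår du?\n", "you?\t\n"]
    pairs = parse_alignment(raw)
    for source, target in merge_with_next(pairs, 1):
        print(f"{source} <-> {target}")
```

A real support tool would need further operations (splitting a pair, moving a segment to a neighbouring pair) and a way to write the corrected pairs back out; those are left open here.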


4. Your own suggestion

If you have a good suggestion for a reasonable final project for this course, you can propose it. The project should take approximately two and a half weeks of full-time work.


Last update: May 20, 1997. erikt@stp.ling.uu.se