These are the final exercises for the course Dokumenthantering VT98. There are 6 exercises and you have to choose one of them.
If you choose the final exercise then discuss the topic with your teacher before you start to work. Write a report about the exercise you have chosen. The report should fulfill the same requirements as the report for the first lab. The deadline for handing in this report is Sunday March 29, 1998.
The STP education plan consists of a Word file maintained by Lars Borin. The education plan has been converted to HTML by Hannes Skirgård, who used the Word program followed by manual post-processing to convert the file. The post-processing phase makes it difficult to generate a new web version of this document, which changes a couple of times per year. A study by Henrik Wache has categorized the problems with the Word-to-HTML conversion process. The task in this assignment is to continue Wache's work and provide a working method for the conversion process in which manual document processing has been eliminated. This working method will consist of post-processing software written in Perl and recommendations to the authors of the document about markup they should or should not use. Alternatively, a separate RTF-to-HTML conversion program could be developed, but that would probably take too much time.
Design and implement a lossless compression method in Perl which can compress Swedish texts with ISO 8859-1 characters at a reasonable compression rate. Aim at a compression rate of at least 1.14 (at most about 87.5% of the original file size remaining after compression). You may also implement an existing compression technique, but in that case you have to give a good description of the compression method in your report. You can test your program with the files of the Press65 corpus: /corpora/Press65/UnixAscii/
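The compression rate mentioned above is the original size divided by the compressed size. The following is a minimal sketch of how that figure and the lossless round trip could be checked. It is written in Python purely for illustration (the assignment itself asks for Perl), the function name compression_report is hypothetical, and zlib is used only as a stand-in for whatever method you design.

```python
import zlib

def compression_report(data, compress, decompress):
    """Return (rate, remaining_percent) for a compressor/decompressor pair.

    rate              = original size / compressed size
    remaining_percent = compressed size as a percentage of the original
    """
    if not data:
        raise ValueError("need non-empty input to compute a rate")
    packed = compress(data)
    # A lossless method must reproduce the input byte for byte.
    if decompress(packed) != data:
        raise ValueError("round trip failed: the method is not lossless")
    rate = len(data) / len(packed)
    remaining = 100.0 * len(packed) / len(data)
    return rate, remaining

# Assumption: a made-up Swedish sample, encoded as ISO 8859-1 (latin-1).
sample = ("Det var en gång en liten flicka som bodde vid skogen. " * 50).encode("latin-1")

# zlib is only a placeholder for the method under test.
rate, remaining = compression_report(sample, zlib.compress, zlib.decompress)
print(f"compression rate {rate:.2f}, remaining size {remaining:.1f}%")
```

The sample here is highly repetitive, so the figures it prints say nothing about real text. Meaningful numbers require running the check on the corpus files.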
The program CATCH is a Perl program which is used for developing Computer-Assisted Language Learning (CALL) lessons for the World Wide Web. The program performs document format conversion: it converts an SGML CALL specification to HTML. The users of the program have come forward with several suggestions for extended functionality. The task in this assignment is to examine the feasibility of these suggestions and implement them if possible. You may also suggest additional program extensions and implement these.
The alignment software used in our department is not perfect, and alignment results have to be checked and corrected. You have performed this manual checking and correction process in the course Språkdatabaser VT97. The correction process is tedious and it is easy to make errors. It would therefore be useful to have a tool that supports it. The task in this assignment is to write such a tool. The first version of the tool will probably have a textual interface; adding a graphical interface is optional.
Most corpus processing tasks in our department are currently performed on the Unix machines. However, for some tasks it is still necessary to fall back on the PCs. These tasks involve text segmentation and sorting, for which the program TSSA by Bengt Dahlqvist can be used on our PCs. The functionality of this program can be simulated with combinations of standard Unix commands, but doing that requires extensive Unix programming knowledge. The task in this assignment is to write a Perl tool which will simulate some TSSA tasks and thus give non-programmers the opportunity to perform text segmentation and sorting on the Unix systems. The first version of the tool will probably have a textual interface; adding a graphical interface is optional.
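To give an idea of what "text segmentation and sorting" can mean in practice, here is a small sketch that splits a text into word tokens and produces a sorted frequency list. It is written in Python for illustration only (the tool itself has to be written in Perl). The actual TSSA feature set is not described here, so the chosen task and the names segment and frequency_list are assumptions.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters; digits and punctuation separate words.
WORD = re.compile(r"[^\W\d_]+")

def segment(text):
    """Split text into lower-cased word tokens."""
    return [word.lower() for word in WORD.findall(text)]

def frequency_list(text):
    """Return (word, count) pairs, most frequent first, ties in code-point order."""
    counts = Counter(segment(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for word, count in frequency_list("En bil och en båt. Och en cykel!"):
    print(count, word)
```

Ties are ordered by character code, which is not the same as Swedish alphabetical order (å, ä and ö sort last in Swedish). A real tool would need a proper collation step.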
If you have a good suggestion for a reasonable final project for this course, discuss it with your teacher. The project should take approximately two weeks of full-time work.
We are currently using a ksh script for searching web pages. A disadvantage of this script is that it runs several processes in parallel and thus puts an unnecessary extra load on the system. The task in this assignment is to write a Perl CGI script that can replace this ksh script. It will read web files, divide them into smaller parts (paragraphs or list items) and return the parts that contain the search string. The result should be returned as an HTML file. Links to local files in the result should work. The files to be searched should be specified in the script. You may test the script with the people search at the STP home page.
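The core of the search described above (divide a page into paragraphs or list items, keep the parts that contain the search string) can be sketched as follows. This is an illustration in Python, not the Perl CGI script the assignment asks for. The function names and the sample page are hypothetical, and the sketch assumes that every part starts at a <p> or <li> tag.

```python
import re
from html import escape

# Assumption: a new part begins at every <p> or <li> start tag.
PART_START = re.compile(r"<(?:p|li)\b", re.IGNORECASE)
TAG = re.compile(r"<[^>]*>")

def split_parts(html_text):
    """Cut the page at each <p>/<li>; text before the first such tag is dropped."""
    starts = [match.start() for match in PART_START.finditer(html_text)]
    bounds = starts + [len(html_text)]
    return [html_text[a:b] for a, b in zip(bounds, bounds[1:])]

def search(html_text, query):
    """Return the parts whose visible text contains query (case-insensitive)."""
    needle = query.lower()
    return [part for part in split_parts(html_text)
            if needle in TAG.sub("", part).lower()]

def result_page(parts, query):
    """Wrap the matching parts in a minimal HTML result page."""
    body = "\n".join(parts) or "<p>No matches.</p>"
    return ("<html><head><title>Search: " + escape(query) + "</title></head>\n"
            "<body>\n" + body + "\n</body></html>")

# Hypothetical sample page.
page = ('<html><body><h1>People</h1>\n<ul>\n'
        '<li><a href="anna.html">Anna Berg</a>, room 12\n'
        '<li><a href="bo.html">Bo Ek</a>, room 14\n</ul>\n'
        '<p>Contact Anna for keys.\n</body></html>')

hits = search(page, "anna")
print(result_page(hits, "anna"))
```

The sketch leaves several requirements open: the last part of a page still carries the closing tags that follow it, list items are returned without their surrounding list, and relative links are not rewritten, so links to local files would not yet work from the result page.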
Write a Perl program that converts arbitrary Perl source code to a pretty-printed version of the code which can be processed by LaTeX. Your program should be able to recognize and process Perl function names, user function names and variable names.