Class: 13 Date: 951010 Topic: Practical Exercise 2
In this practical exercise you will apply the clustering algorithms described in Stephen Finch's thesis to divide characters into classes. Write a short report about the exercise and send it by e-mail to erikt@strindberg.ling.uu.se. The deadline is Monday November 6. Reports sent in after that date will receive a penalty of half a point per extra day.
In this exercise we will use the book Alice's Adventures in Wonderland by Lewis Carroll (obtained from Project Runeberg). You can find the text and all the programs I describe here in the directory /usr/users/staff/erikt/P/ss95/pex2
You can print this exercise by choosing the print option from your web browser while viewing this text.
./makeClusterData < alice30.ch2|./makeReadable
You might have to adjust your window size to be able to view the complete table. Try to read the makeClusterData program and check whether you understand the commands used in it.
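As a reading aid, here is a minimal Python sketch of the kind of table such a program could produce. It is an assumption, not a description of makeClusterData itself: it assumes each character is represented by counts of the characters that directly precede and follow it. The name context_counts and the "L"/"R" labels are hypothetical.

```python
from collections import Counter, defaultdict


def context_counts(text):
    """Map each character to counts of its left and right neighbours.

    Assumed representation: keys ("L", c) count how often c directly
    precedes the character, keys ("R", c) how often c directly follows it.
    """
    table = defaultdict(Counter)
    for left, right in zip(text, text[1:]):
        table[left][("R", right)] += 1   # `right` follows `left`
        table[right][("L", left)] += 1   # `left` precedes `right`
    return table


if __name__ == "__main__":
    table = context_counts("alice was beginning to get very tired")
    for ch in "ae":
        print(ch, dict(table[ch]))
```

Each row of such a table is the vector that a clustering program would compare against the rows of the other characters.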
cluster -e = Use Euclidean metric
cluster -m = Use Manhattan metric
cluster -n = Normalize representations
cluster -s = Use Spearman Rank metric
You can use any combination of these four options. For example, if you want to use the Manhattan metric combined with representation normalization, you would use the command:
./makeClusterData < alice30.txt|./cluster -n -m
If you do not specify any options, the Euclidean metric without representation normalization will be used.
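To make the options concrete, here is a minimal Python sketch of the three metrics and of normalization. It is not the code of the cluster program. Two points are assumptions: normalization is taken to mean scaling a vector so that its components sum to 1, and the Spearman rank metric is taken to be 1 minus the Spearman rank correlation. All function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def normalize(u):
    """Scale u so its components sum to 1 (assumed meaning of -n)."""
    total = sum(u)
    return [a / total for a in u] if total else list(u)


def ranks(u):
    """Rank the values of u from 1 upward; ties get the average rank."""
    order = sorted(range(len(u)), key=lambda i: u[i])
    result = [0.0] * len(u)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and u[order[j + 1]] == u[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman_distance(u, v):
    """1 minus the correlation of the ranks (assumed meaning of -s)."""
    ru, rv = ranks(u), ranks(v)
    n = len(ru)
    mean_u, mean_v = sum(ru) / n, sum(rv) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(ru, rv))
    dev_u = math.sqrt(sum((a - mean_u) ** 2 for a in ru))
    dev_v = math.sqrt(sum((b - mean_v) ** 2 for b in rv))
    if dev_u == 0 or dev_v == 0:
        return 1.0  # correlation undefined for constant vectors
    return 1 - cov / (dev_u * dev_v)


if __name__ == "__main__":
    rare, frequent = [2, 4], [20, 40]   # same shape, different frequency
    print(manhattan(rare, frequent))
    print(manhattan(normalize(rare), normalize(frequent)))
```

The two vectors in the usage example have the same distribution but different total counts. Without normalization they are far apart, with normalization they coincide, which is why normalization matters when frequent and rare characters are compared.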
Send your reports to erikt@strindberg.ling.uu.se by Monday November 6. If you have any questions, please ask me.