This is an overview of the course in Statistical Approaches to Natural Language Processing (in Swedish: Språkstatistik) that was taught in the autumn term of 1997 at the Department of Linguistics at the University of Uppsala. There is a page for this course on the World Wide Web at the address:


The course was taught by Erik Tjong Kim Sang. He can be reached in room B382, HSC building, Kyrkogårdgatan 10, Uppsala, telephone (018) 4711175, e-mail erikt@stp.ling.uu.se

You can take a look at summaries of the results of the midcourse evaluation and the final evaluation.


        date time  room  subject
 1. Mon 0908 14-16 A144  Lecture: ch. 3, 4
 2. Wed 0910 14-16 B159  Lecture: ch. 5, 6
 3. Thu 0911 14-16 B162  Exercise class 1
 4. Mon 0915 14-16 A144  Lecture: ch. 8, 9
 5. Wed 0917 14-16 B159  Lecture: ch. 10, 11
 6. Thu 0918 14-16 B162  Exercise class 2
 7. Mon 0922 14-16 A144  Lecture: ch. 13, 14
 8. Wed 0924 14-16 B125  Lecture: ch. 15
 9. Thu 0925 14-16 B162  Exercise class 3
10. Mon 0929 14-16 A144  Lecture: ch. 16, 17
11. Wed 1001 14-16 A159  Lecture: ch. 18, 26 (not 26.6) (mid-course evaluation)
12. Thu 1002 14-16 A162  Exercise class 4
13. Mon 1006 14-16 A144  Lecture: Basic Corpus Processing 1
14. Wed 1008 14-16 B159  Lecture: Basic Corpus Processing 2
15. Thu 1009 13-17 H327  Practical exercise session 1 (deadline 971021)
16. Mon 1013 14-16 A144  Lecture: Part of Speech Tagging 1
17. Wed 1015 14-16 B159  Lecture: Part of Speech Tagging 2
18. Thu 1016 13-17 H327  Practical exercise session 2 (deadline 971028)
19. Mon 1020 14-16 A144  Lecture: Clustering
20. Wed 1022 14-16 B159  Lecture: Statistical grammars
21. Thu 1023 13-17 H327  Practical exercise session 3 (deadline 971111)
22. Mon 1027 14-16 A144  Lecture: Aligning of Parallel Texts 1
23. Wed 1029 14-16 B159  Lecture: Aligning of Parallel Texts 2 (final evaluation)
24. Thu 1030 13-17 H327  Practical exercise session 4
    Mon 1103 09-13 POLA  Test
25. Mon 1110 10-12 H327  Extra practical exercise session

Course information

The course will consist of two parts. The first part contains eight lectures about general statistics and four exercise sessions. The second part consists of eight lectures about statistics applied to natural language processing and four practical exercise sessions. There are no obligatory sessions. However, students are advised to attend the eight exercise sessions because the assignments they will receive during those sessions will determine their final grade.

Students will receive a grade between 0 and 10 for each of four homework assignments, three practical assignments and the final written test. To receive the pass mark for the course, a student must obtain an average grade of 6.0 or higher for the homework assignments, an average grade of 6.0 or higher for the practical assignments and a grade of 6.0 or higher for the final test. A student will receive the high pass grade when he/she qualifies for the pass mark, achieves a total average for the three grades of 8.0 or higher and has a test grade of 8.0 or higher.
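The grading rule above can be sketched as a small Python function. This is only an illustration, not an official grading tool: the function name `course_result` and the result labels are hypothetical, and it assumes that the "total average for the three grades" is the unweighted mean of the homework average, the practical average and the test grade.

```python
from statistics import mean

PASS_THRESHOLD = 6.0       # minimum for each of the three components
HIGH_PASS_THRESHOLD = 8.0  # minimum overall average and test grade for high pass


def course_result(homework, practical, test):
    """Return 'fail', 'pass' or 'high pass' for one student.

    homework  -- grades (0-10) for the homework assignments
    practical -- grades (0-10) for the practical assignments
    test      -- grade (0-10) for the final written test
    """
    hw_avg = mean(homework)
    pr_avg = mean(practical)

    # Each of the three components must reach the pass threshold on its own.
    passed = (
        hw_avg >= PASS_THRESHOLD
        and pr_avg >= PASS_THRESHOLD
        and test >= PASS_THRESHOLD
    )
    if not passed:
        return "fail"

    # Assumption: the overall average is the unweighted mean of the three
    # component grades (the course text does not state any weighting).
    overall = mean([hw_avg, pr_avg, test])
    if overall >= HIGH_PASS_THRESHOLD and test >= HIGH_PASS_THRESHOLD:
        return "high pass"
    return "pass"
```

For example, a student with perfect homework and practical grades but a test grade of 7.5 would receive the pass mark rather than the high pass under this reading, because the test grade itself is below 8.0.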

The practical sessions will be split into two groups. Students may choose whether they want to be in the first or the second group, provided that the group is not already full. Students are allowed to work together on the homework exercises and the practical exercises. However, a group that hands in the same exercise answers or the same lab report cannot be larger than two persons.

There is a practice test with answers available. The file is in PostScript format.


The material that will be dealt with in this course is presented in the following list.

Main Literature

The first part of the course will use the book Freedman et al. 91. It would be a good idea for students of the course to try to obtain this book (it should be available at Studentbokhandel). The second half of the course will be based on various papers, the most important of which are listed here. The papers are often difficult to obtain and therefore reading them is not obligatory. Students who want to read an unavailable paper from this list can contact me.

The teacher will supply the students with written notes for the last eight lectures.

Additional Literature

The literature in this list will not be used in the course. It has been listed here because it might contain interesting additional material for the students.

