
Språkstatistik (Language Statistics) HT97:03

Exercise class 1

Deadline

These exercises are due on Monday, September 15.

Data

Case 1: The file /corpora/Press65/UnixAscii/p65.001 contains a Swedish text. The lengths of the words in this file have been measured and the result is the following:

    Length (characters)   Share of words
           1-2                 15%
            3                  25%
           4-5                 22%
           6-8                 20%
           9-17                18%
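
A grouped table like this one can only give estimates of the average and the standard deviation, because the individual word lengths are not listed. The following is a minimal sketch of one common approach, assuming that every word sits at the midpoint of its class. The names CLASSES and grouped_mean_and_sd are made up for this illustration, and the midpoint assumption is not part of the exercise text.

```python
import math

# (shortest, longest, percent of words), copied from the case 1 table
CLASSES = [(1, 2, 15), (3, 3, 25), (4, 5, 22), (6, 8, 20), (9, 17, 18)]

def grouped_mean_and_sd(classes):
    """Estimate mean and SD, assuming every word sits at its class midpoint."""
    total = sum(percent for _, _, percent in classes)
    # Weighted average of the class midpoints
    mean = sum(percent * (low + high) / 2 for low, high, percent in classes) / total
    # Weighted average of the squared deviations from that mean
    variance = sum(
        percent * ((low + high) / 2 - mean) ** 2 for low, high, percent in classes
    ) / total
    return mean, math.sqrt(variance)

mean, sd = grouped_mean_and_sd(CLASSES)
print(f"estimated mean: {mean:.2f}, estimated SD: {sd:.2f}")
```

The wide last class (9-17) has a large influence on both estimates, so a different assumption about where the words fall inside that class would change the result noticeably.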

Case 2: The results of a test are distributed according to the normal curve with an average of 7.8 and a standard deviation of 0.8.
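
Questions about a normal curve usually start by converting scores to standard units, that is, the number of standard deviations a score lies above or below the average. The sketch below shows this conversion and the matching area computations with Python's statistics.NormalDist (available from Python 3.8). The function names are hypothetical helpers for this illustration, and the score used in the last lines is an arbitrary example, not one of the exercise values.

```python
from statistics import NormalDist

MEAN, SD = 7.8, 0.8  # average and standard deviation stated in case 2

standard_normal = NormalDist()  # normal curve with mean 0 and SD 1

def to_standard_units(score):
    """Number of SDs that a score lies above (+) or below (-) the average."""
    return (score - MEAN) / SD

def from_standard_units(z):
    """Score that corresponds to a value in standard units."""
    return MEAN + z * SD

def prob_between_scores(low, high):
    """Area under the normal curve between two scores."""
    z_low, z_high = to_standard_units(low), to_standard_units(high)
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

def percentile_score(p):
    """Score below which p percent of the results fall."""
    return from_standard_units(standard_normal.inv_cdf(p / 100))

# Arbitrary example: a score one SD above the average
example = MEAN + SD
print(to_standard_units(example))
print(prob_between_scores(MEAN - SD, MEAN + SD))
```

A printed normal table gives the same areas, up to rounding.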

Exercises

  1. Draw a histogram for the length of the words in case 1.

  2. Compute the average length of the words in case 1.

  3. Estimate the median length of the words in case 1.

  4. For the data in case 1: estimate how many words are longer than four characters but shorter than eleven characters.

  5. Compute the standard deviation for the length of the words in case 1.

  6. Would the normal curve be a good description for the length of the words in case 1? Justify your answer.

  7. In case 2: What value in standard units (SU) corresponds to a score of 9.4? And what score corresponds to an SU value of -2?

  8. In case 2: What is the probability that someone will get a score between -0.5 SU and +0.5 SU on this test?

  9. In case 2: What is the probability that someone will get a score between 7.0 and 9.0?

  10. Compute the 99th percentile for case 2.

Each exercise is worth 1 point.
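
Because the classes in case 1 have unequal widths, the histogram exercises are easiest to reason about in terms of area: the height of each block is the percentage divided by the class width, and the share of words in any range is the area over that range. The sketch below illustrates this, assuming that the words in a class are spread evenly over the lengths it covers. That assumption, and the helper names, are illustrative only and are not given in the exercise text.

```python
# (shortest, longest, percent of words), copied from the case 1 table
CLASSES = [(1, 2, 15), (3, 3, 25), (4, 5, 22), (6, 8, 20), (9, 17, 18)]

def density_heights(classes):
    """Block heights in percent per character (percent / class width)."""
    return [percent / (high - low + 1) for low, high, percent in classes]

def share_between(classes, shortest, longest):
    """Estimated percent of words with shortest <= length <= longest."""
    total = 0.0
    for low, high, percent in classes:
        width = high - low + 1
        # Number of whole-character lengths shared by the class and the range
        overlap = min(high, longest) - max(low, shortest) + 1
        if overlap > 0:
            total += percent * overlap / width
    return total

def interpolated_median(classes):
    """Median by linear interpolation inside the class that contains 50%."""
    cumulative = 0.0
    for low, high, percent in classes:
        if cumulative + percent >= 50:
            left_edge = low - 0.5  # treat a length n as the interval n-0.5..n+0.5
            width = high - low + 1
            return left_edge + width * (50 - cumulative) / percent
        cumulative += percent
    raise ValueError("shares do not reach 50%")

for (low, high, _), height in zip(CLASSES, density_heights(CLASSES)):
    print(f"{low:>2}-{high:<2} {height:5.1f}% per character")
```

Calling share_between(CLASSES, 5, 10) covers the lengths from five to ten characters, and interpolated_median(CLASSES) gives one possible estimate of the median. Both depend on the even-spread assumption.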

Related Exercises

If you want extra practice, you can try the following optional exercises from the second edition of the book:

Reading Histograms: 3A1, 3A2, 3A4.
Drawing Histograms: 3B1, 3B3, 3C1.
Average: 4A1, 4A5, 4A8, 4B1.
Standard Deviation: 4D1, 4D4, 4D11, 4E1, 4E3, 4E5.
Normal curve: 5B1, 5B3.
Normal Approximation of Data: 5A1, 5C1, 5C3.
Percentiles: 5D1, 5E1, 5E2.

A code like 3A1 points to the first exercise of section A from chapter 3. The answers to the exercises can be found in the final part of the book.


Last update: October 28, 1997. erikt@stp.ling.uu.se