Hints assignment 1.1

These are the hints for assignment 1.1. You do not need all of them to complete the assignment successfully.

  1. There should be a solution to this exercise in your lecture notes. But you need to add some commands for cleaning up the input a little bit before applying that solution.

  2. You can use the command cut for selecting specific characters of lines. For example "cut -c12-17 FILE" will select the characters 12, 13, 14, 15, 16 and 17 of each line in FILE. By leaving out the last number ("cut -c12- FILE") you can select all characters from a specific character to the end of the line.

  3. You can use the pipe symbol | for sending the output of one command to the input of another command.

  4. You can use the command sed to replace strings by something else. For example: "sed 's/A[^A]*A//g' FILE" will replace every substring that starts with A, continues with zero or more non-A's and ends with the next A by nothing. The / characters indicate the borders of the search pattern and the replace pattern. The little g means replace all matches on the line instead of replacing only the first one. The [] indicate a character class and a ^ directly after the opening [ means not. Thus [^A] means one character that is not an A and [^A]* means a possibly empty sequence of non-A characters.

  5. You can use the command tr for replacing word boundary characters with newlines. Don't forget the -s option, which squeezes each run of repeated output characters into one, to avoid getting long sequences of newlines.

  6. You can use the command sort to sort lines. Without options it will perform an alphabetical sort. If you want a numeric sort then try "sort -n FILE". Add the option -r if you want a reverse sort.

  7. You can use the command wc for counting words and lines. Output format: number of lines, number of words, number of characters.

  8. You can use the uniq command to count repeated lines: "uniq -c FILE". It only combines repeated lines that are adjacent, so sort the lines first.

  9. You can use the command more to view a file page by page and to prevent output scrolling over your screen.

  10. It is quite possible that the answers you find differ from those of your neighbor. This happens when your clean-up code is different or when you use different tokenization commands. You should only worry if the differences are big. Small differences are quite common.
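
The commands from hints 2 to 8 can be tried out on a tiny made-up input before you apply them to the real data. The sketch below is only an illustration: the two sample lines, the column position 8, the <...> markup and the use of a single space as the only word boundary are all assumptions, not properties of the assignment file.

```shell
#!/bin/sh
# Hypothetical sample input: two lines with a 6-character label in front
# and some markup between < and >. Replace this with your own data.
sample=$(mktemp)
printf '%s\n' 'ID0001 the cat <b>saw</b> the dog' \
              'ID0002 the dog saw the cat' > "$sample"

# Hint 2: keep everything from character 8 to the end of each line.
cut -c8- "$sample"

# Hints 3 and 4: pipe the result into sed and delete every substring
# that starts with < and runs up to the next >.
cut -c8- "$sample" | sed 's/<[^>]*>//g'

# Hints 5, 6 and 8: put one word per line (here a space is assumed to be
# the only word boundary), sort so that equal words become adjacent,
# count them, and rank the counts from high to low.
counts=$(cut -c8- "$sample" | sed 's/<[^>]*>//g' | tr -s ' ' '\n' |
         sort | uniq -c | sort -nr)
printf '%s\n' "$counts"

# Hint 7: with one word per line, the number of lines is the number of words.
tokens=$(cut -c8- "$sample" | sed 's/<[^>]*>//g' | tr -s ' ' '\n' | wc -l)
echo "number of words: $tokens"

rm -f "$sample"
```

If the output gets longer than your screen, add "| more" to the end of a pipeline, as in hint 9.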


Last update: September 26, 1996. erikt@stp.ling.uu.se