Hints assignment 1.1
These are the hints for assignment 1.1. You do not need all of them
to complete the assignment successfully.
- There should be a solution to this exercise in your lecture notes.
But you need to add some commands for cleaning up the input a
little bit before applying that solution.
- You can use the command cut for selecting specific characters of
lines. For example "cut -c12-17 FILE" will select the characters
12, 13, 14, 15, 16 and 17 of each line in FILE. By leaving
out the last number, as in "cut -c12- FILE", you can select all
characters from a specific character to the end of the line.
- You can use the pipe character | for sending the output of one
command to another command as its input.
- You can use the command sed to replace strings by something else.
For example: "sed 's/A[^A]*A//g' FILE" will replace every substring
starting with A, followed by zero or more non-A's and ending with A,
by nothing. The / characters indicate the borders of the search
pattern and the replace pattern. The trailing g means replace
all matches on the line instead of replacing only the first one.
The [] indicate a character class and a ^ directly after the [
means not. Thus [^A] means one character that is not an A and
[^A]* means a sequence of zero or more non-A characters.
- You can use the command tr for replacing word boundary characters
with newlines. Don't forget the -s option, which squeezes repeated
newlines into one, to avoid getting long sequences of newlines.
- You can use the command sort to sort lines. Without options it
will perform an alphabetical sort. If you want a numeric sort then
try "sort -n FILE". Add option -r if you want a reverse sort.
- You can use the command wc for counting words and lines.
Output format: number of lines, number of words, number of
characters.
- You can use the uniq command to count repeated lines: "uniq -c
FILE". It only combines identical lines that are next to each
other, so sort the input first.
- You can use the command more to view a file page by page and to
prevent output scrolling over your screen.
- It is quite possible that the answers you find differ from those
of your neighbor. This happens when your clean-up commands are
different or when you use different tokenization commands.
You should only worry if the differences are big. Small
differences are quite common.
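
To see how the commands from the hints fit together, here is a small
sketch on a made-up two-line sample. The sample text, the column
position 4 and the variable names are assumptions chosen for
illustration only; this is not the assignment data, and the clean-up
steps needed for the real input will be different.

```shell
# Made-up sample input (hypothetical; not the assignment data).
sample='xx the cat A tag A saw the dog
xx the dog ran'

# cut: keep character 4 up to the end of each line.
# sed: delete every substring that starts and ends with A.
# tr:  turn each run of spaces into one newline (one token per line).
tokens=$(printf '%s\n' "$sample" |
  cut -c4- |
  sed 's/A[^A]*A//g' |
  tr -s ' ' '\n')

# sort | uniq -c: count each distinct token (uniq needs sorted input).
# sort -nr: numeric sort, highest count first.
counts=$(printf '%s\n' "$tokens" | sort | uniq -c | sort -nr)

# wc -l: number of lines, which here equals the number of tokens.
total=$(printf '%s\n' "$tokens" | wc -l)

printf '%s\n' "$counts"
echo "tokens: $total"
```

On this sample the most frequent token is "the" with a count of 3,
and there are 8 tokens in total. For longer output, append "| more"
to a pipeline to page through it.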
Last update: September 26, 1996.
erikt@stp.ling.uu.se