Dokumenthantering VT98:05

These are the exercises for the second lab session of the course Dokumenthantering VT98. There are 8 exercises this week and 2 of them are obligatory. The obligatory exercises have been marked with a *.


Write a report about the obligatory exercises. The report should fulfill the same requirements as the report for the first lab. The deadline for handing in the report for this week's exercises is Wednesday February 17, 1998.

Exercises Lab 2

The command printf can be used under Unix to convert decimal numbers to octal or hexadecimal values. Examples:
```
   printf "octal: %o\n" 224
   printf "hexadecimal: %x\n" 224
```
In the format string (the first argument), %d prints a decimal number, %o prints an octal number and %x (or %X) prints a hexadecimal number. The number after the format string (here 224) is interpreted as a decimal number. Use printf to verify the numbers shown in the table in section 3.1. The same can be done with the following Perl script:
```
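   #!/usr/local/bin/perl -w
   # Print each number in the list in decimal, octal and hexadecimal.
   # (4, 10) is a list of two numbers; use a range such as (0 .. 127)
   # to print a whole table.
   for (4, 10) {
      printf "decimal %3d; octal %3o; hexadecimal %3x\n", $_, $_, $_;
   }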
```
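The conversion also works in the opposite direction. Perl's built-in functions oct and hex read a string as an octal or a hexadecimal number and return its value, which is handy when a table entry is given in octal or hexadecimal. A minimal sketch using the value 224 from the printf examples above:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Decimal to octal and hexadecimal with printf ...
printf "octal: %o; hexadecimal: %x\n", 224, 224;

# ... and back again: oct() reads its argument as an octal number,
# hex() reads its argument as a hexadecimal number.
printf "decimal: %d; decimal: %d\n", oct("340"), hex("e0");
```

The first line of output is octal: 340; hexadecimal: e0 and the second is decimal: 224; decimal: 224.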
The control characters can also be printed with printf. Try running:
```
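   #!/usr/local/bin/perl -w
   # One value per %c: 7 bell, 10 line feed, 13 carriage return,
   # 127 delete, 8 backspace, 127 delete
   printf "%c%c%c%ca%cb%c\n", 7, 10, 13, 127, 8, 127;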
```
The %c format prints the character whose decimal ASCII code is given in the argument list after the format string. The result depends on the configuration of your terminal: the a or the b may be deleted by the control character that follows it, and you may hear a bell. Try running the script again with its output piped to the more command. This should suppress the bell and show ^G instead.
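The ^G that more shows is caret notation, a common convention for making control characters visible: a ^ followed by the character whose code is the control code XOR 64, so 7 (bell) becomes ^G and 127 (delete) becomes ^?. The sketch below is an illustration of that convention, not a description of how more is implemented; caret is a hypothetical helper name. It prints the codes used in the script above in this notation:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return a printable form of a character code: control characters
# (0-31 and 127) in caret notation, everything else unchanged.
sub caret {
    my ($code) = @_;
    # numeric XOR with 64 maps 7 to 71 ('G') and 127 to 63 ('?')
    return '^' . chr($code ^ 64) if $code < 32 || $code == 127;
    return chr $code;
}

printf "%3d %s\n", $_, caret($_) for 7, 8, 10, 13, 127;
```

The output is one line per code: 7 ^G, 8 ^H, 10 ^J, 13 ^M and 127 ^?.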
Our AIX machines have all ISO 8859 character sets available except ISO 8859-10. You can display the characters of the current font in a terminal window with the command asciiTable. Each window can display only one character set. To open a window with a different character set (font), start the terminal program from the command line with the arguments -fn FONT, where FONT is the name of an X Window System font. The command xlsfonts gives an overview of the available fonts. Example:
```
   xterm -fn -urw-courier-medium-r-normal--13-100-100-100-m-80-iso8859-5
```
starts an xterm window with a Courier font of pixel size 13 in the ISO 8859-5 character set. You can also work with the different character sets by starting one of the programs aixterm1, aixterm2 and so on, or emacs1, emacs2 and so on. These programs start aixterm or emacs with an ISO 8859-X font, where X is the digit in the program name. Again, typing asciiTable in one of the aixterm windows gives an overview of the characters in its character set.
Use these overviews and the web page mentioned in the references to choose the best ISO 8859 character set for displaying an aligned text in Swedish and Slovenian. The character set should include as many of the characters with diacritics of both languages as possible. For Swedish, the lower-case characters with diacritics are å, ä, ö and é. For Slovenian, they are č, š and ž (c, s and z with a háček).
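Instead of comparing the tables by eye, you can let Perl check which ISO 8859 sets contain the seven characters. This sketch assumes the standard Encode module, which ships with Perl 5.8 and later, and checks only ISO 8859-1 to 8859-9 (the sets available on our machines). Characters are written as Unicode code points, so the script itself is plain ASCII; missing is a hypothetical helper name:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Encode ();

# Swedish å ä ö é and Slovenian č š ž as Unicode code points.
my @needed = map { chr(hex($_)) } qw(E5 E4 F6 E9 10D 161 17E);

# Return the characters among @chars that charset $cs cannot represent.
sub missing {
    my ($cs, @chars) = @_;
    return grep {
        my $c = $_;    # encode() may modify its argument, so pass a copy
        !eval { Encode::encode($cs, $c, Encode::FB_CROAK()); 1 };
    } @chars;
}

for my $n (1 .. 9) {
    my $cs   = "iso-8859-$n";
    my @miss = missing($cs, @needed);
    # e.g. "iso-8859-1  4/7  missing: U+010D U+0161 U+017E"
    printf "%-11s %d/%d  missing: %s\n", $cs, @needed - @miss,
        scalar @needed,
        join(' ', map { sprintf('U+%04X', ord $_) } @miss) || '-';
}
```

A set that shows 7/7 covers both languages. Check the result against the character tables before you answer.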
Write a Perl script that performs the same task as asciiTable.
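One possible approach is to print the raw byte values and let the window's font decide which glyphs appear. The exact layout of asciiTable is not documented here, so this sketch assumes that it lists the printable codes 32-126 and 160-255 together with their decimal values. The eight-column layout and the helper name table_lines are choices made for the sketch:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

binmode STDOUT;    # write raw bytes; the terminal font decides the glyphs

# Return the table as a list of lines: "code char" pairs, eight per line.
# Codes 0-31 and 127-159 are control characters and are left out.
sub table_lines {
    my @codes = (32 .. 126, 160 .. 255);
    my @lines;
    while (my @row = splice @codes, 0, 8) {
        push @lines, join '  ', map { sprintf '%3d %s', $_, chr $_ } @row;
    }
    return @lines;
}

print "$_\n" for table_lines();
```

Run it in, for example, an aixterm2 window to see the ISO 8859-2 characters for codes 160-255.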
Examine the file /home/staff/web/priv/dh98/misc/mail and try to identify the MIME features discussed in section 4.1.
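A sketch that can help with this search: it prints the lines of a mail file that carry MIME structure, namely MIME-Version and Content-* header lines, folded boundary parameters, and the boundary lines that separate the parts. The patterns are simplified assumptions. For example, a quoted boundary that contains spaces is not recognized. Call it with the mail file as argument, for example perl mimelines.pl /home/staff/web/priv/dh98/misc/mail, where mimelines.pl is a hypothetical file name:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return the lines that carry MIME structure.
sub mime_lines {
    my (@hits, @boundaries);
    for my $line (@_) {
        # remember every boundary parameter, also on folded header lines
        push @boundaries, $1 if $line =~ /boundary="?([^";\s]+)/i;
        if (   $line =~ /^(?:MIME-Version|Content-[\w-]+):/i
            || $line =~ /^\s+.*boundary=/i
            # "--boundary" starts a part, "--boundary--" ends the last one
            || grep { $line =~ /^--\Q$_\E(?:--)?\s*$/ } @boundaries)
        {
            push @hits, $line;
        }
    }
    return @hits;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    print "$file: $_" for mime_lines(<$fh>);
    close $fh;
}
```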
Examine at least one of the RFCs mentioned in the text (see the RFC site in the references). You do not have to read or understand it completely, but you should at least browse through one of them, because you may need RFCs in the future to look up technical details. Relevant RFCs: RFC 822 (e-mail), RFC 1521 (MIME), RFC 1866 (HTML 2.0) and RFC 1738 (URLs).
* Our AIX system has two programs for checking SGML: sgml-ncheck (for the TEI Lite DTD) and html-ncheck (for several HTML DTDs). Use html-ncheck to check whether the HTML file in section 4.2 is correct HTML. Don't forget to insert the following extra SGML header line as the first line of the file before you check it (see section 4.3):

```
   <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
```

Now replace the first line of the file with:

```
   <!DOCTYPE html [ ]>
```

and add DTD definitions between the square brackets until html-ncheck accepts the file. In other words, you have to write your own DTD for the file. This is an alternative way of specifying a DTD: inside the document itself.
If you want to see how html-ncheck has analyzed the file, add the option -o between the command name and the file name. The program is nothing more than a script that calls James Clark's nsgmls program; see the nsgmls manual if you want more information about it. The HTML DTD can be found in /usr/local/lib/html-check/lib/html.dtd (if your browser shows nothing readable, view the page source).
* Write a Perl program that behaves like the program htmlize (see its manual page for information). Your program only has to handle four characters: å (&amp;aring;), ä (&amp;auml;), ö (&amp;ouml;) and é (&amp;eacute;). It must be able to convert these characters from ISO Latin-1 to SGML entities and back (with the -r option), for an arbitrary number of files.
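A minimal sketch of the conversion. It rests on assumptions that are not taken from the htmlize manual page and should be checked against it: the input is ISO 8859-1 (one byte per character), every file named on the command line is rewritten in place, and -r must be the first argument. The helper names to_entities and from_entities are hypothetical:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# ISO 8859-1 character => SGML entity, and the reverse mapping.
my %entity = (
    "\xE5" => '&aring;',
    "\xE4" => '&auml;',
    "\xF6" => '&ouml;',
    "\xE9" => '&eacute;',
);
my %char = reverse %entity;

sub to_entities {
    my ($text) = @_;
    $text =~ s/([\xE5\xE4\xF6\xE9])/$entity{$1}/g;
    return $text;
}

sub from_entities {
    my ($text) = @_;
    $text =~ s/(&(?:aring|auml|ouml|eacute);)/$char{$1}/g;
    return $text;
}

# -r (reverse direction) is only recognized as the first argument.
my $reverse = 0;
if (@ARGV && $ARGV[0] eq '-r') {
    $reverse = 1;
    shift @ARGV;
}

for my $file (@ARGV) {
    open my $in, '<:raw', $file or die "$file: $!\n";
    my $text = do { local $/; <$in> };    # read the whole file
    close $in;
    $text = $reverse ? from_entities($text) : to_entities($text);
    open my $out, '>:raw', $file or die "$file: $!\n";
    print $out $text;
    close $out or die "$file: $!\n";
}
```

Rewriting the files in place keeps the handling of many files simple, but it overwrites the input. Try the sketch on copies first.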

References Week 2

http://www.ioc.ee/home/tarvi/mime_pem/FAQ-ISO-8859-1.html: The ISO 8859-1 Frequently Asked Questions list.
http://rocinante.colorado.edu/~wilms/computers/lowascii.html: Explanation of the ASCII control characters.
http://wwwwbs.cs.tu-berlin.de/~czyborra/charsets/: Overview of the ISO 8859 character sets with examples of the characters in the different sets.
http://www.unicode.org/: Home page of the Unicode Consortium.
http://www.mindspring.com/~mgrand/mime.html: Mark Grand's description of MIME.
http://www.cs.ruu.nl/wais/html/na-dir/mail/mime-faq/.html: MIME Frequently Asked Questions (FAQ)
http://ds.internic.net/rfc/: Request For Comments (RFC) directory containing many RFCs.
http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimerP1.html: NCSA's Beginner's Guide to HTML.
http://www.w3.org/: Home page of the World Wide Web Consortium.
http://etext.virginia.edu/bin/tei-tocs?div=DIV1&id=SG: A Gentle Introduction to SGML, an introduction to SGML that is part of the Guidelines for Electronic Text Encoding and Interchange of the Text Encoding Initiative (TEI).
ftp://ftp.math.utah.edu/pub/sgml/index.html: James Clark's free SGML software.

Last update: February 19, 1998. erikt@stp.ling.uu.se