
Dokumenthantering VT97:02

These are the exercises and references for the second class of the course Dokumenthanteringen.


Exercises

The results of the exercises marked with * have to be handed in.

  1. Examine the file ~erikt/P/dh97/mail and try to identify the MIME topics discussed in section 2.1.

  2. Examine at least one of the RFCs mentioned in the text (see the reference pointer for the location). You don't have to read or understand it completely, but you should at least have browsed through one of them. You may need them later to look up technical details.

  3. * On our AIX system there are two programs available for checking SGML: sgml-ncheck (for the TEI Lite DTD) and html-ncheck (for several HTML DTDs). Use html-ncheck to check whether the HTML file in section 2.2 is correct HTML. Don't forget to insert the extra SGML header line as the first line of the file before you check it (see section 2.3). Now replace the first line of the file with:

    <!DOCTYPE html [ ]>

    and add DTD declarations between the square brackets until html-ncheck accepts the file. In other words, you need to write your own DTD for the file. This is an alternative way of specifying a DTD: inside the document itself.

    If you want to see how html-ncheck has analyzed the file, add the -o option between the command and the filename. The program is nothing more than a script that calls James Clark's nsgmls program. There is a manual for the latter if you want more information about it. The HTML DTD can be found in /usr/local/lib/html-check/lib/html.dtd

  4. * The following Perl script reads its input from standard input and replaces every occurrence of the word "pipe" in the text with the pipe symbol |:

    #!/usr/local/bin/perl
    # simple conversion program
    # usage: myconvert < infile > outfile
    # 970210 erikt@stp.ling.uu.se
    
    # while a line can be read
    while (<>) {
       # change every "pipe" on the line to "|"
       s/pipe/|/g;
       # show the line
       print;
    }
    

    Extend this program into an HTML to ISO Latin 1 conversion program (the output should contain no HTML tags) that can at least handle the HTML document of section 2.2. It does not need to take care of paragraph formatting. To do this exercise you need to know something about regular expressions (man 5 regexp).
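
The two substitutions that exercise 4 relies on, removing tags and replacing character entities, can be sketched as follows. This is only a starting point and not a complete solution: the helper name html_to_latin1 and the small entity table are assumptions made for illustration, and the tag pattern is deliberately naive (it will be confused by a ">" inside an attribute value or by a DOCTYPE line with an internal DTD subset).

```perl
#!/usr/local/bin/perl
# sketch: strip HTML tags and decode a few character entities
# usage: myconvert < infile > outfile
use strict;
use warnings;

# illustrative subset of named entities (assumption: extend as needed)
my %entity = (
    'amp'   => '&',
    'lt'    => '<',
    'gt'    => '>',
    'quot'  => '"',
    'aring' => "\xE5",   # a with ring, ISO Latin 1 code 229
    'auml'  => "\xE4",   # a with diaeresis, code 228
    'ouml'  => "\xF6",   # o with diaeresis, code 246
    'Aring' => "\xC5",
    'Auml'  => "\xC4",
    'Ouml'  => "\xD6",
);

# hypothetical helper: convert one chunk of HTML text to plain Latin 1
sub html_to_latin1 {
    my ($text) = @_;
    # remove anything that looks like a tag, even across line breaks
    $text =~ s/<[^>]*>//g;
    # replace known named entities; leave unknown ones untouched
    $text =~ s/&(\w+);/exists $entity{$1} ? $entity{$1} : "&$1;"/ge;
    return $text;
}

# read the whole input at once so that tags spanning lines are caught
my $input = do { local $/; <STDIN> };
print html_to_latin1($input) if defined $input;
```

Tags are removed before entities are decoded, so that text such as &lt;b&gt; ends up as the literal characters <b> in the output instead of being mistaken for a tag. Reading the input in one piece, rather than line by line as in the original script, is a design choice that keeps the tag pattern simple when a tag is split over two lines.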


References


Last update: October 09, 1998. erikt@stp.ling.uu.se