These are the exercises and references for the second class of the course Dokumenthanteringen.
The results of the exercises marked with * have to be handed in.
<!DOCTYPE html [ ]>
and add DTD definitions between the square brackets until html-ncheck accepts the file. You need to make your own DTD for the file. Putting the definitions in the document itself is an alternative way of specifying a DTD.
If you want to see how html-ncheck has analyzed the file, add the -o option between the command and the filename. The program is nothing more than a script that calls James Clark's nsgml program. There is a manual for the latter if you want more information about it. The HTML DTD can be found in /usr/local/lib/html-check/lib/html.dtd.
#!/usr/local/bin/perl
# simple conversion program
# usage: myconvert < infile > outfile
# 970210 erikt@stp.ling.uu.se

# while a line can be read
while (<>) {
    # change every "pipe" on the line to "|"
    s/pipe/|/g;
    # show the line (print, not printf: printf would treat
    # any "%" in the line as a format directive)
    print;
}
Expand this program into an HTML to ISO Latin 1 conversion program (the output should not contain HTML tags) that can at least handle the HTML document of section 2.2. It does not need to take care of paragraph formatting. To do this exercise you need to know something about regular expressions (man 5 regexp).
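One possible starting point is sketched below. It is not a complete solution: the entity table is an assumption and covers only a few common names, so it has to be extended with whatever the document of section 2.2 actually uses. The helper name html_to_latin1 is made up for this illustration. The sketch reads the whole input at once, which is one way to cope with tags that span several lines.

```perl
#!/usr/local/bin/perl
# sketch: strip HTML tags and decode a few entities to ISO Latin 1
# usage: myconvert < infile > outfile
use strict;
use warnings;

# assumed entity table (not from the course material); extend as needed
my %latin1 = (
    'aring'  => "\xE5", 'auml' => "\xE4", 'ouml' => "\xF6",
    'Aring'  => "\xC5", 'Auml' => "\xC4", 'Ouml' => "\xD6",
    'eacute' => "\xE9", 'nbsp' => "\xA0",
    'lt'     => "<",    'gt'   => ">",    'quot' => '"',
);

# convert a string of HTML to ISO Latin 1 text (hypothetical helper)
sub html_to_latin1 {
    my ($text) = @_;
    # remove comments first; /s lets "." match newlines too
    $text =~ s/<!--.*?-->//gs;
    # remove every remaining tag; [^>] also matches newlines
    $text =~ s/<[^>]*>//g;
    # numeric references such as &#229; that fall in the Latin 1 range
    $text =~ s/&#(\d+);/$1 < 256 ? chr($1) : "&#$1;"/ge;
    # named references from the table; unknown names are left untouched
    $text =~ s/&(\w+);/exists $latin1{$1} ? $latin1{$1} : "&$1;"/ge;
    # decode &amp; last so that "&amp;lt;" becomes "&lt;" and not "<"
    $text =~ s/&amp;/&/g;
    return $text;
}

# slurp standard input and print the converted text
{
    local $/;                 # undefined record separator: read all at once
    my $input = <STDIN>;
    print html_to_latin1($input) if defined $input;
}
```

Decoding &amp; after all other references is a deliberate choice: doing it first would turn text such as "&amp;lt;" into "<" instead of the intended "&lt;".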