Normalization of Data
Normalizing SGML has several benefits:
- tag names, attribute names, and some attribute value types are normalized into upper case
- record ends are normalized in a consistent fashion based on element content models
- optional or minimized tags are made explicit (makes programmatic parsing much easier)
- most normalizers put attributes in the order in which they are declared in the DTD (though this is not part of the formal definition of normalization)
Our favorite normalizer is sgmlnorm, from James Clark's SP. It is run as follows:
% sgmlnorm doctype_file sgml_file > output_sgml_file
Here is an example of how normalization might change an SGML document,
with some detail on how this eases parsing.
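To make the parsing benefit concrete, here is a minimal Python sketch. It is not part of SP or DLXS. The function name parse_normalized, the sample markup, and the empty_elements parameter are all hypothetical, and the sample is a hand-written illustration rather than actual sgmlnorm output. The sketch assumes normalized input: upper-case names, double-quoted attribute values, and explicit end tags for every element that is not declared EMPTY.

```python
import re

# A start tag, an end tag, or a run of character data.
# Assumes normalized input: upper-case names, double-quoted attribute
# values, and no omitted or minimized tags.
TOKEN = re.compile(
    r'<(/?)([A-Z][A-Z0-9.-]*)((?:\s+[A-Z][A-Z0-9.-]*="[^"]*")*)\s*>|([^<]+)'
)
ATTR = re.compile(r'([A-Z][A-Z0-9.-]*)="([^"]*)"')


def parse_normalized(text, empty_elements=frozenset()):
    """Return a list of ('start', name, attrs), ('end', name), ('text', data).

    empty_elements names the elements declared EMPTY in the DTD; they
    have no end tag, so they are never pushed on the stack.
    """
    stack = []   # names of the elements that are currently open
    events = []
    for match in TOKEN.finditer(text):
        closing, name, attrs, data = match.groups()
        if data is not None:
            if data.strip():
                events.append(('text', data.strip()))
        elif closing:
            # Every end tag must close the innermost open element.
            if not stack or stack[-1] != name:
                raise ValueError('unexpected end tag: ' + name)
            stack.pop()
            events.append(('end', name))
        else:
            events.append(('start', name, dict(ATTR.findall(attrs))))
            if name not in empty_elements:
                stack.append(name)
    if stack:
        raise ValueError('unclosed elements: ' + ', '.join(stack))
    return events


# Hypothetical source as an author might key it, with omitted end tags:
#   <div1><pb n=1><p>First paragraph.<p>Second.</div1>
# The same content in the normalized form this sketch expects:
sample = '<DIV1><PB N="1"><P>First paragraph.</P><P>Second.</P></DIV1>'

for event in parse_normalized(sample, empty_elements={'PB'}):
    print(event)
```

A simple stack is enough here only because normalization has already made every tag explicit. Given the un-normalized form in the comment, the same function would raise an error, because working out where each P element ends requires the content models in the DTD.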
James Clark's other SGML/XML tools are also worth a look.
Normalization: Hands On
To get a better feel for the process, we'll use the bosnia Makefile to
run the normalization (sgmlnorm) step. Before the data can be normalized,
however, it must be transformed: the <PB> (page break) tags are processed,
and their attributes and values are changed to conform to the expectations
of the Page Viewer. Once the <PB> tags have been "munged", we will also use
the Makefile to check that the SGML is valid before normalizing. This step
runs nsgmls, James Clark's parser.
% cd $HOME/dlxs/idx/b/bosnia
% make noded
% make validate
% make norm
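If you would rather script these steps than type them one at a time, the following Python sketch runs the same three make targets in the order shown and stops at the first failure, so that data which fails validation never reaches the normalization step. The function name run_targets and its runner parameter are hypothetical; the target names and the directory are the ones used above.

```python
import subprocess

# The make targets from the commands above, in the order they are run.
TARGETS = ["noded", "validate", "norm"]


def run_targets(workdir, targets=TARGETS, runner=subprocess.run):
    """Run each make target in workdir, stopping at the first failure.

    runner defaults to subprocess.run; it is a parameter only so that
    the function can be exercised without invoking make.
    """
    completed = []
    for target in targets:
        result = runner(["make", target], cwd=workdir)
        if result.returncode != 0:
            raise RuntimeError(
                "make %s failed with exit status %d" % (target, result.returncode)
            )
        completed.append(target)
    return completed
```

To run it against the bosnia collection, call run_targets(os.path.expandvars("$HOME/dlxs/idx/b/bosnia")) after importing os.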