The sgmlrgn Step

The sgmlrgn50 command creates a coll.rgn file, an index into every SGML region in the text.

The command, outside the Makefile, would look like this:

sgmlrgn50 -m region -o /dir/coll -D /dir/coll.dd doctype coll.sgm

where -m region tells sgmlrgn50 that it is building regions, -o tells it where the output file should go, and -D tells it what .dd file to use. The next argument is the doctype file to use (usually collname.inp), and the last argument is the SGML data file.
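If the command is ever driven from a script rather than the Makefile, the argument order described above can be captured in a small helper. This is only a sketch: the function name sgmlrgn_command and its parameter names are invented for illustration, the doctype file name coll.inp is an assumed example, and the only flags used are the ones shown in the command above.

```python
def sgmlrgn_command(out_prefix, dd_file, doctype_file, sgml_file):
    """Assemble the sgmlrgn50 argument list in the order described above.

    The helper and its parameter names are hypothetical; only the flags
    (-m region, -o, -D) and the argument order come from the text.
    """
    return [
        "sgmlrgn50",
        "-m", "region",     # build regions
        "-o", out_prefix,   # where the output file should go
        "-D", dd_file,      # which .dd file to use
        doctype_file,       # doctype file (usually collname.inp)
        sgml_file,          # the SGML data file
    ]


cmd = sgmlrgn_command("/dir/coll", "/dir/coll.dd", "coll.inp", "coll.sgm")
print(" ".join(cmd))
```

The list form can be handed to something like subprocess.run without any shell quoting, which is the main reason to build it as a list rather than a single string.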


What sgmlrgn50 Gets Us

The sgmlrgn step identifies main SGML elements and constructs them as regions. To make things as precise as possible:

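Which elements we have is determined by our use of the DTD. Which sgml regions we have is therefore also determined by the DTD, because sgmlrgn makes the following regions for us automatically:

ELEMENT (one region for each element)
ELEMENT-T (one region for each element's start tag)
A-ATTRIB (one region for each attribute)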

So, if we had a proper SGML database of some HTML documents, we would have sgml regions like the following:

HTML
P
STRONG
META-T
INPUT-T
A-HREF
A-NAME
A-ALT

For empty elements, that is, elements that have a start tag but no content or end tag (say, IMG or BR), region ELEMENT and region ELEMENT-T are congruent.

NOTE: regions whose names contain characters other than [A-Za-z0-9] must be double-quoted to be interpreted by pat50 correctly. All the A-ATTRIB and ELEMENT-T regions must be so quoted, as must names like TEI.2. This is a fine argument for always double-quoting region names.
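When queries are generated by a program, the quoting rule in the note can be applied mechanically. The following is a minimal sketch, not part of pat50 or DLXS: the function name quote_region and the always parameter are made up for this example, and it assumes region names never contain a double-quote character themselves.

```python
import re

# Names made up solely of [A-Za-z0-9] are the only ones safe to leave bare.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9]+")


def quote_region(name, always=True):
    """Return a region name, double-quoted where the note above requires it.

    With always=True (the default) every name is quoted, following the
    note's advice. With always=False only names that contain a character
    outside [A-Za-z0-9] are quoted.
    """
    if always or not _PLAIN_NAME.fullmatch(name):
        return '"' + name + '"'
    return name


for region in ["P", "A-HREF", "META-T", "TEI.2"]:
    print(region, "->", quote_region(region, always=False))
```

Defaulting to always=True means a caller who forgets about the rule still gets a name that pat50 can interpret.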

We can print out a list of all currently defined regions with the fabulously useful {ddinfo regionnames} command:

>> {ddinfo regionnames}
HTML-T [ ]
HEAD-T [ ]
TITLE-T [ ]
TITLE [ ]
A-NAME [ ]
A-CONTENT [ ]
META-T [ ]
META [ ]
HEAD [ ]
A-BGCOLOR [ ]
BODY-T [ ]
A-ALIGN [ ]
...
DIV-T [ ]
H3-T [ ]
H3 [ ]
DIV [ ]
A-SRC [ ]
A-ALT [ ]
IMG-T [ ]
IMG [ ]
BR-T [ ]
BR [ ]
>> 

The [ ] would usually hold some user-defined descriptive or annotative text about the region under PAT50 (say, how, why, and by whom it was made, which is especially important for fabricated regions), viz.:

    <Region>
      <Name>A-OTHERSOURCE</Name>
      <Desc>attribute describing the other source...</Desc>
      <File>
        <SysName>/home/pagliere/dlxs/idx/t/text/main.rgn</SysName>
        <ModDate>898201323</ModDate>
        <Offset>2279824</Offset>
      </File>
      <Count>4240</Count>
      <Type>pairs</Type>
    </Region>

This renders in the output of a {ddinfo regionnames} command in XPat as:

A-OTHERSOURCE [attribute describing the other source...]

We've never actually used this descriptive text. We're not even sure there is a reason for using it.

{ddinfo regionnames} lists not only the sgml regions defined, but also any fabricated regions.
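Because the listing is one region per line in the form NAME [description], it is easy to turn captured output into a lookup table, for instance to check that an expected region exists after indexing. The sketch below assumes the output looks exactly like the listings shown above; the function name parse_regionnames is hypothetical, and the sample text is pieced together from lines in this section rather than taken from a real session.

```python
import re

# One region per line: a name with no spaces, a space, then [description].
_REGION_LINE = re.compile(r"^(\S+) \[(.*)\]\s*$")


def parse_regionnames(output):
    """Map each region name to its description ('' when the [ ] is empty).

    Prompt lines (">>"), the echoed command, and "..." do not match the
    pattern and are skipped.
    """
    regions = {}
    for line in output.splitlines():
        match = _REGION_LINE.match(line.strip())
        if match:
            regions[match.group(1)] = match.group(2).strip()
    return regions


# Sample assembled from the listings above, for illustration only.
sample = """>> {ddinfo regionnames}
HTML-T [ ]
A-OTHERSOURCE [attribute describing the other source...]
...
BR [ ]
>> """

regions = parse_regionnames(sample)
print(sorted(regions))
print("BR" in regions)
```

Fabricated regions appear in the same listing, so they end up in the same table as the sgml regions.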