Index Building: Bibliographic Class

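You will need to identify a directory or directories where you plan to store your SGML or XML source file, your index file (approximately 75% of the size of your bibliographic information), your "region" files, and other information such as data dictionaries. We recommend the structure used in the steps below: the source file under DLXSROOT/obj and the data dictionary and init file under DLXSROOT/idx, each in a subdirectory named for the first letter of the collection and then the collection name (for example, /l1/obj/n/nyt/ and /l1/idx/n/nyt/).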

The instructions below assume a sample collection named "nyt" and a DLXSROOT of "/l1", as in the above examples. Please replace these sample names with your local filenames.
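
A minimal shell sketch for creating the obj and idx directories that the steps below write to. The function name make_collection_dirs is illustrative and not part of DLXS, and the first-letter subdirectory is an assumption read off the sample paths (/l1/obj/n/nyt, /l1/idx/n/nyt).

```shell
# Create the obj/ and idx/ directories for one collection.
# Usage: make_collection_dirs DLXSROOT COLLECTION
make_collection_dirs() {
    root=$1
    coll=$2
    # First letter of the collection name, e.g. "n" for "nyt".
    first=$(printf '%s' "$coll" | cut -c1)
    mkdir -p "$root/obj/$first/$coll" "$root/idx/$first/$coll"
}
```

For the sample collection this would be called as: make_collection_dirs /l1 nyt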

  1. Ensure that your SGML is fully validated or normalized, or that your XML is fully validated, using a validating parser such as nsgmls. NB: Indexes built from unvalidated data can return unreliable results; data that will not validate should not be put online.
  2. Ensure that your data is Unicode (see DLXS Unicode Data Preparation and Online Presentation Issues).
  3. Assuming XML, put the file nyt.xml in /l1/obj/n/nyt/nyt.xml
  4. Copy the sample data dictionary file bib-sample.dd to /l1/idx/n/nyt/ and rename as nyt.dd
  5. Edit the nyt.dd file to replace
    1. b/bib-sample/bib-sample.xml with n/nyt/nyt.xml
    2. b/bib-sample/bib-sample.idx with n/nyt/nyt.idx
    3. and b/bib-sample/bib-sample.init with n/nyt/nyt.init
  6. Copy the sample init file bib-sample.init to /l1/idx/n/nyt/ and rename as nyt.init
  7. Index your collection using the following command, replacing the value 10m with an appropriate amount of memory. See the XPAT documentation to determine how much memory to allocate.
        xpatbldu -m 10m -D /l1/idx/n/nyt/nyt.dd
  8. Create your region files by issuing the following command.
        multirgn -f -D /l1/idx/n/nyt/nyt.dd -t bib-regions.tags
    The file bib-regions.tags can be located in any directory and can be deleted after the regions have been indexed. DLPS keeps a copy of this file in /l1/obj/lib/sgml/bib-regions.tags
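
Steps 4 through 6 can be scripted. The sketch below is one way to do it, not a DLXS tool: the function name prepare_dd is hypothetical, and the location of the sample files is passed in as an argument because these instructions do not say where bib-sample.dd and bib-sample.init are installed.

```shell
# Copy the sample data dictionary and init file for the "nyt" collection
# and point the data dictionary at the new collection's files.
# Usage: prepare_dd SAMPLE_DIR IDX_DIR
prepare_dd() {
    sample_dir=$1   # assumed: directory holding bib-sample.dd and bib-sample.init
    idx_dir=$2      # e.g. /l1/idx/n/nyt
    cp "$sample_dir/bib-sample.dd" "$idx_dir/nyt.dd"
    cp "$sample_dir/bib-sample.init" "$idx_dir/nyt.init"
    # Apply the three replacements listed in step 5, and nothing else.
    sed -e 's|b/bib-sample/bib-sample\.xml|n/nyt/nyt.xml|g' \
        -e 's|b/bib-sample/bib-sample\.idx|n/nyt/nyt.idx|g' \
        -e 's|b/bib-sample/bib-sample\.init|n/nyt/nyt.init|g' \
        "$idx_dir/nyt.dd" > "$idx_dir/nyt.dd.tmp"
    mv "$idx_dir/nyt.dd.tmp" "$idx_dir/nyt.dd"
}
```

It is worth reading the edited nyt.dd afterwards to confirm that no bib-sample path remains before running xpatbldu.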

You have now built indexes and region files for your collection. You can test that things are properly indexed by issuing the command
    xpatu /l1/idx/n/nyt/nyt.dd
and then searching for a common word (e.g., "the") and for a region:
    region A
It is good practice to run this test from a directory other than the one you indexed in, to confirm that relative and absolute paths resolve correctly.
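
Before starting xpatu, a quick check that the expected files exist at their absolute paths can catch path problems early. This is a sketch under assumptions: check_collection_files is a hypothetical helper, and it looks only for the three files named in the data dictionary edits above (.dd, .init, .idx), not for region files, whose names these instructions do not give.

```shell
# Report which of the expected collection files are missing.
# Usage: check_collection_files IDX_DIR COLLECTION
check_collection_files() {
    idx_dir=$1
    coll=$2
    status=0
    for ext in dd init idx; do
        f="$idx_dir/$coll.$ext"
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return $status
}
```

Run from an unrelated directory, for example after cd /tmp, a call such as check_collection_files /l1/idx/n/nyt nyt returns a nonzero status and names each file that cannot be found.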