Last updated 2002-07-08 12:15:41 EDT
Doc Title Indexing the Collection (Text Class)
Author 1 Powell, Chris
CVS Revision $Revision: 1.6 $
Indexing the Collection (Text Class)

After you have followed all the steps to set up your directories and prepare your files, as described in the Text Class preparation documentation, indexing the collection is fairly straightforward. To create an index for use with the Text Class interface, you need to index the words in the collection, then index the SGML/XML (the structural metadata, if you will), and finally "fabricate" structures based on a combination of elements (for example, defining what the "main entry" is without adding a <MAINENTRY> tag around the appropriate <AUTHOR> or <TITLE> element). The following commands can be used to make the index, alone or in combination.

  1. Ensure that your collection SGML is valid by using the make validate command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile.
  2. make singledd indexes words for texts that have been concatenated into one large file for a collection. This is the recommended process, as a data dictionary built from a single concatenated file is faster for searching and more reliable than one built using multi-file system indexing. Use the make singledd command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile.
  3. make sgml indexes the SGML structure by reading the DTD, and validates as it indexes. Because it validates, it is slower than multiregion indexing (see the XPAT documentation for more information). However, it is necessary for collections that have nested elements of the same name (even when separated by an intervening element, such as a <P> within <NOTE1> that is itself within a <P>). Use the make sgml command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile.
  4. make post builds and indexes fabricated regions based on the XPAT queries stored in the $DLXSROOT/prep/c/collid/collid.extra.srch file. Because every collection is different, this file will need to be adapted after you have determined what you want to treat as a "poem" or a "text" (e.g., perhaps every DIV1 TYPE="sonnet" and DIV2 TYPE="poem" in the collection) and how many levels of division heads you have in your collection (e.g., at least one text is nested to DIV4, so you'll need to fabricate up to div4head). If the extra.srch file references elements not used in your text collection, you will see errors like Error found: <Error>syntax error before: ")</Error> when you use the make post command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile. Remove the lines that reference those unused elements.
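
The four steps above can be strung together in a small wrapper. What follows is only a sketch, not part of the DLXS distribution: the function name run_index_steps and the DRY_RUN variable are hypothetical, and it assumes the Makefile directory (for example $DLXSROOT/bin/c/collid) is passed in as the only argument. It stops at the first target that fails, so a validation error prevents any indexing from being attempted.

```shell
#!/bin/sh
# Sketch: run the four indexing targets in order, stopping at the first failure.
# run_index_steps and DRY_RUN are illustrative names, not DLXS conventions.

run_index_steps() {
    makedir=$1    # directory holding the collection Makefile
    for target in validate singledd sgml post; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only report what would be run; touch nothing.
            echo "make $target (in $makedir)"
        else
            # Run the target in a subshell so the caller's directory is unchanged.
            (cd "$makedir" && make "$target") || return 1
        fi
    done
}
```

A call such as run_index_steps "$DLXSROOT/bin/c/collid" would run all four targets; setting DRY_RUN=1 beforehand only prints them, which is a cheap way to check the path before committing to a long indexing run.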

You have now built indexes and region files for your collection. You can test that things are properly indexed by issuing the command xpat $DLXSROOT/idx/c/collid/collid.dd and doing searches, such as for a common word like the or an element that should appear such as region "main" or region "HEADER". It is a good idea to run this test from a directory other than the one you indexed in, to ensure that relative and absolute paths are resolving appropriately.
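
The check from a different directory can be wrapped up as follows. This is a hedged sketch: the function name open_index_elsewhere, the DRY_RUN variable, and the default of /tmp as the "other" directory are all assumptions made for illustration. It expects an absolute path to the data dictionary, confirms the file is readable, and only then starts xpat from the other directory.

```shell
#!/bin/sh
# Sketch: confirm the data dictionary exists, then open it with xpat from a
# different directory so that paths cannot resolve by accident.
# open_index_elsewhere and DRY_RUN are illustrative names.

open_index_elsewhere() {
    dd=$1                 # absolute path to the data dictionary (.dd file)
    workdir=${2:-/tmp}    # assumed choice of "some other directory"
    if [ ! -r "$dd" ]; then
        echo "data dictionary not readable: $dd" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only report the command that would be run.
        echo "cd $workdir && xpat $dd"
        return 0
    fi
    # Start the interactive xpat session from the other directory.
    (cd "$workdir" && xpat "$dd")
}
```

For example, open_index_elsewhere "$DLXSROOT/idx/c/collid/collid.dd" would start an xpat session in which the word and region searches described above can be tried by hand.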