||2002-07-08 12:15:41 EDT
||Indexing the Collection (Text Class)
||$Revision: 1.6 $
Indexing the Collection (Text Class)
Once you have set up your directories and prepared your files as described in the Text Class preparation documentation, indexing the collection is fairly straightforward. To create an index for use with the Text Class interface, you index the words in the collection, then index the SGML/XML (the structural metadata), and finally "fabricate" structures based on a combination of elements (for example, defining what the "main entry" is without adding a <MAINENTRY> tag around the appropriate <AUTHOR> or <TITLE> element). The following commands can be used to make the index, alone or in combination.
- Ensure that your collection SGML is valid by running the make validate command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile.
- make singledd indexes words for texts that have been concatenated into one large file for a collection. This is the recommended process, as a data dictionary built from a single concatenated file is faster for searching and more reliable than one built using multi-file system indexing. Use the make singledd command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile.
- make sgml indexes the SGML structure by reading the DTD, and validates as it indexes. Because of this validation, it is slower than multiregion indexing (see the XPAT documentation for more information). However, it is necessary for collections that have nested elements of the same name (even when separated by an intervening element, such as a <P> within a <NOTE1> that is itself within a <P>). Use the make sgml command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile.
- make post builds and indexes fabricated regions based on the XPAT queries stored in the $DLXSROOT/prep/c/collid/collid.extra.srch file. Because every collection is different, this file will need to be adapted after you have determined what you want to use as a "poem" (e.g., perhaps every DIV1 TYPE="sonnet" and DIV2 TYPE="poem" in the collection) and how many levels of division heads you have in your collection (e.g., if at least one text is nested to DIV4, you'll need to fabricate up to div4head). If the extra.srch file references elements not used in your text collection, you will see errors like the following when you use the make post command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile:
Error found:
<Error>syntax error before: ")</Error>
Remove the lines that reference the unused elements.
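The targets described above can be chained in a small shell wrapper so that indexing stops at the first failure. This is only a sketch: the function name build_index and the MAKE override are hypothetical, not part of the DLXS distribution. It assumes you pass the directory that holds the collection Makefile.

```shell
#!/bin/sh
# Sketch only: run the indexing targets in the order described above.
# build_index and the MAKE override are hypothetical helpers for illustration.

build_index() {
    makedir=$1                 # directory containing the collection Makefile
    make_cmd=${MAKE:-make}     # override MAKE (e.g. MAKE=echo) for a dry run

    for target in validate singledd sgml post; do
        echo "== $target"
        # Run each target inside the Makefile's directory; stop on failure.
        ( cd "$makedir" && $make_cmd "$target" ) || {
            echo "target '$target' failed; stopping" >&2
            return 1
        }
    done
}
```

For example, build_index "$DLXSROOT/bin/c/collid" runs all four targets in sequence. Stopping at the first failure matters because the later targets depend on the output of the earlier ones.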
You have now built indexes and region files for your collection. You can test that things are properly indexed by issuing the command xpat $DLXSROOT/idx/c/collid/collid.dd and doing searches, such as for a common word like the or for an element that should appear, such as region "main" or region "HEADER". It is a good idea to test this from a directory other than the one you indexed in, to ensure that relative and absolute paths are resolving correctly.
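Before starting xpat, a quick pre-flight check can confirm that the data dictionary is reachable from a different working directory. The sketch below is illustrative only: check_index is a hypothetical helper, and it tests only that the .dd file itself is readable, not the paths recorded inside it. The searches in xpat remain the real test.

```shell
#!/bin/sh
# Sketch only: confirm a data dictionary is readable from another directory.
# check_index is a hypothetical helper, not part of DLXS or XPAT.

check_index() {
    dd=$1                  # path to the data dictionary (.dd) file
    workdir=${2:-/tmp}     # directory to test from (assumed default: /tmp)
    (
        cd "$workdir" || exit 1
        if [ ! -r "$dd" ]; then
            echo "cannot read $dd from $workdir" >&2
            exit 1
        fi
        echo "ok: $dd is readable from $workdir"
    )
}
```

For example, check_index "$DLXSROOT/idx/c/collid/collid.dd" && xpat "$DLXSROOT/idx/c/collid/collid.dd" starts xpat only if the file resolves. A relative path that worked in the indexing directory will fail this check, which is the kind of problem that testing from another directory is meant to catch.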