Indexing the Collection (Finding Aids)

After you have set up your directories and prepared your files as described in the finding aids preparation documentation, indexing the collection is fairly straightforward. To create an index for use with the Findaid Class interface, you index the words in the collection, then index the XML (the structural metadata), and finally "fabricate" structures based on a combination of elements (for example, defining what the "main entry" is without adding a <MAINENTRY> tag around the appropriate <AUTHOR> or <TITLE> element). The following commands can be used to make the index, alone or in combination.

  1. Ensure that your collection data is valid by running make validate, which uses dlxsead2002.dtd to validate the full XML file.
  2. Ensure that your collection data is normalized by running make norm. This step puts attributes in the order in which they were defined in the DTD. Even though your collection data is XML, xmlrgn (part of the make xml step below) requires that the attributes appear in this order.
  3. make singledd indexes words for texts that have been concatenated into one large file for a collection. Creating an index from a single file (as opposed to multi file system indexing) is the recommended process for reasons of speed and reliability. Use the make singledd command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile .
  4. make xml indexes the XML structure by reading the DTD. xmlrgn validates as it indexes and, for this reason, is slower than multiregion indexing (see the XPAT documentation for more information). However, this method is necessary for collections that have nested elements of the same name (and the EAD DTD permits this). Use the make xml command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile.
  5. make post builds and indexes fabricated regions based on the XPAT queries stored in the $DLXSROOT/prep/c/collid/{coll}.extra.srch file. Because every collection is different, you will need to adapt this file once you have determined what to use as the "main title" for a finding aid (e.g., perhaps the ORIGINATION within the DID within the ARCHDESC) and how many levels of components (e.g., nested to C04) your collection has. If you try to build fabricated regions from elements that are not used in your finding aids collection, you will see errors like Error found: <Error>syntax error before: ")</Error> when you run make post. Use the make post command in the Makefile stored at $DLXSROOT/bin/c/collid/Makefile.
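
The five steps above can be chained so that a failure in one step stops the run before later steps build on bad data. The following is a minimal sketch, not part of DLXS: the function name run_index_steps and the DRY_RUN variable are hypothetical, and it assumes the collection Makefile defines the validate, norm, singledd, xml, and post targets as described above.

```shell
#!/bin/sh
# Sketch: run the indexing targets in the documented order and stop at
# the first failure. run_index_steps and DRY_RUN are illustrative names,
# not part of DLXS.

run_index_steps() {
    # $1: directory that holds the collection Makefile,
    #     e.g. $DLXSROOT/bin/c/collid
    makedir=$1
    for target in validate norm singledd xml post; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be run.
            echo "cd $makedir && make $target"
        else
            # Run each target in a subshell so the caller's working
            # directory is left unchanged.
            ( cd "$makedir" && make "$target" ) || {
                echo "make $target failed; stopping" >&2
                return 1
            }
        fi
    done
}
```

After setting DRY_RUN=1, calling run_index_steps "$DLXSROOT/bin/c/collid" prints the five commands without running them; unset DRY_RUN to run them for real.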

You have now built indexes and region files for your collection. You can test that things are properly indexed by issuing the command xpat $DLXSROOT/idx/c/collid/collid.dd and doing searches, such as region "c02" and region "main". For more information about searching, see the XPAT manual. It is a good idea to run this test from a directory other than the one you indexed in, to ensure that relative and absolute paths resolve correctly.
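
The check described above can be scripted roughly as follows. This is a sketch under stated assumptions: the helper names region_queries and test_index_from_elsewhere and the XPAT variable are hypothetical, and it assumes that xpat reads queries from standard input and exits at end of input. If your xpat build behaves differently, type the queries interactively instead.

```shell
#!/bin/sh
# Sketch: query the index from a different working directory.
# Helper names and the XPAT variable are illustrative, not part of DLXS.

region_queries() {
    # Print one XPAT region query per region name given.
    for r in "$@"; do
        printf 'region "%s"\n' "$r"
    done
}

test_index_from_elsewhere() {
    # $1: path to the data dictionary, e.g. $DLXSROOT/idx/c/collid/collid.dd
    # remaining arguments: region names to search for
    dd=$1
    shift
    # Change to a scratch directory in a subshell so that any path
    # resolution problems in the index show up.
    # Assumption: xpat accepts queries on standard input.
    ( cd "${TMPDIR:-/tmp}" && region_queries "$@" | "${XPAT:-xpat}" "$dd" )
}
```

For example, test_index_from_elsewhere "$DLXSROOT/idx/c/collid/collid.dd" c02 main sends the two region searches mentioned above to xpat.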