Last updated: 2002-07-08 12:21:49 EDT
Doc Title: Creating Text Class Word Wheels
Author 1: Powell, Chris
CVS Revision: $Revision: 1.6 $
Creating Text Class Word Wheels

General Information

The word wheel tools extract each word from a collection's texts and write it to an SGML file, along with a count of the number of times the word appears in that collection. This SGML file is then normalized and indexed so that the Text Class middleware can use it.
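
To make the extract-and-count idea concrete, here is a minimal Python sketch. It is not the DLXS implementation. The tokenizer (runs of word characters or apostrophes, lowercased, with markup removed by a naive regex) and the element names WW, ENTRY, WORD, and COUNT are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape

# Naive markup stripper: removes anything between < and >.
# Character entities (e.g. &amp;) are NOT expanded here.
TAG_RE = re.compile(r"<[^>]*>")
# Hypothetical tokenizer: runs of word characters (letters, digits,
# underscore) or apostrophes. The real DLXS tools may define words differently.
WORD_RE = re.compile(r"[\w']+")


def count_words(sgml_text: str) -> Counter:
    """Strip tags, lowercase, and count how often each word occurs."""
    text = TAG_RE.sub(" ", sgml_text)
    return Counter(word.lower() for word in WORD_RE.findall(text))


def to_word_wheel_sgml(counts: Counter) -> str:
    """Emit one record per word, sorted alphabetically.

    The element names are illustrative only, not the DLXS word wheel DTD.
    """
    lines = ["<WW>"]
    for word in sorted(counts):
        lines.append(
            f"<ENTRY><WORD>{escape(word)}</WORD>"
            f"<COUNT>{counts[word]}</COUNT></ENTRY>"
        )
    lines.append("</WW>")
    return "\n".join(lines)


# Usage: count the words in a tiny SGML fragment and print the records.
sample = "<P>The wheel turns; the <HI>wheel</HI> stops.</P>"
counts = count_words(sample)
sgml = to_word_wheel_sgml(counts)
print(sgml)
```

Sorting the records alphabetically is a choice made in this sketch; it keeps neighboring words together, which is how a word wheel presents them.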

To make word wheels available for your collection, you must do two things: build the word wheel, and fill in the appropriate fields in the collection manager. The wwdd field gives the location of the word wheel index (usually /idx/c/collid/WW/collid.ww.dd). The wwrealms and wwrealmseng fields identify which fields are available (e.g., full text, author, title) and how they should appear in the interface (e.g., "Full Text", "all the words", or some other variation on "full text").

Building the Word Wheel

The DLXS release includes files in the directory $DLXSROOT/bin/WW to help you build the word wheel. The SGML file that this process creates is stored in the directory /l1/prep/c/collid/WW, which must exist before you begin running the scripts. The normalized SGML that results from the complete set of word wheel steps is stored with the collection SGML in /l1/obj/c/collid, and the indexes and data dictionary are stored in /l1/idx/c/collid/WW/.
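
Because the prep directory must exist before the scripts run, it can help to compute all three locations up front. The Python sketch below does that for one collection. It assumes that the c in c/collid stands for the first letter of the collection ID; the function names and the collection ID mycoll are hypothetical. Pass /l1 or the value of $DLXSROOT as the root, depending on your installation.

```python
import os
from pathlib import Path


def word_wheel_dirs(root: Path, collid: str) -> dict:
    """Map each word wheel stage to its directory for one collection.

    Assumption: the "c" in paths such as prep/c/collid is the first
    letter of the collection ID.
    """
    first = collid[0]
    return {
        # Unnormalized word wheel SGML written by the scripts.
        "prep": root / "prep" / first / collid / "WW",
        # Normalized SGML, stored alongside the collection SGML.
        "obj": root / "obj" / first / collid,
        # Word wheel indexes and data dictionary.
        "idx": root / "idx" / first / collid / "WW",
    }


def ensure_prep_dir(root: Path, collid: str) -> Path:
    """Create the prep directory (and any missing parents) if needed."""
    prep = word_wheel_dirs(root, collid)["prep"]
    prep.mkdir(parents=True, exist_ok=True)
    return prep


# Usage: fall back to /l1 when $DLXSROOT is not set (an assumption).
root = Path(os.environ.get("DLXSROOT", "/l1"))
dirs = word_wheel_dirs(root, "mycoll")
# ensure_prep_dir(root, "mycoll")  # run once before makeWordWheelFiles.pl
```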

  1. Copy $DLXSROOT/bin/WW/sample.ww.blank.dd to $DLXSROOT/idx/c/collid/WW/collid.ww.blank.dd and edit it to reflect the name of your collection.
  2. Copy $DLXSROOT/bin/WW/sample.ww.inp to $DLXSROOT/idx/c/collid/WW/collid.ww.inp and edit it to add, or point to, any character entity declarations this collection needs.
  3. Copy $DLXSROOT/bin/WW/Makefile to $DLXSROOT/idx/c/collid/WW/Makefile and edit it for your collection.
  4. Copy $DLXSROOT/bin/WW/makeWordWheelFiles.sample.cfg to $DLXSROOT/idx/c/collid/WW/makeWordWheelFiles.cfg and edit it to point to the proper directories.
  5. cd to $DLXSROOT/idx/c/collid/WW and run:

    % $DLXSROOT/bin/WW/makeWordWheelFiles.pl makeWordWheelFiles.cfg

    This creates collid.ww.unnorm.sgm in $DLXSROOT/prep/c/collid/WW.
  6. The Makefile then normalizes collid.ww.unnorm.sgm (in $DLXSROOT/obj/c/collid) and indexes it, creating an XPAT-indexed word wheel file for your collection.
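
Steps 1 through 4 are plain file copies followed by hand edits, so the copying can be scripted. The Python sketch below is one way to do it, not a tool shipped with DLXS. It assumes all four copies go into $DLXSROOT/idx/c/collid/WW (the directory step 5 runs from) and that c is the first letter of the collection ID; the name copy_templates is hypothetical. It skips files that already exist, so rerunning it never overwrites your edits.

```python
import shutil
import tempfile
from pathlib import Path

# (template in $DLXSROOT/bin/WW, destination name); "{collid}" is filled in.
TEMPLATES = [
    ("sample.ww.blank.dd", "{collid}.ww.blank.dd"),
    ("sample.ww.inp", "{collid}.ww.inp"),
    ("Makefile", "Makefile"),
    ("makeWordWheelFiles.sample.cfg", "makeWordWheelFiles.cfg"),
]


def copy_templates(dlxsroot: Path, collid: str) -> list:
    """Copy the word wheel templates for one collection.

    Returns the destination paths that were newly created. Existing
    files are left alone so that earlier edits are not overwritten.
    """
    src_dir = dlxsroot / "bin" / "WW"
    # Assumption: "c" in idx/c/collid is the collection ID's first letter.
    dest_dir = dlxsroot / "idx" / collid[0] / collid / "WW"
    dest_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for src_name, dest_pattern in TEMPLATES:
        target = dest_dir / dest_pattern.format(collid=collid)
        if target.exists():
            continue  # keep any hand edits from a previous run
        shutil.copy2(src_dir / src_name, target)
        created.append(target)
    return created


# Usage sketch against a throwaway tree standing in for $DLXSROOT.
with tempfile.TemporaryDirectory() as tmp:
    fake_root = Path(tmp)
    (fake_root / "bin" / "WW").mkdir(parents=True)
    for src_name, _ in TEMPLATES:
        (fake_root / "bin" / "WW" / src_name).write_text("template\n")
    first_run = sorted(p.name for p in copy_templates(fake_root, "mycoll"))
    second_run = copy_templates(fake_root, "mycoll")  # nothing left to copy
```

Each copied file still needs the edits described in its step before you run step 5.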

Note 1: The input to makeWordWheelFiles.pl, as specified in the .cfg file, can consist of one or more .sgm files. This covers both collections indexed as a single file and collections indexed through multi-file system (MFS) indexing.
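
When the input is split across several .sgm files, each word's count has to be summed over all of them so that it reflects the whole collection. Below is a minimal Python sketch of that merge using collections.Counter. The tokenizer, the UTF-8 default encoding, and the function names are assumptions made for illustration and are not taken from makeWordWheelFiles.pl.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

TAG_RE = re.compile(r"<[^>]*>")  # naive tag stripper
WORD_RE = re.compile(r"[\w']+")  # hypothetical definition of a "word"


def count_file(path: Path, encoding: str = "utf-8") -> Counter:
    """Count lowercased words in one SGML file, ignoring markup."""
    text = TAG_RE.sub(" ", path.read_text(encoding=encoding))
    return Counter(w.lower() for w in WORD_RE.findall(text))


def count_collection(paths, encoding: str = "utf-8") -> Counter:
    """Sum per-file counts into collection-wide counts."""
    total = Counter()
    for path in paths:
        total.update(count_file(path, encoding))  # update() adds counts
    return total


# Usage: two small files standing in for an MFS-indexed collection.
with tempfile.TemporaryDirectory() as tmp:
    part1 = Path(tmp) / "part1.sgm"
    part2 = Path(tmp) / "part2.sgm"
    part1.write_text("<P>word wheel</P>", encoding="utf-8")
    part2.write_text("<P>wheel</P>", encoding="utf-8")
    totals = count_collection([part1, part2])
```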

Note 2: The configuration (.cfg) file can specify an array of .dd files for collections that have multiple indexes. Currently this mechanism and the multiple .sgm input described in Note 1 are mutually exclusive: a collection can have multiple .dd files, or it can have multiple .sgm files with a single index, but not both.
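
The restriction in Note 2 amounts to a simple rule: more than one .sgm input and more than one .dd file cannot be combined. The Python sketch below checks that rule before a build. WordWheelConfig and check_config are hypothetical names, and the dataclass is an in-memory stand-in for the inputs a .cfg file names, not its actual syntax.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordWheelConfig:
    """Hypothetical view of the inputs a .cfg file names (not its syntax)."""
    sgm_files: List[str] = field(default_factory=list)
    dd_files: List[str] = field(default_factory=list)


def check_config(cfg: WordWheelConfig) -> None:
    """Raise ValueError if the configuration breaks the Note 1/Note 2 rules."""
    if not cfg.sgm_files:
        # Note 1: the input consists of one or more .sgm files.
        raise ValueError("at least one .sgm input file is required")
    if len(cfg.sgm_files) > 1 and len(cfg.dd_files) > 1:
        # Note 2: several .dd files, or several .sgm files with a single
        # index, but not both at once.
        raise ValueError("multiple .sgm files and multiple .dd files "
                         "cannot be combined")


# Usage: both of these configurations are allowed.
check_config(WordWheelConfig(["coll.sgm"], ["a.dd", "b.dd"]))
check_config(WordWheelConfig(["part1.sgm", "part2.sgm"], ["coll.dd"]))
```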