|Last updated||2002-07-08 12:21:49 EDT|
|Doc Title||Creating Text Class Word Wheels|
|Author 1||Powell, Chris|
|CVS Revision||$Revision: 1.6 $|
The word wheel tools extract each word from the texts in a collection and write it to a small SGML file, along with a count of the number of times the word appears in that collection. The SGML file is then normalized and indexed, ready to be used by the Text Class middleware.
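As a rough illustration of the word-and-count idea, the pipeline below tallies the words in a stream of plain text. It is only a sketch: the function name count_words is hypothetical, the tab-separated output is not the SGML format that the word wheel tools write, and it does not reproduce how makeWordWheelFiles.pl tokenizes or normalizes words.

```shell
# Hypothetical illustration only; not the method used by makeWordWheelFiles.pl.
# Reads text on stdin and prints one "word<TAB>count" line per distinct word.
count_words() {
    tr '[:upper:]' '[:lower:]' |      # fold case so "The" and "the" match
        tr -cs '[:alpha:]' '\n' |     # turn each run of non-letters into a newline
        sed '/^$/d' |                 # drop any empty lines
        sort |                        # group identical words together
        uniq -c |                     # count each group
        awk '{ print $2 "\t" $1 }'    # reorder to word, then count
}

printf 'The wheel turns. The word wheel lists the words.\n' | count_words
```

The real tools also record the counts in SGML so that the result can be normalized and indexed like any other collection data.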
To make word wheels available for your collection, you must both build the word wheel and fill in the appropriate fields in the collection manager. The wwdd field indicates the location of the index and usually contains /idx/c/collid/WW/collid.ww.dd. The wwrealms and wwrealmseng fields identify the fields available (e.g., full text, author, title) and indicate how they should appear in the interface (e.g., as "Full Text", "all the words", or some other variation on "full text").
In the DLXS release, the directory $DLXSROOT/bin/WW contains files to help you build the word wheel. The SGML file that this process creates is stored in the directory /l1/prep/c/collid/WW, which must exist before you begin running the scripts. The normalized SGML that results from running all the word wheel creation steps is stored with the collection SGML in /l1/obj/c/collid, and the indexes and data dictionary are stored in /l1/idx/c/collid/WW/.
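Because the prep directory has to exist before the scripts run, it can be convenient to create it up front. The following is a minimal sketch, not part of the DLXS release: the function names are hypothetical, and it assumes that the "c" path component is the first letter of the collection ID and that the prep root is /l1/prep unless another root is passed in.

```shell
# Hypothetical helpers; not part of the DLXS release.
# Assumption: the "c" path component is the first letter of the collection ID.
ww_prep_dir() {
    # $1 = collection ID, $2 = prep root (assumed default: /l1/prep)
    ww_id=$1
    ww_root=${2:-/l1/prep}
    ww_first=$(printf '%s' "$ww_id" | cut -c1)
    printf '%s/%s/%s/WW\n' "$ww_root" "$ww_first" "$ww_id"
}

ensure_ww_prep_dir() {
    # Create the directory (and any missing parents), then print its path.
    ww_dir=$(ww_prep_dir "$@")
    mkdir -p "$ww_dir" && printf '%s\n' "$ww_dir"
}

# Example, replacing collid with a real collection ID:
#   ensure_ww_prep_dir collid
```

Passing a different root as the second argument makes it possible to try the helper in a scratch directory first.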
% $DLXSROOT/bin/WW/makeWordWheelFiles.pl makeWordWheelFiles.cfg
This will create collid.ww.unnorm.sgm in $DLXSROOT/prep/c/collid/WW.
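After the script finishes, a quick check that the unnormalized file was actually written can save time before the later steps. This is a sketch under stated assumptions: check_ww_output is a hypothetical helper, not a DLXS script, and the directory is passed in explicitly rather than assumed.

```shell
# Hypothetical helper; not part of the DLXS release.
check_ww_output() {
    # $1 = directory expected to hold the output, $2 = collection ID
    ww_file="$1/$2.ww.unnorm.sgm"
    if [ -s "$ww_file" ]; then
        # File exists and is not empty.
        printf 'ok: %s\n' "$ww_file"
    else
        printf 'missing or empty: %s\n' "$ww_file" >&2
        return 1
    fi
}

# Example, replacing collid with a real collection ID:
#   check_ww_output "$DLXSROOT/prep/c/collid/WW" collid
```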
Note 1: The input to makeWordWheelFiles.pl, as specified in the .cfg file, can consist of one or more .sgm files. This covers both collections indexed from a single file and collections indexed through multi-file system indexing (MFS).
Note 2: The configuration (.cfg) file can specify an array of .dd files for collections that have multiple indexes. Currently, multiple .sgm input files and multiple .dd files are mutually exclusive: either a single collection has multiple .dd files, or a collection of multiple .sgm files has a single index.