Copying Page-level Metadata from pageview.dat files to a Database Store

As of Release 8, DLXS Text Class middleware uses standard database mechanisms to store page-level metadata, which improves scalability and management over the legacy pageview.dat file mechanism. DLXS encourages participants to move away from the legacy mechanism and is working toward the release of a utility that can populate the database store directly during digitization, as metadata is generated.

In the meantime, to help transition materials that are using the legacy mechanism, we've supplied a utility to copy metadata from pageview.dat files into the database store (the Pageview table in the DLXS database) used by Text Class.

The utility bin/t/text/importpageviewdata.pl can be run once (for a one-time cutover) or on a regular basis via cron or your preferred scheduling utility (for situations where pageview.dat files are still being maintained and updated).

The syntax is

  importpageviewdata.pl -d [directory] [-f]

The required [directory] argument specifies the directory to crawl for pageview.dat files. To crawl more than one directory, pass them together as a single space-separated argument enclosed in quotes: "directory1 directory2 directory3 ...".
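For sites that launch the utility from a wrapper script, the quoting rule can be illustrated with a minimal Python sketch. The directory paths are hypothetical, and the sketch only builds and prints a command line; it does not run the utility.

```python
import shlex

# Hypothetical directories to crawl; substitute your own paths.
directories = ["/data/coll1", "/data/coll2", "/data/coll3"]

# All directories are joined into ONE argument following -d,
# which is what the surrounding quotes achieve on the command line.
argv = ["importpageviewdata.pl", "-d", " ".join(directories)]

# shlex.join adds the shell quoting needed to keep that argument intact.
command_line = shlex.join(argv)
print(command_line)
```

The printed line quotes the whole directory list as one unit, so the shell hands it to the utility as a single argument.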

To exclude a directory from crawling (for performance or other reasons), create a file named .importpageviewdata.skip directly inside it. There is no limit on the number of directories that may be skipped using this mechanism.
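The skip mechanism can be pictured with a short Python sketch. This is not the utility's own code (importpageviewdata.pl is a Perl script whose internals are not shown here). The function name find_pageview_files is hypothetical, and the sketch assumes that a skip file also excludes everything beneath the directory that contains it, which the description above does not state.

```python
import os

SKIP_MARKER = ".importpageviewdata.skip"  # marker file name given in the text
DATA_FILE = "pageview.dat"

def find_pageview_files(roots):
    """Yield pageview.dat paths under each root, pruning skipped directories.

    Hypothetical helper for illustration only.
    """
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            if SKIP_MARKER in filenames:
                # Assumption: a marker excludes this directory and its subtree.
                dirnames[:] = []
                continue
            if DATA_FILE in filenames:
                yield os.path.join(dirpath, DATA_FILE)
```

Pruning the directory list in place keeps the walk from descending into an excluded subtree at all, which is where the performance benefit of skipping would come from.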

The optional [-f] argument specifies a "full" run. If used, the recorded time of the previous run is ignored, and all pageview.dat files are processed regardless of their age. Without this argument, the utility runs in "maintenance" mode: it copies to the database store only metadata that has changed since the last run, which improves performance and reduces database fragmentation and load.
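The difference between the two modes can be sketched in Python as a simple filter. This is an illustration under stated assumptions, not the utility's implementation: it assumes "changed since the last run" is judged by each file's modification time, and the name select_files and the last_run parameter are hypothetical.

```python
import os

def select_files(paths, last_run, full=False):
    """Return the pageview.dat paths a run would process.

    last_run: time of the previous run in seconds since the epoch,
              or None if no previous run was recorded (assumed convention).
    full:     True mimics the -f flag and ignores last_run.
    """
    if full or last_run is None:
        # Full run: every file is processed regardless of age.
        return list(paths)
    # Maintenance run: keep only files modified after the previous run.
    return [p for p in paths if os.path.getmtime(p) > last_run]
```

In this sketch a first run with no recorded time behaves like a full run; whether the actual utility does the same is not stated here.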

importpageviewdata.pl will automatically use the database, user, and password that you entered when installing Text Class. If you manually change these configurations in Text Class, importpageviewdata.pl will honor the changes and connect using the new parameters.