Course Outline

  1. Introductions and course objectives
  2. DLPS and TextClass Overview
    1. Process Overview Diagram
    2. Environment
    3. Document Classes
    4. Directory Structure
    5. Data Preparation Overview
    6. Text Class Components
      1. Perl scripts
      2. Objects
      3. Modules
      4. Subsystems
    7. Image Class
      1. Images
      2. Metadata
  3. Environment & Architecture Details
  4. Directory Structure Details
  5. Image Class (John Weise, Coordinator of Image Services)
    1. Installation and Configuration
    2. Image Class Access Restrictions
    3. Discussion of Approaches to Batch Image Processing
      1. Image Class CD-ROM Loading and Processing
      2. Image Class Image Loading
      3. Image Class Image Processing
    4. Image Processing Software Links
      1. ImageMagick (multiple platforms, free)
      2. DeBabelizer (Win, Mac, not free)
      3. Graphic Converter (Macintosh only, inexpensive)
      4. LizardTech (MrSID)
  6. Data Preparation (Part 1) (Chris Powell, Coordinator of Encoded Text Services):
    Encoding & Transformation (see Encoding Workshop Information)
    1. Data Sources
      1. SGML encoded text
      2. Page Images
      3. Page Images to SGML encoded text
    2. GUMS/TextClass
    3. Inline markup (text munging vs. new XPat functionality)
    4. Unnumbered, nested, identical elements
    5. SGML Tools
      1. nsgmls, XSLT, perl, sgmlnorm
    6. Transformation
    7. Normalization
    8. TermMapper & Fabricated regions
    9. Levels of Encoding
  7. Installation of DLXS TextClass Middleware & Content
    1. extraction of Middleware and content tar files
    2. edit of configuration files
  8. Makefile
  9. Normalization using sgmlnorm
  10. XPat Search Engine
    1. History of the software
      1. OpenText 5.0, Patricia tree structure
      2. OpenText 6.0, token based
      3. Acquired source code 1999
      4. XPat Enhancements to include: nsgmls, XML, Unicode, DOM?
    2. Indexing
      1. What's needed
        1. Normalized SGML
        2. DTD
        3. doctype declaration (in a file separate from DTD)
        4. Data dictionary file
          1. multi-file indexing
          2. structure
          3. how it is created
          4. extra.srch (discussed later)
      2. SGML text data indexing
      3. Region indexing
    3. Query Language
      1. syntax and usage
      2. programming XPat queries in Perl
      3. named result sets
      4. handled by XPatResultSet object
    4. Fabricated regions
  11. Related / Derivative Data
    1. TextClass
      1. CollDb
      2. Mapper
    2. Pageview Data Preparation
      1. page images as TIFF, delivered as GIF and PDF
      2. pageview.dat files
        1. structure
        2. use by PageView object
    3. WordWheel Data Preparation
      1. History
      2. Overview of wordwheel data creation
      3. directory structure
      4. need for previously indexed collection
      5. work with Content Specialist / encoders on realms
      6. configuration of makewordwheel.cfg
  12. Program Architecture
    1. text-idx
      1. Functional Requirements
        1. Cross collection searching
        2. Cross machine searching
      2. cfg configuration files
      3. Objects used
        1. CGI
        2. TextClass
        3. CollsInfo
        4. XPat
        5. SearchSet and XPatResultSet
        6. RemoteXPatConnect
        7. QueryFactory and TerminologyMapper
        8. DlpsSession / Apache::Session
        9. SearchHistory
        10. Bookbag
        11. ProcIns
      4. text-idx walkthrough
        1. URL parameters
        2. CGI code
      5. URL parameters
    2. pageviewer-idx
      1. TIFF, GIF and PDF
      2. Objects used
        1. CGI, CollsInfo, DlpsSession, etc.
        2. PageView
      3. pageview.dat file
      4. URL parameters
      5. pageviewer-idx.cfg file
      6. pageviewer-idx walkthrough
    3. ww2-idx
      1. WW object, and others
      2. XPat indexed word data
      3. walkthrough
  13. User Interface Issues (Matt Stoeffler, User Interface Specialist)
    1. TBD
  14. Subclassing the TextClass
    1. editing of depths for DIVs
    2. specific searching
    3. specific filtering
  15. In the works
  16. Q & A
  17. The Future
    1. XML
    2. XSLT
    3. Java