Course Outline
- Introductions and course objectives
- DLPS and TextClass Overview
- Process Overview Diagram
- Environment
- Document Classes
- Directory Structure
- Data Preparation Overview
- TextClass Components
- Perl scripts
- Objects
- Modules
- Subsystems
- Image Class
- Images
- Metadata
- Environment & Architecture Details
- Directory Structure Details
- Image Class (John Weise, Coordinator of Image Services)
- Installation and Configuration
- Image Class Access Restrictions
- Discussion of Approaches to Batch Image Processing
- Image Class CD-ROM Loading and Processing
- Image Class Image Loading
- Image Class Image Processing
- Image Processing Software Links
- ImageMagick (multiple platforms, free)
- DeBabelizer (Win, Mac, not free)
- Graphic Converter (Macintosh only, inexpensive)
- LizardTech (MrSID)
- Data Preparation (Part 1) (Chris Powell, Coordinator of Encoded Text Services): Encoding & Transformation (see Encoding Workshop Information)
- Data Sources
- SGML encoded text
- Page Images
- Page Images to SGML encoded text
- GUMS/TextClass
- Inline markup (text munging vs. new XPat functionality)
- Unnumbered, nested, identical elements
- SGML Tools
- nsgmls, XSLT, perl, sgmlnorm
- Transformation
- Normalization
- TermMapper & Fabricated regions
- Levels of Encoding
- Installation of DLXS TextClass Middleware & Content
- extraction of Middleware and content tar files
- editing of configuration files
- Makefile
- Normalization using sgmlnorm
- XPat Search Engine
- History of the software
- OpenText 5.0, Patricia tree structure
- OpenText 6.0, token based
- Acquired source code in 1999
- XPat Enhancements to include: nsgmls, XML, Unicode, DOM?
- Indexing
- What's needed
- Normalized SGML
- DTD
- doctype declaration (in a file separate from DTD)
- Data dictionary file
- multi-file indexing
- structure
- how it is created
- extra.srch (discussed later)
- SGML text data indexing
- Region indexing
- Query Language
- syntax and usage
- programming XPat queries in Perl
- named result sets
- handled by XPatResultSet object
- Fabricated regions
- Related / Derivative Data
- TextClass
- CollDb
- Mapper
- Pageview Data Preparation
- page images as TIFF, delivered as GIF and PDF
- pageview.dat files
- structure
- use by PageView object
- WordWheel Data Preparation
- History
- Overview of wordwheel data creation
- directory structure
- need for previously indexed collection
- work with Content Specialist / encoders on realms
- configuration of makewordwheel.cfg
- Program Architecture
- text-idx
- Functional Requirements
- Cross collection searching
- Cross machine searching
- .cfg configuration files
- Objects used
- CGI
- TextClass
- CollsInfo
- XPat
- SearchSet and XPatResultSet
- RemoteXPatConnect
- QueryFactory and TerminologyMapper
- DlpsSession / Apache::Session
- SearchHistory
- Bookbag
- ProcIns
- text-idx walkthrough
- URL parameters
- CGI code
- URL parameters
- pageviewer-idx
- TIFF, GIF and PDF
- Objects used
- CGI, CollsInfo, DlpsSession, etc.
- PageView
- pageview.dat file
- URL parameters
- pageviewer-idx.cfg file
- pageviewer-idx walkthrough
- ww2-idx
- WW object, and others
- XPat indexed word data
- walkthrough
- User Interface Issues (Matt Stoeffler, User Interface Specialist)
- TBD
- Subclassing the TextClass
- editing of depths for DIVs
- specific searching
- specific filtering
- In the works
- Q & A
- The Future
- XML
- XSLT
- Java
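As a conceptual illustration of the "Region indexing" topic under Indexing, the following minimal Python sketch records the start and end offsets of every element, then asks which regions of a given element type include a search term. It is not XPat's implementation or query syntax. The function names (`build_region_index`, `term_positions`, `regions_including`) are hypothetical, and the sketch assumes normalized, properly nested markup with explicit end tags, which is what the normalization step is meant to produce.

```python
import re
from bisect import bisect_left
from typing import Dict, List, Tuple

# A region is a (start, end) pair of character offsets; end is exclusive.
Region = Tuple[int, int]

# Matches start tags (group 1 empty) and end tags (group 1 == "/").
# Assumption: normalized markup with explicit end tags; no DOCTYPE/comments.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def build_region_index(doc: str) -> Dict[str, List[Region]]:
    """Hypothetical helper: map each element name to its sorted regions."""
    index: Dict[str, List[Region]] = {}
    open_tags: List[Tuple[str, int]] = []  # stack of (name, start offset)
    for m in TAG_RE.finditer(doc):
        name = m.group(2).upper()
        if not m.group(1):  # start tag: remember where the region begins
            open_tags.append((name, m.start()))
            continue
        if not open_tags or open_tags[-1][0] != name:
            raise ValueError(f"unbalanced end tag </{name}> at offset {m.start()}")
        _, start = open_tags.pop()
        index.setdefault(name, []).append((start, m.end()))
    if open_tags:
        raise ValueError(f"unclosed element <{open_tags[-1][0]}>")
    for regions in index.values():
        regions.sort()  # inner elements close first, so re-sort by start
    return index


def term_positions(doc: str, term: str) -> List[int]:
    """Sorted offsets of a whole-word, case-insensitive term in text only."""
    # Blank out tags with spaces so markup never matches, but offsets survive.
    masked = TAG_RE.sub(lambda m: " " * len(m.group(0)), doc)
    pattern = r"\b" + re.escape(term) + r"\b"
    return [m.start() for m in re.finditer(pattern, masked, re.IGNORECASE)]


def regions_including(
    index: Dict[str, List[Region]], name: str, positions: List[int]
) -> List[Region]:
    """Regions of element `name` containing at least one of `positions`."""
    hits: List[Region] = []
    for start, end in index.get(name.upper(), []):
        i = bisect_left(positions, start)  # first term hit at or after start
        if i < len(positions) and positions[i] < end:
            hits.append((start, end))
    return hits


if __name__ == "__main__":
    doc = ("<TEXT><DIV1><HEAD>Intro</HEAD><P>Patricia trees</P></DIV1>"
           "<DIV1><P>Tokens</P></DIV1></TEXT>")
    idx = build_region_index(doc)
    print(regions_including(idx, "div1", term_positions(doc, "tokens")))
```

Run on the sample document, the final query returns only the second DIV1 region. Because the term positions are sorted, each region check is a binary search; a real search engine would build and store such indexes once at indexing time rather than rebuilding them for every query.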