Character Sets/Representation, Bibliographic Class

Last updated	2003-03-04 09:41:50 EST
Doc Title	Character Sets/Representation, Bibliographic Class
Author 1	Hagedorn, Kat
CVS Revision	$Revision: 1.6 $

XPAT (the DLXS search engine) currently supports both 7- and 8-bit character sets and is not tied to any particular character set, standard or otherwise. The following approaches are all equally possible:

An 8-bit ISO Latin 1 approach in the data dictionary (the .dd file), mapping characters with diacritics to their unaccented form (e.g., an "a" with umlaut would be searchable as an "a").
An 8-bit ISO Latin 1 approach in the data dictionary, leaving characters with diacritics searchable in their accented form (e.g., an "a" with umlaut would be searchable only as an "a" with umlaut).
An 8-bit ISO Latin 2 approach in the data dictionary, mapping characters with diacritics to their unaccented form (e.g., an "a" with breve would be searchable as an "a").
... and so forth
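
The difference between the first two approaches can be sketched in Python as two 256-entry byte mapping tables, one that folds accented ISO Latin 1 letters to their base letter and one that leaves every byte alone. This is only an illustration of the idea: it does not use the syntax of the XPAT data dictionary, and the names `fold_latin1_byte`, `FOLDING_TABLE`, `IDENTITY_TABLE`, and `index_form` are hypothetical.

```python
import unicodedata


def fold_latin1_byte(b: int) -> int:
    """Return the byte of the unaccented base letter for an ISO Latin 1 byte.

    Bytes that are not an ASCII letter plus diacritics are returned unchanged.
    """
    ch = bytes([b]).decode("latin-1")
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    # Fold only when the character splits into an ASCII letter followed by
    # one or more combining marks (e.g. "a" + combining diaeresis).
    if marks and base.isascii() and base.isalpha() \
            and all(unicodedata.combining(m) for m in marks):
        return ord(base)
    return b


# Approach 1: accented characters are searchable as their unaccented form.
FOLDING_TABLE = bytes(fold_latin1_byte(b) for b in range(256))

# Approach 2: accented characters are searchable only in their accented form.
IDENTITY_TABLE = bytes(range(256))


def index_form(data: bytes, table: bytes) -> bytes:
    """Apply a 256-entry mapping table to 8-bit data."""
    return data.translate(table)
```

With the folding table, the ISO Latin 1 bytes for "Bär" are indexed as "Bar"; with the identity table they are indexed unchanged, so only a search containing the "a" with umlaut would match.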

The most common approach is to use 8-bit ISO Latin 1 values for the bulk of the characters and character entity references (e.g., &amacr;) for most other cases. A sample data dictionary using 8-bit ISO Latin 1 values is linked here as an example of this approach.
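
As a rough sketch of this approach, the following Python function writes characters that exist in ISO Latin 1 as single 8-bit values and replaces anything else with a character entity reference. The function name and the one-entry `ENTITY_NAMES` table are assumptions made for illustration; a real conversion would need a complete entity table for the data at hand, and this sketch does not escape ampersands or other markup already present in the input.

```python
# Illustrative subset only: maps characters outside ISO Latin 1 to entity
# names. The single entry is the example used in the text above.
ENTITY_NAMES = {
    "\u0101": "amacr",  # "a" with macron
}


def to_latin1_with_entities(text: str) -> bytes:
    """Encode text as 8-bit ISO Latin 1, using entity references elsewhere."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code <= 0xFF:
            # Code points 0x00-0xFF coincide with ISO Latin 1 byte values.
            out.append(code)
        elif ch in ENTITY_NAMES:
            out += f"&{ENTITY_NAMES[ch]};".encode("ascii")
        else:
            # Fail loudly rather than guess at an entity name.
            raise ValueError(f"no entity name known for U+{code:04X}")
    return bytes(out)
```

Raising an error for unmapped characters is a deliberate choice in this sketch, so that gaps in the entity table surface during conversion rather than at search time.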

We do not yet filter characters represented as character entity references into displayable forms, but will probably use transparent GIFs as an interim strategy (as we do elsewhere in DLXS classes).

We currently offer no documentation here for converting characters to ISO Latin 1 values. In the interim, see the Image Class Character Set Conversion documentation.