XML & XSL in DLXS

 


General principles and rules of thumb

In re-architecting the DLXS system to use XML and XSL, we established certain principles. In most cases these guide how the code, the XML, and the XSL are written; we have broken the rules only where there was good reason to do so. Here is a list of some of the principles and their respective rationales:

Principle: Processing instructions (PIs) are wrapped in the XML files with an XML element.
Rationale: The XSL templates, at least at the highest levels of the XML tree, can expect tags that are explicit in the XML file.

Principle: The XSL files to be used are listed in the XML file rather than named with the more conventional <?xml-stylesheet ?> PI.
Rationale: Building a virtual stylesheet from the listed XSL files allows each of those files to be located through fallback.

Principle: All URLs to be used by the CGI are built by the CGI.
Rationale: The XSL stylesheets should not have to "know" anything about which URL parameters the CGI needs in order to work.

Principle: A cookie, rather than a URL parameter, holds the session ID.
Rationale: This allows for dynamic browsing (.tpl files no longer need to be built for browsing); allows a single session even if a user switches from one class to another; lets the cookie be deleted when the browser is quit; etc.

Principle: Filtering of XML data is done by the XSL stylesheets.
Rationale: Separation of content and display, and of Perl code and user interface.

Most of the principles are really about division of labor between the different subsystems of DLXS.

There are rare exceptions to some of these principles. For example, data that comes from XPat is not always well-formed and needs to be "massaged" into well-formedness. Here are several instances where this occurs:

Some notes about internal changes

Quite a lot of changes in the middleware and supporting systems were necessary in the move to XML. Some examples:

Working with XML and XSL in DLXS

Working with an XML file whose dynamic content has not yet been filled in, or with a virtual XSL stylesheet, can be difficult. That is where two new debug values come in handy. To see the XML file, with all its dynamic content filled in, exactly as the CGI will send it to the XSLT transformation engine, add debug=xml to the URL. The CGI will fill in all the PIs and then send the untransformed block of XML to the browser. If you use a browser that can display untransformed XML, you will see both the content and the form of the XML data.
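As a minimal sketch of adding a debug value to an existing URL, the Python helper below sets the debug parameter while keeping the other parameters intact. The function name with_debug and the example.org address are hypothetical, and the sketch assumes the URL separates its parameters with "&"; adjust it if your URLs use a different separator.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url, value):
    """Return url with its debug parameter set to value (e.g. "xml")."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any earlier debug setting.
    params = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debug"
    ]
    params.append(("debug", value))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical address; substitute the URL of your own DLXS page.
print(with_debug("https://example.org/cgi/search?c=demo&page=simple", "xml"))
```

Passing the printed URL to a browser that can display raw XML is then enough to inspect the filled-in data.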

Since the virtual stylesheet is created only at run time by importing a number of XSLT files, each of which is likely to have been found via fallback, the best way to see the full paths of the files that are being imported and used is to add debug=xslt to the URL. The contents of the virtual stylesheet will be delivered to the browser as XML. If need be, you can copy the source and paste it into an XSLT processor or debugger.
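To make the idea of a virtual stylesheet concrete, here is a rough Python sketch of the general technique: each listed XSL file is looked up in an ordered list of directories, and the first match is imported by its full path. This is not the DLXS middleware. The function names, the directory names, and the search order (collection directory first, then a shared directory) are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import quoteattr


def resolve_with_fallback(name, search_dirs):
    """Return the first existing file called name, searching dirs in order."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def build_virtual_stylesheet(names, search_dirs):
    """Build a stylesheet that imports each listed file by its resolved path."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<xsl:stylesheet version="1.0"'
        ' xmlns:xsl="http://www.w3.org/1999/XSL/Transform">',
    ]
    for name in names:
        path = resolve_with_fallback(name, search_dirs)
        if path is None:
            raise FileNotFoundError(name)
        # quoteattr adds the surrounding quotes and escapes the path safely.
        lines.append(f"  <xsl:import href={quoteattr(path.as_posix())}/>")
    lines.append("</xsl:stylesheet>")
    return "\n".join(lines)


# Demonstration with throwaway directories (hypothetical layout and names).
with tempfile.TemporaryDirectory() as root:
    collection_dir = Path(root, "collection")
    shared_dir = Path(root, "shared")
    collection_dir.mkdir()
    shared_dir.mkdir()
    (shared_dir / "header.xsl").write_text("<!-- shared header -->")
    (shared_dir / "results.xsl").write_text("<!-- shared results -->")
    (collection_dir / "results.xsl").write_text("<!-- collection override -->")

    print(build_virtual_stylesheet(
        ["header.xsl", "results.xsl"], [collection_dir, shared_dir]
    ))
```

In the demonstration, results.xsl resolves to the collection copy and header.xsl falls back to the shared copy, which is the kind of path information that debug=xslt lets you read off the real virtual stylesheet.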

Here at DLPS, we use a variety of tools for editing and debugging XML and XSL, everything from Oxygen to Dreamweaver to xemacs to Saxon to XMLSpy. The debug switches allow us to get at what's inside the CGI code as it runs.

Data migration issues

Data conversion

See Data Conversion to XML / Unicode.

Writing your own XSL to customize your collections' look and feel

There is no easy way to convert the HTML templates from previous releases of DLXS into XSL. Your best course of action will likely be to start with the XSL files that are delivered with Release 12. Then, for any collection that needs a specific look, behavior, or filtering into HTML, copy only the XSL files or templates that you need to change into the collection subdirectory and modify them there.

collmgr entries

The dlxs database has been changed to store and retrieve UTF-8 encoded strings in its tables. Therefore, when entering data into collmgr fields, be sure to enter proper UTF-8. This can include the actual character, if you have the ability to enter things directly through an alternate keyboard mapping; a hexadecimal entity; or a decimal entity (though hex may be the best choice).
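The three notations mentioned above all name the same character. The short Python sketch below illustrates that equivalence and shows one way to produce the hexadecimal form for non-ASCII characters. The helper name to_hex_references is hypothetical, and the sketch only demonstrates the notations themselves; how collmgr interprets each form is not shown here.

```python
import html


def to_hex_references(text):
    """Replace each non-ASCII character with a hexadecimal reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text
    )


literal = "é"                      # the character typed directly
hex_form = html.unescape("&#xE9;")  # hexadecimal form
dec_form = html.unescape("&#233;")  # decimal form

# All three spellings denote the same code point, U+00E9.
assert literal == hex_form == dec_form

print(to_hex_references("café"))   # caf&#xE9;
print(literal.encode("utf-8"))     # b'\xc3\xa9'
```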

Implications for users

You may receive a variety of questions from users, some of which may be difficult to track down. We are still learning ourselves about the vagaries of how different platforms, browsers, etc. handle UTF-8. For example, if a user copies Latin-1 data from a web page or application and pastes it into a UTF-8 web form on a DLXS search page, will their applications and operating system properly transcode the pasted data into UTF-8? Will the data that is received by the CGI be UTF-8, or will it be improperly encoded? Will older browsers or operating systems have the fonts available to display the languages you can now index and include in your collections? We've also learned that different browsers handle character interpretation issues differently, with everything from the classic Mac empty box to a triangle with a question mark to offering up what "seems" to be the right choice (inevitably Chinese). The move to cookies for tracking sessions also means that the problems with users who refuse all cookies are now an issue for DLXS, as they have been for other products in the library world for some time.
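When tracking down a report of garbled input, it can help to check whether the bytes that arrived are valid UTF-8 at all. The Python sketch below is one hedged way to do that; it is not how the DLXS CGI handles input, and the function name decode_form_value and the Latin-1 fallback are assumptions chosen for illustration. Note that the check is only a heuristic, since some Latin-1 byte sequences also happen to be valid UTF-8.

```python
def decode_form_value(raw):
    """Decode raw bytes as UTF-8; fall back to Latin-1 and flag the failure.

    Returns a (text, was_valid_utf8) pair.
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1"), False


# The same word as it would arrive in each encoding.
as_utf8 = "café".encode("utf-8")      # b'caf\xc3\xa9'
as_latin1 = "café".encode("latin-1")  # b'caf\xe9'

print(decode_form_value(as_utf8))    # ('café', True)
print(decode_form_value(as_latin1))  # ('café', False)
```

A value flagged False was not sent as UTF-8, which points toward the browser, the paste source, or the operating system rather than the index.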