Findaid Class Collection Implementation

DLXS Workshop

Findaid Class Instructors: Tom Burton-West, Chris Powell

This portion of the DLXS Workshop focuses on implementing a collection in the Findaid Class. Since EAD encoding practices vary widely, we will highlight issues arising from different encoding practices and how to resolve them.


Morning

Overview and Introduction
Data Preparation
- Set up directories and files
- Validate, Concatenate, Normalize
Findaid Class Index Building with XPAT

Afternoon

Findaid Class Collection to Web
Linking from Finding Aids
Customizing and Troubleshooting Findaid Class

Overview


Overview of Preprocessing, Data Preparation and Indexing steps:

Data Preparation

validate the files individually against the EAD 2002 DTD
make validateeach
concatenate the files into one larger XML file
make prepdocs
validate the concatenated file against the dlxsead2002 DTD:
make validate
"normalize" the concatenated file.
make norm
validate the normalized concatenated file against the dlxsead2002 DTD
make validate2

The end result of these steps is a file containing the concatenated EADs wrapped in a <COLL> element, which validates against the dlxsead2002 DTD and is ready for indexing:

<COLL>
    <ead><eadheader><eadid>1</eadid>...</eadheader>... content</ead>
    <ead><eadheader><eadid>2</eadid>...</eadheader>... content</ead>
    <ead><eadheader><eadid>3</eadid>...</eadheader>... content</ead>
</COLL>

Indexing

make singledd indexes words for EADs that have been concatenated into one large file for a collection.
make xml indexes the XML structure by reading the DTD. Validates as it indexes.
make post builds and indexes fabricated regions based on the XPAT queries stored in the workshopfa.extra.srch file.

Findaid Class Encoding Practices and Processes


In Findaid Class Encoding Practices and Processes we discuss the elements and attributes required for "out of the box" Findaid Class delivery, various encoding issues, and preparing the work environment and validating the data.

EAD 2002 DTD Overview

These instructions assume that you have already encoded your finding aids in the XML-based EAD 2002 DTD. If you have finding aids encoded using the older EAD 1.0 standard or are using the SGML version of EAD 2002, you will need to convert your files to the XML version of EAD 2002. If you use a conversion program such as the one supplied by the Library of Congress, make sure you read the documentation and change the settings according to your local practices before converting a large number of EADs. For example, if you use the LC converter, you will probably want to change the XSL that inserts the string "hdl:loc" in the eadid so that the output follows your local practices. When converting from SGML to XML, a number of character set issues may arise. These are discussed in Data Conversion: Unicode, XML, and Normalization.

Resources for converting from EAD 1.0 to EAD2002 and/or from SGML EAD to XML EAD and good sources of information about EAD encoding practices and practical issues involved with EADs are described in the documentation wiki: EAD 2002 DTD Overview

The EAD standard was designed as a “loose” standard in order to accommodate the large variety in local practices for paper finding aids and make it easy for archives to convert from paper to electronic form. As a result, conformance with the EAD standard still allows a great deal of variety in encoding practices.

The DLXS software is primarily designed as a system for mounting University of Michigan collections. In the case of finding aids, the software has been designed to accommodate the encoding practices of the Bentley Historical Library. The more similar your data and setup are to the Bentley’s, the easier it will be to integrate your finding aids collection with DLXS. If your practices differ significantly from the Bentley’s, you will probably need to do some preprocessing of your files and/or modify various files in DLXS. We have found that differences in encoding practices are the largest source of issues when member institutions implement Findaid Class. We will cover various issues that commonly arise.

Links to more information on the Bentley's encoding practices and workflow are available at Practical EAD Encoding Issues. You may also want to look at Examples of Findaid Class Implementations and Practices.

Some of the types of changes you may need to make to DLXS to accommodate differences are listed at Types of changes to accomodate differing encoding practices, and will be discussed later in the section on Customizing Findaid Class.

Practical EAD Encoding Issues

There are a number of encoding issues that may affect the data preparation, indexing, searching, and rendering of your finding aids. A discussion of many of these can be found at Specific Encoding Issues. We will discuss many of these in the section on Customization. A few of the more important ones are:

Character Encoding issues
Attribute ids must be unique within the entire collection
<eadid> should be less than about 20 characters in length
If your DOCTYPE declaration contains entities, you need to modify the appropriate *.dcl files accordingly
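Before building anything, it can save time to check the id and eadid points across all your files at once. The following is a minimal Python sketch, not a DLXS tool; the function name check_eads and the max_eadid_len parameter are our own, and it uses simple regular expressions (double-quoted id attributes only) rather than a real XML parse.

```python
import re
from collections import defaultdict

# Hypothetical pre-flight check (not part of DLXS). Regular expressions are
# used for brevity; they only see double-quoted id attributes.
ID_ATTR = re.compile(r'\sid\s*=\s*"([^"]*)"')
EADID = re.compile(r"<eadid\b[^>]*>(.*?)</eadid>", re.DOTALL)


def check_eads(docs, max_eadid_len=20):
    """Return a list of problems; docs maps filename -> EAD XML text."""
    problems = []
    seen = defaultdict(list)  # id value -> every file it occurs in
    for name, text in docs.items():
        for value in ID_ATTR.findall(text):
            seen[value].append(name)
        m = EADID.search(text)
        if m and len(m.group(1).strip()) > max_eadid_len:
            problems.append(f"{name}: eadid longer than {max_eadid_len} characters")
    # ids must be unique across the whole collection, not just within a file
    for value, files in sorted(seen.items()):
        if len(files) > 1:
            problems.append(f"id {value!r} used {len(files)} times: {', '.join(files)}")
    return problems
```

You could build docs from a data directory with a dictionary comprehension over Path(data_dir).glob("*.xml"). Keep in mind that preparedocs.pl already rewrites id attributes to make them unique (see Preprocessing steps below), so duplicates reported here matter most if you change or bypass that step.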

Data Preparation

For today, we are going to be working with some EADs that are already in Findaid Class. We will be building them into a collection we are going to call workshopfa. We will be doing a number of steps by hand, one at a time. There are some scripts that automate much of this work, but we have found that when people have done each of the steps by hand, they are better able to understand and troubleshoot any problems running the automated scripts.


This documentation will make use of the concept of the $DLXSROOT, which is the place at which your DLXS directory structure starts. We generally use /l1/, but for the workshop, we each have our own $DLXSROOT in the form of /l1/workshop/userX/dlxs/. To check your $DLXSROOT, type the following commands at the command prompt:

cd $DLXSROOT
pwd

The prep directory under $DLXSROOT is the space for you to take your encoded finding aids and "package them up" for use with the DLXS middleware. Create your basic directory $DLXSROOT/prep/w/workshopfa and its data subdirectory with the following command:

mkdir -p $DLXSROOT/prep/w/workshopfa/data

Move into the prep directory with the following command:

cd $DLXSROOT/prep/w/workshopfa

This will be your staging area for all the things you will be doing to your texts, and ultimately to your collection. At present, all it contains is the data subdirectory you created a moment ago. We will be populating it further over the course of the next two days. Unlike the contents of other collection-specific directories, everything in prep should be ultimately expendable in the production environment.

Copy the necessary files into your data directory with the following commands:

cp $DLXSROOT/prep/s/samplefa/data/*.xml $DLXSROOT/prep/w/workshopfa/data/.

We'll also need a few files to get us started working. They will need to be copied over as well, and also have paths adapted and collection identifiers changed. Follow these commands:

cp $DLXSROOT/prep/s/samplefa/samplefa.ead2002.dcl $DLXSROOT/prep/w/workshopfa/workshopfa.ead2002.dcl
cp $DLXSROOT/prep/s/samplefa/samplefa.concat.ead.dcl $DLXSROOT/prep/w/workshopfa/workshopfa.concat.ead.dcl
mkdir -p $DLXSROOT/obj/w/workshopfa
mkdir -p $DLXSROOT/bin/w/workshopfa
cp $DLXSROOT/bin/s/samplefa/preparedocs.pl $DLXSROOT/bin/w/workshopfa/preparedocs.pl
cp $DLXSROOT/bin/s/samplefa/Makefile $DLXSROOT/bin/w/workshopfa/Makefile

Now you'll need to edit the copy of the Makefile to ensure that the path matches your $DLXSROOT and that the collection name is workshopfa instead of samplefa.

You will want to change lines 1-3 to point to your $DLXSROOT and replace s/samplefa with w/workshopfa. Change:

   1  DLXSROOT=/l1
   2  NAMEPREFIX = samplefa
   3  FIRSTLETTERSUBDIR = s

To:

   1  DLXSROOT=/your/dlxsroot/here
   2  NAMEPREFIX = workshopfa
   3  FIRSTLETTERSUBDIR = w

cd $DLXSROOT/bin/w/workshopfa
vi Makefile
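If you prefer not to edit by hand, the same three-line change can be scripted. This Python sketch is a hypothetical helper (set_make_vars is our name, not a DLXS script); it rewrites only simple NAME = value assignments and writes each value without trailing spaces, because make keeps trailing whitespace as part of a value.

```python
import re

ASSIGN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*=")


def set_make_vars(makefile_text, values):
    """Rewrite 'NAME = value' lines for the names in values; keep all else."""
    out = []
    for line in makefile_text.splitlines(keepends=True):
        m = ASSIGN.match(line)  # recipe lines start with a tab, so never match
        if m and m.group(1) in values:
            ending = "\n" if line.endswith("\n") else ""
            # No trailing whitespace: make would keep it as part of the value.
            line = f"{m.group(1)} = {values[m.group(1)]}{ending}"
        out.append(line)
    return "".join(out)
```

To use it, read the Makefile's text, pass it with {"DLXSROOT": "/your/dlxsroot/here", "NAMEPREFIX": "workshopfa", "FIRSTLETTERSUBDIR": "w"}, and write the result back. Run the grep check below afterwards either way.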

STOP!! Make sure you edit the Makefile before going to the next steps!!

You can run this command to check to see if you forgot to change samplefa to workshopfa:

grep "samplefa" $DLXSROOT/bin/w/workshopfa/* $DLXSROOT/prep/w/workshopfa/* |grep -v "#"

With the ready-to-go ead2002 encoded finding aids files in the data directory, we are ready to begin the preparation process. This will include:

validating the files individually against the EAD 2002 DTD
concatenating the files into one larger XML file
validating the concatenated file against the dlxsead2002 DTD
"normalizing" the concatenated file.
validating the normalized concatenated file against the dlxsead2002 DTD

These steps are generally handled via the Makefile in $DLXSROOT/bin/s/samplefa, which we have copied to $DLXSROOT/bin/w/workshopfa.

Tip: Be sure not to add any space after workshopfa or w. make ignores whitespace immediately before and after the equals sign, but treats any trailing whitespace as part of the value.

Further note on editing the Makefile: If you modify or write your own Make targets, you need to make sure that a real "tab" starts each command line rather than spaces. The easiest way to check for these kinds of errors is to use "cat -vet Makefile" to show all spaces, tabs and newlines.
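cat -vet is the quickest visual check. If you want the two mistakes above flagged automatically, here is a rough Python sketch (lint_makefile is a hypothetical name). It uses simple heuristics: a line inside a rule that starts with spaces is reported unless the previous line ended with a backslash, and any variable assignment with trailing whitespace is reported. It does not understand every construct GNU make accepts.

```python
import re

RULE = re.compile(r"^[^\s#][^=]*:(?!=)")  # e.g. "validate:"; not "X := y"
ASSIGN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*[:?+]?=")


def lint_makefile(text):
    """Heuristic checks for spaces-instead-of-tabs and trailing spaces."""
    warnings = []
    in_rule = False    # inside the recipe lines of a target
    continued = False  # previous line ended with a backslash
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            in_rule = False
        elif RULE.match(line) and not ASSIGN.match(line):
            in_rule = True
        elif in_rule and line.startswith(" ") and not continued:
            warnings.append(f"line {n}: recipe line starts with spaces, not a tab")
        if ASSIGN.match(line) and line != line.rstrip():
            warnings.append(f"line {n}: trailing whitespace in variable value")
        continued = line.endswith("\\")
    return warnings
```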

If you are doing this at your home institution instead of at the workshop, please refer to the more detailed instructions on the wiki: Step by Step Instructions for setting up Directories for Data Preparation

Step 1: Validating the files individually against the EAD 2002 DTD

cd $DLXSROOT/bin/w/workshopfa
make validateeach

The Makefile runs the following command:

% $DLXSROOT/bin/f/findaid/validateeach.sh

What's happening: The Makefile runs the Bourne shell script validateeach.sh in the $DLXSROOT/bin/f/findaid directory. The script processes each *.xml file in the data subdirectory. For each file, it creates a temporary copy without the public DOCTYPE declaration and then runs onsgmls on it to make sure it conforms to the EAD 2002 DTD. If validation errors occur, error files will be written to the data subdirectory with the same name as the finding aid file but with an extension of .err. If there are validation errors, fix the problems in the source XML files and re-run.

Check the error files by running the following commands

 ls -l $DLXSROOT/prep/w/workshopfa/data/*err

If there are any *err files, you can look at them with the following command:

 less  $DLXSROOT/prep/w/workshopfa/data/*err

There are not likely to be any errors with the workshopfa data, but tell the instructor if there are.
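With many finding aids, paging through every *.err file gets tedious. A small Python sketch like the following (summarize_errors is a hypothetical helper, and three lines per file is an arbitrary choice) lists only the non-empty error files and the first few onsgmls messages from each.

```python
from pathlib import Path


def summarize_errors(data_dir, max_lines=3):
    """Map each non-empty *.err file in data_dir to its first few lines."""
    summary = {}
    for err in sorted(Path(data_dir).glob("*.err")):
        lines = err.read_text(encoding="utf-8", errors="replace").splitlines()
        if lines:  # skip empty error files
            summary[err.name] = lines[:max_lines]
    return summary
```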

Step 2: Concatenating the files into one larger XML file (and running some preprocessing commands)

cd $DLXSROOT/bin/w/workshopfa
make prepdocs

The Makefile runs the following command:

$DLXSROOT/bin/w/workshopfa/preparedocs.pl \
  -d $DLXSROOT/prep/w/workshopfa/data \
  -o $DLXSROOT/obj/w/workshopfa/workshopfa.xml \
  -l $DLXSROOT/prep/w/workshopfa/logfile.txt

This runs the preparedocs.pl script on all the *.xml files in the specified data directory and writes the output to the workshopfa.xml file in the appropriate obj subdirectory. It also writes a logfile to the prep directory.

The Perl script does two sets of things:

Concatenates all the files
Runs a number of preprocessing steps on all the files

Concatenating the files

The script finds all XML files in the data subdirectory and strips off any XML declaration and DOCTYPE declaration from each file before concatenating them. It also wraps the concatenated EADs in a <COLL> element. The end result looks like:

<COLL>
    <ead><eadheader><eadid>1</eadid>...</eadheader>... content</ead>
    <ead><eadheader><eadid>2</eadid>...</eadheader>... content</ead>
    <ead><eadheader><eadid>3</eadid>...</eadheader>... content</ead>
</COLL>

WARNING! If there are extra characters, or some other problem with the part of the program that strips out the XML declaration and the DOCTYPE declaration, the file will end up like:

<COLL>
    baddata<ead><eadheader><eadid>1</eadid>...</eadheader>... content</ead>
    baddata<ead><eadheader><eadid>2</eadid>...</eadheader>... content</ead>
    baddata<ead><eadheader><eadid>3</eadid>...</eadheader>... content</ead>
</COLL>

This will make the document invalid, since the dlxsead2002 DTD does not allow anything between the closing </ead> tag of one finding aid and the opening <ead> tag of the next.

Some of the possible causes of such a problem are:

UTF-8 Byte Order Marks at the beginning of the file
DOCTYPE declaration on more than one line
XML processing instructions
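To see why these three cases matter, here is a minimal Python sketch of the concatenation step. It is not preparedocs.pl (which is Perl and also does the preprocessing described next); it assumes each file has already been read as text and uses regular expressions rather than an XML parser. It strips a leading byte order mark, removes everything of the form <?...?> (which covers both the XML declaration and processing instructions such as <?Pub Caret1?>), removes a DOCTYPE declaration even when it spans several lines or has an internal subset, and wraps the results in <COLL>.

```python
import re

# Anything of the form <?...?>: the XML declaration and processing
# instructions such as <?Pub Caret1?>.
PI_OR_DECL = re.compile(r"<\?.*?\?>", re.DOTALL)
# A DOCTYPE at the start of the text, possibly spanning several lines and
# possibly with an internal subset [...] (for example, entity declarations).
DOCTYPE = re.compile(r"^\s*<!DOCTYPE\b[^>\[]*(\[.*?\])?\s*>", re.DOTALL)


def clean_ead(text):
    """Remove the BOM, XML declaration, processing instructions and DOCTYPE."""
    text = text.lstrip("\ufeff")  # byte order mark, once decoded from UTF-8
    text = PI_OR_DECL.sub("", text)
    text = DOCTYPE.sub("", text, count=1)
    return text.strip()


def concatenate(texts):
    """Wrap the cleaned EADs in a single <COLL> element."""
    return "<COLL>\n" + "\n".join(clean_ead(t) for t in texts) + "\n</COLL>\n"
```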

Preprocessing steps

The perl program also does some preprocessing on all the files. These steps are customized to the needs of the Bentley. You should look at the perl code and modify it so it is appropriate for your encoding practices.

The preprocessing steps are:

finds all id attributes and prepends a number to them
adds a prefix string "dao-bhl" to all DAO links (You probably will want to change this)
removes empty persname, corpname, and famname elements

The output of the combined concatenation and preprocessing steps is a single XML file named for the collection (here workshopfa.xml), which is deposited into the obj subdirectory.
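The empty-element step is easy to picture with a small example. The sketch below is a Python illustration, not the code in preparedocs.pl. It removes <persname>, <corpname>, and <famname> elements that are self-closing or contain only whitespace, and leaves elements with content alone. The regular expression is our own and only handles simple, non-nested cases.

```python
import re

# Hypothetical illustration of one preparedocs.pl preprocessing step.
# Matches <persname/>, <corpname ...></corpname>, <famname> </famname>, etc.
EMPTY_NAME = re.compile(
    r"<(persname|corpname|famname)\b[^>]*?(?:/>|>\s*</\1\s*>)"
)


def remove_empty_names(text):
    """Delete name elements that are self-closing or hold only whitespace."""
    return EMPTY_NAME.sub("", text)
```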

If your collections need to be transformed in any way, or if you do not want the transformations to take place (the DAO changes, for example), edit preparedocs.pl file to effect the changes. Some changes you may want to make include:

Changing the algorithm used to make id attributes unique. For example, if your encoding practices use id attributes and targets, the out-of-the-box algorithm will break the relationship between the attributes and targets. One possible modification is to change the algorithm to prepend the eadid or filename to all id and target attributes. (There is sample code in preparedocs.pl.)
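Here is a rough Python sketch of that eadid-based approach. It is not the sample code in preparedocs.pl; prefix_ids, the hyphen separator, and the handling of awkward eadid values are our own assumptions. It rewrites id and target attributes in one finding aid with the same prefix, so a <ref target="..."> still points at its <c01 id="...">. Because an XML ID value cannot begin with a digit, the sketch adds a leading "x" when the eadid does not start with a letter.

```python
import re

EADID = re.compile(r"<eadid\b[^>]*>\s*([^<]*?)\s*</eadid>")
ID_OR_TARGET = re.compile(r'(\s(?:id|target)\s*=\s*")([^"]*)(")')


def prefix_ids(ead_text):
    """Prefix every id and target value in one EAD with its (sanitized) eadid."""
    m = EADID.search(ead_text)
    if m is None:
        raise ValueError("no <eadid> found")
    prefix = re.sub(r"[^A-Za-z0-9._-]", "_", m.group(1))
    if not prefix[:1].isalpha():
        prefix = "x" + prefix  # XML ID values cannot start with a digit, '-' or '.'
    return ID_OR_TARGET.sub(
        lambda a: f"{a.group(1)}{prefix}-{a.group(2)}{a.group(3)}", ead_text
    )
```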

Concatenating files in a different order or only concatenating a subset of files

If you want to concatenate the files in a different order or only concatenate a subset of files, you can make a list of the files you wish to concatenate and put the list in a file in $DLXSROOT/prep/w/workshopfa called list_of_eads. You can then run the "make prepdocslist" command, which runs preparedocs.pl with the -i inputfilelist flag instead of the -d dir flag. This tells the program to read a list of files instead of processing all the XML files in the specified directory. The default sort order for search results is occurrence order, which is the order in which the EADs are concatenated. If you write a script that looks in the EADs for some element you want to sort by and then outputs a list of filenames sorted in that order, you can pass that file to preparedocs.pl so that it concatenates the files in the order listed.
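A sketch of such a script in Python might look like this. The element used as the sort key (<titleproper> by default), the function names, and the assumption that list_of_eads holds one path per line are all ours; check preparedocs.pl --man for the format the -i flag actually expects.

```python
import re
from pathlib import Path


def sort_key(text, element="titleproper"):
    """Plain text of the first <element>, lower-cased; '' if it is missing."""
    m = re.search(rf"<{element}\b[^>]*>(.*?)</{element}>", text, re.DOTALL)
    if m is None:
        return ""
    plain = re.sub(r"<[^>]+>", "", m.group(1))  # drop nested tags like <emph>
    return " ".join(plain.split()).casefold()


def write_list_of_eads(data_dir, list_file, element="titleproper"):
    """Write the data directory's *.xml paths, sorted by element, one per line."""
    files = sorted(
        Path(data_dir).glob("*.xml"),
        key=lambda p: (sort_key(p.read_text(encoding="utf-8"), element), p.name),
    )
    Path(list_file).write_text("".join(f"{p}\n" for p in files), encoding="utf-8")
    return [p.name for p in files]
```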

For more information on options to the preparedocs.pl, run the command:

$DLXSROOT/bin/s/samplefa/preparedocs.pl --man

Step 3: Validating the concatenated file against the dlxsead2002 DTD

make validate

The Makefile runs the following command:

onsgmls -wxml -s -f $DLXSROOT/prep/w/workshopfa/workshopfa.errors \
  $DLXSROOT/misc/sgml/xml.dcl \
  $DLXSROOT/prep/w/workshopfa/workshopfa.concat.ead.dcl \
  $DLXSROOT/obj/w/workshopfa/workshopfa.xml

This runs the onsgmls command against the concatenated file using the dlxsead2002 DTD, and writes any errors to the workshopfa.errors file in the collection's prep directory ($DLXSROOT/prep/w/workshopfa).

Note that we are running this using workshopfa.concat.ead.dcl, not workshopfa.ead2002.dcl. The workshopfa.concat.ead.dcl file points to $DLXSROOT/misc/sgml/dlxsead2002.ead, which is the dlxsead2002 DTD. The dlxsead2002 DTD is exactly the same as the EAD 2002 DTD, but adds a wrapping element, <COLL>, so that more than one ead element (that is, more than one finding aid) can be combined into one file. The larger file will be indexed with XPAT tomorrow. It is, of course, a good idea to validate the file now before going further.

Check for errors by looking for the file $DLXSROOT/prep/w/workshopfa/workshopfa.errors. If there are errors, this file will be present and will contain messages about what caused the file to be considered invalid.

If you see errors at this point (assuming there were no errors during the validateeach step), there was a problem with the preparedocs.pl processing.

Run the following command

 ls -l $DLXSROOT/prep/w/workshopfa/workshopfa.errors

If there is a workshopfa.errors file then run the following command to look at the errors reported

 less $DLXSROOT/prep/w/workshopfa/workshopfa.errors

If you see this warning in the errors file:

  onsgmls:/l1/dev/tburtonw/misc/sgml/xml.dcl:1:W: SGML declaration was not  implied

You can ignore it, but if you see any other errors STOP! You need to determine the cause of the problem, fix it, and rerun the steps until there are no errors from make validate. If you continue with the next steps in the process with an invalid xml document, the errors will compound and it will be very difficult to trace the cause of the problem.
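If you check the errors file often, a small filter saves you from rereading the harmless warning each time. This hypothetical Python helper returns an empty list when the errors file is absent (meaning validation passed) and otherwise returns every non-blank line except the SGML declaration warning shown above. It compares with whitespace collapsed because the warning text contains a double space.

```python
from pathlib import Path

IGNORABLE = "SGML declaration was not implied"


def real_errors(error_file):
    """Lines of an onsgmls errors file other than the ignorable warning."""
    path = Path(error_file)
    if not path.exists():
        return []  # no errors file means the document validated
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    return [
        line for line in lines
        if line.strip() and IGNORABLE not in " ".join(line.split())
    ]
```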

Step 4: Normalizing the concatenated file

make norm

The Makefile runs a series of copy statements and two main commands:

 1.)   /l/local/bin/osgmlnorm -f $DLXSROOT/prep/w/workshopfa/workshopfa.osgmlnorm.errors \
       $DLXSROOT/misc/sgml/xml.dcl \
       $DLXSROOT/prep/w/workshopfa/workshopfa.concat.ead.dcl \
       $DLXSROOT/obj/w/workshopfa/workshopfa.xml.prenorm > $DLXSROOT/obj/w/workshopfa/workshopfa.xml.postnorm

 2.)  /l/local/bin/osx -E0 -bUTF-8 -xlower -xempty -xno-nl-in-tag \
      -f $DLXSROOT/prep/w/workshopfa/workshopfa.osx.errors \
      $DLXSROOT/misc/sgml/xml.dcl \
      $DLXSROOT/prep/w/workshopfa/workshopfa.concat.ead.dcl \
      $DLXSROOT/obj/w/workshopfa/workshopfa.xml.postnorm > $DLXSROOT/obj/w/workshopfa/workshopfa.xml.postnorm.osx

These commands ensure that your collection data is normalized. What this means is that any attributes are put in the order in which they were defined in the DTD. Even though your collection data is XML and attribute order should be irrelevant (according to the XML specification), due to a bug in one of the supporting libraries used by xmlrgn (part of the indexing software), attributes must appear in the order that they are defined in the DTD. If you have "out-of-order" attributes and don't run make norm, you will get "invalid endpoints" errors during the make post step.

Step one, which normalizes the document, writes its errors to $DLXSROOT/prep/w/workshopfa/workshopfa.osgmlnorm.errors. Be sure to check this file.

 less $DLXSROOT/prep/w/workshopfa/workshopfa.osgmlnorm.errors

Step 2, which runs osx to convert the normalized document back into XML, produces lots of error messages, which are written to

 $DLXSROOT/prep/w/workshopfa/workshopfa.osx.errors

These will also result in the following message on standard output:

  make: [norm] Error 1 (ignored)

These errors occur because we are using an XML DTD (the EAD 2002 DTD) and osx uses it to validate the SGML document created by the osgmlnorm step. These are the only errors that may generally be ignored. However, if the next recommended step, running "make validate2", reveals an invalid document, you may want to rerun osx and look at the errors for clues. (Only do this if you are sure that the problem is not being caused by XML processing instructions in the documents, as explained below.)
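To make "attributes in DTD order" concrete, here is a Python sketch that reorders the attributes of a single start tag, given the order in which a DTD declares them. It is only an illustration: make norm gets the real order from the DTD through osgmlnorm, whereas here the order list is supplied by hand and the orders used in the example are hypothetical. Attributes not in the list are kept at the end, in their original relative order.

```python
import re

TAG = re.compile(r"<([A-Za-z_:][-\w.:]*)(.*?)(/?)>$", re.DOTALL)
ATTR = re.compile(r"""([A-Za-z_:][-\w.:]*)\s*=\s*("[^"]*"|'[^']*')""")


def reorder_attributes(start_tag, dtd_order):
    """Return start_tag with its attributes in the order listed in dtd_order."""
    name, body, slash = TAG.match(start_tag).groups()
    rank = {attr: i for i, attr in enumerate(dtd_order)}
    attrs = ATTR.findall(body)
    # sorted() is stable, so undeclared attributes keep their relative order
    attrs = sorted(attrs, key=lambda kv: rank.get(kv[0], len(rank)))
    return "<" + name + "".join(f" {k}={v}" for k, v in attrs) + slash + ">"
```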

Step 5: Validating the normalized file against the dlxsead2002 DTD

  make validate2

Check the resulting error file:

  less $DLXSROOT/prep/w/workshopfa/workshopfa.errors2

We run this step again to make sure that the normalization process did not produce an invalid document. This is necessary because under some circumstances the "make norm" step can result in invalid XML. One known cause of this is the presence of XML processing instructions, for example "<?Pub Caret1?>". Although XML processing instructions are supposed to be ignored by any XML application that does not understand them, osgmlnorm and osx are SGML tools, and they end up munging the output XML. The preparedocs.pl script used in the "make prepdocs" step should have removed any XML processing instructions.

If this second make validate step fails, but the "make validate" step before "make norm" succeeded, there is some kind of a problem with the normalization process. You may want to start over by running "make clean" and then going through steps 1-4 again. If that doesn't solve the problem you may want to check your EADs to make sure they do not have XML processing instructions and if they don't, you will then need to look at the error messages from the second make validate.
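If you suspect processing instructions, a quick scan of the source files will show you where they are. The Python sketch below (find_processing_instructions is a hypothetical name) reports the line number and text of every <?...?> construct except the XML declaration.

```python
import re

PI = re.compile(r"<\?(.*?)\?>", re.DOTALL)


def find_processing_instructions(text):
    """Return (line number, text) for each <?...?> except the XML declaration."""
    found = []
    for m in PI.finditer(text):
        words = m.group(1).split()
        if words and words[0] == "xml":
            continue  # <?xml version=...?> is the XML declaration
        line_no = text.count("\n", 0, m.start()) + 1
        found.append((line_no, m.group(0)))
    return found
```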

Findaid Class Index Building with XPAT


In this section the workshopfa XML will be indexed with the XPAT search engine, preparing it for use with the DLXS middleware.

Set Up Directories and Files for XPAT Indexing

First, we need to create the rest of the directories in the workshopfa environment with the following command:

mkdir -p $DLXSROOT/idx/w/workshopfa

The bin directory we created yesterday holds any scripts or tools used for the collection specifically; obj (created earlier) holds the "object" or XML file for the collection; and idx holds the XPAT indexes. Now we need to finish populating the directories.

Copy the blank data dictionary and the extra.srch query file from samplefa:

cp $DLXSROOT/prep/s/samplefa/samplefa.blank.dd $DLXSROOT/prep/w/workshopfa/workshopfa.blank.dd

cp $DLXSROOT/prep/s/samplefa/samplefa.extra.srch $DLXSROOT/prep/w/workshopfa/workshopfa.extra.srch

Each of these files needs to be edited to reflect the new collection name and the paths to your particular directories. This will be true when you use these at your home institution as well, even if you use the same directory architecture as we do, because they will always need to reflect the unique name of each collection.

(The following commands will change samplefa to workshopfa but if you are not at the workshop, you may also have to change $DLXSROOT)

cd $DLXSROOT/prep/w/workshopfa
vi workshopfa.blank.dd
{esc}
:%s,s/samplefa,w/workshopfa,g
:%s,samplefa,workshopfa,g
:wq

cd $DLXSROOT/prep/w/workshopfa
vi workshopfa.extra.srch
{esc}
:%s,s/samplefa,w/workshopfa,g
:%s,samplefa,workshopfa,g
:wq

Failure to change even one file can result in puzzling errors, because the scripts are working, just not necessarily in the directories you are looking at.

grep -l "samplefa" $DLXSROOT/prep/w/workshopfa/*

will list any files that still contain samplefa, for example because an s/samplefa to w/workshopfa change was missed. If you are at the workshop, that should be all you need. However, if you are doing this at your home institution, you need to replace "/l1/" with whatever $DLXSROOT is on your server. If you don't have an /l1 directory on your server (which is very likely if you are not using a DLPS machine here), you can check with:

grep -l "l1" $DLXSROOT/prep/w/workshopfa/*
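If you are setting up more than one collection, the two vi substitutions and the grep check can also be done from a script. This Python sketch (retarget_file and leftovers are hypothetical helpers, not part of DLXS or setup_newcoll) applies the same two replacements in the same order to every occurrence in a file, and then lists files in a directory that still mention samplefa. It does not touch /l1; change that by hand if your $DLXSROOT differs.

```python
from pathlib import Path


def retarget_file(path, old="samplefa", new="workshopfa", old_dir="s", new_dir="w"):
    """Apply the two substitutions from the vi commands; return True if changed."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    # Same order as the vi commands: s/samplefa first, then any bare samplefa.
    updated = text.replace(f"{old_dir}/{old}", f"{new_dir}/{new}").replace(old, new)
    if updated != text:
        p.write_text(updated, encoding="utf-8")
    return updated != text


def leftovers(directory, needle="samplefa"):
    """Names of files in directory that still contain needle (like grep -l)."""
    return sorted(
        p.name for p in Path(directory).iterdir()
        if p.is_file() and needle in p.read_text(encoding="utf-8", errors="replace")
    )
```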

These steps (creating the directories and modifying the files to use workshopfa instead of samplefa) can be done using the $DLXSROOT/bin/f/findaid/setup_newcoll script. See the setup_newcoll manpage for more information.

Build the XPAT Index

Everything is now set up to build the XPAT index. The Makefile in the bin directory contains the commands necessary to build the index, and can be executed easily.

To create an index for use with the Findaid Class interface, you will need to index the words in the collection, then index the XML (the structural metadata, if you will), and then finally "fabricate" structures based on a combination of elements (for example, defining who the "main author" of a finding aid is, without adding a <mainauthor> tag around the appropriate <author> in the eadheader element). The following commands can be used to make the index:

make singledd indexes words for EADs that have been concatenated into one large file for a collection.

make xml indexes the XML structure by reading the DTD. Validates as it indexes.

make post builds and indexes fabricated regions based on the XPAT queries stored in the workshopfa.extra.srch file.

cd $DLXSROOT/bin/w/workshopfa
make singledd

The Makefile runs the following commands:

  cp $DLXSROOT/prep/w/workshopfa/workshopfa.blank.dd $DLXSROOT/idx/w/workshopfa/workshopfa.dd
  /l/local/xpat/bin/xpatbld -m 256m -D $DLXSROOT/idx/w/workshopfa/workshopfa.dd
  cp $DLXSROOT/idx/w/workshopfa/workshopfa.dd 	$DLXSROOT/prep/w/workshopfa/workshopfa.presgml.dd

make xml

The Makefile runs the following commands:

  cp $DLXSROOT/prep/w/workshopfa/workshopfa.presgml.dd $DLXSROOT/idx/w/workshopfa/workshopfa.dd
  /l/local/xpat/bin/xmlrgn -D $DLXSROOT/idx/w/workshopfa/workshopfa.dd \
    $DLXSROOT/misc/sgml/xml.dcl \
    $DLXSROOT/prep/w/workshopfa/workshopfa.concat.ead.dcl \
    $DLXSROOT/obj/w/workshopfa/workshopfa.xml
  cp $DLXSROOT/idx/w/workshopfa/workshopfa.dd $DLXSROOT/idx/w/workshopfa/workshopfa.prepost.dd

After running this step, if you wish, you can see the indexed regions by issuing the following commands:

xpatu $DLXSROOT/idx/w/workshopfa/workshopfa.dd
>> {ddinfo regionnames}
>> quit

make post

The Makefile runs the following commands:

cp $DLXSROOT/prep/w/workshopfa/workshopfa.prepost.dd $DLXSROOT/idx/w/workshopfa/workshopfa.dd
touch $DLXSROOT/idx/w/workshopfa/workshopfa.init
/l/local/xpat/bin/xpat -q $DLXSROOT/idx/w/workshopfa/workshopfa.dd \
        < $DLXSROOT/prep/w/workshopfa/workshopfa.extra.srch \
        | $DLXSROOT/bin/t/text/output.dd.frag.pl $DLXSROOT/idx/w/workshopfa/ \
        > $DLXSROOT/prep/w/workshopfa/workshopfa.extra.dd

$DLXSROOT/bin/t/text/inc.extra.dd.pl \
        $DLXSROOT/prep/w/workshopfa/workshopfa.extra.dd \
        $DLXSROOT/idx/w/workshopfa/workshopfa.dd

You should not see any errors at the workshop. If you do, please let the instructor know. If "make post" produces any errors, you need to fix them before moving on. Assuming there were no errors in the previous steps, the most likely causes are errors in your *.extra.srch file, or a region named in *.extra.srch that does not occur in any of your EADs.

If you get an "invalid endpoints" message from "make post", the most likely cause is XML processing instructions or some other corruption; the second validation step (make validate2) should have caught these. Other possible causes of errors during the "make post" step include syntax errors in workshopfa.extra.srch, or the absence of a particular region that is listed in the *.extra.srch file but not present in your collection. For example, if you don't have a <famname> in any of the EADs in your collection, you would get this error:

Error found:
No information for region famname in the data dictionary.

To fix this, you would have to edit the *.extra.srch file.
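One way to spot this before running make post is to compare the regions named in *.extra.srch with the regions that make xml actually indexed (the output of {ddinfo regionnames} shown earlier). The Python sketch below does that comparison. It is an approximation: it collects names that follow the word region, treats any name introduced with ~sync as fabricated, and expects you to supply the indexed region names yourself, since we do not assume any particular output format for ddinfo.

```python
import re

REGION_REF = re.compile(r'\bregion\s+"?([A-Za-z][\w.-]*)"?')
SYNCED = re.compile(r'~sync\s+"([^"]+)"')


def missing_regions(extra_srch, indexed):
    """Regions used in *.extra.srch that are neither indexed nor fabricated."""
    referenced = set(REGION_REF.findall(extra_srch))
    fabricated = set(SYNCED.findall(extra_srch))
    return sorted(referenced - fabricated - set(indexed))
```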

For more information see:

Configuring fabricated regions
Indexing fabricated regions
Working with Fabricated regions

Testing the index

At this point it is a good idea to do some testing of the newly created index. Start xpatu and issue the following queries:

xpatu $DLXSROOT/idx/w/workshopfa/workshopfa.dd

>> region "ead"
  1: 3 matches

>> region "eadheader"
  2: 3 matches

>> region "mainauthor"
  3: 3 matches

>> region "maintitle"
  4: 3 matches

>> region "admininfo"
  5: 3 matches

Fabricated Regions in FindaidClass

The make post step and the testing steps above lead us into a discussion of the use of fabricated regions in FindaidClass. Findaid Class uses the workshopfa.extra.srch file to add fabricated regions to the XPAT index.

"Fabricated" is a term we use to describe what are essentially virtual regions in an XPat indexed text. See a basic description of what a fabricated region is and how they are created.

In Finding Aids, we use fabricated regions for certain uninteresting regions simply so that some code can be shared. For example, the fabricated region "main" is set to refer to <ead> in FindaidClass with:

(region ead); {exportfile "/l1/idx/b/bhlead/main.rgn"}; export; ~sync "main";

whereas in TextClass "main" can refer to <TEXT>. Therefore, both FindaidClass and TextClass can share the Perl code, in a higher level subclass, that creates searches for "main".
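As we read the example above, each such line has three parts: an XPAT query, an exportfile/export pair that saves the result to a .rgn file, and ~sync, which makes the result available under a region name. If you generate *.extra.srch entries for several collections, a hypothetical Python helper like this one can build lines of the same shape. Naming the .rgn file after the region, as the example does, is our assumption rather than a confirmed requirement.

```python
def fabricated_region_line(query, name, idx_dir):
    """Build an extra.srch line that saves query's result as region name."""
    rgn_file = f"{idx_dir.rstrip('/')}/{name}.rgn"
    # query; save the result to rgn_file; register it under the region name
    return f'{query}; {{exportfile "{rgn_file}"}}; export; ~sync "{name}";'
```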

Findaid Class uses fabricated regions for several purposes:

To share code with Text Class (e.g. region "main")
Fabricated regions for searching (e.g. region "names")
Fabricated regions to produce the Table of Contents and to implement display of EAD sections as focused regions such as the "Title Page" or "Arrangement" ( See Working with the table of contents and TOC image for more information on the use of fabricated regions for the table of contents.)
Other regions specifically used in a PI (region "maintitle" is used by the PI used to display the title of a finding aid at the top of each page)

The majority of the fabricated regions for Findaid Class are used for the creation and display of the left hand table of contents in the "outline" view. The fabricated regions are used so XPAT can have binary indexes ready to use for fast retrieval of these EAD sections when the user clicks on an entry in the table of contents.

A number of issues related to varying encoding practices can be resolved by appropriate edits to the *.extra.srch file (although some of them may require changes to other files as well):

If your <unititle> element precedes your <origination> element in the top level <did>, you will have to modify the "maintitle" fabricated region query in *.extra.srch
If you do not use a <frontmatter> element, you will have to make modifications to various files including modifying *.extra.srch to provide an appropriate "Title Page" region based on the <eadheader>
If your encoding practices for <bioghist> differ from the Bentley's, you may need to make changes to the bioghist fabricated region, although changes to other files may be sufficient. The changes might include modifying findaidclass.cfg, creating a subclass of FindaidClass that overrides FindaidClass::GetBioghistTocHead, and/or changing the appropriate XSL files.
If you want sections of the finding aid that are not completely within a well-defined element such as <relatedmaterial> or <separatedmaterial> to show up in the table of contents, you may have to create a fabricated region using the appropriate XPAT query, then modify findaidclass.cfg and make other modifications to the code.

For more information see Working with Fabricated regions and Customizing Findaid Class.

Findaid Class Collection to Web


These are the final steps in deploying a Findaid Class collection online. Here the Collection Manager will be used to review the Collection Database entry for workshopfa and configure it for browse building. The Collection Manager will also be used to check the Group Database. Finally, we need to work with the collection map and set up the collection's web directory.

Review the Collection Database Entry with CollMgr

Each collection has a record in the collection database that holds collection-specific configurations for the middleware. CollMgr (Collection Manager) is a web-based interface to the collection database that provides functionality for editing each collection's record. Collections can be checked out for editing, checked in for testing, and released to production. In general, a new collection needs to have a CollMgr record created from scratch before the middleware can be used. If you are starting with the samplefa CollMgr record as a model, make sure to change references from s/samplefa to w/workshopfa, or whatever you are using for your collection name.

For more documentation, see Collection Manager Field Descriptions.

Configure the Collection for Dynamic Browsing Using CollMgr

Dynamic browsing is a feature available since DLXS release 12. Adding dynamic browsing to a collection is a matter of simple configuration in CollMgr and then running a script on the command line to populate the browse tables with data to facilitate browsing.

Collmgr field: browseable

To enable browsing, the browseable field must be set to "yes".

Collmgr field: browsenav

The browsenav field must have a value of 0, 1 or 2: small collections should use 0, medium collections 1, and large collections 2. This is the number of layers of browse tabs you want for the collection. 0 means that all the items are on one page, with no tabs; 1 means you have one layer of tabs with the letters of the alphabet; and 2 means you have two layers of tabs, one for a letter and another for the two-letter subdivisions under it.
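The Python sketch below makes the three browsenav values concrete. It groups a list of titles the way the paragraph above describes:

- 0 gives one flat page.
- 1 gives one tab per initial letter.
- 2 gives letter tabs with two-letter sub-tabs.

It only illustrates the layering, and the function name browse_layers is hypothetical. It does not reproduce how updatebrowsedb.pl actually fills the browse tables. For example, it ignores leading articles and non-alphabetic characters.

```python
from collections import defaultdict


def browse_layers(titles, browsenav):
    """Illustrate browse tab layers for browsenav = 0, 1 or 2."""
    if browsenav not in (0, 1, 2):
        raise ValueError("browsenav must be 0, 1 or 2")
    # Case-insensitive sort; empty titles are skipped.
    items = sorted((t for t in titles if t), key=str.upper)
    if browsenav == 0:
        return items                      # one page, no tabs
    if browsenav == 1:
        tabs = defaultdict(list)          # one tab per initial letter
        for title in items:
            tabs[title[0].upper()].append(title)
        return dict(tabs)
    # browsenav == 2: letter tabs, each with two-letter sub-tabs
    tabs = defaultdict(lambda: defaultdict(list))
    for title in items:
        tabs[title[0].upper()][title[:2].upper()].append(title)
    return {letter: dict(subtabs) for letter, subtabs in tabs.items()}
```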

Collmgr field: browsefields

browsefields holds the list of fields you would like to be browseable. This list is used to prepare the data for browsing, and also to present browsing options to the user. Currently, author and title are the canonical Findaid Class browse fields. You will need the mainauthor (as appropriate) and maintitle fabricated regions to support browsing.

Now that we are finished updating CollMgr, we need to release our collection to production.

With the above fields properly configured and CollMgr released, the updatebrowsedb.pl script can be run. It populates the ItemColl, ItemBrowse and ItemBrowseCounts tables with information from the collection's data dictionary. You should use the "wrapper" shell script provided in the same subdirectory, ub:

cd $DLXSROOT/bin/browse

./ub -C findaid -c workshopfa

Review the Groups Database Entry with CollMgr

Another function of CollMgr allows the grouping of collections for cross-collection searching. Any number of collection groups may be created for Findaid Class. Findaid Class supports a group with the groupid "all". It is not a requirement that all collections be in this group, though that's the basic idea. Groups are created and modified using CollMgr.

http://username.ws.umdl.umich.edu/cgi/c/collmgr/collmgr

We won't be doing much with groups, but you can add your workshopfa collection to the Sample group if you'd like. Be sure to release the group to production if you want any changes to be available in your interface.

Make Collection Map

Collection map files identify the regions and operators used by the middleware when interacting with the search forms. Each collection needs one, but most collections can use a fairly standard map file, such as the one in the samplefa collection. The map files for all Findaid Class collections are stored in $DLXSROOT/misc/f/findaid/maps.

Map files take the language used in the forms and translate it into language for the CGI and for XPAT. For example, if you want your users to be able to search within names, you would need to add a mapping for three things:

- how you want it to appear in the search interface (case is important, as is pluralization!);
- how the CGI variable would be set (usually all caps, and not stepping on an existing variable);
- how XPAT will identify and retrieve this natively (in XPAT search language).

The first part of the map file is operator mapping, for the form, the cgi, and XPAT. The second part is for region mapping, as in the example above.
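The three-way translation can be pictured as a small lookup table. The Python sketch below is not the map-file syntax; copy samplefa.map for that. It only models the idea that a form label maps to a CGI value and the CGI value maps to an XPAT expression, with exact, case-sensitive matching. The labels, the NAMES/TITLES values and the XPAT expressions are hypothetical examples.

```python
# Hypothetical region mappings: (form label, CGI value, XPAT expression).
# The real map file has its own syntax; see samplefa.map.
REGION_MAP = [
    ("Names", "NAMES", 'region "persname"'),
    ("Titles", "TITLES", 'region "unittitle"'),
]


def form_to_cgi(label):
    """Translate a search-form label into its CGI value.

    Matching is exact, so "names" or "Name" will not find "Names".
    """
    for form_label, cgi_value, _ in REGION_MAP:
        if form_label == label:
            return cgi_value
    raise KeyError(f"no region mapping for form label {label!r}")


def cgi_to_xpat(cgi_value):
    """Translate a CGI value into the XPAT expression used for searching."""
    for _, mapped_cgi, xpat in REGION_MAP:
        if mapped_cgi == cgi_value:
            return xpat
    raise KeyError(f"no region mapping for CGI value {cgi_value!r}")
```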

cd $DLXSROOT/misc/f/findaid/maps

cp samplefa.map workshopfa.map

You might note that some of the fields that are defined in the map file correspond to some of the fabricated regions.

Set Up the Collection's Web Directory

Each collection may have a web directory with custom Cascading Style Sheets, interface templates, graphics, and JavaScript. By default, a collection uses the web templates at $DLXSROOT/web/f/findaid. Collection-specific templates and other files can be placed in a collection-specific web directory, which is necessary if you have any customization at all. DLXS Middleware uses fallback to find HTML-related templates, chunks, graphics, JS and CSS files. A file in the collection's web directory therefore takes precedence over the default copy in $DLXSROOT/web/f/findaid.
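This Python sketch models a simplified, two-level version of that fallback lookup. It assumes only $DLXSROOT/web/w/workshopfa and $DLXSROOT/web/f/findaid; the real middleware may consult more locations. find_web_file is a hypothetical helper that is handy for checking which copy of a template or stylesheet will be used.

```python
import os


def find_web_file(dlxsroot, filename,
                  coll_dir=os.path.join("w", "workshopfa"),
                  class_dir=os.path.join("f", "findaid")):
    """Return the path a two-level fallback would pick, or None.

    The collection web directory is checked before the class directory.
    """
    for subdir in (coll_dir, class_dir):
        candidate = os.path.join(dlxsroot, "web", subdir, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, find_web_file(os.environ["DLXSROOT"], "text.components.xsl") shows whether your collection's copy or the class-wide one would be found first.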

For a minimal collection, you will want two files: index.html and findaidclass-specific.css.

mkdir -p $DLXSROOT/web/w/workshopfa
cp $DLXSROOT/web/s/samplefa/index.html $DLXSROOT/web/w/workshopfa/index.html
cp $DLXSROOT/web/s/samplefa/findaidclass-specific.css $DLXSROOT/web/w/workshopfa/findaidclass-specific.css

As always, we'll need to change the collection name and paths. You might want to change the look radically, if your HTML skills are up to it.

Note that the browse link on the index.html page is hard-coded to go to the samplefa collection's static browse.html page. You may want to change it to point to the dynamic browse page, whose URL is ".../cgi/f/findaid/findaid-idx?c=workshopfa;page=browse".
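DLXS URLs such as the one above separate their parameters with semicolons rather than ampersands. If you generate links to the browse page (or other pages) from a script, a helper along these lines keeps the format consistent. dlxs_url is a hypothetical name, and percent-encoding each value is a precaution of this sketch.

```python
from urllib.parse import quote


def dlxs_url(base, **params):
    """Build a DLXS-style URL whose query parameters are joined with ';'."""
    query = ";".join(f"{key}={quote(str(value), safe='')}"
                     for key, value in params.items())
    return f"{base}?{query}" if query else base
```

For example, dlxs_url("http://username.ws.umdl.umich.edu/cgi/f/findaid/findaid-idx", c="workshopfa", page="browse") yields the dynamic browse URL for workshopfa.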

If you would prefer a dynamic home page, you can copy and modify the home.xml and home.xsl files from $DLXSROOT/web/f/findaid/. They are currently set up to be the home page for all finding aid collections, so you will have to do considerable editing. However, they contain a number of PIs (processing instructions) that you may find useful.

For DLXS to actually use these pages, they must be present in your $DLXSROOT/web/w/workshopfa/ directory, and there must be no index.html page in that directory. If you have an existing index.html page, the easiest thing to do is to rename it to "index.html.foobar" or something similar.
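The rule above says home.xml and home.xsl are used only when the collection web directory has no index.html. It is easy to check with a short script before testing in the browser. This Python sketch reports which home page the collection directory provides. It can also rename index.html to index.html.foobar, as suggested above. Both function names are hypothetical.

```python
import os


def collection_home_page(coll_web_dir):
    """Report which home page the collection web directory provides.

    Returns "index.html" (the static page wins), "home.xml+home.xsl"
    (dynamic page), or None if neither set of files is present.
    """
    def exists(name):
        return os.path.isfile(os.path.join(coll_web_dir, name))

    if exists("index.html"):
        return "index.html"
    if exists("home.xml") and exists("home.xsl"):
        return "home.xml+home.xsl"
    return None


def disable_static_index(coll_web_dir, suffix=".foobar"):
    """Rename index.html so that the dynamic home page can be used."""
    src = os.path.join(coll_web_dir, "index.html")
    if os.path.isfile(src):
        os.rename(src, src + suffix)
        return True
    return False
```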

Try It Out

http://username.ws.umdl.umich.edu/cgi/f/findaid/findaid-idx

Linking from Finding Aids

go to table of contents

There are a number of reasons you might like to link from a finding aid to another website. Perhaps you have digitized artifacts you'd like to link from DAOs. Maybe your Additional Descriptive Data contains a bibliography, and you'd like to link to published volumes in your OPAC. There might be another website out there that covers your topic well and you just want to create an external pointer to it. How do you do this?

Linking from Finding Aids Using HREF Attributes

Findaid Class is coded so that if DaoResolution is enabled and the <dao> element has an href attribute, it checks whether the attribute value contains the string "http". If it does, FindaidClass creates a link based on the content of the href attribute of the <dao>. For example, the Archives of Michigan have DAOs in this form:

<dao linktype="simple" href="http://haldigitalcollections.cdmhost.com/u?/p4006coll15,18455" 
show="new" actuate="onrequest"><daodesc><p>[view image]</p></daodesc></dao>

That displays like this in the container list:

Records of the Michigan Military Establishment

If you wish to link from an element other than a <dao>, you'll have a little work to do, depending on which element will contain the link and what the value of the show attribute might be. The Clements Manuscript Division often has extensive bibliographies as Additional Descriptive Data, and they wanted links from the titles to their entries in Mirlyn.

Lydia Maria Child papers

Here is an example of their encoding:

<title render="italic" linktype="simple" 
href="http://mirlyn.lib.umich.edu:80/F/?func=direct&amp;doc_number=004618335&amp;local_base=AA_PUB" 
show="replace" actuate="onrequest">The family nurse; or Companion of The frugal housewife. By Mrs. Child... 
Revised by a member of the Massachusetts medical society</title>

To make this work as they wished, we had to adapt the way title handling was done in the XSLT. In this case the relevant template is in the text.components.xsl file. A copy of that file was made in the collection web directory and given this template:

<xsl:template match="title[@linktype='simple']">
  <xsl:choose>
    <!-- Titles with an href become links (here, to the Mirlyn record) -->
    <xsl:when test="@href">
      <xsl:element name="a">
        <xsl:attribute name="href">
          <xsl:value-of select="@href"/>
        </xsl:attribute>
        <!-- render="italic": give the link the "title" CSS class -->
        <xsl:if test="@render = 'italic'">
          <xsl:attribute name="class">
            <xsl:text>title</xsl:text>
          </xsl:attribute>
        </xsl:if>
        <!-- show="new": open the link in a separate named window -->
        <xsl:if test="@show = 'new'">
          <xsl:attribute name="target">
            <xsl:text>linkWindow</xsl:text>
          </xsl:attribute>
        </xsl:if>
        <xsl:value-of select="."/>
      </xsl:element>
    </xsl:when>
    <!-- No href: just italicize the title text -->
    <xsl:otherwise>
      <xsl:element name="i">
        <xsl:value-of select="."/>
      </xsl:element>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Linking from Finding Aids Using the ID Resolver

If there is no "http" string in the href attribute of a <dao>, FindaidClass assumes that the href attribute is actually an id. It looks up that id in the ID resolver and builds a link if it finds the id in the IDRESOLVER table. The method FilterAllDaos_XML in $DLXSROOT/cgi/f/findaid/FindaidClass.pm can be overridden per collection if different behavior is needed.
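The decision FindaidClass makes for each <dao> when DaoResolution is enabled can be summarized in a few lines. The Python sketch below mirrors the two paragraphs above, but it is not FilterAllDaos_XML itself. A plain dict stands in for the IDRESOLVER table, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def dao_target(href, idresolver):
    """Return the URL a <dao> should link to, or None for no link.

    Assumes DaoResolution is enabled. `idresolver` is any mapping from
    id to URL, standing in for the IDRESOLVER table.
    """
    if not href:
        return None                  # no href attribute: nothing to link
    if "http" in href:
        return href                  # already a URL: link to it directly
    return idresolver.get(href)      # otherwise treat href as an id


def dao_links(ead_xml, idresolver):
    """Collect (href, target) pairs for every <dao> in an EAD document."""
    root = ET.fromstring(ead_xml)
    return [(dao.get("href"), dao_target(dao.get("href"), idresolver))
            for dao in root.iter("dao")]
```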

If you decide to use this feature, you will want to modify the preprocessing script preparedocs.pl. Out of the box, it inserts the string 'dao-bhl-' at the beginning of the href value. Below is an example of a Bentley <dao> where the id number is 91153-1.

<dao linktype="simple" href="91153-1" show="new" actuate="onrequest">
  <daodesc>
    <p>[view selected images]</p>
  </daodesc>
</dao>

The preparedocs.pl program would change this to:

<dao linktype="simple" href="dao-bhl-91153-1" show="new" actuate="onrequest">
  <daodesc>
    <p>[view selected images]</p>
  </daodesc>
</dao>
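The prefixing step can be approximated in Python as follows. This may help if you want to reproduce, change, or drop it; preparedocs.pl itself is Perl. Two behaviors are choices of this sketch and may differ from preparedocs.pl:

- hrefs that already contain "http" are left alone;
- hrefs that already carry the prefix are not prefixed twice.

The regular expression only handles double-quoted href attributes on <dao> start tags.

```python
import re

# Matches the href attribute of a <dao> start tag (not <daodesc>, <daoloc>, ...).
DAO_HREF = re.compile(r'(<dao\b[^>]*?\bhref=")([^"]*)(")')


def add_dao_prefix(xml_text, prefix="dao-bhl-"):
    """Prefix id-style <dao> hrefs so they match IDRESOLVER ids."""
    def repl(match):
        value = match.group(2)
        if "http" in value or value.startswith(prefix):
            return match.group(0)    # leave URLs and already-prefixed ids alone
        return match.group(1) + prefix + value + match.group(3)
    return DAO_HREF.sub(repl, xml_text)
```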

The ID resolver would look up the id "dao-bhl-91153-1" and replace it with the appropriate URL.

<dao linktype="simple" href="http://images.umdl.umich.edu/cgi/i/image/image-idx?q1=91153-1;rgn1=bhl_href;op2=And;q2=;rgn2=bhl_al;type=boolean;med=1;view=thumbnail;c=bhl" show="new" actuate="onrequest">
  <daodesc>
    <p>[view selected images]</p>
  </daodesc>
</dao>

In order to make use of ID resolution in Findaid Class:

$gEnableDaoResolution must be set to 1 in $DLXSROOT/cgi/f/findaid/findaidclass.cfg
You probably want to remove the processing in preparedocs.pl that adds the "dao-bhl-" prefix to DAOs
The ID resolver must be set up as detailed in the link below

ID Resolver Data Transformation and Deployment

The ID Resolver is a CGI that takes as input a unique identifier and returns a URI. It is used, for example, by Harper's Weekly to link the text pages in Text Class middleware to the image pages in the Image Class middleware, and vice versa.

Plug something like the following into your web browser and you should get an <ITEM> record back. If you test middleware that uses the ID resolver on a development machine, make sure that the middleware on that machine calls the resolver on the machine with the data, not the resolver on the production server.

http://clamato.hti.umich.edu/cgi/i/idresolver/idresolver?id=dao-bhl-bl000684
which should yield...
<ITEM MTIME="20030728142225"><ID>dao-bhl-bl000684 </ID><URI>http://images.umdl.umich.edu/cgi/i/image/image-idx?&q1=bl000684&rgn1=bhl_href&type=boolean&med=1&view=thumbnail&c=bhl </URI></ITEM>
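To script this check instead of using a browser, a small Python client can build the request URL and pull the ID and URI out of the <ITEM> response. This is a sketch: the regular expressions assume a response shaped like the sample above. They are used instead of an XML parser because the sample URI contains bare & characters, which a strict parser would reject. Point base at the resolver on the machine that holds your data, as noted above.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

ID_RE = re.compile(r"<ID>(.*?)</ID>", re.S)
URI_RE = re.compile(r"<URI>(.*?)</URI>", re.S)


def idresolver_request(base, item_id):
    """Build the resolver URL, e.g. .../idresolver?id=dao-bhl-bl000684."""
    return f"{base}?{urlencode({'id': item_id})}"


def parse_idresolver_response(text):
    """Return (id, uri) from an <ITEM> response, or None if either is missing."""
    id_match, uri_match = ID_RE.search(text), URI_RE.search(text)
    if not (id_match and uri_match):
        return None
    # Accept both bare and escaped ampersands in the URI.
    uri = uri_match.group(1).strip().replace("&amp;", "&")
    return id_match.group(1).strip(), uri


def resolve(base, item_id, timeout=10):
    """Query a live resolver (network access required)."""
    with urlopen(idresolver_request(base, item_id), timeout=timeout) as resp:
        return parse_idresolver_response(resp.read().decode("utf-8", "replace"))
```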

Information on how to set up the ID resolver

Customizing and Troubleshooting Findaid Class

Outline

Overview
Types of changes to accommodate differing encoding practices and/or interface changes
Specific encoding issues
Demonstration of customizing and troubleshooting
Modifying the Table of Contents
General troubleshooting techniques

go to table of contents

Overview

The EAD standard was designed as a "loose" standard in order to accommodate the large variety in local practices for paper finding aids and make it easy for archives to convert from paper to electronic form. As a result, conformance with the EAD standard still allows a great deal of variety in encoding practices.

Most of the questions we get from sites implementing FindaidClass for the first time involve dealing with encoding practices that differ from the Bentley's.

We will first look at a number of issues involved in data preparation, then at making changes to the Table of Contents. The modifications to the Table of Contents involve a number of useful techniques, such as:

Creating custom fabricated regions
Subclassing Findaid Class to create collection specific behavior
Creating collection specific XSL templates

Types of changes to accommodate differing encoding practices and/or interface changes

Custom preprocessing
Add dummy EAD to data
Modify prep scripts (Makefile, preparedocs.pl, validateeach.sh)
Modify *dcl files (DOCTYPE declarations and entities)
Modify fabricated regions (*.extra.srch)
Modify CollMgr entries
Modify findaidclass.cfg (change table of contents sections)
Subclass FindaidClass.pm
Modify XSL
Modify XML templates
Modify CSS

Specific Encoding Issues

There are a number of encoding issues that may affect the data preparation, indexing, searching, and rendering of your finding aids. Some of them are:

Preprocessing and Data Prep issues

<eadid> should be less than about 20 characters in length
Attribute ids must be unique within the entire collection
If you use attribute ids and corresponding targets within your EADs, preparedocs.pl may need to be modified.
Character Encoding issues
UTF-8 Byte Order Marks (BOM) should be removed from EADs prior to concatenation
XML processing instructions should be removed from EADs prior to concatenation
If your DOCTYPE declaration contains entities, you need to modify the appropriate *dcl files accordingly. Alternatively, you may want to remove the entities and any references to them, and use various DLXS functions to replace the functionality previously provided by the entity references. (See $DLXSROOT/prep/s/samplefa/samplefa.ead2002.entity.example.dcl for an example.)
Out-of-the-box <dao> handling may need to be modified for your needs
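Several of the data-prep items above can be checked mechanically before you run make prepdocs. The Python sketch below is a hypothetical helper, not part of DLXS. It does three things:

- strips a leading UTF-8 BOM and all processing instructions (including the <?xml ...?> declaration) from each EAD;
- reports <eadid> values longer than a configurable limit;
- reports id attribute values that are reused across the collection.

It works on text with regular expressions, so it does not understand comments or CDATA sections. The 20-character default simply follows the rough guideline above.

```python
import re

BOM = "\ufeff"
PI_RE = re.compile(r"<\?.*?\?>", re.S)              # <?xml ...?> and other PIs
EADID_RE = re.compile(r"<eadid\b[^>]*>(.*?)</eadid>", re.S)
ID_ATTR_RE = re.compile(r'\bid="([^"]*)"')          # id="..." attributes only


def clean_for_concatenation(text):
    """Remove a leading UTF-8 BOM and every XML processing instruction."""
    if text.startswith(BOM):
        text = text[len(BOM):]
    return PI_RE.sub("", text).lstrip()


def check_collection(docs, max_eadid_len=20):
    """Report <eadid> and id problems across a collection.

    `docs` maps a file name to its (already cleaned) EAD text.
    """
    problems = []
    first_seen = {}                   # id value -> file that used it first
    for name, text in docs.items():
        match = EADID_RE.search(text)
        eadid = match.group(1).strip() if match else None
        if eadid is None:
            problems.append(f"{name}: no <eadid> found")
        elif len(eadid) > max_eadid_len:
            problems.append(f"{name}: <eadid> is {len(eadid)} characters long")
        for value in ID_ATTR_RE.findall(text):
            if value in first_seen:
                problems.append(f"{name}: id {value!r} is also used in {first_seen[value]}")
            else:
                first_seen[value] = name
    return problems
```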

Fabricated region issues (some of these involve XSL as well)

If your <unittitle> element precedes your <origination> element in the top-level <did>, you will have to modify the maintitle fabricated region query in *.extra.srch. See Troubleshooting: Title of Finding Aid does not show up
If you do not use a <frontmatter> element, you will have to do one of the following:
a) create and populate <frontmatter> elements in your EADs manually;
b) run your EADs through some preprocessing XSL to create and populate <frontmatter> elements; or
c) create a fabricated region that provides an appropriate "Title Page" region based on the <eadheader>. In that case you may also need to change the XSL and/or subclass FindaidClass to change the code that handles the Title Page region.
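Option b) above suggests preprocessing XSL. As a swapped-in alternative, here is a Python ElementTree sketch of the same idea. For any <ead> that lacks <frontmatter>, it builds <frontmatter><titlepage><titleproper> from eadheader/filedesc/titlestmt/titleproper. It then inserts the new element directly after <eadheader>, where the EAD 2002 DTD expects it.

Two caveats apply:

- It is an assumption that a titleproper alone is enough for your Title Page; you may want to copy author, publisher or date as well.
- ElementTree does not write out the DOCTYPE declaration, so re-add it and re-validate afterwards.

```python
import xml.etree.ElementTree as ET


def add_frontmatter(ead):
    """Give an <ead> element a minimal <frontmatter> if it has none.

    Returns True if frontmatter was added, False otherwise.
    """
    if ead.find("frontmatter") is not None:
        return False
    header = ead.find("eadheader")
    source = ead.find("eadheader/filedesc/titlestmt/titleproper")
    if header is None or source is None:
        return False
    frontmatter = ET.Element("frontmatter")
    titlepage = ET.SubElement(frontmatter, "titlepage")
    titleproper = ET.SubElement(titlepage, "titleproper")
    # Copy the text only; inline markup such as <date> is flattened.
    titleproper.text = " ".join("".join(source.itertext()).split())
    # EAD 2002 content model: eadheader, frontmatter?, archdesc
    ead.insert(list(ead).index(header) + 1, frontmatter)
    return True
```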

Table of Contents and Focus Region issues

If you do not use a <frontmatter> element, you may have to make the changes mentioned above so that the title page appears in the table of contents and is displayed when the user clicks the "Title Page" link.
If your encoding practices for <bioghist> differ from the Bentley's, you may need to make changes in findaidclass.cfg, create a subclass of FindaidClass that overrides FindaidClass::GetBioghistTocHead, and/or change the appropriate XSL files. See: Changing the Bioghist labels
If you want <relatedmaterial> and/or <separatedmaterial> to show up in the table of contents (TOC) on the left-hand side of the finding aid, you may have to modify findaidclass.cfg and make other modifications to the code. This also applies if there are other sections of the finding aid not listed in the out-of-the-box findaidclass.cfg %gSectHeadsHash. See: Adding related and separated material

XSL issues

If you have encoded <unitdate>s as siblings of <unittitle>s, you may have to modify the appropriate XSL templates.
If you want the middleware to use the <head> element for labeling sections instead of the default hard-coded values in findaidclass.cfg, you may need to change fabricated regions and/or make changes to the XSL and/or possibly modify findaidclass.cfg or subclass FindaidClass.
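Before changing XSL, it can help to find out how your own finding aids place <unitdate>. This Python sketch is a hypothetical diagnostic, not part of DLXS. It counts the <unitdate> elements nested inside <unittitle>, and those that sit directly in a <did> alongside a <unittitle>. A large second count suggests the sibling pattern described above. ElementTree does not read external DTDs, so files that use DTD-defined entities may need those entities resolved first.

```python
import xml.etree.ElementTree as ET


def unitdate_placement(ead_xml):
    """Count <unitdate>s inside <unittitle> vs. beside it in the same <did>."""
    root = ET.fromstring(ead_xml)
    counts = {"inside_unittitle": 0, "sibling_of_unittitle": 0}
    for did in root.iter("did"):
        titles = did.findall("unittitle")
        for title in titles:
            counts["inside_unittitle"] += len(title.findall(".//unitdate"))
        if titles:
            # <unitdate> children of the <did> itself are siblings of <unittitle>
            counts["sibling_of_unittitle"] += len(did.findall("unitdate"))
    return counts
```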

Demonstration of customizing and troubleshooting techniques

Data Prep issues

No information for region "foo" in the data dictionary
Title doesn't show up
No information for region c05 in the data dictionary

Modifying the Table of Contents

Changing labels in TOC (See Wiki for details)
Adding related and separated material

Changing Bioghist labels to use the <head> element

In the default implementation, the middleware labels the bioghist "Biography", despite the much more descriptive text in the head element (see below):

<bioghist>
     <head>
          Biographies of people featured in Soviet photographs
     </head>
 ....
</bioghist>

Once the changes have been made, the bioghist is displayed with a label taken from the text of its first head element.
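The labeling rule described here uses the first <head> of the <bioghist> and otherwise falls back to the default label. It is sketched below in Python. In Findaid Class itself the change is made in Perl, either by editing findaidclass.cfg or by overriding FindaidClass::GetBioghistTocHead in a subclass. This function only illustrates the logic, and its name is hypothetical. Whitespace is collapsed because <head> text in EADs is often indented across several lines, as in the example above.

```python
import xml.etree.ElementTree as ET

DEFAULT_BIOGHIST_LABEL = "Biography"   # the out-of-the-box label described above


def bioghist_toc_label(bioghist, default=DEFAULT_BIOGHIST_LABEL):
    """Return the TOC label for a <bioghist> element."""
    head = bioghist.find("head")        # first <head> child only
    if head is None:
        return default
    text = " ".join("".join(head.itertext()).split())
    return text or default


example = ET.fromstring("<bioghist><head>Biographies of people featured "
                        "in Soviet photographs</head></bioghist>")
print(bioghist_toc_label(example))  # Biographies of people featured in Soviet photographs
```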

More information on changing bioghist

Making changes for all collections vs. changes per collection (findaidclass.cfg vs. subclass)
Subclassing FindaidClass

More Information

Collection specific XSL

More Information

General troubleshooting techniques

running extra.srch queries in XPAT
debugging switches
xsltwrite and oXygen
Perl debugger