broker20: An OAI-compliant Metadata Server

Overview and Contents

broker20 is the CGI program that produces XML responses to OAI verbs as dictated by version 2.0 of the OAI protocol. Setting up broker20 will allow service providers to access and harvest metadata about your collections, for the purpose of aggregating and making this metadata, and consequently the collections, more broadly available to the public.

broker20 also produces responses to CGM verbs as dictated by the CGM Protocol, a protocol for distributed searching. This protocol was developed by the University of Michigan, Cornell University, and Göttingen University with support provided by the National Science Foundation. Working from the roots of the DIENST protocol developed at Cornell and the then-emergent OAI protocols, the project team focused on creating a new protocol--dubbed CGM, for "Cornell, Göttingen, Michigan"--that was consistent with OAI, borrowed from DIENST, and added mechanisms for full text searching. The broker20 verbs that support the CGM protocol are intended to be used only at the University of Michigan.

OAI Verbs

Setting up broker20 involves understanding the six verbs behind the OAI protocol. To learn more about the OAI protocol, see http://www.openarchives.org/.

Identify

e.g., http://www.hti.umich.edu/cgi/b/broker20/broker20/?verb=Identify

This verb identifies the data provider (i.e., you). The response is built from the following parameters, which reside in $DLXSROOT/cgi/b/broker20/broker20.cfg and are configurable when the DLXS middleware is installed:

$gRepositoryID : for DLPS, the value is lib.umich.edu. Note that this must be a domain name.

$gRepositoryName : for DLPS, the value is The University of Michigan. University Library. Digital Library Production Service.

$BaseUrl : for DLPS, the value is http://www.hti.umich.edu/cgi/b/broker20/broker20

$AdminEmail : for DLPS, the value is dlps-broker@umich.edu

$EarliestDateStamp : for DLPS, the value is 2000-08-17. Enter the earliest date stamp of the records you are making available.

$DeletedRecord : for DLPS, the value is NO. This flag indicates the manner in which the repository supports the notion of deleted records. Legitimate values are no, transient, or persistent. broker20 does not support transient or persistent deleted records at this time.

$Granularity : for DLPS, the value is YYYY-MM-DD. This is the resolution of the datestamp for your repository. The legitimate values are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ, with meanings as defined in ISO 8601. The default value is the granularity used in the preparation of bib data.
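As a rough illustration of the two granularities, the following Python sketch checks whether a datestamp matches the granularity a repository declares. It is not part of broker20; the function name matches_granularity and the format table are assumptions made for this example.

```python
from datetime import datetime

# strptime patterns for the two granularities named above.
GRANULARITY_FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM-DDThh:mm:ssZ": "%Y-%m-%dT%H:%M:%SZ",
}


def matches_granularity(datestamp: str, granularity: str) -> bool:
    """Return True if datestamp is written at the given granularity."""
    fmt = GRANULARITY_FORMATS[granularity]
    # strptime accepts unpadded fields such as "2000-8-17", so also require
    # the datestamp to be exactly as long as the granularity template.
    if len(datestamp) != len(granularity):
        return False
    try:
        datetime.strptime(datestamp, fmt)
    except ValueError:
        return False
    return True
```

A check like this can be run over the datestamps in your bib data before choosing the $Granularity value.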

$SampleID : for DLPS, the value is oai:lib.umich.edu:YC023, with YC023 being a record id from the oaiall:yeatsbib collection/set. It is a best practice to use a real record id from your repository.
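The sample identifier above has three colon-separated parts: the oai scheme, the repository id ($gRepositoryID), and the local record id. A minimal Python sketch of splitting such an identifier follows; the names OaiIdentifier and parse_oai_identifier are hypothetical and do not come from broker20.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    scheme: str          # always "oai"
    repository_id: str   # e.g. the $gRepositoryID value
    local_id: str        # the record id within the repository


def parse_oai_identifier(identifier: str) -> OaiIdentifier:
    """Split an identifier such as oai:lib.umich.edu:YC023 into its parts."""
    # Split on the first two colons only, so the local id is kept whole.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not all(parts):
        raise ValueError(f"not an OAI identifier: {identifier!r}")
    return OaiIdentifier(*parts)
```

For the DLPS sample id, this yields the repository id lib.umich.edu and the local id YC023.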

ListSets

e.g., http://www.hti.umich.edu/cgi/b/broker20/broker20/?verb=ListSets

ListSets will list the sets in your repository. broker20 views sets as collections of BibClass data. These collections can be ordered into groups that have OAI access privileges (e.g., for DLPS, oaiall is one of our OAI-accessible groups that contains the majority of our BibClass collections). Create such a group using collmgr (see collmgr documentation for specific steps to do this), and set the OAI parameter for that group to be Y or y. Then select (check) the collections you want in that group, and these collections will show as sets when the ListSets verb is issued to broker20.

Because the setSpec values, which broker20 builds from the collid and groupid values, need to be alphanumeric according to the OAI protocol, all groupid and collid values need to be alphanumeric as well. Here at DLPS we have collids ending in "-bib". broker20 removes the hyphens to make them OAI compliant, and when a set ending in "bib" is requested through the OAI protocol, the suffix is translated back to "-bib" so that the collection can be accessed internally. As a consequence, a collid may end in "bib" only if that suffix is preceded by a hyphen. For example, a BibClass collid of yeats-bib will be turned into yeatsbib by broker20, but a BibClass collid like yeatsbib should not be created.
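The hyphen handling described above can be sketched in Python as follows. This is an illustration only, not the broker20 code; the function names are hypothetical, and the groupid:collid layout of the setSpec is inferred from the example set oaiall:yeatsbib used in this document.

```python
def to_set_spec(groupid: str, collid: str) -> str:
    """Build a setSpec from a groupid and collid, dropping hyphens."""
    return f"{groupid}:{collid.replace('-', '')}"


def to_collid(set_spec: str) -> tuple[str, str]:
    """Recover (groupid, collid) from a setSpec requested by a harvester."""
    groupid, _, name = set_spec.partition(":")
    # A trailing "bib" is assumed to have been "-bib" internally.
    if name.endswith("bib"):
        name = name[:-3] + "-bib"
    return groupid, name
```

Note that to_set_spec gives the same result for the collids yeats-bib and yeatsbib, and to_collid always maps it back to yeats-bib. That ambiguity is why a collid like yeatsbib should not be created.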

You may add collection/set descriptions in collmgr -- in the colldescr field -- that will show up as set descriptions when the ListSets verb is issued to broker20.

Set information is used as an optional input by ListIdentifiers and ListRecords.

ListMetadataFormats

e.g., http://www.hti.umich.edu/cgi/b/broker20/broker20/?verb=ListMetadataFormats&identifier=oai:lib.umich.edu:YC023

ListMetadataFormats responds with a list of all the formats supported by broker20. Currently, the only format is oai_dc (simple Dublin Core), and the response is the same whether the verb is used alone or with a valid identifier passed in.

ListIdentifiers

e.g., http://www.hti.umich.edu/cgi/b/broker20/broker20/?verb=ListIdentifiers&metadataPrefix=oai_dc&set=oaiall:yeatsbib

This verb will list the identifiers, i.e., the unique record locators, in the repository. If a set is not specified, it will list all the identifiers in groups that have been made OAI enabled. If a set is specified, it will list identifiers for the requested set.
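For harvesters that build requests programmatically, a minimal Python sketch of assembling a ListIdentifiers URL with an optional set is shown here. The function name list_identifiers_url is hypothetical, and the base URL is the DLPS $BaseUrl value given earlier; substitute your own.

```python
from typing import Optional
from urllib.parse import urlencode

# The DLPS $BaseUrl from broker20.cfg; replace with your repository's value.
BASE_URL = "http://www.hti.umich.edu/cgi/b/broker20/broker20"


def list_identifiers_url(base_url: str,
                         metadata_prefix: str = "oai_dc",
                         set_spec: Optional[str] = None) -> str:
    """Return a ListIdentifiers request URL, adding set only when given."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    # safe=":" keeps the colon in setSpec values such as oaiall:yeatsbib.
    return f"{base_url}?{urlencode(params, safe=':')}"
```

Calling list_identifiers_url(BASE_URL) requests identifiers from every OAI-enabled group, while passing set_spec="oaiall:yeatsbib" restricts the request to that set.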

GetRecord

e.g., http://www.hti.umich.edu/cgi/b/broker20/broker20/?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lib.umich.edu:YC023

GetRecord will return a single record for the identifier requested, in the metadata format requested.

ListRecords

e.g., http://www.hti.umich.edu/cgi/b/broker20/broker20/?verb=ListRecords&metadataPrefix=oai_dc&set=oaiall:yeatsbib

This verb works very much like GetRecord, but instead of returning one record, it returns a list of records based on the input parameters. This is the verb harvesters generally use to harvest your collections.
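To give a sense of what a harvester does with a ListRecords response, here is a minimal Python sketch that pulls the identifier and title out of each record. The sample response is trimmed and its title is invented for illustration; it is not actual broker20 output. The function name extract_records is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and by Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed, illustrative response; real responses carry more elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:lib.umich.edu:YC023</identifier>
        <datestamp>2000-08-17</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Illustrative title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_records(xml_text: str) -> list:
    """Return (identifier, title) pairs for each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    return records
```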

broker20 contains a routine that converts BibClass data to Dublin Core data for this verb and for the GetRecord verb. If the BibClass data is bad (a missing closing K tag, for example), the record is not output. Instead, an entry is written to the ErrorLogFor_broker20 log file in /l1/cgi/b/broker20. The log entry contains the time the error took place, the id of the record, and a copy of the record with the problem.
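The skip-and-log behavior described above can be sketched in Python as follows. This is not the broker20 routine: a simple XML well-formedness check stands in for the BibClass-to-Dublin-Core conversion, the function name convert_all is hypothetical, and the log is kept in a list rather than written to ErrorLogFor_broker20. The element name A in the usage example is also made up.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def convert_all(records, log_entries):
    """Return ids of records that converted; log and skip the bad ones.

    records is an iterable of (record_id, raw_text) pairs.
    log_entries is a list that receives one string per bad record.
    """
    converted = []
    for record_id, raw in records:
        try:
            # Stand-in for the real conversion: fails on malformed markup,
            # such as a missing closing tag.
            ET.fromstring(raw)
        except ET.ParseError as err:
            # Record the time, the record id, and a copy of the record.
            stamp = datetime.now().isoformat(timespec="seconds")
            log_entries.append(f"{stamp}\t{record_id}\t{err}\n{raw}")
            continue
        converted.append(record_id)
    return converted
```

For example, given the records ("YC023", "<A><K>x</K></A>") and ("YC024", "<A><K>x</A>"), only YC023 is returned, and the log receives one entry for YC024 that includes the malformed record text.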

OAI Sets

In order for broker20 to work, you need to create a group or groups made up of the collections that you wish harvesters to gather. You do this through collmgr. Be sure to set the OAI parameter to Y or y for these groups. Most institutions will probably create only one group, containing the collections they feel a harvester should have access to, but there are cases in which you will want to organize your sets according to different schemes, e.g., topical, geographical, or access-related. Create different groups containing different collections to reflect these schemes. If you take this approach, please inform harvesters of the different grouping schemes.

To put collections online, you should follow the procedures to get BibClass collections online, since broker20 works against BibClass collections. Also, remember to add the collection(s) to the AUTH system giving broker20 access to them.

All searches for data are done with XPAT.

OAI is Unicode compliant. If your institution has character entity references, you will need to add them to the broker20 code with the appropriate Unicode values. You will need to add the conversions in the routine ConvertStandardCharEnt.

There is another routine, ConvertCollectionChars, that converts Latin-1 characters (x0a1 to x0ff) and a few other characters from ISO-8859-* into their Unicode equivalents. This routine is commented out because we now provide only Unicode BibClass records in DLPS. You may need to enable conversions in this routine if your records contain any characters from ISO-8859-* encodings that are not currently handled by broker20.

In instances where a character entity reference does not have an obvious Unicode equivalent, the character entity reference is unchanged in the output. The user interface will simply display this string.
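The entity handling described above, including the rule that unrecognized references pass through unchanged, can be sketched in Python. This is an illustration of the idea rather than the ConvertStandardCharEnt routine itself; the table contents and the function name convert_char_entities are assumptions.

```python
import re

# Hypothetical subset of an entity table; a real table would list every
# character entity reference that appears in your records.
ENTITY_TO_UNICODE = {
    "eacute": "\u00e9",  # e with acute accent
    "ouml": "\u00f6",    # o with diaeresis
}

# Matches named references such as &eacute; (not numeric ones like &#233;).
ENTITY_PATTERN = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def convert_char_entities(text: str) -> str:
    """Replace known entity references; leave unknown ones unchanged."""
    def replace(match):
        # Fall back to the original reference when no mapping exists.
        return ENTITY_TO_UNICODE.get(match.group(1), match.group(0))
    return ENTITY_PATTERN.sub(replace, text)
```

With this table, G&ouml;ttingen becomes Göttingen, while a reference that is not in the table is emitted exactly as it appeared in the input.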

When you complete your installation and testing of broker20 at your institution, you will want to register your broker20 implementation with the OAI website at http://www.openarchives.org/data/registerasprovider.html. This site will run your broker20 implementation through a series of tests, and once it passes the tests you will be prompted to register. This is the official place to register to let harvesters know you are available for harvesting. You can test your repository first with the Repository Explorer, which allows you to see the output of your data provider without running a harvest yourself.