Broker

Understanding Broker

Broker20 (we'll just refer to it as Broker) is an application that takes BibClass collection data and makes it available to be harvested by other institutions using the OAI (Open Archives Initiative) protocol.

The benefit of this is that you can share your collection resources with a wider audience. For instance, we have created and maintain OAIster, which harvests collection data from a large variety of institutions, and we make that data available to any interested end-user for searching.

OAI Verbs

The OAI protocol defines a small set of requests, called "verbs", that a harvester sends to the Broker as it's been installed. The responses are built largely from metadata about your collections and your institution, which you configure during installation of DLXS 10.

The examples show the CGI parameters that get sent to the Broker base URL.
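As a rough illustration of how these requests are put together, the following Python sketch builds a request URL from a base URL and a set of OAI parameters. The helper name build_request is hypothetical, and the base URL is simply the University of Michigan one used in the examples here; substitute your own.

```python
from urllib.parse import urlencode

# Base URL taken from the examples in this document; substitute your own.
BASE_URL = "http://www.hti.umich.edu/cgi/b/broker20/broker20"


def build_request(verb, **params):
    """Return an OAI request URL for the given verb and optional arguments."""
    query = {"verb": verb}
    query.update(params)
    # safe=":" keeps set names such as "oaiall:freeicbib" readable in the URL.
    return BASE_URL + "?" + urlencode(query, safe=":")


print(build_request("Identify"))
print(build_request("ListRecords", metadataPrefix="oai_dc", set="oaiall:freeicbib"))
```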

Identify

This verb is used to retrieve information about a collection. When you install DLXS 10, some of the values you enter during configuration will be made available to the Identify verb.

Example:
http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=Identify

ListMetadataFormats

This verb is used to retrieve the metadata formats available from a repository. You must make your records available in simple Dublin Core, but can include other metadata formats as you wish.

Example:
http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=ListMetadataFormats

ListSets

This verb is used to retrieve the set structure of a collection, which is useful for selective harvesting. Sets can be organized in any manner you wish, e.g., subject, format, chronology.

Example:
http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=ListSets

ListIdentifiers

This verb is an abbreviated form of ListRecords (see below), retrieving only headers rather than records. It requires the metadataPrefix (see the ListMetadataFormats verb) and optionally accepts the set information (see the ListSets verb).

Example:
http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=ListIdentifiers&metadataPrefix=oai_dc&set=oaiall:freeicbib

ListRecords

This verb lists the records from a collection. It requires the metadataPrefix (see the ListMetadataFormats verb) and optionally accepts the set information (see the ListSets verb).

Example:
http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=ListRecords&metadataPrefix=oai_dc&set=oaiall:freeicbib

GetRecord

This verb is used to retrieve an individual collection record. It requires the metadataPrefix (see the ListMetadataFormats verb) and the identifier of the record (see the ListIdentifiers verb).

Example:
http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lib.umich.edu:BAC7121.0001.001

You can use these verbs to test your collection (we'll use the term repository from now on) as it has been served up through Broker. Any errors in your configuration will show up in one of these verbs.
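One way to check a response for problems is to look for error elements in the XML that comes back. The Python sketch below assumes that Broker returns OAI-PMH 2.0 responses, in which errors appear as error elements with a code attribute in the OAI 2.0 namespace. The function name find_errors is hypothetical, and the sample response is hand-written rather than captured from a real Broker.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def find_errors(response_xml):
    """Return (code, message) pairs for each <error> element in a response."""
    root = ET.fromstring(response_xml)
    return [(e.get("code"), (e.text or "").strip())
            for e in root.iter(OAI_NS + "error")]


# Hand-written sample response, not captured from a real Broker.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-05-01T09:18:29Z</responseDate>
  <request>http://example.org/broker20</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>"""

print(find_errors(sample))
```

An empty list means the response carried no OAI-level errors; it does not by itself confirm that the records are what you expect.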

Installing Broker

1. A series of parameters is configured during the installation of DLXS 10. These parameters live in the /$DLXSROOT/cgi/b/broker20/broker20.cfg file. If you need to change any of them after installation, you can edit this file directly or run the install script again; the install script will remember the entries you made previously.

The parameters are the following:

2. Naturally, you'll need to have repository records that have been prepared and indexed in BibClass.

3. Verify that the /$DLXSROOT/local/apache/conf/httpd.conf file contains the following lines:

<Directory "$DLXSROOT/cgi/b/broker20">
   SetEnv AUTHZD_COLL ":samplebc:"
   SetEnv PUBLIC_COLL ":"
</Directory>

This will give Broker access to the sample BibClass collection, samplebc. Enter the actual value of $DLXSROOT; don't leave it as the literal string "$DLXSROOT". You will want to list all the BibClass collections as you make them available, so that Broker will have access to them.

4. Go into collmgr and create individual entries for the BibClass collections you want Broker to access. Then create a group that selects these collections, and enter "Y" in its OAI field. Each collection in the group will be treated as a set, and will look like "groupid:collid" in Broker, e.g., "oaiall:freeicbib". Note that a set name can only be alphanumeric, so use alphanumeric values for your groupids and collids. At the University of Michigan Libraries we have some collids with "-bib" in them, and for these Broker does some special processing: it converts the "-bib" to "bib" when creating OAI responses, and when a set with "bib" is requested using the OAI protocol, it translates the "bib" back to "-bib" so the collection can be accessed internally. So, it's probably best if you don't create any collids with "bib" in them, as it will confuse Broker.

5. If all of these steps are done correctly, Broker should work. You can test it with your browser by trying a few verbs (these URLs are specific to the University of Michigan):

http://hti.umich.edu/cgi/b/broker20/broker20?verb=Identify
http://hti.umich.edu/cgi/b/broker20/broker20?verb=ListSets
http://hti.umich.edu/cgi/b/broker20/broker20?verb=ListRecords;metadataPrefix=oai_dc

6. It's a good idea to register your Broker in two places so that harvesters will know that you have records available.

http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai: This is an unofficial repository explorer, which is great for testing. This site will run your Broker through a series of tests, and once it passes the tests you will be prompted to register. Select "Test and Add an archive to this list".

http://www.openarchives.org/data/registerasprovider.html: This is the official web site. At the bottom of this page you will see a place where you can register your Broker. The registration folks will send you an email letting you know whether your Broker passed all the tests and was registered. If Broker fails any tests, they will let you know which ones.
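The set-name handling described in step 4 can be sketched as follows. This is a minimal Python illustration of the behavior as described there, not Broker's actual code. The function names and the collids "freeic-bib" and "bibdata" are hypothetical.

```python
def to_oai_set(groupid, collid):
    """Build the set name used in OAI responses, turning "-bib" into "bib"."""
    return groupid + ":" + collid.replace("-bib", "bib")


def from_oai_set(set_name):
    """Recover (groupid, collid) from a requested set, turning "bib" into "-bib"."""
    groupid, collid = set_name.split(":", 1)
    return groupid, collid.replace("bib", "-bib")


# A collid containing "-bib" survives the round trip.
print(to_oai_set("oaiall", "freeic-bib"))
print(from_oai_set("oaiall:freeicbib"))

# A collid containing plain "bib" does not: it comes back as "-bibdata".
print(from_oai_set(to_oai_set("oaiall", "bibdata")))
```

The last call shows why collids that contain "bib" without the hyphen are best avoided: under this scheme the reverse translation cannot tell them apart from "-bib" collids.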

Important Broker Information

  1. The routine "ConvertStandarCharEnt" converts standard character entity references to their corresponding Unicode character reference values in order to be UTF-8 compliant. For example, "&Gamma;" is converted to "&#915;" for output.

  2. The routine "ConvertSpecialCharEnt" converts those character entity references which are displayed with GIF files to their corresponding Unicode character reference values.

  3. The routine "ConvertCollectionChars" converts Latin 1 characters and a couple of special math characters used in one of our own collections. This is needed because the protocol is based on UTF-8, so a character like é stored in Latin 1 encoding cannot be output as-is.

  4. In the event that there are character entity references for which the conversion routines in Broker cannot find the corresponding Unicode values, Broker will output the encoding for the ampersand and then the remaining string. For example, if Broker were to come across a string like "&abc;", it would output "&#38;abc;".

  5. Broker is used at the University of Michigan Libraries to implement the CGM (Cornell, Göttingen, Michigan) protocol, so it contains code that supports the verbs used in that protocol.

  6. The routine "GetRecordFilt" converts a BibClass record to a Dublin Core record. It loops through a BibClass record looking for the tags noted below, for example <K></K>. If the BibClass data is bad, e.g., missing a </K>, the record will not be output, but an entry will be placed into a log file in /$DLXSROOT/cgi/b/broker20/ErrorLogFor_broker20. In the log file you will find the time the error took place, the ID of the record, and a copy of the record with the problem. You may want to create a cron job to clean this log file periodically and to notify you if there are entries. If you run your BibClass data through an XML validator and it validates, you should never get an entry in this error log. The mapping from BibClass to Dublin Core is as follows (and can be modified for your Broker as you see fit):

    BibClass Element    Dublin Core Element
    K                   dc:title
    L                   dc:creator
    SU                  dc:subject
    AF                  dc:subject
    AA                  dc:description
    T                   dc:publisher
    M                   dc:contributor
    F                   dc:description
    YR                  dc:date
    X                   dc:rights
    URL                 dc:identifier
    FMT                 dc:format
    LANG                dc:language
    TYPE                dc:type
    H                   dc:source
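
Two of the behaviors described in the list above lend themselves to a short illustration: the entity fallback in item 4 and the tag mapping in item 6. The Python sketch below is not Broker's code. The function names convert_entities and bibclass_to_dc are hypothetical, Python's html.entities table stands in for Broker's own conversion tables, a regular expression stands in for however GetRecordFilt actually walks a record, and the sample record is invented.

```python
import re
from html.entities import name2codepoint

# BibClass tag -> Dublin Core element, as listed in the mapping above.
BIBCLASS_TO_DC = {
    "K": "dc:title", "L": "dc:creator", "SU": "dc:subject", "AF": "dc:subject",
    "AA": "dc:description", "T": "dc:publisher", "M": "dc:contributor",
    "F": "dc:description", "YR": "dc:date", "X": "dc:rights",
    "URL": "dc:identifier", "FMT": "dc:format", "LANG": "dc:language",
    "TYPE": "dc:type", "H": "dc:source",
}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def convert_entities(text):
    """Replace named entities with numeric references; escape unknown ones."""
    def replace(match):
        name = match.group(1)
        if name in name2codepoint:
            return "&#%d;" % name2codepoint[name]
        # Unknown entity: emit the encoded ampersand, then the rest unchanged.
        return "&#38;" + name + ";"
    return ENTITY_RE.sub(replace, text)


def bibclass_to_dc(record):
    """Return (element, value) pairs, or None if a mapped tag is unbalanced."""
    pairs = []
    for tag, element in BIBCLASS_TO_DC.items():
        opens = record.count("<%s>" % tag)
        closes = record.count("</%s>" % tag)
        if opens != closes:
            return None  # bad data: the caller would skip and log the record
        for value in re.findall(r"<%s>(.*?)</%s>" % (tag, tag), record, re.S):
            pairs.append((element, value.strip()))
    return pairs


# Invented sample record using three of the mapped tags.
record = "<K>Sample title</K><L>Doe, Jane</L><YR>1999</YR>"
print(bibclass_to_dc(record))
print(convert_entities("&Gamma; and &abc;"))
```

Returning None for an unbalanced record mirrors the described behavior of withholding the record from output; writing the entry to the error log is left to the caller in this sketch.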