MarcEdit OAI harvester changes

By reeset / In Uncategorized

Just a small update.  I ran into a case where I needed to download some data provided via OAI in a metadata schema that is not library-specific.  While the MarcEdit OAI Harvester lets you define your own crosswalks, it limited the metadataPrefix values that could be sent to the server to four common, predefined types.  I have some command-line tools that I generally use for this sort of thing, but this time I decided to make it work in MarcEdit.  So, the OAI Harvester now has the following new functionality:

  1. The drop-down box that lists the defined metadata types can now be extended by simply typing into the text box.  This way you can harvest any metadata type that an OAI server provides.
    For example, you can now download Picture Australia metadata from one of their participating institutions.  The metadata itself is just Dublin Core with two additionally defined values, but without this change you could not have harvested this particular metadata prefix.  Once the metadata prefix is set, the rest of the harvest works as before.  The Crosswalk Path defines the path to an XSLT that translates the requested metadata to MARC21XML.
  2. The ability to download the raw OAI metadata files themselves (without translating the data to MARC).  Clicking the Advanced Settings link expands the dialog to show a new checkbox called “Harvest Raw Data (save OAI data to local file system)”.
    When this checkbox is checked, the Crosswalk Path text box expects a file directory (where the files will be saved) rather than a crosswalk path.  The program saves the files with numeric names (e.g., 0.xml, 1.xml, 2.xml).
  3. The last change to the program is cosmetic.  Those who have used this function before will now see an Advanced Settings link.  I wanted to clean up the interface so that the program shows only the common options by default, but still makes it easy to reach advanced OAI harvesting functionality such as getting individual records by identifier, starting the process at a specific resumptionToken, and setting start and end (from and until) options.  Hopefully, this makes the layout a little easier on the eyes.
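
For anyone who would rather script this kind of harvest, here is a minimal Python sketch of the same ideas: an arbitrary metadataPrefix, optional from/until dates, an optional starting resumptionToken, and raw responses saved as 0.xml, 1.xml, and so on.  This is not MarcEdit's code.  The function names, the example URL, and the choice to write one file per server response are all my own assumptions for illustration; the only thing taken from the OAI-PMH protocol itself is the request syntax (ListRecords, and the rule that resumptionToken is sent on its own without the other arguments).

```python
import os
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_list_records_url(base_url, metadata_prefix=None, from_date=None,
                           until_date=None, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # resumptionToken is an exclusive argument: send it with the verb only.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urllib.parse.urlencode(params)


def extract_resumption_token(xml_bytes):
    """Return the resumptionToken in a response, or None if absent or empty."""
    root = ET.fromstring(xml_bytes)
    token = root.find(".//" + OAI_NS + "resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def fetch_url(url):
    """Download one response and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest_raw(base_url, metadata_prefix, out_dir, from_date=None,
                until_date=None, start_token=None, fetch=fetch_url):
    """Save each raw response as 0.xml, 1.xml, ... and return the file count.

    One file per response is an assumption of this sketch.
    """
    os.makedirs(out_dir, exist_ok=True)
    token = start_token
    page = 0
    while True:
        url = build_list_records_url(base_url, metadata_prefix, from_date,
                                     until_date, token)
        xml_bytes = fetch(url)
        with open(os.path.join(out_dir, "%d.xml" % page), "wb") as handle:
            handle.write(xml_bytes)  # stored untouched, no crosswalk applied
        page += 1
        token = extract_resumption_token(xml_bytes)
        if token is None:
            return page


# Example call (placeholder URL and prefix, not a real server):
# harvest_raw("https://example.org/oai", "oai_dc", "raw_oai",
#             from_date="2009-01-01")
```

The `fetch` argument exists only so the loop can be tried out against canned responses without touching a real server.  An XSLT crosswalk to MARC21XML would be a separate step run over the saved files.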

The update can be found at: MarcEdit51_Setup.exe