MarcEdit Update

By reeset / In MarcEdit

Posted a MarcEdit update that includes the Batch Classification service.  This service utilizes OCLC’s Classify API to generate call numbers for records.  I originally noted the work on this tool while I was at code4lib here.  The original tool was pretty simple – it was created to examine a record, extract a control number, and simply query the API to see if a call number stem would be returned.  After getting some feedback and questions, I’ve made some changes, updated the code and have created a tool that looks like the following:

image

Features:

  1. Ability to add a call number to either a selected record or all records in a file.
  2. Ability to select from either Dewey or Library of Congress classifications.
  3. Ability to calculate dates and cutters.  MarcEdit will utilize the date found in the 008 and generate a cutter up to 4 characters in length for a particular call number.
  4. Ability to conditionally insert items based on the presence of other call number fields.
  5. Ability to set the field in which the generated call number data should be placed.
  6. Very granular control over data extraction and normalization.  Data is extracted from the following fields:
    • OCLC Number: 001, 035, 776$w$z
    • LCCN:  010, 776$w$z
    • ISBN: 020$a$z
    • ISSN: 022$a$z
    • UPC: 024$a
  7. Support for multiple fields, i.e., if an OCLC number exists in the 001, 035 and 776$w$z, the program will store each of these values and perform a search on each.  The same is true for LCCN, ISBN and ISSN.
  8. Supports normalization of field data.  Normalization is most important for the 020, 022 and 776 – the program normalizes the data to ensure the best chance of receiving a match through the API service.
  9. Supports the ability to follow works when multiple records are retrieved, i.e., if an ISBN search returns 4 items and 2 works, the program will determine the work with the highest number of holdings, retrieve its internal ID and resolve that data.
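
To make the normalization and work-following features a little more concrete, here is a rough sketch in Python.  This is not MarcEdit's code and it does not call the Classify API; the function names, the regular expressions, and the `id` and `holdings` keys are all assumptions made for illustration.  It only shows the general idea: strip qualifiers and punctuation from the extracted values before querying, and when several works come back, prefer the one with the most holdings.

```python
import re


def normalize_isbn(raw):
    """Return the bare ISBN from an 020 subfield value, or None if none is found."""
    # 020 values often carry qualifiers, e.g. "0-306-40615-2 (pbk.)".
    match = re.search(r"[0-9][0-9\- ]{8,}[0-9Xx]", raw)
    if not match:
        return None
    isbn = re.sub(r"[^0-9Xx]", "", match.group(0)).upper()
    # Only 10- and 13-character values are plausible ISBNs.
    return isbn if len(isbn) in (10, 13) else None


def normalize_issn(raw):
    """Return an ISSN in NNNN-NNNN form from an 022 subfield value, or None."""
    match = re.search(r"\b(\d{4})-?(\d{3}[\dXx])\b", raw)
    return f"{match.group(1)}-{match.group(2).upper()}" if match else None


def normalize_oclc(raw):
    """Strip common OCLC prefixes and leading zeros (assumed prefix list)."""
    match = re.search(
        r"(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?0*([1-9]\d*)", raw, re.IGNORECASE
    )
    return match.group(1) if match else None


def pick_work(works):
    """Return the candidate work with the most holdings, or None if empty.

    `works` is a list of dicts with hypothetical "id" and "holdings" keys;
    the real API response is shaped differently.
    """
    return max(works, key=lambda w: w["holdings"], default=None)


if __name__ == "__main__":
    print(normalize_isbn("0-306-40615-2 (pbk.)"))
    print(normalize_issn("0378-5955 (print)"))
    print(normalize_oclc("(OCoLC)ocm00012345"))
    print(pick_work([{"id": "a", "holdings": 12}, {"id": "b", "holdings": 40}]))
```

Note that the OCLC helper assumes the value really is an OCLC number; a real implementation would need to check the 035 prefix before trusting it.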

I imagine at some point, there will be additional functionality added to this tool, but I think that this will be a good place to start.

–TR

2 thoughts on “MarcEdit Update”

  1. What’s your experience with 020$z and 022$z? “cancelled/invalid”. My experience is that this can often contain a _wrong_ ISBN/ISSN for the item in hand, that actually belongs to a _different_, and entirely unrelated, record. It’s basically just a marker of a historical error, sort of a ‘log’ line. So fetching classification on that $z might get you classification belonging to a different record.

    On the other hand, catalogers have (inexplicably to me, since it creates such a mess) started putting “alternate format” ISBN/ISSN there too, like the print ISBN on a record for an e-book. So it can _sometimes_ include an ISBN which represents the same work, and thus for which fetching classification on the $z would give you a correct answer.

    But without being able to tell if it’s an identifier for an alternate work, or an identifier for a completely unrelated “incorrect/invalid” record, I generally try to avoid fetching or matching on the $z, thinking it’s better to miss out on some matches than to get a lot of unavoidable false matches. If catalogers want this info to be useful, they have to encode it differently, not put a valid alternate format identifier in a field also used for (and officially labelled as) “incorrect/invalid” data.

    But I’m curious if you’ve had different experience, and perhaps the problems I worry about don’t happen as much in practice? (I have definitely encountered isolated specific examples of actual invalid/incorrect ISBNs creating false matches when matching on these $z’s, but haven’t tried to survey a sample).

  2. Jonathan,

    I actually had some of the same thoughts about the $z in the 020. Having had the opportunity to work as a cataloger for a number of years, my initial inclination was to keep the $z out of any process used for record identification. However, as I worked on developing this process, I spoke to a number of different catalogers who would make real-world use of this function, and there was near universal agreement that the $z be utilized as the match of last resort. So, that’s how it works. MarcEdit will look at all other data first and, failing that, will then utilize the $z if present to attempt to find a valid match point.

    –TR
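
The last-resort ordering described in the reply above could be sketched along these lines.  Again, this is only an illustration in Python under assumed names, not MarcEdit's implementation: `lookup` is a hypothetical callable standing in for whatever query returns a classification (or `None` when nothing matches), and the identifiers are assumed to be normalized already.

```python
def candidate_identifiers(primary, cancelled):
    """Yield identifiers in query order: every primary value first, then $z values."""
    seen = set()
    for value in list(primary) + list(cancelled):
        # Skip blanks and values that have already been tried.
        if value and value not in seen:
            seen.add(value)
            yield value


def first_match(primary, cancelled, lookup):
    """Return (identifier, result) for the first identifier that matches, else None.

    `lookup` is a hypothetical stand-in for the classification query.
    """
    for ident in candidate_identifiers(primary, cancelled):
        result = lookup(ident)
        if result is not None:
            return ident, result
    return None


if __name__ == "__main__":
    # A dict stands in for the remote service in this example.
    table = {"222": "QA76", "999": "Z699"}
    print(first_match(["111", "222"], ["999"], table.get))
    print(first_match(["111"], ["999"], table.get))
```

Because the $z values are only reached once every other identifier has failed, a wrong ISBN in the $z can produce a false match only for records that would otherwise get no match at all.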