Posted a MarcEdit update that includes the Batch Classification service. This service uses OCLC’s Classify API to generate call numbers for records. I originally noted the work on this tool here, while I was at code4lib. The original tool was pretty simple: it examined a record, extracted a control number, and queried the API to see whether a call number stem would be returned. After getting some feedback and questions, I’ve made some changes and updated the code, and the current version of the tool offers the following:
Features:
- Ability to add a call number to either a selected record or all records in a file.
- Ability to select either Dewey or Library of Congress classification.
- Ability to calculate dates and cutters. MarcEdit uses the date found in the 008 and generates a cutter of up to 4 characters for each call number.
- Ability to conditionally insert call numbers based on the presence of other call number fields.
- Ability to set the field in which the generated call number data should be placed.
- Very granular control over data extraction and normalization. Data is extracted from the following fields:
  - OCLC Number: 001, 035, 776$w$z
  - LCCN: 010, 776$w$z
  - ISBN: 020$a$z
  - ISSN: 022$a$z
  - UPC: 024$a
- Support for multiple fields. For example, if an OCLC number exists in the 001, 035 and 776$w$z, the program will store each of these values and perform a search on each. The same is true for the LCCN, ISBN and ISSN.
- Support for normalization of field data. Normalization matters most for the 020, 022 and 776: the program normalizes the data to give each query the best chance of matching through the API service.
- Support for following works when multiple records are retrieved. For example, if an ISBN search returns 4 items and 2 works, the program will determine which work has the highest number of holdings, retrieve its internal ID and resolve that data.
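The date and cutter feature from the list above can be sketched in Python. This is a minimal sketch, not MarcEdit’s code: the cutter tables are a simplified approximation of the LC Cutter Table (the special rules for an initial Qu are omitted), and the function names and the `main_entry` input are hypothetical. The date is read from 008/07-10 (Date 1).

```python
import re
from typing import List, Optional, Tuple

# Simplified, hypothetical approximation of the LC Cutter Table. Each entry is
# (first letter of a range, number); letters between entries take the number
# of the nearest preceding entry.
Table = List[Tuple[str, int]]
AFTER_VOWEL: Table = [("b", 2), ("d", 3), ("l", 4), ("n", 5), ("p", 6), ("r", 7), ("s", 8), ("u", 9)]
AFTER_S: Table = [("a", 2), ("c", 3), ("e", 4), ("h", 5), ("m", 6), ("t", 7), ("u", 8), ("w", 9)]
AFTER_CONSONANT: Table = [("a", 3), ("e", 4), ("i", 5), ("o", 6), ("r", 7), ("u", 8), ("y", 9)]
EXPANSION: Table = [("a", 3), ("e", 4), ("i", 5), ("m", 6), ("p", 7), ("t", 8), ("w", 9)]


def table_number(letter: str, table: Table) -> int:
    """Return the number for the last range that starts at or before `letter`."""
    number = table[0][1]  # fallback when the letter sorts before the first entry
    for start, value in table:
        if letter >= start:
            number = value
    return number


def cutter(main_entry: str, max_len: int = 4) -> str:
    """Build a Cutter-style code of at most `max_len` characters."""
    letters = re.sub(r"[^a-z]", "", main_entry.lower())
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()
    if len(letters) > 1:
        # The second letter is looked up in a table chosen by the first letter.
        if first in "aeiou":
            table = AFTER_VOWEL
        elif first == "s":
            table = AFTER_S
        else:
            table = AFTER_CONSONANT
        code += str(table_number(letters[1], table))
    # Remaining letters use the expansion table until the length cap is hit.
    for letter in letters[2:]:
        if len(code) >= max_len:
            break
        code += str(table_number(letter, EXPANSION))
    return code[:max_len]


def date_from_008(field_008: str) -> Optional[str]:
    """Return Date 1 (008/07-10) when it is four digits, else None."""
    date1 = field_008[7:11]
    return date1 if re.fullmatch(r"\d{4}", date1) else None


def build_call_number(stem: str, main_entry: str, field_008: str) -> str:
    """Join a classification stem, a Cutter-style code and the 008 date."""
    parts = [stem, cutter(main_entry), date_from_008(field_008) or ""]
    return " ".join(p for p in parts if p)
```

For example, `build_call_number("Z699.35.M28", "Reese, Terry", "140506s2014    xxu")` returns `Z699.35.M28 R447 2014`. A cataloger applying the LC table by hand would often stop at a shorter cutter such as `R44`; the 4-character cap here simply mirrors the limit described in the post.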
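The conditional-insert and target-field options in the list above can be pictured as a guard in front of the write. In this sketch the record is flattened to hypothetical `(tag, value)` pairs, and both the default target field (090) and the set of tags that count as an existing call number are assumptions, not MarcEdit defaults.

```python
from typing import Iterable, List, Tuple

# A record flattened to (tag, value) pairs; a hypothetical stand-in for a MARC record.
Record = List[Tuple[str, str]]

# Fields treated as "a call number is already present" (an assumed default).
EXISTING_CALL_NUMBER_TAGS = ("050", "082", "090", "092", "099")


def insert_call_number(record: Record, call_number: str, target_tag: str = "090",
                       skip_if_present: Iterable[str] = EXISTING_CALL_NUMBER_TAGS) -> bool:
    """Add `call_number` under `target_tag` unless a listed field already exists.

    Returns True when the record was changed.
    """
    skip = set(skip_if_present)
    if any(tag in skip for tag, _ in record):
        return False
    record.append((target_tag, call_number))
    record.sort(key=lambda fld: fld[0])  # keep fields in tag order
    return True
```

Passing an empty `skip_if_present` turns the guard off, which corresponds to inserting unconditionally.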
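Collecting every candidate identifier from the fields listed above might look like the following sketch. `Field` is a hypothetical stand-in for a MARC field, 024 indicators are ignored, and 776 handling is limited to $w values routed by their `(OCoLC)` or `(DLC)` prefix; how MarcEdit treats 776$z is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Field:
    """Minimal stand-in for a MARC field (hypothetical, not MarcEdit's model)."""
    tag: str
    data: str = ""  # used by control fields such as 001
    subfields: List[Tuple[str, str]] = field(default_factory=list)

    def values(self, *codes: str) -> List[str]:
        return [value for code, value in self.subfields if code in codes]


def extract_identifiers(record: List[Field]) -> Dict[str, List[str]]:
    """Collect every candidate identifier, in field order, without duplicates."""
    ids: Dict[str, List[str]] = {"oclc": [], "lccn": [], "isbn": [], "issn": [], "upc": []}

    def add(kind: str, value: str) -> None:
        value = value.strip()
        if value and value not in ids[kind]:
            ids[kind].append(value)

    for f in record:
        if f.tag == "001":
            add("oclc", f.data)
        elif f.tag == "035":
            for v in f.values("a"):
                if v.startswith("(OCoLC)"):  # other 035 sources are ignored
                    add("oclc", v[len("(OCoLC)"):])
        elif f.tag == "010":
            for v in f.values("a"):
                add("lccn", v)
        elif f.tag == "020":
            for v in f.values("a", "z"):
                add("isbn", v)
        elif f.tag == "022":
            for v in f.values("a", "z"):
                add("issn", v)
        elif f.tag == "024":
            for v in f.values("a"):
                add("upc", v)
        elif f.tag == "776":
            for v in f.values("w"):  # route linked control numbers by prefix
                if v.startswith("(OCoLC)"):
                    add("oclc", v[len("(OCoLC)"):])
                elif v.startswith("(DLC)"):
                    add("lccn", v[len("(DLC)"):])
    return ids
```

Raw values such as `ocm12345678` and `12345678` survive as separate entries at this stage; they only collapse into one query after normalization.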
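Normalization, as described in the list above, mostly means pulling the bare identifier out of subfield text such as `978-0-306-40615-7 (pbk.)` and rejecting values whose check digit fails. The sketch below assumes that approach and uses the standard ISBN-10 (mod 11), ISBN-13 (mod 10, weights 1 and 3) and ISSN (mod 11) check-digit rules; the function names are hypothetical, and this is not a description of MarcEdit’s internal logic.

```python
import re
from typing import Optional


def normalize_isbn(value: str) -> Optional[str]:
    """Return a bare ISBN-10/13 with a valid check digit, or None."""
    compact = value.replace("-", "").upper()
    match = re.search(r"\d{13}|\d{9}[\dX]", compact)
    if not match:
        return None
    isbn = match.group()
    if len(isbn) == 10:
        # ISBN-10: weights 10..1, sum must be divisible by 11 ("X" = 10).
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(isbn))
        return isbn if total % 11 == 0 else None
    # ISBN-13: alternating weights 1 and 3, sum must be divisible by 10.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(isbn))
    return isbn if total % 10 == 0 else None


def normalize_issn(value: str) -> Optional[str]:
    """Return an ISSN as NNNN-NNNN if its check digit is valid, else None."""
    match = re.search(r"(\d{4})-?(\d{3}[\dX])", value.upper())
    if not match:
        return None
    issn = match.group(1) + match.group(2)
    # ISSN: weights 8..1, sum must be divisible by 11 ("X" = 10).
    total = sum((8 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(issn))
    return f"{issn[:4]}-{issn[4:]}" if total % 11 == 0 else None


def normalize_oclc(value: str) -> Optional[str]:
    """Strip an (OCoLC) source prefix, ocm/ocn/on prefixes and leading zeros."""
    match = re.fullmatch(r"\s*(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?0*(\d+)\s*", value)
    return match.group(1) if match else None
```

Rejecting a failed checksum up front saves an API request that could only produce a miss or, worse, a false match.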
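Following works, as described in the last item above, can be reduced to picking the most-held work and looking it up by its identifier. The response shape below (`call_number`, `works`, `work_id`, `holdings`) is hypothetical and does not reproduce the Classify API’s actual format; `lookup_work` stands in for the second API request.

```python
from typing import Callable, Dict, List, Optional


def resolve_classification(response: Dict,
                           lookup_work: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return a call number stem, following the most-held work when several match.

    `response` is a hypothetical parsed reply: either {"call_number": ...} for a
    single match, or {"works": [{"work_id": ..., "holdings": ...}, ...]}.
    """
    if response.get("call_number"):
        return response["call_number"]
    works: List[Dict] = response.get("works", [])
    if not works:
        return None
    # Pick the work held by the most libraries; ties keep the first one listed.
    best = max(works, key=lambda work: int(work.get("holdings", 0)))
    return lookup_work(best["work_id"])
```

Choosing by holdings favors the work most libraries have already classified, which is a reasonable proxy for the most reliable call number.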
I imagine additional functionality will be added to this tool at some point, but I think this is a good place to start.
–TR
Comments
2 responses to “MarcEdit Update”
What’s your experience with the 020$z and 022$z (“cancelled/invalid”)? My experience is that these can often contain a _wrong_ ISBN/ISSN for the item in hand, one that actually belongs to a _different_, and entirely unrelated, record. It’s basically just a marker of a historical error, sort of a ‘log’ line. So fetching classification on that $z might get you classification belonging to a different record.
On the other hand, catalogers have (inexplicably to me, since it creates such a mess) started putting “alternate format” ISBN/ISSN there too, like the print ISBN on a record for an e-book. So it can _sometimes_ include an ISBN which represents the same work, and thus for which fetching classification on the $z would give you a correct answer.
But without being able to tell if it’s an identifier for an alternate work, or an identifier for a completely unrelated “incorrect/invalid” record, I generally try to avoid fetching or matching on the $z, thinking it’s better to miss out on some matches than to get a lot of unavoidable false matches. If catalogers want this info to be useful, they have to encode it differently, not put a valid alternate format identifier in a field also used for (and officially labelled as) “incorrect/invalid” data.
But I’m curious if you’ve had different experience, and perhaps the problems I worry about don’t happen as much in practice? (I have definitely encountered isolated specific examples of actual invalid/incorrect ISBNs creating false matches when matching on these $z’s, but haven’t tried to survey a sample).
Jonathan,
I actually had some of the same thoughts about the $z in the 020. As someone who had the opportunity to work as a cataloger for a number of years, I was initially inclined to keep the $z out of any process used for record identification. However, as I developed this process, I spoke with a number of catalogers who would make real-world use of this function, and there was near-universal agreement that the $z should be used as the match point of last resort. So that’s how it works: MarcEdit looks at all other data first and, failing that, uses the $z, if present, to attempt to find a valid match point.
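The last-resort ordering described in this reply amounts to appending the $z values after every other identifier and stopping at the first hit. A minimal sketch, with hypothetical function names and an assumed `(type, value)` query shape; it also returns the query that matched, so a cataloger could review any match that came from a $z.

```python
from typing import Callable, List, Optional, Tuple

Query = Tuple[str, str]  # (identifier type, normalized value)


def ordered_queries(primary: List[Query], cancelled: List[Query]) -> List[Query]:
    """Primary identifiers first; $z (cancelled/invalid) values only as a last resort."""
    return primary + [q for q in cancelled if q not in primary]


def first_match(queries: List[Query],
                lookup: Callable[[str, str], Optional[str]]) -> Optional[Tuple[str, Query]]:
    """Try each query in order; return the result and the query that produced it."""
    for kind, value in queries:
        result = lookup(kind, value)
        if result is not None:
            return result, (kind, value)
    return None
```

Keeping the matching query alongside the result addresses the false-match concern raised above: matches that only succeeded on a $z can be flagged for review rather than trusted blindly.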
–TR