MarcEdit and Alma Integration: Working with holdings data

By reeset / On / In Cataloging, MarcEdit

Ok Alma folks,

 I’ve been thinking about a way to integrate holdings editing into the Alma integration work with MarcEdit.  Alma handles holdings via MFHDs, but honestly, the process for getting to holdings data seems a little quirky to me.  Let me explain.  When working with bibliographic data, the workflow to extract records for editing and then update them looks like the following:

 Search/Edit

  1. Records are queried via Z39.50 or SRU
  2. Data can be extracted directly to MarcEdit for editing

 

Create/Update

  1. Data is saved, and then turned into MARCXML
  2. If the record has an ID, I have to query a specific API to retrieve specific data that will be part of the bib object
  3. Data is assembled in MARCXML, and then updated or created.

 

Essentially, an update or create takes 2 API calls.

For holdings, it’s a much different animal.

Search/Edit:

  1. Search via Z39.50/SRU
  2. Query the Bib API to retrieve the holdings link
  3. Query the holdings link API to retrieve a list of holding ids
  4. Query each holdings record API individually to retrieve a holdings object
  5. Convert the holdings object to MARCXML and then into a form editable in the MarcEditor
    1. As part of this process, I have to embed the bib_id and holding_id into the record (I’m using a 999 field) so that I can do the update
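
To make the call count concrete, here is a minimal Python sketch of the search/edit steps above. It is not MarcEdit's code: `fetch`, `collect_holdings`, the URL layout, and the JSON keys (`holdings`, `link`, `holding`, `holding_id`) are all placeholders chosen for illustration, so check them against the Alma API documentation before relying on them.

```python
from typing import Callable, List

# Hypothetical: `fetch` takes a URL and returns the parsed response as a dict.
Fetch = Callable[[str], dict]


def collect_holdings(bib_id: str, base_url: str, fetch: Fetch) -> List[dict]:
    """Walk bib -> holdings list -> individual holdings, one call per step."""
    # Step 2: the bib object is only needed for its holdings link.
    bib = fetch(f"{base_url}/bibs/{bib_id}")
    holdings_link = bib["holdings"]["link"]  # assumed key names

    # Step 3: the holdings link returns a list of ids, not the records.
    listing = fetch(holdings_link)

    records = []
    for entry in listing.get("holding", []):
        holding_id = entry["holding_id"]
        # Step 4: one more call per holdings record.
        holding = fetch(f"{holdings_link}/{holding_id}")
        # Step 5.1: keep both ids next to the record, because the
        # holdings object itself does not carry them.
        records.append(
            {"bib_id": bib_id, "holding_id": holding_id, "record": holding}
        )
    return records
```

With N holdings on a bib, this makes N + 2 calls after the initial Z39.50/SRU search, compared with the single lookup needed on the bibliographic side.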

 

For Update/Create

  1. Convert the data to MARCXML
  2. Extract the ids and reassemble the records
  3. Post via the update or create API

 

Extracting the data for edit is a real pain.  I’m not sure why so many calls are necessary to pull the data.

 Anyway – Let me give you an idea of the process I’m setting up.

First – you query the data:

A couple of things to note: to pull holdings, you have to click on the download all holdings link, or right-click on the item you want to download.  Alternatively, select the items you want to download, and then press CTRL+H.

When you select the option, the program will ask whether you want it to create a new holdings record if one doesn’t exist. 

 

The program will then either download all the associated holdings records or create a new one.

There are a couple of things I want you to notice about these records.  There is a 999 field added, and you’ll notice that I’ve created this in MarcEdit.  Here’s the problem: I need to retain the bib number to attach the holdings record to (it’s not in the holdings object), and I need the holdings record number (again, not in the holdings object).  This is a required field in MarcEdit’s process.  I can tell if a holdings item is new or updated by the presence or lack of the $d. 
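
As a rough illustration of that new-versus-update check, the sketch below reads a mnemonic 999 line and looks for $d. The post does not spell out the subfield layout, so two things here are assumptions: that $d carries the holdings record number (so a missing $d means "create"), and that the bib number sits in $a. The helper names `parse_subfields` and `is_new_holding` are made up for this example.

```python
import re
from typing import Dict


def parse_subfields(mnemonic_line: str) -> Dict[str, str]:
    """Split a mnemonic field into a {subfield code: value} mapping."""
    # After the tag and indicators, the field is a run of $<code><value> pairs,
    # e.g. =999  \\$a<bib number>$d<holdings number>  (layout assumed).
    return dict(re.findall(r"\$(\w)([^$]*)", mnemonic_line))


def is_new_holding(line_999: str) -> bool:
    """Assumption: no $d means there is no holdings id yet, so create."""
    return "d" not in parse_subfields(line_999)
```

For example, `is_new_holding(r"=999  \\$a991234")` is true under these assumptions, while a line that also has a `$d` would be routed to the update call.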

 

This is the process that I’ve come up with, and it seems to work.  I’ve got a lot of debugging code to remove because I was having some trouble with the Alma API responses and needed to see what was happening underneath.  If you are an Alma user, I’d be curious whether this process looks like it will work for you.  As I say, I have some cleanup left to do before anyone can use this, but I think that I’m getting close.

 

–tr

Truncating a field by a # of words in MarcEdit

By reeset / On / In Cataloging, MarcEdit

This question came up on the listserv, and I thought the answer might be generally useful to other folks.  Here’s the question:

I’d like to limit the length of the 520 summary fields to a maximum of 100 words and adding the punctuation “…” at the end. Anyone have a good process/regex for doing this?
Example:
=520  \\$aNew York Times Bestseller Award-winning and New York Times bestselling author Laura Lippman’s Tess Monaghan—first introduced in the classic Baltimore Blues—must protect an up-and-coming Hollywood actress, but when murder strikes on a TV set, the unflappable PI discovers everyone’s got a secret. {esc}(S2{esc}(B[A] welcome addition to Tess Monaghan’s adventures and an insightful look at the desperation that drives those grasping for a shot at fame and those who will do anything to keep it.{esc}(S3{esc}(B—San Francisco Chronicle When private investigator Tess Monaghan literally runs into the crew of the fledgling TV series Mann of Steel while sculling, she expects sharp words and evil looks, not an assignment. But the company has been plagued by a series of disturbing incidents since its arrival on location in Baltimore: bad press, union threats, and small, costly on-set “accidents” that have wreaked havoc with its shooting schedule. As a result, Mann’s creator, Flip Tumulty, the son of a Hollywood legend, is worried for the safety of his young female lead, Selene Waites, and asks Tess to serve as her bodyguard. Tumulty’s concern may be well founded. Recently, a Baltimore man was discovered dead in his home, surrounded by photos of the beautiful—if difficult—aspiring star. In the past, Tess has had enough trouble guarding her own body. Keeping a spoiled movie princess under wraps may be more than she can handle since Selene is not as naive as everyone seems to think, and instead is quite devious. Once Tess gets a taste of this world of make-believe—with their vanities, their self-serving agendas, and their remarkably skewed visions of reality—she’s just about ready to throw in the towel. But she’s pulled back in when a grisly on-set murder occurs, threatening to topple the wall of secrets surrounding Mann of Steel as lives, dreams, and careers are scattered among the ruins.
So, there isn’t really an expression that can truly break on a number of words, in part because how we define word boundaries varies between languages.  Likewise, the MARC formatting can pose a challenge.  So, the best approach is to look for good enough, and in this case, good enough is likely breaking on spaces.  My suggestion is to look for 100 spaces, and then truncate.
In MarcEdit, this is easiest to do using the Replace function.  The expression would look like the following:
Find: (=520.{4})(\$a)(?<words>([^ ]*\s){100})(.*)
Replace: $1$2${words}…
Check the use regular expressions option.
So why does this work?  Let’s break it down.
Find:
(=520.{4}) – this matches the field number, the two spaces related to the mnemonic format, and then the two indicator values.
(\$a) – this matches on the subfield a
(?<words>([^ ]*\s){100}) – this is where the magic happens.  You’ll notice two things about this.  First, I use a nested expression, and second, I name one.  Why do I do that?  Because the group numbering gets wonky once you start nesting expressions, and in those cases it’s easier to name them.  So, in this case, I’ve named the group that I want to retrieve, and then created a subgroup that matches a run of characters that aren’t a space, followed by a whitespace character.  I then use the quantifier {100}, which means the subgroup must match exactly 100 times.  A field with fewer than 100 such runs won’t match at all, so it is left unchanged.
(.*) — match the rest of the field.
Now when we do the replace, putting the field back together is really easy.  We know we want to reprint the field number, the subfield code, and then the group that captured the 100 units.  Since we named the 100 units, we call that directly by name.  Hence,
Replace:
$1 — prints out =520  \\
$2 — $a
${words} — prints 100 words
… — the literals
And that’s it.  Pretty easy if you know what you are looking for.
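
For anyone who wants to try the expression outside MarcEdit, here is a small Python sketch of the same find/replace. It is not something MarcEdit runs: `truncate_520` is a made-up helper name, and Python writes the named group as `(?P<words>...)` and refers to it in the replacement as `\g<words>`, rather than the .NET-style forms shown above.

```python
import re

# Same pattern as the Find expression, with Python's named-group spelling.
PATTERN = re.compile(r"(=520.{4})(\$a)(?P<words>([^ ]*\s){100})(.*)")

# Same pieces as the Replace expression: tag + indicators, $a, 100 words, ellipsis.
REPLACEMENT = "\\g<1>\\g<2>\\g<words>…"


def truncate_520(line: str) -> str:
    """Cut a mnemonic 520 $a to its first 100 space-delimited words."""
    # Lines with fewer than 100 words do not match and come back unchanged.
    return PATTERN.sub(REPLACEMENT, line)


if __name__ == "__main__":
    long_field = "=520  \\\\$a" + " ".join(f"w{i}" for i in range(150))
    print(truncate_520(long_field))
```

As with the MarcEdit version, the captured group ends with the space after the hundredth word, so the ellipsis follows that space.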
–tr

Proof of concept redux

By reeset / On / In Cataloging, MarcEdit

So I’ve been spending my time making a few changes to my proof of concept cataloging application using my phone.  A couple of things that I’ve learned along the way:

  1. No matter how good the OCR is, I’m not sure it ever gets to a point where you can just happily scan a catalog card and get all the data perfectly.  You can thank ISBD punctuation for that.
  2. Setting holdings data in OCLC is much easier than you’d think it would be, thanks to the Z39.50 Extended properties.
  3. Adding a barcode reader really was easier than I thought it would be

Right now, the proof of concept allows users to search (and set holdings) in OCLC (using their login credentials) or search and download records from US LC.  You can scan a barcode to get the record, or you can scan a catalog card and allow the program to attempt to disassemble the metadata to determine the best search profile.  Obviously, of the methods, this one is the most dodgy, but it’s interesting to see how it works and how OCR incrementally improves. 

As I’ve been working on this, it’s made me wonder what the real-life implications are for a project like this.  Obviously, one of the goals was to make catalog cards easier to recon.  But the ability to use the phone as a barcode scanner and catalog on the fly also makes me wonder if a tool like this could be used while shelf reading, at the point of acquisition of a text, or at a circulation desk when working with a book without a record. 

One benefit of this work is that, since this code is being written in C#, I’m starting to think about how I might co-opt some of it in MarcEdit.  The idea is that a user could upload a set of images to a folder, and then MarcEdit could OCR those images and use the data from them to automatically retrieve records for that content.  I’m not quite sure how reasonable an idea this really is at this point due to limitations with OCR, but from a technical standpoint, I have all the components I would need to make this happen.  So who knows, maybe this work will spawn something new and innovative yet.  We’ll see.

 

–tr