MarcEdit 7: ISBD Normalization Experimentation

Since the early 2000s, when libraries started moving traditional cataloging data into and out of MARC, the presence and necessity of ISBD punctuation has been a topic of much debate.  Generally, if you had to take data from MARC to XML, ISBD punctuation was a scourge, one of many past sins that tied MARC a little too closely to its historical presentation mediums, and it had to be banished before real work could be done with the data.  If you had to move data from XML back to MARC, ISBD punctuation was a necessary evil if your data was to have any chance of looking relatively normal when it entered your ILS.  I tend to fall somewhere in the middle, as I’ve found that ISBD punctuation, when applied consistently, can make migrations or transformations of data much easier.  Of course, there is a big, glaring caveat in that statement: it is useful when applied consistently.  The reality is that it’s never applied consistently, and in most cases I find that the easiest way to migrate data is to remove all ISBD markers and simply put them back if necessary.  As you might imagine, that can lead to many moments of self-reflection, wondering if this is really what I want to be doing with my life.

As libraries and the PCC continue to march towards a world where library data finds ways to intersect with and make use of semantic data to varying degrees, this issue of ISBD normalization continues to rear its ugly head.  So, when the PCC reached out to ask how ISBD normalization could be implemented within MarcEdit, I was interested.

Of course, here’s the challenge when talking about MarcEdit.  MarcEdit is MARC agnostic (it believes in MARC, but not an ultimate higher MARC power), and tools placed into MarcEdit’s core toolset need to be applicable to all MARC formats.  That makes this kind of work a poor fit.  The reality is that even if MARC21 decides to forgo ISBD punctuation (or offer that as an option), many libraries will continue to use it, and flavors of MARC like UNIMARC may as well.  This means that any tool related to ISBD punctuation would need to fall into a different scope.  Fortunately, I have a place for these kinds of functions in MarcEdit.

Within the MarcEditor, under the Edit menu, there is a set of tools called Edit Shortcuts.

image

These are tools for what I consider one-off operations, generally tied to specific rules or practices found within MARC21/RDA.  They can often be used in Tasks, and they provide a place to make these kinds of rule-specific processes available to users.  This is where I’m implementing the ISBD normalization.

If you look at the image above, you will see a new option: Clean ISBD Punctuation.  I think this is misnamed, and I will eventually go back and call it Normalize ISBD Punctuation, because at this moment in time, that seems to be what we are talking about.  The PCC has a working group currently asking this question, and they are creating a spreadsheet with regular expressions to discuss desired outcomes.  This function utilizes those expressions.  Currently, the tool links to a new configuration file that looks like this:

 

 

# Using the Expressions created by the 
# PCC working group.
# The problem here is that this will be really, really slow.
# Adding the first part to limit the fields for processing
# This will make the process faster.
# Better option may be to identify only the elements that 
# need regular expressions and the rest will be more of a 
# basic process removing trailing punctuation.
# --TR
=LDR	(=LDR\s\s.{18})[a|i](.+)	$1c$2
=020	(=020\s{2}\\{2}\$.+?)(\$q)\((.+)\)	$1 $2$3
=020	(=020\s{2}\\{2}.+?\$q.+?)\s\;(\$q.+)	$1 $2
=245	(=245\s{2}.{2}\$.+?)\s\:(\$b.+)	$1 $2
=245	(=245\s{2}.{2}\$.+?)\s\/(\$c.+)	$1 $2
=245	(=245\s{2}.{2}\$.+?)\s\=(\$.)(.+)	$1 $2= $3
=245	(=245\s{2}.{2}\$.+?)\.(\$p.+)	$1 $2
=245	(=245\s{2}.{2}\$.+?)\.(\$n.+)	$1 $2
=245	(=245\s{2}.{2}\$.+?)\,(\$p.+)	$1 $2
=245	(=245\s{2}.{2}\$.+?)\s;(\$b)(.+)	$1 $2; $3
=250	(=250\s{2}.{2}\$.+?)\s\/(\$b.+)	$1 $2
=250	(=250\s\s.{2}\$.+\s)\=\$b(.+)	$1$b= $2
=264	(=264\s{2}.{2}\$.+?)\s\:(\$b.+)	$1 $2
=264	(=264\s{2}.{2}\$.+?)\s\;(\$a.+)	$1 $2
=264	(=264\s{2}.{2}\$.+?)\,(\$c.+)	$1 $2
=260	(=260\s{2}.+?)\s\:(.+)$	$1 $2
=260	(=260\s{2}.+?)\s\;(\$a.+)	$1 $2
=260	(=260\s{2}.+?)\s\;(\$a.+)$	$1 $2
=260	(=260\s{2}.+?)\,(\$c.+)	$1 $2
=300	(=300\s\s..\$a.+\s);\$c(.+)	$1$c$2
=300	(=300\s\s..\$.+\s)\+\$e(.+)	$1$e$2
=490	(=490\s\s..\$.+\s)\;\$v(.+)	$1$v$2
=490	(=490\s\s..\$.+)\,\$x(.+)	$1 $x$2
=5	(=5[0-9]{2}.{4}\$.+)\."$	$1"
=2	(=[2-5][0-9]{2}.+)\.$	$1
=3	(=[2-5][0-9]{2}.+)\.$	$1
=4	(=[2-5][0-9]{2}.+)\.$	$1
=5	(=[2-5][0-9]{2}.+)\.$	$1

 

The format of the file is field [tab] expression [tab] replacement.  This doesn’t feel like the best way to do this work, as many of the processes noted in these expressions are really trim operations (removing punctuation from specific fields/subfields), so I’ll likely be updating this file at some point to include options to use either regular expressions or character stripping operations.  Likewise, many of these expressions could be optimized a bit (for better use within .NET), but that seems like work that could be done at a different time.  The gist here is that, using this model, the PCC or community members can currently participate in the discussion around ISBD normalization, and for me, that was the goal.
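
To make the field [tab] expression [tab] replacement layout concrete, here is a minimal Python sketch of how such a file could be read and applied to lines of mnemonic MARC.  It is not MarcEdit’s code (MarcEdit is a .NET application): the function names `parse_rules` and `apply_rules` are hypothetical, and the sketch assumes that the first column is a simple prefix filter on the field line and that rules are applied in file order.  The .NET-style `$1` group references are expanded by hand, because Python’s `re` module spells them differently.

```python
import re

# Matches .NET-style numbered group references such as $1 or $2.
GROUP_REF = re.compile(r"\$(\d+)")


def parse_rules(text):
    """Parse 'field<TAB>expression<TAB>replacement' lines, skipping comments."""
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        prefix, pattern, replacement = line.split("\t")
        rules.append((prefix, re.compile(pattern), replacement))
    return rules


def apply_rules(field_line, rules):
    """Apply, in order, every rule whose prefix matches the start of the line."""
    for prefix, pattern, replacement in rules:
        if not field_line.startswith(prefix):
            continue  # assumption: column one only limits which fields are tested

        def expand(match, template=replacement):
            # Swap each $n in the template for the text of capture group n.
            return GROUP_REF.sub(
                lambda ref: match.group(int(ref.group(1))) or "", template
            )

        field_line = pattern.sub(expand, field_line)
    return field_line


# Three rules copied from the configuration file above, joined with tabs.
SAMPLE_RULES = "\n".join([
    "# sample rules",
    "\t".join(["=245", r"(=245\s{2}.{2}\$.+?)\s\:(\$b.+)", "$1 $2"]),
    "\t".join(["=245", r"(=245\s{2}.{2}\$.+?)\s\/(\$c.+)", "$1 $2"]),
    "\t".join(["=2", r"(=[2-5][0-9]{2}.+)\.$", "$1"]),
])

rules = parse_rules(SAMPLE_RULES)
record_line = "=245  10$aTitle :$bsubtitle /$cJane Doe."
print(apply_rules(record_line, rules))
# prints: =245  10$aTitle $bsubtitle $cJane Doe
```

The prefix check is what keeps this from being as slow as the comments in the file fear: an expression is only run against a line when the line starts with the value in the first column.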

So, do you want to use this?  Short and long answer: it depends.  This is experimental, meaning that it was created specifically for the purpose of demonstrating and spurring conversation.  The output is not supported in any ILS, in part because it implements a recommendation by the PCC to update a value in the LDR that currently isn’t defined.  This would break your records in most ILSs.  Longer-term, I hope the answer is yes.  The same way that the RDA Helper is helping libraries convert AACR2 records to a hybrid version of RDA, this tool will enable the quick conversion of records to normalized ISBD punctuation, and could be run over an entire database to help users ensure, for the first time, consistency in the process.  This, of course, assumes the ILS has been updated to fix the display issues that will surely arise when ISBD punctuation is normalized away.
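
As a small illustration of the LDR change, the sketch below runs the `=LDR` expression from the configuration file through Python’s `re` module.  The helper name `apply_ldr_rule` and the sample leader are made up for the example, and this is not how MarcEdit itself executes the rule.  Because `.{18}` skips leader positions 00 through 17, the character being rewritten is position 18 (Descriptive cataloging form in MARC 21).

```python
import re

# The LDR rule from the configuration file, written as a Python pattern.
# Inside a character class `|` is a literal, so `[a|i]` matches 'a', 'i',
# or a '|' fill character.
LDR_RULE = re.compile(r"(=LDR\s\s.{18})[a|i](.+)")


def apply_ldr_rule(leader_line):
    """Rewrite leader position 18 to 'c' when the rule's pattern matches."""
    # Python spells the .NET replacement `$1c$2` as `\g<1>c\g<2>`.
    return LDR_RULE.sub(r"\g<1>c\g<2>", leader_line)


before = "=LDR  00000nam a2200000 i 4500"
after = apply_ldr_rule(before)
print(after)
# prints: =LDR  00000nam a2200000 c 4500
```

A leader whose position 18 holds any other value (a blank, for instance) is left untouched, so the rule only flags records that previously declared AACR2 or ISBD punctuation.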

So, if you are interested in seeing how this works — you can see a video here:

This demonstrates the current process and how you can update and change the configuration file.  As the PCC makes updates, I will pass these updates on as well.

If you have questions — please feel free to reach out and let me know.

–tr

