A question that comes up occasionally is how to conditionally add or replace character data within a MARC field. For example, consider this use case:
I’d like to add a period to the end of a field (say, the 650 field), but only under the following conditions:
- The field ends with a word (a-z) character.
- The field doesn’t already end in a period or a closing parenthesis.
- If the field ends with any other punctuation, that value is replaced with a period.
Handling conditions 1 and 2 is straightforward. For those, I’d probably do something like this:
Find: (=650.*[^.)])$
Replace With: $1.
This allows MarcEdit to match any line that doesn’t end in a period or a parenthesis. However, the third condition makes this more difficult: this pattern appends a period after other trailing punctuation, such as a comma, instead of replacing it. In the .NET (C#) implementation of regular expressions, you can combine named groups with alternation to achieve the above result. Consider the following data:
=650 \6$aMusique populaire$zQuébec (Province)$y1951-1960.
=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970,
=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970)
=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970;
=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970
=650 \6$aMusique populaire$zQuébec
Using the above criteria, I’d like to run a process that turns the comma at the end of line 2 into a period, turns the semicolon at the end of line 4 into a period, and adds a period to the end of lines 5 and 6. Lines 1 and 3 should be left alone. To do this, you’d set up a substitution.
Find: ((?<one>=650.*[\w])|(?<one>=650.*)(?<two>[^.)]))$
Replace With: ${one}.
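So what exactly is happening here? In .NET regular expressions, you can give capture groups names and refer to them by name in the replacement. In this case, we create a conditional using an ‘or’ clause (|), and give the group we want to keep the same name, one, in each branch of the clause. In the first branch, one captures the whole field when it ends in a word character. In the second branch, one captures everything before the final character, and that final character, which must be something other than a period or closing parenthesis, is pushed out into a separate group, two. Whichever branch matches, ${one} holds the data we want to keep, so the same replacement statement can append the period in both cases. Using the above, you will receive the following output: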
=650 \6$aMusique populaire$zQuébec (Province)$y1951-1960.
=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970.
=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970)
=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970.
=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970.
=650 \6$aMusique populaire$zQuébec.
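For anyone who wants to test the logic outside of MarcEdit, here is a minimal sketch in Python. It is an adaptation, not MarcEdit’s own syntax: Python’s re module does not accept the same group name twice, so the two branches use numbered groups and a small replacement function picks whichever one matched. The function name add_terminal_period is only illustrative.

```python
import re

# Adaptation of the MarcEdit pattern: Python's re module rejects duplicate
# group names, so each branch gets its own numbered group instead of "one".
#   Branch 1: the field ends in a word character -> keep the whole field.
#   Branch 2: the field ends in anything except "." or ")" -> keep all but
#             the final character.
PATTERN = re.compile(r"(=650.*\w)$|(=650.*)[^.)]$")


def add_terminal_period(line: str) -> str:
    """Return the 650 line with a period appended or substituted as needed."""

    def keep(match: re.Match) -> str:
        # Only one of the two groups takes part in any given match.
        return (match.group(1) or match.group(2)) + "."

    return PATTERN.sub(keep, line)


if __name__ == "__main__":
    records = [
        r"=650 \6$aMusique populaire$zQuébec (Province)$y1951-1960.",
        r"=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970,",
        r"=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970)",
        r"=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970;",
        r"=650 \6$aMusique populaire$zQuébec (Province)$y1961-1970",
        r"=650 \6$aMusique populaire$zQuébec",
    ]
    for record in records:
        print(add_terminal_period(record))
```

Lines that already end in a period or a closing parenthesis match neither branch, so they pass through unchanged.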
Obviously, the above is a fairly simple example, but the concept can be applied to much more complicated workflows. If you are interested in reading more about the Regular Expression implementation used in MarcEdit, please see: https://msdn.microsoft.com/en-us/library/vstudio/az24scfc(v=vs.100).aspx.
Questions, let me know.
–tr