Regular Expression Recursive Replacement in MarcEdit

One of the new functions in MarcEdit is the Edit Field Data function.  For the most part, batch edits on field data have been handled via regular expressions using the traditional Replace function.  For example, if I had the following field:

=999  \\$zdata1$ydata2

And I wanted to swap the subfield order, I’d use a regular expression in the Replace function and construct the following:
Find: (=999.{4})(\$z.*[^$])(\$y.*)
Replace: $1$3$2
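
For readers who want to try the expression outside MarcEdit, here is a minimal Python sketch of the same swap.  It is only an illustration using Python's re module, not what MarcEdit runs internally; the variable names are made up for the example, and Python writes the backreferences as \1, \2, \3 where the Replace box uses $1, $2, $3.

```python
import re

# The sample field from the text; a raw string keeps the two backslashes
# that stand for the blank indicators.
field = r"=999  \\$zdata1$ydata2"

# Same three groups as the Find expression:
#   1: tag plus the four characters after it, 2: subfield $z, 3: subfield $y
pattern = re.compile(r"(=999.{4})(\$z.*[^$])(\$y.*)")

# Emit group 1, then group 3, then group 2 to swap the subfield order.
swapped = pattern.sub(r"\1\3\2", field)
print(swapped)  # prints: =999  \\$ydata2$zdata1
```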

This works well when you need to do advanced edits.  The problem was that any field edit that didn’t fit into the Edit Subfield tool had to be done as a regular expression.  In an effort to simplify this process, I’ve introduced an Edit Field Data tool.


This tool exposes the data that follows the indicators for editing.  So, in the following field:
=999  \\$aTest Data

The Edit Field Data tool could interact with the data: “$aTest Data”.  Ideally, this will simplify the process of doing most field edits.  It also opens up the opportunity to do recursive grouping and replacement.

When harvesting data, subjects are often concatenated with a delimiter; for example, a single 650 field may hold multiple subjects separated by semicolons.  The new function allows users to capture data and recursively create new fields from the groups.  So, for example, if I had the following data:
=650  \7$adata — data; data — data; data — data;

And I wanted the output to look like:
=650  \7$adata — data
=650  \7$adata — data
=650  \7$adata — data

I could now use this function to achieve that result.  Using a simple regular expression, I can create a recursively matching group, and then generate new fields using the “/r” parameter.  So, to do this, I would use the following arguments:
Field: 650
Find: (data — data[; ]?)+
Replace: $$a$+/r
Check Use Regular Expressions.


The important part of the above expression is in the Replacement syntax.  The mnemonic /r at the end of the replacement string tells MarcEdit that each recursion should result in a new line.
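
The effect of the recursive group can be approximated in Python as a rough sketch.  This is not MarcEdit's implementation: Python's re module keeps only the last capture of a repeated group, so the sketch applies the group one match at a time with re.finditer instead of relying on the trailing +.  The helper name split_into_fields, the DASH constant, and the stripping of the trailing "; " delimiter are all assumptions made for the illustration, chosen so the result matches the desired output shown earlier.

```python
import re

DASH = "\u2014"  # the dash character used in the sample data

def split_into_fields(line, unit_pattern, prefix="$a"):
    """Return one new field per match of unit_pattern in the field data."""
    # Everything before the first "$" is the tag and indicators, e.g. "=650  \7".
    head, marker, rest = line.partition("$")
    data = marker + rest
    fields = []
    for match in re.finditer(unit_pattern, data):
        # Assumption of this sketch: drop the trailing "; " delimiter.
        piece = match.group(1).rstrip("; ")
        fields.append(head + prefix + piece)
    return fields

line = f"=650  \\7$adata {DASH} data; data {DASH} data; data {DASH} data;"
unit = f"(data {DASH} data[; ]?)"  # the Find group without the trailing +

new_fields = split_into_fields(line, unit)
for field in new_fields:
    print(field)
```

In the Find expression, [; ]? optionally matches a single delimiter character (a semicolon or a space), and the + after the group allows the group to repeat one or more times within a single match.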

This new function will be available for use as of 4/7/2014. 

–tr


Comments

3 responses to “Regular Expression Recursive Replacement in MarcEdit”

  1. Steve Booth

    This section is very helpful, but in my case, we have the following field:
    =653 \\$amovies$areviews$ahumor$acomedy$asketches$avideo$aviral$a’video games’$aanime$anostalgia$afunny$a’nostalgia critic’$a’nostalgia chick’$a’bum reviews’$a’5 second movies

    And we would like to have all $a subfields in separate fields.

    I cannot make this work; maybe I don’t fully understand how this function works.

    1. reeset

      So, the function works by creating regular expression groupings. To make this work, you need to be able to articulate the grouping logic for the function. In this case, you’d want to capture data that starts with $a and contains no further $ (delimiter). Once you have that, writing the argument is straightforward.

      Field: 653
      Find: (\$a[^$]*)
      Replace: $+/r
      Check the regular expression option and then process.

      When I try this on your data, it works like a treat.

      –tr
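
      The grouping logic in this reply can be checked outside MarcEdit with a small Python sketch. It is only an approximation of what the tool does: re.findall stands in for the recursive group, the sample line is a shortened version of the 653 field above, and the variable names are hypothetical.

```python
import re

# Shortened version of the 653 field from the comment above.
line = r"=653  \\$amovies$areviews$ahumor"

# Everything before the first "$" is the tag and indicators.
head, marker, rest = line.partition("$")
data = marker + rest

# Each group starts at "$a" and runs until the next "$" delimiter.
groups = re.findall(r"(\$a[^$]*)", data)

# One new field per captured group, mirroring the effect of "$+/r".
new_fields = [head + group for group in groups]
for field in new_fields:
    print(field)
```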

  2. Steve Booth

    I also tried to mimic the operation where you separated the data:
    Field: 650
    Find: (data — data[; ]?)+
    Replace: $$a$+/r
    Check Use Regular Expressions.

    I found that in this example, the following arguments provided the same results:
    Field: 650
    Find: (data — data[;])
    Replace: $$a$+/r
    Check Use Regular Expressions.

    I don’t get what the ? and + do.