MarcEdit Custom Report Writer

I periodically get requests to add custom, one-off reports to MarcEdit.  Some of these can be accommodated in the current tooling; some can’t.  I often encourage folks to look at the COM or API components, since these provide full access to the records and let users prepare the output in whatever format is most appropriate.  However, I do realize that this can be challenging for users without a programming background or someone to help create the custom reports. 

Since I was already planning a small update to the program to correct a few odd issues when working with SRU data in Alma, I went ahead and added a custom report writer.  The tool is pretty simple.  Right now, it’s basically designed around creating counts.  You can search for specific data, either as literal text (match case) or as a regular expression, and get back a report noting the number of times the data occurs in the file and the number of records it occurs in. 

Here’s an example of how this could work on a recent request.  A user was interested in retrieving the language data in the 008 and seeing how many different languages occurred within a record set.  To generate this report, the user would do the following:

1) Open the file in the MarcEditor

2) Select Reports/Custom Report.  This will open the following window.
image

3) Here, the user can search for data inside the record as literal text or by regular expression.  I envision that the lion’s share of queries built with this tool will be regular expressions.  To answer the language question, we need to read bytes 35, 36, and 37 of the 008.  Since MARC starts counting at zero while a regular expression quantifier simply counts characters, that means skipping 35 bytes (positions 00-34) and reading the 36th, 37th, and 38th.  So, we create the expression:
(=008.{2}.{35})(.{3})

Let’s see what this is actually doing.  (=008.{2}.{35}) says that the search should be done in the 008 field, and that we can skip the first two bytes after the tag (blank spaces in MarcEdit’s mnemonic format).  The expression then reads forward 35 bytes.  I could have written this as .{37} rather than two separate read operations, but I personally like to keep them separate because the expression is then easier to read.  This ends group 1.
(.{3})

That reads the next 3 bytes (the language code).

We now have three regular expression groups:
$0: the entire match (=008 through the language code)
$1: =008, the two blanks, and bytes 00-34 of the field
$2: the 3-byte language code
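
To make the groups concrete, here is a minimal Python sketch that applies the same expression to a single 008 line outside of MarcEdit.  The sample 008 data is made up for illustration, and Python's re module stands in for MarcEdit's own regular expression engine, so treat this as an approximation of what the tool sees rather than its actual implementation.

```python
import re

# Hypothetical 40-byte 008: 18 bytes of dates/place, 17 blanks,
# the language code at positions 35-37, then 2 trailing bytes.
data_008 = "850101s1985    nyu" + " " * 17 + "eng" + " d"

# MarcEdit's mnemonic format: tag, two blanks, then the field data.
line = "=008  " + data_008

# Same expression as in the post.
pattern = re.compile(r"(=008.{2}.{35})(.{3})")
match = pattern.search(line)

whole = match.group(0)     # =008 through the language code
prefix = match.group(1)    # =008, two blanks, bytes 00-34
language = match.group(2)  # bytes 35-37

print(repr(whole))
print(repr(prefix))
print(repr(language))
```

Group 2 is the value the report would be grouped on; the last two bytes of the 008 fall outside the match entirely, which is why $0 is not quite the whole field.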

Assuming that we want to group on the language codes, we now need to set some values.  First, we check the Use Regular Expression option.  This will display a textbox next to the GroupBy checkbox.  This represents the regular expression group value to group data by.  In our case, we want to use group 2.  We then set a save file and check the desired output options.  The window should look like this:
image

When the user runs this operation, a tab-delimited report is generated at the save location.  For a sample file, that output would look like the following:

Key ((=008.{2}.{35})(.{3}))	Total	Total Records
|||	476	476
eng	748	748
ger	2	2

Data is output in tab-delimited format and includes a header row with the search criteria, followed by one row per group-by value with the specified counts.
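
For anyone curious how this kind of count could be reproduced in a script, the following Python sketch builds a similar tab-delimited report from mnemonic-format text.  It rests on a few assumptions of mine: records are separated by blank lines, "Total Records" means the number of records containing the key at least once, and rows are sorted by key.  The function names and sample records are hypothetical, and this is not how MarcEdit itself is implemented.

```python
import re
from collections import Counter

EXPRESSION = r"(=008.{2}.{35})(.{3})"


def build_report(mnemonic_text, expression, group):
    """Count matches and matching records per group value."""
    pattern = re.compile(expression)
    totals = Counter()   # occurrences of each key in the file
    records = Counter()  # records containing each key at least once
    # Assumption: records are separated by blank lines.
    for record in re.split(r"\n\s*\n", mnemonic_text.strip()):
        keys = [m.group(group) for m in pattern.finditer(record)]
        totals.update(keys)
        records.update(set(keys))
    return totals, records


def format_report(expression, totals, records):
    """Render the counts as tab-delimited text with a header row."""
    lines = [f"Key ({expression})\tTotal\tTotal Records"]
    for key in sorted(totals):
        lines.append(f"{key}\t{totals[key]}\t{records[key]}")
    return "\n".join(lines)


def sample_record(language):
    """Build a made-up record with the given 3-byte language code."""
    data = "850101s1985    nyu" + " " * 17 + language + " d"
    return (
        "=LDR  00000nam a2200000 a 4500\n"
        "=008  " + data + "\n"
        "=245  10$aExample title."
    )


sample = "\n\n".join(sample_record(c) for c in ["eng", "eng", "ger", "|||"])
totals, records = build_report(sample, EXPRESSION, 2)
report = format_report(EXPRESSION, totals, records)
print(report)
```

Because a record only has one 008, the two count columns come out the same here, just as they do in the sample output above.  They would diverge for a repeatable field where the same value can occur more than once in a record.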

I could envision the custom report writer being expanded based on user feedback.  The idea here is to create a tool that is a bit more flexible than the canned tools and provide users with one more tool for their toolbelt.

–tr

