Merry Christmas–MarcEdit 5.7 Available

By reeset / In MarcEdit

Merry Christmas everyone.  I hope that everyone has a safe and happy holiday with family and friends.  In what has become a bit of a holiday tradition, I’m releasing an update to MarcEdit: MarcEdit 5.7.  Yep, this shifts the version number from 5.6 to 5.7, and there are some pretty good reasons why, so let’s get to it.

Updates

Native MARCXML Processing

I’ve talked about this change at length in an earlier post, but in order to facilitate some of the work that I’m interested in doing around MarcEdit and Linked Data, I had to improve the XML processing related to MARCXML.  Previously, MarcEdit used XSLT processing for all XML conversions.  This works great and provides a lot of flexibility, but it has a fairly substantial memory footprint, with visible performance issues when dealing with larger (500 MB+) MARCXML file sets.  To deal with these issues, I’ve updated MarcEdit to include native processing of MARCXML data using a SAX-style XML processor.  This means validation of the document happens as the document is processed, but the takeaway is that MarcEdit’s native MARCXML process has nearly no additional memory footprint and processes data approximately 190 times faster than the XSLT process.  Of course, some people may have good reason to continue using the XSLT-style processing (for example, they may have customized the MARCXML=>MARC XSLT), so I’ve also kept the ability to use the previous XSLT-style MARCXML processing (though the new method is the default).
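For readers who haven’t worked with SAX-style parsing, here is a minimal sketch of the general idea in Python.  It is not MarcEdit’s code; the class and function names are made up for illustration, and it only counts records and collects 001 control numbers.  The point is that the parser hands over one event at a time, so no document tree is ever held in memory, and a malformed document is only detected when the parser reaches the bad spot.

```python
import io
import xml.sax


class MarcXmlHandler(xml.sax.ContentHandler):
    """Hypothetical handler: counts records and collects 001 values as events stream by."""

    def __init__(self):
        super().__init__()
        self.record_count = 0
        self.control_numbers = []
        self._in_001 = False
        self._text = []

    def startElement(self, name, attrs):
        # Strip an optional namespace prefix such as "marc:".
        local = name.split(":")[-1]
        if local == "record":
            self.record_count += 1
        elif local == "controlfield" and attrs.get("tag") == "001":
            self._in_001 = True
            self._text = []

    def characters(self, content):
        # Text can arrive in several chunks, so buffer it until the element closes.
        if self._in_001:
            self._text.append(content)

    def endElement(self, name):
        if self._in_001 and name.split(":")[-1] == "controlfield":
            self.control_numbers.append("".join(self._text).strip())
            self._in_001 = False


def scan_marcxml(stream):
    """Stream a MARCXML document through the handler; raises SAXParseException on bad XML."""
    handler = MarcXmlHandler()
    xml.sax.parse(stream, handler)
    return handler


SAMPLE = b"""<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record><controlfield tag="001">ocm00000001</controlfield></record>
  <record><controlfield tag="001">ocm00000002</controlfield></record>
</collection>"""

result = scan_marcxml(io.BytesIO(SAMPLE))
print(result.record_count, result.control_numbers)
```

An XSLT transform, by contrast, generally needs the whole source document loaded before it can produce output, which is where the memory cost on large files comes from.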

You can modify the MARCXML processing preferences within the Application Preferences window.

image

Users wanting to disable the native processing function and use the previous XSLT process simply need to uncheck the Use Native Option (Non-XSLT Process).  When this option is unchecked, the XSLT-based process will be used.

This change has an impact in other parts of the program as well.  If you use the MarcEdit COM-based API or .NET API to access the MARCEngine, API calls to the engine for MARCXML=>MARC processing will use the XSLT translation process if an XSLT is passed into the function.  If you want to use the native process, simply pass an empty string (or null value) to the function.

The same applies to individuals using the cmarcedit.exe program (MarcEdit’s console program): if you want to use the native process, simply do not provide an XSLT when calling the MARCXML=>MARC translation.
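The selection rule for both the API and the console program boils down to one check.  The snippet below is only an illustration of that rule in Python; `choose_marcxml_processor` is a made-up name, not a function in the MarcEdit API.

```python
from typing import Optional


def choose_marcxml_processor(xslt_path: Optional[str]) -> str:
    """Hypothetical helper: report which MARCXML=>MARC path the rule above selects."""
    # No XSLT supplied (null or empty string) means the native processor is used.
    if not xslt_path:
        return "native"
    # Any supplied stylesheet keeps the previous XSLT translation process.
    return "xslt"


print(choose_marcxml_processor(None))
print(choose_marcxml_processor(""))
print(choose_marcxml_processor("custom_marcxml2marc.xsl"))
```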

UTF8=>MARC8 conversion updates

The UTF8=>MARC8 character conversion process wasn’t treating combining characters for diacritics represented as {dotb} or {commab} correctly.  These diacritics were recognized, but the combining byte wasn’t being moved properly within the string, so the diacritic ended up modifying the wrong character.  I’d like to thank Joe Altimus at Arizona State University for bringing this to my attention this week.
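Some background on why a byte has to move at all: in Unicode, a combining mark follows the character it modifies, while in MARC-8 the combining character comes before it.  The sketch below shows just that reordering step in Python, working on Unicode code points.  It is an illustration under that assumption, not MarcEdit’s conversion code; the function name is made up, and the actual mapping to MARC-8 byte values is left out.

```python
import unicodedata


def move_combining_marks_before_base(text: str) -> str:
    """Hypothetical helper: put each run of combining marks in front of its base character."""
    # Decompose first so precomposed letters are split into base + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    i = 0
    n = len(decomposed)
    while i < n:
        base = decomposed[i]
        i += 1
        marks = []
        # Collect every combining mark that follows this base character.
        while i < n and unicodedata.combining(decomposed[i]) != 0:
            marks.append(decomposed[i])
            i += 1
        # MARC-8 order: marks first, then the character they modify.
        out.extend(marks)
        out.append(base)
    return "".join(out)


# "h" + U+0323 (combining dot below) + "a": the mark must stay attached to "h".
reordered = move_combining_marks_before_base("h\u0323a")
print([hex(ord(ch)) for ch in reordered])
```

If the mark is moved one position too far, or not moved at all, it attaches to a neighboring letter, which matches the symptom described above.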

Multiple File Record Deduplication Utility

One of the feature requests that I get every now and again is to update the MarcEdit duplicate record function found in the MarcEditor.  Very often, users want to run this tool over multiple files, rather than find duplicate records in a single source file.  So, I’ve modified the existing function so that you can now run it outside of the MarcEditor, and over multiple files.

You can find this function on the main MarcEdit window, under Tools/Find Duplicate Records.

image

When you run this function, you get the following window.

image

Simply click on the open folder icon and select a file.  To add another file, click the open icon again and select another file.  You’ll see selected files added to the dropdown list.  MarcEdit will then use the files in this list to perform the stated operation.  At this point, this function is an extension of the existing deduplication tool.  I was considering making a tool that did a more heuristic analysis of the records to determine duplicates, but for now I’m going to wait for users to give this a try and provide some feedback so I can target my development accordingly.
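Conceptually, deduplicating across several files just means remembering match keys across file boundaries instead of resetting for each file.  Here is a rough Python sketch of that idea.  Everything in it is assumed for illustration: the function name, the input shape (file name mapped to a list of match key and record pairs), and the choice of match key are hypothetical and do not describe how MarcEdit implements the tool.

```python
from collections import defaultdict


def find_duplicates_across_files(files):
    """Hypothetical helper.

    files: mapping of file name -> list of (match_key, record) pairs.
    Returns (first_seen, duplicates), where duplicates maps each repeated key
    to the (file name, position) of every later occurrence.
    """
    first_seen = {}
    duplicates = defaultdict(list)
    for file_name, records in files.items():
        for position, (key, _record) in enumerate(records):
            if key in first_seen:
                # Key already met, possibly in an earlier file: flag as duplicate.
                duplicates[key].append((file_name, position))
            else:
                first_seen[key] = (file_name, position)
    return first_seen, dict(duplicates)


batch = {
    "file1.mrc": [("ocm001", "record A"), ("ocm002", "record B")],
    "file2.mrc": [("ocm002", "record B again"), ("ocm003", "record C")],
}
first_seen, duplicates = find_duplicates_across_files(batch)
print(duplicates)
```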

Introduction of MarcEditor Editing Shortcuts

I’ve spent some time looking through the MarcEdit listserv over the past few weeks, and one of the things that I have noticed is that a lot of questions to the listserv revolve around regular expressions.  Generally, these are questions from catalogers who have used regular expressions in the past, but just need a little nudge to solve a problem.  That’s great, but I also notice that there are a few questions that come up a lot.  One of these questions revolves around character case within records (specifically titles).  So what I’ve done (and if it’s useful we’ll keep it; if it’s not, I can retire it quickly) is add a new menu entry in the MarcEditor/Tools menu called Edit Shortcuts.

image

As you can see from the screenshot, the first set of Edit Shortcuts that I’ve added deals with changing character case within records.  Essentially, these are shortcuts that initialize specific regular expressions for you over a defined set of MARC data (a field/subfield combination).  My hope is that people will find these shortcuts useful, and will suggest additional shortcuts that I can add to the program.  Note that, at this point, you cannot add these shortcuts to an Automation Task.  This is primarily because these shortcuts are virtual placeholders within the program; they are only meta-functions.  However, if people think that this would be useful, I’m certainly happy to go back and figure out a way to make these a part of the task automation function.
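To make the idea concrete, here is a rough Python sketch of what a case-changing shortcut amounts to: a regular expression applied to one subfield of one field.  It assumes records in MarcEdit’s mnemonic text format (lines such as `=245  10$a...`).  The function name and parameters are made up for illustration, and this is not the code behind the menu entries.

```python
import re


def change_subfield_case(line: str, tag: str, code: str, transform) -> str:
    """Hypothetical helper: apply `transform` to the data of subfield `code` in field `tag`."""
    # Leave every other field untouched.
    if not line.startswith("=" + tag):
        return line
    # Match the subfield delimiter plus code, then its data up to the next "$".
    pattern = re.compile(r"(\$" + re.escape(code) + r")([^$]*)")
    return pattern.sub(lambda m: m.group(1) + transform(m.group(2)), line)


title = "=245  10$aTHE GREAT GATSBY /$cF. Scott Fitzgerald."
print(change_subfield_case(title, "245", "a", str.lower))
print(change_subfield_case(title, "245", "a", str.title))
```

Passing the case function as a parameter keeps one pattern reusable for upper, lower, or title case, which is roughly what a family of related shortcuts would share.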

MODS=>RDF XSLT stylesheet added to the XSLT repository

I’ve started to look at ways to:

    1. Make the generation of linked data easier
    2. Provide tangible linked data examples from MARC

As part of that work, I’ve been working with a MODS=>RDF (linked data) example created by Stefano Mazzocchi in 2006, with edits.  Users interested in following that work, or playing with it themselves, can download the stylesheet from the MarcEdit XSLT repository.  Along the way, I’ve found one enhancement that I’ve started working on: the ability to chain XSLT processes together.  Currently, if you want to use this stylesheet from MARC, you will need to translate the data from MARC=>MODS, and then run a second process translating the data from MODS to RDF triples.  Ideally, I’d like to make that one step, so I’ll be spending some time looking at how that might be accomplished.

Getting the update

In addition to the updates listed above, I made a handful of minor changes to the program.  The majority of these changes are usability improvements or code optimizations, but they are there nevertheless.  If you want to get the update and you currently have MarcEdit, you can download the updated application through the automated updater found within MarcEdit, or you can get the update from:

    1. MarcEdit Website:  http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html
    2. Windows 32-bit download:  MarcEdit_Setup.msi
    3. Windows 64-bit download:  MarcEdit_Setup64.msi
    4. Alternative Windows/Linux/Mac Download:  marcedit.zip

Again, have a safe and merry Christmas everybody,

 

–TR

3 thoughts on “Merry Christmas–MarcEdit 5.7 Available”

  1. It would be great if you could make MarcEdit portable. At work, our computers are not often upgraded, and .NET 2.0 is installed. Our tech department won’t install 4.0 because many of our web-based applications could stop working. I’ve installed Mono on my USB drive with many other applications (Firefox, etc.). Could MarcEdit be configured to work with Mono for Windows on a USB drive?

    1. It probably can. You’d just need to use the zip install and run the program from the command line, using the mono path to start the program.