Nov 08, 2015

I’ve posted a new MarcEdit update.  You can get the builds directly from: or using the automated update tool within MarcEdit.  Direct links:

The change log follows:



MarcEdit Mac ChangeLog: 11/8/2015

MarcEdit Applications Changes:
* Build New Field Tool Added
** Added Build New Field Tool to the Task Manager
* Validate Headings Tool Added
* Extract/Delete Selected Records Tool Added

* Updates to Linked Data tool
** Added option to select oclc number for work id embedding
** Updated Task Manager signatures

* Edit Indicators
** Removed a blank space as legacy wildcard value.  Wildcards are now strictly “*”

* Merge Records Tool
** Updated User defined fields options to allow 776$w to be used (fields used as part of the MARC21 option couldn’t previously be redefined to act as a single match point)

* Results page will print UTF8 characters (always) if present

* Added an option so that, if selected, 880 fields will be sorted as part of their paired fields.

Z39.50 Client
* Supports Single and Batch Search Options

 Posted at 8:52 am
Nov 08, 2015

I’ve posted a new MarcEdit update.  You can get the builds directly from: or using the automated update tool within MarcEdit.  Direct links:

The change log follows:



MarcEdit Windows/Linux ChangeLog: 11/8/2015

MarcEdit Application Changes:
* Updates to the Build New Field Tool
** Code moved into meedit code library (for portability to the mac system)
** Separated options to provide an option to add new field only, add when not present, replace existing fields
** Updated Task Manager signatures — if you use this function in a task, you will need to update the task

* Updates to Linked Data tool
** Added option to select oclc number for work id embedding
** Updated Task Manager signatures
** Updated cmarcedit commandline options

* Edit Indicators
** Removed a blank space as legacy wildcard value.  Wildcards are now strictly “*”

* Merge Records Tool
** Updated User defined fields options to allow 776$w to be used (fields used as part of the MARC21 option couldn’t previously be redefined to act as a single match point)

* Results page will print UTF8 characters (always) if present

Validate ISBN/ISSN
* Results page now includes the 001 if present in addition to the record # in the file

* Added an option so that, if selected, 880 fields will be sorted as part of their paired fields.

* Added Sorting Preferences
* Added a new Options entry, shifting the place where the folder settings are set.

UI Improvements
* Various UI improvements made to better support Windows 10.

 Posted at 8:51 am
Oct 17, 2015

An interesting question came up on the ListServ this week – a user was wondering if a task could be created where data stored in the task was variable.  An example use-case might be a task where the Replace All values vary depending on which vendor file is being processed. 

By default, the Task Automation tool has been designed to be pretty much like a macro recorder.  You set values, and it simply uses those values.  However, at its core, the task automation tool is just a script engine – the tasks represent a simple set of commands that get interpreted by the automation engine.  Given that, it would be pretty easy to provide the ability to support user-defined values within a task.  So, I’m giving it a go.  I’ve defined a special mnemonic – {inputbox_[yourvalue]} – which can be defined within a task; when encountered, the task engine will prompt the user for data. 

The important part of the mnemonic – the part that tells the engine that user data is required – is the first part: {inputbox_.  When this statement is seen, the engine pauses and passes the command to the pre-processor.  The pre-processor looks at the start of the mnemonic, then pulls the data after the {inputbox_ to give the user a prompt describing the data being requested.  

For example, say the user is creating a Replace All task and the program should request data for both the Find and the Replace strings.  The mnemonic should look like the following for the Find expression: {inputbox_Find} and for the replace: {inputbox_Replace}. 


When run, the pre-parser, when coming across these values, will break them down and prompt the user for input:



The pre-parser will then substitute the user provided values into the task and process the data accordingly.  If the user cancels the dialog – the pre-parser will take that as an indication that this process should be skipped, and will move on to the next operation in the task. 
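The substitution behavior described above can be sketched roughly as follows (a hypothetical illustration, not MarcEdit’s actual implementation – names like `preprocess` are made up, and a cancel is signalled here by an empty response):

```python
import re

# Token pattern for the special mnemonic described above, e.g. {inputbox_Find}
TOKEN = re.compile(r"\{inputbox_([^}]+)\}")

def preprocess(command, prompt=input):
    """Substitute each {inputbox_Label} token with user-supplied text.

    Returns the substituted command, or None when the user cancels
    (signalled here by an empty response), meaning the task engine
    should skip this operation and move on to the next one.
    """
    result, pos = [], 0
    for m in TOKEN.finditer(command):
        value = prompt("Enter value for '%s': " % m.group(1))
        if value == "":              # cancel -> skip the whole operation
            return None
        result.append(command[pos:m.start()])
        result.append(value)
        pos = m.end()
    result.append(command[pos:])
    return "".join(result)
```

For example, a Replace All command containing `{inputbox_Find}` and `{inputbox_Replace}` would produce two prompts and then run with whatever the user entered.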

This change will be part of the next update.


 Posted at 8:18 pm
Oct 17, 2015

MarcEdit’s Validate Headings tool is getting a refresh to add a few missing elements.  Two new features are being added to the tool – the ability to automatically correct variants when they are detected, and the ability to automatically generate preliminary authority records for personal (100/700) records. 

The new interface looks like:



Example of a sample generated authority record:

=LDR  00000nz\a2200000o\4500
=008  151016n|\acannaabn\\\\\\\\\\|n\a|d\\\\||
=100  10$aWillson, Meredith,$d1902-
=670  \\$aWillson, Meredith,1902-. $bWhat every young musician should know.

The records are generated directly off the data in the record.  This means that if the heading is coded incorrectly (dates not in the $d, etc.), then the generated data will be as well – but this is a start.  You’ll notice that the data is coded as preliminary because these records are automatically generated and should probably be evaluated at some point.
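As a rough sketch, deriving a preliminary stub like the sample above from a bib 100 heading might look like this (hypothetical helpers, not MarcEdit’s code; the 008 is copied verbatim from the sample record):

```python
import re

def strip_subfields(field):
    """'10$aWillson, Meredith,$d1902-' -> 'Willson, Meredith,1902-'
    (drops the indicators and subfield codes, keeps the text)."""
    parts = re.split(r"\$[a-z0-9]", field)
    return "".join(parts[1:])

def make_preliminary_authority(heading_100, cited_title):
    """Build a preliminary authority stub (mnemonic format) from a
    bib 100 heading, mirroring the sample record above."""
    return "\n".join([
        r"=LDR  00000nz\a2200000o\4500",
        r"=008  151016n|\acannaabn\\\\\\\\\\|n\a|d\\\\||",
        "=100  " + heading_100,
        r"=670  \\$a" + strip_subfields(heading_100) + ". $b" + cited_title + ".",
    ])
```

Note that, just as described above, a mis-coded bib heading (dates outside the $d, etc.) flows straight into the generated stub.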


 Posted at 2:00 pm
Sep 26, 2015

After about a month of working with the headings validation tool, I’m ready to start adding a few enhancements to provide some automated headings corrections.  The first change to be implemented will be automatic correction of headings where the preferred heading differs from the in-use heading.  This will be implemented as an optional element.  If this option is selected, the report will continue to note variants as part of the validation report – but when exporting data for further processing, automatically corrected headings will not be included in the record sets for further action.


Additionally, I’ll continue looking at ways to improve the speed of the process.  While there are some limits to what I can do since this tool relies on a web service (outside of providing an option for users to download the ~10GB worth of LC data locally), there are a few things I can do to continue to ensure that only new items are queried when resolving links.

These changes will be made available on the next update.


 Posted at 2:24 pm

MarcEdit 6.1 (Windows/Linux)/MarcEdit Mac (1.1.25) Update

Sep 07, 2015

So, this update is a bit of a biggie.  If you are a Mac user, the program officially moves out of the Preview and into release.  This version brings the following changes:

** 1.1.25 ChangeLog

  • Bug Fix: MarcEditor — changes may not be retained after save if you make manual edits following a global update.
  • Enhancement: Delimited Text Translator completed.
  • Enhancement: Export Tab Delimited complete
  • Enhancement: Validate Headings Tool complete
  • Enhancement: Build New Field Tool Complete
  • Enhancement: Build New Field Tool added to the Task Manager
  • Update: Linked Data Tool — Added Embed OCLC Work option
  • Update: Linked Data Tool — Enhance pattern matching
  • Update: RDA Helper — Updated for parity with the Windows Version of MarcEdit
  • Update: MarcValidator — Enhancements to support better checking when looking at the mnemonic format.

If you are on the Windows/Linux version – you’ll see the following changes:

* 6.1.60 ChangeLog

  • Update: Validate Headings — Updated patterns to improve the process for handling heading validation.
  • Enhancement: Build New Field — Added a new global editing tool that provides a pattern-based approach to building new field data.
  • Update: Added the Build New Field function to the Task Management tool.
  • UI Updates: Specific to support Windows 10.

The Windows update is a significant one.  A lot of work went into the Validate Headings function, which impacts the Linked Data tools and the underlying linked data engine.  Additionally, the Build New Fields tool provides a new global editing function that should simplify complex edits.  If I can find the time, I’ll try to put together a YouTube video demoing the process.

You can get the updates from the MarcEdit downloads page: or, if you have MarcEdit configured to check for automated updates, the tool will notify you of the update and provide a method for you to download it.

If you have questions – let me know.


 Posted at 7:23 pm

MarcEdit Mac–Release Version 1 Notes

Sep 06, 2015

This has been a long time coming – it has taken countless hours and the generosity of a great number of people to test and provide feedback (not to mention the folks that crowd-sourced the purchase of a Mac) – but MarcEdit’s Mac version is coming out of Preview and will be made available for download on Labor Day.  I’ll be putting together a second post officially announcing the new versions (all versions of MarcEdit are getting an update over Labor Day), so if this interests you – keep an eye out.

So exactly what is different from the Preview versions?  Well, at this point, I’ve completed all the functions identified for the first set of development tasks – and then some.  New to this version will be the new Validate Headings tool just added to the Windows version of MarcEdit, the new Build New Field utility (and inclusion into the Task Automation tool), updates to the Editor for performance, updates to the Linking tool due to the validator, inclusion of the Delimited Text Translator and the Export Tab Delimited Text Translator – and a whole lot more.

At this point, the build is made and the tests have been run – so keep an eye out tomorrow – I’ll definitely be making it available before the Ohio State/Virginia Tech football game (because everything is going to stop here once that comes on).

To everyone that has helped along the way, providing feedback and prodding – thanks for the help.  I’m hoping that the final result will be worth the wait and be a nice addition to the MarcEdit family.  And of course, this doesn’t end the development on the Mac – I have 3 additional sprints planned as I work towards functional parity with the Windows version of MarcEdit.


 Posted at 7:45 pm
Aug 23, 2015

Last week, I posted an update that included the early implementation of the Validate Headings tool.  After a week of testing, feedback and refinement, I think that the tool now functions in a way that will be helpful to users.  So, let me describe how the tool works and what you can expect when the tool is run.


The Validate Headings tool was added as a new report to the MarcEditor to enable users to take a set of records and get back a report detailing how many records had corresponding Library of Congress authority headings.  The tool was designed to validate data in the 1xx, 6xx, and 7xx fields.  The tool has been set to only query headings and subjects that utilize the LC authorities.  At some point, I’ll look to expand to other vocabularies.

How does it work

Presently, this tool must be run from within the MarcEditor – though at some point in the future, I’ll extract it from the MarcEditor and provide a standalone function and an integration with the command-line tool.  Right now, to use the function, you open the MarcEditor and select the Reports/Validate Headings menu.


Selecting this option will open the following window:


Options – you’ll notice 3 options available to you.  The tool allows users to decide which values they would like to have validated.  They can select names (1xx, 600, 610, 611, 7xx) or subjects (6xx).  Please note, when you select names, the tool does look up the 600, 610, 611 as part of the process because the validation of these subjects occurs within the name authority file.  The last option deals with the local cache.  As MarcEdit pulls data from the Library of Congress, it caches the data that it receives so that it can use it on subsequent headings validation checks.  The cache will be used until it expires in 30 days…however, a user can check this option at any time and MarcEdit will delete the existing cache and rebuild it during the current data run. 

Couple things you’ll also note on this screen. There is an extract button and it’s not enabled.  Once the Validate report is run, this button will become enabled if there are any records that are identified as having headings that could not be validated against the service. 

Running the Tool:

A couple of notes about running the tool.  When you run the tool, you are asking MarcEdit to process your data file and query the Library of Congress for information related to the authorized terms in your records.  As part of this process, MarcEdit sends a lot of data back and forth to the Library of Congress utilizing the service.  The tool attempts to use a light touch, only pulling down headings for a specific request – but do realize that a lot of data requests are generated through this function.  You can estimate approximately how many requests might be made on a specific file by using the following formula: (number of records x 2) + (number of records), assuming that most records will have 1 name and 2 subjects to authorize per record.  So a file with 2500 records would generate ~7500 requests to the Library of Congress.  Now, this is just a guess; in my tests, I’ve had some sets generate as many as 12,000 requests for 2500 records and as few as 4000 requests for 2500 records – but 7500 tended to be within 500 requests in most test files.
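The estimate above works out to roughly three requests per record; as a quick sketch (the per-record defaults are the rough averages assumed in the text, not measured values):

```python
def estimate_requests(n_records, names_per_record=1, subjects_per_record=2):
    """Rough request estimate per the formula above:
    (records x 2 subjects) + (records x 1 name) = ~3 requests/record."""
    return n_records * (names_per_record + subjects_per_record)
```

So `estimate_requests(2500)` gives 7500, matching the ~7500 figure for a 2500-record file; real files vary widely, as noted.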

So why do we care?  Well, this report has the potential to generate a lot of requests to the Library of Congress’s identifier service – and while I’ve been told that there shouldn’t be any issues with this – I think that question won’t really be known until people start using it.  At the same time – this function won’t come as a surprise to the folks at the Library of Congress – as we’ve spoken a number of times during the development.  At this point, we are all kind of waiting to see how popular this function might be, and if MarcEdit usage will create any noticeable up-tick in the service usage.

Validation Results:

When you run the validation tool, the program will go through each record, making the necessary validation requests of the LC ID service.  When the service has completed, the user will receive a report with the following information:

Validation Results:
Process completed in: 121.546001431667 minutes. 
Average Response Time from LC: 0.847667984420415
Total Records: 2500
Records with Invalid Headings: 1464
1xx Headings Found: 1403
6xx Headings Found: 4106
7xx Headings Found: 1434
1xx Headings Not Found: 521
6xx Headings Not Found: 1538
7xx Headings Not Found: 624
1xx Variants Found: 6
6xx Variants Found: 1
7xx Variants Found: 3
Total Unique Headings Queried: 8604
Found in Local Cache: 1001

This represents the header of the report.  I wanted users to be able to quickly, at a glance, see what the Validator determined during the course of the process.  From here, I can see a couple of things:

  1. The tool queried a total of 2500 records
  2. Of those 2500 records, 1464 had at least one heading that was not found
  3. Within those 2500 records, 8604 unique headers were queried
  4. Within those 2500 records, there were 1001 duplicate headings across records (these were not duplicate headings within the same record, but for example, multiple records with the same author, subject, etc.)
  5. We can see how many Headings were found by the LC ID service within the 1xx, 6xx, and 7xx blocks
  6. Likewise, we can see how many headings were not found by the LC ID service within the 1xx, 6xx, and 7xx blocks.
  7. We can see number of Variants as well.  Variants are defined as names that resolved, but that the preferred name returned by the Library of Congress didn’t match what was in the record.  Variants will be extracted as part of the records that need further evaluation.
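The found / variant / not-found distinction above can be sketched in a few lines (hypothetical helpers – `lc_lookup` stands in for the LC ID service query, and the normalization here is a loose stand-in for MarcEdit’s actual rules):

```python
def normalize(term):
    """Loose normalization for comparison (lowercase, collapse spaces)."""
    return " ".join(term.lower().split())

def classify_heading(term, lc_lookup):
    """Classify a heading as 'found', 'variant', or 'not found'.

    lc_lookup is a callable returning the LC preferred label for a
    heading, or None when the heading does not resolve at all.
    """
    preferred = lc_lookup(term)
    if preferred is None:
        return "not found", None
    if normalize(preferred) != normalize(term):
        return "variant", preferred   # resolved, but preferred label differs
    return "found", preferred
```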

After this summary of information, the Validation report returns information related to the record # (record number count starts at zero) and the headings that were not found.  For example:

Record #0
Heading not found for: Performing arts--Management--Congresses
Heading not found for: Crawford, Robert W

Record #5
Heading not found for: Social service--Teamwork--Great Britain

Record #7
Heading not found for: Morris, A. J

Record #9
Heading not found for: Sambul, Nathan J

Record #13
Heading not found for: Opera--Social aspects--United States
Heading not found for: Opera--Production and direction--United States

The current report format includes specific information about the heading that was not found.  If the value is a variant, it shows up in the report as:

Record #612
Term in Record: bible.--criticism, interpretation, etc., jewish
LC Preferred Term: Bible. Old Testament--Criticism, interpretation, etc., Jewish
Heading not found for: Bible.--Criticism, interpretation, etc

Here you see – the report returns the record number, the normalized form of the term as queried, the current LC Preferred term, and the URL to the term that’s been found.

The report can be copied and placed into a different program for viewing or can be printed (see buttons).


To extract the records that need work, minimize or close this window and go back to the Validate Headings Window.  You will now see two new options:


First, you’ll see that the Extract button has been enabled.  Click this button, and all the records that have been identified as having headings in need of work will be exported to the MarcEditor.  You can now save this file and work on the records. 

Second, you’ll see the new link – save delimited.  Click on this link, and the program will save a tab delimited copy of the validation report.  The report will have the following format:

Record ID [tab] 1xx [tab] 6xx [tab] 7xx [new line]

Within each column, multiple values will be delimited by a colon – so if two 1xx headings appear in a record, the current process will create a single column with the headings separated by a colon, like: heading 1:heading 2. 
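Reading the format back in is straightforward; a sketch of a row parser for the layout described above (hypothetical helper, not part of MarcEdit):

```python
def parse_report_row(line):
    """Parse one row of the save-delimited report:
    Record ID [tab] 1xx [tab] 6xx [tab] 7xx, with multiple headings
    inside a column separated by colons."""
    record_id, h1, h6, h7 = line.rstrip("\n").split("\t")
    split = lambda col: col.split(":") if col else []
    return {"id": record_id, "1xx": split(h1), "6xx": split(h6), "7xx": split(h7)}
```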

Future Work:

This function required making a number of improvements to the linked data components – and because of that, the linking tool should work better and faster now.  Additionally, because of the variant work I’ve done, I’ll soon be adding code that will give the user the option to update headings for variants as the report or the linking tool is running – and I think that is pretty cool.  If you have other ideas or find that this is missing a key piece of functionality – let me know.


 Posted at 7:16 pm

MarcEdit 6 Wireframes — Validating Headings

Aug 09, 2015

Over the last year, I’ve spent a good deal of time looking for ways to integrate many of the growing linked data services into MarcEdit.  These services, mainly revolving around vocabularies, provide some interesting opportunities for augmenting our existing MARC data, or enhancing local systems that make use of these particular vocabularies.  Examples like those at the Bentley ( are real-world demonstrations of how computers can take advantage of these endpoints when they are available.

In MarcEdit, I’ve been creating and testing linking tools for close to a year now, and one of the areas I’ve been waiting to explore is whether libraries can utilize linking services to build their own authorities workflows.  Conceptually, it should be possible – the necessary information exists…it’s really just a matter of putting it together.  So, that’s what I’ve been working on.  Utilizing the linked data libraries found within MarcEdit, I’ve been working to create a service that will help users identify invalid headings and records where those headings reside.

Working Wireframes

Over the last week, I’ve prototyped this service.  The way that it works is pretty straightforward.  The tool extracts the data from the 1xx, 6xx, and 7xx fields, and if they are tagged as being LC controlled, it queries the service to see what information can be learned about the heading.  Additionally, since this tool is designed for work in batch, there is a high likelihood that headings will repeat – so MarcEdit generates a local cache of headings as well – this way it can check against the local cache rather than the remote service when possible.  The local cache will grow constantly, with materials set to expire after a month.  I’m still toying with what to do with the local cache, expirations, and what the best way to keep it in sync might be.  I’d originally considered pulling down the entire LC names and subjects headings – but for a desktop application, this didn’t make sense.  Together, these files, uncompressed, consume GBs of data; within an indexed database, this would continue to be true.  And again, this file would need to be updated regularly.  So, I’m looking for an approach that will give some local caching without the need to make the user download and manage huge data files.
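A local cache with a one-month expiry, as described above, might look something like this sketch (a hypothetical design – MarcEdit’s internal cache is not necessarily SQLite-based):

```python
import sqlite3
import time

THIRTY_DAYS = 30 * 24 * 3600  # expiry window, in seconds

class HeadingCache:
    """Local heading cache with ~30-day expiry."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS headings "
                        "(term TEXT PRIMARY KEY, preferred TEXT, fetched REAL)")

    def get(self, term, now=None):
        """Return the cached preferred label, or None if the heading is
        missing or expired (meaning the remote LC service must be hit)."""
        now = time.time() if now is None else now
        row = self.db.execute("SELECT preferred, fetched FROM headings "
                              "WHERE term = ?", (term,)).fetchone()
        if row and now - row[1] < THIRTY_DAYS:
            return row[0]
        return None

    def put(self, term, preferred, now=None):
        """Store (or refresh) a heading with the current timestamp."""
        now = time.time() if now is None else now
        self.db.execute("INSERT OR REPLACE INTO headings VALUES (?, ?, ?)",
                        (term, preferred, now))
        self.db.commit()
```

The point of the design is that the cache only ever answers for fresh entries; anything missing or stale falls through to a remote query, so the user never has to download or manage the full LC data files.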

Anyway – the function is being implemented as a Report.  Within the Reports menu in the MarcEditor, you will eventually find a new item titled Validate Headings.


When you run the Validate Headings tool, you will see the following window:


You’ll notice that there is a Source file.  If you come from the MarcEditor, this will be prepopulated.  If you come from outside the MarcEditor, you will need to define the file that is being processed.  Next, you select the elements to authorize, then click Process.  The Extract button will initially be disabled until after the data run.  Once completed, users can extract the records with invalid headings.

When completed, you will receive the following report:


This includes the total processing time, average response from LC’s service, total number of records, and the information about how the data validated.  Below, the report will give you information about headings that validated, but were variants.  For example:

Record #846
Term in Record: Arnim, Bettina Brentano von, 1785-1859
LC Preferred Term: Arnim, Bettina von, 1785-1859

This would be marked as an invalid heading, because the data in the record is incorrect.  But the reporting tool will provide back the preferred LC label so the user can then see how the data should currently be structured.  Actually, now that I’m thinking about it – I’ll likely include one more value – the URI to the dataset, so you can go to the authority file page directly from this report.

This report can be copied or printed – and as I noted, when this process is finished, the Extract button is enabled so the user can extract the data from the source records for processing.

Couple of Notes

So, this process takes time to run – there just isn’t any way around it.  For this set, there were 7702 unique items queried.  Each request from LC averaged 0.28 seconds.  In my testing, depending on the time of day, I’ve found that the response rate can run between 0.20 seconds per request and 1.2 seconds per request.  None of those times are bad individually, but taken in aggregate against 7700 queries – it adds up.  If you do the math, 7702*0.2 = 1540 seconds just to ask for the data.  Divide that by 60 and you get 25.6 minutes.  The total processing time means that there are 11 minutes of “other” things happening here.  My guess is that the other 11 minutes is being eaten up by local lookups, character conversions (since LC requests UTF8 and my data was in MARC8) and data normalization.  Since there isn’t anything I can do about the latency between the user and the LC site, I’ll be working over the next week to try to remove as much local processing time from the equation as possible.

Questions – let me know.


 Posted at 7:44 am
Aug 02, 2015

MarcEdit Mac users, a new preview update has been made available.  This is getting pretty close to the first “official” version of the Mac version.  And for those that may have forgotten, the preview designation will be removed on Sept. 1, 2015.

So what’s been done since the last update?  Well, I’ve pretty much completed the last of the work that was scheduled for the first official release.  At this point, I’ve completed all the planned work on the MARC Tools and the MarcEditor functions.  For this release, I’ve completed the following:

** 1.0.9 ChangeLog

  • Bug Fix: Opening Files — you cannot select any files but a .mrc extension. I’ve changed this so the open dialog can open multiple file types.
  • Bug Fix: MarcEditor — when resizing the form, the filename in the status can disappear.
  • Bug Fix: MarcEditor — when resizing, the # of records per page moves off the screen.
  • Enhancement: Linked Data Records — Tool provides the ability to embed URI endpoints to the end of 1xx, 6xx, and 7xx fields.
  • Enhancement: Linked Data Records — Tool has been added to the Task Manager.
  • Enhancement: Generate Control Numbers — globally generates control numbers.
  • Enhancement: Generate Call Numbers/Fast Headings — globally generates call numbers/fast headings for selected records.
  • Enhancement: Edit Shortcuts — added back the tool to enable Record Marking via a comment.

Over the next month, I’ll be working on trying to complete four other components prior to the first “official” release Sept. 1.  This means that I’m anticipating at least 1, maybe 2 more large preview releases before Sept. 1, 2015.  The four items I’ll be targeting for completion will be:

  1. Export Tab Delimited Records Feature — this feature allows users to take MARC data and create delimited files (often for reporting or loading into a tool like Excel).
  2. Delimited Text Translator — this feature allows users to generate MARC records from a delimited file.  The Mac version will not, at least initially, be able to work with Excel or Access data.  The tool will be limited to working with delimited data.
  3. Update Preferences windows to expose MarcEditor preferences
  4. OCLC Metadata Framework integration…specifically, I’d like to re-integrate the holdings work and the batch record download.

How do you get the preview?  If you have the current preview installed, just open the program and, as long as you have notifications turned on, the program will notify you that an update is available.  Download the update and install the new version.  If you don’t have the preview installed, just go to: and select the Mac app download.

If you have any questions, let me know.


 Posted at 4:42 pm