MarcEdit 6 Update

 Uncategorized  Comments Off
Nov 272014

Happy Thanksgiving to those celebrating.  Rather than over indulging in food, my family and I spent our day relaxing and enjoying some down time together.  After everyone went to bed, I had a little free time and decided to wrap up the update I’ve been working on.  This update includes the following changes:

  • Language File changes.
  • Export/Delete Selected Records: UI changes
  • Biblinker — updated the tool to provide support for linking to FAST headings when available in the record
    • Updated the fields processed (targeted to ignore uncontrolled or local items)
  • Z39.50 Client — Single Search, multiple databases selected bug when number of results exceed data limit, blank data would be returned.
  • RDA Helper Bug Fix — Updated an error where under certain conditions, bracketed data would be incorrectly parsed.
  • Miscellaneous UI changes to support language changes


The Language file changes represent a change in how internationalization of the interface works.  Master language files are now hosted on GItHub, with new files added on update.  The language files are automatically generated, so they are not as good as if they were done by an individual – though some individuals are looking at the files and providing updates.  My hope is that through this process of automated language generation, coupled with human intervention, the new system will significantly help non-English speakers.  But I guess time will tell.

The download can be found by using the automated update tool in MarcEdit, or downloading the update from:

 Posted by at 10:27 pm
Oct 162014

As libraries begin to join and participate in systems to test Bibframe principles, my hope is that when possible, I can provide support through MarcEdit to provide these communities a conduit to simplify the publishing of information into those systems.  The first of these test systems is the Libhub Initiative, and working with Eric Miller and the really smart folks at Zepheira (, have created a plug-in specifically for libraries and partners working with the LibHub initiative.  The plug-in provides a mechanism to publish a variety of metadata formats into the system – from MARC, MARCXML, EAD, and MODS data – the process will hopefully help users contribute content and help spur discussion around the data model Zepheira is employing with this initiative.

For the time being, the plug-in is private, and available to any library currently participating in the LibHub project.  However, my understanding is that as they continue to ramp up the system, the plugin will be made available to the general community at large.

For now, I’ve published a video talking about the plug-in and demonstrating how it works.  If you are interested, you can view the video on YouTube.



 Posted by at 8:19 pm
Oct 162014

We hear the refrain over and over – we live in a global community.  Socially, politically, economically – the ubiquity of the internet and free/cheap communications has definitely changed the world that we live in.  For software developers, this shift has definitely been felt as well.  My primary domain tends to focus around software built for the library community, but I’ve participated in a number of open source efforts in other domains as well, and while it is easier than ever to make one’s project/source available to the masses, efforts to localize said projects is still largely overlooked.  And why?  Well, doing internationalization work is hard and often times requires large numbers of volunteers proficient in multiple languages to provide quality translations of content in a wide range of languages.  It also tends to slow down the development process and requires developers to create interfaces and inputs that support language sets that they themselves may not be able to test or validate.   


If your project team doesn’t have the language expertise to provide quality internalization support, you have a variety of options available to you (with the best ones reserved for those with significant funding).  These range of tools available to open source projects like: TranslateWiki ( which provides a platform for volunteers to participate in crowd-sourced translation services.  There are also some very good subscription services like Transifex (, a subscription service that again, works as both a platform and match-making service between projects and translators.  Additionally, Amazon’s Mechanical Turk can be utilized to provide one off translation services at a fairly low cost.  The main point though, is that services do exist that cover a wide spectrum in terms of cost and quality.   The challenge of course, is that many of the services above require a significant amount of match-making, either on the part of the service or the individuals involved with the project and oftentimes money.  All of this ultimately takes time, sometimes a significant amount of time, making it a difficult cost/benefit analysis of determining which languages one should invest the time and resources to support.

Automated Translation

This is a problem that I’ve been running into a lot lately.  I work on a number of projects where the primary user community hails largely from North America; or, well, the community that I interact with most often are fairly English language centric.  But that’s changing — I’ve seen a rapidly growing international community and increasing calls for localized versions of software or utilities that have traditionally had very niche audiences. 

I’ll use MarcEdit ( as an example.  Over the past 5 years, I’ve seen the number of users working with the program steadily increase, with much of that increase coming from a growing international user community.  Today, 1/3-1/2 of each month’s total application usage comes from outside of North America, a number that I would have never expected when I first started working on the program in 1999.  But things have changed, and finding ways to support these changing demographics are challenging.. 

In thinking about ways to provide better support for localization, one area that I found particularly interesting was the idea of marrying automated language transcription with human intervention.  The idea being that a localized interface could be automatically generated using an automated translation tool to provide a “good enough” translation, that could also serve as the template for human volunteers to correct and improve the work.  This would enable support for a wide range of languages where English really is a barrier but no human volunteer has been secured to provide localized translation; but would enable established communities to have a “good enough” template to use as a jump-off point to improve and speed up the process of human enhanced translation.  Additionally, as interfaces change and are updated, or new services are added, automated processes could generate the initial localization, until a local expert was available to provide the high quality transcription of the new content, to avoid slowing down the development and release process.

This is an idea that I’ve been pursing for a number of months now, and over the past week, have been putting into practice.  Utilizing Microsoft’s Translation Services, I’ve been working on a process to extract all text strings from a C# application and generate localized language files for the content.  Once the files have been generated, I’ve been having the files evaluated by native speakers to comment on quality and usability…and for the most part, the results have been surprising.  While I had no expectation that the translations generated through any automated service would be comparable to human mediated translation, I was pleasantly surprised to hear that the automated data is very often, good enough.  That isn’t to say that it’s without its problems, there are definitely problems.  The bigger question has been, do these problems impede the use of the application or utility.  In most cases, the most glaring issue with the automated translation services has been context.  For example, take the word Score.  Within the context of MarcEdit and library bibliographic description, we know score applies to musical scores, not points scored in a game…context.  The problem is that many languages do make these distinctions with distinct words, and if the translation service cannot determine the context, it tends to default to the most common usage of a term – and in the case of library bibliographic description, that would be often times incorrect.  It’s made for some interesting conversations with volunteers evaluating the automated translations – which can range from very good, to down right comical.  But by a large margin, evaluators have said that while the translations were at times very awkward, they would be “good enough” until someone could provide better a better translation of the content.  And what is more, the service gets enough of the content right, that it could be used as a template to speed the translation process.  And for me, this is kind of what I wanted to hear.

Microsoft’s Translation Services

There really aren’t a lot of options available for good free automated translation services, and I guess that’s for good reason.  It’s hard, and requires both resources and adequate content to learn how to read and output natural language.  I looked hard at the two services that folks would be most familiar with: Google’s Translation API ( and Microsoft’s translation services (  When I started this project, my intention was to work with Google’s Translation API – I’d used it in the past with some success, but at some point in the past few years, Google seems to have shut down its free API translation services and replace them with a more traditional subscription service model.  Now, the costs for that subscription (which tend to be based on number of characters processed) is certainly quite reasonable, my usage will always be fairly low and a little scattershot making the monthly subscription costs hard to justify.  Microsoft’s translation service is also a subscription based service, but it provides a free tier that supports 2 million characters of through-put a month.  Since that more than meets my needs, I decided to start here. 

The service provides access to a wide range of languages, including Klingon (Qo’noS marcedit qaStaHvIS tlhIngan! nuq laH ‘oH Dunmo’?), which made working with the service kind of fun.  Likewise, the APIs are well-documented, though can be slightly confusing due to shifts in authentication practice to an OAuth Token-based process sometime in the past year or two.  While documentation on the new process can be found, most code samples found online still reference the now defunct key/secret key process.

So how does it work?  Performance-wise, not bad.  In generating 15 language files, it took around 5-8 minutes per file, with each file requiring close to 1600 calls against the server, per file.  As noted above, accuracy varies, especially when doing translations of one word commands that could have multiple meanings depending on context.  It was actually suggested that some of these context problems may actually be able to be overcome by using a language other than English as the source, which is a really interesting idea and one that might be worth investigating in the future. 

Seeing how it works

If you are interested in seeing how this works, you can download a sample program which pulls together code copied or cribbed from the Microsoft documentation (and then cleaned for brevity) as well as code on how to use the service from:–Language-Translator.  I’m kicking around the idea of converting the C# code into a ruby gem (which is actually pretty straight forward), so if there is any interest, let me know.


 Posted by at 6:13 pm
Oct 072014

Here’s a snapshot of the server log data as reported through Awstats for the subdomain. 

Server log stats for Sept. 2014:

  • Logged MarcEdit uses: ~190,000
  • Unique Users: ~17,000
  • Bandwidth Used: ~14 GB

Top 10 Countries by Bandwidth:

  1. United States
  2. Canada
  3. China
  4. India
  5. Australia
  6. Great Britain
  7. Mexico
  8. Italy
  9. Spain
  10. Germany

Countries by Use (with at least 100+ reported uses)


United States










Great Britain


















New Zealand






Russian Federation


Hong Kong












Saudi Arabia
















Czech Republic


















El Salvador








European country






United Arab Emirates


South Africa












South Korea
















Slovak Republic






Costa Rica






















Republic of Serbia








Sri Lanka


Puerto Rico


Dominican Republic




















Ivory Coast (Cote D’Ivoire)












Papua New Guinea






Netherlands Antilles










Palestinian Territories


Aland islands















 Posted by at 7:23 pm
Oct 072014

I sent this note to the MarcEdit listserv late last night, early this morning, but forgot to post here.  Over the weekend, the Ohio State University Libraries hosted our second annual hackaton on the campus.  It’s been a great event, and this year, I had one of the early morning shifts (12 am-5 am) so I decided to use the time to do a little hacking myself.  Here’s a list of the changes:

  • Bug Fix: Merge Records Function: When processing using the control number option (or MARC21 primarily utilizing control numbers for matching) the program could merge incorrect data if large numbers of merged records existed without the data specified to be merged.  The tool would pull data from the previous record used and add that data to the matches.  This has been corrected.
  • Bug Fix: Network Task Directory — this tool was always envisioned as a tool that individuals would point to when an existing folder existed.  However, if the folder doesn’t exist prior to pointing to the location, the tool wouldn’t index new tasks.  This has been fixed.
  • Bug Fix: Task Manager (Importing new tasks) — When tasks were imported with multiple referenced task lists, the list could be unassociated from the master task.  This has been corrected.
  • Bug Fix:  If the plugins folder doesn’t exist, the current Plugin Manager doesn’t create one when adding new plugins.  This has been corrected.
  • Bug Fix: MarcValidator UI issue:  When resizing the form, the clipboard link wouldn’t move appropriately.  This has been fixed.
  • Bug Fix: Build Links Tool — relator terms in the 1xx and 7xx field were causing problems.  This has been corrected.
  • Bug Fix: RDA Helper: When parsing 260 fields with multiple copyright dates, the process would only handle one of the dates.  The process has been updated to handle all copyright values embedded in the 260$c.
  • Bug Fix: SQL Explorer:  The last build introduced a regression error so that when using the non-expanded SQL table schema, the program would crash.  This has been corrected.
  • Enhancement:  SQL Explorer expanded schema has been enhanced to include a column id to help track column value relationships.
  • Enhancement: Z39.50 Cataloging within the MarcEditor — when selecting the Z39.50/SRU Client, the program now seemlessly allows users to search using the Z39.50 client and automatically load the results directly into the open MarcEditor window.

Two other specific notes.  First, a few folks on the listserv have noted trouble getting MarcEdit to run on a Mac.  The issue appears to be MONO related.  Version 3.8.0 appears to have neglected to include a file in the build (which caused GUI operations to fail), and 3.10.0 brings the file back, but there was a build error with the component so the issue continues.  The problems are noted in their release notes as known issues and the bug tracker seems to suggest that this has been corrected in the alpha channels, but that doesn’t help anyone right now.  So, I’ve updated the Mac instruction to include a link to MONO 3.6.0, the last version tested as a stand alone install that I know works.  From now on, I will include the latest MONO version tested, and a link to the runtime to hopefully avoid this type of confusion in the future.

Second – I’ve created a nifty plugin related to the LibHub project.  I’ve done a little video recording and will be making that available shortly.  Right now, I’m waiting on some feedback.  The plugin will be initially released to LibHub partners to provide a way for them to move any data into the project for evaluation – but hopefully in time, it will be able to be more made more widely available.

Updates can be downloaded automatically via MarcEdit, or can be found at:

Please remember, if you are running a very old copy of MarcEdit 5.8 or lower, it is best practice to uninstall the application prior to installing 6.0.



 Posted by at 6:47 am

MarcEdit 6.0 Update

 MarcEdit  Comments Off
Sep 222014

This update is coming a little later than I’d hoped, but I’ve been busying myself with a couple of projects that have been consuming some of my off hours.  Today’s update deals with a handful of issues, as well as provides some new functionality. 


  • Bug Fix: Edit Field Function: Field recursion switch (/r) was broken in the last update.  This has been corrected.
  • Enhancement: Edit Field Function: LDR editing support has been added to the function.
  • Enhancement: MarcEditor: Keyboard shortcut for jump to page and jump to records have been added.
  • Enhancement: RDA Helper:  Added a new option to the 260/264 translation that enables users to always utilize a copyright or phonograph symbol.
  • Enhancement: RDA Helper:  Updated the RDA Helper to support the manufacturer or distributor subfields.  When the program encounters these in the 260, the appropriate 264 with second indicator 2 or 3 will be created.
  • Enhancement: RDA Helper:  The new option has been added to the task list.
  • Enhancement: Linked Records Tool:  I’ve added a new option to the Linked Records to allowing the program to embed $0 links to VIAF. 
  • Enhancement: MARCSplit:  The save directory now automatically sets to the desktop rather than the root drive.

You can get the updates via MarcEdit’s automated update tool or at:


 Posted by at 9:39 pm
Sep 022014

I’ve just posted a new update to MarcEdit.  In addition to fixing the following three issues:

  • Check URL crashes when running…this has been fixed.
  • Delimited Text Translator doesn’t show finishing message…fixed
  • Debugging messagebox shows when processing mnemonic files not using MarcEdit’s documented format.

In addition to these three bug fixes, MarcEdit is including a new tool called MARCNext for testing BibFrame principles. Please note, the BibFrame Testbed currently *does not* work on the MAC platform under MONO.  This is due to an incompatibility in the current version of saxon with the runtime.  It appears that downgrading the version will correct the problem, but I need to make sure there are not any unforeseen issues.  I’ll be working to correct this during the week.

I’ve recorded a couple videos documenting the new functionality.  You can find there here:

You can download the update via MarcEdit’s automated update tool or view the MarcEdit downloads page at:


 Posted by at 7:58 pm
Aug 252014

As I noted in my last post (, I’ll be adding a new area to the MarcEdit application called MARCNext.  This will be used to expose a number of research tools for users interested in working with BibFrame data.  In addition to the BibFrame Testbed, I’ll also be releasing a JSON Object Viewer.  The JSON Object Viewer is a specialized viewer designed to parse JSON text and provide an object visualization of the data.  The idea is that this tool could be utilized to render MARC data translated into Bibframe as JSON for easy reading.  However, I’m sure that there will be other uses as well.  I’ve tried to keep the interface simple.  Essentially, you point the tool at a JSON file and the tool will render the file as objects.  From there, you can search and query the data, view the JSON file in Object or Plain text mode, and ultimately, copy data for use elsewhere. 


Some additional testing needs to be done to make sure the program works well when coming across poorly formed data – but this tool will be a part of the next update.


 Posted by at 9:31 pm
Aug 232014

While developing MarcEdit 6, one of the areas that I spent a significant amount of time working on was the MarcEdit Research Toolkit.  The Research Toolkit is an easter egg of sorts – it’s a set of tools and utilities that I’ve developed to support my own personal research interests around library metadata – specifically, around the future of library metadata including topics the current BibFrame testing and linked data.  I’ve kept these tools private because they tend to not be fully realized concepts or ideas and have very little in the way of a user interface.  Just as important, many of these tools represent work being created to engage in the conversation that the library community is having around library metadata formats and standards, so things can and do change or drop out of the conversation and are then removed from my toolkit.

While developing MarcEdit 6, one of the goals of the project was to find a way to make some or parts of these tools available to the general MarcEdit community.  To that end, I’ll be making a new area available within MarcEdit called MARCNext.  MARCNext will provide a space to make proof of concept tools available for anyone to use, and offer a simple to use interface that anyone can use to test new bibliographic concepts like BibFrame. 

Presently, I’m evaluating my current workbench to see which of the available tools can be made public.  I have a handful that I think may be applicable – but will need some time to move them from concept to a utility for public consumption.  With that said, I will be making one tool immediately available as part of the next MarcEdit update, and that will be the BibFrame Testbed.  This is code that utilizes the LC XQuery files being developed and distributed at: with a handful of changes made to provide better support within MarcEdit.  These are my base files that will enable librarians to easily model their MARC metadata in a variety of serializations.  And using this initial work, I’ll likely add some additional serializations to the list. 

I have two goals for making this particular tool available.  First and foremost, I would like to enable anyone that is interested the ability to take their existing library metadata and model it using Bibframe concepts.  Currently, Library of Congress makes available a handful of commandline tools that users can utilize to process their metadata – but these tools tend to not be designed for the average user.  By making this information available in MarcEdit – I’m hoping to lower the barrier so that anyone can model their data and then engage in the larger discussion around this work. 

Secondly, I’m currently engaging in some work with Zepheira and other early implementers to take Bibframe testing mainstream.  Given the number of users working with MarcEdit, it made a lot of sense to provide tools to support this level of integration.  Likewise, by taking the time to move this work from the concept stage, I’ve been able to develop the start of a framework around these concepts. 

So how is this going to work?  On the next update, you will see a new link within the Main MarcEdit Window called MARCNext. 

MarcEdit Main Window

Click on the MARCNext link, and you will be taken to the public version of the Research Toolkit.  At this point, the only tool being made publically available is the BibFrame Testbed, though this will change.

MarcEdit’s MARCNext Window

Selecting the BibFrame Testbed initializes a simple dialog box to allow a user to select from a variety of library metadata types and convert them using BibFrame principles into a user-defined serialization. 

BibFrame Testbed window

As noted above, this test bed will be the first of a handful of tools that I will eventually be making available.  Will they be useful to anyone – who knows.  Honestly, the questions that these tools are working to answer are not ones that come up on the list serv, and at present, aren’t going to help much in one’s daily cataloging work.  But hopefully they will enable every cataloger that wants to, the ability to engage with some of these new metadata concepts and at least take their existing data and see how it may change utilizing different serializations and concepts.

Questions – feel free to ask.


 Posted by at 9:36 pm

MarcEdit 6 Update

 Uncategorized  Comments Off
Aug 152014

So I missed one – I made some changes to how MarcEdit loads data into the MarcEditor to improve performance, especially on newer equipment, and introduced a bug.  When making multiple global updates, the program may (probably) will lose track of the last page of records.  This slipped through my unit tests because the program was reporting the correct number of changes, but when the program analyzed the for indexing, it was dropping the last page.  Oops.  This was introduced in the update posted at 1 am on Aug. 15th.  I’ve corrected the problem and updated my unit tests so that this type of regression shouldn’t occur again.

One question that did come up to me privately was why make this change to begin with.  Primarily, it was about performance.  MarcEdit’s paging process requires the program to index the records within a MARC block.  In the previous approach, this resulted in a lot of disk reads.  Generally, this isn’t a problem, but I’ve had occasion where folks with lower disk speeds have had performance issues.  This change will correct that.  For files under 50 MB, the program will now read the entire file into memory, and process the data in memory to generate paging.  This is a more memory intensive task (the previous method utilized a small amount of memory, whereas the new process can require allocations of 100-120 MB of system memory for processing) but removes the disk reads which is the largest bottleneck within the process.  The effect of this change is a large performance gain.  On my development system, which has a solid state drive, the improvement loading a 50 MB file is over a second, going from 3.3 seconds to 1.8 seconds.  That’s a pretty significant improvement – especially on a system where disk reads tend to happen very quickly.  On my secondary systems, the improvements are more noticeable.  On an Intel I-5 with a non-solid state drive and 6 GB of RAM, the old process took between 3.7 to 4.1 seconds, while the new method loaded the file between 1.6-1.8 seconds.  And on a tablet with an older Atom processor and 2 GB of RAM, the old process took approximately 22 seconds, while the new only 9 seconds.  These are big gains that I hope users will be able to see and benefit from.


Testing Results Old Process

Machine Description File Description Time to Load
I-7 Dell XPS Ultrabook, 8 GB RAM, SSD 45,922 records; 50 MB 1st load: 3.4s
2nd load: 3.3s
3rd load: 3.2s
I-5 Dell Workstation, 6 GB RAM, 7200 rpm HD 45,922 records, 50 MB 1st load: 4.1 s
2nd load: 4.0s
3rd load: 3.8s
Atom 1.5 mhz ACER tablet, 2 GB RAM, SSD 45,922 records; 50 MB 1st load: 27s
2nd load: 22s
3rd load: 23s


Testing Results New Process

Machine Description File Description Time to Load Diff
I-7 Dell XPS Ultrabook, 8 GB RAM, SSD 45,922 records; 50 MB 1st load: 1.4s
2nd load: 1.3s
3rd load: 1.3s
I-5 Dell Workstation, 6 GB RAM, 7200 rpm HD 45,922 records, 50 MB 1st load: 1.8 s
2nd load: 1.6s
3rd load: 1.6s
Atom 1.5 mhz ACER tablet, 2 GB RAM, SSD 45,922 records; 50 MB 1st load: 10.1s
2nd load: 9.6s
3rd load: 9.7s


While the new process appears to provide better performance on many different types of systems, I realize that there may be some system variations that not benefit from this new method.  To that end, I’ve added a new configuration option in the MarcEditor Preferences that will allow users to decide to turn off the new paging method.  By default, this option is selected.


If you update the program via MarcEdit, the download will be offered automatically the next time you use the program.  Otherwise, you can get the update at:



 Posted by at 11:20 pm