This past weekend, I spent a good deal of time getting the MacOS version of MarcEdit synchronized with the Windows and Linux builds. In addition to the updates, there is a significant change to the program that needs to be noted as well.
First, let’s start with the changelog. The following changes were made in this version:
*************************************************
** 2.2.30
*************************************************
* Bug Fix: Delimited Text Translator: when receiving Unix formatted files on Windows, the program may struggle with determining new line data. This has been corrected.
* Bug Fix: RDA Helper: when processing copyright information, there are occasions where the output can create double brackets ($c[[). This should be corrected.
* Behavior Change: Delimited Text Translator: I’ve changed the default value from on to off as it applies to ignoring header rows.
* Enhancement: System Info (main window): I’ve added information related to referenced libraries to help with debugging questions.
* Bug fix/Behavior Change: Export Tab Delimited Records: Second delimiter insertion should be standardized with all regressions removed.
* New Feature: Linked Data Tools: Service Status options have been included so users can check the status of the currently profiled linked data services.
* New Feature: Preferences/Networked Tasks: MarcEdit uses a short timeout (0.3 seconds) when determining if a network is available. I’ve had reports of folks using MarcEdit having their network dropped by MarcEdit. This is likely because their network has more latency. In the preferences, you can modify this value. I would never set it above 500 milliseconds (0.5 seconds) because it will cause MarcEdit to freeze when off network, but this will give users more control over their network interactions.
* Bug Fix: Swap Field Function: The new enhancement in the swap field function added with the last update didn’t work in all cases. This should close that gap.
* Enhancement: Export Tab Delimited Records: Added Configurable third delimiter.
* Enhancement: MarcEditor: Improvements in the Page Counting to better support invalid formatted data.
* Enhancement: Extract/Delete MARC Records: Added file open button to make it easier to select file for batch search.
* Bug Fix: Log File locking and inaccessible till closed in very specific instances.
* Enhancement: Compiling changes…For the first time, I’ve been able to compile as 64-bit, which has reduced download size.
* Bug Fix: Deduplicate Records: The program would throw an error if the dedup save file was left blank.
Application Architecture Changes
The first thing that I wanted to highlight is that the program is being built as a 64-bit application. This is a significant change to the program. Since the program was ported to MacOS, the program has been compiled as a 32-bit application. This has been necessary due to some of the requirements found in the mono stack. However, over the past year, Microsoft has become very involved in this space (primarily to make it easier to develop iOS applications on Windows via an emulator), and that has led to the ability to compile MarcEdit as a 64-bit application.
So why do this if the 32-bit version worked? Well, what spurred this on was a conversation that I had with the Homebrew maintainers. It appears that they are removing the universal compilation options, which will break Z39.50 support in MarcEdit. They suggested making my own tap (which I will likely pursue), but it got me spending time seeing what dependencies were keeping me from compiling directly to 64-bit. It took some doing, but I believe that I’ve gotten all code that necessitated building as 32-bit out of the application, and the build is passing and working.
I’m pointing this out because I could have missed something. My tools for automated testing for the MacOS build are pretty non-existent. So, if you run into a problem, please let me know. Also, as a consequence of compiling only to 64-bit, I’ve been able to reduce the size of the download significantly because I am able to reduce the number of dependencies that I needed to link to. This download should be roughly 38 MB smaller than previous versions.
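If you do report a problem with the 64-bit build, it can help to confirm what the process is actually running as. The following is only a minimal sketch using standard .NET calls (`Environment.Is64BitProcess`, `Environment.Is64BitOperatingSystem`, `IntPtr.Size`); the class name `BitnessReport` is hypothetical and is not part of MarcEdit.

```csharp
using System;

// Hypothetical helper for illustration; not MarcEdit code.
public static class BitnessReport
{
    // Builds a one-line description of the running process and operating system.
    public static string Describe()
    {
        string process = Environment.Is64BitProcess ? "64-bit" : "32-bit";
        string os = Environment.Is64BitOperatingSystem ? "64-bit" : "32-bit";
        return $"Process: {process} (pointer size {IntPtr.Size} bytes); OS: {os}";
    }
}
```

Calling `BitnessReport.Describe()` returns a single line that could be pasted into a bug report.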
This weekend, I worked on a couple of updates related to MarcEdit. The updates applicable to the Windows and Linux builds are the following:
6.2.455
* Enhancement: Export Tab Delimited Records: Added Configurable third delimiter.
* Enhancement: MarcEditor: Improvements in the Page Counting to better support invalid formatted data.
* Enhancement: Extract/Delete MARC Records: Added file open button to make it easier to select file for batch search.
* Update: Field Count: The record count of the field count can be off if formatting is wrong. I’ve made this better.
* Update: Extract Selected Records: Added an option to sort checked items to the top.
* Bug Fix: Log File locking and inaccessible till closed in very specific instances.
I had a really interesting question make it into my email the other day. A user had configured MarcEdit to use a networked task folder, and in general, it was working. But then, it wouldn’t. The folder was there, the tasks were there, but the program simply wouldn’t see the network. Maybe that has happened to you – you’ve selected a network task folder, uploaded the changes, and then had MarcEdit fall back into offline mode. So what’s happening?
Well, the most likely culprit here is network latency. Here’s the problem – Windows, by default, will keep trying and trying and trying to connect to a networked folder. By default, the timeout to reconnect to a networked device is over 100 seconds. When you are offline, that would make performance simply unacceptable, because the areas where networked task directories need to be resolved would simply freeze, locking the program. To solve that issue, I have a small function in the application that checks to see if a directory exists (the Networked Task directory), and it sets a timeout. By default, I’ve set the timeout to be 300 milliseconds. This doesn’t sound like a very long time, but it’s ages in network time. All the network has to do is respond to a ping. To support this, I use a function that looks something like this:
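private bool VerifyDirectoryExists(Uri uri, int timeout)
{
    // Run the existence check on a worker thread so a slow share cannot block the caller.
    var task = System.Threading.Tasks.Task.Run(() =>
        new System.IO.DirectoryInfo(uri.LocalPath).Exists);
    // Wait returns false if the timeout (in milliseconds) elapses before the check finishes.
    return task.Wait(timeout) && task.Result;
}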
If you look at the code, you’ll see that the wait on the check has a timeout, so the program stops waiting and continues even if the directory lookup has not returned. This is how MarcEdit determines if the network folder is offline. It works well, but if there is significant latency on the network, 300 milliseconds may be too small.
To support users that may run into this problem, I’ve added a new preference. In the Locations tab, I’ve added the ability to change the latency timeout.
By default, this value will remain at 300 milliseconds, but users have the option to change this value. Users also need to keep in mind that the timeout is set in milliseconds, and there is a maximum value of 100 seconds (the Windows default timeout, which is controlled via the registry). Personally, I would recommend against setting this value above 1 second (1000 milliseconds), because you will notice program freezing when truly offline. What’s more, I’d argue that if your network’s latency requires this kind of setting, using the networking options likely isn’t the best choice given your environment. But these options are now available for users. This feature was rolled into the Windows version as of update build 6.2.452 and will be moved into the MacOS version of MarcEdit later this week.
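As a rough illustration of how a user-supplied latency value could be bounded before it is handed to a directory check, here is a minimal sketch. The names `NetworkTimeoutSettings`, `DefaultTimeoutMs`, `MaxTimeoutMs`, and `Normalize` are hypothetical and are not taken from MarcEdit; only the 300 millisecond default and the 100 second ceiling come from the description above, and the fallback for non-positive input is an assumption.

```csharp
using System;

// Hypothetical helper: none of these names come from MarcEdit itself.
public static class NetworkTimeoutSettings
{
    public const int DefaultTimeoutMs = 300;     // default described in the post
    public const int MaxTimeoutMs = 100 * 1000;  // 100 seconds, the stated maximum

    // Returns a usable timeout in milliseconds: falls back to the default
    // for non-positive input (an assumption) and caps anything above the maximum.
    public static int Normalize(int requestedMs)
    {
        if (requestedMs <= 0)
        {
            return DefaultTimeoutMs;
        }
        return Math.Min(requestedMs, MaxTimeoutMs);
    }
}
```

With this sketch, `Normalize(1000)` passes a one second setting through unchanged, while a mistyped value such as `250000` is capped at 100 seconds.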
6.2.452
* Bug fix/Behavior Change: Export Tab Delimited Records: Second delimiter insertion should be standardized with all regressions removed.
* New Feature: Linked Data Tools: Service Status options have been included so users can check the status of the currently profiled linked data services.
* New Feature: Preferences/Networked Tasks: MarcEdit uses a short timeout (0.3 seconds) when determining if a network is available. I’ve had reports of folks using MarcEdit having their network dropped by MarcEdit. This is likely because their network has more latency. In the preferences, you can modify this value. I would never set it above 500 milliseconds (0.5 seconds) because it will cause MarcEdit to freeze when off network, but this will give users more control over their network interactions.
* Bug Fix: Swap Field Function: The new enhancement in the swap field function added with the last update didn’t work in all cases. This should close that gap.
Over the years, I’ve periodically gotten requests for a much more robust logger in MarcEdit. Currently, when the tool performs a global change, it reports the number of changes made to the user. However, a handful of folks have been wanting much more. Ideally, they’d like to have a log of every change the application makes, which is hard because the program isn’t built that way. I provided the following explanation to the MarcEdit list last week.
The question that has come up a number of times since I posted notes about the logger is about granularity. There has been a desire to have the tool provide additional information (about the records) and more information around change context, and some have wondered whether this will lead to a preview mode. I think other folks wondered why this process has taken so long to develop. Well, it stems from decisions I make around the development. MarcEdit’s application structure can be summed up by the picture below:
In developing MarcEdit, I have made a number of very deliberate decisions, and one of those is that no one component knows what the other one does. As you can see in this picture, the application parts of MarcEdit don’t actually talk directly to the system components. They are referenced through a messenger, which handles all interactions between the application and the system objects. However, the same is true of communication between the system objects themselves. The editing library, for example, knows nothing about MARC, validation, etc. – it only knows how to parse MarcEdit’s internal file format. Likewise, the MARC library doesn’t know anything about validation, MARC21, or linked data. Those parts live elsewhere. The benefit of this approach is that I can develop each component independent of the other, and avoid breaking changes because all communication runs through the messenger. This gives me a lot of flexibility and helps to enforce MarcEdit’s agnostic view of library data. It’s also how I’ve been able to start including support for linked data components – as far as the tool is concerned, it’s just another format to be messaged.
Of course, the challenge with an approach like this is that most of MarcEdit’s functions don’t have a concept of a record. Most functions, for reasons of performance, process data much like an XML SAX processor. Fields for edit raise events to denote areas of processing, as do errors, which then push the application into a rescue mode. While this approach allows the tool to process data very quickly, and essentially removes size restrictions for data processing, it introduces issues if, for example, I want to expose a log of the underlying changes. Logs exist – I use them in my debugging, but they exist on a component level, and they are not attached to any particular process. I use messaging identifiers to determine what data I want to evaluate – but these logs are not meant to record a processing history, but rather, record component actions. They can be muddled, but they give me exactly what I need when problems arise. The challenge with developing logging for actual users is that they would likely want actions associated with records. So, to do that, I’ve added an event handler in the messaging layer. This handles all interaction with the logging subsystem and essentially tracks the internal messaging identifier and assembles data. This means that the logger still doesn’t have a good concept of what a record is, but the messenger does, and can act as a translator.
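To make the idea of the messenger acting as a translator a little more concrete, here is a minimal sketch. Every type and member name in it (`EditEventArgs`, `EditingComponent`, `Messenger`, `BeginRecord`, and so on) is hypothetical and invented for illustration; it is not MarcEdit’s actual code, and the log line format is an assumption. It only shows the general pattern: a component raises events tagged with a messaging identifier, and the messaging layer alone maps that identifier to a record number when it writes a log entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event payload: the component only knows a message identifier.
public sealed class EditEventArgs : EventArgs
{
    public EditEventArgs(Guid messageId, string field, string before, string after)
    {
        MessageId = messageId;
        Field = field;
        Before = before;
        After = after;
    }

    public Guid MessageId { get; }
    public string Field { get; }
    public string Before { get; }
    public string After { get; }
}

// Stand-in for an editing component that has no concept of a record.
public sealed class EditingComponent
{
    public event EventHandler<EditEventArgs> FieldEdited;

    public void ReportEdit(Guid messageId, string field, string before, string after)
    {
        FieldEdited?.Invoke(this, new EditEventArgs(messageId, field, before, after));
    }
}

// Stand-in for the messenger: it alone maps message identifiers to record numbers.
public sealed class Messenger
{
    private readonly Dictionary<Guid, int> recordByMessage = new Dictionary<Guid, int>();
    private readonly List<string> log = new List<string>();

    public IReadOnlyList<string> Log => log;

    public void Attach(EditingComponent component)
    {
        component.FieldEdited += OnFieldEdited;
    }

    // Registers a record number and returns the identifier used for its messages.
    public Guid BeginRecord(int recordNumber)
    {
        Guid id = Guid.NewGuid();
        recordByMessage[id] = recordNumber;
        return id;
    }

    private void OnFieldEdited(object sender, EditEventArgs e)
    {
        // Events with an unknown identifier are skipped rather than guessed at.
        if (recordByMessage.TryGetValue(e.MessageId, out int recordNumber))
        {
            log.Add($"record {recordNumber}: {e.Field} '{e.Before}' -> '{e.After}'");
        }
    }
}
```

In this sketch the editing component never sees a record number; it only reports an edit against an identifier, and the messenger decides how that turns into a record-level log line.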
Anyway – this is how I’ll be providing logging. It will also let me slowly expand the logging beyond the core editing functions if there is interest. It is also how I’ll be able to build services around the log file – to provide parsing and log enhancement for users that want to add record-specific information to a log file that goes beyond the simple record number identifier that will be used to track changes. This would make log files more permanent (if, for example, the log was enhanced with a local identifier). However, due to the way MarcEdit is designed, and the general lack of a standard control number across all MARC formats (in merging, for example, merging on the 001 trips checks of 9 other fields that all could store associated control data), it is my belief that providing ways to enhance the log file after a run, while an extra step, will allow me the most flexibility to potentially make greater use of the processing log in the future. It also enables me to continue to keep MARCisms out of the processing library – and focus only on handling data edits.
So that’s pretty much the work in a nutshell. So what do you get? Well, once you turn it on, you get lots of stuff and a few new tools. So, let’s walk through them.
Turning on Logging:
Since Logging only captures changes made within the MarcEditor, you find the logging settings in the MarcEditor Preferences Tab:
Once enabled, the tool will generate a new session in the Log folder each time the Editor starts a new Session. With the logs comes log management. From within the MarcEditor or the Main window, you’ll find the following:
From the MarcEditor, you’ll find in Reports:
Both areas provide the same functionality, but the MarcEditor reports entry is scoped to the current session logfile and current record file loaded into the Editor (if one is loaded). To manage old sessions, use the entry on the Main Window.
Advanced Log Management
Two of the use cases that were laid out for me were the need to be able to enhance logs and the ability to extract only the modified records from a large file. So, I’ve included an Advanced Management tool for just these kinds of queries:
This is an example run from within the MarcEditor.
Anyway – this is a quick write-up. I’ll be recording a couple sessions tomorrow. I’ll also be working to make a new plugin available.
I’ve posted a new update for all versions of MarcEdit, and it’s a large one. It might not look like it from the outside, but it represents close to 3 1/2 months of work. The big change is related to the inclusion of a more detailed change log. Users can turn on logging and see, at a low level, the actual changes made to specific data elements. I’ve also added some additional logging enhancement features to allow users to extract just changed records, or enhance the log files with additional data. For more information, see my next post on the new logging process.
The full change log:
6.2.447
* Enhancement: Z39.50: Sync’ng changes made to support Z39.50 servers that are sending records missing proper encoding guidelines. I’m seeing a lot of these from Voyager…I fixed this in one context in the last update. This should correct it everywhere.
* Enhancement: MARCEngine: 008 processing was included when processing MARCXML records in MARC21 to update the 008, particularly the element to note when a record has been truncated. This is causing problems when working with holdings records in MARCXML – so I’ve added code to further distinguish when this byte change is needed.
* Enhancement: MarcEdit About Page: Added copy markers to simplify capturing of the Build and Version numbers.
* Enhancement: Build New Field: The tool will only create one field per record (regardless of existing field numbers) unless the replace existing value is selected.
* Enhancement: Swap Field Function: new option to limit swap operations if not all defined subfields are present.
* Bug Fix: MARCValidator: Potential duplicates were being flagged when records had blank fields (or empty fields) in the elements being checked.
* Update: MarcEditor: UI responsiveness updates.
* New Feature: Logging. Logging has been added to the MarcEditor and supports all global functions currently available via the task manager.
* New Feature: MarcEditor – Log Manager: View and delete log files.
* New Feature: MarcEditor – Log Manager: Advanced Toolset. Ability to enhance logs (add additional marc data) or use the logs to extract just changed records.
One last note: on the downloads page, I’ve added a directory listing that will provide access to previous 6.2 builds. I’m doing this partly because some of these changes are so significant that there may be behavior changes that crop up. If something comes up that is preventing your work – uninstall the application and pull the previous version from the archive and then let me know what isn’t working.
Happy 2017! I hope that everyone had a fine holiday season. I spent some time over the past couple weeks away from MarcEdit and doing a little bit of writing. This put my usual holiday update behind a bit (which I apologize about, since it included a couple bug fixes that folks were waiting for) – but it was nice to spend some time writing, building robots with my son, and otherwise taking it slow.
With that said, I have posted an Update for the Windows/Linux versions of MarcEdit and will be working to update the Mac version (with appropriate fixes and new stuff) later this week. I’m running slightly behind because of the above mentioned robot building. I coach/mentor a middle school robotics team, and their regional competition is this weekend, so it seems most of my free time is being spent with the kids working on their robot and research project.
Anyway – the Changelog:
Update: Z39.50: When downloading records that are in UTF8, the tool will ensure that the LDR byte position related to character encoding is set if the system is supplying MARC21 or USMARC records
Update: MARCEngine: When translating UTF8 conversions, the tool will always check the LDR position and set when appropriate (currently, this only happens when MarcEdit actually does a character conversion).
Bug Fix: Print A Record Per Page — Some record data would be deleted if the last line on a page should have printed onto the next page.
Bug Fix: Exporting Tab Delimited Records — the Secondary delimiter isn’t being placed correctly. This is a regression caused when adding the ability to protect context between subfields on export.
Enhancement: MARCValidator — program will now do a quick record deduplication check and if found, will provide a message letting folks know that duplicate records are likely in the file.
Enhancement: Console Program — added -dups? switch to allow a quick check to see if duplicate records are likely in a file.
Enhancement: Console Program — added an -experimental switch to enable direct processing of tasks. This improves task processing from within the Console, but is still a little experimental. This is the best method to run task editing via Linux.
Enhancement: About Windows — added the update code to the window. This is the code (not the build number) that is most important for me when debugging.
Two specific changes I want to highlight. One is the Console changes. I’ve added two new switches. The first affects running tasks via the console program. I’ve created an -experimental switch to move to a new way of processing tasks that is faster and more consistent on all platforms (Windows/Linux). This will eventually replace the old method, but for now, users wanting to try the new method will need to use this flag to call it explicitly. So, for example:
That would initiate the new process. Leaving off the -experimental switch would use the older method, which simply shells the task to the main MarcEdit.exe application, though run silently. Since I’m using the console program more and more when helping users set up automated workflows on Linux, I needed to move this process into the cmarcedit.exe stack – which also improves the speed and reliability, a lot.
The second change deals with the validator. A question on the list this week was complicated because the source file had duplicates (when it shouldn’t have). So, in thinking about this – I added a process to the Validator’s normal validation process that will check for likely duplicates. When you run the validator now, if it encounters likely duplicates, you’ll see a note at the top of the log. For example:
This won’t tell you which records are duplicates, or how many, but it will let you know, up front, that the records likely contain duplicates. Likewise, you can get this same information from the commandline. Run the following:
I’ve posted a small update over the weekend to correct an encoding issue when using the Z39.50 client in batch mode and doing a raw query. You can get the download from the downloads page (http://marcedit.reeset.net/downloads) or via the automated update tool.
Posted a MarcEdit Mac update. This syncs the task management and Edit shortcuts with the Windows version.
************************************************
** 1.9.45
************************************************
* Enhancement: Task Manager: Implemented the ability to include Edit Shortcuts in Tasks
* Enhancement: Task Manager: Updated Task Manager to complete network task clean up (error messages, file locking)
* Enhancement: Preferences: Updated preferences to include dialogs to find files and folders.
In what’s become a bit of a tradition, I took some of my time over the Thanksgiving holiday to work through a few things on my list and put together an update (posted last night). Updates were to all versions of MarcEdit and cover the following topics:
* Enhancement: Dedup Records – addition of a fuzzy match option
* Enhancement: Linked Data tweaks to allow for multiple rules files
* Bug Fix: Clean Smart Characters can now be embedded in a task
* Enhancement: MARC Tools: addition of a MARC=>JSON processing function
* Enhancement: MARC Tools: addition of a JSON=>MARC processing function
* Behavior Change: SPARQL Browser updates: tweaks make it more simple at this point, but this will let me provide better support
* Dependency Updates: Updated Saxon XML Engine
* Enhancement: Command-Line Tool: MARC=>JSON; JSON=>MARC processes added to the command-line tool
* Enhancement: continued updates to the Automatic updater (due to my webhost continuing to make changes)
* removal of some deprecated dependencies

* Enhancement: Dedup Records – addition of a fuzzy match option
* Enhancement: Linked Data tweaks to allow for multiple rules files
* Enhancement: MARC Tools: addition of a MARC=>JSON processing function
* Enhancement: MARC Tools: addition of a JSON=>MARC processing function
* Behavior Change: SPARQL Browser updates: tweaks make it more simple at this point, but this will let me provide better support
* Dependency Updates: Updated Saxon XML Engine
* Enhancement: continued updates to the Automatic updater (due to my webhost continuing to make changes)
* Enhancement: Linked data enhancement: allow selective collection processing
* Enhancement: MarcEditor: Smart Character Cleaner added to the Edit ShortCuts menu
* removal of some deprecated dependencies
Couple notes about the removal of deprecated dependencies. These were mostly related to a SPARQL library that I’d been using – but having some trouble with due to changes a few institutions have been making. It mostly was a convenience set of tools for me, but they were big and bulky. So, I’m rebuilding exactly what I need from core components and shedding the parts that I don’t require.
Couple other notes – I’ll be working this week on adding the Edit Shortcuts functionality into the Mac version’s task manager (that will bring the Windows and Mac versions back together). I’ll also be working to do a little video recording on some of the new stuff just to provide some quick documentation on the changes.
You can download from the website: http://marcedit.reeset.net/downloads or, assuming my webhost hasn’t broken it, the automatic downloader. And I should note, the automatic downloader will now work differently – following this update, it will attempt to do a download, but if my host causes issues, it will automatically direct your browser to the file for download.