I had a really interesting question make it into my email the other day. A user had configured MarcEdit to use a networked task folder, and in general, it was working. But then, it wouldn’t. The folder was there, the tasks were there, but the program simply wouldn’t see the network. Maybe that has happened to you – you’ve selected a network task folder, uploaded the changes, and then had MarcEdit fall back into offline mode. So what’s happening?
Well, the culprit here is network latency most likely. Here’s the problem – Windows, by default, will keep trying and trying and trying to connect to a networked folder. By default, the timeout to reconnect to a networked device is over 100 seconds. When you are offline, that would make performance simply unacceptable, because the areas where networked task directories need to be resolved would simply freeze, locking the program. To solve that issue, I have a small function in the application that checks to see if a directory exists (the Networked Task directory), and it sets a timeout. By default, I’ve set the timeout to be 300 milliseconds. This doesn’t sound like a very long time, but it’s ages in network time. All the network has to do is respond to a ping. To support this, I use a function that looks something like this:
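private bool VerifyDirectoryExists(Uri uri, int timeout)
{
    // Run the check on a worker task so an unresponsive share cannot block the caller.
    var task = System.Threading.Tasks.Task.Run(() =>
        new System.IO.DirectoryInfo(uri.LocalPath).Exists);
    // True only if the check finished within the timeout (ms) and the directory exists.
    return task.Wait(timeout) && task.Result;
}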
If you look at the code, you’ll see the timeout caps how long the program waits on the directory check, which lets the program continue instead of freezing. If the check hasn’t come back in time, MarcEdit treats the network folder as offline. It works well, but if there is significant latency on the network, 300 milliseconds may be too short.
To support users that may run into this problem, I’ve added a new preference. In the Locations tab, I’ve added the ability to change the latency timeout.
By default, this value will remain at 300 milliseconds, but users have the option to change it. Users also need to keep in mind that the timeout is set in milliseconds, and there is a maximum value of 100 seconds (the Windows default timeout, which is controlled via the registry). Personally, I would recommend against setting this value above 1 second (1000 milliseconds), because you will notice the program freezing when truly offline. What’s more, I’d argue that if your network’s latency requires this kind of setting, using the networking options likely isn’t the best choice given your environment. But these options are now available for users. This feature was rolled into the Windows version as of Update build 6.2452 and will be moved into the MacOS version of MarcEdit later this week.
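As a rough sketch of how a preference like this could be sanitized before it reaches the directory check: the class and method names below are hypothetical (this is not MarcEdit’s actual code), and only the 300 millisecond default and the 100 second ceiling come from the description above.

```csharp
using System;

public static class LatencyTimeout
{
    // Values taken from the post: 300 ms default, 100 s ceiling.
    public const int DefaultMs = 300;
    public const int MaxMs = 100_000;

    // Hypothetical helper: turns a raw preference value into a usable timeout.
    public static int Normalize(int? requestedMs)
    {
        // A missing or non-positive value falls back to the default.
        if (requestedMs is null || requestedMs <= 0)
            return DefaultMs;

        // Never wait longer than the ceiling.
        return Math.Min(requestedMs.Value, MaxMs);
    }
}
```

The result of `LatencyTimeout.Normalize(...)` would then be passed as the `timeout` argument of a check like `VerifyDirectoryExists`.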
Over the years, I’ve periodically gotten requests for a much more robust logger in MarcEdit. Currently, when the tool performs a global change, it reports the number of changes made to the user. However, a handful of folks have been wanting much more. Ideally, they’d like to have a log of every change the application makes, which is hard because the program isn’t built that way. I provided the following explanation to the MarcEdit list last week.
The question that has come up a number of times since I posted notes about the logger is about granularity. Folks have asked for the tool to provide additional information (about the records) and more context around each change, and have wondered whether this will lead to a preview mode. I think other folks wondered why this process has taken so long to develop. Well, it stems from decisions I make around the development. MarcEdit’s application structure can be summed up by the picture below:
In developing MarcEdit, I have made a number of very deliberate decisions, and one of those is that no one component knows what the other one does. As you can see in this picture, the application parts of MarcEdit don’t actually talk directly to the system components. They are referenced through a messenger, which handles all interactions between the application and the system objects. However, the same is true of communication between the system objects themselves. The editing library, for example, knows nothing about MARC, validation, etc. – it only knows how to parse MarcEdit’s internal file format. Likewise, the MARC library doesn’t know anything about validation, MARC21, or linked data. Those parts live elsewhere. The benefit of this approach is that I can develop each component independent of the other, and avoid breaking changes because all communication runs through the messenger. This gives me a lot of flexibility and helps to enforce MarcEdit’s agnostic view of library data. It’s also how I’ve been able to start including support for linked data components – as far as the tool is concerned, it’s just another format to be messaged.
Of course, the challenge with an approach like this is that most of MarcEdit’s functions don’t have a concept of a record. Most functions, for reasons of performance, process data much like an XML SAX processor. Fields for edit raise events to denote areas of processing, as do errors, which then push the application into a rescue mode. While this approach allows the tool to process data very quickly, and essentially removes size restrictions for data processing, it introduces issues if, for example, I want to expose a log of the underlying changes. Logs exist – I use them in my debugging – but they exist on a component level, and they are not attached to any particular process. I use messaging identifiers to determine what data I want to evaluate, but these logs are not meant to record a processing history; rather, they record component actions. They can be muddled, but they give me exactly what I need when problems arise. The challenge with developing logging for actual users is that they would likely want actions associated with records. So, to do that, I’ve added an event handler in the messaging layer. This handles all interaction with the logging subsystem and essentially tracks the internal messaging identifier and assembles data. This means that the logger still doesn’t have a good concept of what a record is, but the messenger does, and can act as a translator.
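To make the idea concrete, here is a minimal sketch of a messenger that translates component-level events into record-level log lines. Every type and member name here is invented for illustration and the log format is an assumption; it shows the general pattern only, not MarcEdit’s internals.

```csharp
using System;
using System.Collections.Generic;

// What a component reports: a message id and the edit, with no notion of a record.
public sealed class FieldChange
{
    public FieldChange(int messageId, string before, string after)
    {
        MessageId = messageId;
        Before = before;
        After = after;
    }

    public int MessageId { get; }
    public string Before { get; }
    public string After { get; }
}

// Stand-in for an editing component that only raises events.
public sealed class EditComponent
{
    public event Action<FieldChange> FieldChanged;

    public void Replace(int messageId, string before, string after)
        => FieldChanged?.Invoke(new FieldChange(messageId, before, after));
}

// Stand-in for the messenger: it alone maps message ids to record numbers.
public sealed class Messenger
{
    private readonly Dictionary<int, int> _recordByMessage = new Dictionary<int, int>();

    public List<string> Log { get; } = new List<string>();

    public void Attach(EditComponent component) => component.FieldChanged += OnFieldChanged;

    public void Track(int messageId, int recordNumber) => _recordByMessage[messageId] = recordNumber;

    private void OnFieldChanged(FieldChange change)
    {
        // Translate the component-level event into a record-level log line.
        // Events with an unknown message id are ignored.
        if (_recordByMessage.TryGetValue(change.MessageId, out var record))
            Log.Add($"record {record}: '{change.Before}' -> '{change.After}'");
    }
}
```

The point of the design is that the component never learns what a record is; only the messenger holds the mapping, so logging can be added without touching the editing library.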
Anyway – this is how I’ll be providing logging. It will also let me slowly expand the logging beyond the core editing functions if there is interest. It is also how I’ll be able to build services around the log file – parsing and log enhancement for users who want to add record-specific information to a log file, beyond the simple record number identifier that will be used to track changes. This would make log files more permanent (if, for example, the log was enhanced with a local identifier). Due to the way MarcEdit is designed, and the general lack of a standard control number across all MARC formats (in merging, for example, merging on the 001 trips checks of 9 other fields that all could store associated control data), it is my belief that providing ways to enhance the log file after a run, while an extra step, will give me the most flexibility to make greater use of the processing log in the future. It also enables me to keep MARCisms out of the processing library – and focus only on handling data edits.
So that’s pretty much the work in a nutshell. So what do you get? Well, once you turn it on, you get lots of stuff and a few new tools. So, let’s walk through them.
Turning on Logging:
Since logging only captures changes made within the MarcEditor, you’ll find the logging settings in the MarcEditor Preferences tab:
Once enabled, the tool will generate a new session in the Log folder each time the Editor starts a new session. With the logs comes log management. From within the MarcEditor or the Main window, you’ll find the following:
From the MarcEditor, you’ll find in Reports:
Both entries provide the same functionality, but the MarcEditor Reports entry is scoped to the current session log file and the current record file loaded into the Editor (if one is loaded). To manage old sessions, use the entry on the Main Window.
Advanced Log Management
Two of the use cases that were laid out for me were the need to enhance logs and the ability to extract only the modified records from a large file. So, I’ve included an Advanced Management tool for just these kinds of queries:
This is an example run from within the MarcEditor.
Anyway – this is a quick write-up. I’ll be recording a couple sessions tomorrow. I’ll also be working to make a new plugin available.
I’ve posted a new update for all versions of MarcEdit, and it’s a large one. It might not look like it from the outside, but it represents close to 3 1/2 months of work. The big change is related to the inclusion of a more detailed change log. Users can turn on logging and see, at a low level, the actual changes made to specific data elements. I’ve also added some additional logging enhancement features to allow users to extract just changed records, or enhance the log files with additional data. For more information, see my next post on the new logging process.
The full change log:
6.2.447
* Enhancement: Z39.50: Sync’ng changes made to support Z39.50 servers that are sending records missing proper encoding guidelines. I’m seeing a lot of these from Voyager…I fixed this in one context in the last update. This should correct it everywhere.
* Enhancement: MARCEngine: 008 processing was included when processing MARCXML records in MARC21 to update the 008, particularly the element to note when a record has been truncated. This is causing problems when working with holdings records in MARCXML – so I’ve added code to further distinguish when this byte change is needed.
* Enhancement: MarcEdit About Page: Added copy markers to simplify capturing of the Build and Version numbers.
* Enhancement: Build New Field: The tool will only create one field per record (regardless of existing field numbers) unless the replace existing value is selected.
* Enhancement: Swap Field Function: new option to limit swap operations if not all defined subfields are present.
* Bug Fix: MARCValidator: Potential duplicates were being flagged when records had blank fields (or empty fields) in the elements being checked.
* Update: MarcEditor: UI responsiveness updates
* New Feature: Logging. Logging has been added to the MarcEditor and supports all global functions currently available via the task manager.
* New Feature: MarcEditor – Log Manager: View and delete log files.
* New Feature: MarcEditor – Log Manager: Advanced Toolset. Ability to enhance logs (add additional marc data) or use the logs to extract just changed records.
One last note: on the downloads page, I’ve added a directory listing that will provide access to previous 6.2 builds. I’m doing this partly because some of these changes are so significant that there may be behavior changes that crop up. If something comes up that is preventing your work, uninstall the application, pull the previous version from the archive, and then let me know what isn’t working.
Happy 2017! I hope that everyone had a fine holiday season. I spent some time over the past couple weeks away from MarcEdit and doing a little bit of writing. This put my usual holiday update behind a bit (which I apologize about, since it included a couple bug fixes that folks were waiting for) – but it was nice to spend some time writing, building robots with my son, and otherwise taking it slow.
With that said, I have posted an Update for the Windows/Linux versions of MarcEdit and will be working to update the Mac version (with appropriate fixes and new stuff) later this week. I’m running slightly behind because of the above mentioned robot building. I coach/mentor a middle school robotics team, and their regional competition is this weekend, so it seems most of my free time is being spent with the kids working on their robot and research project.
Anyway – the Changelog:
Update: Z39.50: When downloading records that are in UTF8, the tool will ensure that the LDR byte position related to character encoding is set if the system is supplying MARC21 or USMARC records
Update: MARCEngine: When translating UTF8 conversions, the tool will always check the LDR position and set when appropriate (currently, this only happens when MarcEdit actually does a character conversion).
Bug Fix: Print A Record Per Page — Some record data would be deleted if the last line on a page should have printed onto the next page.
Bug Fix: Exporting Tab Delimited Records — the Secondary delimiter isn’t being placed correctly. This is a regression caused when adding the ability to protect context between subfields on export.
Enhancement: MARCValidator — program will now do a quick record deduplication check and if found, will provide a message letting folks know that duplicate records are likely in the file.
Enhancement: Console Program — added -dups? switch to allow a quick check to see if duplicate records are likely in a file.
Enhancement: Console Program — added an -experimental switch to enable direct processing of tasks. This improves task processing from within the Console, but is still a little experimental. This is the best method to run task editing via Linux.
Enhancement: About Windows — added the update code to the window. This is the code (not the build number) that is most important for me when debugging.
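The two UTF8 items in the changelog above both come down to setting one byte in the record leader. In MARC 21, leader position 09 is the character coding scheme: a blank means MARC-8 and `a` means UCS/Unicode. A minimal sketch of that update follows; the class and method names are hypothetical and this is not the MARCEngine’s actual code.

```csharp
using System;

public static class Leader
{
    // MARC 21 leader position 09: 'a' marks UCS/Unicode, blank marks MARC-8.
    public static string MarkAsUtf8(string leader)
    {
        if (leader is null || leader.Length != 24)
            throw new ArgumentException("A MARC leader is 24 characters long.", nameof(leader));

        // Keep positions 00-08 and 10-23, replace only position 09.
        return leader.Substring(0, 9) + "a" + leader.Substring(10);
    }
}
```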
Two specific changes I want to highlight. One is the Console changes. I’ve added two new switches. The first affects running tasks via the console program. I’ve created an –experimental switch to move to a new way of processing tasks that is faster and more consistent on all platforms (Windows/Linux). This will eventually replace the old method, but for now, users wanting to try the new method will need to use this flag to call it explicitly. So, for example:
That would initiate the new process. Leaving off the –experimental switch would use the older method, which simply shells the task to the main MarcEdit.exe application, though run silently. Since I’m using the console program more and more when helping users set up automated workflows on Linux, I needed to move this process into the cmarcedit.exe stack – which also improves the speed and reliability, a lot.
The second change deals with the validator. A question on the list this week was complicated because the source file had duplicates (when it shouldn’t have). So, in thinking about this – I added a process to the Validator’s normal validation process that will check for likely duplicates. When you run the validator now, if it encounters likely duplicates, you’ll see a note at the top of the log. For example:
This won’t tell you which records are duplicates, or how many, but it will let you know, up front, that the records likely contain duplicates. Likewise, you can get this same information from the commandline. Run the following:
I’ve posted a small update over the weekend to correct an encoding issue when using the Z39.50 client in batch mode and doing a raw query. You can get the download from the downloads page (http://marcedit.reeset.net/downloads) or via the automated update tool.
Posted a MarcEdit Mac update. This syncs the task management and Edit shortcuts with the Windows version.
1.9.45
* Enhancement: Task Manager: Implemented the ability to include Edit Shortcuts in Tasks
* Enhancement: Task Manager: Updated Task Manager to complete network task clean up (error messages, file locking)
* Enhancement: Preferences: Updated preferences to include dialogs to find files and folders.
In what’s become a bit of a tradition, I took some of my time over the Thanksgiving holiday to work through a few things on my list and put together an update (posted last night). Updates were to all versions of MarcEdit and cover the following topics:
* Enhancement: Dedup Records – addition of a fuzzy match option
* Enhancement: Linked Data tweaks to allow for multiple rules files
* Bug Fix: Clean Smart Characters can now be embedded in a task
* Enhancement: MARC Tools – addition of a MARC=>JSON processing function
* Enhancement: MARC Tools – addition of a JSON=>MARC processing function
* Behavior Change: SPARQL Browser updates – tweaks make it more simple at this point, but this will let me provide better support
* Dependency Updates: Updated Saxon XML Engine
* Enhancement: Command-Line Tool: MARC=>JSON; JSON=>MARC processes added to the command-line tool
* Enhancement: continued updates to the Automatic updater (due to my webhost continuing to make changes)
* removal of some deprecated dependencies
* Enhancement: Dedup Records – addition of a fuzzy match option
* Enhancement: Linked Data tweaks to allow for multiple rules files
* Enhancement: MARC Tools – addition of a MARC=>JSON processing function
* Enhancement: MARC Tools – addition of a JSON=>MARC processing function
* Behavior Change: SPARQL Browser updates – tweaks make it more simple at this point, but this will let me provide better support
* Dependency Updates: Updated Saxon XML Engine
* Enhancement: continued updates to the Automatic updater (due to my webhost continuing to make changes)
* Enhancement: Linked data enhancement – allow selective collection processing
* Enhancement: MarcEditor: Smart Character Cleaner added to the Edit ShortCuts menu
* removal of some deprecated dependencies
Couple notes about the removal of deprecated dependencies. These were mostly related to a SPARQL library that I’d been using – but having some trouble with due to changes a few institutions have been making. It mostly was a convenience set of tools for me, but they were big and bulky. So, I’m rebuilding exactly what I need from core components and shedding the parts that I don’t require.
Couple other notes – I’ll be working this week on adding the Edit Shortcuts functionality into the Mac version’s task manager (that will bring the Windows and Mac versions back together). I’ll also be working to do a little video recording on some of the new stuff just to provide some quick documentation on the changes.
You can download from the website: http://marcedit.reeset.net/downloads or, assuming my webhost hasn’t broken it, the automatic downloader. And I should note, following this update the automatic downloader will work differently – it will attempt to do a download, but if my host causes issues, it will automatically direct your browser to the file for download.
I posted an update to the Linux and Windows versions of MarcEdit. I had hoped to finish work on the Mac as well, but I have, I would guess, about 5 hours of interface work to finish before that is completed and ready to be made available. I’ll be endeavoring to get that completed over this next week during my spare time in the evenings. I think, I should be able to have it done by Wed.
The updates really are more refinements to existing functionality. The full list:
Enhancement: Edit Shortcuts; added a new option to clean smart characters.
Enhancement: Edit Shortcuts (many) have been enabled for use with the Task Manager.
Enhancement: MARCNext; Updated the linking tool to enable individual processing in both the UI and the Console program.
Enhancement: Application Error message is now in a window that can be copied.
Enhancement: RDA Helper; GMD in linking fields will be deleted if the GMD is selected for deletion.
Enhancement: Tab Delimited Records; add multiple subfields (without the delimiter character) to pull multiple subfields into the same column.
Bug Fixes (miscellaneous).
While these updates represented enhancements to existing functionality, a few of them required significant work to implement – especially bringing some of the Edit Shortcuts into the task manager. That required a significant rethinking of how these “shortcuts” get run to fit within the task infrastructure – but I think that I’ve got these set. I will point out, that not all the edit shortcuts have been enabled. In the task manager, you will see placeholders for the Clean Smart Characters and the Math Functions. These were not integrated yet into the tool – I’ll complete these with an update around the same time I update the Mac version of MarcEdit – but I put this out now because folks on the listserv wanted the Clean Smart Characters function sooner than later.
I’ve created a couple of videos to show the new functionality. These should give users a better idea of how these refinements have been implemented and how the changes show up in the program. Please see the videos below.
This round of MarcEdit updates focused on the Task Manager/Task Editing. After talking to some folks, I really tried to do some work to make it easier for folks when sharing network tasks. Change logs:
Enhancement: Task List will preserve a back up task list before save, and will restore if the original task list is deleted or zero bytes.
Enhancement: Task List: Added .lock files to prevent multiple users from editing files on the network at the same time.
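For readers curious how these two safeguards can work in general, here is a minimal sketch. The names, the `.bak` suffix, and the exact lock-file naming are assumptions made for illustration; this is not the code MarcEdit ships.

```csharp
using System;
using System.IO;

public static class TaskListGuard
{
    // Try to claim an edit lock by creating "<file>.lock"; false if someone else holds it.
    public static bool TryAcquireLock(string taskListPath)
    {
        try
        {
            // CreateNew fails if the file already exists, so only one caller can win.
            using (new FileStream(taskListPath + ".lock", FileMode.CreateNew, FileAccess.Write)) { }
            return true;
        }
        catch (IOException)
        {
            return false;
        }
    }

    // Deleting a lock that is not there is harmless.
    public static void ReleaseLock(string taskListPath) => File.Delete(taskListPath + ".lock");

    // Copy the current list aside before saving over it.
    public static void BackUp(string taskListPath)
    {
        if (File.Exists(taskListPath))
            File.Copy(taskListPath, taskListPath + ".bak", overwrite: true);
    }

    // Restore the backup if the list has gone missing or is zero bytes.
    public static bool RestoreIfDamaged(string taskListPath)
    {
        var info = new FileInfo(taskListPath);
        var backup = taskListPath + ".bak";
        if ((info.Exists && info.Length > 0) || !File.Exists(backup))
            return false;

        File.Copy(backup, taskListPath, overwrite: true);
        return true;
    }
}
```

A real implementation would also need a policy for stale locks (for example, a lock left behind by a crashed session), which this sketch does not attempt.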
Below are two messages from a conversation on the MarcEdit list around the Task Manager/Task Functionality in MarcEdit. I’ve been discussing some changes to the way that this works – particularly related to how the networked task management works. Since this is a widely used function of the application, I’m pulling these two messages out and making them available here. If you see something that gives you pause, let me know.
Following up – I spent a bit of time working on this and as far as I can tell, everything now works as I’ve described below. This definitely complicates the management of tasks a bit on my end – but after doing a bit of work – here’s the results. Please, look over the notes here and below and let me know if you see any potential issues. If I don’t hear anything, I’ll include this in my update over the weekend.
Simplifying Network Sharing:
When you setup network sharing, the tool now recognizes if your path is a networked path (or something else) and will automatically do a couple things if it’s a networked path:
Adds an import option to the Preferences (option only shows if your path is a network path)
If you click the copy option, it will create a .network folder in the local tasks folder and then move a copy of all your items into the network space and into the _tasks.txt file on the network.
On startup, MarcEdit automatically will update your .network task folder with the current data in the defined network folder.
When off-line, MarcEdit automatically will use the data in the .network folder (a local cached copy of your networked data)
When off-line and using a networked path, if you select the task manager, you will see the following:
When you have a networked folder, MarcEdit creates the local .network cache as read-only. You’ll see this is enforced in the editor.
Changes to the _tasks.txt file – file paths are no longer embedded. Just the name of the file. The assumption is that all files will be found in the defined tasks directory. And the program will determine if there is an old path or just a filename, and will ignore any old path information, extracting just the filename and using it with the defined task paths now managed by the program.
Within the tasks themselves, MarcEdit no longer stores paths to task lists. As with the _tasks.txt file, just the name of the task file is stored, on the assumption that the task list will be in the defined task folder. This means imports and exports can be done through the tool, or just by copying and pasting files into the managed task folder.
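The path-stripping described in the last two points can be sketched as follows. This is an illustration only: the names are hypothetical, and the column order (name, file, optional shortcut, separated by tabs) is taken from the description of the _tasks.txt format in this post. The file name is cut out by hand rather than with `Path.GetFileName`, because old entries may hold Windows or UNC paths even when the program runs on Linux.

```csharp
using System;

public static class TaskListNormalizer
{
    // Reduce the path column of one tab-delimited _tasks.txt line to a bare file name.
    // Assumed column order: name, file, optional shortcut.
    public static string NormalizeLine(string line)
    {
        var columns = line.Split('\t');
        if (columns.Length < 2)
            return line; // not a task entry; leave untouched

        columns[1] = FileNameOnly(columns[1]);
        return string.Join("\t", columns);
    }

    // Handles drive, UNC and forward-slash paths regardless of the host platform.
    public static string FileNameOnly(string pathOrName)
    {
        var cut = pathOrName.LastIndexOfAny(new[] { '\\', '/' });
        return cut < 0 ? pathOrName : pathOrName.Substring(cut + 1);
    }
}
```

Lines that already hold just a file name pass through unchanged, so the same routine can be run safely on old and new task lists.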
Finally – in Windows, networked drives can cause long delays when they are offline. I’ve created a somewhat novel approach to checking this data – but it means that I’ve set a low timeout (~150 ms), so if your network has a lot of latency (i.e., is not responsive), MarcEdit could think it’s offline. I can make this value configurable if necessary, but the timeout here really has an impact on parts of the program because the tasks data is read into the UI of the Editor. In my testing, the timeouts appear to be appropriate.
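Putting the pieces together, the offline behavior described above (check the share with a short timeout, otherwise use the local .network cache) could look roughly like this. The class, method, and parameter names are hypothetical; it is a sketch of the pattern under those assumptions, not MarcEdit’s implementation.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class TaskFolderResolver
{
    // Returns the network folder if it answers within timeoutMs, otherwise the local cache.
    public static string Resolve(string networkFolder, string localCacheFolder, int timeoutMs)
    {
        // Run the existence check off the calling thread so a dead share cannot freeze the UI.
        var check = Task.Run(() => Directory.Exists(networkFolder));

        // Online only if the check finished in time and the folder is really there.
        bool online = check.Wait(timeoutMs) && check.Result;
        return online ? networkFolder : localCacheFolder;
    }
}
```

Note that when the wait times out, the background check is simply abandoned and finishes (or fails) on its own; the caller just stops waiting for it.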
These changes have just been implemented on the windows/linux side right now – I’ll move them to the mac version tonight.
If you’d like to try these changes, let me know – I won’t release these till the weekend, but would be willing to build a test build for folks interested in testing how this works.
Finally, I know how much the tasks function is used, so I’m giving time for feedback before going forward. If you see something described that you think may be problematic, let me know.
From: Terry Reese
To: MARCEDIT-L@listserv.gmu.edu
Cc:
Subject: Proposed TASK changes…Feedback requested. [Re: Saving tasks lists on the network….]
Following up on this – I’m looking at a behavior change and want to make certain that this won’t cause anyone problems. Let me explain how the tasks works currently (for access) and how I’ve reworked the code. Please let me know if anyone is using a workflow where this might be problematic.
When a task is created or cloned, the task and its full path is saved to the _tasks.txt file (in your local preferences). Example:
Main task with a whole lot of text is right here	C:\Users\reese.2179\AppData\Roaming\marcedit\macros\tasksfile-2016_03_18_052022564.txt	SHIFT+F1
AAR Library	C:\Users\reese.2179\AppData\Roaming\marcedit\macros\tasksfile-2016_03_18_052317664.txt
TLA	C:\Users\reese.2179\AppData\Roaming\marcedit\macros\tasksfile-2016_04_19_113756870.txt
This is a tab delimited list – with the first value being the name, the second value the path to the task, and the third being a shortcut if assigned. The task naming convention has changed a number of times through the years. Currently, I use GUIDs to prevent any possibility of collision between users. This is important when saving to the network.
When a user saves to the network, internally, the tool just changes where it looks for the _tasks.txt file. But data is still saved in that location using the full network path. So, if users have different drive letters, or someone uses a UNC path and someone else doesn’t, then you have problems. Also, when setting up a network folder, if you have existing tasks, you have to export them and import them into the folder. Depending on how deep your task references are, that may require some manual intervention.
If you are off the network and you save to a network path, your tasks are simply unavailable unless you keep a copy in your local store (which most wouldn’t).
Here’s what I propose to do. This will require changing the import and export processes (trivial), making a change to the preferences (trivial), and updating the run code for the macros (little less trivial).
First, the _tasks.txt file will have all filepaths removed. Items will only be referenced by filenames. Example:
Main task with a whole lot of text is right here	tasksfile-2016_03_18_052022564.txt	SHIFT+F1
AAR Library	tasksfile-2016_03_18_052317664.txt
TLA	tasksfile-2016_04_19_113756870.txt
The program already sets the taskfile path so that it can run tasks that are on the network, this will continue that process of normalization. Now, no file paths will be part of the string, and access to task locations will be completely tied to the information set in the preferences.
Pros:
This will make tasks more portable as I no longer have to fix paths.
Makes it easier to provide automatic import of tasks when first setting up a network location.
Allows me, for the first time, to provide local caching of networked tasks so that if a user is configured to use a network location and is offline, their tasks will continue to be available. At this point, I’m thinking caches would be updated when the user opens MarcEdit, or makes a change to a network task. The assumption, for simplicity’s sake, is that if you are using a network drive, you cannot edit the local copies of networked tasks – and I’ll likely find a way to enforce that to avoid problems with caching and updates.
Cons:
If you use a networked task, you won’t be able to edit the offline cache when you are offline. You’ll still have access to your tasks (which isn’t true today), but you won’t be allowed to change them.
This isn’t possible with the task manager, but in theory, someone could code tasks to live in locations other than the places MarcEdit manages. This would go away. All tasks would need to be managed within the defined network path, or the local tasks folder.
Using GUIDs for filenames, it shouldn’t happen – but there is always the theoretical chance of duplicate files being created. I’ll need to ensure I keep checks to make sure that doesn’t happen, which complicates management slightly.
Second – Tasks themselves….
Presently, when a task references a task list, it uses the full path. Again, the path will be removed to just the filename. The same pros and cons apply here as above.
For users using tasks, you honestly shouldn’t see a difference. The first time you run the program, MarcEdit would likely modify the tasks that you have (network or otherwise) and things would just carry on as before. If you are creating a new networked folder, you’d see a new option in the locations preference that would allow you to automatically copy your current tasks to the network when setting that up. And, if you are offline and a networked user, you’ll find that you now have access to your networked tasks. This part does represent one more behavior change, though – presently, if you have networked tasks, you can take yourself offline and create and run tasks local to you. In the above format, that goes away, since MarcEdit will be caching your network tasks locally in a .network location. If the current functionality is still desired (i.e., folks have workflows where they are connected to the network for shared tasks, but disconnect to run tasks local only to them), I may be able to set up something so that the task runner checks both the .network and local task directories. My preference would be not to, but I understand that the current behavior has been around for years now, and I really would like to minimize the impact of making these changes.