Over the past three months, I’ve been working on a significant redesign of one of the fundamental workflows in MarcEdit: the way manual and global edits are handled.
Currently
In all versions of MarcEdit, global edits and manual edits are handled differently within the Editor. This is largely due to the desire to be able to undo global edits. Internally, MarcEdit tracks all changes made to a file and keeps a secondary copy of the data in state. The issue occurs when a user mixes global and manual edits: depending on the order in which these events occur, the manual edits may not take. This is why the best practice when using the tool has been to save when going between manual and global editing. Well, no more…
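To make the failure mode concrete, here is a deliberately simplified toy model in Python. It is not MarcEdit's actual code or internals; the class and method names (`ToyEditor`, `manual_edit`, `global_edit`, `save`) are hypothetical, and the only assumption is the one described above: global edits work from a secondary state copy, while manual edits touch only what is in the editor.

```python
class ToyEditor:
    """Toy model (hypothetical): a visible buffer plus a state copy used for global edits and undo."""

    def __init__(self, text):
        self.buffer = text      # what the user sees and types into
        self.state = text       # secondary copy that global edits read from
        self.undo_stack = []    # earlier state copies, so global edits can be undone

    def manual_edit(self, old, new):
        # Manual typing changes only the visible buffer.
        self.buffer = self.buffer.replace(old, new)

    def save(self):
        # Saving brings the state copy in line with the buffer.
        self.state = self.buffer

    def global_edit(self, old, new):
        # Global edits run against the state copy, then refresh the buffer from it.
        self.undo_stack.append(self.state)
        self.state = self.state.replace(old, new)
        self.buffer = self.state

    def undo_global(self):
        # Restore the state copy from before the last global edit.
        if self.undo_stack:
            self.state = self.undo_stack.pop()
            self.buffer = self.state


# Manual edit followed directly by a global edit: the manual change is lost.
lost = ToyEditor("=245  10$aold title")
lost.manual_edit("old", "new")
lost.global_edit("title", "Title")
print(lost.buffer)   # =245  10$aold Title

# Saving in between keeps the manual change.
kept = ToyEditor("=245  10$aold title")
kept.manual_edit("old", "new")
kept.save()
kept.global_edit("title", "Title")
print(kept.buffer)   # =245  10$anew Title
```

In this toy, the "save between manual and global editing" habit works because saving is the only point where the two copies are reconciled.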
In Development
Since Nov. 2018, I’ve been rewriting the internal processing for the MarcEditor, reconfiguring how changes are tracked, items are paged, and edits are managed. This has resulted in significant improvements in file loading (MarcEdit can now load ~120,000 records per second, versus 12,000 in previous builds) and in the correction of small quirks that have bugged me for some time. It’s also leading me to revisit other parts of the program, such as sorting, validation, and reporting, to ensure that these processes take advantage of the newer event queues. In total, I’ve removed ~27,000 lines of legacy code and replaced them with ~12,000 lines of rewritten functions. The reduction in code is a good thing: it has meant that I’ve been able to pull out enhancements that had been added ad hoc over time and restructure the underlying abstract classes to be more reusable and resilient.
Status
Since early Jan. 2019, there has been a beta MarcEdit 7 stream dedicated to dealing with these issues. A variety of folks have been testing and providing feedback. Additionally, I’ve been in the midst of a consulting project that has me spending a lot of time in the MarcEditor building workflows for an organization, which has allowed me to really push the new version and identify issues proactively. At this point, I believe that most issues have been sorted, and the tool is nearly ready to move into the production stream. I updated the beta stream to 7.1.93 last night. This closed some remaining issues that were occurring because the function that consolidates manual edits wasn’t preserving the reindexed file, so data in the editor looked incorrect even though the source data (the copy held in state) was fine. This was leading to some odd results. That’s been corrected, and in the week prior to pushing the build, I ran across no additional issues while processing nearly 500,000 records through a combination of manual, global, and task edits.
New stuff
In addition to the changes to the editor, I’ve been updating other parts of the program as well. I’ve enhanced the watcher, adding new actions and building in FTP/SFTP support. I’ve updated the Validator to add LDR checks outside of the structure test. I’ve made a lot of changes, so I’m including the change log of most of the updates made through the beta build cycle below:
ChangeLogs:
7.1.93
- removed most debugging messages, and those that remain shouldn’t be intrusive
- finished cleaning consolidation code when working with manual/global edits
- ftp/sftp components added to watcher
- watcher actions added (join/marcxml)
- validator: added an LDR trap when doing non-structure validation, so records where the LDR isn’t the first field, or where an LDR appears in the middle of a record, generate a critical error in the report.
- updated dedup records: the lambda expression used to allow for the last record seems to have issues when there are imperfections in the data.
- Z39.50: if the database config is deleted, the code will now put it back.
7.1.92
- updated paging errors showing up in the new paging queue
7.1.91
- Cleaning up new line buffers (created for testing)
- Added sorting to the page pump
- Updated validator to clean up counting so blank lines don’t artificially raise the number of records reported as processed (this doesn’t impact the actual log file, though)
7.1.90
- beta stream created — new workflow code implemented
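As a rough illustration of two validator items in the change log (the LDR trap in 7.1.93 and the blank-line counting fix in 7.1.91), here is a minimal Python sketch. It is not the Validator's actual code. It assumes records in MarcEdit's mnemonic text form, where each field is a line starting with `=` plus a tag (for example `=LDR`) and records are separated by blank lines; the function names and error strings are hypothetical.

```python
from typing import List


def split_records(mnemonic_text: str) -> List[List[str]]:
    """Split mnemonic-style text into records (lists of field lines).

    Runs of blank lines never produce empty records, so they cannot
    inflate the record count.
    """
    records: List[List[str]] = []
    current: List[str] = []
    for line in mnemonic_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def ldr_errors(record: List[str]) -> List[str]:
    """Return critical errors for a missing, misplaced, or repeated LDR field."""
    positions = [i for i, field in enumerate(record) if field.startswith("=LDR")]
    errors: List[str] = []
    if not positions:
        errors.append("critical: record has no LDR")
    elif positions[0] != 0:
        errors.append("critical: LDR is not the first field")
    if len(positions) > 1:
        # A second LDR often means two records ran together without a blank line.
        errors.append("critical: extra LDR inside the record")
    return errors
```

With this sketch, `len(split_records(text))` gives a record count that ignores stray blank lines, and calling `ldr_errors` on each record flags the two LDR placement problems the change log describes. Treating a record with no LDR at all as critical is an assumption added here.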
Based on my own testing and feedback, I think this is getting close to coming out of the beta stream. My hope is to see this work completed within the next week, which means that I may end up with one more beta build in the stream if something comes up. Tentatively, I expect the version number to be 7.1.100 when the build moves out of the beta stream and into the production line.
Questions — let me know.
–tr