MarcEdit’s new Logging Features
Over the years, I’ve periodically gotten requests for a much more robust logger in MarcEdit. Currently, when the tool performs a global change, it reports the number of changes made to the user. However, a handful of folks have been wanting much more. Ideally, they’d like to have a log of every change the application makes, which is hard because the program isn’t built that way. I provided the following explanation to the MarcEdit list last week.
Most of the questions that have come up since I posted notes about the logger have been about granularity. People have asked for the tool to provide additional information about the records and more context around each change, and have wondered whether this will lead to a preview mode. I think other folks have wondered why this process has taken so long to develop. Well, it stems from decisions I’ve made around the development. MarcEdit’s application structure can be summed up by the picture below:
In developing MarcEdit, I have made a number of very deliberate decisions, and one of those is that no component knows what any other component does. As you can see in this picture, the application parts of MarcEdit don’t actually talk directly to the system components. They are referenced through a messenger, which handles all interactions between the application and the system objects. The same is true of communication between the system objects themselves. The editing library, for example, knows nothing about MARC, validation, etc.; it only knows how to parse MarcEdit’s internal file format. Likewise, the MARC library doesn’t know anything about validation, MARC21, or linked data. Those parts live elsewhere. The benefit of this approach is that I can develop each component independently of the others and avoid breaking changes, because all communication runs through the messenger. This gives me a lot of flexibility and helps to enforce MarcEdit’s agnostic view of library data. It’s also how I’ve been able to start including support for linked data components: as far as the tool is concerned, it’s just another format to be messaged.
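As a rough illustration of the messenger idea described above, here is a minimal Python sketch. It is not MarcEdit’s code: the Messenger class, the message names, and the two stand-in components are all hypothetical, and lines starting with "=" simply stand in for fields. The point is only that the caller and the components never reference each other directly.

```python
from typing import Callable, Dict


class Messenger:
    """Hypothetical mediator: components register handlers by message name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], str]] = {}

    def register(self, message: str, handler: Callable[[str], str]) -> None:
        self._handlers[message] = handler

    def send(self, message: str, payload: str) -> str:
        # The caller only knows the message name, never the component behind it.
        if message not in self._handlers:
            raise KeyError(f"no component handles {message!r}")
        return self._handlers[message](payload)


# Two stand-in components that never import or call each other.
def edit_uppercase(text: str) -> str:
    return text.upper()


def count_fields(text: str) -> str:
    # Assumption for this sketch: every line starting with "=" is one field.
    return str(sum(1 for line in text.splitlines() if line.startswith("=")))


messenger = Messenger()
messenger.register("edit.uppercase", edit_uppercase)
messenger.register("report.field_count", count_fields)

record = "=245  10$aExample title\n=500    $aA note"
edited = messenger.send("edit.uppercase", record)
print(messenger.send("report.field_count", edited))  # prints 2
```

Swapping out either component only means registering a different handler under the same message name; nothing else has to change.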
Of course, the challenge with an approach like this is that most of MarcEdit’s functions don’t have a concept of a record. Most functions, for reasons of performance, process data much like an XML SAX processor. Fields for edit raise events to denote areas of processing, as do errors, which then push the application into a rescue mode. While this approach allows the tool to process data very quickly, and essentially removes size restrictions for data processing, it introduces issues if, for example, I want to expose a log of the underlying changes. Logs exist, and I use them in my debugging, but they exist on a component level and are not attached to any particular process. I use messaging identifiers to determine what data I want to evaluate, but these logs are not meant to record a processing history; they record component actions. They can be muddled, but they give me exactly what I need when problems arise. The challenge with developing logging for actual users is that they would likely want actions associated with records. So, to do that, I’ve added an event handler in the messaging layer. This handles all interaction with the logging subsystem, tracks the internal messaging identifier, and assembles the data. This means that the logger still doesn’t have a good concept of what a record is, but the messenger does, and it can act as a translator.
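To make the translator role a little more concrete, here is a small Python sketch under stated assumptions. The event shape, the edit_stream function, the LoggingMessenger class, and the rule that a blank line separates records are all invented for illustration; MarcEdit’s real messaging identifiers and log structures are not shown in this post.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical event shape: (message_id, field_tag, old_value, new_value).
Event = Tuple[int, str, str, str]


def edit_stream(lines: List[str], find: str, replace: str) -> Iterator[Event]:
    """Stand-in for a streaming editor: it sees lines, not records."""
    for message_id, line in enumerate(lines):
        if find in line:
            yield (message_id, line[1:4], line, line.replace(find, replace))


class LoggingMessenger:
    """Hypothetical messaging-layer handler that knows where records begin."""

    def __init__(self, lines: List[str]) -> None:
        # Map each line's message id to a 1-based record number.
        # Assumption: a blank line separates records.
        self._record_of: Dict[int, int] = {}
        record_number = 1
        for message_id, line in enumerate(lines):
            if line == "":
                record_number += 1
            else:
                self._record_of[message_id] = record_number
        self.log: Dict[int, List[str]] = defaultdict(list)

    def on_change(self, event: Event) -> None:
        # Translate the component-level event into a record-level log entry.
        message_id, tag, old, new = event
        record_number = self._record_of[message_id]
        self.log[record_number].append(f"{tag}: {old!r} -> {new!r}")


sample_lines = [
    "=245  10$aOld title",
    "=500    $aNote",
    "",
    "=245  10$aAnother",
    "=500    $aOld note",
]
log_messenger = LoggingMessenger(sample_lines)
for event in edit_stream(sample_lines, "Old", "New"):
    log_messenger.on_change(event)

for record_number, changes in sorted(log_messenger.log.items()):
    print(record_number, changes)
```

The editor in this sketch never learns about records; only the handler sitting in the messaging layer does, which mirrors the division of labor described above.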
Anyway, this is how I’ll be providing logging. It will also let me slowly expand the logging beyond the core editing functions if there is interest. It is also how I’ll be able to build services around the log file, such as parsing and log enhancement, for users who want to add record-specific information to a log file beyond the simple record number identifier that will be used to track changes. Enhancing a log this way (with a local identifier, for example) would make it more permanent. However, because of the way MarcEdit is designed, and because there is no standard control number across all MARC formats (in merging, for example, merging on the 001 triggers checks of 9 other fields that could all store associated control data), I believe that providing ways to enhance the log file after a run, while an extra step, gives me the most flexibility to make greater use of the processing log in the future. It also lets me continue to keep MARCisms out of the processing library and focus only on handling data edits.
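For a sense of what enhancing a log after a run could look like, here is a hedged Python sketch. The log row layout, the dictionary-based records, and the enhance_log function are assumptions made up for this example; the actual MarcEdit log format is not documented in this post.

```python
from typing import Dict, List, Optional

# Hypothetical log rows: one dict per change, keyed by 1-based record number.
log_rows = [
    {"record": 1, "change": "245 edited"},
    {"record": 3, "change": "500 deleted"},
]

# Hypothetical parsed records (field tag -> value), in file order.
records = [
    {"001": "ocm0001", "245": "First"},
    {"035": "(local)42", "245": "Second"},
    {"001": "ocm0003", "245": "Third"},
]


def enhance_log(rows: List[dict], recs: List[Dict[str, str]], id_tag: str) -> List[dict]:
    """Return copies of the log rows with a local identifier attached."""
    enhanced = []
    for row in rows:
        record = recs[row["record"] - 1]  # record numbers are 1-based
        # The chosen tag may be absent from a record; keep the row and store None.
        local_id: Optional[str] = record.get(id_tag)
        enhanced.append({**row, "local_id": local_id})
    return enhanced


enhanced_rows = enhance_log(log_rows, records, "001")
for row in enhanced_rows:
    print(row)
```

Letting the user pick the identifier field at enhancement time, rather than baking one into the logger, is what keeps the choice of control number out of the processing code in this sketch.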
So that’s pretty much the work in a nutshell. So what do you get? Well, once you turn it on, you get lots of stuff and a few new tools. So, let’s walk through them.
Turning on Logging:
Since logging only captures changes made within the MarcEditor, you’ll find the logging settings in the MarcEditor Preferences tab:
Once enabled, the tool will generate a new session in the Log folder each time the Editor starts a new session. With the logs comes log management. From within the MarcEditor or the Main window, you’ll find the following:
From the MarcEditor, you’ll find in Reports:
Both areas provide the same functionality, but the MarcEditor Reports entry is scoped to the current session log file and the current record file loaded into the Editor (if one is loaded). To manage old sessions, use the entry on the Main Window.
Advanced Log Management
Two of the use cases that were laid out for me were the ability to enhance logs and the ability to extract only the modified records from a large file. So, I’ve included an Advanced Management tool for just these kinds of queries:
This is an example run from within the MarcEditor.
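The second use case, pulling only the modified records out of a large file, can be sketched in a few lines of Python. This is not how the Advanced Management tool is implemented; the split_records and extract_modified functions, the blank-line record separator, and the set of record numbers taken from a log are all assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Set


def split_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield records one at a time; assumes a blank line ends each record."""
    current: List[str] = []
    for line in lines:
        if line.strip() == "":
            if current:
                yield current
                current = []
        else:
            current.append(line.rstrip("\n"))
    if current:
        yield current


def extract_modified(lines: Iterable[str], modified: Set[int]) -> List[List[str]]:
    """Keep only the records whose 1-based number appears in the log."""
    return [
        record
        for number, record in enumerate(split_records(lines), start=1)
        if number in modified
    ]


file_lines = [
    "=001  a", "=245  10$aOne", "",
    "=001  b", "=245  10$aTwo", "",
    "=001  c", "=245  10$aThree",
]
modified_numbers = {1, 3}  # hypothetical: record numbers read from a session log

for record in extract_modified(file_lines, modified_numbers):
    print(record)
```

Because split_records is a generator, the same approach works when the lines come from an open file handle, so the whole file never has to sit in memory at once.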
Anyway – this is a quick write-up. I’ll be recording a couple sessions tomorrow. I’ll also be working to make a new plugin available.