Because I change version numbers so rarely when it comes to MarcEdit, I usually like to take the major version numbers as an opportunity to look at how some of the core code works, and this time is no different. One of the things that I’ve occasionally heard is that Opening and Saving larger files in the MarcEditor can be slow. I guess, before I talk about some of the early metrics, I’d like to explain how the MarcEditor works, because it works differently than a normal text editor in order to allow users to work with files of any size.
Opening records in the MarcEditor
When you open the MarcEditor, the program utilizes one of two modes to read files into the editing screen: Preview and Paging.
Preview mode has been designed specifically for really large files – but the caveat is that when in Preview mode, the editor gets locked into Read Only mode. This means you can’t type in the Editor, but you can use any of the Editing functions to change the file. The benefit of the Preview mode is you remove the need to load the file (which is an expensive process).
Paging mode is the editing mode enabled by default. This mode breaks files into pages, meaning that MarcEdit must first read the file to determine the number of records and create an internal directory of page start and end locations. Once that is accomplished, the program then renders data onto the screen. The pages created are all virtual (they don’t exist) unless a user actually edits (types onto the screen) information on a page. Global edits affect the whole file, so the file gets re-paged after every global edit.
Paging mode is by far the best rendering mode for data under, say, 150 MB (in MarcEdit 6). This is because at around 150 MB, it starts taking a lot longer to create the virtual pages. And depending on your operating system and hard drive type, this process could be really expensive. I’ve found that on older equipment (drives that aren’t solid-state drives, or SSDs), this process can really slow down reading and writing because so many disk accesses have to occur when creating pages (even virtually).
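To make the idea of a virtual page directory concrete, here is a minimal Python sketch. It is not MarcEdit's code (which is written for .NET); it assumes, purely for illustration, that records are separated by a blank line, and the names `build_page_directory` and `read_page` are hypothetical.

```python
import io


def build_page_directory(stream, records_per_page=100):
    """Scan a binary stream once and return (start, end) byte offsets per page.

    Assumption for this sketch: a blank line ends a record.
    """
    pages = []
    page_start = 0
    records_in_page = 0
    position = 0
    in_record = False
    for line in stream:
        position += len(line)
        if line.strip():
            in_record = True
        elif in_record:
            # A blank line closes the record we were reading.
            in_record = False
            records_in_page += 1
            if records_in_page == records_per_page:
                pages.append((page_start, position))
                page_start = position
                records_in_page = 0
    if in_record:
        # The last record was not followed by a blank line.
        records_in_page += 1
    if records_in_page:
        pages.append((page_start, position))
    return pages


def read_page(stream, directory, page_number):
    """Fetch one page on demand; no page content is held until asked for."""
    start, end = directory[page_number]
    stream.seek(start)
    return stream.read(end - start)


sample = io.BytesIO(b"=LDR a\n=245 x\n\n=LDR b\n\n=LDR c\n\n")
directory = build_page_directory(sample, records_per_page=2)
last_page = read_page(sample, directory, len(directory) - 1)
```

The directory only stores offsets, which is why the pages cost nothing until one is displayed or edited, and also why building it means reading the whole file first.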
Saving records in the MarcEditor
Saving files essentially does the paging operation in reverse, though now, rather than working with a virtual page, the program does have to access the file and extract the page content for every virtual page in existence. Again, if you have a non-SSD drive or an older 5400 rpm drive, this can be a slow process. If your operating system is already having disk usage issues (and older computers upgraded to Windows 10 have many of these), this can slow the process considerably.
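As a rough illustration of why saving touches the disk so much, the sketch below writes a file back out page by page, using the edited text where a page was changed and copying the original byte range otherwise. This is a simplified model in Python, not MarcEdit's implementation; `save_pages` and its arguments are hypothetical names.

```python
import io


def save_pages(source, directory, edited_pages, destination):
    """Write every page to destination, preferring edited content.

    directory    -- list of (start, end) byte offsets into source
    edited_pages -- dict mapping a page number to its replacement bytes
    """
    for number, (start, end) in enumerate(directory):
        if number in edited_pages:
            # This page was typed into, so its content lives in memory.
            destination.write(edited_pages[number])
        else:
            # Untouched pages are still virtual: read them from the file.
            source.seek(start)
            destination.write(source.read(end - start))


original = io.BytesIO(b"AAAABBBBCCCC")
page_directory = [(0, 4), (4, 8), (8, 12)]
output = io.BytesIO()
save_pages(original, page_directory, {1: b"xx"}, output)
saved = output.getvalue()
```

Every untouched page costs a seek and a read, so the number of disk operations grows with the number of pages rather than with the number of edits.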
MarcEdit 7 Enhancements
In thinking about how this process works, I started wondering how I could improve file operations in MarcEdit 7. Obviously, the easiest way to improve the open and save processes would be to remove as many disk operations as possible. The fewer file operations, the faster the process. So, I started looking. Now, one of the benefits of updating to the new version of .NET is that I have access to some new programming concepts. One of these is Thread Tasks, used to initiate parallel processes in C# (though I’ve found these must be handled with care, or I can really cause disk issues as threads spawn too quickly), and the other is lambda expressions, which enable the compiler to optimize the operations code. With this in mind, I started working.
For the purpose of this benchmark, I’m using a Dell Inspiron 13 with an i5 processor, an SSD, and 16 GB of RAM.
Reading Data into the MarcEditor
In order to speed up the reading operation, I had to reduce the number of file operations that were being run on the system. To do this, I made two significant changes.
When MarcEdit’s Enhanced File reading mode is enabled, MarcEdit reads files under 60 MB into memory. Using Parallel Tasks, I was able to improve this process, reducing the number of file reads by 50%. So, if the old method made 100 file reads to build the page, the new process would only make 50 file reads. Additionally, with the processing now in a Parallel process, data could be read asynchronously, though this doesn’t help as much as one might hope since data needs to be processed in order. But, it does seem to help.
For files larger than 60 MB, again, I needed to find a way to reduce the number of file reads. To do this, I tried two things. First, I increased the buffer. This means that more data is read at a time, so fewer file reads must occur. Previously, the buffer was 1 MB. The buffer has been increased to 8 MB. This makes a big difference, as now files under 8 MB are only read once, as the remainder of the data lives in the buffer. The second thing that I did was move access down to the abstract classes. This allowed me to interact beneath the StreamReader class and access the actual positions in the file when data was read. This couldn’t be done in the current version of MarcEdit, because the position properties report where buffered data was read. This meant that an additional file operation had to occur just to get the file positions. Again, if the old process needed 100 reads to read the file, the updated process would only need 50 reads.
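Both ideas can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the .NET code in MarcEdit: it assumes records end with a single terminator byte (0x1D is the record terminator in binary MARC), and `record_offsets` is a hypothetical helper. The function reads the file in large chunks and keeps its own running offset, so record positions are known without asking the stream where it is.

```python
import io

BUFFER_SIZE = 8 * 1024 * 1024  # 8 MB, the new buffer size described above


def record_offsets(raw, terminator=b"\x1d", buffer_size=BUFFER_SIZE):
    """Return (offsets, read_count) for a binary stream.

    offsets holds the absolute position just past each record terminator.
    read_count is the number of non-empty reads that were needed.
    """
    offsets = []
    reads = 0
    base = 0  # absolute file offset of the start of the current chunk
    while True:
        chunk = raw.read(buffer_size)
        if not chunk:
            break
        reads += 1
        found = chunk.find(terminator)
        while found != -1:
            offsets.append(base + found + 1)
            found = chunk.find(terminator, found + 1)
        base += len(chunk)
    return offsets, reads


data = b"aa\x1dbbb\x1dc\x1d"
small_buffer = record_offsets(io.BytesIO(data), buffer_size=4)
large_buffer = record_offsets(io.BytesIO(data), buffer_size=1024)
```

The offsets come out the same either way; only the number of reads changes with the buffer size, which is the whole point of the larger buffer.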
So, what’s the impact of this? Well, let’s see. I have a 350 MB file and paging set to 100 records per page. This is a UTF8 file with records from materials The Ohio State University Libraries has loaded into the HathiTrust. Using this as my test set, I simply opened the file in the MarcEditor in MarcEdit 6.3.x and MarcEdit 7.0.0.alpha. To test, I loaded this file five times, throwing out the slowest and fastest times, and selecting the status message closest to the average.
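For anyone who wants to repeat this kind of measurement, the procedure (five runs, drop the fastest and slowest, report the run closest to the average of the rest) can be sketched in Python as follows. The function names are hypothetical, and `action` stands in for whatever operation is being timed.

```python
import time


def time_runs(action, runs=5):
    """Call action() the given number of times and return elapsed seconds."""
    timings = []
    for _ in range(runs):
        started = time.perf_counter()
        action()
        timings.append(time.perf_counter() - started)
    return timings


def closest_to_trimmed_mean(timings):
    """Drop the fastest and slowest run, then average what is left.

    Returns (trimmed_mean, representative), where representative is the
    remaining timing closest to that mean.
    """
    if len(timings) < 3:
        raise ValueError("need at least three timings")
    kept = sorted(timings)[1:-1]
    mean = sum(kept) / len(kept)
    representative = min(kept, key=lambda t: abs(t - mean))
    return mean, representative
```

Dropping the extremes keeps a single cold-cache or background-task run from skewing the reported number.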
We can see that by reducing the number of file reads, the process improves significantly, though it could be better. Digging deeper into the results, I’m finding that the actual reading of the data is even faster, with the actual rendering of the data in the newer control taking longer than in the previous editing control. The reason for this is that in MarcEdit 6.3.x, this control usage has been optimized, double buffered, etc. In MarcEdit 7.0.0.alpha, this hasn’t been done yet. My guess is that I can probably get these numbers down to around 8.7-9 seconds for a file of this size. That would represent a 5 to 5.5 second improvement in load time. Of course, the question is, will this help individuals opening smaller files? I think yes. On my SSD, loading a 50 MB file takes roughly the same amount of time in both versions: 1.3 seconds. But on a non-SSD drive, I think the improvement will be significant given that the number of file reads will be reduced.
This test, though, was with the old defaults in MarcEdit. For MarcEdit 7.0.0.alpha, I would like to change the default paging size to 1000 records per page (since the new component is more efficient when dealing with larger sets). So, let’s run the test again, this time using the different paging values and the same approach as above:
Looking at the process, you can see that the gap between the two versions gets larger. Again, looking closer at the data, the actual loading of the file is faster than in the first tests, but rendering the data pushed the final load times higher. As in the first tests, I believe that once the Editor itself has been optimized, we’ll see this improve significantly. By the time the final version comes out, the performance difference on this type of file could be between 6-8 seconds, or a 37-50% speed improvement over the current 6.3.x version of the software.
Writing files in the MarcEditor
In looking at the process used to write files on save, the same kinds of issues are causing the problems there. First, saving requires a lot of file access (both read and write), and second, once a file is saved, it is reloaded into the Editor. This means that on systems with SSDs, the performance benefits may be modest, but for non-SSD systems, the gains should be significant. But there was only one way to tell. Using the same file, I made edits on 4 pages: the first page, the 50th page, the 150th page, and the last page. Paging was set back to 100 records per page. This forces the tool to combine the changed pages with the unchanged data in the virtual space. Using the loading times above, we can estimate the time actually used when saving the data. I’ll be providing numbers for both the Save and Save As processes (since they work slightly differently):
Saving the file using Save:
Saving the file using Save As:
As you can see here, the difference between the new saving method and the old saving method is pretty significant. The time posted here reflects the time it takes to both save the file and then reload the data back into the Editor window. Taking the times from the first test, where rendering the file takes an average of 14 seconds, we can determine that the Save function in MarcEdit 6.3.x takes ~6.2 seconds and the Save As operation takes approximately 6.7 seconds. Let’s compare that to MarcEdit 7.0.0.alpha. We know that the rendering of the file takes approximately 10 seconds. That means that the Save function takes ~0.8 seconds to complete, and the Save As function takes 1.2 seconds. In each case, this represents a significant performance improvement, and as noted above, optimizations have yet to be completed. Additionally, I do believe that on non-SSD systems, the performance gains will be even more noticeable.
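The arithmetic behind these estimates is simple enough to write down. The sketch below, in Python with hypothetical function names, subtracts the render time from a combined save-and-reload time, and then compares the save-only figures quoted above.

```python
def save_only_seconds(total_seconds, render_seconds):
    """Estimate the save time by removing the reload/render portion."""
    return total_seconds - render_seconds


def speedup(old_seconds, new_seconds):
    """How many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


# Save-only estimates quoted in the paragraph above.
save_speedup = speedup(6.2, 0.8)      # Save: 6.3.x versus 7.0.0.alpha
save_as_speedup = speedup(6.7, 1.2)   # Save As: 6.3.x versus 7.0.0.alpha
```

On these figures the save step alone is somewhere between five and eight times faster, even though the end-to-end time is still dominated by reloading the file into the Editor.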
Thoughts, Conclusions, and So what
Given how early I am in the development and optimization process, why start looking at any of these metrics now? Surely some of these things will change, and I’m sure they will. But these give me a baseline to work with, and a touchstone as I continue working on optimizing the process. And it is early, but one of the things that I wanted to highlight here is that in addition to the new features, updated interface, and accessibility improvements – a big part of this update is about performance and speed. When I initially wrote MarcEdit, nearly all the code was written in Assembly. Shifting to a higher level language was incredibly painful for me to do because I want things to be fast, and Assembly programming is all about building things small and building things fast. You have access to the CPU registers, and you can make magic happen. Unfortunately, keeping up with the changes in the metadata world, the need to provide better Unicode support, and my desire to support Mac systems (which used, at the time, a different CPU architecture) meant moving to a higher level language that could be compiled for different systems. Ever since that code migration, I’ve been chasing the clock, trying to get the processing speeds down to those of the original assembly code-base. Is that possible? No. Though, even if it were, so many things have changed and been added that the process simply does more than the simple libraries that I first created in 1999…but still, that desire is there.
So, while I am spending most of my time communicating publicly about the new wireframes and new functionality in MarcEdit 7 (and I’m really excited about these changes)…please know – MarcEdit 7 is also about making it fast. I think MarcEdit 6.3.x is already pretty quick on its feet. As you can see here, it’s about to get faster.
I’ve posted updates for all versions: Windows and Linux updates for 6.3.x on Sunday evening, and updates to MacOS for 2.5.x on Wed. morning. Change log below:
* Bug Fix: MarcEditor: Convert clipboard content to….: The change in control caused this to stop working – mostly because the data container that renders the content is a rich object, not plain text like the function was expecting. Missed that one. I’ve fixed this in the code.
* Enhancement: Extract Selected Records: Connected the exact match to the search by file
* Bug Fix: MarcEditor: Right to left flipping wasn’t working correctly for Arabic and Hebrew if the codes were already embedded into the file.
* Update: Cleaned up some UI code.
* Update: Batch Process MarcXML: respecting the native versus the XSLT options.
* Bug Fix: MarcEditor: Right to left flipping wasn’t working correctly for Arabic and Hebrew if the codes were already embedded into the file.
* Update: Cleaned up some UI code.
* Update: Batch Process MarcXML: respecting the native versus the XSLT options.
* Enhancement: Exact Match searching in the Extract, Delete Selected Records tool
* Enhancement: Exact Match searching in the Find/Replace Tool
* Enhancement: Work updates in the Linked data tool to support the new MAC proposal
In this set of wireframes, you can see one of the concepts that I’ll be introducing with MarcEdit 7…wizards. Each wizard is designed to encapsulate a reference interview to attempt to make adding new functions, etc. to the tool easier. You will find these throughout MarcEdit 7.
XML Functions Window:
XML Functions Wizard Screens:
You’ll notice one of the options is the new XML/JSON Profiler. This is a new tool that I’ll wireframe later; likely sometime in August 2017.
Something that comes up a lot is the lack of key combinations or pathways to using functions in MarcEdit. I’ll admit, the program is very mouse heavy. So, as part of the accessibility work in MarcEdit 7, I’m taking a long look at how access to all functions can be accommodated via the keyboard. This means that for MarcEdit 7, I’m mapping out all keycode combinations (the ALT+[KEY] paths and the more traditional shortcut key combinations) for each window in MarcEdit. When it’s finished, I’ll make this part of the application documentation. Before I get too far along, I wanted to show what this looks like. Please see: http://marcedit.reeset.net/software/MarcEdit7_KeycodeMap.pdf
I’m continuing to flesh out new wireframes, and one of the areas where I’ll be consolidating some options is in the preferences window. I’ve decided to reorganize the menu and some of the settings. Additionally, I’m adding a new setting: Ease of Access.
Here are the initial wireframes demonstrating the new menu layout:
Ease of Use:
This is a new section developed to support Accessibility options. At this point, these are the options that I’m working on:
MarcEdit will respect the operating system’s accessibility settings (i.e., if you’ve scaled fonts, etc.), but these settings directly affect the MarcEdit application. In this section, you’ll find the themes (and I’m working out a wizard-style way to create themes and find ones that have been created), feedback options (right now, if this is selected, you’ll get audible clicks letting you know that an action has occurred), and Keyboard options. I’m spending a lot of time mapping the current keyboard options, with the intention that I’ll try to map all actions to some keyboard combination. These settings tell MarcEdit if this information should show up in the Tooltips, as well as rich descriptions about an operation. The last thing that I’ll likely add is a set of links to topics for users looking for accessibility-friendly fonts, etc.
I think that the reorganization should help to provide some clarity in the settings and will help me in thinking about the first run wizard – and hopefully the currently planned accessibility options will provide users with a wider range of options.
I’ve been working a bit more around this notion of creating “themes” to improve visible accessibility options. This started with an initial implementation that included the default interface and then a High Contrast interface. Over the past few days, I’ve been getting a wide range of feedback, and one of the things that is becoming apparent is that folks would like to have a wide range of preferences. So, this afternoon, I spent time taking the hardcoded default and high contrast themes, and rewriting all non-default UI implementations as themes.
When I think about theming, I immediately start thinking about the operating system themes, or themes that you can download for browsers. At this point, we aren’t talking about anything quite so complex. In fact, until I get feedback, I’ll be keeping theming light weight – but I think that in the long run, this might actually make them more useful.
How do they work? Essentially, a theme is going to be the implementation of an XML file. Here’s the dark (high contrast) theme written out in the new XML theme structure.
<?xml version="1.0" encoding="utf-8" ?>
<theme>
  <name>Dark (High Contrast) Theme</name>
  <global>
    <!-- Use HTML web color codes for these values -->
    <font_color>#ffffff</font_color>
    <background_color>#000000</background_color>
  </global>
  <marceditor>
    <font_color>#000000</font_color>
    <background_color>#ffffff</background_color>
  </marceditor>
  <overrides>
    <!-- Override values
    <menus>
      <font_color>
      <background_color>
    </menus>
    <links>
      <font_color>
      <visited_font_color>
      <behavior> [set to always, hover, none]
    </links>
    -->
  </overrides>
</theme>
As you can see, the initial implementation of theming is very limited. Essentially, users can theme font color and background colors globally, at the MarcEditor level, and override options for menus and links found within the program. This may (and likely will) be extended prior to the release of MarcEdit 7, but I don’t anticipate it being enhanced a lot. While the new GUI rendering engine makes this kind of work easier, I don’t want to develop an entire rendering process around this method until I know there is more than a passing interest.
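To show how little is needed to consume a theme file like this, here is a small Python sketch that reads the global and MarcEditor colors with the standard library's ElementTree. The element names come from the theme file in this post; the `load_theme` function and the dictionary layout are assumptions made for illustration, not how MarcEdit itself loads themes.

```python
import xml.etree.ElementTree as ET

THEME_XML = b"""<?xml version="1.0" encoding="utf-8" ?>
<theme>
  <name>Dark (High Contrast) Theme</name>
  <global>
    <font_color>#ffffff</font_color>
    <background_color>#000000</background_color>
  </global>
  <marceditor>
    <font_color>#000000</font_color>
    <background_color>#ffffff</background_color>
  </marceditor>
  <overrides>
  </overrides>
</theme>"""


def load_theme(xml_bytes):
    """Parse a theme document into a plain dictionary of color settings."""
    root = ET.fromstring(xml_bytes)
    theme = {"name": root.findtext("name", default="")}
    for section in ("global", "marceditor"):
        node = root.find(section)
        if node is None:
            continue  # a section left out of the file is simply skipped
        theme[section] = {
            "font_color": node.findtext("font_color"),
            "background_color": node.findtext("background_color"),
        }
    return theme


theme = load_theme(THEME_XML)
```

Because a theme is just a handful of color values, creating a new one is a matter of copying the file and changing the hex codes.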
What this means, however, is that I can quickly create new themes. Right now, I’ve implemented this in the Options dialog. You can see the current line of thinking below:
Using the main window of MarcEdit 7 as the example page, I’ll run through the current themes that I’ve marked up:
Default Theme (hardcoded):
Dark (High Contrast) Theme:
Dark Gray Theme:
All these themes were created using the theming xml files. As I say, if I get feedback, I’ll look to expand this as we move towards the official release.
An interesting request made while reviewing the wireframes was whether MarcEdit 7 could support a kind of high contrast, or “Dark” theme mode. An example of this would be Office:
Some people find this interface easier on the eyes, especially if you are working on a screen all day.
Since MarcEdit utilizes its own GUI engine to handle font sizing, scaling, and styling – this seems like a pretty easy request. So, I did some experimentation. Here’s MarcEdit 7 using the conventional UI:
And here it is under the “high contrast” theme:
Since theming falls into general accessibility options, I’ve put this in the language section of the options:
However, I should point out that in MarcEdit 7, I will be changing this layout to include a dedicated setting area for Accessibility options, and this will likely move into that area.
I’m not sure this is an option that I’d personally use as the “Dark” theme or High Contrast isn’t my cup of tea, but with the new GUI engine added to MarcEdit 7 with the removal of XP support – supporting this option really took about 5 minutes to turn on.
The changes aren’t big – they are really designed to make the form a little more compact and add common topics to the screen. The big changes are related to integrations. In MarcEdit 6.x, when you run across an error, you have to open the validator, pick the correct validation option, etc. This won’t be the case any longer. When the tool determines that the problem may be related to the record structure – it will just offer you the option to check for errors in your file…no opening the validator, no picking options. This should make it easier to get immediate feedback regarding any structural processing errors that the tool may run up against.
MARC Tools Window Wireframe #1:
The second wireframe collapses the list into an autocomplete/autosuggest option, moves data around and demonstrates some of the potential integration options. I like this one as well – though I’m not sure if having the items in a dropdown list with autocomplete would be more difficult to use than the current dropdown list. I also use this as an opportunity to get rid of the Input File and Output File labels. I’m not sure these are always necessary, and I honestly hate seeing them. But I know that iconography maybe isn’t the best way to express meaning. I think attaching tooltips to each button and textbox might allow me to finally let these labels go.
MARC Tools Wireframe #2:
Based on feedback, it sounds like the labels are still desired. So here is wireframe #3 with a slight modification to allow for labels in the window.
While I’ve got a couple of clean up things to do with MarcEdit 6.x, I’ve been starting the process of revising MarcEdit 7.0.0.alpha. At this point, there is a version running new code. I’ve spent the past few evenings reorganizing the main window, pulling some things apart, and beginning the process of redoing code. So far, this is what’s been completed:
New MarcEdit 7 Main window:
This is what the main window looks like. I’ll be creating a 4 column interface. Scaling works much better than in the previous version of MarcEdit, as I’m using a different layout engine. The topics in the top – those are wide open. These were the ideas I thought of as short cuts to either help pages or parts of the program. In some cases, I’ll be creating “wizards” that answer each of these questions so users get pointed in the correct direction (at least, that’s the plan). In working through this interface, you’ll notice the menus have changed. This means that menu entries have changed a lot (location wise). I’m going to be looking at setting up shortcut keys to everything for keyboard access, but here’s the new layout I’ve drawn up:
You’ll notice many menus now have secondary menus – MARC Processing Tools for example, now includes all items like MARCSplit, MARCJoin, etc. This moves items down a level, and I realize that’s not what you always want to do. I’m hoping that what will help with this is the help textbox on the right hand corner.
This is in MarcEdit 6.3.x, and is being expanded in MarcEdit 7. If you, for example, type the word Join – you will be able to open MARCJoin directly from this window.
Say you want to merge some records…just type – merge records and you get:
An error message you don’t recognize:
Or you just want to know how to get started:
Or maybe, you’ve had a hard day and just want to look at cats:
The help system is being developed to allow for pseudo natural language searching. To start with, it will be English only, but by the time MarcEdit 7 comes out, you should be able to write queries in about any language and hopefully get back useful responses. I’m hoping that users gravitate towards this method of accessing commands or using the keyboard shortcuts, or the new Last Used Tools options so that the menu restructuring doesn’t cause usability issues. But I’m definitely interested in feedback.
The plugin menu works mostly the same as before, with the primary difference being that the plugin manager is now found under plugins (where it probably should always have been).
The Help menu has been updated significantly. You’ll notice that I’ve moved access to the Hex editor, shorting configuration settings (for when you get a new computer) and the restarting into 32-bit mode to allow for integration with Connexion into this menu. You’ll also notice a new option – the troubleshooting wizard. This is a new tool that walks through a set of questions where you can copy error messages, error numbers, etc. and the tool will point you in the right direction or run the correct validation routines for you (so you don’t have to guess). This is something that should improve with feedback, and should be in the first release that I make available to users for testing.
Also, please note that you can see that this version of MarcEdit is being built against .NET 4.6. This means that this version officially won’t run on Windows XP.
The MARCNext and About windows have been dislodged from the main window. This means that if you click MARCNext, it displays in its own window. I did this because it gives me more room to grow this resource a bit easier.
In looking through the program, one of the main activities that I’ll be doing is hopefully addressing UI issues, doing better integration of functionality added during the 4 years MarcEdit was in the 6.x series (from a code perspective, some things are kind of bolted on, rather than integrated) and reducing redundancy. Most folks may not know it, but MarcEdit has 4(!) different Z39.50 clients, each using different code, in the program. The Mac version has one, and it does all the things that the 4(!) in the Windows version currently do. Those are the kinds of redundancies that I’ll be addressing as I clean the tool.
Oh, and performance. Just moving MarcEdit to the new framework has seen some improvements. Without optimizations, I was testing breaking on a simple file I have of 120,000 random MARC records. After multiple runs and averaging the results, I’m seeing that:
MarcEdit 6.3.x: ~5975 recs / sec.
MarcEdit 7.0.0 ~7213 recs / sec.
Once I start optimizing code, I’m pretty sure we’ll see this continue to improve. The gist here is that by moving the program forward (and dropping XP support), I should be able to finally include a number of tools related to graph processing as well as see a significant speed improvement with the software.
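To put those two rates in perspective, the short Python sketch below works out the percentage gain and the time each version would need for the 120,000-record test file. The function names are hypothetical; the rates and record count are the ones quoted above.

```python
def percent_gain(old_rate, new_rate):
    """Relative throughput improvement, as a percentage of the old rate."""
    return (new_rate - old_rate) / old_rate * 100


def seconds_for(records, rate):
    """Time needed to process a record count at a given records/sec rate."""
    return records / rate


gain = percent_gain(5975, 7213)
old_seconds = seconds_for(120000, 5975)
new_seconds = seconds_for(120000, 7213)
```

That works out to a gain of about 20.7%, or roughly 20.1 seconds versus 16.6 seconds for the whole file, before any optimization work.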
Finally – you’ll notice the icon has changed. I don’t know if this will be a permanent change as I like the current MarcEdit icon (I’ve used it for almost 17 years now). I’ve changed it while I’m developing MarcEdit 7 so that I can tell the difference between the two versions of the software while working on my laptop.
I’ve been thinking about the new UI for MarcEdit 7. I haven’t decided yet if the main window should have the ribbon or keep the menus (menus seem most appropriate for the MARC Tools and Main Window) – but the main thing I wanted to do with the new UI in MarcEdit 7 is to try and find ways to surface tools based on common questions/actions, as well as push the last used tools up. I’ll keep the user defined buttons (I like those, I use them all the time). One of the other things I’ll likely end up doing: presently I keep the MARCNext toolset framed in the main window (as well as the About window). I’ll be pushing those into their own windows, as the way it currently works complicates updates. Also, all fonts will be updated from 8.5 pt (the default) to 10.5 pt. I’d like to set the default font to the Google Noto Fonts, but distributing the font is out of the question (the font set is 450 MB in total, but maybe I can include something in the installer to allow users to select this font for download if they want…I’ll have to think about it). With that, I’ll be improving the accessibility functionality so that users can continue to easily update font sizes. In fact, I’ll be changing the window that shows when you first install to be a series of questions (rather than showing the preferences). The questions will be:
Preferred Font/Size: I’ll show current settings and sample of typography
MARC Flavor: You tell me whether you are using MARC21 or something else
Default Z39.50/SRU servers (you’ll have a list of known servers to select from this way you have servers in the tool at the beginning)
Link to the Tutorials/Help
After this, you’ll have the option to select the preferences and update all of the options. But I’m looking for ways to make this easier so when users install MarcEdit 7 for the first time, you don’t have to look for specific settings (specifically fonts).
Feedback is welcome.
* Note: this wireframe is for the Windows/Linux version. Some of these concepts will make it to the MarcEdit Mac version, but I try to keep that development in line with Apple’s UI recommendations when possible.