(Posting this on my blog as well)
I have a necessary MarcEdit update planned for tonight. It fixes a problem introduced in the last update, one that I don’t believe anyone will see (I didn’t) until they start working with really large files. I found it yesterday and was up until around 3:30 this morning trying to pinpoint what was happening (because it didn’t make any sense to me).
Here’s the issue: in the last update, I added some code to ensure that the temporary files MarcEdit creates get cleaned up. The code works great, but what I’m finding is that in certain cases it is almost impossible to determine when the garbage collector will finalize the disposal of the object. This means that on very large files, the garbage collector can remove the modified file out from under the MarcEditor, so a change might not stick or, worse, only a partial file will be loaded (which is really obvious). This is how I noticed the issue. I was working with files over 400 MB, testing the thread pooling added to the linked data process, and couldn’t figure out why only partial results were being loaded. What added to the confusion was that a single Message Box anywhere in the file loading process would allow the process to finish and work as expected. It took a long time of poking around to find the problem and fix it. Unfortunately, by then it was too late to get everything done.
What I am doing: I’m issuing an intermediary fix for anyone who might be running into this problem with MarcEdit. The links are below. Tonight, I’ll have the formal build (which will have gone through the unit testing process, etc.) that I’ll post through the normal update process. Really sorry this one slipped by me, but my normal testing process is to work on sets of files up to 10,000 records and, unfortunately, that wasn’t a large enough set to reliably see this issue. And given that garbage collection will work differently based on your system, it’s hard to know when (or if) anyone would see this issue.
To correct the problem, meedit7.dll (the file that handles all global editing) now sets a preservation bit on files that need to exist outside of the calling process. This removes the problem. It may mean that a temp file or two stays on the machine, but honestly, Windows 10 and other current operating systems like macOS provide options that allow the operating system to manage temp files, and MarcEdit provides a tool in the Help section to automatically clean all MarcEdit temp files. So, at this point, I think this is the best option going forward.
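For anyone curious what a "preservation bit" looks like in practice, here is a rough sketch of the idea in Python. This is not MarcEdit’s code (MarcEdit is a .NET application, and I’m not reproducing meedit7.dll here). The class name `ManagedTempFile` and its `preserve()` method are made up for illustration. The wrapper deletes its temp file whenever the wrapper object is collected, unless the file has been marked as preserved first.

```python
import os
import tempfile
import weakref


def _remove_quietly(path):
    """Delete a file, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


class ManagedTempFile:
    """Hypothetical wrapper: its temp file is deleted when the wrapper is collected."""

    def __init__(self, suffix=".tmp"):
        fd, self.path = tempfile.mkstemp(suffix=suffix)
        os.close(fd)
        # Cleanup fires whenever the runtime finalizes this object, which is
        # exactly the timing the caller cannot control.
        self._finalizer = weakref.finalize(self, _remove_quietly, self.path)

    def preserve(self):
        """Mark the file as needed outside this object: cancel the automatic delete."""
        self._finalizer.detach()
```

A file that has to outlive the code that created it calls `preserve()` before being handed off. The trade-off is the same one described above: a preserved file stays on disk until something else cleans it up.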
Since I’m releasing this early, I didn’t get an opportunity to formalize my change log and write the notes on what’s changed, and there is a lot. You’ll see the following:
- Verify URLs: there is an option to manage the number of threads used. This is the first time this tool will utilize a thread pool to provide faster queries. I wouldn’t recommend using more than 10 threads (3-5 is a good number), as you could start to look like a denial of service attack to the servers you are checking.
- Verify URLs: I’ve updated the stylesheet that generates the results. It’s easier to read, and the record numbers can now be copied, so you could put them in a file and select those records through the Extract Selected Records Tool.
- Build Links: I’m still experimenting with this, but I’m making it available because it works. This tool now uses threads to build links. I’m generally seeing a 15-20% improvement in the time it takes to process files. Not a big change, but I’m happy for it. I have some ideas I’m working on to improve speed further, and they might show up in tonight’s release.
- Regular Expression Store: it’s been updated. You can now add new metadata and search across multiple fields of metadata.
- Replace Function: When using external file criteria, the tool has trouble if your files include BOM values. I’ve added code to filter these out.
- Task Management: I’ve added an option in the Task Manager to allow a task to override the broker’s assessment and run using the older task method. Generally, the broker’s assessment results in significant speed gains, but when a task will touch every record, it may be faster to use the other method. This gives users control to use either.
- Component updates: I’ve updated core components to the linked data tooling, the Saxon XSLT processor, and the JSON processing tools.
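To illustrate the thread setting in the Verify URLs item, here is a minimal Python sketch of checking links through a bounded thread pool. It is not how MarcEdit implements the feature. The function names (`verify_urls`, `head_status`, `clamp_threads`) and the hard ceiling of 10 are assumptions for this example, with the ceiling taken from the recommendation above.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

MAX_THREADS = 10  # assumed ceiling, mirroring the "no more than 10" advice


def clamp_threads(requested, ceiling=MAX_THREADS):
    """Keep the worker count between 1 and the ceiling."""
    return max(1, min(requested, ceiling))


def head_status(url, timeout=10):
    """Return the HTTP status for a HEAD request, or an error string."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # the server answered, just not with success
    except (URLError, OSError, ValueError) as err:
        return f"error: {err}"


def verify_urls(urls, threads=4, check=head_status):
    """Check every URL using a small pool; results keep the input order."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=clamp_threads(threads)) as pool:
        return list(zip(urls, pool.map(check, urls)))
```

Calling `verify_urls(my_urls, threads=4)` returns a list of `(url, status)` pairs. Keeping the pool small is the point: more workers finish sooner, but they also send more simultaneous requests to the same hosts.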
This intermediary update is version 7.0.126 and is available at the normal download links:
- 32-bit admin: http://marcedit.reeset.net/software/marcedit7/MarcEdit_Setup32Admin.msi
- 32-bit user: http://marcedit.reeset.net/software/marcedit7/MarcEdit_Setup32.msi
- 64-bit admin: http://marcedit.reeset.net/software/marcedit7/MarcEdit_Setup64Admin.msi
- 64-bit user: http://marcedit.reeset.net/software/marcedit7/MarcEdit_Setup64.msi
I’m providing this download for users who are running into this issue now, since it will interrupt current work, or who would like to test the new functionality (I would love to have some feedback). However, tonight the update will become official, roll over to version 7.0.127 (or 7.0.128), and be delivered through the automated updating mechanism.
If you are unsure which version of MarcEdit 7 you have installed, you can check in two ways:
- Click on Help/System Information. In the System Information box, you’ll see the Install type.
- Open the Windows Control Panel and look at the Program List. You will see the Install version listed in the program title.
It is important that you pick the download that matches the version currently installed. While each version of MarcEdit shares the same components, the registry entries vary by the version installed.
Let me know if you have questions.
–tr