So I missed one. I made some changes to how MarcEdit loads data into the MarcEditor to improve performance, especially on newer equipment, and introduced a bug. When making multiple global updates, the program will probably lose track of the last page of records. This slipped through my unit tests because the program was reporting the correct number of changes, but when it analyzed the file for indexing, it was dropping the last page. Oops. This was introduced in the update posted at 1 am on Aug. 15th. I’ve corrected the problem and updated my unit tests so that this type of regression shouldn’t occur again.
One question that came up privately was why I made this change to begin with. Primarily, it was about performance. MarcEdit’s paging process requires the program to index the records within a MARC block. In the previous approach, this resulted in a lot of disk reads. Generally, this isn’t a problem, but I’ve had occasions where folks with slower disks have had performance issues. This change will correct that. For files under 50 MB, the program will now read the entire file into memory and process the data in memory to generate paging. This is a more memory-intensive task (the previous method used a small amount of memory, whereas the new process can require allocations of 100-120 MB of system memory for processing), but it removes the disk reads, which are the largest bottleneck in the process.
The effect of this change is a large performance gain. On my development system, which has a solid state drive, the improvement loading a 50 MB file is over a second, going from 3.3 seconds to 1.8 seconds. That’s a pretty significant improvement, especially on a system where disk reads tend to happen very quickly. On my secondary systems, the improvements are more noticeable. On an Intel I-5 with a non-solid state drive and 6 GB of RAM, the old process took between 3.7 and 4.1 seconds, while the new method loaded the file in 1.6 to 1.8 seconds. And on a tablet with an older Atom processor and 2 GB of RAM, the old process took approximately 22 seconds, while the new one took only about 9 seconds. These are big gains that I hope users will be able to see and benefit from.
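To make the size-based switch described above concrete, here is a minimal Python sketch. It is not MarcEdit's actual code (MarcEdit is not written in Python), and it makes several assumptions: the file is binary MARC, where each record ends with the 0x1D record terminator; the page size of 100 records is invented for illustration; and the function names are hypothetical. Only the 50 MB threshold comes from the description above.

```python
import os

TERMINATOR = b"\x1d"                 # end-of-record byte in binary MARC (ISO 2709)
IN_MEMORY_LIMIT = 50 * 1024 * 1024   # 50 MB threshold mentioned in the post
RECORDS_PER_PAGE = 100               # hypothetical page size, for illustration only


def index_in_memory(data):
    """Return the start offset of every record in a buffer already held in memory."""
    offsets, start = [], 0
    while start < len(data):
        offsets.append(start)
        end = data.find(TERMINATOR, start)
        if end == -1:                # trailing bytes with no terminator
            break
        start = end + 1
    return offsets


def index_from_disk(path, chunk_size=4096):
    """Build the same index with many small reads (stand-in for the older approach)."""
    offsets, position, at_record_start = [], 0, True
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            for i, byte in enumerate(chunk):
                if at_record_start:
                    offsets.append(position + i)
                    at_record_start = False
                if byte == TERMINATOR[0]:
                    at_record_start = True
            position += len(chunk)
    return offsets


def page_starts(path, records_per_page=RECORDS_PER_PAGE):
    """Return the byte offset at which each page of records begins."""
    if os.path.getsize(path) < IN_MEMORY_LIMIT:
        with open(path, "rb") as handle:
            offsets = index_in_memory(handle.read())   # one large read
    else:
        offsets = index_from_disk(path)                # many small reads
    return offsets[::records_per_page]
```

The trade-off is the one described above: the in-memory path holds the whole file at once, which costs memory but replaces many small disk reads with a single large one.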
Testing Results: Old Process
Machine Description | File Description | Time to Load |
I-7 Dell XPS Ultrabook, 8 GB RAM, SSD | 45,922 records; 50 MB | 1st load: 3.4s; 2nd load: 3.3s; 3rd load: 3.2s |
I-5 Dell Workstation, 6 GB RAM, 7200 rpm HD | 45,922 records; 50 MB | 1st load: 4.1s; 2nd load: 4.0s; 3rd load: 3.8s |
Atom 1.5 GHz ACER tablet, 2 GB RAM, SSD | 45,922 records; 50 MB | 1st load: 27s; 2nd load: 22s; 3rd load: 23s |
Testing Results: New Process
Machine Description | File Description | Time to Load | Diff |
I-7 Dell XPS Ultrabook, 8 GB RAM, SSD | 45,922 records; 50 MB | 1st load: 1.4s; 2nd load: 1.3s; 3rd load: 1.3s | (2s) |
I-5 Dell Workstation, 6 GB RAM, 7200 rpm HD | 45,922 records; 50 MB | 1st load: 1.8s; 2nd load: 1.6s; 3rd load: 1.6s | (2.3s) |
Atom 1.5 GHz ACER tablet, 2 GB RAM, SSD | 45,922 records; 50 MB | 1st load: 10.1s; 2nd load: 9.6s; 3rd load: 9.7s | (10-18s) |
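For anyone who wants to check the arithmetic, the short Python sketch below averages the three loads per machine from the two tables and reports the time saved and the speedup ratio. The timings are copied from the tables; the dictionary keys and the summarize function are just illustrative names.

```python
from statistics import mean

# Load times in seconds, copied from the tables above (three loads per machine).
OLD_RUNS = {
    "I-7 ultrabook": [3.4, 3.3, 3.2],
    "I-5 workstation": [4.1, 4.0, 3.8],
    "Atom tablet": [27, 22, 23],
}
NEW_RUNS = {
    "I-7 ultrabook": [1.4, 1.3, 1.3],
    "I-5 workstation": [1.8, 1.6, 1.6],
    "Atom tablet": [10.1, 9.6, 9.7],
}


def summarize(old_runs, new_runs):
    """Return (seconds saved, speedup ratio) based on the average of each set of runs."""
    old_avg, new_avg = mean(old_runs), mean(new_runs)
    return old_avg - new_avg, old_avg / new_avg


for machine in OLD_RUNS:
    saved, speedup = summarize(OLD_RUNS[machine], NEW_RUNS[machine])
    print(f"{machine}: {saved:.1f}s saved, {speedup:.1f}x faster")
```

Averaged this way, the savings on the first two machines line up with the Diff column, and the tablet's falls inside the range given there.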
While the new process appears to provide better performance on many different types of systems, I realize that there may be some system variations that do not benefit from this new method. To that end, I’ve added a new configuration option in the MarcEditor Preferences that allows users to turn off the new paging method. By default, this option is selected.
If you update the program via MarcEdit, the download will be offered automatically the next time you use the program. Otherwise, you can get the update at: http://marcedit.reeset.net/downloads
–TR