Oct 31, 2013

Ever since having kids, one of my favorite parts of Halloween has been carving the jack-o’-lanterns. Each year, the family picks out different characters, and I try to see what I can do with each. This is our 2013 crop of jack-o’-lanterns.

Calvin and Hobbes


Snow Leopard (created from a picture)


 Posted by at 3:32 pm
Oct 10, 2013

I got a reminder from Oregon State University that my old web space will be disabled this weekend. I’ve since moved all of the content to a new location, but be aware that search engines will likely take some time to pick up the redirects, since so many people linked to the older URLs.

I’ve also migrated nearly all of the other content to: http://reeset.net


 Posted by at 1:33 pm
Apr 16, 2013

Just a couple of notes. I’ve been spending quite a bit of time lately testing the current Mono builds against the current MarcEdit codebase. It looks like the latest stable version works just fine with the program, while the 3.0.x branch has a few rendering issues (I’m not sure yet whether they are mine; I’m looking into it). What this means is that I’m working on simplifying the installation process for folks using the software on Linux/Mac systems. I’m starting with Linux (since I run Linux). Right now, I’m using a tool called makeself (http://megastep.org/makeself/), which creates a self-extracting archive and then lets me run a cleanup script that eliminates a number of the manual steps in the install process. Ideally, the steps would be:

  1. Run the install script
  2. Run MarcEdit
  3. Get a report of missing dependencies

This would remove the need to remember to run the bootloader to clean up and set install paths, as well as to get the configuration right for Z39.50 use if yaz is installed on the system. It should also give me a way to check dependencies and provide a list of packages that need to be installed, to make installation as simple as possible (though at this point, the only dependencies are mono-core and the System.Windows.Forms libraries, which are sometimes, but not always, included in a given distribution’s definition of “core” files).
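A minimal Python sketch of the missing-dependency report in step 3 might look like the following. It is an illustration only, not the actual installer (which, as described above, is a makeself archive plus a cleanup script): the function name and messages are hypothetical, detecting yaz via `ctypes.util.find_library` is an assumption, and it makes no attempt to detect System.Windows.Forms, whose packaging varies by distribution.

```python
import shutil
from ctypes.util import find_library


def missing_dependencies(which=shutil.which, find_lib=find_library):
    """Return a list of missing dependencies (empty if everything was found).

    `which` and `find_lib` are injectable so the check can be exercised
    without depending on what is installed on the current machine.
    """
    missing = []
    # The Mono runtime has to be on the PATH for MarcEdit to start at all.
    if which("mono") is None:
        missing.append("mono (Mono runtime, e.g. the mono-core package)")
    # yaz is optional: without it, only the Z39.50 features are unavailable.
    if find_lib("yaz") is None:
        missing.append("yaz (optional, needed for Z39.50)")
    return missing
```

Calling `missing_dependencies()` with no arguments checks the real system; an installer could print the returned list as its report.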

Once I have this build process defined, I’ll find myself a Mac and give Mono’s MacPackager a whirl. It seems like that might be a simple solution to packaging the program, and at this point, all the Z39.50 components can be disabled to greatly simplify the process.

Ideally, once I have an installer in place, I should be able to tweak MarcEdit’s automated updater so that Mac/Linux users receive notifications and downloads of the current installers. From there, they would just need to run them within their own environments. Easier? I hope so. And as always, I’ll continue to provide a plain zip file for download, for those folks in environments that don’t allow the installation of software.

Questions, comments — let me know.


 Posted by at 10:12 am

Moving domains

Mar 24, 2013

If you are reading this, then the redirects that I’ve temporarily put into place are working. I’m currently in the process of moving all of my content off of the oregonstate.edu domain and onto my new personal domain, reeset.net. This means everything from my blog (http://blog.reeset.net) to MarcEdit (http://marcedit.reeset.net). It’s been a slow process, and one that I’m still working on, but it’s coming along.

The redirects I’ve put in place should stay live until around June 2013. I’m pretty sure that’s when my personal OSU web space will be turned off, though all the data should be migrated long before then. Hopefully, with the new domain, I won’t have to go through this process again if I ever change jobs in the future.


 Posted by at 12:28 am
Jan 26, 2013

Sunday, I’ll be making an exception to my no-ALA rule (well, it’s not a rule, but I’ve yet to find a good reason to get back involved) and will be in Seattle giving a talk as part of the ALCTS CaMMS program. The focus is on research, and I thought it would give me a chance to talk to folks about the RDA Helper and some of the practical research I’ve been doing with MarcEdit to help librarians support RDA encoding rules in MARC through data mining and automatic data creation. It’s not particularly sexy when you consider some of the more abstract concepts that others (and I) have been working on, but with March 2013 approaching, I’m hoping that this practical work will help make a real difference while we work out how to make the more abstract concepts real in our production environments.

I’ll post the slides tomorrow, but if you are interested in the RDA Helper, you can find out more about it here:


If you are in Seattle and want to say hi, drop me a line or you can find me here: http://connect.ala.org/node/195883


 Posted by at 11:19 am
Nov 30, 2012

Every so often I need to build small projects (web and command line) to do various things. A lot of times, these projects could be done in Rails or another framework, but honestly, I don’t need that much overhead, so I fall back to using PHP or Perl. Between the two, I prefer working with PHP and setting it up to work as a traditional shell script. It’s fairly easy, and here’s how.

First, you need to make sure that PHP CLI is installed.  You can do this by checking the version.
[reeset@enyo home]$ php -v
PHP 5.3.3 (cli) (built: Jul  3 2012 16:53:21)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies

Essentially, you are looking for the text “cli” in the version string. If you have it, you are good to go. If not, you’ll have to hunt down and install it with your package manager (apt-get, for example).

Once you are set, you just need to find the full path to the php binary. I find that it can often be found at /usr/bin/php, and running which php will confirm it. You’ll want to note that path because you’ll need it later.

Because I want to turn this into more of an executable script, I don’t want to have to run it through the php command. So, I’m going to add a shebang to the first line of the script file. This was new for me; I hadn’t realized that this was possible in PHP until I was looking through the manual, and for CLI scripts, I definitely prefer this method.
Example script (the shebang must be the very first line of the file):
#!/usr/bin/php
<?php
echo "PHP CLI script \n";

The shebang allows the script to be run without referencing the PHP executable on the command line. Now, all you need to do is make the file executable:
chmod +x yourphpfile.php

With the file set as executable, you can now run the script like any other shell script, e.g., ./yourphpfile.php


 Posted by at 11:36 pm
Nov 28, 2012

This afternoon, I put the finishing touches on the update/creation components for the ILS direct integration framework in MarcEdit. As a proof of concept, I’ve continued to demonstrate the processing using Koha’s API. The Koha API has been abstracted into a separate class so I can push it up to GitHub for those interested in a C# implementation.

I’ve posted the following video to demonstrate the update/creation process utilizing MarcEdit configured for Koha integration. 


 Posted by at 1:53 am
Nov 27, 2012

One of the things I have been working on is a framework that lets MarcEdit be configured to integrate directly with one’s ILS. Many ILSs, like Sierra, Alma, Koha, and others, support (or will support) APIs that allow the modification and creation of bibliographic records. I’ve developed a framework in MarcEdit that should allow me to provide integration options for most systems (so long as documentation is available), and as a proof of concept, I’ve provided integration with Koha. Koha has an easy-to-use API that makes updating and creating new bibliographic records on the system fairly simple. The attached video demonstrates how one will be able to configure MarcEdit to use this new functionality once it becomes available (around Dec. 1st-ish, 2012).


 Posted by at 2:58 am
Dec 19, 2011

I’ve been working on making a few changes to the way in which MarcEdit processes MARCXML data. In MarcEdit, XML metadata transactions happen via XSLT. This was done primarily to provide a great deal of flexibility in the types of XML transactions MarcEdit could perform. It also meant that others could create their own XSLT transactions and share them with the MarcEdit community, so that I wouldn’t be a bottleneck.

So, within the application, there were only four canonical conversions:

    1. MARC => MarcEdit’s mnemonic format
    2. MarcEdit’s mnemonic format => MARC
    3. MARC => MARCXML (which occurs through an algorithm, no XSLT)
    4. MARCXML => MARC (which involves an XSLT translation from MARCXML to MarcEdit’s mnemonic format)


These four conversions were the foundation on which all other work in MarcEdit was built. The last conversion, MARCXML => MARC, took a hybrid approach: it used an XSLT to translate the data into MarcEdit’s mnemonic format before handing the data off to the MARCEngine to complete processing. This method has worked very well over the years that I’ve made this functionality available, but it has also imposed a significant bottleneck on users working with large XML data files. Because the MARCXML process relied on an XSLT translation, conversions from MARCXML were particularly expensive. It also meant that users wanting to process exceptionally large MARCXML documents would run into hard limits tied to the amount of memory available on their systems, since the DOM-based XSLT processor has to load the entire document into memory.

In the next update, this will change. Starting in MarcEdit 5.7, the MARCXML => MARC function will shift from an XSLT process to a native processing algorithm that uses SAX to process the data, reading the file as a stream rather than loading it into memory all at once. The effect of this is that MarcEdit’s MARCXML processing will be greatly improved and much less expensive.

In benchmarking the change, the results get more staggering the larger the source file is. For example, using a 50 MB MARCXML file (14,000 records), you can see the following improvement:

Process                         Time          Records per second
MARCXML => MARC (old method)    5.9 seconds   2,372
MARCXML => MARC (new method)    2.1 seconds   7,000


Working with this smaller file, you can see that there has definitely been an improvement: using the new processing method, we are able to process nearly 3 times as many records per second. However, this difference becomes even more pronounced as the number of records and the size of the source XML file grow. Using a 7 GB MARCXML file (1.5 million records), the improvement is startling:

Process                         Time             Records per minute
MARCXML => MARC (old method)    50,400 seconds   1,785
MARCXML => MARC (new method)    460 seconds      197,368


Working with the larger file, we see that the new method was able to process 110.5 times as many records per minute. What’s more, on my benchmarking workstations, this likely represents the largest file I would be able to process using the old, XSLT-centric method. At roughly 14 hours, this file size seriously tested the DOM XSLT processor during the initial loading and validation phases of the process. The new method, however, should easily be able to handle MARCXML files of any size.
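The speedup figures quoted above follow directly from the two benchmark tables; a quick Python check of the arithmetic, using only the table values:

```python
# Figures copied from the two benchmark tables above.
small_old_rps, small_new_rps = 2372, 7000       # records per second (50 MB file)
large_old_rpm, large_new_rpm = 1785, 197_368    # records per minute (7 GB file)
large_old_seconds = 50_400

small_speedup = small_new_rps / small_old_rps   # about 2.95, i.e. "nearly 3 times"
large_speedup = large_new_rpm / large_old_rpm   # about 110.57
old_hours = large_old_seconds / 3600            # old method on the 7 GB file, in hours

print(f"small file: {small_speedup:.2f}x, large file: {large_speedup:.1f}x, "
      f"old method on large file: {old_hours:.1f} hours")
```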

So why make these changes? It’s probably pretty rare that folks will need to work with MARCXML files of this size. At least, I would hope so. However, the speed improvements were so great with both small and large files that it was well worth the effort to implement this change. Likewise, it will improve MarcEdit’s XSLT-based translations by removing one crosswalking step from transformations that move from a non-MARCXML XML format to MARC. So, the real practical effects of this change will be:

    1. MARCXML => MARC translations will be much faster
    2. XSLT translations from a non-MARCXML XML format to MARC will be improved (because the added MARCXML => mnemonic translation will no longer occur)
    3. MarcEdit will be able to process MARCXML files of any size (physical system storage will be the only limiting factor)
    4. Translations from non-MARCXML formats that still rely on XSLT will continue to have practical size limits due to the memory requirements of DOM-based XSLT processors. In my own benchmarking, practical limits tend to be around 500 MB to 1 GB.


These changes will be made available when MarcEdit 5.7 is released, sometime before January 1, 2012.


 Posted by at 5:03 pm