Sep 26 2015

After about a month of working with the headings validation tool, I’m ready to start adding a few enhancements to provide some automated headings corrections.  The first change to be implemented will be automatic correction of headings where the preferred heading is different from the in-use heading.  This will be implemented as an optional setting.  If this option is selected, the report will continue to note variants as part of the validation report – but when exporting data for further processing, automatically corrected headings will not be included in the record sets flagged for further action.
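To make the planned behavior concrete, here is a minimal sketch of what correcting a variant to its preferred heading amounts to – the normalization rule and the variant_map structure are my assumptions for illustration, not MarcEdit’s internals (the example pair is taken from the validation report excerpt further down this page):

def normalize(heading):
    # The validation report displays terms in a normalized (lowercased)
    # form; the exact normalization MarcEdit applies is assumed here.
    return heading.strip().lower()

# variant -> LC preferred form, as collected during a validation run
variant_map = {
    normalize("Arnim, Bettina Brentano von, 1785-1859"):
        "Arnim, Bettina von, 1785-1859",
}

def correct_heading(heading):
    # Return the LC preferred form when a heading is a known variant;
    # otherwise pass the heading through untouched.
    return variant_map.get(normalize(heading), heading)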


Additionally – I’ll continue to be looking at ways to improve the speed of the process.  While there are some limits to what I can do since this tool relies on a web service (outside of providing an option for users to download the ~10GB worth of LC data locally), there are a few things I can do to continue to ensure that only new items are queried when resolving links.

These changes will be made available in the next update.


 Posted at 2:24 pm
Sep 07 2015

So, this update is a bit of a biggie.  If you are a Mac user, the program officially moves out of the Preview and into release.  This version brings the following changes:

** 1.1.25 ChangeLog

  • Bug Fix: MarcEditor — changes may not be retained after save if you make manual edits following a global update.
  • Enhancement: Delimited Text Translator completed.
  • Enhancement: Export Tab Delimited completed.
  • Enhancement: Validate Headings Tool completed.
  • Enhancement: Build New Field Tool completed.
  • Enhancement: Build New Field Tool added to the Task Manager.
  • Update: Linked Data Tool — Added Embed OCLC Work option.
  • Update: Linked Data Tool — Enhanced pattern matching.
  • Update: RDA Helper — Updated for parity with the Windows version of MarcEdit.
  • Update: MarcValidator — Enhancements to support better checking when looking at the mnemonic format.

If you are on the Windows/Linux version – you’ll see the following changes:

* 6.1.60 ChangeLog

  • Update: Validate Headings — Updated patterns to improve the process for handling heading validation.
  • Enhancement: Build New Field — Added a new global editing tool that provides a pattern-based approach to building new field data.
  • Update: Added the Build New Field function to the Task Management tool.
  • UI Updates: Specific to support Windows 10.

The Windows update is a significant one.  A lot of work went into the Validate Headings function, which impacts the Linked Data tools and the underlying linked data engine.  Additionally, the Build New Field tool provides a new global editing function that should simplify complex edits.  If I can find the time, I’ll try to put together a YouTube video demoing the process.

You can get the updates from the MarcEdit downloads page, or if you have MarcEdit configured to check for automated updates, the tool will notify you of the update and provide a method for you to download it.

If you have questions – let me know.


 Posted at 7:23 pm
Sep 06 2015

This has been a long time coming – taking countless hours and the generosity of a great number of people to test and provide feedback (not to mention the folks that crowdsourced the purchase of a Mac) – but MarcEdit’s Mac version is coming out of Preview and will be made available for download on Labor Day.  I’ll be putting together a second post officially announcing the new versions (all versions of MarcEdit are getting an update over Labor Day), so if this interests you – keep an eye out.

So exactly what is different from the Preview versions?  Well, at this point, I’ve completed all the functions identified for the first set of development tasks – and then some.  New to this version will be the new Validate Headings tool just added to the Windows version of MarcEdit, the new Build New Field utility (and inclusion into the Task Automation tool), updates to the Editor for performance, updates to the Linking tool due to the validator, inclusion of the Delimited Text Translator and the Export Tab Delimited Text Translator – and a whole lot more.

At this point, the build is made and the tests have been run – so keep an eye out tomorrow – I’ll definitely be making it available before the Ohio State/Virginia Tech football game (because everything is going to stop here once that comes on).  :)

To everyone that has helped along the way, providing feedback and prodding – thanks for the help.  I’m hoping that the final result will be worth the wait and be a nice addition to the MarcEdit family.  And of course, this doesn’t end the development on the Mac – I have 3 additional sprints planned as I work towards functional parity with the Windows version of MarcEdit.


 Posted at 7:45 pm
Sep 02 2015

I’m not sure how I’ve missed creating something like this for so long, but it took a question from a cataloger to help me crystalize the need for a new feature.  Here was the question:

Add an 856 url to all records that uses the OCLC number from the 035 field, the title from 245 $a, and the ISSN, if present. This will populate an ILLiad form with the publication title, ISSN, call number (same for all records), and OCLC number. Although I haven’t worked it in yet, the link in our catalog will also include instructions to “click for document delivery form” or something like that.

In essence, the user was looking to generate a link within some records – however, the link would need to be made up of data pulled from different parts of the MARC record.  It’s a question that comes up all the time, and in many cases, the answer I generally give points users to the Swap Field function – a tool designed around moving data between fields.  For fields that are to be assembled from data in multiple fields, multiple swap field operations would need to be run.  The difference here was how the data from the various MARC fields listed above needed to be presented.  The swap field tool moves data from one subfield to another – whereas this user was looking to pull data from various fields and reassemble that data using a specific data pattern.  And in thinking about how I would answer this question – it kind of clicked – we need a new tool.

Build New Field:

The build new field tool is the newest global editing tool being added to the MarcEditor tool kit.  The tool will be available in the Tools menu:


And it will be supported in the Task Automation list.  The tool is designed around this notion of data patterns – the idea that rather than moving data between fields, some field data needs to be created via a specific set of data patterns.  The example provided by the user asking this question was:

  •[Title from 245$a]&&ISSN=[ISSN from 022$a]&CallNumber=[CallNumber from 099$a]&ESPNumber=[oclc number from the 035]

While the swap field could move all this data around, the tool isn’t designed to do this level of data integration when generating a new field.  In fact, none of MarcEdit’s present global editing tasks are configured for this work.  To address this gap, I’ve introduced the Build New Field tool:


The Build New Field tool utilizes data patterns to construct a new MARC field.  Using the example above, a user could create a new 856 by utilizing the following pattern:

=856  41$u{245$a}&ISSN={022$a}&CallNumber={099$a}&ESPNumber={035$a(OcLC)}

Do you see the pattern?  This tool allows users to construct their field, replacing the variable data to be extracted from their MARC records using the mnemonic structure: {field$subfield}.  Additionally, in the ESPNumber tag, you can see that in addition to the field and subfield, qualifying information was also included.  The tool allows users to provide this information, which is particularly useful when utilizing fields like the 035 to extract control numbers.

Finally, the new tool provides two additional options.  For items like proxy development, data extracted from the MARC record will need to be URL encoded.  By checking the “Escape data from URL” option, all MARC data extracted and utilized within the data pattern will be URL encoded.  Leaving this item unchecked will allow the tool to capture the data as presented within the record. 

The second option, “Replace Existing Field or Add New one if not Present”, tells the tool what to do if the field exists.  If left unchecked, the tool will always create a new field (whether the field defined in the pattern exists or not).  If you check this option, the tool will replace any existing field data, or create a new field if one doesn’t exist, for the field defined by your pattern.
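If it helps to see the idea in code, here is a rough sketch of the pattern substitution in Python – the record structure and function names are mine for illustration, and MarcEdit’s actual implementation will certainly differ:

import re
from urllib.parse import quote

def build_field(pattern, record, escape_url_data=False):
    # `record` maps keys like "245$a" to a list of values -- a stand-in
    # for real MARC parsing.  A qualifier such as {035$a(OcLC)} keeps
    # only the instances whose value contains the qualifier string.
    def resolve(m):
        key, qualifier = m.group(1), m.group(2)
        values = record.get(key, [])
        if qualifier:
            values = [v for v in values if qualifier in v]
        value = values[0] if values else ""
        return quote(value) if escape_url_data else value
    return re.sub(r"\{(\d{3}\$[a-z0-9])(?:\((.+?)\))?\}", resolve, pattern)

record = {"245$a": ["Performing arts management"],
          "022$a": ["1234-5678"], "099$a": ["PN1584"],
          "035$a": ["(OcLC)12345678"]}
# prints: =856  41$uPerforming%20arts%20management&ISSN=1234-5678
print(build_field("=856  41$u{245$a}&ISSN={022$a}", record,
                  escape_url_data=True))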

Does that make sense?  This function will be part of the next MarcEdit release, so if you have questions or comments, let me know.


 Posted at 3:13 pm
Aug 23 2015

Last week, I posted an update that included the early implementation of the Validate Headings tool.  After a week of testing, feedback and refinement, I think that the tool now functions in a way that will be helpful to users.  So, let me describe how the tool works and what you can expect when the tool is run.


The Validate Headings tool was added as a new report to the MarcEditor to enable users to take a set of records and get back a report detailing how many records had corresponding Library of Congress authority headings.  The tool was designed to validate data in the 1xx, 6xx, and 7xx fields.  The tool has been set to only query headings and subjects that utilize the LC authorities.  At some point, I’ll look to expand to other vocabularies.

How does it work?

Presently, this tool must be run from within the MarcEditor – though at some point in the future, I’ll extract this out of the MarcEditor and provide a standalone function and an integration with the command line tool.  Right now, to use the function, you open the MarcEditor and select the Reports/Validate Headings menu.


Selecting this option will open the following window:


Options – you’ll notice 3 options available to you.  The tool allows users to decide what values they would like to have validated.  They can select names (1xx, 600, 610, 611, 7xx) or subjects (6xx).  Please note, when you select names, the tool does look up the 600, 610, and 611 as part of the process because the validation of these subjects occurs within the name authority file.  The last option deals with the local cache.  As MarcEdit pulls data from the Library of Congress, it caches the data that it receives so that it can use it on subsequent headings validation checks.  The cache will be used until it expires in 30 days…however, a user at any time can check this option and MarcEdit will delete the existing cache and rebuild it during the current data run.

A couple things you’ll also note on this screen.  There is an Extract button, and it’s not enabled.  Once the Validate report is run, this button will become enabled if there are any records identified as having headings that could not be validated against the service.

Running the Tool:

A couple notes about running the tool.  When you run the tool, what you are asking MarcEdit to do is process your data file and query the Library of Congress for information related to the authorized terms in your records.  As part of this process, MarcEdit sends a lot of data back and forth to the Library of Congress utilizing the service.  The tool attempts to use a light touch, only pulling down headings for a specific request – but do realize that a lot of data requests are generated through this function.  You can estimate approximately how many requests might be made on a specific file by using the following formula: (number of records x 2) + (number of records), assuming that most records will have 1 name to authorize and 1 subject per record.  So a file with 2500 records would generate ~7500 requests to the Library of Congress.  Now, this is just a guess; in my tests, I’ve had some sets generate as many as 12,000 requests for 2500 records and as few as 4000 requests for 2500 records – but 7500 tended to be within 500 requests in most test files.

So why do we care?  Well, this report has the potential to generate a lot of requests to the Library of Congress’s identifier service – and while I’ve been told that there shouldn’t be any issues with this – I think that question won’t really be known until people start using it.  At the same time – this function won’t come as a surprise to the folks at the Library of Congress – as we’ve spoken a number of times during the development.  At this point, we are all kind of waiting to see how popular this function might be, and if MarcEdit usage will create any noticeable up-tick in the service usage.

Validation Results:

When you run the validation tool, the program will go through each record, making the necessary validation requests of the LC ID service.  When the service has completed, the user will receive a report with the following information:

Validation Results:
Process completed in: 121.546001431667 minutes. 
Average Response Time from LC: 0.847667984420415
Total Records: 2500
Records with Invalid Headings: 1464
1xx Headings Found: 1403
6xx Headings Found: 4106
7xx Headings Found: 1434
1xx Headings Not Found: 521
6xx Headings Not Found: 1538
7xx Headings Not Found: 624
1xx Variants Found: 6
6xx Variants Found: 1
7xx Variants Found: 3
Total Unique Headings Queried: 8604
Found in Local Cache: 1001

This represents the header of the report.  I wanted users to be able to quickly, at a glance, see what the Validator determined during the course of the process.  From here, I can see a couple of things:

  1. The tool queried a total of 2500 records
  2. Of those 2500 records, 1464 had at least one heading that was not found
  3. Within those 2500 records, 8604 unique headers were queried
  4. Within those 2500 records, there were 1001 duplicate headings across records (these were not duplicate headings within the same record, but for example, multiple records with the same author, subject, etc.)
  5. We can see how many Headings were found by the LC ID service within the 1xx, 6xx, and 7xx blocks
  6. Likewise, we can see how many headings were not found by the LC ID service within the 1xx, 6xx, and 7xx blocks.
  7. We can see the number of Variants as well.  Variants are defined as names that resolved, but where the preferred name returned by the Library of Congress didn’t match what was in the record.  Variants will be extracted as part of the records that need further evaluation.

After this summary of information, the Validation report returns information related to the record # (record number count starts at zero) and the headings that were not found.  For example:

Record #0
Heading not found for: Performing arts--Management--Congresses
Heading not found for: Crawford, Robert W

Record #5
Heading not found for: Social service--Teamwork--Great Britain

Record #7
Heading not found for: Morris, A. J

Record #9
Heading not found for: Sambul, Nathan J

Record #13
Heading not found for: Opera--Social aspects--United States
Heading not found for: Opera--Production and direction--United States

The current report format includes specific information about the heading that was not found.  If the value is a variant, it shows up in the report as:

Record #612
Term in Record: bible.--criticism, interpretation, etc., jewish
LC Preferred Term: Bible. Old Testament--Criticism, interpretation, etc., Jewish
Heading not found for: Bible.--Criticism, interpretation, etc

Here you see – the report returns the record number, the normalized form of the term as queried, the current LC Preferred term, and the URL to the term that’s been found.

The report can be copied and placed into a different program for viewing or can be printed (see buttons).


To extract the records that need work, minimize or close this window and go back to the Validate Headings Window.  You will now see two new options:


First, you’ll see that the Extract button has been enabled.  Click this button, and all the records that have been identified as having headings in need of work will be exported to the MarcEditor.  You can now save this file and work on the records. 

Second, you’ll see the new link – save delimited.  Click on this link, and the program will save a tab delimited copy of the validation report.  The report will have the following format:

Record ID [tab] 1xx [tab] 6xx [tab] 7xx [new line]

Within each column, multiple headings are delimited by a colon – so if two 1xx headings appear in a record, the current process creates a single column with the headings separated by a colon, like: heading 1:heading 2.
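If you plan to post-process that export, parsing it is straightforward.  A small sketch (the filename and column order here are assumptions based on the layout above):

import csv

# Read the "save delimited" export: tab-separated columns, with multiple
# headings inside a column separated by colons.
with open("validate_headings.txt", newline="", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        record_id, h1xx, h6xx, h7xx = (row + [""] * 4)[:4]
        for heading in filter(None, h6xx.split(":")):
            print(record_id, heading)

One caveat of the colon convention: a heading that itself contains a colon will be split in two, so the parsed values may need manual review.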

Future Work:

This function required making a number of improvements to the linked data components – and because of that, the linking tool should work better and faster now.  Additionally, because of the variant work I’ve done, I’ll soon be adding code that will give the user the option to update headings for Variants as this report or the linking tool is running – and I think that is pretty cool.  If you have other ideas or find that this is missing a key piece of functionality – let me know.


 Posted at 7:16 pm
Aug 09 2015

Over the last year, I’ve spent a good deal of time looking for ways to integrate many of the growing linked data services into MarcEdit.  These services, mainly revolving around vocabularies, provide some interesting opportunities for augmenting our existing MARC data, or enhancing local systems that make use of these particular vocabularies.  Examples like those at the Bentley are real-world demonstrations of how computers can take advantage of these endpoints when they are available.

In MarcEdit, I’ve been creating and testing linking tools for close to a year now, and one of the areas I’ve been waiting to explore is whether libraries can utilize linking services to build their own authorities workflows.  Conceptually, it should be possible – the necessary information exists…it’s really just a matter of putting it together.  So, that’s what I’ve been working on.  Utilizing the linked data libraries found within MarcEdit, I’ve been working to create a service that will help users identify invalid headings and records where those headings reside.

Working Wireframes

Over the last week, I’ve prototyped this service.  The way that it works is pretty straightforward.  The tool extracts the data from the 1xx, 6xx, and 7xx fields, and if they are tagged as being LC controlled, I query the service to see what information I can learn about the heading.  Additionally, since this tool is designed for work in batch, there is a high likelihood that headings will repeat – so MarcEdit is generating a local cache of headings as well – this way it can check against the local cache rather than the remote service when possible.  The local cache will constantly be grown, with materials set to expire after a month.  I’m still toying with what to do with the local cache, expirations, and what the best way to keep it in sync might be.  I’d originally considered pulling down the entire LC names and subjects headings – but for a desktop application, this didn’t make sense.  Together, these files, uncompressed, consume GBs of data.  Within an indexed database, this would continue to be true.  And again, this file would need to be updated regularly.  So, I’m looking for an approach that will give some local caching, without the need to make the user download and manage huge data files.
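As a sketch of the caching approach being described – check locally first, query the remote service on a miss, expire entries after a month – something like the following would do; the schema and function names are mine, not MarcEdit’s:

import sqlite3, time

MONTH_IN_SECONDS = 30 * 24 * 3600

def open_cache(path="headings_cache.db"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS headings
                  (term TEXT PRIMARY KEY, preferred TEXT, fetched REAL)""")
    return db

def lookup(db, term, fetch_remote):
    # Return the preferred heading, consulting the local cache first;
    # fetch_remote is whatever function queries the remote service.
    row = db.execute("SELECT preferred, fetched FROM headings WHERE term = ?",
                     (term,)).fetchone()
    if row and time.time() - row[1] < MONTH_IN_SECONDS:
        return row[0]                      # fresh cache hit
    preferred = fetch_remote(term)         # cache miss or expired entry
    db.execute("INSERT OR REPLACE INTO headings VALUES (?, ?, ?)",
               (term, preferred, time.time()))
    db.commit()
    return preferred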

Anyway – the function is being implemented as a Report.  Within the Reports menu in the MarcEditor, you will eventually find a new item titled Validate Headings.


When you run the Validate Headings tool, you will see the following window:


You’ll notice that there is a Source file.  If you come from the MarcEditor, this will be prepopulated.  If you come from outside the MarcEditor, you will need to define the file that is being processed.  Next, you select the elements to authorize.  Then click Process.  The Extract button will initially be disabled until after the data run.  Once completed, users can extract the records with invalid headings.

When completed, you will receive the following report:


This includes the total processing time, average response from LC’s service, total number of records, and the information about how the data validated.  Below, the report will give you information about headings that validated, but were variants.  For example:

Record #846
Term in Record: Arnim, Bettina Brentano von, 1785-1859
LC Preferred Term: Arnim, Bettina von, 1785-1859

This would be marked as an invalid heading, because the data in the record is incorrect.  But the reporting tool will provide back the preferred LC label so the user can then see how the data should currently be structured.  Actually, now that I’m thinking about it – I’ll likely include one more value – the URI to the dataset so you can actually go to the authority file page from this report.

This report can be copied or printed – and as I noted, when this process is finished, the Extract button is enabled so the user can extract the data from the source records for processing.

Couple of Notes

So, this process takes time to run – there just isn’t any way around it.  For this set, there were 7702 unique items queried.  Each request from LC averaged 0.28 seconds.  In my testing, depending on the time of day, I’ve found that the response rate can run between 0.20 seconds per request and 1.2 seconds per request.  None of those times are that bad when taken individually, but in aggregate against 7700 queries – it adds up.  If you do the math, 7702*0.2 = 1540 seconds just to ask for the data.  Divide that by 60 and you get 25.6 minutes.  Given the total processing time, that means there are 11 minutes of “other” things happening here.  My guess is that other 11 minutes is being eaten up by local lookups, character conversions (since LC requests UTF8 and my data was in MARC8) and data normalization.  Since there isn’t anything I can do about the latency between the user and the LC site – I’ll be working over the next week to try and remove as much local processing time from the equation as possible.

Questions – let me know.


 Posted at 7:44 am
Aug 02 2015

MarcEdit Mac users, a new preview update has been made available.  This is getting pretty close to the first “official” version of the Mac version.  And for those that may have forgotten, the preview designation will be removed on Sept. 1, 2015.

So what’s been done since the last update?  Well, I’ve pretty much completed the last of the work that was scheduled for the first official release.  At this point, I’ve completed all the planned work on the MARC Tools and the MarcEditor functions.  For this release, I’ve completed the following:

** 1.0.9 ChangeLog

  • Bug Fix: Opening Files — you could not select any files but those with a .mrc extension. I’ve changed this so the open dialog can open multiple file types.
  • Bug Fix: MarcEditor — when resizing the form, the filename in the status can disappear.
  • Bug Fix: MarcEditor — when resizing, the # of records per page moves off the screen.
  • Enhancement: Linked Data Records — Tool provides the ability to embed URI endpoints to the end of 1xx, 6xx, and 7xx fields.
  • Enhancement: Linked Data Records — Tool has been added to the Task Manager.
  • Enhancement: Generate Control Numbers — globally generates control numbers.
  • Enhancement: Generate Call Numbers/Fast Headings — globally generates call numbers/fast headings for selected records.
  • Enhancement: Edit Shortcuts — added back the tool to enable Record Marking via a comment.

Over the next month, I’ll be working on trying to complete four other components prior to the first “official” release Sept. 1.  This means that I’m anticipating at least 1, maybe 2 more large preview releases before Sept. 1, 2015.  The four items I’ll be targeting for completion will be:

  1. Export Tab Delimited Records Feature — this feature allows users to take MARC data and create delimited files (often for reporting or loading into a tool like Excel).
  2. Delimited Text Translator — this feature allows users to generate MARC records from a delimited file.  The Mac version will not, at least initially, be able to work with Excel or Access data.  The tool will be limited to working with delimited data.
  3. Update Preferences windows to expose MarcEditor preferences
  4. OCLC Metadata Framework integration…specifically, I’d like to re-integrate the holdings work and the batch record download.

How do you get the preview?  If you have the current preview installed, just open the program, and as long as you have notifications turned on, the program will notify you that an update is available.  Download the update, and install the new version.  If you don’t have the preview installed, just go to the downloads page and select the Mac app download.

If you have any questions, let me know.


 Posted at 4:42 pm

MarcEdit 6 Updates

Jul 29 2015

I hadn’t planned on putting together an update for the Windows version of MarcEdit this week, but I’ve been working with someone putting the Linked Data tools through their paces and came across instances where some of the linked data services were not sending back valid XML data – and I wasn’t validating it.  So, I took some time and added some validation.  However, because the users are processing over a million items through the linked data tool, I also wanted to provide a more user-friendly option that doesn’t require opening the MarcEditor – so I’ve added the linked data tools to the command line version of MarcEdit as well.

Linked Data Command Line Options:

The command line tool is probably one of those under-used and unknown parts of MarcEdit.  The tool is a shim over the code libraries – exposing functionality from the command line and making it easy to integrate with scripts written for automation purposes.  The tool has a wide range of options available to it, and users unfamiliar with the command line tool can get information about the functionality offered by querying help.  For those using the command line tool, you’ll likely want to create an environment variable pointing to the MarcEdit application directory so that you can call the program without needing to navigate to the directory.  For example, on my computer, I have an environment variable called %MARCEDIT_PATH% which points to the MarcEdit app directory.  This means that if I wanted to run the help for the MarcEdit Command Line tool, I’d run the following and get the following results:

C:\Users\reese.2179>%MARCEDIT_PATH%\cmarcedit -help
* MarcEdit 6.1 Console Application
* By Terry Reese
* email:
* Modified: 2015/7/29
        -s:     Path to file to be processed.
                        If calling the join utility, source must be files
                        delimited by the ";" character
        -d:     Path to destination file.
                        If calling the split utility, dest should specify a folder
                        where split files will be saved.
                        If this folder doesn't exist, one will be created.
        -rules: Rules file for the MARC Validator.
        -mxslt: Path to the MARCXML XSLT file.
        -xslt:  Path to the XML XSLT file.
        -batch: Specifies Batch Processing Mode
        -character:     Specifies character conversion mode.
        -break: Specifies MarcBreaker algorithm
        -make:  Specifies MarcMaker algorithm
        -marcxml:       Specifies MARCXML algorithm
        -xmlmarc:       Specifies the MARCXML to MARC algorithm
        -marctoxml:     Specifies MARC to XML algorithm
        -xmltomarc:     Specifies XML to MARC algorithm
        -xml:   Specifies the XML to XML algorithm
        -validate:      Specifies the MARCValidator algorithm
        -join:  Specifies join MARC File algorithm
        -split: Specifies split MARC File algorithm
        -records:       Specifies number of records per file [used with split command]
        -raw:   [Optional] Turns off mnemonic processing (returns raw data)
        -utf8:  [Optional] Turns on UTF-8 processing
        -marc8: [Optional] Turns on MARC-8 processing
        -pd:    [Optional] When a Malformed record is encountered, it will modify the process from a stop process to one where an error is simply noted and a sub note is added to the result file.
        -buildlinks:    Specifies the Semantic Linking algorithm
                        This function needs to be paired with the -options parameter
        -options        Specifies linking options to use: example: lcid,viaf:lc,oclcworkid,autodetect
                lcid: utilizes the LC ID service to link 1xx/7xx data
                autodetect: autodetects subjects and links to known values
                oclcworkid: inserts link to oclc work id if present
                viaf: linking 1xx/7xx using viaf. Specify index after colon. If no index is provided, lc is assumed.
                        VIAF Index Values:
                        all -- all of viaf
                        nla -- Australia's national index
                        vlacc -- Belgium's Flemish file
                        lac -- Canadian national file
                        bnc -- Catalunya
                        nsk -- Croatia
                        nkc -- Czech.
                        dbc -- Denmark (dbc)
                        egaxa -- Egypt
                        bnf -- France (BNF)
                        sudoc -- France (SUDOC)
                        dnb -- Germany
                        jpg -- Getty (ULAN)
                        bnc+bne -- Hispanica
                        nszl -- Hungary
                        isni -- ISNI
                        ndl -- Japan (NDL)
                        nli -- Israel
                        iccu -- Italy
                        LNB -- Latvia
                        LNL -- Lebannon
                        lc -- LC (NACO)
                        nta -- Netherlands
                        bibsys -- Norway
                        perseus -- Perseus
                        nlp -- Polish National Library
                        nukat -- Poland (Nukat)
                        ptbnp -- Portugal
                        nlb -- Singapore
                        bne -- Spain
                        selibr -- Sweden
                        swnl -- Swiss National Library
                        srp -- Syriac
                        rero -- Swiss RERO
                        rsl -- Russian
                        bav -- Vatican
                        wkp -- Wikipedia

        -help:  Returns usage information

The linked data option uses the following pattern: cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options [linkoptions]

As noted above in the list, -options is a comma-delimited list that includes the values that the linking tool should query.  A user, for example, looking to generate workids and uris on the 1xx and 7xx fields using the LC ID service – the command would look like:

cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid

Users interested in building all available linkages (using viaf, autodetecting subjects, etc.) would use:

cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid,autodetect,viaf:lc

Notice the last option – viaf. This tells the tool to utilize viaf as a linking option in the 1xx and the 7xx – the data after the colon identifies the index to utilize when building links.  The indexes are found in the help (see above).
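Because the console tool is meant to slot into scripts, a batch run can be driven from any scripting language.  Here is a hypothetical Python wrapper, assuming the %MARCEDIT_PATH% environment variable mentioned earlier is set (the file paths are only examples):

import os, subprocess

marcedit_path = os.environ["MARCEDIT_PATH"]  # MarcEdit app directory

# Build links on a file of records, mirroring the command shown above.
subprocess.run([os.path.join(marcedit_path, "cmarcedit.exe"),
                "-s", r"C:\data\records.mrc",
                "-d", r"C:\data\records_linked.mrc",
                "-buildlinks",
                "-options", "oclcworkid,lcid,autodetect,viaf:lc"],
               check=True)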

Download information:

The update can be found on the downloads page or by using the automated update tool within MarcEdit.

Mac Port Update:

Part of the reason I hadn’t planned on doing a Windows update of MarcEdit this week is that I’ve been heads down making changes to the Mac port.  I’ve gotten good feedback from folks letting me know that, so far, so good.  Over the past few weeks, I’ve been integrating missing features from the MarcEditor into the port, as well as working on the Delimited Text Translator.  I’ll now have to go back and make a couple of changes to support some of the update work in the Linked Data tool – but I’m hoping that by Aug. 2nd, I’ll have a new Mac Port Preview that will be pretty close to completing (and expanding) the initial port sprint.

Questions, let me know.


 Posted at 9:39 pm

Code4LibMW 2015 Write-up

Jul 24 2015

Whew – it’s been a wonderfully exhausting past few days here in Columbus, OH, as the Libraries played host to Code4LibMW.  This has been something that I’ve been looking forward to ever since making the move to The Ohio State University; the C4L community has always been one of my favorites, and while the annual conference continues to be one of the most important meetings on my calendar – it’s within these regional events where I’m always reminded why I enjoy being a part of this community.

I shared a story with the folks in Columbus this week.  As one of the folks that attended the original C4L meeting in Corvallis back in 2006 (BTW, there were 3 other original attendees in Columbus this week), there are a lot of things that I remember about that event quite fondly.  Pizza at American Dream, my first experience doing a lightning talk, the joy of a conference where people were writing code as they were standing on stage waiting their turn to present, Roy Tennant pulling up the IRC channel while he was on stage so he could keep an eye on what we were all saying about him.  It was just a lot of fun, and part of what made it fun was that everyone got involved.  During that first event, there were around 80 attendees, and nearly every person made it onto the stage to talk about something that they were doing, something that they were passionate about, or something that they had been inspired to build during the course of the week.  You still get this at times at the annual conference, but with its sheer size and weight, it’s become much harder to give everyone that opportunity to share the things that interest them, or easily connect with other people that might have those same interests.  And I think that’s the purpose that these regional events can serve.

By and large, the C4L regional events feel much more like those early days of the C4L annual conference.  They are small, usually free to attend, with a schedule that shifts and changes throughout the day.  They are also the place where we come together, meet local colleagues, and learn about all the fantastic work that is being done at institutions of all sizes and all types.  And that’s what the C4LMW meeting was for me this year.  As the host, I wanted to make sure that the event had enough structure to keep things moving, but had a place for everyone to participate.  For me – that was going to be the measure of success…did we not just put on a good program, but did this event help to make connections within our local community?  And I think that in this, the event was successful.  I was doing a little bit of math, and over the course of the two days, I think that we had a participation rate close to 90%, and an opportunity for everyone that wanted to get up and just talk about something that they found interesting.  And to be sure – there is a lot of great work being done out here by my Midwest colleagues (yes, even those up in Michigan :)).

Over the next few days, I’ll be collecting links and making the slides available via the C4LMW 2015 home page as well as wrapping up a few of the last responsibilities of hosting an event, but I wanted to take a moment and again thank everyone that attended.  These types of events have never been driven by the presentations, the hosts, or the presenters – but have always been about the people that attend and the connections that we make with the people in the room.  And it was a privilege this year to have the opportunity to host you all here in Columbus. 



 Posted at 7:17 pm

Merge Record Changes

Jul 21 2015

With the last update, I made a few significant modifications to the Merge Records tool, and I wanted to provide a bit more information around how these changes may or may not affect users.  The changes can be broken down into two groups:

  1. User Defined Merge Field Support
  2. Multiple Record merge support

Prior to MarcEdit 6.1, the merge records tool utilized 4 different algorithms for doing record merges.  These were broken down by field class, and as such, had specific functionality built around them, since the limited scope of the data being evaluated made it possible.  Two of these specific functions were the ability for users to change the value in a field group class (say, change control numbers from 001 to 907$b) and the ability for the tool to merge multiple records in a merge file into the source.

When I made the update to 6.1, I tossed out the 3 field-specific algorithms and standardized on a single processing algorithm – what I call the MARC21 option.  This is an algorithm that processes data from a wide range of fields and provides a high level of data evaluation – but in doing this, I set the fields that could be evaluated, and the function dropped the ability to merge multiple records into a single source file.  The effect of this was that:

  • Users could no longer change the fields/subfields used to evaluate data for merge outside of those fields set as part of the MARC21 option.
  • if a user had a file that looked like the following —
    sourcefile1 – record 1
    mergefile – record1 (matches source1)
    mergefile – record2
    mergefile – record3 (matches source1)

    Only data from the mergefile – record 1 would be merged.  The tool didn’t see the secondary data that might be in the merge file.  This has always been the case when working with the MARC21 merge option, but by making this the only option, I removed this functionality from the program (as the 3 custom field algorithms did make accommodations for merging data from multiple records into a single source).

With the last update, I’ve brought both of these elements back to the tool.  When a user utilizes the Merge Records tool, they can change the textbox with the field data and enter a new field/subfield combination for matching (at this point, it must be a field/subfield combination).  Secondly, the tool now handles the merging of multiple records if those data elements are matched via a title or control number.  Since MarcEdit will treat user-defined fields as the same class as a standard number (ISBN, technically) for matching – users will now see that the tool can merge duplicate data into a single source file.
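Conceptually, the multiple-record merge now behaves like a grouping step ahead of the field-level merge.  A simplified sketch (the record representation and the match_key callback are placeholders, not MarcEdit’s code):

from collections import defaultdict

def group_merge_records(merge_records, match_key):
    # Bucket merge-file records by their match point (a control number,
    # title, or a user-defined field/subfield such as 907$b), so every
    # matching record contributes data to the same source record.
    groups = defaultdict(list)
    for rec in merge_records:
        key = match_key(rec)
        if key:
            groups[key].append(rec)
    return groups

# e.g. records 1 and 3 from the merge file both match sourcefile1 - record 1,
# so both land in the same bucket and both are merged into that source record.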

Questions about this – just let me know.


 Posted at 9:06 am