Jul 29, 2015
 

I hadn’t planned on putting together an update for the Windows version of MarcEdit this week, but I’ve been working with someone putting the Linked Data tools through their paces, and we came across instances where some of the linked data services were not sending back valid XML data – and I wasn’t validating it.  So, I took some time and added some validation.  However, because these users are processing over a million items through the linked data tool, I also wanted to provide a more user-friendly option that doesn’t require opening the MarcEditor, so I’ve added the linked data tools to the command line version of MarcEdit as well.

Linked Data Command Line Options:

The command line tool is probably one of the under-used and lesser-known parts of MarcEdit.  The tool is a shim over the code libraries: it exposes functionality from the command line and makes it easy to integrate with scripts written for automation purposes.  The tool has a wide range of options available to it, and users unfamiliar with it can get information about the functionality offered by querying help.  If you use the command line tool, you’ll likely want to create an environment variable pointing to the MarcEdit application directory so that you can call the program without needing to navigate to the directory.  For example, on my computer, I have an environment variable called %MARCEDIT_PATH% which points to the MarcEdit app directory.  This means that if I wanted to run the help from my command line for the MarcEdit Command Line tool, I’d run the following and get the following results:

C:\Users\reese.2179>%MARCEDIT_PATH%\cmarcedit -help
***************************************************************
* MarcEdit 6.1 Console Application
* By Terry Reese
* email: reeset@gmail.com
* Modified: 2015/7/29
***************************************************************
Arguments:
        -s:     Path to file to be processed.
                        If calling the join utility, source must be files
                        delimited by the ";" character
        -d:     Path to destination file.
                        If call the split utility, dest should specify a folder
                        where split files will be saved.
                        If this folder doesn't exist, one will be created.
        -rules: Rules file for the MARC Validator.
        -mxslt: Path to the MARCXML XSLT file.
        -xslt:  Path to the XML XSLT file.
        -batch: Specifies Batch Processing Mode
        -character:     Specifies character conversion mode.
        -break: Specifies MarcBreaker algorithm
        -make:  Specifies MarcMaker algorithm
        -marcxml:       Specifies MARCXML algorithm
        -xmlmarc:       Specifics the MARCXML to MARC algorithm
        -marctoxml:     Specifies MARC to XML algorithm
        -xmltomarc:     Specifies XML to MARC algorithm
        -xml:   Specifies the XML to XML algorithm
        -validate:      Specifies the MARCValidator algorithm
        -join:  Specifies join MARC File algorithm
        -split: Specifies split MARC File algorithm
        -records:       Specifies number of records per file [used with split command].
        -raw:   [Optional] Turns of mnemonic processing (returns raw data)
        -utf8:  [Optional] Turns on UTF-8 processing
        -marc8: [Optional] Turns on MARC-8 processing
        -pd:    [Optional] When a Malformed record is encountered, it will modify
                        the process from a stop process to one where an error is
                        simply noted and a stub note is added to the result file.
        -buildlinks:    Specifies the Semantic Linking algorithm
                        This function needs to be paired with the -options parameter
        -options        Specifies linking options to use: example: lcid,viaf:lc,oclcworkid,autodetect
                lcid: utilizes id.loc.gov to link 1xx/7xx data
                autodetect: autodetects subjects and links to know values
                oclcworkid: inserts link to oclc work id if present
                viaf: linking 1xx/7xx using viaf.  Specify index after colon. If no index is provided, lc is assumed.
                        VIAF Index Values:
                        all -- all of viaf
                        nla -- Australia's national index
                        vlacc -- Belgium's Flemish file
                        lac -- Canadian national file
                        bnc -- Catalunya
                        nsk -- Croatia
                        nkc -- Czech.
                        dbc -- Denmark (dbc)
                        egaxa -- Egypt
                        bnf -- France (BNF)
                        sudoc -- France (SUDOC)
                        dnb -- Germany
                        jpg -- Getty (ULAN)
                        bnc+bne -- Hispanica
                        nszl -- Hungary
                        isni -- ISNI
                        ndl -- Japan (NDL)
                        nli -- Israel
                        iccu -- Italy
                        LNB -- Latvia
                        LNL -- Lebannon
                        lc -- LC (NACO)
                        nta -- Netherlands
                        bibsys -- Norway
                        perseus -- Perseus
                        nlp -- Polish National Library
                        nukat -- Poland (Nukat)
                        ptbnp -- Portugal
                        nlb -- Singapore
                        bne -- Spain
                        selibr -- Sweden
                        swnl -- Swiss National Library
                        srp -- Syriac
                        rero -- Swiss RERO
                        rsl -- Russian
                        bav -- Vatican
                        wkp -- Wikipedia

        -help:  Returns usage information

The linked data option uses the following pattern: cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options [linkoptions]

As noted in the list above, -options is a comma-delimited list of the values that the linking tool should query.  For example, a user looking to generate work IDs and URIs on the 1xx and 7xx fields using id.loc.gov would run:

<< cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid

Users interested in building all available linkages (using VIAF, autodetecting subjects, etc.) would use:

<< cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid,autodetect,viaf:lc

Notice the last option – viaf. This tells the tool to utilize viaf as a linking option in the 1xx and the 7xx – the data after the colon identifies the index to utilize when building links.  The indexes are found in the help (see above).
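For anyone driving the tool from a script, here is a minimal Python sketch of assembling that command line.  The flags (-s, -d, -buildlinks, -options) and the MARCEDIT_PATH variable come from the text above; the function name and the file names records.mrc and linked.mrc are hypothetical, and the actual run is left commented out because it needs a machine with MarcEdit installed.

```python
import os


def build_linked_data_command(source, dest, options, marcedit_dir=None):
    """Return the argument list for a cmarcedit linked data run."""
    if not options:
        raise ValueError("at least one linking option is required")
    if marcedit_dir is None:
        # MARCEDIT_PATH is the environment variable described earlier in this post.
        marcedit_dir = os.environ.get("MARCEDIT_PATH", "")
    exe = os.path.join(marcedit_dir, "cmarcedit.exe")
    # -options takes a single comma-delimited value, e.g. "oclcworkid,lcid,viaf:lc".
    return [exe, "-s", source, "-d", dest, "-buildlinks", "-options", ",".join(options)]


# Hypothetical file names, for illustration only.
cmd = build_linked_data_command(
    "records.mrc", "linked.mrc", ["oclcworkid", "lcid", "autodetect", "viaf:lc"]
)
print(cmd)
# import subprocess; subprocess.run(cmd, check=True)  # run where MarcEdit is installed
```

Passing the arguments as a list (rather than one long string) avoids quoting problems when the source or destination path contains spaces.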

Download information:

The update can be found on the downloads page: http://marcedit.reeset.net/downloads or using the automated update tool within MarcEdit.

Mac Port Update:

Part of the reason I hadn’t planned on doing a Windows update of MarcEdit this week is that I’ve been heads down making changes to the Mac Port.  I’ve gotten good feedback from folks letting me know that so far, so good.  Over the past few weeks, I’ve been integrating missing features from the MarcEditor into the Port, as well as working on the Delimited Text Translation.  I’ll now have to go back and make a couple of changes to support some of the update work in the Linked Data tool – but I’m hoping that by Aug. 2nd, I’ll have a new Mac Port Preview that will be pretty close to completing (and expanding) the initial port sprint. 

Questions, let me know.

–tr

Posted at 9:39 pm

Jul 21, 2015
 

With the last update, I made a few significant modifications to the Merge Records tool, and I wanted to provide a bit more information around how these changes may or may not affect users.  The changes can be broken down into two groups:

  1. User Defined Merge Field Support
  2. Multiple Record merge support

Prior to MarcEdit 6.1, the merge records tool utilized 4 different algorithms for doing record merges.  These were broken down by field class, and as such, had specific functionality built around them, since the limited scope of the data being evaluated made that possible.  Two of these specific functions were the ability for users to change the value in a field group class (say, change control numbers from 001 to 907$b) and the ability for the tool to merge multiple records in a merge file into the source.

When I made the update to 6.1, I tossed out the 3 field-specific algorithms and standardized on a single processing algorithm – what I call the MARC21 option.  This is an algorithm that processes data from a wide range of fields and provides a high level of data evaluation – but in doing this, I fixed the set of fields that could be evaluated, and the function dropped the ability to merge multiple records into a single source file.  The effect of this was that:

  • Users could no longer change the fields/subfields used to evaluate data for merge outside of those fields set as part of the MARC21 option.
  • If a user had a file that looked like the following:
    sourcefile1 – record 1
    mergefile – record1 (matches source1)
    mergefile – record2
    mergefile – record3 (matches source1)

    Only data from the mergefile – record 1 would be merged.  The tool didn’t see the secondary data that might be in the merge file.  This has always been the case when working with the MARC21 merge option, but by making this the only option, I removed this functionality from the program (as the 3 custom field algorithms did make accommodations for merging data from multiple records into a single source).

With the last update, I’ve brought both of these elements back to the tool.  When a user utilizes the Merge Records tool, they can change the textbox with the field data and enter a new field/subfield combination for matching (at this point, it must be a field/subfield combination).  Secondly, the tool now handles the merging of multiple records if those data elements are matched via a title or control number.  Since MarcEdit will treat user-defined fields as the same class as a standard number (technically, ISBN) for matching, users will now see that the tool can merge duplicate data into a single source file.
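To make the two behaviors concrete, here is a rough Python sketch of matching on a user-defined field/subfield combination and folding every matching merge record into the source.  This is an illustration under stated assumptions, not MarcEdit’s merge code: the record layout (a dict mapping a tag to a list of subfield dicts), the function names, and the choice of the 650 field as the data to copy are all hypothetical.

```python
import re
from collections import defaultdict

SPEC = re.compile(r"(\d{3})\$([0-9a-z])")


def parse_match_spec(spec):
    """Split a field/subfield combination such as '907$b' into ('907', 'b')."""
    m = SPEC.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"not a field/subfield combination: {spec!r}")
    return m.group(1), m.group(2)


def match_keys(record, spec):
    """Return every value of the match subfield found in the record."""
    tag, code = parse_match_spec(spec)
    return [sf[code] for sf in record.get(tag, []) if code in sf]


def merge_records(sources, merges, spec, copy_tags):
    """Copy copy_tags fields from ALL matching merge records into each source."""
    index = defaultdict(list)
    for rec in merges:
        for key in match_keys(rec, spec):
            index[key].append(rec)
    results = []
    for src in sources:
        # Work on a copy so the caller's source records are left untouched.
        out = {tag: [dict(f) for f in fields] for tag, fields in src.items()}
        for key in match_keys(src, spec):
            for match in index.get(key, []):
                for tag in copy_tags:
                    for field in match.get(tag, []):
                        if field not in out.setdefault(tag, []):
                            out[tag].append(dict(field))  # skip exact duplicates
        results.append(out)
    return results


sources = [{"907": [{"b": "b100"}], "650": [{"a": "Cats"}]}]
merges = [
    {"907": [{"b": "b100"}], "650": [{"a": "Cats"}, {"a": "Pets"}]},
    {"907": [{"b": "b200"}], "650": [{"a": "Dogs"}]},
    {"907": [{"b": "b100"}], "650": [{"a": "Felidae"}]},
]
print(merge_records(sources, merges, "907$b", ["650"]))
```

In the sample data, the first and third merge records share the source’s 907$b value, so both contribute fields, which mirrors the mergefile record1/record3 scenario listed above.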

Questions about this – just let me know.

–tr

Posted at 9:06 am

Jul 20, 2015
 

This update will have four significant changes to three specific algorithms that are high use — so I wanted to give folks a heads up.

1) Merge Records — I’ve updated the process in two ways.  

   a) Users can now change the data in the dropdown box to a user-defined field/subfield combination.  At present, you have defined options: 001, 020, 022, 035, marc21.  You will now be able to specify another field/subfield combination (must be the combination) for matching.  So say you exported your data from your ILS, and your bibliographic number is in a 907$b — you could change the textbox from 001 to 907$b and the tool will now utilize that data, in a control number context — to facilitate matching.  

   b) This meant making a secondary change.  When I shifted to using the MARC21 method, I removed the ability for the algorithm to collapse multiple records of the same type within the merge file into the source.  For example, after the change to the marc21 algorithm, the following would be true in this scenario:

source 1: record 1
merge 1: matches record 1
merge 2: matches record 2
merge 3: matches record 1

 

The data moved into source 1 would be the data from merge 1; merge 3 wouldn’t be seen.  In the version prior to utilizing just the MARC21 option, users could collapse records when using the control number index match.  I’ve updated the merge algorithm so that the default is now to assume that all source data could have multiple merge matches.  This has the practical effect of allowing users to take a merge file with multiple duplicates and merge all the data into a single corresponding source file.  But this does represent a significant behavior change, so users need to be aware.

 

2) RDA Helper — 

   a) I’ve updated the error processing to ensure that the tool can fail a bit more gracefully

   b) Updating the abbreviation expansion because the expression I was using could miss values on occasion.  This will catch more content — it should also be a bit faster.

 

3) Linked Data tools — I included the ability to link to OCLC works ids — there were problems when the json outputted was too nested.  This has been corrected.

 

4) Bibframe tool — I’ve updated the mapping used to the current LC flavor.
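On the nested JSON problem mentioned in item 3: one general way to cope with a value that may sit at any depth of a response is to walk the whole structure recursively.  The Python sketch below shows that technique only; the key name workId, the sample payload, and the example.org URIs are invented for illustration and are not the actual OCLC response format or MarcEdit’s fix.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key`, however deeply it is nested."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key may appear again further down.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical payload: the same key appears at two different depths.
payload = """
{"result": {"items": [
    {"workId": "http://example.org/work/1"},
    {"meta": {"workId": "http://example.org/work/2"}}
]}}
"""
print(list(find_values(json.loads(payload), "workId")))
```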

 

Updates can be found on the downloads page (Windows/Linux) or via the automated update tool.


 

Posted at 11:51 pm

Jul 05, 2015
 

It’s with a little trepidation that I’m formally making the first Public Preview of the MarcEdit OSX version available for download and use.  In fact, as of today, this version is now *the* OSX download available on the downloads page.  I will no longer be building the old code-base for use on OSX.

When I first started this project around mid-April, I began knowing that this process would take some time.  I’ve been working on MarcEdit continuously for a little over 16 years.  It’s gone through one significant rewrite (when the program moved from Assembly to C#) and has had way too many revisions to count.  In agreeing to take on the porting work, I’d hoped that I could port a significant portion of the program over the course of about 8 months and that by the end of August, I could produce a version of MarcEdit that would cover the 80% or so of the commonly used application toolset.  Doing this meant porting the MARC Tools portion of the application and the MarcEditor.

Well, I’m ahead of schedule.  Since about 2014, I’ve been reworking a good deal of the application to support a smoother porting process sometime in the future, though honestly, I wasn’t sure that I’d ever actually do the porting work.  Pleasantly, this early work has made a good deal of the porting work easier, allowing me to move faster than I’d anticipated.  As of this posting, a significant portion of that 80% has been converted, and I think that for many people, most of what they probably use daily has been implemented.  And while I’m ahead of schedule and have been happy with how the porting process has gone, make no mistake: it’s been a lot of work, and a lot of code.  Even though this work has primarily been centered around rewriting just the UI portions of MarcEdit, you are still talking, as of today, close to 200,000 lines of code.  This doesn’t include the significant amount of work I’ve done around the general assemblies that have provided improvements to all MarcEdit users.  Because of that, I need to start getting feedback from users.  While the general assemblies go through an automated testing process, I haven’t, as of yet, come up with an automated testing process for the OSX build.  This means that I’m testing things manually, and simply cannot go through the same level of testing that I do each time I build the Windows version.  Most folks may not realize it, but it takes about a day to build the Windows version, as the program goes through various unit tests processing close to 25 million records.  I simply don’t have an equivalent of that process yet, so I’m hoping that everyone interested in this work will give it a spin, use it for real work, and let me know if/when things fall down.

In creating the Preview, I’ve tried to make the process for users as easy as possible.  Users interested in running the program simply need to be running at least OSX 10.8 and download the dmg found here: http://marcedit.reeset.net/downloads.  Once downloaded, run the dmg and a new disk will mount called MarcEdit OSX.  Run this file, and you’ll see the following installer:

MarcEdit OSX installer

Drag the MarcEdit icon into the Applications folder and the application will either install, or overwrite an existing version.  That’s it.  No other downloads are necessary.  On first run, the program will generate a marcedit folder under /users/[yourid]/marcedit.  I realize that this isn’t completely normal — but I need the data accessible outside of the normal app sandbox to easily support updates.  I’d also considered the User Documents folder, but the configuration data probably shouldn’t live there either.  So, this is where I ended up putting it.

So what’s been completed?  Essentially, all the MARC Tools functions and a significant amount of the MarcEditor.  There are some conspicuous functions that are absent at this point, though: the Call Number and FAST Heading generation, the Delimited Text Translator and Exporter, the Select and Delete Selected Records functions, everything Z39.50 related, as well as the Linked Data tools and the integration work with OCLC and Koha.  None of these are currently available, but they will be worked on.  At this point, what users can do is start letting me know which absent components are impacting them the most, and I’ll see how they fit into the current development roadmap.

Anyway, that’s it.  I’m excited to let you all give this a try, and a little nervous as well.  This has been a significant undertaking which has definitely pushed me a bit, requiring me to learn Objective-C in a short period of time, as well as quickly assimilate a significant portion of Apple’s SDK documentation relating to UI design.  I’m sure I’ve missed things, but it’s time to let other folks start working with it.

If you have been interested in this work — download the installer, kick the tires, and give feedback.  Just remember to be gentle.  :)

–TR

Download URL: http://marcedit.reeset.net/downloads

 

Posted at 8:40 pm

Jul 05, 2015
 

This was something I’d hoped to get into the last update, but didn’t get the time to test it; so I got it done now.  While at the first MarcEdit User Group meeting at ALA, there was a question about supporting 880 fields when exporting data via tab delimited format.  When you use the tool right now, the program will export all the 880 fields, not a specific 880 field.  This update changes that.  After the update, when you select the 880 field in the Export tab delimited tool, the program will ask you for the linking field.  In this case, the program will then match the 880$6[linkingfield], and pull the selected subfield.  I’m not sure how often this comes up — but it certainly made a lot of sense when the problem was described to me.
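As a rough illustration of the matching described above, here is a small Python sketch that pulls a chosen subfield only from the 880 fields whose $6 points at a given linking field.  It relies on the MARC 21 convention that an 880’s $6 begins with the linked tag followed by a hyphen (for example 245-01); the record layout, the function name, and the placeholder values are assumptions for the example, not MarcEdit’s export code.

```python
def export_880_subfield(record, linking_tag, subfield_code):
    """Return subfield values from 880 fields linked to `linking_tag`.

    `record` is a list of (tag, subfields) pairs, where subfields is a list
    of (code, value) pairs. This layout is an assumption for the sketch.
    """
    values = []
    for tag, subfields in record:
        if tag != "880":
            continue
        linkage = next((v for c, v in subfields if c == "6"), "")
        # In MARC 21, $6 starts with the linked tag, e.g. "245-01/$1".
        if not linkage.startswith(linking_tag + "-"):
            continue
        values.extend(v for c, v in subfields if c == subfield_code)
    return values


record = [
    ("245", [("6", "880-01"), ("a", "Romanized title")]),
    ("880", [("6", "245-01/$1"), ("a", "Title in original script")]),
    ("880", [("6", "100-02/$1"), ("a", "Name in original script")]),
]
# One tab-delimited row: the 245$a followed by the linked 880$a.
row = ["Romanized title"] + export_880_subfield(record, "245", "a")
print("\t".join(row))
```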

You can pick up the download at: http://marcedit.reeset.net/downloads

–tr

Posted at 8:33 pm

Jun 19, 2015
 

Logistics

Time: 6:00 – 7:30 pm, Friday, June 26, 2015
Place: Marriott Marquis
Room: Pacific H, capacity: 30

Description:

The MarcEdit user community is large and diverse and, honestly, I get to meet far too few community members.  This meeting has been put together to give members of the community a chance to come together and talk about the development road map, hear about the work to port MarcEdit to the Mac, and give me an opportunity to hear from the community.  I’ll talk about future work and areas of potential partnership, and hear from you what you’d like to see in the program to make your metadata lives a little easier.  If this sounds interesting to you, I really hope to see you there.

Acknowledgements:

A *big* thank you to John Chapman and OCLC for allowing this to happen.  As folks might guess, finding space at ALA can be a challenging and expensive endeavor, so when I originally broached the idea with OCLC, I had pretty low expectations.  But they truly went above and beyond any reasonable expectation, working with the hotel and ALA so this meeting could take place.  And while they didn’t ask for it, they have my personal thanks and gratitude.  If you can attend the event, or heck, wish you could have but your schedule made it impossible, make sure you let OCLC know that this was appreciated.

Posted at 1:24 pm

Jun 06, 2015
 

Having made one preview available for evaluation and feedback, I’ve been diligently working on updating the tool and working on new functionality.  This includes developing a new notification service so that preview users know when new builds are available, and providing a new preferences window to enable support for changing applicable preferences currently exposed within the application.  From the perspective of new work, I’ve begun working on the MarcEditor.  At this point, I’ve mocked out the window and am starting to create the global editing toolsets and connect actions to the various UI elements.  This is going to be a bit of a time-consuming process, but thankfully, it’s been made somewhat easier by the fact that much of the code within the MarcEditor that’s not platform specific has been moved outside the application to reusable assemblies.  There’s a bit more refactoring that needs to be done, and I’ll need to rethink how the program streams data into the MarcEditor edit window since the OSX APIs make this a bit difficult, but it’s getting there.  Below, you will find screenshots of some of the new work.

–tr

MarcEdit Mac Port Notifications Window Example

MarcEdit Mac Preferences window: Current preferences exposed for update are for the MARCEngine and the Automatic Update notification.

This is the initial MarcEditor Wireframe for the Mac.  This feels pretty solid at this point.

Current options in the MarcEditor scheduled for the first release

MarcEdit Mac MarcEditor Edit Menu wireframe: options targeted for the first release

MarcEdit Mac MarcEditor Reports Menu wireframe showing functions targeted for the first release

MarcEdit Mac MarcEditor Tools Menu wireframe showing functions targeted for the first release

May 29, 2015
 

MarcEdit provides lots of different ways for users to edit their data.  However, one use case that comes up often is the ability to perform an action on a field or fields based on the presence of data within another field.  While you can currently do this in MarcEdit by using tools to isolate the specific records to edit, and then working on just those items — more could be done to make this process easier.  So, to that end, I’ve updated the Replace Function to include a new conditional element that will allow MarcEdit to presort using an in-string or regular expression query, prior to evaluating data for replacement.  Here’s how it will work…

When you first open the Replace Window:

Replace Function Changes

Notice that the conditional string text has been replaced.  It was confusing to folks, because it may not have reflected exactly what was being done.  Rather, this is an option that allows a user to run an in-string or regular expression search across the entire record before the Find/Replace is run.  The search options grouped below *only* affect the Find/Replace textboxes.  They do not affect the options that are enabled when Perform Find/Replace If… is checked.  Those data fields have their own toggles for in-string (has) or regular expression (regex) matching.

 

If you check the box, the following information will be displayed:

New Replace Functionality

Again – the If [Textbox] [REGEX] is a search that is performed and must evaluate as true in order for the paired find and replace to run.  Use cases for this function are things like:

  • I want to modify the field x but only if foobar is found in field y.

 

There are other ways to do this, such as extracting data from files and creating lots of different files for processing, or writing a script – but this will give users a great deal more flexibility when they want to perform an operation only if specific data is found within a field.

 

A simple example would be below:

Example of the new Replace

This is a non-real-world example of how this function works.  A user wants to change the 050 field to an 090 field, but only if the data in the 945$a is a value from m to z.  That’s what the new option allows.  By checking the Perform Find/Replace If option, I’m able to provide a pre-search that filters the records on which the primary Find/Replace pair will actually be performed.  Make sense?  I hope so.
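The same idea can be sketched in a few lines of Python for anyone who thinks better in code: test a condition against the whole record first, and only run the find/replace when it matches.  The sketch works on a record in MarcEdit’s mnemonic text format; the function name, the sample record, and the exact regular expressions are assumptions made up for the example and are not what the Replace window runs internally.

```python
import re


def conditional_replace(record_text, condition, find, replace):
    """Run find/replace on a record only if `condition` matches somewhere in it."""
    if re.search(condition, record_text, flags=re.MULTILINE) is None:
        return record_text  # pre-search failed: leave the record alone
    return re.sub(find, replace, record_text, flags=re.MULTILINE)


# Mnemonic-format sample (a backslash stands for a blank indicator).
record = "=050  \\4$aPS3552$b.A45\n=945  \\\\$am"

# Condition: a 945 whose $a is a single letter from m to z.
condition = r"^=945  ..\$a[m-z]$"

# Find/replace pair: retag the 050 as an 090.
print(conditional_replace(record, condition, r"^=050", "=090"))
```

A record whose 945$a falls outside m to z is returned unchanged, which is the filtering behavior the Perform Find/Replace If option is described as providing.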

Finally – I’ve updated the code around the task wizard so that this information can be utilized within tasks.  This enhancement will be in the next available update.

–tr

Posted at 11:09 pm

MarcEdit 6 update

May 25, 2015
 

I’ve been working hard on making a few changes to a couple of the MarcEdit internal components to improve the porting work.  To that end, I’ve posted an update that targets improvements to the Deduping and the Merging tools.

Updates:

  • Update: Dedup tool — improves the handling of qualified data in the 020, 022, and 035.
  • Update: Merge Records Tool — improves the handling of qualified data in the 020, 022, and 035.

Downloads can be picked up using the automated update tool or by going to: http://marcedit.reeset.net/downloads/

–tr

Posted at 11:26 pm

MarcEdit 6 Update

May 06, 2015
 

I’ve posted an update.  The change log is below:

  • Enhancement: Dedup Function: Added support for multiple field evaluation. 
  • Enhancement: Dedup Function: Changed default control number field to 001|019$a
  • Enhancement: Merge Function: Added 019 to the evaluation profile
  • Enhancement: Updated the 001, 010$a, 020$a, 022$a, 035$a process so that it utilizes the same process as the MARC21 process.  This adds multiple field support when handling these individual elements.
  • Optimizations: MARCEngine: character conversion code has been moved into the MARCEngine.
  • Optimizations: Dedup code has been moved into the meedit assembly.
  • Update: JSON View JSON processing assembly has been updated.
  • Update: MARCEngine SAXON dependency has been updated.
  • Behavior Change: MARC Tools UI has been updated to utilize a hybrid of a menu and toolbar.
  • Bug Fix: Windows XP Support has been restored to the installer.

You can pick up the update either using the automated update tool or by going to: http://marcedit.reeset.net/downloads/

–tr

Posted at 8:11 pm