MarcEdit 6 Updates

I hadn’t planned on putting together an update for the Windows version of MarcEdit this week, but I’ve been working with someone putting the Linked Data tools through their paces and came across instances where some of the linked data services were not sending back valid XML data — and I wasn’t validating it.  So, I took some time and added some validation.  However, because the users are processing over a million items through the linked data tool, I also wanted to provide a more user friendly option that doesn’t require opening the MarcEditor — so I’ve added the linked data tools to the command line version of MarcEdit as well. 

Linked Data Command Line Options:

The command line tool is probably one of those under-used and unknown parts of MarcEdit.  The tool is a shim over the code libraries — exposing functionality from the command line, and making it easy to integrate with scripts written for automation purposes.  The tool has a wide range of options available to it — and for users unfamiliar with the command line tool — they can get information about the functionality offered by querying help.  For those using the command line tool — you’ll likely want to create an environmental variable pointing to the MarcEdit application directory so that you can call the program without needing to navigate to the directory.  For example, on my computer, I have an environmental variable called: %MARCEDIT_PATH% which points to the MarcEdit app directory.  This means that if I wanted to run the help from my command line for the MarcEdit Command Line tool, I’d run the following and get the following results:

C:\Users\reese.2179>%MARCEDIT_PATH%\cmarcedit -help
***************************************************************
* MarcEdit 6.1 Console Application
* By Terry Reese
* email: reeset@gmail.com
* Modified: 2015/7/29
***************************************************************
Arguments:
        -s:     Path to file to be processed.
                        If calling the join utility, source must be files
                        delimited by the ";" character
        -d:     Path to destination file.
                          If call the split utility, dest should specify a fold
r
                        where split files will be saved.
                        If this folder doesn't exist, one will be created.
        -rules: Rules file for the MARC Validator.
        -mxslt: Path to the MARCXML XSLT file.
        -xslt:  Path to the XML XSLT file.
        -batch: Specifies Batch Processing Mode
        -character:     Specifies character conversion mode.
        -break: Specifies MarcBreaker algorithm
        -make:  Specifies MarcMaker algorithm
        -marcxml:       Specifies MARCXML algorithm
        -xmlmarc:       Specifics the MARCXML to MARC algorithm
        -marctoxml:     Specifies MARC to XML algorithm
        -xmltomarc:     Specifies XML to MARC algorithm
        -xml:   Specifies the XML to XML algorithm
        -validate:      Specifies the MARCValidator algorithm
        -join:  Specifies join MARC File algorithm
        -split: Specifies split MARC File algorithm
        -records:       Specifies number of records per file [used with split c
mmand].
        -raw:   [Optional] Turns of mnemonic processing (returns raw data)
        -utf8:  [Optional] Turns on UTF-8 processing
        -marc8: [Optional] Turns on MARC-8 processing
        -pd:    [Optional] When a Malformed record is encountered, it will modi
y the process from a stop process to one where an error is simply noted and a s
ub note is added to the result file.
        -buildlinks:    Specifies the Semantic Linking algorithm
This function needs to be paired with the -options parameter
        -options        Specifies linking options to use: example: lcid,viaf:lc
oclcworkid,autodetect           lcid: utilizes id.loc.gov to link 1xx/7xx data
                autodetect: autodetects subjects and links to know values
                oclcworkid: inserts link to oclc work id if present
                viaf: linking 1xx/7xx using viaf.  Specify index after colon. I
 no index is provided, lc is assumed.
                        VIAF Index Values:
                        all -- all of viaf
                        nla -- Australia's national index
                        vlacc -- Belgium's Flemish file
                        lac -- Canadian national file
                        bnc -- Catalunya
                        nsk -- Croatia
                        nkc -- Czech.
                        dbc -- Denmark (dbc)
                        egaxa -- Egypt
                        bnf -- France (BNF)
                        sudoc -- France (SUDOC)
                        dnb -- Germany
                        jpg -- Getty (ULAN)
                        bnc+bne -- Hispanica
                        nszl -- Hungary
                        isni -- ISNI
                        ndl -- Japan (NDL)
                        nli -- Israel
                        iccu -- Italy
                        LNB -- Latvia
                        LNL -- Lebannon
                        lc -- LC (NACO)
                        nta -- Netherlands
                        bibsys -- Norway
                        perseus -- Perseus
                        nlp -- Polish National Library
                        nukat -- Poland (Nukat)
                        ptbnp -- Portugal
                        nlb -- Singapore
                        bne -- Spain
                        selibr -- Sweden
                        swnl -- Swiss National Library
                        srp -- Syriac
                        rero -- Swiss RERO
                        rsl -- Russian
                        bav -- Vatican
                        wkp -- Wikipedia

        -help:  Returns usage information

The linked data option uses the following pattern: cmarcedit.exe —s [sourcefile] —d [destfile] —buildlinks —options [linkoptions]

As noted above in the list, —options is a comma delimited list that includes the values that the linking tool should query.  A user, for example, looking to generate workids and uris on the 1xx and 7xx fields using id.loc.gov — the command would look like:

<< cmarcedit.exe —s [sourcefile] —d [destfile] —buildlinks —options oclcworkid,lcid

Users interesting in building all available linkages (using viaf, autodetecting subjects, etc. would use:

<< cmarcedit.exe —s [sourcefile] —d [destfile] —buildlinks —options oclcworkid,lcid,autodetect,viaf:lc

Notice the last option — viaf. This tells the tool to utilize viaf as a linking option in the 1xx and the 7xx — the data after the colon identifies the index to utilize when building links.  The indexes are found in the help (see above).

Download information:

The update can be found on the downloads page: http://marcedit.reeset.net/downloads or using the automated update tool within MarcEdit.  Direct links:

Mac Port Update:

Part of the reason I hadn’t planned on doing a Windows update of MarcEdit this week is that I’ve been heads down making changes to the Mac Port.  I’ve gotten good feedback from folks letting me know that so far, so good.  Over the past few weeks, I’ve been integrating missing features from the MarcEditor into the Port, as well as working on the Delimited Text Translation.  I’ll now have to go back and make a couple of changes to support some of the update work in the Linked Data tool — but I’m hoping that by Aug. 2nd, I’ll have a new Mac Port Preview that will be pretty close to completing (and expanding) the initial port sprint. 

Questions, let me know.

–tr


Posted

in

by

Tags: