Dec 19, 2014
 

Over the past couple of weeks, I’ve been working on expanding the linking services that MarcEdit can work with in order to create identifiers for controlled terms and headings.  One of the services that I’ve been experimenting with is NLM’s beta SPARQL endpoint for MeSH headings.  MeSH has always been a bit foreign to me.  While I was a cataloger in the past, my primary areas of expertise were geographic materials (analog and digital) and traditional monographic data.  MeSH looks like LCSH, but it is quite different.  So, I’ve been spending some time trying to learn a little more about it, while working on a process to consistently query the endpoint and retrieve the identifier for a preferred term.  It’s been an enlightening process, but also one that has led me to think about how I might create a process that could be used beyond this simple use case, and potentially provide MarcEdit with an RDF engine that could be used down the road to make it easier to query, create, and update graphs.

Since MarcEdit is written in .NET, this meant looking to see what components currently exist that provide the type of RDF functionality that I may need down the road.  Fortunately, a number of components exist; the one I’m using in MarcEdit is dotnetrdf (https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/browse/).  The component provides a robust set of functionality that supports everything I want to do now, and should support what I want to do later.

With a toolkit found, I spent some time integrating it into MarcEdit, which is never a small task.  The outcome will be a couple of new features that start testing out the toolkit and give users the chance to become more familiar with a key semantic web technology, SPARQL.  The first new feature is the integration of MeSH as a known vocabulary that will now be queried and controlled when run through the linked data tool.  The second new feature is a SPARQL Browser.  The idea here is to give folks a tool to explore SPARQL endpoints and retrieve the data in different formats.  The proof of concept supports XML, RDFXML, HTML, CSV, Turtle, NTriple, and JSON as output formats.  This means that users can query any SPARQL endpoint and retrieve data back.  In the current proof of concept, I haven’t added the ability to save the output, but I likely will prior to releasing the Christmas MarcEdit update.

Proof of Concept

While this is still somewhat conceptual, the current SPARQL Browser looks like the following:

image

At present, the Browser assumes that data resides at a remote endpoint, but I’ll likely include the ability to load local RDF, JSON, or Turtle data and query that data as a local endpoint.  Right now, the Browser takes the URL of the SPARQL endpoint and then the query.  The user can then select the format in which the result set should be output.

Using NLM as an example, say a user wanted to query for the specific term: Congenital Abnormalities – utilizing the current proof of concept, the user would enter the following data:

SPARQL Endpoint: http://id.nlm.nih.gov/mesh/sparql

SPARQL Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/>

SELECT distinct ?d ?dLabel 
FROM <http://id.nlm.nih.gov/mesh2014>
WHERE {
  ?d meshv:preferredConcept ?q .
  ?q rdfs:label 'Congenital Abnormalities' . 
  ?d rdfs:label ?dLabel . 
} 
ORDER BY ?dLabel 

Running this query within the SPARQL Browser produces a result set that is formatted internally into a Graph for output purposes.
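For readers who want to try the same request outside the Browser, here is a minimal Python sketch of how the query above could be built for an arbitrary term and turned into a request URL.  It is not MarcEdit's code (MarcEdit uses dotnetrdf); the function names are hypothetical, and it assumes the endpoint accepts the standard SPARQL Protocol GET form with a `query` parameter.  The unused prefixes from the query above are left out for brevity.

```python
from urllib.parse import urlencode

# Endpoint and graph are the ones quoted in the post (2014 beta service).
ENDPOINT = "http://id.nlm.nih.gov/mesh/sparql"

QUERY_TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>

SELECT DISTINCT ?d ?dLabel
FROM <http://id.nlm.nih.gov/mesh2014>
WHERE {{
  ?d meshv:preferredConcept ?q .
  ?q rdfs:label {label} .
  ?d rdfs:label ?dLabel .
}}
ORDER BY ?dLabel
"""


def sparql_string_literal(text):
    """Quote text as a single-quoted SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace("'", "\\'")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return "'" + escaped + "'"


def build_query(term):
    """Fill the preferred-term label into the query template (hypothetical helper)."""
    return QUERY_TEMPLATE.format(label=sparql_string_literal(term))


def build_request_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_request_url(ENDPOINT, build_query("Congenital Abnormalities")))
```

Escaping the term matters because headings can contain apostrophes.  The output format would normally be requested with an HTTP Accept header such as application/sparql-results+json, though individual endpoints may handle format selection differently.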

image

image

image

The images show a couple of the different output formats.  For example, the full JSON output is the following:

{
  "head": {
    "vars": [
      "d",
      "dLabel"
    ]
  },
  "results": {
    "bindings": [
      {
        "d": {
          "type": "uri",
          "value": "http://id.nlm.nih.gov/mesh/D000013"
        },
        "dLabel": {
          "type": "literal",
          "value": "Congenital Abnormalities"
        }
      }
    ]
  }
}
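Since the goal is to retrieve the identifier for a preferred term, the JSON above is the part a program would actually consume.  The following Python sketch shows one way to flatten the bindings and pull the descriptor ID out of the URI.  The helper names are hypothetical, and taking the last path segment as the identifier is an assumption based on the URI shown in this output.

```python
import json

# The result document from the post, in the SPARQL JSON results layout.
SAMPLE = """
{
  "head": {"vars": ["d", "dLabel"]},
  "results": {
    "bindings": [
      {
        "d": {"type": "uri", "value": "http://id.nlm.nih.gov/mesh/D000013"},
        "dLabel": {"type": "literal", "value": "Congenital Abnormalities"}
      }
    ]
  }
}
"""


def extract_bindings(results_json):
    """Flatten each binding into a plain {variable: value} dictionary."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row, so skip missing keys.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def mesh_identifier(uri):
    """Return the last path segment of a descriptor URI (assumed to be the ID)."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    for row in extract_bindings(SAMPLE):
        print(mesh_identifier(row["d"]), row["dLabel"])
```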

The idea behind creating this as a general-purpose tool is that, in theory, it should work for any SPARQL endpoint.  For example, the same type of exploration can be done against the Project Gutenberg Metadata endpoint using the Browser.

image

Future Work

At this point, the SPARQL Browser is a proof-of-concept tool, but one that I will make available as part of the MARCNext research toolset in the next update:

image

Going forward, I will likely refine the Browser based on feedback, but more importantly, start looking at how the new RDF toolkit might allow for the development of dynamic form generation for editing RDF/BibFrame data…at least somewhere down the road.

–TR

[1] SPARQL (W3C): http://www.w3.org/TR/rdf-sparql-query/
[2] SPARQL (Wikipedia): http://en.wikipedia.org/wiki/SPARQL
[3] SPARQL Endpoints: http://www.w3.org/wiki/SparqlEndpoints
[4] MarcEdit: http://marcedit.reeset.net
[5] MARCNext: http://blog.reeset.net/archives/1359

Nov 29, 2014
 

While experimenting with doing automatic language translation using the Microsoft Translation API, I got a couple of questions from users asking if this same process could be applied to doing automatic field translation to create localized searching indexes of subject terms.  The specific use case proposed was the generation of a single 653 that included automated translations of the 650$a.  Since this is likely a pretty specific use case with a limited audience, I’ve created this process as a plug-in.  If you are interested in seeing how this works, please see the following video:

If you have questions, let me know.

 

–tr

 Posted by at 9:27 am
Oct 16, 2014
 

As libraries begin to join and participate in systems to test Bibframe principles, my hope is that, when possible, I can provide support through MarcEdit to give these communities a conduit that simplifies publishing information into those systems.  The first of these test systems is the LibHub Initiative, and working with Eric Miller and the really smart folks at Zepheira (http://zepheira.com/), I have created a plug-in specifically for libraries and partners working with the LibHub Initiative.  The plug-in provides a mechanism to publish a variety of metadata formats into the system, including MARC, MARCXML, EAD, and MODS data.  The process will hopefully help users contribute content and help spur discussion around the data model Zepheira is employing with this initiative.

For the time being, the plug-in is private, and available to any library currently participating in the LibHub project.  However, my understanding is that as they continue to ramp up the system, the plug-in will be made available to the community at large.

For now, I’ve published a video talking about the plug-in and demonstrating how it works.  If you are interested, you can view the video on YouTube.

 

–tr

 Posted by at 8:19 pm
Oct 7, 2014
 

Here’s a snapshot of the server log data as reported through Awstats for the marcedit.reeset.net subdomain. 

Server log stats for Sept. 2014:

  • Logged MarcEdit uses: ~190,000
  • Unique Users: ~17,000
  • Bandwidth Used: ~14 GB

Top 10 Countries by Bandwidth:

  1. United States
  2. Canada
  3. China
  4. India
  5. Australia
  6. Great Britain
  7. Mexico
  8. Italy
  9. Spain
  10. Germany

Countries by Use (100+ reported uses)

  • United States
  • Canada
  • Australia
  • Italy
  • India
  • Great Britain
  • China
  • Finland
  • Poland
  • France
  • Germany
  • Ukraine
  • Philippines
  • Mexico
  • New Zealand
  • Brazil
  • Spain
  • Russian Federation
  • Hong Kong
  • Colombia
  • Taiwan
  • Egypt
  • Sweden
  • Denmark
  • Saudi Arabia
  • Turkey
  • Argentina
  • Greece
  • Belgium
  • Pakistan
  • Georgia
  • Malaysia
  • Czech Republic
  • Thailand
  • Netherlands
  • Japan
  • Bangladesh
  • Chile
  • Ireland
  • Switzerland
  • Vietnam
  • El Salvador
  • Venezuela
  • Kazakhstan
  • Romania
  • European country
  • Norway
  • Belarus
  • United Arab Emirates
  • South Africa
  • Estonia
  • Portugal
  • Singapore
  • Austria
  • Indonesia
  • South Korea
  • Kenya
  • Bolivia
  • Israel
  • Sudan
  • Ecuador
  • Qatar
  • Nepal
  • Slovak Republic
  • Algeria
  • Lithuania
  • Costa Rica
  • Rwanda
  • Guatemala
  • Peru
  • Slovenia
  • Iran
  • Morocco
  • Moldova
  • Mauritius
  • Croatia
  • Kuwait
  • Republic of Serbia
  • Armenia
  • Jordan
  • Cameroon
  • Sri Lanka
  • Puerto Rico
  • Dominican Republic
  • Jamaica
  • Cuba
  • Iraq
  • Oman
  • Zimbabwe
  • Tunisia
  • Benin
  • Uruguay
  • Honduras
  • Ivory Coast (Cote D’Ivoire)
  • Syria
  • Hungary
  • Latvia
  • Cyprus
  • Macau
  • Papua New Guinea
  • Malawi
  • Nigeria
  • Netherlands Antilles
  • Zambia
  • Tanzania
  • Panama
  • Uganda
  • Palestinian Territories
  • Aland islands
  • Bosnia-Herzegovina
  • Ethiopia
  • Tadjikistan
  • Senegal
  • Ghana
  • Mongolia
  • Luxembourg

 Posted by at 7:23 pm
Oct 7, 2014
 

I sent this note to the MarcEdit listserv late last night (or early this morning), but forgot to post it here.  Over the weekend, the Ohio State University Libraries hosted our second annual hackathon on campus.  It’s been a great event, and this year I had one of the early morning shifts (12 am-5 am), so I decided to use the time to do a little hacking myself.  Here’s a list of the changes:

  • Bug Fix: Merge Records Function: When processing using the control number option (or MARC21 primarily utilizing control numbers for matching) the program could merge incorrect data if large numbers of merged records existed without the data specified to be merged.  The tool would pull data from the previous record used and add that data to the matches.  This has been corrected.
  • Bug Fix: Network Task Directory: This tool was always envisioned as one that individuals would point at an existing folder.  However, if the folder didn’t exist prior to pointing to the location, the tool wouldn’t index new tasks.  This has been fixed.
  • Bug Fix: Task Manager (Importing new tasks) — When tasks were imported with multiple referenced task lists, the list could be unassociated from the master task.  This has been corrected.
  • Bug Fix:  If the plugins folder doesn’t exist, the current Plugin Manager doesn’t create one when adding new plugins.  This has been corrected.
  • Bug Fix: MarcValidator UI issue:  When resizing the form, the clipboard link wouldn’t move appropriately.  This has been fixed.
  • Bug Fix: Build Links Tool — relator terms in the 1xx and 7xx field were causing problems.  This has been corrected.
  • Bug Fix: RDA Helper: When parsing 260 fields with multiple copyright dates, the process would only handle one of the dates.  The process has been updated to handle all copyright values embedded in the 260$c.
  • Bug Fix: SQL Explorer:  The last build introduced a regression error so that when using the non-expanded SQL table schema, the program would crash.  This has been corrected.
  • Enhancement:  SQL Explorer expanded schema has been enhanced to include a column id to help track column value relationships.
  • Enhancement: Z39.50 Cataloging within the MarcEditor: When selecting the Z39.50/SRU Client, the program now seamlessly allows users to search using the Z39.50 client and automatically load the results directly into the open MarcEditor window.

Two other specific notes.  First, a few folks on the listserv have noted trouble getting MarcEdit to run on a Mac.  The issue appears to be MONO-related.  Version 3.8.0 appears to have neglected to include a file in the build (which caused GUI operations to fail), and 3.10.0 brings the file back, but there was a build error with the component so the issue continues.  The problems are noted in their release notes as known issues and the bug tracker seems to suggest that this has been corrected in the alpha channels, but that doesn’t help anyone right now.  So, I’ve updated the Mac instructions to include a link to MONO 3.6.0, the last version tested as a standalone install that I know works.  From now on, I will include the latest MONO version tested, and a link to the runtime, to hopefully avoid this type of confusion in the future.

Second – I’ve created a nifty plugin related to the LibHub project.  I’ve done a little video recording and will be making that available shortly.  Right now, I’m waiting on some feedback.  The plugin will be initially released to LibHub partners to provide a way for them to move any data into the project for evaluation – but hopefully in time, it can be made more widely available.

Updates can be downloaded automatically via MarcEdit, or can be found at: http://marcedit.reeset.net/downloads

Please remember, if you are running a very old copy of MarcEdit (5.8 or lower), it is best practice to uninstall the application prior to installing 6.0.

 

–TR

 Posted by at 6:47 am

MarcEdit 6.0 Update

Sep 22, 2014
 

This update is coming a little later than I’d hoped, but I’ve been busying myself with a couple of projects that have been consuming some of my off hours.  Today’s update deals with a handful of issues, as well as provides some new functionality. 

Changes:

  • Bug Fix: Edit Field Function: Field recursion switch (/r) was broken in the last update.  This has been corrected.
  • Enhancement: Edit Field Function: LDR editing support has been added to the function.
  • Enhancement: MarcEditor: Keyboard shortcuts for jump to page and jump to record have been added.
  • Enhancement: RDA Helper:  Added a new option to the 260/264 translation that enables users to always utilize a copyright or phonogram symbol.
  • Enhancement: RDA Helper:  Updated the RDA Helper to support the manufacturer or distributor subfields.  When the program encounters these in the 260, the appropriate 264 with second indicator 2 or 3 will be created.
  • Enhancement: RDA Helper:  The new option has been added to the task list.
  • Enhancement: Linked Records Tool:  I’ve added a new option to the Linked Records tool allowing the program to embed $0 links to VIAF.
  • Enhancement: MARCSplit:  The save directory now automatically sets to the desktop rather than the root drive.

You can get the updates via MarcEdit’s automated update tool or at: http://marcedit.reeset.net/downloads

–tr

 Posted by at 9:39 pm
Sep 2, 2014
 

I’ve just posted a new update to MarcEdit.  It fixes the following three issues:

  • Check URL crashes when running…this has been fixed.
  • Delimited Text Translator doesn’t show finishing message…fixed
  • Debugging messagebox shows when processing mnemonic files not using MarcEdit’s documented format…fixed

In addition to these three bug fixes, MarcEdit now includes a new tool called MARCNext for testing BibFrame principles.  Please note, the BibFrame Testbed currently *does not* work on the Mac platform under MONO.  This is due to an incompatibility between the current version of Saxon and the runtime.  It appears that downgrading the version will correct the problem, but I need to make sure there are not any unforeseen issues.  I’ll be working to correct this during the week.

I’ve recorded a couple of videos documenting the new functionality.  You can find them here:

You can download the update via MarcEdit’s automated update tool or view the MarcEdit downloads page at: http://marcedit.reeset.net/downloads

–tr

 Posted by at 7:58 pm
Aug 25, 2014
 

As I noted in my last post (http://blog.reeset.net/archives/1359), I’ll be adding a new area to the MarcEdit application called MARCNext.  This will be used to expose a number of research tools for users interested in working with BibFrame data.  In addition to the BibFrame Testbed, I’ll also be releasing a JSON Object Viewer.  The JSON Object Viewer is a specialized viewer designed to parse JSON text and provide an object visualization of the data.  The idea is that this tool could be utilized to render MARC data translated into Bibframe as JSON for easy reading.  However, I’m sure that there will be other uses as well.  I’ve tried to keep the interface simple.  Essentially, you point the tool at a JSON file and the tool will render the file as objects.  From there, you can search and query the data, view the JSON file in Object or Plain text mode, and ultimately, copy data for use elsewhere. 

image

Some additional testing needs to be done to make sure the program works well when coming across poorly formed data – but this tool will be a part of the next update.
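To make the object view idea concrete, here is a small Python sketch of the general technique: parse the JSON text, then walk the resulting structure and emit one indented line per object, array, or value.  This is only an illustration with hypothetical function names, not the viewer's actual implementation, and it reports poorly formed input as a message rather than failing.

```python
import json


def render_tree(value, name="root", depth=0):
    """Return indented lines describing a parsed JSON value as a tree."""
    indent = "  " * depth
    if isinstance(value, dict):
        lines = [f"{indent}{name} (object)"]
        for key, child in value.items():
            lines.extend(render_tree(child, key, depth + 1))
    elif isinstance(value, list):
        lines = [f"{indent}{name} (array, {len(value)} items)"]
        for index, child in enumerate(value):
            lines.extend(render_tree(child, f"[{index}]", depth + 1))
    else:
        # Scalars (strings, numbers, booleans, null) are shown in JSON notation.
        lines = [f"{indent}{name}: {json.dumps(value)}"]
    return lines


def render_json_text(text):
    """Parse JSON text and render it; report malformed input instead of raising."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    return render_tree(data)


if __name__ == "__main__":
    sample = '{"title": "Example", "subjects": ["a", "b"]}'
    print("\n".join(render_json_text(sample)))
```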

–tr

 Posted by at 9:31 pm
Aug 23, 2014
 

While developing MarcEdit 6, one of the areas that I spent a significant amount of time working on was the MarcEdit Research Toolkit.  The Research Toolkit is an easter egg of sorts – it’s a set of tools and utilities that I’ve developed to support my own personal research interests around library metadata – specifically, around the future of library metadata, including topics like the current BibFrame testing and linked data.  I’ve kept these tools private because they tend not to be fully realized concepts or ideas and have very little in the way of a user interface.  Just as important, many of these tools represent work being created to engage in the conversation that the library community is having around library metadata formats and standards, so things can and do change or drop out of the conversation and are then removed from my toolkit.

While developing MarcEdit 6, one of the goals of the project was to find a way to make some or parts of these tools available to the general MarcEdit community.  To that end, I’ll be making a new area available within MarcEdit called MARCNext.  MARCNext will provide a space to make proof of concept tools available for anyone to use, and offer a simple to use interface that anyone can use to test new bibliographic concepts like BibFrame. 

Presently, I’m evaluating my current workbench to see which of the available tools can be made public.  I have a handful that I think may be applicable – but will need some time to move them from concept to a utility for public consumption.  With that said, I will be making one tool immediately available as part of the next MarcEdit update, and that will be the BibFrame Testbed.  This is code that utilizes the LC XQuery files being developed and distributed at: https://github.com/lcnetdev/marc2bibframe with a handful of changes made to provide better support within MarcEdit.  These are my base files that will enable librarians to easily model their MARC metadata in a variety of serializations.  And using this initial work, I’ll likely add some additional serializations to the list. 

I have two goals for making this particular tool available.  First and foremost, I would like to give anyone who is interested the ability to take their existing library metadata and model it using Bibframe concepts.  Currently, Library of Congress makes available a handful of command-line tools that users can utilize to process their metadata – but these tools tend not to be designed for the average user.  By making this information available in MarcEdit – I’m hoping to lower the barrier so that anyone can model their data and then engage in the larger discussion around this work.

Secondly, I’m currently engaging in some work with Zepheira and other early implementers to take Bibframe testing mainstream.  Given the number of users working with MarcEdit, it made a lot of sense to provide tools to support this level of integration.  Likewise, by taking the time to move this work from the concept stage, I’ve been able to develop the start of a framework around these concepts. 

So how is this going to work?  On the next update, you will see a new link within the Main MarcEdit Window called MARCNext. 

image 
MarcEdit Main Window

Click on the MARCNext link, and you will be taken to the public version of the Research Toolkit.  At this point, the only tool being made publicly available is the BibFrame Testbed, though this will change.

image 
MarcEdit’s MARCNext Window

Selecting the BibFrame Testbed initializes a simple dialog box to allow a user to select from a variety of library metadata types and convert them using BibFrame principles into a user-defined serialization. 

image 
BibFrame Testbed window

As noted above, this testbed will be the first of a handful of tools that I will eventually be making available.  Will they be useful to anyone?  Who knows.  Honestly, the questions that these tools are working to answer are not ones that come up on the listserv, and at present, they aren’t going to help much in one’s daily cataloging work.  But hopefully they will give every cataloger who wants it the ability to engage with some of these new metadata concepts, and at least take their existing data and see how it may change under different serializations and concepts.

Questions – feel free to ask.

–tr

 Posted by at 9:36 pm
Aug 14, 2014
 

Latest update has been posted.  The following has been fixed:

  • Bug Fix: Delimited Text Translator — The Edit LDR, Load Template, and AutoGenerate buttons were not responding.  This has been corrected.
  • Bug Fix: MARCCompare — when processing data with improperly formatted mnemonic data, the program doesn’t correctly trap the generated formatting error.
  • Bug Fix: Edit Field Data: When processing control fields by position, the replacement would generate duplicate data.
  • Bug Fix: MARC Tools — the MARC8 and UTF8 conversion checkbox would become grayed out when selecting new functions from the function list.
  • Bug Fix: Task Lists: The Delete Subfield Text task wasn’t respecting the option to delete the entire subfield.
  • Performance Fix: Reworked the paging code so that for files under 50 MB, the program utilizes a different, faster method of reading data.  This method utilizes more memory because data processing happens in memory rather than on disk, but this gives a 50 to 75% improvement in speed over the past method.
  • Bug Fix: Classify Tool — when processing dates, if the 008 wasn’t present, the program wouldn’t capture date information from the 260 or 264.
  • Enhancement: Classify Tool — The program now utilizes prefixes within the control number to eliminate false control number matches.
  • Bug Fix: Merge Records Tool — when merging on the 035, the program wasn’t properly normalizing all data for matching.
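The performance fix in the list above trades memory for speed below a size threshold.  As a rough illustration of that kind of switch (not MarcEdit's actual paging code), the Python sketch below reads small files fully into memory and streams larger ones in chunks.  The 50 MB limit comes from the note above; the function name, the chunk size, and splitting on the MARC record terminator byte (0x1D) are assumptions made for the example.

```python
import os

RECORD_TERMINATOR = b"\x1d"          # end-of-record byte in MARC (ISO 2709) files
IN_MEMORY_LIMIT = 50 * 1024 * 1024   # 50 MB threshold mentioned in the release note


def iter_records(path, in_memory_limit=IN_MEMORY_LIMIT, chunk_size=64 * 1024):
    """Yield each record's bytes (without the terminator) from a MARC file."""
    if os.path.getsize(path) < in_memory_limit:
        # Small file: one read, then split in memory (faster, uses more RAM).
        with open(path, "rb") as handle:
            data = handle.read()
        for raw in data.split(RECORD_TERMINATOR):
            if raw:
                yield raw
        return

    # Large file: stream fixed-size chunks and keep only the unfinished tail.
    buffer = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            *complete, buffer = buffer.split(RECORD_TERMINATOR)
            for raw in complete:
                if raw:
                    yield raw
    if buffer:
        # Trailing bytes with no terminator; returned as-is for the caller to judge.
        yield buffer
```

Both branches return the same records, so a caller only sees a difference in speed and memory use.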

You can get the update automatically or from the download page at: http://marcedit.reeset.net/downloads

–tr

 Posted by at 10:02 pm