MarcEdit 7 alpha: Introducing Clustering tools

By reeset / On / In MarcEdit

Folks sometimes ask me how I decide what kinds of new tools and functions to add to MarcEdit.  When I was an active cataloger/metadata librarian, the answer was easy – I added tools and functions that helped me do my work.  As my work has transitioned to more and more non-MARC/integrations work; I still add things to the program that I need (like the linked data tooling), but I’ve become more reliant on the MarcEdit and metadata communities to provide feedback regarding new features or changes to the program.

This is kind of how the Clustering work came about.  It started with this tweet: https://twitter.com/LibSkrat/status/898189609859002368.  There are already tools that catalogers can use to do large scale data clustering (OpenRefine); and my hope is that more and more individuals make use of them.  But in reading the responses and asking some questions, I started thinking about what this might look like in a tool like MarcEdit – and could I provide a set of lite-weight functionality that would help users solve some problems, while at the same time exposing them to other tooling (like OpenRefine)…and I hope this is what I’ve done.

This work is very much still in active development, but I’ve started the process of creating a new way of batch editing records in MarcEdit.  The clustering tools will be provided as both a stand alone resource and a resource integrated into the MarcEditor, and will be somewhat special in that it will require that the application extract the data out of MARC and store it in a different data model.  This will allow me to provide a different way of visualizing one’s data, and potentially make it easier to surface issues with specific data elements.

The challenge with doing clustering is that this is a very computationally expensive process.  From the indexing of the data out of MARC, to the creation of the clusters using different matching algorithms, the process can take time to generate.  But beyond performance, the question that I’m most interested in right now is how to make this function easier for users to navigate and understand.  How to create an interface that makes it simple to navigate clustered groups and make edits within or across clustered groups.  I’m still trying to think about what this looks like.  Presently, I’ve created a simple interface to test the processes and start asking those questions.

If you are interested in see how this function is being created and some of the assumptions being made as part of the development work – please see: https://youtu.be/DH93QDmeOW8

I’m interested in feedback – particularly around the questions of UI and editing options, so if you see the video and have thoughts, let me know.

–tr