MarcEdit 7: Add/Delete Field Changes

By reeset / On / In MarcEdit

I’m starting to think about the global editing functions in the MarcEditor, and one of the first things I’m trying to do is clear up a few confusing options in the interface.  This is the first update in thinking about these kinds of changes:

image

The idea here is to make it clear which options belong to which editing group, as sometimes folks aren’t sure which options are add field options and which are delete field options.  Hopefully, this will make the form easier to decipher.

–tr

MarcEdit 7: Startup Wizard

By reeset / On / In MarcEdit

One of the aspects of MarcEdit that I’ve been thinking a lot about over the past year is how to make it easier for users to know which configuration settings are important, and which ones are not.  This is the problem of writing a library metadata application that is MARC-agnostic.  There are a lot of assumptions that users make because they associate MARC with the specific flavor of MARC that they are using.  So, for someone who only has exposure to MARC21, associating the title with MARC field 245 would be second nature.  But MarcEdit is used by a large community that doesn’t use MARC21, but UNIMARC (or other flavors for that matter).  For those users, the 245 field has a completely different meaning.

This presents a special challenge.  Simple things, like just displaying title information for a record, get harder, because assumptions I make for one set of users will cause issues for others.  To address this, MarcEdit has a rich set of application settings, designed to enable users to tell the application a little about the data they are working with.  Once that information is provided, MarcEdit can configure the components and adjust assumptions so title information pulls from the correct fields, or Unicode bits get updated in the correct leader locations.  The problem, from a usability perspective, is that these values are scattered among a wide range of other MarcEdit settings and preferences…which raises the question: which are the most important?

If you’ve installed MarcEdit 6 recently on a new computer, the way that the program has attempted to deal with this issue is by showing the preferences window on the application’s first run.  This means that the first time the program is executed, you see the following window:

image

Now, I’m not naïve.  I know that most users just click OK, and the program opens up for them, and they work with MarcEdit until they run across something that might require them to go back and look at the settings.  But when I do MarcEdit workshops, I get some specific questions related to accessibility (e.g., can I make the fonts bigger or change the font), display (my Unicode characters don’t display), UNIMARC versus MARC21, etc.  From the window above, you can answer all of these questions, but you have to know which settings group handles each option.  It’s admittedly a pain, and because of that, most workshops I do include 20-30 minutes just going over the settings that might be worth considering.

With MarcEdit 7, I have an opportunity to rethink how users interact with the program, and I started to think about how other software does this successfully.  By and large, the ones that I think are more successful provide a kind of wizard at the start that helps to push the most important options forward…and the best examples include a little bit of whimsy in the process.  No, I might not do whimsy well, but I can think about the setting groups that might be the most important to bring front and center to the user.

To that end, I’ve developed a startup wizard for MarcEdit 7.  All users that install the application will see it (because MarcEdit 7 will install into its own user space, everyone will have this first run experience).  Based on the answers to questions, I’m able to automatically set data in the background to ensure that the application is better configured for the user the first time they start using MarcEdit, rather than later, when they need help finding configuration settings.  It also will give me an opportunity to bring potential issues to the user’s attention.  So, for example, the tool will specifically look to see if you have a comprehensive Unicode font installed (so, Arial Unicode MS or the Noto Sans fonts).  If you don’t, the program will point you to help files that discuss how to get one for free, as this will directly impact how the program displays Unicode characters (and comes up all the time given some decisions Microsoft has made in distributing their own Unicode fonts).  Additionally, I’ll be utilizing some automatic translation services, so the program will automatically react to your system’s default language settings.  If they are English, text will show in English.  If they are Greek, the interface will show the machine-translated Greek.  Users will have the option to change the language in the wizard, and I’ll provide notes about the translations (machine translations are getting better, but there’s bound to be some pretty odd text).  The hope is that this will make the program more accessible, and usable…and whimsical.  Yes, there is that too.  MarcEdit 7’s codename was developed after a nickname for my Golden Doodle.  So, she’s volunteered to help get users through the initial startup process.

The Wizard will likely change as I continue to evaluate settings groups, but at this point, I’m kind of leaning towards something that looks like this:

image

image

image

image

I’ve had a few folks walk through this process, and by and large, they find it much more accessible than the current just-show-the-settings-screen process.  Additionally, they like the idea of the language translations, but wonder if the machine translations will be useful (I did an initial set, they are what they are)…I’ll get more feedback on that before release.  If they aren’t useful, I may remove that option, though I have to feel that for folks where English is a challenge, having anything is better than nothing (though, I could be wrong).

But this is what I’m thinking.  It’s hopefully a little fun, easy to walk through, and will allow me to ensure that MarcEdit has been optimally configured for your data.  What do you think?

–tr

MarcEdit 7: Super charging Task Processing

By reeset / On / In MarcEdit

One of the components getting a significant overhaul in MarcEdit 7 is how the application processes tasks.  This work started in MarcEdit 6.3.x, when I introduced a new –experimental bit when processing tasks from the command-line.  This bit shifted task processing from within the MarcEdit application to directly against the libraries where the underlying functions for each task were run.  The process was marked as experimental, in part, because task processing has always been tied to the MarcEdit GUI.  Essentially, this is how a task works in MarcEdit:

image

Essentially, when running a task, MarcEdit opens and closes the corresponding edit windows and processes the entire file, on each edit.  So, if there are 30 steps in a task, the program will read the entire file, 30 times.  This is wildly inefficient, but also represents the easiest way that tasks could be added into MarcEdit 6 based on the limitations within the current structure of the program.

In the console program, I started to experiment with accessing the underlying libraries directly – but still, maintained the structure where each task item represented a new pass through the program.  So, while the UI components were no longer being interacted with (improving performance), the program was still doing a lot of file reading and writing.

In MarcEdit 7, I re-architected how the application interacts with the underlying editing libraries, and as part of that, included the ability to process tasks at that more abstract level.  The benefit of this is that now all tasks on a record can be completed in one pass.  So, using the example of a 30-item task: rather than needing to open and close a file 30 times, the process now opens the file once and then processes all defined task operations on each record.  The tool can do this because all task processing has been pulled out of the MarcEdit application and pushed into a task broker.  This new library accepts from MarcEdit the file to process and the defined task (and associated tasks), and then facilitates task processing at a record, rather than file, level.  I then modified the underlying library functions, which actually was really straightforward given how streams work in .NET.
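
To make the one-pass idea concrete, here is a minimal sketch of record-level task processing.  This is not MarcEdit's actual task broker: the names `TaskBrokerSketch` and `RunTasks`, the use of plain string-to-string functions as task steps, and the assumption that records are separated by blank lines are all mine, for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TaskBrokerSketch
{
    // Reads records one at a time (assumed here to be separated by blank lines)
    // and applies every task step to a record before moving on to the next one.
    public static void RunTasks(TextReader reader, TextWriter writer,
                                IList<Func<string, string>> tasks)
    {
        var record = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                Flush(record, writer, tasks);
            }
            else
            {
                record.Append(line).Append('\n');
            }
        }
        Flush(record, writer, tasks); // the last record may not end with a blank line
    }

    // Runs all task steps over one buffered record and writes the result.
    private static void Flush(StringBuilder record, TextWriter writer,
                              IList<Func<string, string>> tasks)
    {
        if (record.Length == 0) return;
        string current = record.ToString();
        foreach (var task in tasks)
        {
            current = task(current);
        }
        writer.Write(current);
        writer.Write('\n'); // blank line between records
        record.Clear();
    }
}
```

With 30 steps in the `tasks` list, the source is still read exactly once; the loop over steps happens per record rather than per file.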

Within MarcEdit, all data is generally read and written using the StreamReader/StreamWriter classes, unless I specifically need to access data at the binary level.  In those cases, I’d use a MemoryStream.  The benefit of using the StreamReader/StreamWriter classes, however, is that they derive from the abstract TextReader/TextWriter classes.  .NET also has StringReader and StringWriter classes that allow C# to read and write strings like a stream, and they too derive from TextReader/TextWriter.  This means that I’ve been able to make the following changes to the functions, and re-use all the existing code while still providing processing at both a file and a record level:

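string function(string sSource, string sDest, bool isFile = true) {

    // In file mode, output holds the destination path;
    // in record mode, it collects the processed record.
    StringBuilder output = new StringBuilder(sDest);

    System.IO.TextReader reader = null;
    System.IO.TextWriter writer = null;

    try {
        if (isFile) {
            // sSource and sDest are file paths
            reader = new System.IO.StreamReader(sSource);
            writer = new System.IO.StreamWriter(output.ToString(), false);
        } else {
            // sSource is the record data itself
            output.Clear();
            reader = new System.IO.StringReader(sSource);
            writer = new System.IO.StringWriter(output);
        }

        //…Do Stuff

    } finally {
        // Flush and release both streams
        if (writer != null) writer.Dispose();
        if (reader != null) reader.Dispose();
    }

    return output.ToString();
}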

As a TextReader/TextWriter, I now have access to the necessary functions needed to process both data streams like a file.  This means that I can now handle file or record level processing using the same code – as long as both data sources are in the mnemonic format.  Pretty cool.
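
For anyone who wants to try the pattern, here is a small filled-in sketch.  The "Do Stuff" part is just a stand-in I made up (trimming trailing whitespace from each line), and `DualModeSketch` and `TrimLines` are hypothetical names, not MarcEdit functions.  It only shows that the same loop runs unchanged whether the reader and writer are backed by files or by strings.

```csharp
using System;
using System.IO;
using System.Text;

public static class DualModeSketch
{
    // When isFile is true, sSource and sDest are file paths and the
    // destination path is returned.  Otherwise sSource is the record
    // text itself, sDest is ignored, and the processed text is returned.
    public static string TrimLines(string sSource, string sDest, bool isFile = true)
    {
        var output = new StringBuilder(sDest);
        TextReader reader;
        TextWriter writer;

        if (isFile)
        {
            reader = new StreamReader(sSource);
            writer = new StreamWriter(output.ToString(), false);
        }
        else
        {
            output.Clear();
            reader = new StringReader(sSource);
            writer = new StringWriter(output);
        }

        // The processing loop only sees TextReader/TextWriter,
        // so it does not care which mode it is running in.
        using (reader)
        using (writer)
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.Write(line.TrimEnd());
                writer.Write('\n');
            }
        }
        return output.ToString();
    }
}
```

Calling `DualModeSketch.TrimLines(recordText, "", false)` returns the processed record, while `DualModeSketch.TrimLines(sourcePath, destPath)` writes the processed file and returns the destination path.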

What does this mean for users?  It means that in MarcEdit 7, tasks will be supercharged.  In testing, I’m seeing tasks that used to take 1, 2, or 3 minutes to complete now run in a matter of seconds.  So, while there are a lot of really interesting changes planned for MarcEdit 7, this enhancement feels like the one that might have the biggest impact for users, as it will represent significant time savings when you consider processing time over the course of a month or year.

Questions, let me know.

–tr

MarcEdit 7 release schedule planning

By reeset / On / In MarcEdit

I’m going to put this here to help folks that need to work with IT depts when putting new software on their machines.  At this point, with the new features, the updates related to the .NET language changes, the filtering of old XP code and the updated performance code, and new installer – this will be the largest update to the application since I ported the codebase from Assembly to C#.  Just looking at this past weekend, I added close to 17,000 lines of code while completing the clustering work, and removed ~3000 lines of code doing optimization work and removing redundant information. 

In total, work on MarcEdit 7 has been ongoing since April 2017 (formally), and informally since Jan. 2017.  However, last night, I hit a milestone of sorts: I set up the new build environment for MarcEdit 7.  In fact, this morning (around 1 am), I created the first version of the new MarcEdit 7 installer that can be installed without administrator permissions.  I’ve heard again and again that the administrator requirements are one of the single biggest issues for users in staying up to date.  With MarcEdit 7, the program will provide multiple installation options that should help to alleviate these problems.

Anyway, given the pace of change and my desire to have some folks put this through its paces prior to the formal release, I’ll be making multiple versions of MarcEdit 7 available for testing using the schedule below.  Please note, the Alpha and Beta dates are soft dates (they could move up or down by a few days), but the Release Date is a hard date.  Also, unlike previous versions of MarcEdit, MarcEdit 7 can be installed alongside MarcEdit 6, so both versions can live on the same machine.  To simplify this process, all test builds of MarcEdit 7 will be released requiring non-administrator access to install, as this will allow me to sandbox the software more easily.

Alpha Testing:

Sept. 14, 2017 – this will be the first test version of MarcEdit 7.  It won’t be feature complete, but the features included should be finished and working, though I’m expecting to hear from people that some things are broken.  Really, this first version is for those waiting to get their hands on the installer and play with software that likely is a little broken.

Beta Testing:

Oct 2, 2017 – First beta build will be created.  New builds will likely be made available biweekly.

MarcEdit 7 Release Date:

Nov. 25, 2017 – MarcEdit 7.0.x release date.  The release will happen over the U.S. Thanksgiving Holiday. 

This gives users approximately 3 months to ensure that their local systems will be ready for the new update.  Remember, the system requirements are changing.  As of MarcEdit 7, the software will have the following system requirements on Windows (the Mac and Linux versions already have these requirements):

System Requirements:

  1. Operating System
    Windows 7-present (software may work on Windows Vista, but given the low install-base [smaller than Windows XP], Windows 7 will be the lowest version of Windows I’ll be officially testing on and supporting)
  2. .NET Version
    4.6.1+ –  Version 4.6.1 is the minimum required version of the .NET platform.  If you have Windows 8-10, you should be fine.  If you have Windows 7, you may have to update your .NET instance (though, this will happen automatically if you accept Microsoft’s updates).  If you have questions, you’ll want to contact your IT department.

That’s it.  But this does represent a very significant change for the program.  For years, I’ve been limping Windows XP support along, and MarcEdit 7 does represent a break from that platform.  I’ll be keeping the last version of MarcEdit 6.3.x available for users that run an unsupported operating system and cannot upgrade, though, I won’t be making any more changes to MarcEdit 6.3.x after MarcEdit 7 comes out. 

If you have questions, let me know.

–tr

MarcEdit 7 alpha: Introducing Clustering tools

By reeset / On / In MarcEdit

Folks sometimes ask me how I decide what kinds of new tools and functions to add to MarcEdit.  When I was an active cataloger/metadata librarian, the answer was easy: I added tools and functions that helped me do my work.  As my work has transitioned to more and more non-MARC/integrations work, I still add things to the program that I need (like the linked data tooling), but I’ve become more reliant on the MarcEdit and metadata communities to provide feedback regarding new features or changes to the program.

This is kind of how the Clustering work came about.  It started with this tweet: https://twitter.com/LibSkrat/status/898189609859002368.  There are already tools that catalogers can use to do large scale data clustering (OpenRefine), and my hope is that more and more individuals make use of them.  But in reading the responses and asking some questions, I started thinking about what this might look like in a tool like MarcEdit, and whether I could provide a set of lightweight functionality that would help users solve some problems, while at the same time exposing them to other tooling (like OpenRefine)…and I hope this is what I’ve done.

This work is very much still in active development, but I’ve started the process of creating a new way of batch editing records in MarcEdit.  The clustering tools will be provided as both a stand-alone resource and a resource integrated into the MarcEditor, and will be somewhat special in that they will require that the application extract the data out of MARC and store it in a different data model.  This will allow me to provide a different way of visualizing one’s data, and potentially make it easier to surface issues with specific data elements.

The challenge with doing clustering is that this is a very computationally expensive process.  From the indexing of the data out of MARC, to the creation of the clusters using different matching algorithms, the process can take time to generate.  But beyond performance, the question that I’m most interested in right now is how to make this function easier for users to navigate and understand.  How to create an interface that makes it simple to navigate clustered groups and make edits within or across clustered groups.  I’m still trying to think about what this looks like.  Presently, I’ve created a simple interface to test the processes and start asking those questions.

If you are interested in seeing how this function is being created and some of the assumptions being made as part of the development work, please see: https://youtu.be/DH93QDmeOW8

I’m interested in feedback – particularly around the questions of UI and editing options, so if you see the video and have thoughts, let me know.

–tr

MarcEdit 6.3 Updates (all versions)

By reeset / On / In MarcEdit

I spent some time this week working on a few updates for MarcEdit 6.3.  Full change log below (for all versions).

Windows/Linux/MacOS:

* Bug Fix: MarcEditor: When processing data with right to left characters, the embedded markers were getting flagged by the validator.
* Bug Fix: MarcEditor: When processing data with right to left characters, I’ve heard that there have been some occasions when the markers are making it into the binary files (they shouldn’t).  I can’t recreate it, but I’ve strengthened the filters to make sure that these markers are removed when the mnemonic file format is saved.
* Bug Fix: Linked data tool:  When creating VIAF entries in the $0, the subfield code can be dropped.  This was missed because VIAF should no longer be added to the $0, so I assumed this was no longer a valid use case.  However, local practice in some places is overriding best practice.  This has been fixed.

A note on the MarcEditor changes.  The processing of right to left characters is something I was aware of in regards to the validator, but in all my testing and unit tests, the data was always filtered prior to compiling the data.  These markers that are inserted are for display, as noted here: http://blog.reeset.net/archives/2103.  However, on the pymarc list, there was apparently an instance where these markers slipped through.  The conversation can be found here: https://groups.google.com/forum/#!topic/pymarc/5zxuOh0fVuc.  I posted a long response on the list, but I think it’s being held in moderation (I’m a new member of the list), so here’s generally what I found.  I can’t recreate it, but I have updated the code to ensure that this shouldn’t happen.  Once a mnemonic file is saved (and that happens prior to compiling), these markers are removed from the file.  If you find this isn’t the case, let me know.  I can add the filter down into the MARCEngine level, but I’d rather not, as there are cases where these values may be present (legally)…this is why the filtering happens in the Editor, where it can assess their use and, if the markers are present already, determine if they are used correctly.
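
For folks on the receiving end who want to check their own data, here is a minimal sketch of what a blanket filter for directional markers could look like.  I'm assuming the markers in question are the standard Unicode directional formatting characters (LRM/RLM, the embedding and override controls, and the isolates); the exact set MarcEdit inserts and the way the Editor decides which ones are legitimate are not shown here, and `BidiMarkerFilter` is a made-up name.

```csharp
using System;
using System.Text;

public static class BidiMarkerFilter
{
    // Returns true for Unicode directional formatting characters
    // (an assumed set; adjust it to the markers found in your data).
    public static bool IsDirectionalMarker(char c)
    {
        return c == '\u200E' || c == '\u200F'      // LRM, RLM
            || (c >= '\u202A' && c <= '\u202E')    // LRE, RLE, PDF, LRO, RLO
            || (c >= '\u2066' && c <= '\u2069');   // LRI, RLI, FSI, PDI
    }

    // Removes every directional marker from the text, legitimate or not.
    public static string Strip(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (!IsDirectionalMarker(c)) sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Note that this removes every marker without looking at context, which is exactly the kind of blunt filtering I'd rather not push down into the MARCEngine.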

Downloads can be picked up through the automated update tool, or via http://marcedit.reeset.net/downloads.

–tr

MarcEdit 7 Z39.50/SRU Client Wireframes

By reeset / On / In MarcEdit

One of the appalling discoveries when taking a closer look at the MarcEdit 6 codebase was the presence of 3(!) Z39.50 clients (all using slightly different codebases).  This happened because of the ILS integration, the direct Z39.50 database editing, and the actual Z39.50 client.  In the Mac version, these clients are all the same thing, so I wanted to emulate that approach in the Windows/Linux version.  And as a plus, maybe I would stop (or reduce) my utter disdain at having to support Z39.50 generally, within any library program that I work with.

* Sidebar – I really, really, really can’t stand working with Z39.50.  SRU is a fine replacement for the protocol, and yet, over the 10-15 years that it’s been available, SRU remains a fringe protocol.  That tells me two things:

  1. Library vendors generally have rejected this as a protocol, and there are some good reasons for this…most vendors that support it (and I’m thinking specifically about ExLibris) use a custom profile.  This is a pain in the ass because the custom profile requires code to handle foreign namespaces.  This wouldn’t be a problem if this only happened occasionally, but it happens all the time.  Every SRU implementation works best if you use their custom profiles.  I think what made Z39.50 work is the well-defined set of Bib-1 attributes.  The flexibility in SRU is a good thing, but I also think it’s why very few people support it, and fewer understand how it actually works.
  2. That SRU is a poor solution to begin with.  Hey, just like OAI-PMH, we created library standards to work on the web.  If we had it to do over again, we’d do it differently.  We should probably do it differently at this point…because supporting SRU in software is basically just checking a box.  People have heard about it, they ask for it, but pretty much no one uses it.

By consolidating the Z39.50 client code, I’m able to clean out a lot of old code, and better yet, actually focus on a few improvements (which has been hard because I make improvements in the main client, but forget to port them everywhere else).  The main improvements that I’ll be applying have to do with searching multiple databases.  Single search has always allowed users to select up to 5 databases to query.  I may remove that limit.  It’s kind of an arbitrary one.  However, I’ll also be adding this functionality to the batch search.  When doing multiple database searches in batch, users will have an option to take all records, the first record found, or potentially (I haven’t worked this one out) records based on order of database preference.

Wireframes:

Main Window:

image

Z39.50 Database Settings:

image

SRU Settings:

image

There will be a preferences panel as well (haven’t created it yet), but this is where you will set proxy information and notes related to batch preferences.  You will no longer need to set title field or limits, as the limits are moving to the search screen (this has always needed to be variable) and the title field data is being pulled from preferences already set in the program preferences.

One of the benefits of making the changes is that this folds the Z39.50/SRU client into the main MarcEdit application (rather than as a program that was shelled to), which allows me to leverage the same accessibility platform that has been developed for the rest of the application.  It also highlights one of the other changes happening in MarcEdit 7.  MarcEdit 6 is a collection of about 7 or 8 individual executables.  This makes sense in some cases, less sense in others.  I’m evaluating all the stand-alone programs, and if I replicate the functionality in the main program, then it means that while having these as separate programs might initially have been a good thing, the structure of the application has changed, and so the code (both external and internal) needs to be re-evaluated and put in one spot.  In the application, this has meant that in some cases, like the Z39.50 client, the code will move into MarcEdit proper (rather than being a separate program called mebatch.exe), and for SQL interactions, it will mean that I’ll create a single shared library (rather than replicating code between three different component parts…the SQL explorer, the ILS integration, and the local database query tooling).

Questions, let me know.

–tr

MarcEdit 7 Alpha: the XML/JSON Profiler

By reeset / On / In MarcEdit

Metadata transformations can be really difficult.  While I try to make them easier in MarcEdit, the reality is, the program really has functioned for a long time as a facilitator of the process; handling the binary data processing and character set conversions that may be necessary.  But the heavy lifting, that’s all been on the user.  And if you think about it, there is a lot of expertise tied up in even the simplest transformation.  Say your library gets an XML file full of records from a vendor.  As a technical services librarian, I’d have to go through the following steps to remap that data into MARC (or something else):

  1. Evaluate the vended data file
  2. Create a metadata dictionary for the new xml file (so I know what each data element represents)
  3. Create a mapping between the data dictionary for the vended file and MARC
  4. Create the XSLT crosswalk that contains all the logic for turning this data into MARCXML
  5. Setup the process to move data between XML=>MARC

All of these steps are really time consuming, but the development of the XSLT/XQuery to actually translate the data is the one that stops most people.  While there are many folks in the library technology space (and technical services spaces) that would argue that the ability to create XSLT is a vital job skill, let’s be honest, people are busy.  Additionally, there is a big difference between knowing how to create an XSLT and writing a metadata translation.  These things get really complicated, and change all the time (XSLT is up to version 3), meaning that even if you’ve learned how to do this years ago, the skills may be stale or not translate into the current XSLT version.

Additionally, in MarcEdit, I’ve tried really hard to make the XSLT process as simple and straightforward as possible.  But, the reality is, I’ve only been able to work on the edges of this goal.  The tool handles the transformation of binary and character encoding data (since the XSLT engines cannot do that), and it uses a smart processing algorithm to try to improve speed and memory handling while still enabling users to work with either DOM or SAX processing techniques.  And I’ve tried to introduce a paradigm that enables reuse and flexibility when creating transformations.  Folks that have heard me speak have likely heard me talk about this model as a wheel and spoke:

image

The idea behind this model is that as long as users create translations that map to and from MARCXML, the tool can automatically enable transformations to any of the known metadata formats registered with MarcEdit.  There are definitely tradeoffs to this approach (for sure, doing a 1-to-1, direct translation would produce the best translation, but it also requires more work and users to be experts in the source and final metadata formats), but the benefit from my perspective is that I don’t have to be the bottleneck in the process.  Were I to hard-code or create 1-to-1 conversions, any deviation or local use within a spec, would render the process unusable…and that was something that I really tried to avoid.  I’d like to think that this approach has been successful, and has enabled technical services folks to make better use of the marked up metadata that they are provided.
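
A quick way to see why the wheel and spoke model scales: for N formats besides MARCXML, you need 2N translations (one to MARCXML and one from it for each format) rather than N×(N−1) direct crosswalks between those formats, so 6 formats need 12 translations instead of 30.  Here is a minimal sketch of the routing idea.  In MarcEdit the spokes are XSLT stylesheets; the string-to-string functions and the `CrosswalkHub` class below are stand-ins I invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public class CrosswalkHub
{
    // One translation into the hub format and one out of it per registered format.
    private readonly Dictionary<string, Func<string, string>> toHub =
        new Dictionary<string, Func<string, string>>();
    private readonly Dictionary<string, Func<string, string>> fromHub =
        new Dictionary<string, Func<string, string>>();

    public void Register(string format,
                         Func<string, string> toMarcXml,
                         Func<string, string> fromMarcXml)
    {
        toHub[format] = toMarcXml;
        fromHub[format] = fromMarcXml;
    }

    // Any registered format can reach any other by passing through the hub.
    public string Convert(string source, string from, string to)
    {
        string hubRecord = toHub[from](source);
        return fromHub[to](hubRecord);
    }
}
```

Registering a new format with its two translations immediately makes it convertible to and from every other registered format, without touching the existing ones.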

The problem is that as content providers have moved more of their metadata operations online, a large number have shifted away from standards-based metadata to locally defined metadata profiles.  This is challenging because these are one-off formats that really are only applicable for a publisher’s particular customers.  As a result, it’s really hard to find conversions for these formats.  The result of this, for me, is large numbers of catalogers/MarcEdit users asking for help creating these one-off transformations…work that I simply don’t have time to do.  And that can surprise folks.  I try hard to make myself available to answer questions.  If you find yourself on the MarcEdit listserv, you’ll likely notice that I answer a lot of the questions…I enjoy working with the community.  And I’m pretty much always ready to give folks feedback and toss around ideas when folks are working on projects.  But there is only so much time in the day, and only so much that I can do when folks ask for this type of help.

So, transformations are an area where I get a lot of questions.  Users faced with these publisher specific metadata formats often reach out for advice or to see if I’ve worked with a vendor in the past.  And for years, I’ve been wanting to do more for this group.  While many metadata librarians would consider XSLT or XQuery as required skills, these are not always in high demand when faced with a mountain of content moving through an organization.  So, I’ve been collecting user stories and outlining a process that I think could help: an XML/JSON Profiler.

So, it’s with a lot of excitement that I can write that MarcEdit 7 will include this tool.  As I say, it’s been a long time coming, and the goal is to reduce the technical requirements needed to process XML or JSON metadata.

XML/JSON Profiler

To create this tool, I had to decide how users would define their data for mapping.  Given that MarcEdit has a Delimited Text Translator for converting Excel data to MARC, I decided to work from this model.  The code produced does a few things:

  1. It validates the XML format to be profiled.  Mostly, this means that the tool is making sure that schemas are followed, namespaces are defined and discoverable, etc.
  2. It outputs data in MARC, MARCXML, or another XML format.
  3. It shifts the mapping of data from an XML file to a delimited text file model (though it’s not actually creating a delimited text file).
  4. It assumes the data is in UTF-8, since the data is in XML.

Users can access the Wizard through the updated XML Functions Editor.  Open MARC Tools and select Edit XML function list, and you’ll see the following:

image

I highlighted the XML Function Wizard.  I may also make this tool available from the main window.  Once selected, the program walks users through a basic reference interview:

Page 1:

image

From here, users just need to follow the interview questions.  Users will need a sample XML file that contains at least one record in order to create the mappings against.  As users walk through the interview, they are asked to identify the record element in the XML file, as well as map XML tags to MARC tags, using the same interface and tools as found in the Delimited Text Translator.  Users also have the option to map data directly to a new metadata format by creating an XML mapping file (a representation of the XML output), which MarcEdit will then use to generate new records.
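
To give a rough feel for what a profile boils down to, here is a minimal sketch of mapping XML elements to MARC fields.  This is not the profiler's code (that's in the GitHub repository linked below); `XmlProfileSketch`, `ToMnemonic`, and the sample mapping are hypothetical, it only looks at direct children of the record element, and it doesn't generate a leader.  The output uses MarcEdit's mnemonic style, where each field starts with `=` and a backslash stands for a blank indicator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class XmlProfileSketch
{
    // recordElement: local name of the element that wraps one record.
    // mapping: XML element local name -> mnemonic field prefix
    //          (tag, indicators, and subfield code).
    public static string ToMnemonic(string xml, string recordElement,
                                    IDictionary<string, string> mapping)
    {
        var output = new StringBuilder();
        XDocument doc = XDocument.Parse(xml);

        foreach (XElement record in doc.Descendants()
                                       .Where(e => e.Name.LocalName == recordElement))
        {
            foreach (XElement child in record.Elements())
            {
                string prefix;
                if (mapping.TryGetValue(child.Name.LocalName, out prefix))
                {
                    output.Append('=').Append(prefix)
                          .Append(child.Value.Trim()).Append('\n');
                }
                // Elements without a mapping are skipped.
            }
            output.Append('\n'); // blank line between records
        }
        return output.ToString();
    }
}
```

With a mapping of `title` to `245  10$a` and `creator` to `100  1\$a`, each `<record>` in the sample file becomes a short mnemonic record that the rest of the tool chain could then compile into MARC.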

Once a new mapping has been created, the function will then be registered into MarcEdit, and be available like any other translation.  Whether this process simplifies the conversion of XML and JSON data for librarians, I don’t know.  But I’m super excited to find out.  This creates a significant shift in how users can interact with marked up metadata, and I think will remove many of the technical barriers that exist for users today…at least, for those users working with MarcEdit.

To give a better idea of what is actually happening, I created a demonstration video of the early version of this tool in action.  You can find it here: https://youtu.be/9CtxjoIktwM.  This provides an early look at the functionality, and hopefully helps provide some context around the above discussion.  If you are interested in seeing how the process works, I’ve posted the code for the parser on my GitHub page here: https://github.com/reeset/meparsemarkup

Do you have questions, concerns?  Let me know.

–tr