One of the appalling discoveries I made when taking a closer look at the MarcEdit 6 codebase was the presence of 3(!) Z39.50 clients, all using slightly different codebases. This happened because of the ILS integration, the direct Z39.50 database editing, and the actual Z39.50 client. In the Mac version, these clients are all the same thing, so I wanted to emulate that approach in the Windows/Linux version. And as a plus, maybe I would stop (or reduce) my utter disdain at having to support Z39.50 generally within any library program that I work with.
* Sidebar – I really, really, really can’t stand working with Z39.50. SRU is a fine replacement for the protocol, and yet, over the 10-15 years that it’s been available, SRU remains a fringe protocol. That tells me two things:
Library vendors have generally rejected this as a protocol, and there are some good reasons for this. Most vendors that support it (and I’m thinking specifically about ExLibris) use a custom profile. This is a pain in the ass because the custom profile requires code to handle foreign namespaces. This wouldn’t be a problem if it only happened occasionally, but it happens all the time. Every SRU implementation works best if you use their custom profiles. I think what made Z39.50 work is the well-defined set of Bib-1 attributes. The flexibility in SRU is a good thing, but I also think it’s why very few people support it, and fewer understand how it actually works.
That SRU is a poor solution to begin with. Hey, just like OAI-PMH, we created library standards to work on the web. If we had it to do over again, we’d do it differently. We should probably do it differently at this point…because supporting SRU in software is basically just checking a box. People have heard about it, they ask for it, but pretty much no one uses it.
By consolidating the Z39.50 client code, I’m able to clean out a lot of old code and, better yet, actually focus on a few improvements (which has been hard, because I make improvements in the main client but forget to port them everywhere else). The main improvements that I’ll be applying have to do with searching multiple databases. Single search has always allowed users to select up to 5 databases to query. I may remove that limit; it’s kind of an arbitrary one. I’ll also be adding this functionality to batch search. When doing multiple-database searches in batch, users will have the option to take all records, the first record found, or potentially (I haven’t worked this one out yet) records based on the order of database preference.
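To make the three batch options concrete, here is a minimal sketch that assumes each database’s reply to one batch query arrives as a (database name, records) pair, listed in the order the replies came back. The function and policy names are hypothetical, not MarcEdit’s, and the "preference" policy is only one possible reading of the option that is still being worked out.

```python
from typing import List, Sequence, Tuple

# One reply per database for a single batch query: (database name, records found).
# Replies are assumed to be listed in the order they arrived.
Reply = Tuple[str, List[str]]


def merge_results(replies: Sequence[Reply], policy: str,
                  preference: Sequence[str] = ()) -> List[str]:
    """Hypothetical merge step for a multi-database batch search."""
    if policy == "all":
        # Keep every record from every database.
        return [record for _, records in replies for record in records]
    if policy == "first":
        # Keep only the first record from the first database that had a hit.
        for _, records in replies:
            if records:
                return [records[0]]
        return []
    if policy == "preference":
        # Assumption: keep the records from the most preferred database with a hit.
        hits = {name: records for name, records in replies if records}
        for name in preference:
            if name in hits:
                return list(hits[name])
        return []
    raise ValueError(f"unknown policy: {policy!r}")
```

Separating "first" (arrival order) from "preference" (a user-ranked list) is what makes the third option distinct; if databases were always queried and answered in preference order, the two would collapse into one.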
Z39.50 Database Settings:
There will be a preferences panel as well (I haven’t created it yet); this is where you will set proxy information and notes related to batch preferences. You will no longer need to set the title field or limits, as the limits are moving to the search screen (this has always needed to be variable) and the title field data is being pulled from settings already made in the program preferences.
One of the benefits of making these changes is that it folds the Z39.50/SRU client into the main MarcEdit application (rather than leaving it as a program that was shelled to), which allows me to leverage the same accessibility platform that has been developed for the rest of the application. It also highlights one of the other changes happening in MarcEdit 7. MarcEdit 6 and earlier is a collection of about 7 or 8 individual executables. This makes sense in some cases and less sense in others. I’m evaluating all the stand-alone programs, and where I’ve replicated their functionality in the main program, it means that while having them as separate programs may have been a good thing initially, the structure of the application has changed, and the code (both external and internal) needs to be re-evaluated and consolidated in one spot. In practice, this has meant that in some cases, like the Z39.50 client, the code will move into MarcEdit proper (rather than being a separate program called mebatch.exe), and for SQL interactions, I’ll create a single shared library (rather than replicating code among three different components: the SQL explorer, the ILS integration, and the local database query tooling).
Over the years, I’ve periodically gotten requests for a much more robust logger in MarcEdit. Currently, when the tool performs a global change, it reports the number of changes made to the user. However, a handful of folks have been wanting much more. Ideally, they’d like to have a log of every change the application makes, which is hard because the program isn’t built that way. I provided the following explanation to the MarcEdit list last week.
The question that has come up a number of times since I posted notes about the logger is granularity. Folks would like the tool to provide additional information about the records and more context around each change, and some have wondered whether this will lead to a preview mode. I think other folks wondered why this process has taken so long to develop. Well, it stems from decisions I’ve made around development. MarcEdit’s application structure can be summed up by the picture below:
In developing MarcEdit, I have made a number of very deliberate decisions, and one of those is that no one component knows what another one does. As you can see in this picture, the application parts of MarcEdit don’t actually talk directly to the system components. They are referenced through a messenger, which handles all interactions between the application and the system objects. The same is true of communication between the system objects themselves. The editing library, for example, knows nothing about MARC, validation, etc.; it only knows how to parse MarcEdit’s internal file format. Likewise, the MARC library doesn’t know anything about validation, MARC21, or linked data. Those parts live elsewhere. The benefit of this approach is that I can develop each component independently of the others and avoid breaking changes, because all communication runs through the messenger. This gives me a lot of flexibility and helps to enforce MarcEdit’s agnostic view of library data. It’s also how I’ve been able to start including support for linked data components: as far as the tool is concerned, it’s just another format to be messaged.
Of course, the challenge with an approach like this is that most of MarcEdit’s functions don’t have a concept of a record. For reasons of performance, most functions process data much like an XML SAX parser. Fields being edited raise events to denote areas of processing, as do errors, which then push the application into a rescue mode. While this approach allows the tool to process data very quickly and essentially removes size restrictions on data processing, it introduces issues if, for example, I want to expose a log of the underlying changes. Logs exist (I use them in my debugging), but they exist at the component level and are not attached to any particular process. I use messaging identifiers to determine what data I want to evaluate, but these logs are not meant to record a processing history; rather, they record component actions. They can be muddled, but they give me exactly what I need when problems arise. The challenge with developing logging for actual users is that they would likely want actions associated with records. So, to do that, I’ve added an event handler in the messaging layer. This handles all interaction with the logging subsystem and essentially tracks the internal messaging identifier and assembles data. This means that the logger still doesn’t have a good concept of what a record is, but the messenger does, and can act as a translator.
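As a rough illustration of that layering, here is a minimal Python sketch (MarcEdit itself is .NET, and this is not its code) of a messenger that routes topic-based messages, a streaming edit component that only raises events, and a log handler subscribed in the messaging layer that ties each change message back to a record number. All class, function, and topic names are hypothetical.

```python
from collections import defaultdict
from itertools import count
from typing import Callable, DefaultDict, List, Optional, Tuple

Message = dict
Handler = Callable[[Message], None]


class Messenger:
    """Routes messages by topic; publishers and subscribers never see each other."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._ids = count(1)  # internal messaging identifier

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, **payload) -> int:
        message = {"id": next(self._ids), "topic": topic, **payload}
        for handler in self._handlers[topic]:
            handler(message)
        return message["id"]


def replace_in_stream(messenger: Messenger, records: List[List[str]],
                      find: str, replace: str) -> int:
    """Editing component: streams records and raises events; knows nothing about logs."""
    changes = 0
    for number, fields in enumerate(records, start=1):
        messenger.publish("record.start", record=number)
        for i, field in enumerate(fields):
            if find in field:
                new_value = field.replace(find, replace)
                messenger.publish("field.changed", old=field, new=new_value)
                fields[i] = new_value
                changes += 1
    return changes


class ChangeLogger:
    """Log handler in the messaging layer: translates change messages into per-record entries."""

    def __init__(self, messenger: Messenger) -> None:
        # Each entry: (record number, message id, old value, new value)
        self.entries: List[Tuple[Optional[int], int, str, str]] = []
        self._record: Optional[int] = None
        messenger.subscribe("record.start", self._on_record_start)
        messenger.subscribe("field.changed", self._on_field_changed)

    def _on_record_start(self, message: Message) -> None:
        self._record = message["record"]

    def _on_field_changed(self, message: Message) -> None:
        self.entries.append((self._record, message["id"], message["old"], message["new"]))
```

The edit function still only returns a change count, which is what users see today; the per-record history exists only because the logger listens on the messenger, so removing the subscription leaves the editing path untouched.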
Anyway, this is how I’ll be providing logging. It will also let me slowly expand logging beyond the core editing functions if there is interest. It is also how I’ll be able to build services around the log file, providing parsing and log enhancement for users who want to add record-specific information to a log file that goes beyond the simple record number identifier used to track changes. This would make log files more permanent (if, for example, the log were enhanced with a local identifier). Given the way MarcEdit is designed, and the general lack of a standard control number across all MARC formats (when merging, for example, merging on the 001 triggers checks of 9 other fields that could all store associated control data), I believe that providing ways to enhance the log file after a run, while an extra step, will give me the most flexibility to make greater use of the processing log in the future. It also lets me keep MARCisms out of the processing library and focus only on handling data edits.
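One way to picture the after-the-run enhancement: join each log entry’s record number against the record file and pull a local identifier out of the matching record. The sketch below assumes a mnemonic (.mrk) file with blank lines between records, control fields written as `=001  value`, and log entries as (record number, description) pairs; the function names are hypothetical, and the `tag` parameter exists because, as noted above, the control number doesn’t always live in the 001.

```python
from typing import Iterable, List, Optional, Tuple

LogEntry = Tuple[int, str]  # (1-based record number, description of the change)


def split_mrk(text: str) -> List[List[str]]:
    """Split mnemonic (.mrk) text into records, assuming blank lines separate records."""
    records: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def control_value(record: List[str], tag: str = "001") -> Optional[str]:
    """Return the value of a control field such as =001, or None if it is absent."""
    prefix = f"={tag}  "
    for line in record:
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def enhance_log(entries: Iterable[LogEntry], mrk_text: str,
                tag: str = "001") -> List[Tuple[int, Optional[str], str]]:
    """Attach a local identifier to each log entry after the run."""
    records = split_mrk(mrk_text)
    enhanced = []
    for number, description in entries:
        in_range = 1 <= number <= len(records)
        ident = control_value(records[number - 1], tag) if in_range else None
        enhanced.append((number, ident, description))
    return enhanced
```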
So that’s pretty much the work in a nutshell. So what do you get? Well, once you turn it on, you get lots of stuff and a few new tools. So, let’s walk through them.
Turning on Logging:
Since logging only captures changes made within the MarcEditor, you’ll find the logging settings in the MarcEditor Preferences tab:
Once enabled, the tool will generate a new session in the Log folder each time the Editor starts a new session. With the logs comes log management. From within the MarcEditor or the Main Window, you’ll find the following:
From the MarcEditor, you’ll find in Reports:
Both areas provide the same functionality, but the MarcEditor Reports entry is scoped to the current session logfile and the current record file loaded into the Editor (if one is loaded). To manage old sessions, use the entry on the Main Window.
Advanced Log Management
Two of the use cases that were laid out for me were the need to enhance logs and the ability to extract only the modified records from a large file. So, I’ve included an Advanced Management tool for just these kinds of queries:
This is an example run from within the MarcEditor.
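For the second use case, the key property is that a large file never has to be held in memory. A minimal sketch, assuming a mnemonic (.mrk) file with blank lines between records and a set of 1-based record numbers taken from the log; the function names are hypothetical and this is not the Advanced Management tool’s actual implementation.

```python
from typing import Iterable, Iterator, List, Set


def iter_mrk_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Stream mnemonic records one at a time so large files never sit in memory."""
    current: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():
            current.append(line)
        elif current:
            yield current
            current = []
    if current:
        yield current


def extract_modified(lines: Iterable[str], modified: Set[int]) -> Iterator[str]:
    """Yield only the records whose 1-based record numbers appear in the log."""
    for number, record in enumerate(iter_mrk_records(lines), start=1):
        if number in modified:
            yield "\n".join(record) + "\n\n"
```

Because a text file object iterates line by line, usage can stay streaming end to end, e.g. `out.writelines(extract_modified(src, {2, 17}))` with `src` and `out` opened via `open(...)`.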
Anyway – this is a quick write-up. I’ll be recording a couple sessions tomorrow. I’ll also be working to make a new plugin available.
Posted Sept. 1, this update resolves a couple issues. Particularly:
* Bug Fix: Custom Field Sorting: Fields without the sort field may drop the LDR. This has been corrected.
* Bug Fix: OCLC Integration: Regression introduced with the engine changes when dealing with diacritics. This has been corrected.
* Bug Fix: MSI Installer: AUTOUPDATE switch wasn’t being respected. This has been corrected.
* Enhancement: MARCEngine: Tweaked the transformation code to provide better support for older processing statements.
* Bug Fix: Custom Field Sorting: Fields without the sort field may drop the LDR. This has been corrected.
* Bug Fix: MARCEngine: Regression introduced with the last update that caused one of the streaming functions to lose encoding information. This has been corrected.
* Enhancement: MARCEngine: Tweaked the transformation code to provide better support for older processing statements.
I’ll be adding a knowledge-base article, but I’ve updated the Windows MSI to fix the admin command-line option that allows administrators to turn off the auto-update feature. Here’s an example of how this works:
MarcEdit_Setup64.msi /qn AUTOUPDATE=no
I don’t believe the AUTOUPDATE key is case sensitive – but the documented use pattern is upper-case and what I’ll test against going forward.
One of the gaps in the Mac version of MarcEdit has been the lack of a console mode. This update should correct that. However, a couple of things about how this works…
1) Mac applications are bundles, so in order to run the console program you need to run against the application bundle. What does this look like? From the terminal, one would run:
/Applications/MarcEdit.app/Contents/MacOS/MarcEdit -console
The -console flag initializes the terminal application and prompts for file names. You can also pass the file names (these must be fully qualified paths at this point) as command-line arguments rather than running in interactive mode. For example:
/Applications/MarcEdit.app/Contents/MacOS/MarcEdit -s /users/me/Desktop/source.mrc -d /users/me/Desktop/output.mrk -break
The above would break a MARC file into the mnemonic format. For a full list of console commands, enter:
/Applications/MarcEdit.app/Contents/MacOS/MarcEdit -help
In the future, the MarcEdit install program will set an environment variable ($MARCEDIT_PATH) on installation. For now, I recommend opening your .bash_profile and adding the following line:
export MARCEDIT_PATH=/Applications/MarcEdit.app/Contents/MacOS
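If you script the console mode, reading MARCEDIT_PATH (and falling back to the bundle path) keeps scripts working once the installer starts setting it. A minimal sketch with a hypothetical helper name; the flags are written with a plain hyphen here, so treat their exact spelling as an assumption and confirm it against the help output.

```python
import os
import posixpath
from typing import List, Mapping, Optional

# Bundle path from the post; MARCEDIT_PATH overrides it when set.
DEFAULT_MARCEDIT_PATH = "/Applications/MarcEdit.app/Contents/MacOS"


def marcedit_break_command(source: str, dest: str,
                           env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the argument list for breaking a MARC file into mnemonic format."""
    environment = os.environ if env is None else env
    base = environment.get("MARCEDIT_PATH", DEFAULT_MARCEDIT_PATH)
    executable = posixpath.join(base, "MarcEdit")
    # Flag spellings follow the examples in this post; confirm them with the help output.
    # Source and destination must be fully qualified paths.
    return [executable, "-s", source, "-d", dest, "-break"]
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with paths that contain spaces.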
Over the past month, I’ve been working with ExLibris (thank you to Ori Miller at ExLibris) and Boston College (thanks to Margaret Wolfe) to provide direct integration between MarcEdit and Alma via the Alma APIs. Presently, the integration allows users to search, create, and update records. Setup is pretty easy (I think), and once you have your API access set up correctly, you should be off and running. But it will be interesting to see if that’s the case as more people play around with this in their sandboxes.
Setting up integration
MarcEdit Alma integration requires that you configure an API key with Alma that supports the bib API and the user API. The bib API represents the endpoints where record editing and retrieval happen, while the user API is used to provide a thin layer of authentication before MarcEdit attempts to run an operation (since Alma doesn’t have its own authentication process separate from having a key).
I’d recommend testing this first in your Sandbox. To do this, you’ll need to know your sandbox domain, and be able to configure the API accordingly. If you don’t know how to do this, you’ll want to contact ExLibris.
Once you have your API key, open MarcEdit’s main window and click the Preferences icon.
This will open the Preferences window. Select the ILS Integration link, check the Enable ILS Integration checkbox, select Alma from the listbox, and then enter the domain for your sandbox. Alma’s API doesn’t require a username, so leave that blank, but enter your API key into the Password textbox. Finally, you’ll need to have set up a Z39.50 connection to your instance; this is how MarcEdit searches Alma for record retrieval. If you haven’t set up a Z39.50 connection, you can do that here, or you can open the Z39.50 Client, select Modify Databases, add a new Z39.50 server, and enter the information for your Alma instance. Here’s an example configuration (minus the username and password) for Boston College’s Sandbox:
With your Z39.50 server configured and selected, the ILS Integration Preferences window will look something like this:
Save these settings. Now, when you open the MarcEditor, you’ll see a new menu item:
This menu item will allow you to search and update/create records. To find items, click on the menu and select Search. You’ll get the following window:
If I run a search for Boston, I’ll retrieve 5 results based on the limit set in the Limit textbox:
You can either download all the items by clicking Download All Items, or select the individual items that you want to download and right-click on the Results. This will give you a menu allowing you to download the records.
When downloaded, the record will be opened in MarcEdit like the one below:
A couple of notes about the download. If the downloaded record includes an 852 (and it can), you’ll want to delete that field; otherwise, the field will get duplicated. Right now, I’m trying to figure out if MarcEdit should just remove the value, or if there is an applicable use case for keeping it.
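Until that is settled, removing the 852 before upload is a simple line filter on the mnemonic record. A minimal sketch, assuming the record is in MarcEdit’s mnemonic format, where each field line starts with `=` plus the tag and two spaces; the function name is hypothetical.

```python
def drop_field(mrk_record: str, tag: str = "852") -> str:
    """Remove every occurrence of a field from one mnemonic (.mrk) record."""
    prefix = f"={tag}  "
    kept = [line for line in mrk_record.splitlines() if not line.startswith(prefix)]
    return "\n".join(kept)
```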
Download the record, make the edits that you want to make to the record, and then click the Update/Create option from the Alma window.
When you click Update/Create, the tool will upload your data to your Alma server. If there is an error, you’ll receive the returned error message. If the process was successful, you’ll get a message telling you that the data has been processed. If you are interested in seeing the resulting XML output, MarcEdit automatically copies the data to the clipboard.
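For readers curious what an Update/Create round trip looks like at the HTTP level, here is a minimal sketch that only builds the request. It is not MarcEdit’s code: the `/almaws/v1/bibs` paths, the `apikey` Authorization header, and the `<bib>` wrapper around the MARCXML `<record>` reflect my understanding of the Alma Bibs API and should be checked against ExLibris’s documentation, and `base_url` stands in for your regional API gateway.

```python
import urllib.request
from typing import Optional


def alma_bib_request(base_url: str, api_key: str, marcxml_record: str,
                     mms_id: Optional[str] = None) -> urllib.request.Request:
    """Build an update (PUT) or create (POST) request for the Alma Bibs API."""
    # Assumption: Alma wraps the MARCXML <record> element in a <bib> element.
    body = f"<bib>{marcxml_record}</bib>".encode("utf-8")
    root = base_url.rstrip("/")
    if mms_id:
        url, method = f"{root}/almaws/v1/bibs/{mms_id}", "PUT"  # update existing
    else:
        url, method = f"{root}/almaws/v1/bibs", "POST"  # create new
    return urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={
            "Authorization": f"apikey {api_key}",  # key sent as a header
            "Content-Type": "application/xml",
            "Accept": "application/xml",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` returns the XML response; an error status raises `urllib.error.HTTPError`, whose body carries the error message returned by the server.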
A couple of notes about the process: in my testing, I found that updating serials records was spotty. I’m thinking this might have something to do with permissions, but I’m not positive about that. I’m hoping to do a bit more investigation, but I wanted to get this out for folks to start playing with and maybe provide some feedback.
Secondly, there is a holdings API – it would be possible to allow users to modify holdings data via MarcEdit, but I’d need use-cases in order to see how it fits into this process.
I’m sure this will be a process that I’ll be refining over the next few weeks, but in the meantime, I’d welcome any and all comments.
* I’ll be posting a short YouTube video and will update the URL here.
This all started with a conversation over Twitter (https://twitter.com/_whitni/status/583603374320410626) about a week ago, about why the current version of MarcEdit is so fragile when run on a Mac. The short answer has been that MarcEdit uses a cross-platform toolset to build the UI, which works well on Linux and Windows systems but tends to be less refined on Mac systems. I’ve known this for a while, but to really do it right, I’d need to develop a version of MarcEdit that uses native Mac APIs, which would mean building a new version of MarcEdit for the Mac (at least the UI components). And I’ve considered it, even mapped out a road map, but what has constantly stopped me has been a lack of interest from the MarcEdit community and the lack of a Mac system. On the community side, I can count on two hands the number of times I’ve had someone request a version of MarcEdit specifically for a Mac. And since I’ve been making a Mac App version of MarcEdit available, its use has been fairly low (though this could be due to the struggles noted above). With an active community of over 20,000, I try to put my time where it will make the most impact, and up until a week ago, better support for Mac systems didn’t seem to be high on the list. The second reason is that I don’t own a Mac. My technology stack is made up of about a dozen Windows and Linux systems embedded around my house because they play surprisingly well together, whereas Apple’s walled garden just doesn’t thrive within my ecosystem. So, I’ve been waiting and hoping that the cross-platform toolset would get better and that, in time, this problem would eventually go away.
I’m giving that background because apparently I’ve been misreading the MarcEdit community. As I said, this all started with this conversation on Twitter (https://twitter.com/_whitni/status/583603374320410626), and out of that, two co-conspirators, Whitni Watkins and Francis Kayiwa, set out to see just how much interest there actually was in having a dedicated version of MarcEdit for the Mac. The two set out to see if they could raise funds to acquire a Mac for this development and, indirectly, demonstrate that there was actually a much larger slice of the community interested in seeing this work done. And so off they went, and I sat back and watched. I made a conscious decision that if this was going to happen, it was going to be because the community wanted it, and in that, my voice wasn’t necessary. And after 8 days, it’s done. In all, 40 individuals contributed to the campaign, but more importantly to me, I heard directly from more than 200 individuals who were hopeful that this project would proceed.
Now the hard work starts. MarcEdit is a program that has been under constant development since 1999, so even just rewriting the UI components of the application will be a significant undertaking. I figure it would take approximately 8-12 months to completely port the UI, which is a long time. Too long… so I’m breaking the development into 3-month “sprints”. The first sprint will target the 80%: the functionality that would make MarcEdit productive when doing MARC editing. This means porting all the resources found in the MARC Tools and much of the functionality found in the MarcEditor components. My guess is that these two components are the most important functional areas for catalogers, so finishing those would make the tool immediately useful for production cataloging and editing. After that, I’ll be able to evaluate the remainder of the program and begin working on functional parity between all versions of the application.
But I’ll admit, at this point, the road map is even somewhat cloudy to me. See, I’ve written up the following document (http://1drv.ms/1ake4gO), shared it with Whitni, and asked her to work with other Mac users to refine the list and let me know what falls into that 80%. So, I’ll be interested to see where their list differs from my own. In the meantime, I’ll be starting work on the port: creating wireframes and spending time over the next week hitting the books and familiarizing myself with Apple’s API docs and UI best practices (though I will be trying to keep the program looking very familiar to the current application, best practices be damned). Coding on the new UI will start in earnest around May 1, and by August 1, 2015, I hope to have the first version built specifically for a Mac available. For those interested in following the development process, I’ll be creating a build page on the MarcEdit website (http://marcedit.reeset.net) and will be posting regular builds as new areas of the application are ported so that folks can try them and give feedback.
So, that’s where this stands at this point. For those interested in providing feedback, feel free to contact me directly at firstname.lastname@example.org. And for those of you who reached out or participated in the campaign to make this happen, my sincere thanks.
The MarcEdit 101 Webinar Series was created over the course of multiple months for the CARLI (http://www.carli.illinois.edu/) consortium in Spring 2015. In late March 2015, CARLI reached out to me and requested that these webinars be made available to the larger MarcEdit community, so if you find these webinars useful, please reach out and thank the folks at CARLI.
A couple of notes: these webinars are being made available as is, save for the following modifications:
Attendee names have been anonymized. While I’m certain most attendees would have no problem with their names showing up in these webinar lists, the original intended audience was locally scoped to CARLI and its members. Masking attendees was done primarily because of this change of scope.
The Q/A at the end of the sessions has generally been removed from the webinars. Again, these are localized webinars, and questions asked during the webinars tend to be within the scope of this consortium.
I’ll be making these videos available over the next couple of months. Again, if you find these webinars useful, please make sure you let the folks at CARLI know.
Over the past couple of weeks, I’ve been working on expanding the linking services that MarcEdit can work with in order to create identifiers for controlled terms and headings. One of the services that I’ve been experimenting with is NLM’s beta SPARQL endpoint for MESH headings. MESH has always been something that is a bit foreign to me. While I had been a cataloger in my past, my primary area of expertise was with geographic materials (analog and digital), as well as traditional monographic data. While MESH looks like LCSH, it’s quite different as well. So, I’ve been spending some time trying to learn a little more about it, while working on a process to consistently query the endpoint to retrieve the identifier for a preferred term. It’s been a process that’s been enlightening, but also one that has led me to think about how I might create a process that could be used beyond this simple use case, and potentially provide MarcEdit with an RDF engine that could be utilized down the road to make it easier to query, create, and update graphs.
Since MarcEdit is written in .NET, this meant looking to see what components currently exist that provide the type of RDF functionality that I may need down the road. Fortunately, a number of components exist; the one I’m utilizing in MarcEdit is dotnetrdf (https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/browse/). The component provides a robust set of functionality that supports everything I want to do now and should support what I’ll want to do later.
With a toolkit found, I spent some time integrating it into MarcEdit, which is never a small task. The outcome will be a couple of new features to start testing the toolkit and to give users the ability to become more familiar with a key semantic web technology, SPARQL. The first new feature is the integration of MESH as a known vocabulary that will now be queried and controlled when run through the linked data tool. The second new feature is a SPARQL Browser. The idea here is to give folks a tool to explore SPARQL endpoints and retrieve the data in different formats. The proof of concept supports XML, RDFXML, HTML, CSV, Turtle, NTriple, and JSON as output formats. This means that users can query any SPARQL endpoint and retrieve data back. In the current proof of concept, I haven’t added the ability to save the output, but I likely will prior to releasing the Christmas MarcEdit update.
Proof of Concept
While this is still somewhat conceptual, the current SPARQL Browser looks like the following:
At present, the Browser assumes that data resides at a remote endpoint, but I’ll likely include the ability to load local RDF, JSON, or Turtle data and query that data as a local endpoint. Right now, the Browser takes a URL to the SPARQL endpoint and then the query. The user can then select the format in which the result set should be returned.
Using NLM as an example, say a user wanted to query for the specific term: Congenital Abnormalities – utilizing the current proof of concept, the user would enter the following data:
The idea behind creating this as a general-purpose tool is that, in theory, it should work for any SPARQL endpoint, for example, the Project Gutenberg Metadata endpoint. The same type of exploration can be done utilizing the Browser.
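Under the hood, a browser like this mostly speaks the standard SPARQL 1.1 Protocol: send the query as the `query` parameter of a GET request and choose the output format through the HTTP Accept header. Here is a minimal Python sketch of that request (not the Browser’s dotnetrdf-based code). The endpoint URL in the usage below and the `meshv:` namespace in the MeSH query are assumptions based on NLM’s current MeSH RDF and may not match the beta endpoint described here.

```python
import urllib.parse
import urllib.request

# Media types for the output formats listed above. SELECT results come back as
# XML/JSON/CSV; Turtle, N-Triples and RDF/XML apply to CONSTRUCT/DESCRIBE queries.
# Which types a given endpoint honours varies.
ACCEPT_TYPES = {
    "XML": "application/sparql-results+xml",
    "JSON": "application/sparql-results+json",
    "CSV": "text/csv",
    "HTML": "text/html",
    "RDFXML": "application/rdf+xml",
    "Turtle": "text/turtle",
    "NTriple": "application/n-triples",
}

# Assumed MeSH vocabulary namespace; check the endpoint's own documentation.
MESH_QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT ?descriptor WHERE {
  ?descriptor a meshv:Descriptor ;
              rdfs:label "Congenital Abnormalities"@en .
}"""


def sparql_request(endpoint: str, query: str, fmt: str = "JSON") -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request for the chosen output format."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})
```

For example, `urllib.request.urlopen(sparql_request("https://id.nlm.nih.gov/mesh/sparql", MESH_QUERY))` would fetch the JSON result set; pointing the same function at another endpoint, such as a Project Gutenberg metadata endpoint, needs no other changes.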
At this point, the SPARQL Browser represents a proof of concept tool, but one that I will make available as part of the MARCNext research toolset:
It will be available as part of the next update. Going forward, I will likely refine the Browser based on feedback, but more importantly, start looking at how the new RDF toolkit might allow for the development of dynamic form generation for editing RDF/BibFrame data…at least somewhere down the road.
**A number of members of the MarcEdit community provided feedback while I was working on these changes. Specifically, thanks to Heidi Frank (NYU) and Jim Taylor (of http://jtdata.com) for contributing their time and some artistic skill in creating some of the new functional icons.
In addition to a handful of bug fixes and enhancements, one of the big changes coming to the next MarcEdit update will be around UI changes. I’ve been taking some time to collapse menus to try to shorten them a bit (they are getting long) and to refresh a few of the screens and tools. The first screen to be refreshed, and one that includes some significant enhancements, is the start screen.
Current Start Screen
The MarcEdit start screen has been largely unchanged for close to 5 years. It has provided access to a handful of tools and utilities that I have heard are fairly commonly used.
Current MarcEdit Start Screen
Over the years, I have periodically changed the tools and utilities available on the start page, but by and large, it has stayed static.
Updated Start Screen
The next update will reflect a shift in the start screen design. First, the page will move from a more textual/informational screen to one that relies on both graphics and text to help users find the right tool. Second, the start screen will be customizable: users will be able to define which tools they want quick access to.
Updated Default Screen
The updated Start Screen will include larger images with text to help users quickly locate the tool they are looking for. However, unlike past versions, users can change the tools available from this screen. By clicking the configuration icon in the lower right-hand corner, or selecting Preferences from the Tools menu, users will be presented with the following new configuration option:
New Configuration Options
The new configuration options pull out the 12 most commonly used tools/add-ins. Users can select up to 4 of these items and place them on their start screen. After selecting new items and clicking OK, the user will find that their application layout changes:
User Configured Interface
Here, I changed the default options to select the Delimited Text Translator, the Merge Records Tool, the RDA Helper, and the Call Number Generator. These will now be available to me on the front screen whenever I open MarcEdit. And since these configuration changes are linked to a user’s profile, multiple users on the same computer could have different Start Screens depending on how they use the program.
Selecting 3 items
As noted above, you can select up to 4 user tools to display on the front page, but you can also select fewer. In this example, I removed an option and selected only the 3 most common tools for the Start Screen.
These UI changes are the first of what will be a handful of changes that I’ll be making to the tool over the next couple of months as I refresh the interface, clean up some old code, and look to improve some of the workflows in the application. I’ll be posting wireframes through the MarcEdit listserv when I’m planning major changes, so if you are interested in having a voice on upcoming changes, keep an eye on the MarcEdit list.
One of the often requested enhancements to the MarcEdit Delimited Text Translator is the ability to auto generate the arguments list. For many users, their spreadsheets or delimited text documents include a line at the beginning of the document defining the data found in the file. I’ve often had folks wonder if I could do anything with that data to help auto generate the arguments list used by MarcEdit to translate the data.
Well, in anticipation of Thanksgiving, I finished working on what will be the next MarcEdit update. I won’t post it till the weekend, but this new version will include an Arguments Auto Generation button that will allow MarcEdit to capture the first line of a data file and, if it is properly formatted, auto-configure the Arguments List.
The format supported by the Auto Generation feature is pretty straightforward. It essentially is the following: Field$Subfield[ind1ind2punct].
Let me break down the format definition:
Field – represents the field to be mapped to, e.g., 245. This is a required value.
$Subfield – represents the subfield to be mapped, e.g., $a. This is a required value.
ind1 – represents the first indicator. This is an optional value, but if defined, indicator 2 must be defined.
ind2 – represents the second indicator. This is an optional value, but if indicator 1 is defined, indicator 2 must also be defined.
punct – represents the trailing punctuation of the field. This is an optional value. However, if you wish to define the punctuation, you must define the indicator 1 and indicator 2 values as well.
Some examples of the syntax:
245$a – no indicators are defined, so the default indicators, 2 blanks, will be used.
245$a10 – defines the field, subfield and indicators 1 and 2.
245$a10. – defines the field, subfield, indicators 1 and 2, and defines a period as the trailing punctuation.
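The rules above are regular enough to express as a single pattern. Here is a minimal sketch of a parser for one header cell and for a whole header row, not MarcEdit’s implementation: it assumes three-digit tags, single lower-case or numeric subfield codes, indicators written as digits or `\` (MarcEdit’s mnemonic blank), and a tab-delimited header. The asterisk join marker is deliberately not handled, since its exact syntax isn’t spelled out here. All names are hypothetical.

```python
import re
from typing import List, NamedTuple


class ColumnMapping(NamedTuple):
    field: str
    subfield: str
    ind1: str = "\\"  # "\" is a blank indicator in MarcEdit's mnemonic format
    ind2: str = "\\"
    punct: str = ""


# Field$Subfield, optionally followed by BOTH indicators and then trailing punctuation.
_MAPPING = re.compile(r"^(\d{3})\$([a-z0-9])(?:([0-9\\])([0-9\\])(.*))?$")


def parse_mapping(token: str) -> ColumnMapping:
    """Parse one header cell such as 245$a, 245$a10 or 245$a10."""
    match = _MAPPING.match(token.strip())
    if match is None:
        raise ValueError(f"not a valid column mapping: {token!r}")
    field, subfield, ind1, ind2, punct = match.groups()
    if ind1 is None:  # no indicators given: keep the two-blank default
        return ColumnMapping(field, subfield)
    return ColumnMapping(field, subfield, ind1, ind2, punct)


def parse_header_line(line: str, delimiter: str = "\t") -> List[ColumnMapping]:
    """Map each column of a header row to a field definition, in column order."""
    return [parse_mapping(token) for token in line.rstrip("\r\n").split(delimiter)]
```

Because the optional group requires both indicators before any punctuation, a cell like `245$a.` is rejected, which matches the rule that punctuation can only be defined once both indicators are.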
In MarcEdit, you can join fields together, which lets users combine data from multiple columns into a single subfield. Joined fields are represented by an asterisk “*”. If I want to join two or more fields, I can add an asterisk group to the field. For example:
MarcEdit will interpret field 0 and field 1 as being joined fields because the asterisk marks them as joined.
I’ve placed a video on YouTube to demonstrate the upcoming functionality. You can find out more about it here:
If you have questions about this new function or suggestions, let me know.