Jun 10, 2013
 

I just posted a new update.  The update includes a major rewrite of the merge algorithm, as well as a few bug fixes and small enhancements.  Here’s the list:

  • Update: Merge Function – this is a rewrite.  The function was rewritten to provide a bit more flexibility, better performance, and a framework for exposing the values used to calculate match points so users can modify weighting if they have specific values that they want to give more importance. 
  • Bug Fix: Find all jump list would report no data found when LDR data was queried.
  • Bug Fix: Jump to record – wouldn’t jump if the record was the last item on a page.
  • Enhancement: Z39.50 – removed the 3 database query limit.  Limit has been bumped up to 250, though at some point, I need to take a look at redoing the interface to make this type of query casting more friendly.
  • Bug Fix: Change Indicator Function – when using two wildcards, the indicators search would report no items found.  This has been corrected.
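
To make the idea of user-adjustable match-point weighting a bit more concrete, here is a minimal Python sketch. It is not MarcEdit's merge algorithm (which isn't shown in this post); the field tags, the default weights, and the function name are all hypothetical, chosen only to illustrate how exposing the weights lets someone give specific values more importance.

```python
# Hypothetical illustration only; not MarcEdit's actual merge logic.
# Assumed default weights per MARC tag (made up for this sketch).
DEFAULT_WEIGHTS = {"001": 5, "020": 4, "245": 3, "260": 1}


def match_points(record_a, record_b, weights=None):
    """Sum the weights of every field that is present and identical in both records."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    points = 0
    for tag, weight in weights.items():
        value_a = record_a.get(tag)
        if value_a is not None and value_a == record_b.get(tag):
            points += weight
    return points


# Two records that agree on the 020 and 260 but not on the 245.
a = {"020": "9780131103627", "245": "The C programming language", "260": "1988"}
b = {"020": "9780131103627", "245": "C programming language", "260": "1988"}

default_score = match_points(a, b)
# A user who trusts ISBN matches more can raise the weight on the 020.
isbn_heavy_score = match_points(a, b, {**DEFAULT_WEIGHTS, "020": 10})
```

With the assumed defaults, the two records score lower than they do once the 020 weight is raised, which is the kind of control the rewritten function is meant to make possible.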

As always, you can find the new updates at:

–tr

 Posted by at 11:42 pm
May 30, 2013
 

I meant to post this earlier, but then got busy with travel.  Three weeks ago, I was sitting down to make a few changes to the swap field function in MarcEdit and realized, after scanning the function, that I couldn’t remember how it worked.  This doesn’t happen very often (I probably over-document my code), but I hate when it does, and when it does, it’s time to refactor (something I really try not to do very often).  The problem is the swap field function in general.  I’ve known for some time that I was going to have to address it.  In fact, in the comments header of the function, I had written two years ago (when I first started to think about doing this):

‘****************************************************************************************
* For the love of God, don’t work on this function.  You’ll only break it.
* trust me! – TR 2011-10-17
****************************************************************************************’

Apparently, I must have been interested in enhancing the function back in 2011, and came to the same conclusion I’d come to this past month.

The problem with the Swap Field function is that it’s old…older than old.  If MarcEdit was the Tardis, the swap field function would be like the 3rd Dr. old.  It’s been around forever, and was part of the original codebase written in Assembly.  And that was my problem.  When MarcEdit migrated from Assembly to C#, I rewrote nearly the entire program – you pretty much had to.  But there were a few functions that were really complicated, and they worked, so rather than rewrite, I ported….and porting code from Assembly to a higher level language can sometimes have ugly results.  In my case, 12K lines of ugly.  So, I’ve been ignoring it, until I couldn’t any longer.

So what was the enhancement I couldn’t ignore?  Well, it was a request for the tool to do something that it probably ought to, but I hadn’t considered.  And when I looked at the function, I realized that it would take me longer to figure out how to bolt it onto the existing function than it would to rewrite from scratch…and at that point, I started planning the rewrite.

When rewriting a significant amount of code like this, it’s a great opportunity to redefine the scope of the tool, and that’s what I did here.  I had one option in the swap field tool called “Override Default behavior”.  At some point, that meant something to me – but no longer.  I decided to drop it.  It also gave me a chance to include new functionality like joining subfields together, creating new fields conditionally, adding support for multiple field appending, etc.  I was able to plan my development so that I could create a new framework for the functionality that would allow me to close 10 or so enhancement requests and provide flexibility to grow into the future.  And that’s what I did.

The total rewrite process took approximately 10 hours (a weekend) and reduced the total code for the function from 12K lines down to around 2.5-3K.  The code also became much more modular: I was able to break it up into smaller parts and reuse some existing functions that support other MarcEdit functions.  Overall, I’m very happy with the result.

In the end, the Swap Field function window did change too:

[image: the updated Swap Field function window]

From the above, you can see the new find textbox in the Modified data area, as well as the add to existing/create new option.  The program now also supports joining multiple subfields together – so for example swapping subfields ab from a 050 into a 500 a can be done by setting the Original Field Data subfields to ab and the Modified Data Subfields to a – the program will join the resulting subfield data together.  Lots of small enhancements, as well as improved performance.  All good things.
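
The subfield-joining behavior described above can be sketched roughly as follows. This is a small Python illustration, not MarcEdit's code: the function name, the (code, value) pair representation of a field, and the single-space separator are all assumptions made for the example.

```python
# Hypothetical sketch of joining several source subfields into one target subfield.
def swap_subfields(source_subfields, source_codes, target_tag, target_code, separator=" "):
    """Build a new field whose single subfield holds the joined source values.

    source_subfields: list of (code, value) pairs in record order.
    source_codes: string of subfield codes to take, e.g. "ab".
    Returns (target_tag, [(target_code, joined_value)]), or None if nothing matched.
    """
    selected = [value for code, value in source_subfields if code in source_codes]
    if not selected:
        return None
    # The separator is an assumption; the post does not say how values are joined.
    return (target_tag, [(target_code, separator.join(selected))])


# Swapping subfields ab from a 050 into a 500 subfield a.
field_050 = [("a", "QA76.73.C15"), ("b", "K47 1988")]
new_field = swap_subfields(field_050, "ab", "500", "a")
```

Here `new_field` holds a 500 field with one subfield a containing both original values, mirroring the "Original Field Data subfields: ab, Modified Data Subfields: a" setup.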

Of course, the best part of rewriting this function is that I understand it again, and it’s now much easier to understand because the code guides you through the process.  And I got to test this out last week.  I released the updated code 2 weeks ago, and in the first week, a couple of bugs were reported, and correcting those bugs ended up being a trivial exercise.  I also got an enhancement request, and that could easily be accommodated thanks to the new scaffolding.

Anyway, I bring this up for two reasons.  One, I see the first month or so as a shake out period, so if you are a MarcEdit user and happen to see a problem with the function, or you notice the behavior has changed and not for the better, let me know.  I’ve found in the last week that people made heavy use of some undocumented behavior (and by undocumented, I mean, completely unintentional), and as long as it makes sense, I’ll formalize the support.

The second reason is that I think a lot of times, we can let things go and stop innovating because things work and we don’t necessarily have the tools to move them forward.  You may not print it out as plainly as I did in the source code – but it definitely happens and when it does, services stagnate and we get, maybe, a little too attached to the status quo.  Making changes in these circumstances is hard, it’s painful, and it sometimes means taking a step backward…but in the end, they can provide a much clearer path to future innovation.

–tr

 Posted by at 10:37 am
May 25, 2013
 

With my move to The Ohio State University Libraries, one of the things that I’ve been thinking a lot about is how to tie together a lot of seemingly disparate systems to start creating a coherent digital initiatives infrastructure.  It’s a fun problem to be working through…something I was trying to explain to a friend the other day when they asked me about my recent career change.  Either I haven’t found a way to make my work sound particularly interesting, or I have a wildly different idea of what is fun, but after discussing some of the challenges and opportunities that I see available for libraries, I got contemplative silence, followed by a comment that they’d worry they might screw everything up.

The funny thing is, I don’t think I really worry about that anymore.  Don’t get me wrong, I can see where they are coming from.  It seems like every day, new projects, new initiatives, and new whiz-bang solutions are popping up.  How is a library to decide where to invest its time and money – and what happens if you are wrong (because invariably you will be on occasion)?  Between concerns related to changing workflows, migrating legacy data, and just building up capacity within your staff – how do you go about making changes confidently?

Obviously, there are a lot of things that go into making a decision like this…but as I was working on dinner this evening, I think I found a better way to think about this type of work.  I enjoy cooking, and I enjoy taking recipes or meals that I’m familiar with and mixing and matching different flavor profiles together.  Today, for example, I was looking for a way to spice up a pasta dish, and decided that it might be fun to make a Thai-flavored chili sauce to cook the chicken in and coat the pasta noodles with (rather than a traditional alfredo sauce).  It sounded good, and happily, it tasted good.  But the point is, I have absolutely no problem messing up dinner (and if you ask my kids, it happens).  The way I look at it is, if I screw it up, it’s only one meal and I can always make folks peanut butter and apple butter sandwiches.  I think I like to look at system building the same way.  Invariably, you are going to start down a path and realize that a new tool or component doesn’t fit within your environment.  It might have looked like it would fit, but the end result just doesn’t “taste” right.  That’s ok – just try a new recipe and start again.

I think my favorite thing about cooking is I love making new things.  Sure, there are recipes, but I see those more like loose guidelines, rather than hard and fast rules (this is probably why I don’t do desserts – I find they tend to be a bit less forgiving).  I look at the work I do in libraries the same way.  There are a lot of recipes for building digital infrastructure, but they are really more like loose guidelines.  The trick is to take those recipes and come up with a flavor that meets the needs of your local environment. 

–tr

 Posted by at 5:34 pm
May 23, 2013
 

Over the past few months, I’ve fielded a lot of questions from colleagues at libraries either looking for or starting the process of selecting a new ILS system.  It’s a good time for it – as all the major vendors in the ILS space are shopping new systems and currently trying to court customers from their competitors.  And while I have only ever worked at libraries that are III systems, working with MarcEdit has given me the opportunity to work with libraries, and migrate libraries, from many different systems.

When I get these questions and talk to folks, I think that they are often disappointed that I don’t have a pat answer – I don’t think that there is one right system out there for libraries.  With this new crop of offerings, each has different pain points, and honestly, figuring out what your points of tolerance are really goes a long way to determining which system is likely going to be the best fit.  Of course, even with that information, the right choice for an institution may not be the “right” choice…and this is where I’ve been thinking about III.

While at Oregon State University, I made no secret that I thought that the business model (à la carte) and system design (closed box) of the past regime were bad for libraries in general, but especially bad for our library.  And over the years, much of the work that we did in the library was to figure out ways to minimize our reliance on the ILS as a public facing system, and essentially write it out of our infrastructure.  Now, that was not always something that could be easily or elegantly done, but nearly every project that required access to data from the ILS started with the question: how can we get the data out of the system so we can do something with it?

I know that my experience isn’t unique.  There is a reason why Millennium has long been a punch-line in the library development community – which is why it is interesting to me to watch how Innovative’s new management, and the library community, react to Sierra – and how history and reputation can become one of those pain points.

I guess for full disclosure, moving to The Ohio State University Libraries, I found myself back at an Innovative library (and one working through a migration to Sierra).  This was a bit of a U-turn, since Oregon State University, as part of the Orbis/Cascades Alliance, was migrating to ExLibris’s Alma product.  So, I’d already started to make the mental shift to begin working with and getting to know the ExLibris community.  But, I find myself back in the III fold and again find myself thinking about how the ILS fits into the overall library’s infrastructure.

However, after taking a year-long sabbatical from the III community, one thing has struck me about the current discussions that I hear around Sierra, and I think it underlines one of the major challenges that III’s new management is going to face moving into the future.

Since rejoining the III community, the most common thing that I hear when people discuss Sierra is that it’s basically just a spiffed up version of Millennium.  I heard this when we looked at it as part of the Orbis/Cascades Alliance and I hear it now talking to folks in OhioLink thinking about their current migrations.  The problem with this sentiment is that I honestly don’t think it’s a fair statement.  I won’t get into all the reasons why, but it glosses over some of the important work that has been done in Sierra.  Unfortunately for III, that important work isn’t work that users or staff will see, but rather happened at the fundamental system level.  By moving away from their legacy web server and database and simply adopting Apache and Postgres, III has given III libraries a reliable way to read their metadata.  For the first time in a long time, I’m looking at the ILS as a place where I may actually be able to mine data from, rather than simply as something I would generally ignore.

The problem I see for Innovative’s management at this point, is two fold:

  1. They have a messaging problem.  III wants to talk about the leap forward that they made with Sierra, but the really interesting stuff that would make those gains easy to see (things like Read/Write API access) doesn’t exist at this point.  Right now, all the improvements are hidden from view, and honestly, unless they provide some reasons for people to care, they will be changes made for a small niche group of people willing to work with Postgres (or at least, willing to replicate Postgres and index the data into a tool like Solr for better response time and flexibility for report writing), and the current sentiment around Sierra will be the reality (regardless of whether it is true or not).
  2. Secondly, III has a trust problem.  III’s previous regime had a tendency to overly fragment their system, to the point that the running joke was that if you wanted to do something interesting with the software, you had to buy x and the question was how many zeros it would cost.  III’s à la carte pricing works for a lot of people (I think, I mean, it must, for someone) – but I think that they need to re-evaluate what is part of the core system.  The API, for example, needs to be part of the core system.  Likewise, given a history of unfulfilled promises around API development, III needs to do this development in the open.  Right now, I’ve been hearing about the API development for 2 years, yet the community is still waiting for some kind of document highlighting what will be included.  I’d argue more of this work needs to be done in public.

What I’ll be interested in, especially now that I’m back at a III library, is how they work to manage these challenges.  III currently serves a lot of libraries – but I don’t think that these challenges can be overstated.  In the past two weeks, I’ve spoken to 8 libraries, all currently Innovative, all seriously looking to migrate to something else.  In talking about their pain points, it’s clear that these libraries have started to wonder how much longer they can trust and wait for III to make the necessary changes to allow III libraries to re-engage with their ILS systems.

At the same time, I’m hopeful, maybe more so than I’ve been in a very long time.  I’ve spoken with many of the new leaders at III and I think that they understand the problem.  I’m also seeing movement towards collaborations that simply wouldn’t have been possible under the previous management – so it will be really interesting to see how this turns out.  It will definitely be something worth watching.

–tr

 Posted by at 7:30 pm
May 23, 2013
 

This past week, I’ve been out in San Francisco doing some speaking.  One of the groups that I spoke to was the Northern California Technical Processing Group.  A fantastic group, that included a mix of people currently in libraries, as well as a bunch of bright, young, soon to be minted librarians.  So, a really fun group to be around.  My talk tended to focus on some of the trends that I see or hope to see coming out of the work of these large national and multi-national projects like the DPLA, and what I’m hoping they will mean for metadata sharing, aggregation, reuse, etc.  Essentially, I’m trying to convince folks that not only should information be open and free, but it should be slutty, hooking up with anyone and everyone.  And, yes, I’ve posted my slides to slideshare (http://www.slideshare.net/reese_terry/rethinking-shared-metadata-at-the-platform-level).

The funny thing is, while I had a great time, and some great conversations, the thing I found myself thinking about most was a sentiment that Sarah Houghton expressed (well actually, a couple things she said…some of which make me realize that she is way more dedicated to the profession than I would be in similar circumstances, I’m sure) about the need for libraries to cultivate those intimate moments, the special stories in people’s lives where they have made a difference.  In Oregon, I served on a public library board, so I can definitely say that this type of connection is important (especially when you have to go before the community and ask for money) – but what got me thinking about this was wondering how many people at the conference have those types of experiences that played into their reason for becoming a librarian, or how many of those folks were like me, accidental librarians that really don’t have those emotional connections to a library (or the idea of libraries, really).

Don’t get me wrong, I wouldn’t work in a library if I didn’t think what I (we) do is important.  And working with public librarians, I was amazed at the impact that they can make on their communities and their kids (they remind me of public school teachers in a lot of ways).  But I’ve never been a library user (can I say that?).  Even when I chaired my local library board, I never had a public library card.  And at the same time, I certainly work in libraries because I believe in the transformational nature of the work we do, and of the library as the great social equalizer, and maybe that is enough for me.  Honestly, it’s the pace of change, the questions, the potential research opportunities that make me excited to work at a research library.

I can say that I hope that my kids have these stories, and I know that they do.  They are avid readers and that’s in no small part due to the summer reading programs and involved librarians we’ve had the good fortune of knowing.  And who knows, maybe when I finally get off the promotion and tenure track, I’ll take some time to slow down and start reading for fun again…and maybe then I’ll get an opportunity to get to know my public library…

–TR 

 Posted by at 6:54 pm
Apr 16, 2013
 

Just a couple of notes.  I’ve been spending quite a bit of time as of late testing the current MONO builds against the current MarcEdit codebase.  It looks like the latest stable version works just fine with the program, while the 3.0.x branch has a few rendering issues (not sure if they are mine or not — I’m looking into it).  Anyway, what this means is that I’m looking at working on trying to simplify the installation process for folks using the software on Linux/Mac systems.  I’m starting with Linux (since I run Linux).  Right now, I’m working on using a tool called: makeself (http://megastep.org/makeself/) which will create a self extracting archive and then allow me to execute a cleanup script to take away a number of the necessary steps during the install process.  Ideally, the steps would be:

  1. Run the install script
  2. Run MarcEdit
           OR
  3. Get a report of missing dependencies

This would remove the need to remember to run the bootloader to clean up and set install paths, as well as get the configuration right for Z39.50 use if yaz is installed on the system.  It should also give me a method to check dependencies and provide a list of packages needing to be installed to make installation as simple as possible (though at this point, the only dependencies are mono-core and the System.Windows.Forms libraries – which sometimes are and sometimes aren’t included in a specific distribution’s definition of “core” files).
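
As a rough idea of what the "run MarcEdit or get a report of missing dependencies" step could look like, here is a minimal Python sketch. It is not the actual install script: the function names are made up, it only checks whether the `mono` and `yaz-client` commands are on the PATH (an assumption about how one might detect Mono and yaz), and it does not attempt to verify that the System.Windows.Forms libraries are present.

```python
import shutil

# Assumed command names; the real installer may check for different things.
REQUIRED_COMMANDS = ["mono"]
OPTIONAL_COMMANDS = ["yaz-client"]  # only needed if Z39.50 support is wanted


def find_missing(commands, which=shutil.which):
    """Return the commands that cannot be found on the PATH."""
    return [name for name in commands if which(name) is None]


def install_report(which=shutil.which):
    """Summarize whether installation can proceed and whether Z39.50 can be configured."""
    missing = find_missing(REQUIRED_COMMANDS, which)
    return {
        "can_install": not missing,
        "missing": missing,
        "z3950_available": not find_missing(OPTIONAL_COMMANDS, which),
    }
```

Calling `install_report()` on a given machine returns a small dictionary: if `can_install` is false, the `missing` list is what would be shown to the user; if `z3950_available` is false, the Z39.50 configuration step would simply be skipped. The `which` parameter exists only so the lookup can be swapped out when testing.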

Once I have this build process defined, I’ll find myself a Mac and give MONO’s MacPackager a whirl.  It seems like that might be a simple solution to packaging the program, and at this point, all the Z39.50 components can be disabled to greatly simplify the process.

Ideally, once I have an installer in place, I should be able to tweak MarcEdit’s automated updater so that Mac/Linux users receive notifications and downloads of the current installers.  From there, they would just need to spawn them within their own specific environments.  Easier – I hope so.  And as always, I’ll continue to provide a plain zip file for download, for those folks in environments that don’t allow the installation of software.

Questions, comments — let me know.

–tr

 Posted by at 10:12 am

MarcEdit 5.9 Hotfix

Apr 13, 2013
 

Sorry folks – apparently a hotfix Microsoft pushed out caused the setup tool I use to become corrupted when working with 64-bit systems.  I’ve removed the hotfix and rebuilt both versions (32- and 64-bit) of the program and it looks like things are in good shape now.

If you tried updating today and had trouble, give it another go.  If you updated and didn’t have trouble, you should be fine (though you’ll be prompted to update again).

Sorry about that folks.

–tr

 Posted by at 3:25 pm
Apr 13, 2013
 

So, I’ve been getting a few more questions about how to set up MarcEdit on Linux and a Mac.  In general, the program works very well on both platforms, but it takes a bit of know-how to get it running.  This is primarily due to the Z39.50 components that I link to.  On Linux, installation is very straightforward – on a Mac, the easiest process I’ve found is using MacPorts – but it can take as long as an hour to install all the dependencies.  Given that Z39.50 is becoming less and less important as a protocol – I’ve worked out a process so that the program can essentially turn off these components by default when installed on non-Windows systems.  To that end, I’m reworking the zip download, and then will be looking at how I can make an installer that can be used on non-Windows platforms to make the installation a little bit easier.  On Linux, I don’t think this will be all that difficult (I work primarily in a Windows/Linux environment) – on a Mac, it might take me just a little bit longer because I simply don’t use the platform often.

If you have questions or some interest in this work, let me know.

–tr

 Posted by at 10:01 am

MarcEdit 5.9 update

Apr 13, 2013
 

Here’s the final list of updates that I’ve been working on for this weekend:

  1. Enhancement: The MARCEngine will now allow users to embed mnemonics into a UTF8 encoded file and those mnemonics will be expanded into proper UTF8. This means that if you have a UTF8 file, and are working in the MarcEditor, you could enter {aacute} in the record and MarcEdit will expand it when compiling the records into the proper character: á.
  2. Enhancement: Reset the MarcEdit Config Data. Sometimes, folks will find that MarcEdit is having trouble saving configuration data. Generally, this is due to a file corruption. While it doesn’t happen often, when it does – it can be a pain. This option allows you to quickly reset your configuration data.
  3. Enhancement: Delimited Text Translator – when using the Autogenerate option, the program will now automatically ignore the first line of the file it’s processing (the rules line) when converting data to MARC.
  4. Bug Fix: RDA Helper: Updated the 345 field to capture motion picture information related to screen-size and 3D.
  5. Bug Fix: RDA Helper: The GMD autogenerator was creating duplicate entries when processing data that could be both an electronic resource and something else. When this happens, the program defaults to electronic resource.
  6. Bug Fix: Processing Greek UTF8 to Greek 1253 – the stream wasn’t properly outputting the data. This has been corrected. Moving from Greek 1253 to Greek UTF8 wasn’t affected by this issue.
  7. Enhancement: Export Tab Delimited Records – the tool now allows for the export of fixed field data by position. I.e., if you add a field less than 10, the subfield text box changes to position. The expected format is position:length (e.g., 5:1).
  8. Bug Fix: Fixed the typo in the OAIDCtoMARCXML.xsl (which is used in the default oai-dc translation by the OAI harvester).
  9. Bug Fix: Script wizard – fixed a typo in the outputted script.
  10. Enhancement: MARCValidator – added the ability to find incorrect field lengths. Updated some error descriptions to make them more descriptive.
  11. Bug Fix: RDA Helper – in some cases the first indicator in the 260 wasn’t being retained when the data is moved into a 264.
  12. UI Tweaks.
  13. Bug Fix: Corrected Indicator replacement options
  14. Linux/Mac Installs – I’ve made some changes so you can disable the Z39.50 part of the program and simplify installation. I’ll be updating the process/video to document the new method.
  15. Automatic Updater – I’ve made the changes necessary to support the redirections to the new server where MarcEdit is being hosted.
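
Two of the enhancements in the list above (mnemonic expansion in UTF8 files, and exporting fixed field data with a position:length value) lend themselves to a short illustration. The Python sketch below is only that: the mnemonic table is a tiny assumed subset, the function names are hypothetical, and treating the position as zero-based is an assumption rather than something stated here.

```python
import re

# Tiny, assumed subset of a mnemonic table; a real table would be far larger.
MNEMONICS = {"aacute": "\u00e1", "eacute": "\u00e9", "ntilde": "\u00f1"}


def expand_mnemonics(text, table=MNEMONICS):
    """Replace {name} with its character; unknown names are left exactly as typed."""
    return re.sub(r"\{(\w+)\}", lambda m: table.get(m.group(1), m.group(0)), text)


def fixed_field_slice(data, spec):
    """Return the characters selected by a 'position:length' spec such as '5:1'.

    The position is treated as zero-based here, which is an assumption.
    """
    position, length = (int(part) for part in spec.split(":"))
    return data[position:position + length]


expanded = expand_mnemonics("Bogot{aacute}")
date_type = fixed_field_slice("130413s2013    ohu", "6:1")
first_date = fixed_field_slice("130413s2013    ohu", "7:4")
```

In this sketch, `{aacute}` becomes á, while an unrecognized mnemonic is passed through untouched so no data is lost; the two slices pull a single character and a four-character run out of a sample fixed-field string.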

For a while, I’ll be hosting the MarcEdit files on both my Oregon State web space and the new domain, marcedit.reeset.net. However, sometime around June, I believe the Oregon State web space will go away.

Direct downloads:

If you run into trouble, let me know.

 Posted by at 12:21 am