LibraryFind code refactoring

By reeset / On / In LibraryFind, Programming, ruby

I’ve been spending time this weekend refactoring a major piece of the LibraryFind code, partly in an effort to make it easier to add protocol classes.  This change affects much of the current API code base, but the biggest change comes in the meta_search.rb file, where nearly all the business logic relating to searching will be removed in favor of a loose plug-in architecture that I’m hoping will make it easier to add additional search classes to the program.  However, as with all refactoring, there’s a bit of debugging that happens, and I tell you, this morning at 4 am, it just wasn’t happening.  The big change deals with about 200 lines of code in the meta_search.rb file (which in turn affects the files that actually make up the protocols and searching).  These 200 lines of code have been replaced by the following block:

if is_in_cache == false 
    _tmparray = 
    objSearch = nil 
    eval("objSearch = " + _collect.conn_type.capitalize + "") 
    _tmparray = objSearch.SearchCollection(_collect, _qtype, _qstring, _start.to_i, _max.to_i, _last_id, _session_id, _action_type, _data, _bool_obj) 
    if _tmparray != nil: record.concat(_tmparray) end 

Basically, this code snippet is called if the query isn’t located in the cache.  Originally, the code that this snippet replaced was a large case statement that performed different actions depending on which protocol was being used.  This snippet moves all of that logic into models, where the search classes are pluggable.  The protocol functions will follow a naming convention and take the same values (though they’ll do different things with them), which in theory will make it easier to add support for new search types.  At least, this is what I’m seeing at this moment as I add the ability to query OpenSearch targets to LF.
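
The plug-in idea can be sketched roughly as follows.  This is a minimal illustration under assumptions, not the LibraryFind code: the class names, the "SearchClass" suffix, the search_collection method and its trimmed-down argument list are all hypothetical, and the class is looked up with Object.const_get rather than eval.

```ruby
# Hypothetical protocol plug-ins: each exposes the same method name and
# takes the same arguments, but does its own protocol-specific work.
class OpensearchSearchClass
  def search_collection(collection, query)
    ["opensearch hit for #{query} in #{collection}"]
  end
end

class Z3950SearchClass
  def search_collection(collection, query)
    ["z39.50 hit for #{query} in #{collection}"]
  end
end

# Resolve the plug-in class from the connection type by naming convention.
# The "SearchClass" suffix is an assumption made for this sketch.
def search_plugin_for(conn_type)
  Object.const_get("#{conn_type.capitalize}SearchClass").new
end

# Dispatch to whichever plug-in matches and append its results, if any.
def run_search(conn_type, collection, query, record)
  results = search_plugin_for(conn_type).search_collection(collection, query)
  record.concat(results) unless results.nil?
  record
end

record = run_search("opensearch", "demo", "ruby", [])
```

Adding a new search type in this sketch means adding one class that follows the naming convention; the dispatching code does not change.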

Anyway, I worked on this for about 2 hours this morning.  The new plug-ins were working fine: items were going into the cache and results were being returned.  However, the results were being lost during the transition from the API to the UI.  Odd.  I couldn’t figure out what was going on, and debugging is difficult because the application is threaded (another change: a dedicated global thread pump), so sometimes errors occur while other parts of the application are executing.  Anyway, at 4 am, I decided to knock off and come back to it in the morning with fresh eyes to see what I was doing wrong.

Well, I’m glad I decided to sleep on it.  As I was in church this morning, I had an epiphany.  It’s in lines 7 and 8.  The way Ruby’s threading works, variables within the threads are isolated and protected from the rest of the application.  To deal with that, Ruby has a syntax that allows you to create thread variables that can be accessed outside the thread.  So for example, if I have a variable that I want to access outside of the thread, I would use something like:

Thread.current["myrecord"] =

This syntax is how plug-ins utilizing the global thread pump will return data to the application.  And there was the rub.  I’d forgotten that Ruby always returns a value from a function: the value of the last expression evaluated, whether or not you ask for it.  Simply for clarity, I always explicitly note what is being returned at the end of each function using the older return syntax:

return xxxxx

I’d conveniently forgotten that feature of the language, and this is what was gumming up the process.  The thread pump would finish evaluating the threads and capture the thread data, and then a string or common array would also be returned (outside of the thread pump) which, since it was not nil, would overwrite the current record variable.  Once I had these plug-ins return nil and allowed data processing to be handled by the thread pump, all was right in the world again.  Unfortunately, I lost 2 hours of sleep last night on this problem, and I’d like to have them back. 🙂
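
The gotcha can be reproduced in a few lines.  This is a minimal sketch under assumptions, not the LibraryFind thread pump: start_search, leaky_plugin, quiet_plugin and collect are hypothetical names, and the "pump" here is just a list of threads that gets joined.

```ruby
# Runs a search on a worker thread; results are handed back through a
# thread-local variable, which is read from outside via Thread#[].
def start_search
  Thread.new { Thread.current["myrecord"] = ["hit 1", "hit 2"] }
end

# Implicit return: the last expression evaluated (the string) is the
# method's return value even though there is no `return` keyword.
def leaky_plugin(threads)
  threads << start_search
  "search finished"
end

# Fixed version: explicitly return nil so the caller ignores the return
# value and relies on the thread data alone.
def quiet_plugin(threads)
  threads << start_search
  return nil
end

# Hypothetical caller: gathers thread data, then lets any non-nil return
# value overwrite it, which is the overwrite described above.
def collect(plugin_name)
  threads = []
  returned = send(plugin_name, threads)
  record = threads.flat_map { |t| t.join; t["myrecord"] }
  record = returned unless returned.nil?
  record
end

leaky = collect(:leaky_plugin)   # the string replaces the real results
quiet = collect(:quiet_plugin)   # the thread data survives
```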


Roy Tennant vs RDA (and AutoCat) :)

By reeset / On / In Digital Libraries, Uncategorized

Ah, what fun.  Working in Technical Services, I tend to lurk on the AutoCat list to keep up and get an idea of what folks are chatting about there.  Normally the conversation is on traditional cataloging issues, but Roy’s latest musings in Library Journal (“Will RDA Be DOA”) seem to have raised people’s hackles.  Predictably, catalogers were offended by the article, in part, I think, because much of the blame for how our ILS systems currently function seems to fall unfairly at their feet.  This is unfortunate, because I think that Roy’s point has gotten lost in the current discussion on the list: that our current bibliographic frameworks are not sufficient for meeting future needs.  But I think more explanation is needed here, since many people will read this statement and read into it that I’ve just said that MARC, AACR2 and the people that use them suck, which isn’t the case.  Rather, it represents a need to look at our current bibliographic frameworks (MARC, AACR2 and RDA) and evaluate not how they are working for us today (or yesterday), but whether they will meet our needs in a future where the library community and its data have become less isolated from the rest of the world.  We live in a changing information ecosystem, and libraries need to change with it.  While the retirement of MARC and AACR2 may be the eventual end-game, I doubt those that create such records would really see a big difference in what they do.  In fact, I should point out, to some degree this is already occurring.  Folks that catalog using OCLC’s Connexion client are already cataloging in XML.  The client saves data in XML templates and transmits data in XML, but generates MARC records for export.  So I certainly could envision a future where MARC has been replaced by something else, but where current catalogers simply describe things as they always have.  Anyway…

So what do I mean when I say that our current bibliographic frameworks are not sufficient for meeting future needs?  Well, let’s talk about this in terms of AACR2, RDA and MARC.  There are two glaring issues with our current bibliographic frameworks, and I’m not certain how we solve them until we, as a community, move from MARC to something else.  I’ll also note that I don’t hear many people talking about them, which I think is too bad, because I think that they are issues that catalogers may relate to better.  Generally, this conversation regarding bibliographic frameworks is framed in relation to what systems folks or coders don’t believe MARC can do.  Sometimes they’re right, sometimes they’re wrong; most of the time, they are running into real-world implementations of a framework that is constantly in a state of flux and is being interpreted by different individuals.  However, in many ways, I think that this line of conversation is fruitless.  I’d like to focus my discussion on two issues that I run into while helping MARC users around the globe.

  1. MARC doesn’t interoperate in its current form.  What do I mean?  Well, during the current thread, Roy discussed a need to isolate full-text materials within his catalog.  AutoCat’rs quickly noted that this information can, if encoded correctly, be inferred from the 856 field, which encodes the URL.  Well, no.  In MARC21 when utilizing AACR2, the 856 field encodes the URL information.  However, this is different in CHINMARC, FINMARC, UNIMARC, etc.  The point is, MARC has lots of flavors spanning many different character sets.  Having created MarcEdit, I’ve gotten the opportunity to work with catalogers around the world, and I can tell you without hesitation that MARC flavors do not play well together.  It’s a struggle because OCLC and the Library of Congress allow our profession to have a very North American focus (which I know RDA is hoping to overcome), but as long as flavors of MARC exist, so too will the cataloging community continue to be splintered.  Believe it or not, OCLC represents only a small part of the current MARC records being created, and not everyone uses the Library of Congress as their gold standard.  MARC21 uses MARC-8 and UTF-8, but I work with a number of folks in Asia where they use Big5 or other encodings, making these records completely incompatible with MARC21 records.  This is one of the benefits of a metadata schema like MODS: title, etc. are placed in the same place, no matter what descriptive rules are applied to the framework.  Users may use different punctuation rules, etc., but the data will be the same.  This isn’t currently the case when dealing with MARC.
  2. MARC, AACR2 and, I believe, RDA continue to isolate our community.  Who else uses MARC?  Anyone?  While AACR2, MARC, etc. have served our communities for a number of years (~40), it might be time to put this pony out to stud and develop a framework and metadata schema that will allow the library community to leverage mindshare from outside our small community.  Currently, the tools and professional vision of our profession continue to be shaped by a small number of vendors providing solutions for our MARC data.  If the library community adopted MODS or a variety of metadata schemas (for example, FGDC for cartographic materials, MODS for books and serials, etc.), then while our bibliographic frameworks would still be our own and library-centric, the ability to build tools for our data and to integrate data with it would expand beyond our little community.  The global IT community speaks XML, not MARC.  It’s time we join the rest of the world in this regard.

I realize that agreeing with Roy, even partly, may cause me to forfeit my secret technical services decoder ring :), but I think that at some point, this is a conversation that the technical services community needs to seriously have.  I know, I know: we are having this conversation now with RDA.  Well, no, because RDA allowed our current bibliographic framework to be part of the discussion and to some degree guide decisions.  We need to have this conversation without considering what we are doing now.  For me, the biggest concern that I have with our current bibliographic frameworks is the way in which they isolate our community.  There are a number of very bright people working in libraries, but imagine what our community could do if we could tap into the mindshare outside our little community and leverage open source projects directly, without having to first take our data out of MARC and into something like MARC21XML, MODS, etc.  As someone doing some of this work, I find it telling that the first step in designing any system around data currently in MARC is that I have to take the data out of MARC, correct it for inconsistencies, and massage it to make it more straightforward, just so that the information is useful within non-library systems.


MarcEdit 5 update

By reeset / On / In MarcEdit

I’ve posted an update to MarcEdit.  Couple of changes:

  1. MARC21XML output: Larry Dixon from LC let me know that the xmlschema location value was incorrectly set on occasion.  I didn’t notice for two reasons: 1) it doesn’t affect my XSLT processors and 2) I don’t use namespaces.
  2. Export Selected/Delete Selected: ability to use a file to include a batch set of arguments to select.  If you want to use this function, you will need to include: FILE#:[filename] in the find text box.
  3. Export Selected/Delete Selected: improved performance, particularly when adding items to the list (using a virtual list now).
  4. Script wizard changes:  I need to make a few more, but I’ll post these tomorrow.
  5. Tab Delimiter Translator: couple of small updates to continue cleaning the regular expression engine.


You can download the update at: MarcEdit50_Setup.exe