LibraryFind notes & stuff

By reeset / In LibraryFind

I wanted to update folks on some LibraryFind work.  0.8.5.1 is pretty much ready to go.  The only things I’ve been waiting on are changes to two core components: the oai gem and the sru gem.  I’ve now added libxml support to the sru gem, covering both the current and past libxml branches.  This change should make Ruby SRU processing much faster, especially for large result sets.  When I added libxml support to the oai gem, performance went up by 4-800%, and I’m expecting similar speed increases when parsing large datasets with the sru gem.  The beauty of the changes is that they should be transparent to the user.  Prior to the change, this is how you would make a search_retrieve request:

# Example search_retrieve request.
require 'rubygems'
require 'sru'

# An iterator for search results which allows you to do stuff like:

client = SRU::Client.new 'http://z3950.loc.gov:7090/voyager'
for record in client.search_retrieve('"title=building digital libraries"')
   puts record
end

puts "\n\n"
puts "finished"

For users who want to keep using rexml as the processor, this syntax will still work.  For users who want the libxml parser, all you need to do is pass an optional parameter when creating the client object:

# Example search_retrieve request.
require 'rubygems'
require 'sru'

# An iterator for search results which allows you to do stuff like:

client = SRU::Client.new 'http://z3950.loc.gov:7090/voyager', :parser => 'libxml'
for record in client.search_retrieve('"title=building digital libraries"')
   puts record
end

puts "\n\n"
puts "finished"

That’s it.  Now, rather than returning a REXML::Document reference for each record, the program will return a LibXML::XML::Node reference. 

Internally, I removed the protected xpath processing functions and borrowed the xpath module I created for the oai gem.  Now, all xpath processing is pushed through this module, allowing the gem to use the appropriate parsing calls based on the document.class value.  Within LibraryFind, I’m adding metadata handlers to work with MARCXML and Dublin Core data.  At some point, I might migrate this code into the sru gem to include two predefined metadata handlers so that you can retrieve metadata either as the raw data object or as a formatted object.
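To give a rough idea of what dispatching on the document class can look like, here is a minimal sketch.  It is not the gem’s actual code: the module name XPathSketch and its methods are made up for illustration, and it assumes only that REXML objects are queried with REXML::XPath.first and libxml objects with find_first.  The class name is compared as a string so the libxml library does not have to be installed for the rexml path to run.

```ruby
require 'rexml/document'

# Hypothetical helper module; the names are illustrative and are not
# the sru or oai gem's real API.
module XPathSketch
  # Decide which parser produced the object by looking at its class name,
  # so the LibXML constant is never touched unless libxml-ruby is loaded.
  def self.parser_type(doc)
    case doc.class.to_s
    when /\AREXML::/       then :rexml
    when /\ALibXML::XML::/ then :libxml
    else raise ArgumentError, "unsupported document class: #{doc.class}"
    end
  end

  # Return the first node matching path, or nil if nothing matches.
  def self.xpath_first(doc, path)
    case parser_type(doc)
    when :rexml  then REXML::XPath.first(doc, path)
    when :libxml then doc.find_first(path)
    end
  end

  # Return the text of a node from either parser (nil for a nil node).
  def self.node_text(node)
    return nil if node.nil?
    parser_type(node) == :rexml ? node.text : node.content
  end
end

# Made-up sample record, parsed with rexml.
doc = REXML::Document.new('<record><title>Building Digital Libraries</title></record>')
title = XPathSketch.xpath_first(doc, '/record/title')
puts XPathSketch.node_text(title)
```

The calling code never has to know which parser is in use; it just hands over whatever object it got back and asks for a path.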

I’ve also finished work on the oai gem.  I checked in some changes about a week ago (adding support for the current version of libxml), but will be checking in an additional update this weekend, after a little testing.  The changes add some additional unit tests and correct the oai gem’s provider functionality to fix the dates generated in the oai output.  The dates needed to use the utc.xmlschema format rather than the localtime format.  This has been done, and oai data generated by the provider class is validating again.  It’s currently passing all unit tests, and once I confirm its functionality in some real-world applications here at OSU, I’ll update the code and push the changes for download.
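For anyone curious about the date difference, the snippet below is a small illustration using only Ruby’s standard library, not the provider code itself.  OAI-PMH expects datestamps in UTC with a trailing Z, which is what utc.xmlschema produces.  The fixed timestamp and the -08:00 offset are arbitrary values picked so the output is reproducible.

```ruby
require 'time' # makes Time#xmlschema available

# Arbitrary fixed instant (the Unix epoch) so the output never changes.
stamp = Time.at(0)

# Local-time rendering carries a zone offset; -08:00 is just an example.
local = stamp.getlocal('-08:00').xmlschema

# UTC rendering ends in "Z".
utc = stamp.getutc.xmlschema

puts local # 1969-12-31T16:00:00-08:00
puts utc   # 1970-01-01T00:00:00Z
```

Both strings describe the same instant; only the UTC form matches the format the validator was looking for.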

Once these components are available for download, LibraryFind 0.8.5.1 should be ready to package and ship.  This will be an intermediate release, as 0.9.0 is also finished and needs only formal testing.  So, once 0.8.5.1 is put to bed, I’m hoping to have only about a month to a month and a half before the 0.9.0 release, which will bring with it a much more interactive UI.

–TR