Metasearch work

By reeset / In Digital Libraries

Whew! It’s been a while since my last post. Busy, busy, busy. Well, I’ve got a few posts I’d like to make tonight, so let’s start with OSU’s Metasearch development. I’ve added a new API for doing inline filtering, added new metadata formats, and built sample XSLT transform types for our UI folks so that they can get working on making it “pretty”. 🙂

So what am I still looking to do? Well, to start: harvest, beginning with images. I’ve identified some 15 million CONTENTdm images that we will be harvesting into our site. This should give our patrons access to one of the largest academic, archival image repositories around. Second, faster. I’m going to start experimenting with the YAZ Proxy to see if I can get some of these poky databases to respond a little faster. When working with just EBSCO, our catalog and our harvested content, queries average about 6 seconds for response and rendering. When I add CSA into the mix, processing time jumps to about 12 seconds. Maybe the proxy can help (I hope).

So, a screenshot: here is the latest from our UI group. I think it’s starting to look pretty good.


On the todo list:

  1. Citations — building email citations for all resource types.
  2. Saving/Exporting search results. This will allow users to save their search results to disk or online and just open these files any time that they like.
  3. Caching queries. This is being done already to some degree — but this will be enhanced.

So let’s talk about some of the questions I’ve received:

  • How’s OpenURL being utilized?
    Well, this is actually the fun part. Since I’ve written my own OpenURL resolver, it’s given us a lot of flexibility in terms of what gets resolved. For one, I’m actually resolving all resources. So, when a query is done, the OpenURL resolver checks to see if we have holdings. If we do, the resolver quickly resolves the entries to see if the user can actually get to the resource. If they can, the direct URL is offered to the UI. If not, the resolver inserts a link directly to the journal instead. A link to the OpenURL resolver is also displayed to the user via the UI, so users can see if we have the resource in our catalog or ILL (interlibrary loan) the item directly through the system. I guess the gist is that our OpenURL resolver is actually running the federated search. In most cases, the items we query do not return URLs to the item; they return metadata. The OpenURL resolver solves a number of problems relating to resolving resources, and it also provides resolution capabilities for DOIs, PMIDs, LCCNs, OCLC numbers, etc.
  • How do you plan on dealing with the knowledge-base?
    This is a harder question. The program is metadata-based, which definitely saves me from having to code connectors, but as Roy Tennant has pointed out to me, you still have to maintain the metadata for the connectors, and if you have to deal with thousands of items, this could be tedious. Well, one thing that I’m doing to try to make this easier is to make the knowledge-base a community commodity. I’ve done this with the OpenURL implementation that I’ve created (I’ve been sharing it with some folks in Oregon that need an OpenURL resolver) and will do this with the Metasearch tool as well. This way, the process becomes a community effort. Another element that I’ve added to the knowledge-base management is the ability to create virtual collections. This will allow a user to create sets that then inherit the properties of the virtual collection. An example of where this might be used: we subscribe to Ebscohost, roughly 12 databases, each with its own connection profile. However, the profiles are hardly unique. The only difference between them is essentially the database being connected to. By creating a virtual collection, I can propagate common properties to all items that belong to this collection. This way, if Ebscohost changes their connection profile, I only have to modify the virtual collection and all 12 databases change as well. Pretty cool.
  • Filtering, FRBR, etc?
    Filtering actually turned out to be easy. One API call and a specialized stylesheet, and filtering was a snap. FRBR will be coming. In many cases, resources provide subjects, authors, etc. We’ll initially be setting up FRBR elements that show groupings of subjects and authors. We’ll have to see what else we do from there.
  • What to do about databases that cannot be searched?
    Well, I have a couple of ideas. We have a handful of very rich databases that store our database resources. What I’m planning on doing is adding an API that will take the search criteria, break it down, and then find the databases within these resources that best match the search criteria.
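
For anyone who wants to picture the OpenURL answer above, here is a rough sketch of the link decision in Python. It is not the resolver's actual code: the `Record` fields, the `HOLDINGS` table, the `can_access` check and the example.org addresses are all made up for illustration, and the query string is a simplified OpenURL-style one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional
from urllib.parse import urlencode

# Hypothetical resolver address and holdings table (ISSN -> journal page).
RESOLVER_BASE = "https://resolver.example.org/openurl"
HOLDINGS: Dict[str, str] = {
    "1234-5678": "https://journals.example.org/1234-5678",
}


@dataclass
class Record:
    """Metadata returned by a searched database (usually no item URL)."""
    atitle: str
    issn: str
    article_url: Optional[str] = None  # direct link, when one can be built


def build_links(record: Record,
                holdings: Dict[str, str],
                can_access: Callable[[str], bool]) -> Dict[str, str]:
    """Return the links to hand to the UI for one search result."""
    # A resolver link is always offered so the user can check the catalog
    # or place an interlibrary loan request.
    query = urlencode({"issn": record.issn, "atitle": record.atitle})
    links = {"resolver": f"{RESOLVER_BASE}?{query}"}

    journal_url = holdings.get(record.issn)
    if journal_url is None:
        return links  # no holdings: resolver link only

    if record.article_url and can_access(record.article_url):
        links["direct"] = record.article_url   # user can reach the item
    else:
        links["journal"] = journal_url         # fall back to the journal
    return links


record = Record("A sample article", "1234-5678",
                article_url="https://journals.example.org/1234-5678/a1")
print(build_links(record, HOLDINGS, can_access=lambda url: False))
```

Passing the access check in as a function keeps the sketch honest about what it doesn't know: how access is really verified depends on the provider.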
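
The virtual collection idea from the knowledge-base answer above can also be sketched in a few lines of Python. Again, this is only an illustration of the inheritance, not how the tool stores its profiles: the property names (`host`, `port`, `timeout`, `database`), their values and the database names are all assumptions.

```python
from collections import ChainMap

# Hypothetical shared connection properties for the virtual collection.
ebsco_collection = {"host": "search.example.org", "port": 210, "timeout": 30}


def make_profile(virtual_collection: dict, **own_properties) -> ChainMap:
    """Build a database profile that inherits from a virtual collection.

    Lookups check the database's own properties first, then fall back to
    the virtual collection. The collection is referenced, not copied, so
    later changes to it show up in every member profile.
    """
    return ChainMap(own_properties, virtual_collection)


# Each member only records what is unique to it: the database name.
profiles = {
    name: make_profile(ebsco_collection, database=name)
    for name in ("db_one", "db_two", "db_three")
}

# The vendor changes its connection details: edit the collection once.
ebsco_collection["port"] = 2100

for name, profile in profiles.items():
    print(name, profile["host"], profile["port"], profile["database"])
```

The point of the layered lookup is that a single edit to the collection reaches every member, while a database can still override a property of its own if it ever needs to.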