Open Source Federated search project

By reeset / In Digital Libraries, Programming

Something we’ve been working on for a little while is an open source federated search project. I’m sure folks are wondering why we would bother, considering the wide variety of vendor solutions currently available, but I think we have a good reason. In a sense, we are looking at building a library platform rather than just a federated search tool, and of course a major part of this will be OpenURL. I’ve written my own OpenURL resolver (which will be open-sourced soon) that supports all the currently defined namespaces in the OpenURL 0.1 and 1.0 specifications. I created it initially to test COinS, but I’m now finding that the resolver simplifies the linking of resources as I build this federated search tool.
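To make the linking idea concrete, here is a minimal Python sketch of how a search result might be turned into an OpenURL 1.0 link in key/encoded-value form. It is not the resolver described above. The resolver address, the `build_openurl` helper, and the sample citation are all made up for illustration; only the `Z39.88-2004` version string and the journal format identifier come from the OpenURL 1.0 standard.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver address; substitute your own link resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(citation, base=RESOLVER_BASE):
    """Return an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Keys prefixed with "rft." describe the referent (the item being linked to).
    params += [("rft." + key, value) for key, value in citation.items() if value]
    return base + "?" + urlencode(params)


# Made-up citation, as it might come back from one search target.
link = build_openurl({
    "atitle": "Sample Article",
    "jtitle": "Journal of Examples",
    "issn": "1234-5678",
    "date": "2006",
})
```

Because every target’s results are reduced to the same small set of citation fields, the search tool only has to know how to build this one kind of link, and the resolver decides where it should lead.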

Anyway, I see two big problems that need to be addressed if we are going to make this successful at our institution. First, the program needs to be as fast as possible. Obviously, this type of searching will always take some time, but I’ve been experimenting with some thread pooling and caching techniques that should provide a big speed improvement for most users. Second, the knowledge base needs to be easy to maintain. Within current federated search software, connectors, as they are often called, tend to be compiled code. I don’t think that they have to be. I’ve created some generic connection objects and am assigning values to these “connectors” via metadata. The idea is that anyone can edit metadata, so anyone should be able to update the knowledge base.
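Both ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project’s code: the `KNOWLEDGE_BASE` entries, the URLs, and the `fetch` stand-in are hypothetical, and a real connector would need far more metadata (result parsing rules, authentication, and so on).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from urllib.parse import quote_plus

# Hypothetical knowledge base: each "connector" is plain metadata, not compiled code.
KNOWLEDGE_BASE = {
    "catalog": {"url_template": "https://catalog.example.edu/search?q={query}"},
    "articles": {"url_template": "https://articles.example.org/find?term={query}"},
}


def fetch(url):
    """Stand-in for a real HTTP request so the sketch runs offline."""
    return f"results from {url}"


@lru_cache(maxsize=256)
def search_target(name, query):
    """Generic connection object: its behavior comes entirely from the metadata."""
    template = KNOWLEDGE_BASE[name]["url_template"]
    return fetch(template.format(query=quote_plus(query)))


def federated_search(query, max_workers=4):
    """Query every target in parallel; repeated queries are served from the cache."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(search_target, name, query)
            for name in KNOWLEDGE_BASE
        }
        return {name: future.result() for name, future in futures.items()}
```

Adding a target here means adding one dictionary entry, which is the point of keeping connectors as metadata. One caveat of this simple cache: if an entry is edited while the program is running, previously cached results for that target stay in place until `search_target.cache_clear()` is called.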

At least that’s the theory. We have the project in production on a very small scale, but now it’s time to start ramping it up. I’ll try to make a demo of what we are working on available at some point in the near future. I’m in the process of migrating the current codebase to a new server and adding relevancy ranking, caching, and thread pooling support, so I’ll post something when that’s finished.


3 thoughts on “Open Source Federated search project”

  1. What kind of timeline are you looking at before having something releasable as open source? I’m considering trying to install dbWiz this summer, but if yours is going to be available within a couple of months, I might hold off and see what you come up with.
    Also, what platform (Linux, Windows, etc.? distro/version?) are you developing for?


  2. It’s hard to say. We are looking to make it available to our general public before the end of May (it’s going through internal testing now), and then, given the requirements of the LSTA grant, we will be providing the source to our LSTA partners. When it’s all said and done, maybe 6-8 months, possibly sooner. For sure, I’ll post a schedule on this blog as soon as I know one.


  3. Oh, and as far as platform, it’s being developed to be platform independent.