Federated Search limitations

By reeset / On / In Digital Libraries, Innovative Interfaces

Today we got to see firsthand the fragility of the current state of federated search technology. As a beta site for Innovative, we have been testing their 2006 general release software ahead of general distribution, and an update last night caused ports to fail, limited available resources, and broke keyword index searching, leading to wonky results. The problems were first noticed by our Technical Services group, but they showed up later in LibraryFind, which searches our III catalog remotely. The problems introduced by the update have essentially crippled outside interaction with our ILS. While Innovative will (I hope) correct this quickly, it underlines something I’ve been aware of throughout this process of working on LibraryFind: our current model, the federated search model, is broken.

Why broken? Well, within the federated search model, there are just too many unknown variables over which the system literally has no control. Start with query normalization. Targets interpret user queries differently across the board, so the same query can behave very differently at one target than at another. This makes things difficult, since any tool built on this model must either code specifically for each target or provide a generalized search that normalizes to the most common target format. Likewise, as we saw with our ILS, sometimes targets fail. And when they fail, there is really nothing the system can do but report the error to the user.
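To make those two problems concrete, here is a minimal sketch of a federated search broker, assuming two hypothetical targets (`catalog` and `articles`) with made-up query syntaxes. Each target needs its own translation of the user's query, and when a target fails (here `catalog_search` simulates a refused port) the broker can only record the error. None of these names are LibraryFind's or Innovative's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical adapter pair per target: (translate, search).
Translate = Callable[[List[str]], str]
Search = Callable[[str], List[str]]


def catalog_translate(terms: List[str]) -> str:
    # Assumed catalog syntax: keyword-index prefix, terms ANDed together.
    return " and ".join(f"kw={t}" for t in terms)


def articles_translate(terms: List[str]) -> str:
    # Assumed article-database syntax: every term marked as required.
    return " ".join(f"+{t}" for t in terms)


def catalog_search(query: str) -> List[str]:
    # Stand-in for the ILS after a bad update: the port refuses connections.
    raise ConnectionError("port refused")


def articles_search(query: str) -> List[str]:
    # Stand-in for a healthy remote target.
    return [f"article matching {query!r}"]


TARGETS: Dict[str, Tuple[Translate, Search]] = {
    "catalog": (catalog_translate, catalog_search),
    "articles": (articles_translate, articles_search),
}


def federated_search(
    user_query: str,
    targets: Dict[str, Tuple[Translate, Search]],
    timeout: float = 5.0,
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Send one user query to every target; collect hits and per-target errors."""
    terms = user_query.lower().split()  # the only normalization the broker controls
    results: Dict[str, List[str]] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {
            name: pool.submit(search, translate(terms))
            for name, (translate, search) in targets.items()
        }
        for name, future in futures.items():
            try:
                # Bounds how long we wait for each result; the pool still joins
                # any still-running worker threads when the `with` block exits.
                results[name] = future.result(timeout=timeout)
            except Exception as exc:  # target down, timed out, or bad reply
                # Nothing to recover here: just report which target failed and why.
                errors[name] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

With the simulated catalog outage, `federated_search("Oregon history", TARGETS)` returns hits only from `articles` and an `errors` entry for `catalog`. The broker can show that error to the user, but it has no way to fill the gap.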

So how do we fix the model? This is why LibraryFind is as much a harvester/indexer as it is a federated search tool. Our underlying belief while developing this program is that we want to harvest and index as much data as possible, so the tool is set up to take advantage of harvesting as vendors become more comfortable allowing their data to be harvested. Of course, this brings its own set of issues to the forefront, but I would gladly deal with database/indexing issues over the current federated searching issues.
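The payoff of harvesting is that query interpretation moves in-house. As a rough illustration (not LibraryFind's actual indexer), the sketch below loads harvested records into a small local inverted index, assuming hypothetical records shaped like `{"id": ..., "title": ..., "subjects": ...}`. Every record passes through the same tokenizer, so a query means the same thing against every source, and a remote outage no longer affects search.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # One normalization rule, applied identically to every harvested source.
    return re.findall(r"[a-z0-9]+", text.lower())


def record_text(record: dict) -> str:
    # Assumed record fields: "title" is required, "subjects" is optional.
    return record["title"] + " " + record.get("subjects", "")


class LocalIndex:
    """Tiny in-memory inverted index over harvested records (illustrative only)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.records: Dict[str, dict] = {}

    def add(self, record: dict) -> None:
        rid = record["id"]
        if rid in self.records:
            # A re-harvested record replaces the old version entirely.
            self.remove(rid)
        self.records[rid] = record
        for term in set(tokenize(record_text(record))):
            self.postings[term].add(rid)

    def remove(self, rid: str) -> None:
        old = self.records.pop(rid, None)
        if old is None:
            return
        for term in set(tokenize(record_text(old))):
            self.postings[term].discard(rid)

    def search(self, query: str) -> List[str]:
        # AND semantics: return ids of records containing every query term.
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

For example, after adding `{"id": "b1", "title": "History of Oregon"}` and `{"id": "b2", "title": "Oregon Trail diaries"}`, `search("oregon")` returns both ids and `search("oregon history")` returns only `b1`. The trade-off is the one named above: the problems become keeping the index complete and current rather than coping with remote targets.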

So what are we planning to do with our ILS? Well, for now, we will continue to query our data remotely, but we are looking at ways to simply harvest all our data and index it locally. The problem, of course, is that this needs to be a fairly real-time process, and I’m not sure we will find an export method that can support those requirements, but we shall see.
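"Fairly real time" usually comes down to frequent incremental harvests: ask the source only for records changed since the last checkpoint, apply updates and deletions, then advance the checkpoint. OAI-PMH expresses this with its `from` datestamp; whether the ILS can expose anything comparable is exactly the open question. The sketch below assumes a hypothetical `fetch_changes(since)` function that returns changed or deleted records modified at or after `since`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class Change:
    record_id: str
    modified: datetime
    deleted: bool = False
    data: Optional[dict] = None


class IncrementalHarvester:
    """Keeps a local copy in sync by polling a hypothetical change feed."""

    def __init__(self, fetch_changes: Callable[[datetime], List[Change]]) -> None:
        # Assumption: fetch_changes(since) returns changes with modified >= since
        # (inclusive, like OAI-PMH's `from`). Re-applying a change is harmless
        # because both upserts and deletes are idempotent.
        self.fetch_changes = fetch_changes
        self.store: Dict[str, dict] = {}
        self.checkpoint = datetime.min.replace(tzinfo=timezone.utc)

    def poll(self) -> int:
        changes = self.fetch_changes(self.checkpoint)
        for change in sorted(changes, key=lambda c: c.modified):
            if change.deleted:
                self.store.pop(change.record_id, None)
            else:
                self.store[change.record_id] = change.data or {}
            # Advance only after applying, so a crash mid-batch re-fetches the rest.
            self.checkpoint = max(self.checkpoint, change.modified)
        return len(changes)
```

Calling `poll()` on a short interval approximates real time; how short that interval can be depends on how cheaply the source can answer "what changed since X?".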