Apparently the Corvallis Gazette Times contacted Jeremy about our metasearch project. Here’s the article URL: http://gazettetimes.com/articles/2006/03/31/news/community/friloc01.txt
Cool beans.
–Terry
Just thought I’d post an update on our work since we have given the ok for our library administration to start internally testing the tool.
API Development:
Let’s see, new enhancements to the API/backend server:
New enhancements to the UI services:
Anyway, lots of cool stuff going on with this project. I imagine the UI will continue to change in the next few days/weeks before we look to cut this over to our production environment, but at this point I’m pretty happy with the results. And hopefully our patrons will be as well. Given that our current vendor-supplied metasearch tool has set such a low bar, I don’t think we have much to worry about. In fact, I’m more concerned about too much success. Given federated searching’s fairly large footprint, I’m thinking we will have to carefully monitor resource usage for a while to make sure our current systems can handle the expected load.
–Terry
I remember it wasn’t too long ago that I could work close to 72 hours straight without crashing. Well, this weekend, I worked close to 34 hours from Friday to Saturday around 5 pm, and then I actually had to take a nap. A nap. Who takes naps? Of course, after my 1 1/2-hour power nap, I was ready to go again, but I’ll admit, I didn’t recover nearly as fast as I normally do. How sad.
–Terry
Just a few random thoughts about Google Chat. Since I use my Gmail account for quite a bit of communication, I find that I’m nearly always logged in. Well, because of that, I use Google’s Chat a lot. Generally, in other chat clients, I tend to never log my conversations, but just today I was wishing I had, and found that Google does. I’m not sure what I think about that, personally. On the one hand, it was really handy having an archive of this conversation; on the other, there are many potential areas for abuse. I realize that you can simply delete conversations as you would any email, but I’m not sure I like the direction Google has been going lately. In the past, Google took a very hands-off approach to software and feature development, essentially requiring users to opt in before giving up information. However, that seems to have changed to an opt-out model. It’s a small change, but a big one.
–Terry
I’m starting to work on a new webservice to run on one of our Unix boxes (thank you MONO). Essentially, I’m looking to create a webservice that will be accessible via a WSDL file. Basically, this will let anyone pipe files to this webservice and translate metadata into or out of any metadata framework currently defined within MarcEdit (which is a few). It’s not ready at this point, but will be soon. When it’s ready, I’ll let folks know so that they can start testing it out.
–Terry
Whew! It’s been a while since my last post. Busy, busy, busy. Well, I’ve got a few posts I’d like to make tonight, so let’s start with OSU’s Metasearch development. I’ve been busy: a new API for doing inline filtering, adding new metadata formats, and then building sample XSLT transform types for our UI folks so that they can get working on making it “pretty”.
So what am I still looking to do? Well, to start — harvest. Harvesting images to start. I’ve identified some 15 million CONTENTdm images that we will be harvesting into our site. This should give our patrons access to one of the largest academic, archive image repositories around. Second, faster….I’m going to start experimenting with the Yaz proxy to see if I can get some of these poky databases to respond a little faster. When working with just EBSCO, our catalog and our harvested content, queries average about 6 seconds for response and rendering. When I add CSA into the mix, processing time jumps to about 12 seconds. Maybe the proxy can help (I hope).
So a screenshot: here is the latest from our UI group. I think it’s starting to look pretty good.
On the todo list:
So let’s talk about some of the questions I’ve received:
–Terry
I made the following changes:
<xsl:template name="topic">
  <topic>
    <xsl:if test="@tag=550 or @tag=750">
      <xsl:call-template name="subfieldSelect">
        <xsl:with-param name="codes">ab</xsl:with-param>
      </xsl:call-template>
    </xsl:if>
    <xsl:call-template name="setAuthority"/>
    <xsl:call-template name="chopPunctuation">
      <xsl:with-param name="chopString">
        <xsl:choose>
          <xsl:when test="@tag=180 or @tag=480 or @tag=580 or @tag=780">
            <xsl:apply-templates select="marc:subfield[@code='x']"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="subfieldSelect">
              <xsl:with-param name="codes">ab</xsl:with-param>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:with-param>
    </xsl:call-template>
  </topic>
  <xsl:apply-templates/>
</xsl:template>
The first part of this template is called if a MARC field 550 or 750 is encountered. However, rather than breaking out of the template, it allows the same data to be extracted again later in the template. I’ve found that you can just remove the first section so that it looks like:
<xsl:template name="topic">
  <topic>
    <xsl:call-template name="setAuthority"/>
    <xsl:call-template name="chopPunctuation">
      <xsl:with-param name="chopString">
        <xsl:choose>
          <xsl:when test="@tag=180 or @tag=480 or @tag=580 or @tag=780">
            <xsl:apply-templates select="marc:subfield[@code='x']"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="subfieldSelect">
              <xsl:with-param name="codes">ab</xsl:with-param>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:with-param>
    </xsl:call-template>
  </topic>
  <xsl:apply-templates/>
</xsl:template>
But this is just one of what looks like many problems with this stylesheet. Anyway, as always, the update can be downloaded from: MarcEdit50_Setup.exe
–Terry
For those that have been interested and following development, I’ve completed the harvesting component of the Metasearch tool. Basically, we are envisioning this tool as a hybrid search…we harvest as much data as we can but federate search when we have to. Then we bring together the results and rank them within the context of the returned results. Anyway, here’s the updated search screen:
You can see from the screenshot that the search is querying ~25 databases in about 8 seconds. The reason we are getting such good results is that many of these items have been harvested and indexed within our MySQL harvested database (which, by the way, is internally normalized to Dublin Core. I realize we lose some granularity in the metadata, but for our purposes [search], I think that it’s OK, though I guess we’ll see).
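To make the hybrid idea concrete, here is a rough Python sketch of "harvest what we can, federate when we have to". It is not the actual tool: the function names (`hybrid_search`, `score`), the record layout, and especially the scoring are made up for illustration. The score here just counts query terms in the title and description, which is a placeholder and not the ranking the tool really uses.

```python
from typing import Callable, Dict, Iterable, List

# A record is a flat dict of Dublin Core-style fields (assumed layout).
Record = Dict[str, str]
Searcher = Callable[[str], Iterable[Record]]


def score(record: Record, terms: List[str]) -> int:
    """Placeholder rank: count query-term occurrences in title + description."""
    text = (record.get("title", "") + " " + record.get("description", "")).lower()
    return sum(text.count(term) for term in terms)


def hybrid_search(query: str, local_index: Searcher,
                  remote_targets: Iterable[Searcher]) -> List[Record]:
    """Query the harvested index first, then each federated target, and
    rank everything together within the combined result set."""
    terms = query.lower().split()
    records = list(local_index(query))          # harvested, locally indexed data
    for target in remote_targets:               # databases we cannot harvest
        try:
            records.extend(target(query))
        except Exception:
            continue                            # a dead target should not sink the search
    return sorted(records, key=lambda r: score(r, terms), reverse=True)


# Stand-in searchers (hypothetical data) to show the call shape.
def harvested(query: str) -> List[Record]:
    return [{"title": "Salmon runs", "description": "salmon in Oregon rivers",
             "source": "harvested"}]


def remote_ok(query: str) -> List[Record]:
    return [{"title": "Forestry", "description": "salmon habitat", "source": "remote"}]


def remote_down(query: str) -> List[Record]:
    raise TimeoutError("target did not respond")


if __name__ == "__main__":
    for rec in hybrid_search("salmon", harvested, [remote_ok, remote_down]):
        print(rec["source"], rec["title"])
```

The point of the sketch is only the shape: local results come back quickly, slow or broken remote targets are skipped rather than fatal, and ranking happens once over the merged set.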
Currently, the ranking algorithm is fairly simple. It uses the following to create a numeric rank:
The number that comes up isn’t a percentage by any sense of the word, but it does seem to do a pretty good job of putting the most relevant result in the returned record set on the top. Anyway, I have a list of 2500 actual user searches and I’m going to be writing a script to beat the heck out of this tool, capturing error messages, time to process, number of results, etc. to see how this might work under load. Currently, we have a metasearch tool that we pay for, Innovative’s MetaFind. However, looking at the numbers sent to us by III, usage for this tool (and you have to realize, it’s been available for a year) has hovered around 90 queries a day. I know the system could easily handle this type of load, but we are expecting this to be successful.
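A script like the one described could be sketched roughly as follows in Python. This is only an outline under assumptions: `replay_queries`, `summarize`, and `fake_search` are hypothetical names, and `fake_search` stands in for whatever call actually hits the metasearch API. It records the three things mentioned above for each query: error message, time to process, and number of results.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class QueryResult:
    query: str
    seconds: float
    hits: Optional[int]      # None when the search failed
    error: Optional[str]     # None when the search succeeded


def replay_queries(queries: Iterable[str],
                   search: Callable[[str], int]) -> List[QueryResult]:
    """Run each query once, recording elapsed time, hit count and any error."""
    results = []
    for query in queries:
        start = time.perf_counter()
        try:
            hits, error = search(query), None
        except Exception as exc:  # keep going so one bad query does not stop the run
            hits, error = None, f"{type(exc).__name__}: {exc}"
        results.append(QueryResult(query, time.perf_counter() - start, hits, error))
    return results


def summarize(results: List[QueryResult]) -> dict:
    """Collapse a run into a few headline numbers."""
    ok = [r for r in results if r.error is None]
    return {
        "total": len(results),
        "failed": len(results) - len(ok),
        "mean_seconds": sum(r.seconds for r in ok) / len(ok) if ok else 0.0,
        "slowest": max(results, key=lambda r: r.seconds).query if results else None,
    }


def fake_search(query: str) -> int:
    """Stand-in for the real metasearch call (hypothetical); returns a hit count."""
    if not query.strip():
        raise ValueError("empty query")
    return len(query.split()) * 10


if __name__ == "__main__":
    log = replay_queries(["salmon habitat", "", "oregon forestry history"], fake_search)
    print(summarize(log))
```

In practice the query list would be read from the file of real user searches and `fake_search` swapped for the real request. Running the queries one at a time measures response time, not concurrency, so a true load test would also need to issue them in parallel.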
–Terry
Due to a request by Dan Chudnov regarding batch processing a set of MARC Authority records into MADS, I’ve made a couple of modifications to MarcEdit’s Batch Processing tool. First, I’ve set up MarcEdit so that it can now batch process any file type that is currently defined in the XML functions list. Second, I added the MARC=>MADS crosswalk to MarcEdit.
As always, the program can be downloaded from: http://oregonstate.edu/~reeset/marcedit/software/development/MarcEdit50_Setup.exe
–Terry
Oregon obviously isn’t known for its snow, especially on the Valley floor. But I get up today to ride my bike to work and what do I see….SNOW. Not a lot, but enough, and in March. How odd is that? Anyway, got to go hop on my bike.
–Terry