Nov 30, 2006
 

I love strongly typed languages.  Languages that require you to define all variables before use.  They save me from myself, or rather, from my atrocious spelling.  I was working on LibraryFind, OSU’s open source metasearch service, three days ago, making a few tweaks and changes.  One of the things this tool has that is somewhat researchy is the idea of caching all results and reusing that cache across different sessions.  Anyway, I was making a few changes to the following lines:


      #======================================================
      # Check to see if data was cached -- if it is load
      #======================================================
      if _search_id != nil
         _lxml = CachedSearch.retrieve_metadata(_search_id, _collect.id, _max.to_i)
         if _xml != nil
           if _lxml.status == LIBRARYFIND_CACHE_OK
             if _lxml.data != nil
               _lrecord =  _objRec.unpack_cache(_lxml.data, _max.to_i)
               _results_count = _results_count +  _lrecord.length
               record = record.concat(_lrecord)
               is_in_cache = true
             else
               is_in_cache = true
             end
           elsif _lxml.status == LIBRARYFIND_CACHE_EMPTY
             is_in_cache = true
           end
         end
      end


 

See what I did?  I’m sure you do.  However, I didn’t until last night, when I thought that the caching service was running really slowly.  So I looked in the code and found my problem.  A typo: _xml where it should have been _lxml.  It should have looked like:


      #======================================================
      # Check to see if data was cached -- if it is load
      #======================================================
      if _search_id != nil
         _lxml = CachedSearch.retrieve_metadata(_search_id, _collect.id, _max.to_i)
         if _lxml != nil
           if _lxml.status == LIBRARYFIND_CACHE_OK
             if _lxml.data != nil
               _lrecord =  _objRec.unpack_cache(_lxml.data, _max.to_i)
               _results_count = _results_count +  _lrecord.length
               record = record.concat(_lrecord)
               is_in_cache = true
             else
               is_in_cache = true
             end
           elsif _lxml.status == LIBRARYFIND_CACHE_EMPTY
             is_in_cache = true
           end
         end
      end

Since Ruby, like many scripting languages, happily lets you use variables without declaring them, I didn’t notice it.  And since the program kept running, albeit more slowly, it didn’t dawn on me that I’d caused a boo-boo.  It wasn’t until last night, two days later, that I found it while doing a code audit.  A practice I have when dealing with scripting languages is to audit modified code weekly to inventory the life of each variable/process.  It’s something I do mostly as a way to eliminate unneeded variable usage, but in this case it helped me find this problem.  Ack.
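
For what it’s worth, here is a minimal sketch of how a misspelled name can pass silently in Ruby.  Ruby treats a name as a local variable (with a value of nil) once an assignment to it appears earlier in the method, even if that assignment never runs.  The method names and the earlier _xml assignment below are assumptions for illustration only; they are not taken from the LibraryFind code.

```ruby
# Hypothetical stand-in for the cache check; not the LibraryFind method.
def cache_hit?(cached)
  _xml = nil if false   # never executes, but makes _xml a known local whose value is nil
  _lxml = cached
  if _xml != nil        # typo for _lxml: always false, and no error is raised
    return true
  end
  false
end

# The same check with the name spelled correctly.
def cache_hit_fixed?(cached)
  _lxml = cached
  _lxml != nil
end
```

With the typo in place, the check quietly fails for every lookup, so the cached data is never used and nothing complains about it.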

 

–TR

 Posted by at 10:25 am
Nov 29, 2006
 

I have a number of posts that I’ve started over the past month, but I’ve obviously been busy.  At some point, I’ll start writing about why, but for now, I’m catching up.  For example:

Alyce and I took Kenny and Nathan trick-or-treating this Halloween.  They were cuties.  Kenny is obviously Superman, and Nathan, he’s Elmo.  I liked to joke that he was Tickle Me Elmo, because if you tickle him, he laughs :).  Everyone loved Nathan’s costume, and we received much candy as a result.  In fact, it’s nearly December, and we still have a bag full of it.  Ugh. 

 Posted by at 11:48 pm
Nov 29, 2006
 

Kenny and Nathan found their way into our local newspaper just before Thanksgiving.  They were down at the library playing turkey bingo (well, Kenny was playing, Nathan was playing) and got their pictures taken.  Here’s the picture with associated captioning information:

 
URL: http://www.itemizerobserver.com/images/4787.jpg

Credit: Photo by Sarah Hillman
Date Published to Web: 11/22/2006
Caption:
  Kenny Reese, 5, throws his arms wide in triumph as brother Nathan, 2, at center smiles.

 

Kenny was pretty excited.  We got one of the newspapers so he could keep a copy of his special paper.  Lots of fun.

–TR

 Posted by at 2:02 pm

Google Spell API — Ruby

Nov 29, 2006
 

Someone had asked if I could post the Ruby code we use to interact with the Google Toolbar spell API.  Well, here it is.


require 'net/https'
require 'uri'
require 'rexml/document'

class GoogleSpell
   def GetWords(phrase)
     results = []
     x = 0
     i = 0

     phrase = phrase.downcase
     phrase = phrase.gsub("&", "&amp;")
     phrase = phrase.gsub("<", "&lt;")
     phrase = phrase.gsub(">", "&gt;")
     word_frag = phrase.split(" ")
     word_frag.each do |lookup|
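       # The XML request markup was lost from this listing; the wrapper and
       # attribute values below are an assumed reconstruction, not the original.
       words = '<spellrequest textalreadyclipped="0" ignoredups="1" ignoredigits="1" ignoreallcaps="0"><text>' + lookup + '</text></spellrequest>'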
       gword = Hash.new()
       gword["original"] = lookup;
       gword["data"] = ""
       http = Net::HTTP.new('www.google.com', 443)
       http.use_ssl = true
       http.verify_mode = OpenSSL::SSL::VERIFY_NONE
       response =  http.start {|net|
         net.request_post("/tbproxy/spell?lang=en", words) {|response|
           doc = REXML::Document.new response.body
           nodelist = doc.elements.to_a("//c")
           nodelist.each do |item|
             if item.text.downcase != gword["original"]
               gword["data"] = item.text.downcase
             else
               gword["data"] = ""
             end
           end
         }
       }
       results << gword
     end
     return results
   end
end

--TR

 Posted by at 1:44 pm
Nov 05, 2006
 

Ouch.  On Slashdot today, two not-so-flattering articles surrounding Wikipedia.  In the first, there are reports of the German version of Wikipedia being used as a platform for spreading a virus.  An interesting idea.  Given that folks trust Wikipedia, no one seems to think twice about clicking on links that go outside of the tool. 

And then the second article, which goes to the trust issue.  As Wikipedia pushes itself into the mainstream, questions of plagiarism are sure to come up, and they have.  Testing 12,000 articles, a researcher found a number of instances (128) of plagiarism within the encyclopedia.  See this article here.

 

–TR

 Posted by at 1:51 pm
Nov 03, 2006
 

I’d run across these a few weeks ago and thought they were pretty nifty. Essentially, I was looking for something that would allow Oregon State University’s CONTENTdm collections to be harvested by Google. Since CONTENTdm has an OAI interface, and Google Scholar supports OAI harvesting, I thought there must be an easy way to get this set up. Fortunately, Google’s Sitemap facility provides a method for this to happen. By using the OAI server as the sitemap, I was able to get Google to quickly harvest and index our CONTENTdm collections.

Information on the Google Sitemaps can be found on the Google Sitemaps Help documentation site.
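
To give a rough sense of what a harvester reads when an OAI server is used this way, here is a small sketch that pulls the identifiers and datestamps out of an OAI-PMH ListIdentifiers response.  The sample XML is made up (it is not real CONTENTdm output, and it leaves out the namespace declarations a real response carries), and the list_headers helper is a hypothetical name of my own; only the header, identifier, and datestamp element names come from the OAI-PMH protocol.

```ruby
require 'rexml/document'

# Made-up sample response; real OAI-PMH output also declares XML namespaces.
SAMPLE_OAI = <<~XML
  <OAI-PMH>
    <ListIdentifiers>
      <header>
        <identifier>oai:example.org:collection/1</identifier>
        <datestamp>2006-10-01</datestamp>
      </header>
      <header>
        <identifier>oai:example.org:collection/2</identifier>
        <datestamp>2006-10-15</datestamp>
      </header>
    </ListIdentifiers>
  </OAI-PMH>
XML

# Hypothetical helper: returns one hash per <header> element in the response.
def list_headers(xml)
  doc = REXML::Document.new(xml)
  doc.elements.to_a("//header").map do |header|
    {
      "identifier" => header.elements["identifier"].text,
      "datestamp"  => header.elements["datestamp"].text
    }
  end
end
```

Calling list_headers(SAMPLE_OAI) gives back an array of hashes, one per record, which is essentially the list of items and last-modified dates a crawler needs.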

[update]
Some folks have asked (like the comment below) how this works. Well, I created a small script that replaces the oai.exe process in CONTENTdm, at least for Google’s purposes. The script basically just handles the OAI request. Here’s the simple code:




–Terry

 Posted by at 12:38 am