I’d run across Google Sitemaps a few weeks ago and thought they were pretty nifty. I was looking for a way to get Oregon State University’s CONTENTdm collections harvested by Google. Since CONTENTdm has an OAI interface, and Google Scholar supports OAI harvesting, I figured there had to be an easy way to set this up. Fortunately, Google’s Sitemap facility provides one: by using the OAI server as the sitemap, I was able to get Google to harvest and index our CONTENTdm collections quickly.
Information on Google Sitemaps can be found on the Google Sitemaps Help documentation site.
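For context, an OAI-PMH harvester talks to the server with ordinary HTTP GET requests whose query string names a `verb` (for example `ListRecords`) plus its arguments, such as `metadataPrefix=oai_dc`. The short PHP sketch below only shows what such a request URL looks like. The function name `oaiRequestUrl` and the `example.org` endpoint are hypothetical placeholders, not part of CONTENTdm or of Google's service.

```php
<?php
// Hypothetical helper: build an OAI-PMH request URL of the kind a harvester sends.
// $baseUrl is a placeholder; substitute your own OAI endpoint.
function oaiRequestUrl(string $baseUrl, array $params): string
{
    // http_build_query() URL-encodes each key/value pair and joins them with '&'.
    return $baseUrl . '?' . http_build_query($params);
}

// Ask the (hypothetical) endpoint for all records in Dublin Core format.
$url = oaiRequestUrl('http://example.org/oai.php', [
    'verb'           => 'ListRecords',
    'metadataPrefix' => 'oai_dc',
]);
echo $url, "\n";
// prints http://example.org/oai.php?verb=ListRecords&metadataPrefix=oai_dc
```

Everything after the `?` is the part a pass-through script has to hand on unchanged to the real OAI server.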
[update]
Some folks have asked how this works (see the comment below). I created a small script that stands in for the oai.exe process in CONTENTdm, at least for Google’s purposes. The script simply passes the OAI request through to oai.exe and returns its XML response. Here’s the simple code:
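<?php
// Relay Google's OAI-PMH request to CONTENTdm's oai.exe and return the XML response.
header("Content-Type: text/xml");
// Simpler alternative that loads the whole response into memory at once:
//print file_get_contents("http://digitalcollections.library.oregonstate.edu/cgi-bin/oai.exe?" . $_SERVER['QUERY_STRING']);
$handle = @fopen("http://digitalcollections.library.oregonstate.edu/cgi-bin/oai.exe?" . $_SERVER['QUERY_STRING'], "r");
if ($handle) {
    // Copy the response to the output a piece at a time instead of buffering it all.
    while (!feof($handle)) {
        $buffer = fgets($handle, 4096);
        echo $buffer;
    }
    fclose($handle);
}
?>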
–Terry
Comments
One response to “Google Sitemaps”
I was just trying to submit our OAI feed to Google for indexing, but it complains that the OAI URL (/cgi-bin/oai.exe) is in a directory below the root of the site. Do you know of a way around this?