Native RSS Support in CONTENTdm (kind of) :)

By reeset / On / In CONTENTdm, Uncategorized

Updated: Fixed a couple of typos below

Updated two: Thanks to Josh Kline for pointing out that the PubDate wasn’t RFC 822 compliant. This has been fixed.

About a year ago, I created an RSS generator for CONTENTdm. At the time, CONTENTdm really didn’t have an API that could be worked with, so I built the generator as a Perl script that would simply ping the server periodically and report back changes. This has been working fine, even with the newer 4.0 interfaces, but a few things had broken, and who has the time to fix everything? 🙂

Anyway, over the past month, I have been updating all my older tools, documenting new ones and getting things posted onto my CONTENTdm projects website (http://oregonstate.edu/~reeset/contentdm). A number of the new tools that I’ve been creating relate to social software. For example, I’ve created a commenting and tagging plugin for CONTENTdm (which I’ll likely post about once I finish documenting it), updated this RSS feed, and finally added LDAP authentication support to our CDM interface (though I wish DiMeMa allowed more customization of the administration interfaces so I could integrate this better; again, I’ll post a code snippet once the docs are completed).

The new RSS plugin is written entirely in PHP (to match the rest of the CONTENTdm files) and can generate feeds for the entire server or for individual collections. The plugin uses the OAI server to extract the record metadata and then reformats it for delivery in an RSS 2.0 wrapper. Here is an example of what this looks like:

For the most part, the plugin requires no changes to current CONTENTdm interfaces other than the presence of a link to the feed. We’ve done the following at our main collection page: http://digitalcollections.library.oregonstate.edu/
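Since the only interface change is a link to the feed, one common way to expose it is an RSS autodiscovery tag in the page head. Here is a minimal sketch of a helper that builds one. The function name `feedLinkTag` and the feed location `rss.php` are assumptions for illustration; only the `set` query parameter comes from the script itself.

```php
<?php
// Hypothetical helper: builds an RSS autodiscovery <link> tag for a page's <head>.
// The feed URL (rss.php on example.org) is a placeholder, not the real location.
function feedLinkTag($feedUrl, $title, $set = "")
{
    // An optional collection alias is passed through the "set" query parameter.
    if ($set !== "") {
        $feedUrl .= "?set=" . urlencode($set);
    }
    return '<link rel="alternate" type="application/rss+xml" title="'
        . htmlspecialchars($title, ENT_QUOTES, 'UTF-8')
        . '" href="' . htmlspecialchars($feedUrl, ENT_QUOTES, 'UTF-8') . '" />';
}

// Server-wide feed, then a feed limited to one (made-up) collection alias.
echo feedLinkTag("http://example.org/rss.php", "OSU CONTENTdm Image Collection"), "\n";
echo feedLinkTag("http://example.org/rss.php", "Example collection", "examplecoll"), "\n";
```

Most feed readers and browsers pick this tag up automatically, so a visible link on the page is optional.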

Now the code: drop-dead easy.

Code:

<?php
  /*
   * Terry Reese
   * Modified: August 30, 2006
   *
   * Changes:
   *    Updated PubDate to correct date format.  Thanks to Josh Kline for pointing this out
   */
  define("CONTENTdmPath", "/usr/local/Content/docs/");
  define("DMSCRIPTS", "dmscripts/");
  define("BaseURL", "http://" . $_SERVER['SERVER_NAME'] . "/");
  define("OAIURL", BaseURL . "cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc{set}&from={start}&until={end}");
  define("DEF_TITLE", "OSU CONTENTdm Image Collection");
  include(CONTENTdmPath . DMSCRIPTS . "DMSystem.php");
  
  if (isset($_GET['set'])) { $set = $_GET['set']; } else { $set = "";}

  class RSS {
     function header ($title,$link) {
        $string = '<?xml version="1.0" encoding="UTF-8"?>' .  "\n" .
		  '<rss version="2.0"' . "\n" .
		  'xmlns:content="http://purl.org/rss/1.0/modules/content/"' . "\n" . 
	 	  'xmlns:wfw="http://wellformedweb.org/CommentAPI/"' . "\n" . 
		  'xmlns:dc="http://purl.org/dc/elements/1.1/">' . "\n" . 
	          '<channel>' . "\n" . 
		  '<title>'.$title.'</title>' . "\n" . 
		  '<link>'.$link.'</link>' . "\n" . 
		  '<description></description>' . "\n" . 
		  '<pubDate>' . gmdate("D, d M Y H:i:s") . ' +0000</pubDate>' . "\n" . 
		  '<language>en</language>' . "\n"; 
	 return $string;
     }

     function footer () {
	$string = "</channel>\n";
    	$string .= "</rss>";
	return $string;
     }

     function buildItem($DCValues) {
  	$string = "<item>\n" . 
		  "<title>".$DCValues["title"]."</title>\n" . 
		  "<link>".$DCValues["identifier"]."</link>\n" .
		  "<pubDate>". date("D, d M Y",  strtotime($DCValues["datestamp"])) . " 00:00:00 +0000</pubDate>\n" . 
		  "<dc:creator>".$DCValues["creator"]."</dc:creator>\n";  
	if (strlen($DCValues["description"])>255) {
		$string .= "<description><![CDATA[" . substr($DCValues["description"], 0, 255) . "[...]]]></description>\n";
        } else {
		$string .= "<description><![CDATA[" . $DCValues["description"] .  "]]></description>\n";
	}
	$string .= "<content:encoded><![CDATA[" . $DCValues["description"] . "]]></content:encoded>\n";
	$string .= "</item>\n";   
        return $string;
     } 

     function encodeDescription($set, $description, $subjects,  $uri) {
	$tarr = explode("/", $uri);
	$parr = explode(",", $tarr[count($tarr)-1]);
	$ptr = $parr[1];
	$set = $parr[0];
        $string = "<img src=\"" . BaseURL .  "cgi-bin/getimage.exe?CISOROOT=/" . $set . "&CISOPTR=" . $ptr . "&DMSCALE=10.5&DMWIDTH=250&DMHEIGHT=250\" border=\"0\" />";
	$string .= "<p>" . $description . "<br /><br />\n" . 
		   "Subjects: " . $subjects . "<br />\n" .
		   "<a href=\"" . $uri . "\">Get MetaData</a></p>";
	return $string;
      }
  } 
 
  $objRSS = new RSS;
 
  if ($set!="") {
     $oai_url = str_replace("{set}", "&set=" . urlencode($set), OAIURL);
  } else {
     $oai_url = str_replace("{set}", "", OAIURL);
  }

  $date = date("m");
  $year = date("Y");

  $oai_url = str_replace("{start}", $year . "-" . sprintf("%02d", $date) . "-01", $oai_url);
  if ($date == "12") {
     $year = intval(date("Y")) + 1;
     $date = "01";
  } else {
     $date = intval(date("m")) + 1;
  }
  $oai_url = str_replace("{end}", $year . "-" . sprintf("%02d", $date) . "-01", $oai_url);

  $_xml = file_get_contents($oai_url); 
  $p = xml_parser_create();
  xml_parse_into_struct($p, $_xml, $vals);
  xml_parser_free($p);

  $dc = array();
  $dc['title'] = "";
  $dc['setspec'] = "";
  $dc['identifier'] = "";
  $dc['subject'] = "";
  $dc['creator'] = "";
  $dc['description'] = "";
  $dc['datestamp'] = "";
 
  if ($set=="") {
	$coll_title = DEF_TITLE;
	$coll_link = BaseURL;
  } else {
        $rc = dmGetCollectionParameters("/" . $set, $coll_title, $path);
 	$coll_link = BaseURL . $set;
  }
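  // Newer PHP versions reject newlines inside a header() call, so send a single clean header.
  header("Content-Type: text/xml; charset=UTF-8");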
  print $objRSS->header($coll_title, $coll_link);
  foreach ($vals as $tag) {
     if ($tag['type'] == 'complete') {
	if ($tag['tag']=='DC:TITLE' && $dc['title']=="") {
	   $dc['title'] = htmlspecialchars($tag['value']);
	} else if ($tag['tag'] == 'DC:IDENTIFIER') {
	   $dc['identifier'] = $tag['value'];
	} else if ($tag['tag'] == 'SETSPEC' && $dc['setspec'] == "") {
	   $dc['setspec'] = $tag['value'];
	} else if ($tag['tag'] == 'DATESTAMP' && $dc['datestamp'] == "") {
	   $dc['datestamp'] = $tag['value'];
  	} else if ($tag['tag'] == 'DC:DESCRIPTION' && $dc['description'] == "") {
	   $dc['description'] = $tag['value'];
	} else if ($tag['tag'] == 'DC:CREATOR' && $dc['creator'] == "") {
	   $dc['creator'] = htmlspecialchars($tag['value']);
	} else if ($tag['tag'] == 'DC:SUBJECT' && $dc['subject'] == "") {
	   $dc['subject'] = htmlspecialchars($tag['value']);
	}
     } else if ($tag['type']=='close' && $tag['tag']=='RECORD') {
	$dc['description'] = $objRSS->encodeDescription($set, $dc['description'], $dc['subject'],  $dc['identifier']);
	print $objRSS->buildItem($dc);
	$dc['title'] = "";
	$dc['setspec'] = "";
	$dc['identifier'] = "";
	$dc['subject'] = "";
	$dc['creator'] = "";
	$dc['description'] = "";
	$dc['datestamp'] = "";
     }
  }
  print $objRSS->footer();
  
?>
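For anyone who would rather not walk the flat tag list from `xml_parse_into_struct`, the same OAI-to-RSS step can be sketched with SimpleXML and XPath. This is not the plugin's code, just a rough alternative under stated assumptions: the function names (`firstValue`, `xmlText`, `oaiToRssItems`) and the sample record are made up, and it only emits the title, link, pubDate and creator.

```php
<?php
// Sketch only: converts an OAI-PMH ListRecords response (oai_dc) into RSS <item> strings.
// Function names and the sample record are illustrative, not part of the plugin.

define('OAI_NS', 'http://www.openarchives.org/OAI/2.0/');
define('DC_NS', 'http://purl.org/dc/elements/1.1/');

// Return the text of the first node matching $path under $node, or "".
function firstValue(SimpleXMLElement $node, $path)
{
    // Prefixes have to be registered on each element before calling xpath().
    $node->registerXPathNamespace('oai', OAI_NS);
    $node->registerXPathNamespace('dc', DC_NS);
    $hits = $node->xpath($path);
    return $hits ? trim((string) $hits[0]) : "";
}

// Escape text for use inside an XML element.
function xmlText($text)
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

function oaiToRssItems($oaiXml)
{
    $doc = simplexml_load_string($oaiXml);
    if ($doc === false) {
        return array(); // unparseable response: no items
    }
    $doc->registerXPathNamespace('oai', OAI_NS);

    $items = array();
    foreach ($doc->xpath('//oai:record') as $record) {
        // The OAI datestamp is treated as UTC and formatted for RSS (RFC 822 style).
        $stamp = firstValue($record, 'oai:header/oai:datestamp');
        $when  = $stamp !== "" ? date_create($stamp, new DateTimeZone('UTC')) : false;

        $item = "<item>\n"
              . "<title>" . xmlText(firstValue($record, './/dc:title')) . "</title>\n"
              . "<link>" . xmlText(firstValue($record, './/dc:identifier')) . "</link>\n";
        if ($when) {
            $item .= "<pubDate>" . $when->format(DATE_RSS) . "</pubDate>\n";
        }
        $item .= "<dc:creator>" . xmlText(firstValue($record, './/dc:creator')) . "</dc:creator>\n"
               . "</item>\n";
        $items[] = $item;
    }
    return $items;
}

// Made-up record, shaped like an oai_dc ListRecords response.
$sampleOai = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:examplecoll/42</identifier>
        <datestamp>2006-08-30</datestamp>
        <setSpec>examplecoll</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Cats &amp; dogs</dc:title>
          <dc:creator>Example, Photographer</dc:creator>
          <dc:identifier>http://example.org/u?/examplecoll,42</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
XML;

print implode("", oaiToRssItems($sampleOai));
```

SimpleXML hands back decoded text, so every value is re-escaped before it goes into the feed. That matters for titles containing an ampersand.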

And that’s it.
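One footnote on the harvest window: the script asks the OAI server for everything from the first of the current month up to the first of the next month, and handles the December rollover by hand. If you prefer to let PHP do the month arithmetic, a small sketch with `DateTime` could look like this. `harvestWindow` is a hypothetical name, not something from CONTENTdm or the script above.

```php
<?php
// Sketch: first day of the current month and first day of the next month,
// formatted as YYYY-MM-DD for the OAI "from" and "until" arguments.
// harvestWindow() is a hypothetical helper name.
function harvestWindow(DateTime $now)
{
    $from  = new DateTime($now->format('Y-m-01'));
    $until = clone $from;
    // Adding a month to day 1 is safe: December rolls into January of the next year.
    $until->modify('+1 month');
    return array($from->format('Y-m-d'), $until->format('Y-m-d'));
}

list($from, $until) = harvestWindow(new DateTime('2006-12-15'));
echo $from, " ", $until, "\n"; // 2006-12-01 2007-01-01
```

The two strings can then be dropped into the `{start}` and `{end}` placeholders of the OAI URL.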

TR