Aug 30, 2006
 

Updated: Fixed a couple of typos below

Updated two: Thanks to Josh Kline for pointing out that the PubDate wasn’t RFC 822 compliant. This has been updated.

About a year ago, I created an RSS generator for CONTENTdm.  At the time, CONTENTdm really didn’t have an API that could be worked with, so in building the generator, I created a Perl script that would simply ping the server periodically and report back changes.  This has been working fine, even with the newer 4.0 interfaces, but a few things had broken, and who has the time to fix everything?  :)

Anyway, over the past month, I have been updating all my older tools, documenting new ones and getting things posted onto my CONTENTdm projects website (http://oregonstate.edu/~reeset/contentdm).  A number of the new tools that I’ve been creating relate to social software.  For example, I’ve created a commenting and tagging plugin for CONTENTdm (which I’ll likely post about once I finish documenting it), updated this RSS feed, and finally added LDAP authentication support to our CDM interface (though I wish DiMeMa allowed more customization of the administration interfaces so I could integrate this better [and again, I'll post a code snippet once the docs are completed]).

The new RSS plugin is written entirely in PHP (to match the rest of the CONTENTdm files) and can generate feeds for the entire server or individual collections.  The plugin uses the OAI server to extract information and then reformats it for delivery in an RSS 2.0 wrapper.  Here is an example of what this looks like:

For the most part, the plugin requires no changes to current CONTENTdm interfaces other than the presence of a link to the feed.  We’ve done the following at our main collection page: http://digitalcollections.library.oregonstate.edu/

Now the code — drop dead easy. 

Code:

<?php
  /*
   * Terry Reese
   * Modified: August 30, 2006
   *
   * Changes:
   *    Updated PubDate to correct date format.  Thanks to Josh Kline for pointing this out
   */
  define("CONTENTdmPath", "/usr/local/Content/docs/");
  define("DMSCRIPTS", "dmscripts/");
  define("BaseURL", "http://" . $_SERVER['SERVER_NAME'] . "/");
  define("OAIURL", BaseURL . "cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc{set}&from={start}&until={end}");
  define("DEF_TITLE", "OSU CONTENTdm Image Collection");
  include(CONTENTdmPath . DMSCRIPTS . "DMSystem.php");
  
  if (isset($_GET['set'])) { $set = $_GET['set']; } else { $set = "";}

  class RSS {
     function header ($title,$link) {
        $string = '<?xml version="1.0" encoding="UTF-8"?>' .  "\n" .
		  '<rss version="2.0"' . "\n" .
		  'xmlns:content="http://purl.org/rss/1.0/modules/content/"' . "\n" . 
	 	  'xmlns:wfw="http://wellformedweb.org/CommentAPI/"' . "\n" . 
		  'xmlns:dc="http://purl.org/dc/elements/1.1/">' . "\n" . 
	          '<channel>' . "\n" . 
		  '<title>'.$title.'</title>' . "\n" . 
		  '<link>'.$link.'</link>' . "\n" . 
		  '<description></description>' . "\n" . 
		  '<pubDate>' . date('r') . '</pubDate>' . "\n" . 
		  '<language>en</language>' . "\n"; 
	 return $string;
     }

     function footer () {
	$string = "</channel>\n";
    	$string .= "</rss>";
	return $string;
     }

     function buildItem($DCValues) {
  	$string = "<item>\n" . 
		  "<title>".$DCValues["title"]."</title>\n" . 
		  "<link>".$DCValues["identifier"]."</link>\n" .
		  "<pubDate>". date("D, d M Y",  strtotime($DCValues["datestamp"])) . " 00:00:00 +0000</pubDate>\n" . 
		  "<dc:creator>".$DCValues["creator"]."</dc:creator>\n";  
	if (strlen($DCValues["description"])>255) {
		$string .= "<description><![CDATA[" . substr($DCValues["description"], 0, 255) . "[...]]]></description>\n";
        } else {
		$string .= "<description><![CDATA[" . $DCValues["description"] .  "]]></description>\n";
	}
	$string .= "<content:encoded><![CDATA[" . $DCValues["description"] . "]]></content:encoded>\n";
	$string .= "</item>\n";   
        return $string;
     } 

     function encodeDescription($set, $description, $subjects,  $uri) {
	$tarr = explode("/", $uri);
	$parr = explode(",", $tarr[count($tarr)-1]);
	$ptr = $parr[1];
	$set = $parr[0];
        $string = "<img src=\"" . BaseURL .  "cgi-bin/getimage.exe?CISOROOT=/" . $set . "&CISOPTR=" . $ptr . "&DMSCALE=10.5&DMWIDTH=250&DMHEIGHT=250\" border=\"0\" />";
	$string .= "<p>" . $description . "<br /><br />\n" . 
		   "Subjects: " . $subjects . "<br />\n" .
		   "<a href=\"" . $uri . "\">Get MetaData</a></p>";
	return $string;
      }
  } 
 
  $objRSS = new RSS;
 
  if ($set!="") {
     $oai_url = str_replace("{set}", "&set=" . $set, OAIURL);
  } else {
     $oai_url = str_replace("{set}", "", OAIURL);
  }

  $date = date("m");
  $year = date("Y");

  $oai_url = str_replace("{start}", $year . "-" . sprintf("%02d", $date) . "-01", $oai_url);
  if ($date == "12") {
     $year = intval(date("Y")) + 1;
     $date = "01";
  } else {
     $date = intval(date("m")) + 1;
  }
  $oai_url = str_replace("{end}", $year . "-" . sprintf("%02d", $date) . "-01", $oai_url);

  $_xml = file_get_contents($oai_url); 
  $p = xml_parser_create();
  xml_parse_into_struct($p, $_xml, $vals);
  xml_parser_free($p);

  $dc = array();
  $dc['title'] = "";
  $dc['setspec'] = "";
  $dc['identifier'] = "";
  $dc['subject'] = "";
  $dc['creator'] = "";
  $dc['description'] = "";
  $dc['datestamp'] = "";
 
  if ($set=="") {
	$coll_title = DEF_TITLE;
	$coll_link = BaseURL;
  } else {
        $rc = dmGetCollectionParameters("/" . $set, $coll_title, $path);
 	$coll_link = BaseURL . $set;
  }
  header("Content-Type: text/xml; charset=UTF-8");
  print $objRSS->header($coll_title, $coll_link);
  foreach ($vals as $tag) {
     if ($tag['type'] == 'complete') {
	if ($tag['tag']=='DC:TITLE' && $dc['title']=="") {
	   $dc['title'] = htmlspecialchars($tag['value']);
	} else if ($tag['tag'] == 'DC:IDENTIFIER') {
	   $dc['identifier'] = $tag['value'];
	} else if ($tag['tag'] == 'SETSPEC' && $dc['setspec'] == "") {
	   $dc['setspec'] = $tag['value'];
	} else if ($tag['tag'] == 'DATESTAMP' && $dc['datestamp'] == "") {
	   $dc['datestamp'] = $tag['value'];
  	} else if ($tag['tag'] == 'DC:DESCRIPTION' && $dc['description'] == "") {
	   $dc['description'] = $tag['value'];
	} else if ($tag['tag'] == 'DC:CREATOR' && $dc['creator'] == "") {
	   $dc['creator'] = htmlspecialchars($tag['value']);
	} else if ($tag['tag'] == 'DC:SUBJECT' && $dc['subject'] == "") {
	   $dc['subject'] = htmlspecialchars($tag['value']);
	}
     } else if ($tag['type']=='close' && $tag['tag']=='RECORD') {
	$dc['description'] = $objRSS->encodeDescription($set, $dc['description'], $dc['subject'],  $dc['identifier']);
	print $objRSS->buildItem($dc);
	$dc['title'] = "";
	$dc['setspec'] = "";
	$dc['identifier'] = "";
	$dc['subject'] = "";
	$dc['creator'] = "";
	$dc['description'] = "";
	$dc['datestamp'] = "";
     }
  }
  print $objRSS->footer();
  
?>

And that’s it.
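For anyone porting the idea to another stack, here is a minimal Python sketch of the two date-related pieces of the script: the from/until window that goes into the OAI request, and the RFC 822 style pubDate built from an OAI datestamp.  The function names (month_window, oai_url, rss_pub_date) and the example.org host are made up for illustration; only the oai.exe path and the query parameters come from the PHP above.

```python
from datetime import date, datetime, timezone
from email.utils import format_datetime
from urllib.parse import urlencode


def month_window(today):
    """Return (first day of this month, first day of next month) as ISO dates."""
    start = date(today.year, today.month, 1)
    if today.month == 12:
        # December rolls over into January of the following year.
        end = date(today.year + 1, 1, 1)
    else:
        end = date(today.year, today.month + 1, 1)
    return start.isoformat(), end.isoformat()


def oai_url(base_url, today, set_spec=""):
    """Build a ListRecords request covering the current month (hypothetical helper)."""
    start, end = month_window(today)
    params = [("verb", "ListRecords"), ("metadataPrefix", "oai_dc")]
    if set_spec:
        params.append(("set", set_spec))
    params += [("from", start), ("until", end)]
    return base_url + "cgi-bin/oai.exe?" + urlencode(params)


def rss_pub_date(datestamp):
    """Turn an OAI datestamp such as '2006-08-30' into an RFC 822 style date."""
    day = datetime.strptime(datestamp[:10], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return format_datetime(day)


if __name__ == "__main__":
    print(oai_url("http://example.org/", date.today()))
    print(rss_pub_date("2006-08-30"))
```

Only the date part of the datestamp is used, which mirrors the PHP version pinning every item to midnight UTC.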

TR

 Posted by at 3:58 pm
Aug 30, 2006
 

I’m playing with MS’s Live Writer (which I actually like) and kept getting an error message.  I’d assumed that meant that it didn’t post.  Apparently not. :)

 

–TR

 Posted by at 11:25 am
Aug 30, 2006
 

At OSU, we have very few Dspace collections configured to allow direct submission to the repository.  Nearly anyone on campus can submit an item into Dspace, but that item is then vetted through Technical Services where metadata is looked at and corrected before being added to Dspace.  To do this, we have ~4 individuals (though primarily 3) that can take a task from the pool for evaluation. 

Now those folks that use Dspace know that Dspace places items into the task list in the order in which they were received.  This means that a cataloger responsible for a particular collection always had to look over the entire task list to see if any items from their collections had been submitted.  It was a fairly time-consuming process and one that constantly soured staff on working within the Dspace interface.

Usually, a problem like this would be only a peripheral concern of mine.  Up until August of this year, we had a programmer that handled the majority of the Dspace customizations.  I think that my well-known aversion to Java might have had a hand in this, but to be honest I didn’t mind.  (Yes, I never was a Java convert.  I use it when I have to, but traditionally I’ve always preferred the more procedural style found in Assembler or C.  Over the past two years, though, my experience with C# has softened my stance on Java a bit, even if I still find some of the syntax non-intuitive.)  Anyway, in August, I was asked to spend some time working with Dspace so that changes could be incorporated faster, since they could now be requested just by knocking on my door.

Anyway, getting back to the task pool.  During one of my weekly Digital Production Unit meetings, my staff let me know that this was an issue.  What they wanted was the pool, sorted by collection/date.  Seemed easy enough — and it was.  Now I’ll admit, this is a bit of a quick and dirty hack — but as I look at Dspace, I seem to see a lot of these types of hacks, so mine should fit in.  Changes need to be made only to the main.jsp file in the mydspace directory. 

Original Code:

Lines 204-212:

String row = "even";
for (int i = 0; i < pooled.length; i++)
{
    DCValue[] titleArray = pooled[i].getItem().getDC("title", null, Item.ANY);
    String title = (titleArray.length > 0 ? titleArray[0].value : LocaleSupport.getLocalizedMessage(pageContext, "jsp.general.untitled"));
    EPerson submitter = pooled[i].getItem().getSubmitter();

 

Modified:

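// Requires java.util.Arrays in the page imports.
// Build "collection name_position" keys.  The position is zero-padded so that
// string sorting keeps the original order within each collection
// (unpadded, "_10" would sort ahead of "_2").
String[] tcoll = new String[pooled.length];
for (int i = 0; i < pooled.length; i++) {
    tcoll[i] = pooled[i].getCollection().getMetadata("name") + "_" + String.format("%06d", i);
}
Arrays.sort(tcoll);

for (int z = 0; z < tcoll.length; z++)
{
    // Recover the index into pooled[] from the suffix after the last underscore.
    int i = Integer.parseInt(tcoll[z].substring(tcoll[z].lastIndexOf("_") + 1));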

 

As folks can see, the modification reads the collection name of each item in the task pool and stores it in an array as [collection name]_[number], where the number is the position at which the item occurs in the list.  Once the new array is set up, it’s sorted, and it is this array, not the pooled array, that is used to step through the tasks.  The index for accessing an item in the pooled array is recovered by parsing the position number back out of each entry in the collection array.
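The same grouping can be sketched outside the JSP.  Below is a small Python illustration, not Dspace code: pool_order is a made-up helper that takes the collection name of each pooled task (in pool order) and returns the order in which to visit them.  It sorts on a (name, position) pair rather than on a combined string, which sidesteps the string-comparison quirk where a suffix like _10 would sort ahead of _2.

```python
def pool_order(collection_names):
    """Return pool indices grouped by collection name, keeping pool order within a group."""
    positions = range(len(collection_names))
    # Sort by name first, then by the item's original position in the pool.
    return sorted(positions, key=lambda i: (collection_names[i], i))


if __name__ == "__main__":
    names = ["Maps", "Theses", "Maps", "Extension", "Theses"]
    for i in pool_order(names):
        print(i, names[i])
```

Because the position is compared as a number, items from the same collection stay in the order they were submitted no matter how large the pool gets.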

We’ve been using this for ~2 1/2 weeks now, and the staff are much happier. 

Now that I’m working on Dspace, I may periodically post changes that we are making to the application if I find them interesting.  Whether anyone else will, well, we’ll see.

 

–TR

Aug 28, 2006
 

It was only a matter of time, but I finally broke down and took the family down to the coast to do some blackberry picking.  You see, in riding my bike between Independence and Corvallis every day, I get to smell the blackberries alongside the road.  Now, I wouldn’t eat any of the berries alongside the road (for fear that someone might have sprayed them), but oh, they make my ride smell like someone is always cooking blackberry pie.  My wife’s been a good sport about the whole thing, as our conversations when I first come in the door have been something along the lines of, “Mummm…I could almost taste the blackberries today.  I wonder where we could go to find some around here,” and so forth.  Unfortunately, outside of a farm, I’m not really sure I’d pick berries from anywhere nearby, since there is a very real chance that the city, state or even nearby private property owners could have sprayed them.  Also, I’m really more into the eating, and possibly the picking, of blackberries.  The actual making of the pie is a bit outside my cooking expertise.  I don’t do pastries well because I don’t like to measure things.  Dinners and the like rarely require precise measurements, but pastries, I’ve found, really tend to.

Anyway, about two weeks ago, I finally decided that I couldn’t go without some blackberries.  So, my wife, bless her heart, found that the blackberry festival in Coos Bay was going on over the 26th-27th.  So, we packed up the kids and drove down for a visit.  The festival was fun, visiting family was great — but I now have enough blackberries for 4 pies in the freezer….Mummm.  So mission accomplished. :)

 

–TR

 Posted by at 8:48 am
Aug 25, 2006
 

FYI: a couple of changes to the MarcEdit website.  First, the homepage now pulls its information for the current news from my blog.  Second, I’ve started putting all the MarcEdit documentation into a wiki.  This should allow me to more easily create documentation as well as export the documentation in both PDF and HTML formats for local consumption.  Since I’ve already written the documentation (though I already have edits to make), I expect that moving the data from the raw HTML to the wiki will take a week or more.  Anyway, when the documentation is available, the program will move to 5.1.

–TR

 Posted by at 11:12 pm

MarcEdit 5.0 Update

Aug 24, 2006
 

Ok — I’ve done my testing and am sufficiently happy that the new XSLT engine switching works and that the Saxon engine will work with current XSLT stylesheets.  Benchmarks:

MARCXML => MARC

  • MSXML Engine
    • 500 Records: first run (1.2 secs)
    • 500 Records: second/third run (0.09 secs)
  • Saxon.NET
    • 500 Records: first run (5.4 secs)
    • 500 Records: second/third run (0.8 secs)

    The first time through, each component takes a little longer to run because the assembly needs to be loaded by the VM; after the first load, the assembly is placed into the assembly cache, so it runs much faster.  Each benchmark used the same 500 records, but in 3 different files to keep file caching from skewing the results.
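For anyone wanting to repeat this kind of cold-versus-warm comparison, the timing loop can be sketched in a few lines of Python.  This is not the harness used for the numbers above; time_runs is a hypothetical helper, and the conversion function and file names in the usage note are placeholders.

```python
import time


def time_runs(work, inputs):
    """Call work(item) once per input and return the elapsed seconds for each call."""
    timings = []
    for item in inputs:
        started = time.perf_counter()
        work(item)
        timings.append(time.perf_counter() - started)
    return timings


if __name__ == "__main__":
    # Stand-in workload; replace with the real conversion call.
    for run, seconds in enumerate(time_runs(lambda n: sum(range(n)), [100000] * 3), 1):
        print("run %d: %.4f secs" % (run, seconds))
```

Passing three separate copies of the same records (say, a.xml, b.xml and c.xml fed to some convert function) keeps the first-run cost visible while stopping the file cache from flattering the later runs.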

    The file can be downloaded from: MarcEdit50_Setup.exe.  For those downloading this program, you will notice that the installer is now ~7.9 MB.  This is 4 MB larger than the previous file, which directly relates to the size of the Saxon files added to MarcEdit.  Because of the size, I’d considered making folks download the Saxon libraries separately instead of packaging them, but I figured that bundling them would make it easier for folks to use.

     

    –TR

     Posted by at 11:37 pm
    Aug 22, 2006
     

    So I spent a little more time last night doing my final testing before uploading the new build of MarcEdit which includes the choice of utilizing the Saxon or MSXML XSLT engines, and I ended up making a last minute change to make XSLT processing more granular.  When completed, users will now have two locations where they can set their preferred XSLT engine for processing.  First, users will be able to set the XSLT engine globally utilizing MarcEdit’s global preferences:

     

    Or users will have the option to set the XSLT engine on a stylesheet by stylesheet basis.  This means that a user could use the MSXML engine as the default XSLT processor, but utilize the SAXON.NET processor for specific translations.  This option is set at the XSLT translation registration window.

    So, I’m spending time today and tonight looking over the new build and making sure that the XSLT engine selection is honored based on the following order of precedence:

    Defined by Transformation => Defined Globally => Default Settings
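The fallback chain above could be sketched roughly like this in Python.  This is not MarcEdit’s code: resolve_engine is a made-up name, and treating "MSXML" as the built-in default is an assumption made purely for illustration.

```python
DEFAULT_ENGINE = "MSXML"  # assumed built-in default, for illustration only


def resolve_engine(per_stylesheet=None, global_pref=None, default=DEFAULT_ENGINE):
    """Pick the XSLT engine: stylesheet setting, then global preference, then default."""
    for choice in (per_stylesheet, global_pref):
        if choice:  # None or "" means "not set at this level"
            return choice
    return default


if __name__ == "__main__":
    print(resolve_engine(per_stylesheet="SAXON", global_pref="MSXML"))
    print(resolve_engine(global_pref="SAXON"))
    print(resolve_engine())
```

The most specific setting wins, so a single stylesheet can opt into Saxon without changing what every other translation uses.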

    I’m hoping that this will give folks the flexibility that they need to start moving beyond XSLT 1.0 and building more slimmed down translations based on more current technologies if desired.

    –TR

     Posted by at 1:46 pm
    Aug 19, 2006
     

    Seems that we have been doing a bit more home improvement this summer than I’d originally thought we would.  Of course, the biggest improvement that we made this year was putting in wood floors, but we’ve been slowly upgrading a number of components in the house.  Today, however, was an unplanned improvement.  I had been cleaning under the kitchen sink and noticed that there was a bit of slime on the bottom.  A quick investigation showed that all the seals on the garbage disposal were shot.  But what was worse was that the disposal is a cheap one.  Plastic everywhere.  I looked around and this type of machine just doesn’t seem to be sold around here, so I decided to upgrade to a solid steel Kenmore disposal.  This is the first time that I’ve ever pulled one of these things out and put one in, and it really wasn’t all that hard.  Though we’ll see if I’m still this happy with the job come tomorrow.  When installing, I decided not to use plumber’s putty when sealing the sink components.  When I pulled the old one out, the putty was rotten, wet and stunk.  So, I decided to use silicone.  I use it all the time for sealing cracks around the house because it’s a great adhesive, so down to the hardware store I went, picked up some silicone sealant made specifically for kitchen work, and used that for all my sealing.  I think that this will work great, but I can’t tell until tomorrow since it takes 24 hours to cure.  So, I have everything set up and have tested for leaks everywhere but in the sink tube where the silicone is still drying, and so far so good.  So I’m keeping my fingers crossed.

    Now, if I can just find time to do my next project: the garage door opener. :)

     

    –TR

     Posted by at 10:31 pm
    Aug 18, 2006
     

    I’ve been doing quite a bit of programming of late in Ruby and I have to say, I’m really disappointed in Ruby’s XML support.  REXML, the built-in Ruby library, well, stinks.  I’ve been trying to use it to parse some simple MarcXML records, and what I’m finding is that it’s taking ~0.4-0.5 seconds just to load a file.  These are small files.  Then there’s processing: use XPath and you pay dearly; use the better-documented convenience functions and, again, you pay dearly.

    Anyway, the point of this was that I was looking through some code and trying to figure out why some portions of my Rails app were sluggish.  From my investigation, I found two slow spots: one the result of a misunderstanding as to the best way to do an operation in Ruby, the other the REXML processor.  So what to do?  Well, I’m going the libxml2 way.  There is a Ruby gem that allows you to work directly with the libxml2 library, and problem solved.  In testing today, I found that this library was 30-50% faster in nearly all cases, with the speedup increasing the larger the document that was loaded.  I’m hoping that once I integrate libxml into my app and gut the REXML code, I’ll continue to see these speed improvements.

    –TR

     Posted by at 11:58 pm
    Aug 18, 2006
     

    Anyway, once MarcEdit starts allowing users to utilize the Saxon XSLT engine, a new COM property will be made available so that users scripting against the MARCEngine can choose which engine is in use.  Here’s how it would look:

    Const MSXML = 1
    Const SAXON = 2
    lret = 0

    Set obj_MARC = CreateObject("MARCEngine5.MARC21")
    obj_MARC.Set_XSLT_Engine = SAXON
    lret = obj_MARC.XML2MARC("c:\test.xml", "c:\test.mrc", "c:\ead2marc21.xsl", "c:\marc21slim2mnemonic.xsl")

    msgbox "finished"

     –TR

     Posted by at 12:49 am