Translating Project Gutenberg records

By reeset / In Uncategorized

I wrote this up some time ago, but I still occasionally get questions about it (in fact, I got one today, hence this note).  Project Gutenberg (PG) provides its metadata for download in RDF format on its website at: http://www.gutenberg.org/feeds/catalog.rdf.zip.  I wrote a fairly basic XSLT transformation for this data when I was visiting the Internet Archive last year, and posted it here: http://oregonstate.edu/~reeset/marcedit/html/downloads.html (or directly, at: Project Gutenberg RDF = MARC).

Running the RDF records through MarcEdit using this stylesheet produces the following MARC21 record set: http://oregonstate.edu/~reeset/marcedit/anonymous/catalog.zip.  The process is really a straightforward one.  You download the above XSLT stylesheet, register it with MarcEdit, and then you can be off on your merry way translating data to your heart’s content.  Of course, folks occasionally ask about translating into other metadata formats, and that’s cool too.  If you can work with the API, you can do this in one step.  However, if you plan on using the MarcEdit UI, you need to do it in two:

  1. Set up a PG => MARCXML translation (the XSLT stylesheet above will do that)
  2. Create or use one of MarcEdit’s provided MARCXML => [format] stylesheets to complete the translation.
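
For anyone who wants to see the shape of that two-stage chain outside of MarcEdit, here is a minimal sketch in Python.  It is not the XSLT stylesheet linked above and it does not use the MarcEdit API; it just imitates the same idea (RDF => MARCXML, then MARCXML => MODS) with the standard library.  The sample record, the function names, and the tiny field mapping (creator to 100 $a, title to 245 $a) are assumptions made for illustration, and the real PG catalog carries far more data than this.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MARC = "http://www.loc.gov/MARC21/slim"
MODS = "http://www.loc.gov/mods/v3"

# Made-up, heavily simplified record; NOT the real PG catalog layout.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="urn:example:etext1">
    <dc:title>An Example Title</dc:title>
    <dc:creator>Author, Example</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def q(ns, tag):
    """Build a namespace-qualified tag name in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def rdf_to_marcxml(rdf_text):
    """Stage 1 (hypothetical mapping): RDF descriptions => MARCXML records."""
    collection = ET.Element(q(MARC, "collection"))
    for desc in ET.fromstring(rdf_text).iter(q(RDF, "Description")):
        record = ET.SubElement(collection, q(MARC, "record"))
        # creator => 100 $a, title => 245 $a; indicators left blank for brevity.
        for tag, source in (("100", "creator"), ("245", "title")):
            value = desc.findtext(q(DC, source))
            if value is None:
                continue
            field = ET.SubElement(
                record, q(MARC, "datafield"), {"tag": tag, "ind1": " ", "ind2": " "}
            )
            ET.SubElement(field, q(MARC, "subfield"), {"code": "a"}).text = value
    return collection


def subfield_a(field):
    """Return the text of the first $a subfield of a datafield, or None."""
    for sub in field.findall(q(MARC, "subfield")):
        if sub.get("code") == "a":
            return sub.text
    return None


def marcxml_to_mods(collection):
    """Stage 2 (hypothetical mapping): MARCXML records => MODS records."""
    mods_collection = ET.Element(q(MODS, "modsCollection"))
    for record in collection.iter(q(MARC, "record")):
        mods = ET.SubElement(mods_collection, q(MODS, "mods"))
        for field in record.iter(q(MARC, "datafield")):
            value = subfield_a(field)
            if value is None:
                continue
            if field.get("tag") == "245":
                title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
                ET.SubElement(title_info, q(MODS, "title")).text = value
            elif field.get("tag") == "100":
                name = ET.SubElement(mods, q(MODS, "name"))
                ET.SubElement(name, q(MODS, "namePart")).text = value
    return mods_collection


if __name__ == "__main__":
    # Chain the two stages: the output of stage 1 is the input of stage 2.
    marc = rdf_to_marcxml(SAMPLE_RDF)
    print(ET.tostring(marcxml_to_mods(marc), encoding="unicode"))
```

The point of the sketch is only the chaining: MARCXML sits in the middle as the common format, so swapping the second function for a different MARCXML => [format] mapping changes the output format without touching the first stage.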

So, it’s really a two-step process.  While many of the YouTube videos that I’ve uploaded in the past few days cover parts of this process, I decided to upload one final video on the topic that demonstrates it from start to finish (processing the PG data into both MARC and MODS3) as a reference case for future users looking to do something similar.  Hopefully this will help.  You can find the video here:

http://www.youtube.com/watch?v=1zHKIJ6D_dA

Cheers,

–TR