ETDs and automatic MARC generation

By reeset / In Digital Libraries

Since September, OSU has been accepting theses and dissertations electronically (ETDs).  The materials are stored in DSpace, which has been a bit of a boon for us.  I’ve spent a little time this week putting together an XSLT stylesheet that can be used, in conjunction with MarcEdit 5.0’s Metadata Harvester, to harvest the metadata from DSpace and automatically generate MARC records for these items.  It’s a pretty slick process; the only part that I can’t quite figure out is our keywords/subjects.  Since we are getting unqualified DC out of DSpace, I can’t determine which dc:subject is a keyword and which is an LC term, so for now we are generating them all as LC subjects and having the keywords removed at the point of final review.
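To give a rough idea of the kind of mapping involved, here is a minimal Python sketch (not the XSLT stylesheet itself) that turns an unqualified DC record into MARC-style fields. The sample record, the function names, and the tag and indicator choices are all assumptions made for illustration; the real stylesheet may map things differently. Note that every dc:subject lands in a 650 with second indicator 0 (LCSH), which is exactly the keyword problem described above.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace

# Made-up record for illustration; not real harvested data.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A hypothetical thesis title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:subject>Forest ecology</dc:subject>
  <dc:subject>student supplied keyword</dc:subject>
  <dc:identifier>http://example.org/handle/0000/0000</dc:identifier>
</oai_dc:dc>"""


def texts(root, name):
    """Return the non-empty text values of every dc:<name> child element."""
    return [e.text.strip() for e in root.findall(DC + name)
            if e.text and e.text.strip()]


def dc_to_marc(xml_text):
    """Map unqualified DC to (tag, ind1, ind2, subfields) tuples.

    The tag and indicator choices are assumptions for illustration only.
    """
    root = ET.fromstring(xml_text)
    fields = []
    creators = texts(root, "creator")
    if creators:
        # First creator becomes the main entry.
        fields.append(("100", "1", " ", [("a", creators[0])]))
    for title in texts(root, "title")[:1]:
        ind1 = "1" if creators else "0"
        fields.append(("245", ind1, "0", [("a", title)]))
    for subject in texts(root, "subject"):
        # Unqualified DC cannot tell keywords from LCSH, so every
        # subject is emitted as a 650 with second indicator 0 (LCSH).
        fields.append(("650", " ", "0", [("a", subject)]))
    for creator in creators[1:]:
        fields.append(("700", "1", " ", [("a", creator)]))
    for ident in texts(root, "identifier"):
        if ident.startswith(("http://", "https://")):
            fields.append(("856", "4", "0", [("u", ident)]))
    return fields


if __name__ == "__main__":
    for field in dc_to_marc(SAMPLE):
        print(field)
```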

Anyway, the process is a simple one: in MarcEdit 5.0, there is a function in the MarcEditor called Metadata Harvester.  This is an OAI harvester that supports a number of different metadata types.  It looks something like this:

[Screenshot: Metadata Harvester]
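For anyone curious what an OAI harvester asks the server for, here is a small Python sketch of building an OAI-PMH ListRecords request. This is not MarcEdit's code; the function name, base URL, and set name are hypothetical. The verb and argument names (metadataPrefix, set, resumptionToken) come from the OAI-PMH protocol, where a resumptionToken must be sent on its own without the other arguments.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    A resumption token is an exclusive argument in OAI-PMH, so when
    one is given the other arguments are left out.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical repository address and set name.
    print(list_records_url("http://example.org/oai/request", set_spec="etd"))
```

A harvester would fetch the first URL, process the returned records, and keep requesting with each resumptionToken the server hands back until none is returned.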

Setting the config properties and harvesting the site produces very good MARC records.  Here’s an example:


Some of these fields are generated via the XSLT, some are taken from the DSpace metadata, and some are configured from the DSpace data.  Of interest, the subjects are encoded by a special template that examines the structure of the subject and then determines the type of each part of the field.  Unfortunately, not all data encoded in the 650 field are actually LC subjects.  We mix student-provided keywords with our LCSH subjects within multiple subject fields, and you can’t tell the two types apart when harvesting via OAI.  I’m hoping that when I look at DSpace I can have it suppress export of specific fields within a collection.  Pretty interesting.
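As a rough illustration of what a subject template like this might do, here is a Python sketch that splits a subject string on "--" and guesses a 650 subfield code for each part. The heuristics are my own assumptions for the example, not the actual template logic: the first part is $a, a part containing digits is treated as chronological ($y), a part found in a small hypothetical place list is geographic ($z), and everything else is a general subdivision ($x). Form subdivisions ($v) are ignored.

```python
import re

# Hypothetical lookup list; a real workflow would need a much
# better way to recognize geographic subdivisions.
KNOWN_PLACES = {"Oregon", "United States"}


def split_subject(heading):
    """Split a 'Topic -- Subdivision' string into (code, value) pairs."""
    parts = [p.strip() for p in heading.split("--") if p.strip()]
    subfields = []
    for i, part in enumerate(parts):
        if i == 0:
            code = "a"   # main topical term
        elif re.search(r"\d", part):
            code = "y"   # contains digits: assume chronological
        elif part in KNOWN_PLACES:
            code = "z"   # geographic subdivision
        else:
            code = "x"   # general subdivision
        subfields.append((code, part))
    return subfields


if __name__ == "__main__":
    print(split_subject("Forests and forestry -- Oregon -- History -- 20th century"))
    print(split_subject("student supplied keyword"))
```

A free-text keyword passes through as a lone $a, just like a single-term LC heading would, which is why structure alone can’t separate the two.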

I’ll let folks know how this works out as we start to implement this new workflow.