How do I generate MARC authority records from the Homosaurus vocabulary?

Step by step instructions here: https://youtu.be/FJsdQI3pZPQ

Last week, I got an interesting question on the listserv: a user asked about generating MARC records for use in one’s ILS from a JSON-LD vocabulary. The vocabulary in question was Homosaurus (Homosaurus Vocabulary Site), and the questioner was specifically looking for a way to pull individual terms and generate MARC Authority records from them, which could then be added to an ILS to improve search and discovery.

When the question was first asked, my immediate thought was that this could likely be accommodated using the XML/JSON profiling wizard in MarcEdit.  This tool can review a sample XML or JSON file and allow a user to create a portable processing file based on the content in the file.  However, there were two issues with this approach:

  1. The profile wizard assumes that the data format is static, i.e., that the sample file is representative of other files. Unfortunately, for this vocabulary, that isn’t the case.
  2. The profile wizard was designed to work with plain JSON. JSON-LD is a different animal because its keywords are prefixed with the @ symbol (@context, @id, @type).

I updated the Profiler to recognize and work better with JSON-LD, but the first challenge still makes the wizard a poor fit for a generic process. So, I looked at how this could be built into the normal processing options.

To do this, I added a new default serialization, JSON=>XML, which MarcEdit now supports. This allows the tool to take a JSON file and deserialize the data so that it is output reliably as XML. For example, here is a sample JSON-LD file (homosaurus.org/v2/adoptiveParents.jsonld):

{
  "@context": {
    "dc": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@id": "http://homosaurus.org/v2/adoptiveParents",
  "@type": "skos:Concept",
  "dc:identifier": "adoptiveParents",
  "dc:issued": {
    "@value": "2019-05-14",
    "@type": "xsd:date"
  },
  "dc:modified": {
    "@value": "2019-05-14",
    "@type": "xsd:date"
  },
  "skos:broader": {
    "@id": "http://homosaurus.org/v2/parentsLGBTQ"
  },
  "skos:hasTopConcept": [
    {
      "@id": "http://homosaurus.org/v2/familyMembers"
    },
    {
      "@id": "http://homosaurus.org/v2/familiesLGBTQ"
    }
  ],
  "skos:inScheme": {
    "@id": "http://homosaurus.org/terms"
  },
  "skos:prefLabel": "Adoptive parents",
  "skos:related": [
    {
      "@id": "http://homosaurus.org/v2/socialParenthood"
    },
    {
      "@id": "http://homosaurus.org/v2/LGBTQAdoption"
    },
    {
      "@id": "http://homosaurus.org/v2/LGBTQAdoptiveParents"
    },
    {
      "@id": "http://homosaurus.org/v2/birthParents"
    }
  ]
}

In MarcEdit, the new JSON=>XML process can take this file and output it in XML like this:

<?xml version="1.0"?>
<records>
    <record>
        <context>
            <dc>http://purl.org/dc/terms/</dc>
            <skos>http://www.w3.org/2004/02/skos/core#</skos>
            <xsd>http://www.w3.org/2001/XMLSchema#</xsd>
        </context>
        <id>http://homosaurus.org/v2/adoptiveParents</id>
        <type>skos:Concept</type>
        <identifier>adoptiveParents</identifier>
        <issued>
            <value>2019-05-14</value>
            <type>xsd:date</type>
        </issued>
        <modified>
            <value>2019-05-14</value>
            <type>xsd:date</type>
        </modified>
        <broader>
            <id>http://homosaurus.org/v2/parentsLGBTQ</id>
        </broader>
        <hasTopConcept>
            <id>http://homosaurus.org/v2/familyMembers</id>
        </hasTopConcept>
        <hasTopConcept>
            <id>http://homosaurus.org/v2/familiesLGBTQ</id>
        </hasTopConcept>
        <inScheme>
            <id>http://homosaurus.org/terms</id>
        </inScheme>
        <prefLabel>Adoptive parents</prefLabel>
        <related>
            <id>http://homosaurus.org/v2/socialParenthood</id>
        </related>
        <related>
            <id>http://homosaurus.org/v2/LGBTQAdoption</id>
        </related>
        <related>
            <id>http://homosaurus.org/v2/LGBTQAdoptiveParents</id>
        </related>
        <related>
            <id>http://homosaurus.org/v2/birthParents</id>
        </related>
    </record>
</records>
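
As a rough illustration of the mapping between the two samples above, here is a minimal Python sketch. It is not MarcEdit's code. It only reproduces the pattern visible in the output: the @ prefix and any namespace prefix (dc:, skos:) are dropped from keys, nested objects become nested elements, and arrays become repeated elements. The function names (local_name, append_value, jsonld_to_xml) are hypothetical, and the sketch assumes the top level of the file is a single JSON object.

```python
import json
import xml.etree.ElementTree as ET


def local_name(key):
    """Turn a JSON-LD key into a plain element name: drop '@' and any prefix."""
    return key.lstrip("@").split(":")[-1]


def append_value(parent, name, value):
    """Append `value` under `parent` as one or more <name> elements."""
    if isinstance(value, list):
        # Arrays become repeated sibling elements with the same name.
        for item in value:
            append_value(parent, name, item)
        return
    child = ET.SubElement(parent, name)
    if isinstance(value, dict):
        # Nested objects become nested elements.
        for key, inner in value.items():
            append_value(child, local_name(key), inner)
    else:
        child.text = str(value)


def jsonld_to_xml(text):
    """Wrap one JSON-LD object in <records><record>...</record></records>."""
    data = json.loads(text)  # assumed to be a single JSON object
    root = ET.Element("records")
    record = ET.SubElement(root, "record")
    for key, value in data.items():
        append_value(record, local_name(key), value)
    return root


# A trimmed version of the sample term shown earlier.
sample = """
{
  "@id": "http://homosaurus.org/v2/adoptiveParents",
  "skos:prefLabel": "Adoptive parents",
  "skos:broader": {"@id": "http://homosaurus.org/v2/parentsLGBTQ"},
  "skos:related": [
    {"@id": "http://homosaurus.org/v2/socialParenthood"},
    {"@id": "http://homosaurus.org/v2/birthParents"}
  ]
}
"""
print(ET.tostring(jsonld_to_xml(sample), encoding="unicode"))
```

Because a single value and an array both pass through append_value, a property that is an object in one term and an array in another still produces the same element name, which is what lets one XSLT handle every term.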

The ability to reliably convert JSON/JSON-LD to XML means that I can now allow users to use the same XSLT/XQuery process MarcEdit uses for other library metadata format transformations. All that was left to make this happen was to add a new origin data format to the XML Function template, and we are off and running.

The end result is that users could use this process with any JSON-LD vocabulary (assuming they create the XSLT) to automate the creation of MARC Authority data. For this vocabulary, I’ve created an XSLT and added it to my GitHub space: https://github.com/reeset/marcedit_xslt_files/blob/master/homosaurus_xml.xsl

I have also included the XSLT in the MarcEdit XSLT directory in current downloads.
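
To give a feel for what a transformation like this has to do, here is a small Python sketch that reads the converted XML and writes a skeletal authority record in MarcEdit's mnemonic format. It is not the XSLT linked above, and it does not claim to match that stylesheet's output. The field choices are assumptions made for illustration: the term URI goes into 024 with $2 uri, and the preferred label goes into 150 $a. The function name record_to_mnemonic is hypothetical.

```python
import xml.etree.ElementTree as ET

# Generic leader for a new authority record (record type "z").
AUTHORITY_LEADER = "00000nz  a2200000n  4500"


def record_to_mnemonic(record):
    """Build a minimal mnemonic authority record from one <record> element."""
    lines = ["=LDR  " + AUTHORITY_LEADER]
    # Only direct children are read, so the <id> nested inside
    # <broader> or <related> is not mistaken for the term's own URI.
    uri = record.findtext("id")
    if uri:
        # Assumed mapping: term URI -> 024, first indicator 7, source "uri".
        lines.append("=024  7\\$a" + uri + "$2uri")
    label = record.findtext("prefLabel")
    if label:
        # Assumed mapping: preferred label -> 150 (topical term heading).
        lines.append("=150  \\\\$a" + label)
    return "\n".join(lines)


xml_text = """
<records>
    <record>
        <id>http://homosaurus.org/v2/adoptiveParents</id>
        <prefLabel>Adoptive parents</prefLabel>
        <broader>
            <id>http://homosaurus.org/v2/parentsLGBTQ</id>
        </broader>
    </record>
</records>
"""

root = ET.fromstring(xml_text)
for record in root.findall("record"):
    print(record_to_mnemonic(record))
```

Note that the broader and related entries in the sample file carry only URIs, not labels, so building see-also tracings from them would require looking up each related term's label separately. The sketch leaves that out.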

To use this XSLT and allow your version of MarcEdit to generate MARC Authority records from this vocabulary, use the following steps:

  1. Use MarcEdit 7.5.8+ or MarcEdit Mac 3.5.8+ (the Mac version will be available around 4/8).  I have not decided if I will backport to 7.3-
  2. Open the XML Functions Editor in MarcEdit
  3. Add a new Transformation – using JSON as the original format, and MARC as the final.  Make sure the XSLT path is pointed to the location where you saved the downloaded XSLT file.
  4. Save

That should be pretty much it.  I’ve recorded the steps and placed them here: https://youtu.be/FJsdQI3pZPQ, including some information on values you may wish to edit should you want to localize the XSLT. 


