MarcEdit Linked Data

One of the changes in the current MarcEdit update is the introduction of a linked data rules file to help the program understand which data elements should be processed for automatic URL generation, and how that data should be treated. The rules file is found in the Configs directory and is called linked_data_profile.xml.

The rules file is pretty straightforward. At this point, I haven’t created a schema for it, but I plan to so that defining rules is easier. Until then, I’ve added references in the header of the document to note fields and values.

Here’s a small snippet of the file:

<?xml version="1.0" encoding="UTF-8"?>
<marcedit_linked_data_profile>
<!--
    rules block:
        top level: field
            Attributes:
                type: authority, bibliographic, authority|bibliographic
            tag (required):
                Value: Field value
                Description: field to process
            subfield (required):
                Value: Subfield codes
                Description: subfields to use for matching
            index (optional):
                Values: subfield code or empty
                Description: field that denotes index
            atomize (optional):
                Values: 1 or empty
                Description: determines if field should be broken up for uri disambiguation
            special_instructions (optional):
                Values: name|subject|mixed
                Description: special instructions to improve normalization for names and subjects.
            uri (required):
                Values: subfield code to include a url
                Description: Used to determine which subfield is used to embed a URI
            vocab (optional):
                Values (see supported vocabularies section)
                Description: when no index is supplied, you can predefine a supported index


Supported Vocabularies:
    Value: lcshac
    Description: LC Children's Subjects

    Value: lcdgt
    Description: LC Demographic Terms

    Value: lcsh
    Description: LC Subjects

    Value: lctmg
    Description: TGM

    Value: aat
    Description: Getty Arts and Architecture Thesaurus

    Value: ulan
    Description: Getty ULAN

    Value: lcgft
    Description: LC Genre Forms

    Value: lcmpt
    Description: LC Medium Performance Thesaurus

    Value: naf
    Description: LC NACO Terms

    Value: naf_lcsh
    Description: lcsh/naf combined indexes.

    Value: mesh
    Description: MESH indexes
    -->
<rules>
    <field type="bibliographic">
      <tag>100</tag>
      <subfields>abcdqnp</subfields>
      <uri>0</uri>
      <special_instructions>name</special_instructions>
    </field>
    <field type="bibliographic">
      <tag>110</tag>
      <subfields>abcdqnp</subfields>
      <uri>0</uri>
      <special_instructions>name</special_instructions>
    </field>
</rules>
</marcedit_linked_data_profile>

Reading a rule is simple. You have a field where you define a type. Acceptable values are: authority, bibliographic, authority|bibliographic. This tells the tool which type of record the rule applies to. Within the field you define a tag, the subfields to process when evaluating for linking, a uri (the subfield used when outputting the URI), special instructions (if there are any), whether the field is atomized (i.e., broken up so that you have one concept per URI), and vocab (to preset a default vocabulary for processing). So for example, say a user wanted to atomize a field that currently isn’t defined as such: they would just find the processing block for the field and add <atomize>1</atomize> to the block, and that’s it.
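
As a rough sketch of that kind of change, here is what an atomized rule might look like. The 650 tag, its subfield list, and the lcsh preset are illustrative assumptions, not values copied from the shipped profile; only the element names come from the header comment in the snippet above.

```xml
<!-- Hypothetical rule: the tag, subfield list, and vocab preset are assumptions for illustration -->
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
  <!-- 1 = break the heading up so that each concept gets its own URI -->
  <atomize>1</atomize>
  <!-- optional: preset a supported vocabulary when no index is defined -->
  <vocab>lcsh</vocab>
</field>
```

Going by the header comment, atomize takes either 1 or an empty value, so removing the element (or leaving it empty) should return the field to being treated as a single unit.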

The idea behind this rules file is to support the work of a PCC Task Force while they are testing embedding of URIs in MARC records. By shifting from a compiled solution to a rules based solution, I can provide immediate feedback and it should make the process easier to customize and test.

An important note — these rules will change. They are pretty well defined for bibliographic data, but authority data is still being worked out.

–tr
