MarcEdit Linked Data–Rules file

By reeset / In MarcEdit

One of the changes in the current MarcEdit update is the introduction of a linked data rules file to help the program understand what data elements should be processed for automatic URL generation, and how that data should be treated.  The Rules file is found in the Configs directory and is called: linked_data_profile.xml




The rules file is pretty straightforward.  At this point, I haven’t created a schema for it, but I plan to, since that will make defining data easier.  Until then, I’ve added notes in the header of the document describing the fields and their values.

Here’s a small snippet of the file:

<?xml version="1.0" encoding="UTF-8"?>
    rules block:
        top level: field
                type: authority, bibliographic, authority|bibliographic
            tag (required):
                Value: Field value
                Description: field to process
            subfield (required):
                Value: Subfield codes
                Description: subfields to use for matching
            index (optional):
                Values: subfield code or empty
                Description: field that denotes index
            atomize (optional):
                Values: 1 or empty
                Description: determines if field should be broken up for uri disambiguation
            special_instructions (optional):
                Values: name|subject|mixed
                Description: special instructions to improve normalization for names and subjects. 
            uri (required):
                Values: subfield code to include a url
                Description: Used to determine which subfield is used to embed a URI
            vocab (optional):
                Values (see supported vocabularies section)
                Description: when no index is supplied, you can predefine a supported index
  Supported Vocabularies:
    Value: lcshac
    Description: LC Children's Subjects
    Value: lcdgt
    Description: LC Demographic Terms
    Value: lcsh
    Description: LC Subjects
    Value: lctmg
    Description: TGM
    Value: aat
    Description: Getty Arts and Architecture Thesaurus
    Value: ulan
    Description: Getty ULAN
    Value: lcgft
    Description: LC Genre Forms
    Value: lcmpt
    Description: LC Medium Performance Thesaurus
    Value: naf
    Description: LC NACO Terms
    Value: naf_lcsh
    Description: lcsh/naf combined indexes.
    Value: mesh
    Description: MESH indexes
    <field type="bibliographic">
    <field type="bibliographic">

Each rule is a field block where you define a type.  Acceptable values are: authority, bibliographic, authority|bibliographic.  This tells the tool which type of record the processing rules apply to.  Inside the block you define a tag, the subfields to process when evaluating for linking, a uri element (this is the subfield used when outputting the URI), special instructions (if there are any), whether the field is atomized (i.e., broken up so that you have one concept per URI), and vocab (to preset a default vocabulary for processing).  So for example, say a user wanted to atomize a field that currently isn’t defined as such – they would just find the processing block for the field and add: <atomize>1</atomize> into the block – and that’s it.
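
As a rough sketch, a single rule block using the elements described in the header notes might look something like the following.  The element names come from those notes; the tag, subfield codes, and values are illustrative assumptions rather than a copy of the shipped file, so check them against your own linked_data_profile.xml before relying on them.

```xml
<!-- Hypothetical rule block. Element names follow the header notes;
     the tag, subfield codes, and values are illustrative assumptions. -->
<field type="bibliographic">
    <!-- MARC field this rule applies to -->
    <tag>650</tag>
    <!-- subfields used when matching the heading -->
    <subfield>abvxyz</subfield>
    <!-- subfield that names the index (vocabulary) for this heading -->
    <index>2</index>
    <!-- 1 = break the heading up so each concept gets its own URI -->
    <atomize>1</atomize>
    <!-- normalization hint: name, subject, or mixed -->
    <special_instructions>subject</special_instructions>
    <!-- subfield that receives the generated URI -->
    <uri>0</uri>
    <!-- default vocabulary when no index is supplied -->
    <vocab>lcsh</vocab>
</field>
```

In this sketch, removing the atomize element (or leaving it empty) would correspond to the "1 or empty" values in the header notes, meaning the field is not broken up.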

The idea behind this rules file is to support the work of a PCC Task Force while they are testing the embedding of URIs in MARC records.  By shifting from a compiled solution to a rules-based solution, I can provide immediate feedback, and it should make the process easier to customize and test.

An important note – these rules will change.  They are pretty well defined for bibliographic data, but authority data is still being worked out. 

