One of the changes in the current MarcEdit update is the introduction of a linked data rules file that tells the program which data elements should be processed for automatic URL generation, and how that data should be treated. The rules file is found in the Configs directory and is called linked_data_profile.xml.
The rules file is pretty straightforward. At this point, I haven’t created a schema for it, but I plan to, since that will make defining data easier. Until then, I’ve added notes in the header of the document describing the fields and their values.
Here’s a small snippet of the file:
<?xml version="1.0" encoding="UTF-8"?>
<marcedit_linked_data_profile>
  <!--
    rules block:
      top level: field
      Attributes:
        type: authority, bibliographic, authority|bibliographic
      tag (required):
        Value: Field value
        Description: field to process
      subfield (required):
        Value: Subfield codes
        Description: subfields to use for matching
      index (optional):
        Values: subfield code or empty
        Description: field that denotes index
      atomize (optional):
        Values: 1 or empty
        Description: determines if field should be broken up for uri disambiguation
      special_instructions (optional):
        Values: name|subject|mixed
        Description: special instructions to improve normalization for names and subjects.
      uri (required):
        Values: subfield code to include a url
        Description: Used to determine which subfield is used to embed a URI
      vocab (optional):
        Values (see supported vocabularies section)
        Description: when no index is supplied, you can predefine a supported index
    Supported Vocabularies:
      Value: lcshac
      Description: LC Children's Subjects
      Value: lcdgt
      Description: LC Demographic Terms
      Value: lcsh
      Description: LC Subjects
      Value: lctmg
      Description: TGM
      Value: aat
      Description: Getty Arts and Architecture Thesaurus
      Value: ulan
      Description: Getty ULAN
      Value: lcgft
      Description: LC Genre Forms
      Value: lcmpt
      Description: LC Medium Performance Thesaurus
      Value: naf
      Description: LC NACO Terms
      Value: naf_lcsh
      Description: lcsh/naf combined indexes.
      Value: mesh
      Description: MeSH indexes
  -->
  <rules>
    <field type="bibliographic">
      <tag>100</tag>
      <subfields>abcdqnp</subfields>
      <uri>0</uri>
      <special_instructions>name</special_instructions>
    </field>
    <field type="bibliographic">
      <tag>110</tag>
      <subfields>abcdqnp</subfields>
      <uri>0</uri>
      <special_instructions>name</special_instructions>
    </field>
  </rules>
</marcedit_linked_data_profile>
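Until a schema exists, a short script can stand in for one when checking edits to the file. The following is only a sketch and not part of MarcEdit: it assumes the element names used in the snippet (tag, subfields, uri, vocab), takes the allowed values from the header notes, and check_profile is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Values copied from the header comment of the profile.
ALLOWED_TYPES = {"authority", "bibliographic", "authority|bibliographic"}
SUPPORTED_VOCABS = {
    "lcshac", "lcdgt", "lcsh", "lctmg", "aat", "ulan",
    "lcgft", "lcmpt", "naf", "naf_lcsh", "mesh",
}
# Assumption: element names follow the snippet (<subfields>, not <subfield>).
REQUIRED_ELEMENTS = ("tag", "subfields", "uri")


def check_profile(xml_text):
    """Return a list of problems found in a profile (hypothetical helper)."""
    problems = []
    root = ET.fromstring(xml_text)
    for position, field in enumerate(root.iter("field"), start=1):
        label = f"field #{position}"
        # The type attribute must be one of the three documented values.
        if field.get("type") not in ALLOWED_TYPES:
            problems.append(f"{label}: unexpected type {field.get('type')!r}")
        # Required child elements must be present and non-empty.
        for name in REQUIRED_ELEMENTS:
            if not (field.findtext(name) or "").strip():
                problems.append(f"{label}: missing <{name}>")
        # vocab is optional, but if given it must be a supported vocabulary.
        vocab = (field.findtext("vocab") or "").strip()
        if vocab and vocab not in SUPPORTED_VOCABS:
            problems.append(f"{label}: unsupported vocab {vocab!r}")
    return problems


SAMPLE = """<marcedit_linked_data_profile>
  <rules>
    <field type="bibliographic">
      <tag>100</tag>
      <subfields>abcdqnp</subfields>
      <uri>0</uri>
      <special_instructions>name</special_instructions>
    </field>
  </rules>
</marcedit_linked_data_profile>"""

for problem in check_profile(SAMPLE):
    print(problem)
```

An empty list means the block matches the header notes as I have read them here; it says nothing about how MarcEdit itself will interpret the rule.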
Each rule is a field element with a type attribute. Acceptable values are: authority, bibliographic, authority|bibliographic. This tells the tool which type of record the rule applies to. Within the field you define a tag, the subfields to process when evaluating for linking, a uri (the subfield used when outputting the URI), special instructions (if there are any), whether the field is atomized (i.e., broken up so that you have one concept per URI), and a vocab (to preset a default vocabulary for processing). So for example, say a user wanted to atomize a field that currently isn’t defined as such: they would just find the processing block for the field and add <atomize>1</atomize> to the block, and that’s it.
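For anyone who would rather script that change than edit the file by hand, here is a minimal sketch in Python. It is an illustration only: set_atomize is a hypothetical helper, the sample profile is trimmed to a single rule, and the standard-library parser drops XML comments by default, so running this against the real file would lose the header notes.

```python
import xml.etree.ElementTree as ET

# Trimmed-down profile with a single rule, for illustration only.
PROFILE = """<marcedit_linked_data_profile>
  <rules>
    <field type="bibliographic">
      <tag>100</tag>
      <subfields>abcdqnp</subfields>
      <uri>0</uri>
      <special_instructions>name</special_instructions>
    </field>
  </rules>
</marcedit_linked_data_profile>"""


def set_atomize(xml_text, tag):
    """Add <atomize>1</atomize> to every rule block for `tag` (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for field in root.iter("field"):
        if field.findtext("tag") == tag:
            atomize = field.find("atomize")
            if atomize is None:
                # Not defined yet: append the element to the processing block.
                atomize = ET.SubElement(field, "atomize")
            atomize.text = "1"
    return ET.tostring(root, encoding="unicode")


updated = set_atomize(PROFILE, "100")
print(updated)
```

Calling the helper a second time leaves the block unchanged, since an existing atomize element is reused rather than duplicated.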
The idea behind this rules file is to support the work of a PCC Task Force while they are testing embedding of URIs in MARC records. By shifting from a compiled solution to a rules-based solution, I can respond to feedback immediately, and the process should be easier to customize and test.
An important note — these rules will change. They are pretty well defined for bibliographic data, but authority data is still being worked out.
–tr