OCLC’s Connexion XML — why, oh why?

By reeset / In C#, General Computing, MarcEdit

As I’d noted previously (http://blog.reeset.net/archives/479), some early testers found that the Connexion plug-in I’d written for MarcEdit was stripping the 007. At first I couldn’t figure out why: it’s just a control field, and OCLC’s syntax for control fields is pretty straightforward. However, after looking at a few records with 007 fields, I could see why. In Connexion, OCLC lets folks code the 007 with subfield delimiters, as if it were an ordinary variable data field (which it isn’t), and that’s how they save it. For example:

<v007 i2=" " i1=" " im="0">
  <sa>
    <d>s</d>
  </sa>
  <sb>
    <d>d</d>
  </sb>
  <sd>
    <d>f</d>
  </sd>
  <se>
    <d>s</d>
  </se>
  <sf>
    <d>n</d>
  </sf>
  <sg>
    <d>g</d>
  </sg>
  <sh>
    <d>n</d>
  </sh>
  <si>
    <d>n</d>
  </si>
  <sj>
    <d>z</d>
  </sj>
  <sk>
    <d>u</d>
  </sk>
  <sl>
    <d>u</d>
  </sl>
  <sm>
    <d>u</d>
  </sm>
  <sn>
    <d>d</d>
  </sn>
</v007>

I’ll admit, I have no idea why they went with this format. From my perspective, it’s clunky. The 007, as a single control field, is fairly easy to parse: it’s a fixed-length string whose length is determined by the category of material coded in byte 0. In this format, you actually have to create 9 different templates for the different possibilities in order to account for different field lengths, byte combinations and delimiter settings. Honestly, my first impression on looking at this was that it’s a perfect example of how something so simple can become much more difficult than it needs to be. Personally, I would have been happier had they broken from their MARCXML-like syntax for this one field and created a special 007 element. Again, this is something that could easily have been abstracted away in the XSLT translation, but to be fair, I don’t think they figured anyone outside OCLC’s Connexion team would ever be trying to work with this.
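For the simple case, the flattening can be sketched as a single routine. The C# below (using LINQ to XML) is a minimal sketch under one loud assumption: each subfield element `sX` holds the value that starts at byte position X minus `a` (so `sa` is byte 00 and `sd` is byte 03). That holds for the sound-recording example above, where byte 02 is undefined and so there is no `sc`, but not for categories in which one subfield covers several bytes, which is exactly where extra templates become necessary. The class name `Connexion007`, the `Flatten` method, and its `minLength` parameter are hypothetical, not part of MarcEdit.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not MarcEdit code): collapses a Connexion <v007> element
// into a flat MARC 007 string.
// ASSUMPTION: subfield element "sX" carries the value starting at byte position
// X - 'a' (sa -> 00, sb -> 01, sd -> 03, ...). True for the sound-recording
// example, not for categories where one subfield spans several bytes.
public static class Connexion007
{
    public static string Flatten(XElement v007, int minLength = 0)
    {
        // Gather (position, value) pairs from children named sa, sb, sd, ...
        var parts = v007.Elements()
            .Where(e => e.Name.LocalName.Length == 2
                        && e.Name.LocalName[0] == 's'
                        && e.Name.LocalName[1] >= 'a'
                        && e.Name.LocalName[1] <= 'z')
            .Select(e => (Pos: e.Name.LocalName[1] - 'a',
                          Value: (string)e.Element("d") ?? ""))
            .ToList();

        // Size the buffer to the larger of minLength and the last byte written;
        // positions with no subfield (e.g. the undefined byte 02 here) stay blank.
        int length = parts.Count == 0
            ? minLength
            : Math.Max(minLength, parts.Max(p => p.Pos + p.Value.Length));
        char[] bytes = Enumerable.Repeat(' ', length).ToArray();

        // Copy each subfield's value into place at its byte position.
        foreach (var (pos, value) in parts)
        {
            value.CopyTo(0, bytes, pos, value.Length);
        }
        return new string(bytes);
    }
}
```

Run against the `<v007>` element shown earlier with `minLength` set to 14 (the sound-recording length used in the plug-in’s helper), this should return `sd fsngnnzuuud`, with the blank at byte 02.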

So how am I solving it? Well, one of the cool things about working with XSLT (and .NET in general) is the ability to use extensions to fill in functionality missing from the XSLT language (in my case, the ms:script extension in the msxml library). Since this transformation isn’t one that I’m really sharing (outside the plug-in), I’m not too worried about its portability. So what I’ve done is create a number of helper C# functions and embed them within the XSLT document to aid processing. For example,

<xsl:stylesheet version="1.0"
  xmlns:marc="http://www.loc.gov/MARC21/slim"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:ms="urn:schemas-microsoft-com:xslt"
  xmlns:osu="urn:oregonstate-edu:xslt"
  extension-element-prefixes="osu">
  <xsl:output method="xml" indent="yes" />
  <!-- C# helpers compiled with the stylesheet; call them from XPath as osu:length(...) -->
  <ms:script language="C#" implements-prefix="osu">
    <![CDATA[
        // Number of 007 byte positions for a category of material (byte 0).
        // Categories not listed fall back to 8.
        public int length(string s) {
          switch (s.ToLower()) {
            case "a": return 8;   // map
            case "c": return 14;  // electronic resource
            case "d": return 6;   // globe
            case "f": return 10;  // tactile material
            case "g": return 9;   // projected graphic
            case "h": return 13;  // microform
            case "k": return 6;   // nonprojected graphic
            case "m": return 10;  // motion picture
            case "r": return 11;  // remote-sensing image
            case "s": return 14;  // sound recording
            case "v": return 9;   // videorecording
            default:  return 8;
          }
        }
      ]]>
  </ms:script>

  <!-- ...templates that call osu:length(...) go here... -->
</xsl:stylesheet>

This is a simple function that I’m using to track the number of elements needed for the processing template. I don’t want to create 9 different XSLT templates, one for each processing type, so I’m using some embedded C# to simplify the process. On the plus side, these embedded scripts make the translation process much faster on the .NET side (since .NET compiles the XSLT, embedded scripts included, to IL bytecode before running any transformation), and this is a technique I’d never really had to use before, so I got a little practical experience. Still don’t like it, though.
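For comparison, the same helper can be supplied without an embedded script block. .NET’s `XslCompiledTransform` only runs `msxsl:script` blocks when `XsltSettings.EnableScript` is turned on, and script blocks are supported only on .NET Framework, not on .NET Core or .NET 5 and later. A more portable option is to hand the transform an ordinary object through `XsltArgumentList.AddExtensionObject`, registered under the same `urn:oregonstate-edu:xslt` namespace, so that `osu:length(...)` resolves to the object’s public `length` method. This is a swapped-in technique, not what the plug-in does; the class names `Osu007Functions` and `ExtensionObjectDemo` and the dictionary form of the length table (same values as the script above) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical extension object: the same length() lookup as the ms:script block,
// registered at run time instead of being compiled from the stylesheet.
public class Osu007Functions
{
    // Byte counts per 007 category (byte 0), copied from the script version.
    private static readonly Dictionary<string, int> Lengths = new Dictionary<string, int>
    {
        ["a"] = 8, ["c"] = 14, ["d"] = 6, ["f"] = 10, ["g"] = 9, ["h"] = 13,
        ["k"] = 6, ["m"] = 10, ["r"] = 11, ["s"] = 14, ["v"] = 9,
    };

    // Lower-case name so XPath can call it as osu:length(...);
    // unknown categories fall back to 8, as in the original.
    public int length(string s) =>
        Lengths.TryGetValue(s.ToLowerInvariant(), out var n) ? n : 8;
}

public static class ExtensionObjectDemo
{
    // Transforms xml with xslt, binding Osu007Functions to the osu namespace URI.
    public static string Run(string xslt, string xml)
    {
        var transform = new XslCompiledTransform();
        using (var stylesheet = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(stylesheet);
        }

        var args = new XsltArgumentList();
        args.AddExtensionObject("urn:oregonstate-edu:xslt", new Osu007Functions());

        using var input = XmlReader.Create(new StringReader(xml));
        using var output = new StringWriter();
        transform.Transform(input, args, output);
        return output.ToString();
    }
}
```

With this wiring, a stylesheet that evaluates `osu:length('s')` gets 14 back, as it would from the embedded version, and the stylesheet itself no longer needs the `ms:script` block or any script permissions.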

–TR

2 thoughts on “OCLC’s Connexion XML — why, oh why?”

  1. One thing I’ve always wondered is how many of the people who create XML actually design templates or write code that uses it. The whole point of XML is that it’s supposed to be much simpler and easier to use, yet we constantly encounter examples of insane structures that are very awkward to work with.