Roy Tennant vs RDA (and AutoCat) :)

By reeset / In Digital Libraries, Uncategorized

Ah, what fun.  Working in Technical Services, I tend to lurk on the AutoCat list to keep up and get an idea of what folks are chatting about there.  Normally the conversation is on traditional cataloging issues, but Roy’s latest musings in Library Journal (“Will RDA Be DOA”, url: http://libraryjournal.com/article/CA6422278.html) seem to have raised people’s hackles.  Predictably, catalogers were offended by the article, in part, I think, because much of the blame for how our ILSs currently function seems to fall unfairly at their feet.  This is unfortunate, because I think that Roy’s point has gotten lost in the current discussion on the list: that our current bibliographic frameworks are not sufficient for meeting future needs.  But I think more explanation is needed here, since many people will read this statement and read into it that I’ve just said that MARC, AACR2 and the people that use them suck, which isn’t the case.  Rather, it represents a need to look at our current bibliographic frameworks (MARC, AACR2 and RDA) and evaluate not how they are working for us today (or yesterday), but whether they will meet our needs in a future where the library community and its data have become less isolated from the rest of the world.  We live in a changing information ecosystem, and libraries need to change with it.  While the retirement of MARC and AACR2 may be the eventual end-game, I doubt those that create such records would really see a big difference in what they do.  In fact, I should point out, to some degree this is already occurring.  Folks that catalog using OCLC’s Connexion client are already cataloging in XML.  The client saves data in XML templates and transmits data in XML, but generates MARC records for export.  So I certainly could envision a future where MARC has been replaced by something else, but where current catalogers simply describe things as they always have.  Anyway…

So what do I mean when I say that our current bibliographic frameworks are not sufficient for meeting future needs?  Well, let’s talk about this in terms of AACR2, RDA and MARC.  There are two glaring issues as they relate to our current bibliographic frameworks, and I’m not certain how we solve them until we, as a community, move from MARC to something else.  I’ll also note that I don’t hear many people talking about them, which I think is too bad because I think that they are issues that catalogers may relate to better.  Generally, this conversation regarding bibliographic frameworks is framed in relation to what systems folks or coders don’t believe MARC can do.  Sometimes they’re right, sometimes they’re wrong; most of the time, they are running into the real-world implementation of a framework that is constantly in a state of flux and is interpreted differently by different individuals.  However, in many ways, I think that this line of conversation is fruitless.  I’d like to focus my discussion on two issues that I run into while helping MARC users around the globe.

  1. MARC doesn’t interoperate in its current form.  What do I mean?  Well, during the current thread, Roy had discussed a need to isolate full-text materials within his catalog.  AutoCat’ers quickly noted that this information can, if encoded correctly, be inferred from the 856 field, which encodes the URL.  Well, no.  In MARC21 when utilizing AACR2, the 856 field encodes the URL information.  However, this is different in CHINMARC, FINMARC, UNIMARC, etc.  The point is, MARC has lots of flavors spanning many different character sets.  Having created MarcEdit, I’ve gotten the opportunity to work with catalogers around the world and I can tell you without hesitation that MARC flavors do not play well together.  It’s a struggle because OCLC and the Library of Congress allow our profession to have a very North American focus (which I know RDA is hoping to overcome), but as long as flavors of MARC exist, so too will the cataloging community continue to be splintered.  Believe it or not, OCLC represents only a small part of the current MARC records being created, and not everyone uses the Library of Congress as their gold standard.  MARC21 uses MARC-8 and UTF-8, but I work with a number of folks in Asia where they use Big5 or other encodings, making these records completely incompatible with MARC21 records.  This is one of the benefits of a metadata schema like MODS: title, etc. are placed in the same place, no matter what descriptive rules are applied to the framework.  Users may use different punctuation rules, etc., but the data will be the same.  This isn’t currently the case when dealing with MARC.
  2. MARC, AACR2 and I believe RDA continue to isolate our community.  Who else uses MARC?  Anyone?  While AACR2, MARC, etc. have served our communities for a number of years (~40), it might be time to put this pony out to stud and develop a framework and metadata schema that will allow the library community to leverage mindshare from outside our small community.  Currently, the tools and professional vision of our profession continue to be shaped by a small number of vendors providing solutions for our MARC data.  If the library community adopted MODS or a variety of metadata schemas (for example, FGDC for cartographic materials, MODS for books and serials, etc.), then while our bibliographic frameworks would still be our own and library-centric, the ability to build tools for our data, and to integrate other data with it, would expand beyond our little community.  The global IT community speaks XML, not MARC.  It’s time we join the rest of the world in this regard.
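
To make the first point a little more concrete, here is a minimal Python sketch rather than anything production-ready. It assumes records modelled as plain dictionaries instead of parsed MARC files, and it only maps the title field: MARC21 carries the title proper in 245 $a, while UNIMARC carries it in 200 $a. The names `TITLE_LOCATION`, `title_from_marc` and `title_from_mods` are hypothetical helpers made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative assumption: only the title field is mapped here.
# MARC21 carries the title proper in 245 $a; UNIMARC carries it in 200 $a.
TITLE_LOCATION = {
    "MARC21": ("245", "a"),
    "UNIMARC": ("200", "a"),
}

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def title_from_marc(record, flavor):
    """Return the title from a dict-modelled MARC record, or None.

    The caller has to know the flavor; the tag alone is ambiguous.
    """
    tag, code = TITLE_LOCATION[flavor]
    return record.get(tag, {}).get(code)


def title_from_mods(xml_text):
    """Return the title from a MODS record; the path is always the same."""
    root = ET.fromstring(xml_text)
    node = root.find("mods:titleInfo/mods:title", MODS_NS)
    return node.text if node is not None else None


# Hypothetical sample records describing the same book.
marc21_record = {"245": {"a": "Moby Dick"}}
unimarc_record = {"200": {"a": "Moby Dick"}}
mods_record = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Moby Dick</title></titleInfo>"
    "</mods>"
)

print(title_from_marc(marc21_record, "MARC21"))
print(title_from_marc(unimarc_record, "UNIMARC"))
print(title_from_marc(unimarc_record, "MARC21"))  # wrong flavor assumed
print(title_from_mods(mods_record))
```

Reading the UNIMARC record with MARC21 assumptions silently finds no title at all, which is the kind of mismatch described above; the MODS lookup needs no flavor argument.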

While I realize that agreeing with Roy, even partly, may cause me to forfeit my secret technical services decoder ring :), I think that at some point, this is a conversation that the technical services community needs to seriously have.  I know, I know: we are having this conversation now with RDA.  Well no, because RDA allowed our current bibliographic framework to be part of the discussion and to some degree guide decisions.  We need to have this conversation without considering what we are doing now.  For me, the biggest concern that I have with our current bibliographic frameworks is the way in which they isolate our community.  There are a number of very bright people working in libraries, but imagine what our community could do if we could tap into the mindshare outside our little community and leverage open source projects directly, without having to first take our data out of MARC and into something like MARC21XML, MODS, etc.  As someone doing some of this work, I find it telling that the first step in designing any system around data currently in MARC is that I have to take the data out of MARC, correct it for inconsistencies, and massage it to make it more straightforward, just so that the information is useful within non-library systems.

–TR

11 thoughts on “Roy Tennant vs RDA (and AutoCat) :)”

  1. Two part comment:

    Folks that catalog using OCLC’s Connexion client are already cataloging in XML. The client saves data in XML templates — transmits data in XML — but generates MARC records for export.

    Really? I didn’t know that. Good for OCLC.

    MARC, AACR2 and I believe RDA continue to isolate our community. Who else uses MARC? Anyone?

    For me, this argument nails it. If we were speaking XML, we’d have a shot at some minimal form of interoperability via XSL-driven crosswalks. (The crosswalks would undoubtedly be a lossy translation, so in and of itself it isn’t a perfect solution.) But our collective insistence that we rely on an obscure binary record format throws us back to the dark ages.
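
    As a rough illustration of the crosswalk idea, the sketch below converts a MARCXML record into a few Dublin Core elements. It is written in Python rather than XSL because the Python standard library has no XSLT processor, so treat it as a stand-in for the approach, not as an XSL stylesheet. The `CROSSWALK` table and the sample record are hypothetical and deliberately tiny.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, deliberately small mapping: (MARC tag, subfield) -> DC element.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
}


def marcxml_to_dc(marcxml_text):
    """Convert one MARCXML record to a flat tree of Dublin Core elements.

    Anything not listed in CROSSWALK is silently dropped, which is
    the lossiness mentioned above.
    """
    record = ET.fromstring(marcxml_text)
    dc_root = ET.Element("metadata")
    for field in record.findall("marc:datafield", MARC_NS):
        for sub in field.findall("marc:subfield", MARC_NS):
            target = CROSSWALK.get((field.get("tag"), sub.get("code")))
            if target is not None:
                ET.SubElement(dc_root, f"{{{DC_NS}}}{target}").text = sub.text
    return dc_root


# Hypothetical sample record.
sample = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Melville, Herman</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick</subfield>
    <subfield code="c">Herman Melville.</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">635 p.</subfield>
  </datafield>
</record>
"""

for element in marcxml_to_dc(sample):
    print(element.tag, element.text)
```

    The statement of responsibility (245 $c) and the physical description (300) never reach the output, so a round trip back to MARC would not reproduce the original record.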

  2. What I find most distressing about this whole incident (note, I don’t subscribe to AUTOCAT-L, but have followed the commentary – it’s bled over into NGC4LIB a bit – I have read Roy’s article, though) is how I keep reading about the ‘tension between systems and cataloging’.

    I am at my third systems job at three different ARL libraries and at no stop have I seen any tension between systems and tech services (outside the normal tensions between any two departments at any sort of organization that have different sets of priorities). In fact, the only places I have ever seen this discord are mailing lists and conferences.

    The real disconnect in my mind is why criticism of MARC, AACR2 or RDA is automatically taken to be a criticism of catalogers or the skills needed to catalog. Do I think MARC holds us back from the outside world? Yes (although not as much as AACR2/ISBD). Do I hold catalogers responsible for that? No. Do I think we could find the same functionality that we have in MARC with more accessible metadata formats? Certainly. In fact, I think rather than invent a new standard, we should be looking at the common metadata models out there (like you mention) and possibly pick and choose what is right from a variety of sources.

    What I don’t understand is why there isn’t more delegation of expertise amongst the appropriate communities and open communication to build consensus. Let the catalogers build a model. Let the technologists figure out how to store, manipulate and transmit said model. Let those that work with the public figure out how to display it.

    The problem now is that we don’t have a metadata model. We have a file format that does double duty as a model and lots of rules of how to work around the constraints it imposes.

    Our impediments seem to be pride, arrogance, suspicion, ignorance and, in many cases, a financial incentive to maintain the status quo. These qualities are true of each of the communities, I am singling out no one. However, until we overcome them, we’re going to remain in this quagmire of bickering and polarity.

  3. Ross,

    I hadn’t thought about it, but you are right regarding local relationships. In the two libraries I’ve worked at, systems and technical services have always worked very closely together (I know — because at both organizations, I split time in each), and in most places I visit, I’d say that this is the same. To some degree, I think these conversations come about because:
    1) Roy is a polarizing figure because of his body of work, dating back to his original “MARC Must Die” article.
    2) MARC, AACR2 and RDA represent a big investment on behalf of the cataloging community, and in recent years more and more literature coming out of the systems community is criticising the current work, current models and, to some degree, those that perform this work (or at least openly hints that it’s the technical services community that is holding back the profession).
    3) I think there is apprehension relating to how this would change work in Technical Services and whether it would mean an end to thoughtful bibliographic description. If you remember, OCLC and others have also positioned XML-based bibliographic description systems like Dublin Core as a “cheap” alternative to MARC that can be done by anyone. I think that this message was too successful and contributes to the current tension between the two groups when this conversation invariably comes up. Which is really too bad. At Oregon State University, we have taken an approach in some formats to do original metadata creation in the XML alternative because it allows for richer description. For example, cartographic materials. It’s very rare that we create an actual MARC record for them, since we’ve moved to describing all cartographic materials in FGDC and then creating derivative MARC records from these masters for inclusion into our ILS and OCLC.

    –TR

  4. Despite my aversion to letting lists inundate my inbox, this post has motivated me to subscribe to the AutoCat list to see how catalogers discuss their work and the issues around it. Since a transition from old to new models always involves people, I am interested in how aware catalogers are of their mental models about bibliographic description and analysis and how well they might transfer these to new formats and standards. Based on previous participation in other forums and the conversations that I have followed in AutoCat so far, it seems that much thinking about change is tied to particular formats and their attendant rules, which is useful for doing the current work but is an impediment to thinking about broader frameworks for change for the future. I would like to see some thinking among catalogers that would take simple ideas like objects having properties, properties having values, objects belonging to classes, etc. and relate that easily to describing and analyzing information resources in any format. We are lucky to have people like Terry who could easily grasp these and even translate these into the language of systems as well as user needs, but I see that many catalogers have a hard time going over those bridges. Perhaps this is a function of LIS education or something, but some research is needed to find out more about how catalogers themselves think about the ideas underlying their daily tasks.

  5. While I agree that MARC creates an interoperative separation among various communities, I would propose that in many cases XML schemas and DTDs do not provide the necessary granularity that MARC currently provides. With a lack of agreed upon XML encoding standards, this becomes as much of a problem for the sharing of data as it solves. So we can get the data out, but can we display it in a meaningful way?

    The fact that MARC imposes more constraints on how data is entered has prevented a lot of the data encoding anarchy that currently reigns in text-based encoding.

  6. Mark, do you have any proof to back up your assertion that XML is not granular enough? I’m not entirely sure where you get that, since, in fact, MARC has been mapped to XML. If you are referring to DC, well then, no, it’s not as granular, but that’s roughly akin to saying that a globe is not a good road map. I’m not sure anyone is trying to push DC as a MARC replacement (although it certainly could have a role).

    Have you looked at MODS? Do you find this lacking in granularity? Perhaps, but since it’s based on MARC semantics, I’m skeptical.

    Of course, all this assumes that we would use only one format for all aspects of current cataloging or that we would use an existing format or that we would even use XML at all. I think that’s a flawed assumption, honestly.

    And don’t for a moment think A) that an XML schema couldn’t impose much stricter constraints on data than MARC or B) that there isn’t a ton of absolutely horrid MARC data because our tools don’t bother with validation.

    Sorry, but I guess I see straw men all over the place, but no actually valid arguments.
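
    To illustrate point A, here is a small Python sketch of the kind of constraints a schema could declare and a tool could enforce. It is a hand-rolled check, not real XML Schema validation (the Python standard library does not include an XSD validator), and the two rules are hypothetical local rules, not requirements taken from the MODS schema itself.

```python
import re
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical local rule: dates must be a bare four-digit year.
YEAR = re.compile(r"^\d{4}$")


def validate(xml_text):
    """Return a list of human-readable problems; empty means the record passed."""
    root = ET.fromstring(xml_text)
    problems = []

    # Hypothetical local rule: exactly one non-empty title per record.
    titles = root.findall("mods:titleInfo/mods:title", NS)
    if len(titles) != 1:
        problems.append(f"expected exactly 1 title, found {len(titles)}")
    elif not (titles[0].text or "").strip():
        problems.append("title is empty")

    for date in root.findall("mods:originInfo/mods:dateIssued", NS):
        if not YEAR.match(date.text or ""):
            problems.append(f"dateIssued is not a four-digit year: {date.text!r}")

    return problems


# Hypothetical sample records.
good = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Moby Dick</title></titleInfo>"
    "<originInfo><dateIssued>1851</dateIssued></originInfo>"
    "</mods>"
)
bad = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<originInfo><dateIssued>c1851.</dateIssued></originInfo>"
    "</mods>"
)

print(validate(good))
print(validate(bad))
```

    The second record carries a date string in the punctuation-laden style common in MARC data, and it is rejected up front instead of being stored and cleaned up later.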

  7. There doesn’t seem to be much evidence trotted out in any of these discussions. I am as guilty of this as anyone, and I am not on Autocat, but how often does anyone reference the kind of research that gets published in, say, ACM’s SIGIR material? There is some seriously scary stuff coming out of approaches like latent semantic indexing, or look at the references in some of the Salton Award lectures, or heck, consider how Netflix has built a seemingly successful recommendation system from the most minimal of data. I wonder if any of the structures we define are too removed from the state of the art in the IR community, and in turn, if the software we build is not leveraging either the most granular or even the most sloppy content to its full potential. I am not sure if there are answers to any of this, and maybe what we have is the best of what’s available/possible, but I like Ross’ notion of valid arguments. Watch two health care professionals interact over a diagnosis or prescription, for example: I can’t imagine there is a great deal of concern about the possibility of offending the other in the conversation, and I suspect, in most cases, that a dizzying array of data is brought forward to bolster a viewpoint. Maybe we don’t have the data we need, in which case, gathering it is probably what the priority needs to be.

  8. “The fact that MARC imposes more constraints on how data is entered has prevented a lot of the data encoding anarchy that currently reigns in text-based encoding.”

    Hmm, I think there is a fundamental failure to communicate going on here.

    MARC’s problem is not that it imposes more constraints. The Karen Coyle/Diane Hillmann type argument is NOT that MARC imposes too many constraints. If anything, it’s the opposite, that MARC _lacks_ the proper constraints.

    I think Terry makes a really really good point: Terry, Diane Hillmann, Karen Coyle, Roy, NONE of them, to my reading, are advocating an end to ‘thoughtful bibliographic description’, which is what some seem to see as the attack. Maybe some administrators are looking for ‘cheaper’, but I think they’re equally wrong. We’re looking for ‘better’. The problem with MARC is not that it’s “too detailed”, or that it has “too much information” or “too many constraints”.

    It’s instead that the entire current system of cataloging (meaning cataloger practice + cooperative cataloging environment + AACR2 + MARC, and the universe of data they all produce) produces data that is way too hard, and in some cases impossible, for computer systems to use! That thoughtful description is going to waste, when it can’t be acted upon by systems.

  9. A random set of responses:

    Jonathan says “Maybe some administrators are looking for ‘cheaper’…”

    Maybe? Are you kidding? Most library administrators I know could not care less about proper data modeling, RDA, RDF, XML, etc., etc., etc. They want their cataloging operations to be cheaper. Period. They don’t see why “those rules” have to be so complicated; increasingly, they don’t see why they need people (as opposed to computer magic) to create metadata at all, or now in the Age of Full-Text, why they even need metadata. This has been developing over the course of the last 15 years. Remember “cataloging simplification”? You are right that much of the angst around RDA arises from that history and context.

    Terry is right that the underlying data format could be swapped out tomorrow without catalogers having to change a thing. If the issue were as simple as MARC=bad/XML=good, that would be a straight-up systems issue, but it’s not.

    In addition, for most catalogers, if MARC is the problem, there is bugger all they can do about it. The system they’ve got is the system they’ve got, and they’ve been made to feel like they’re the bad guys. They are busy producing metadata as fast and as well as they can, in a wide variety of languages on a wide variety of topics. Most of them don’t have the luxury of thinking about “simple ideas like objects having properties, properties having values, objects belonging to classes, etc. and relat[ing] that easily to describing and analyzing information resources in any format.”

    Ross, MODS is nowhere near as granular as MARC, nor was it intended to be. Not even close. That doesn’t alter your point that arguing granularity of XML is meaningless. A particular XML schema can be as granular or not as the schema-designer makes it.