I was glad to see Gary Smith from OCLC finally post OCLC’s official response to the current MARBI proposal on techniques for converting Unicode to MARC-8. For those who haven’t seen the proposal, the general gist is that the current recommendation is to drop non-transformable characters and replace each one with a fill character. Personally, I was for one of the other options in the report: generating NCRs (Numeric Character References), like you see in XML, so that translation from Unicode to MARC-8 and from MARC-8 back to Unicode would be lossless, a property that would be lost if a fill character were used. However, Gary sums up a very good reason to give this further thought in his post…he writes that:
OCLC does not support this proposal. Our recent experience in dealing
with Unicode data has shown that we require a lossless representation
for our own operations. We expect that many of our users will have
similar requirements. The use of a replacement character constitutes a
permanent loss of information. If we produce and distribute records
containing replacement characters, they will inevitably come back to us
— and to every other system that takes in data from another system —
in a degraded and unrepairable form.
And he’s right…there are a number of toy ILS systems that will continue to require and share data in legacy formats. Heck, we use Innovative Interfaces here at OSU and our system hasn’t been converted to Unicode (though we could if we asked Innovative to do the conversion; however, there are consequences to this decision that we haven’t worked through yet), so it’s not simply a toy ILS problem at this point. The fact that OCLC or any other system would be ingesting these degraded records at some point would be problematic. Currently, MarcEdit generates NCRs for unmappable characters when moving between UTF-8 and MARC-8; however, I’ll eventually support whatever MARBI blesses as the desired technique, so I’ll be keeping an eye on this and attending the discussion at Midwinter.
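To make the difference between the two options concrete, here is a minimal Python sketch. It is not MarcEdit's code and not the proposal's algorithm. Python has no built-in MARC-8 codec, so Latin-1 stands in for the legacy character set, and the function names and the hexadecimal `&#xXXXX;` notation are assumptions chosen for illustration.

```python
import re

# Assumption: Latin-1 stands in for MARC-8, since Python ships no MARC-8 codec.
LEGACY = "latin-1"

# Matches hexadecimal numeric character references such as &#x6771;
NCR_PATTERN = re.compile(r"&#x([0-9A-Fa-f]{4,6});")


def to_legacy_fill(text):
    """Lossy option: every unmappable character becomes the fill character '?'."""
    return text.encode(LEGACY, errors="replace")


def to_legacy_ncr(text):
    """Lossless option: every unmappable character becomes an &#xXXXX; reference."""
    pieces = []
    for ch in text:
        try:
            ch.encode(LEGACY)
            pieces.append(ch)
        except UnicodeEncodeError:
            pieces.append("&#x%04X;" % ord(ch))
    return "".join(pieces).encode(LEGACY)


def from_legacy_ncr(data):
    """Decode legacy bytes and expand any NCRs back into the original characters."""
    text = data.decode(LEGACY)
    return NCR_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), text)


title = "Tōkyō 東京"
print(to_legacy_fill(title))                  # b'T?ky? ??'
print(to_legacy_ncr(title))                   # b'T&#x014D;ky&#x014D; &#x6771;&#x4EAC;'
print(from_legacy_ncr(to_legacy_ncr(title)))  # Tōkyō 東京
```

With the fill character, nothing in the output says which characters were dropped, so the record can never be repaired. With NCRs, the round trip gives back the original string. One caveat of this sketch: text that already contains a literal `&#x...;` sequence would be expanded on the way back, so a real implementation would need a rule for that case.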