So, a couple of additional notes on this. I got some feedback on my latest attempt, and while I'm closer, there are still some significant issues. Also, my testers and I weren't seeing the same things on screen when testing, so I thought I'd follow up on two aspects of working with right-to-left languages that are not completely intuitive.
1) While Windows supports right-to-left rendering by default, it will not apply the bidi algorithm correctly unless you have explicitly turned on support for complex scripts (at least in XP and earlier; I haven't checked my Vista or Windows 7 testing boxes yet). This wasn't obvious. It seemed to me that if I could see the text and could switch on right-to-left processing, I would certainly be seeing what an Arabic user would see. Well, no. If you want Windows to render right-to-left data correctly, you need to open the Regional and Language Options Control Panel applet and check "Install files for complex script and right-to-left languages".
Obvious, right? Once I dug up my Windows XP disk, I was able to install the support files, and now I can see the same thing that an Arabic user would see. So at least we are now working with the same screwed-up display.
2) The bidi algorithm is an interesting one, and the more I look at it, the more I think that options actually exist within it to solve my problem. At issue is how the algorithm treats data that shouldn't be treated as right-to-left or as mixed-direction text. The best explanation I've read is here: http://blogs.msdn.com/vsarabic/archive/2008/03/24/mixed-time-date-display.aspx#comments, which explains why mixed time/date elements often display incorrectly within Arabic interfaces. It covers how numbers and neutral characters are processed and how the rendering shifts when non-neutral, non-Arabic characters are encountered. For MarcEdit's purposes, I have a pretty good idea which non-neutral characters are causing my problems: the delimiters, since these can be a-z or 0-9. But the bidi algorithm (which you can read more about here: http://unicode.org/reports/tr9/) includes a set of explicit directional formatting characters (embeddings, overrides, and zero-width left-to-right and right-to-left marks) that can be inserted into the text to control how the surrounding data is resolved, for example forcing a run to be treated as strong left-to-right regardless of the characters it contains. Playing around with this a little, I found that I could embed a few of these characters at key points and noticeably change the output.
Still not perfect, but getting much closer to what I'm looking for. Of course, embedding these characters invalidates the MARC record, so they would need to be filtered out on save or the saved data would be a mess. But I think this could be useful, particularly for people doing web display, since these embedded characters would essentially let you mark control/display data so that the bidi rendering doesn't scramble the overall output of the text. Make sense? Not much to me either.
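The mark-for-display, strip-on-save idea can be sketched in Python. This is a minimal sketch, not MarcEdit's implementation: it assumes mnemonic-style fields in which a `$` delimiter is followed by a one-character subfield code (the `SUBFIELD_MARKER` pattern and all function names here are hypothetical), and it uses an LRE/PDF embedding as one possible choice among the UAX #9 controls; which characters render best will depend on the display surface. `bidi_classes` uses Python's `unicodedata.bidirectional` to show why the delimiters cause trouble: a subfield code such as `a` is a strong left-to-right character sitting inside right-to-left text.

```python
import re
import unicodedata

# Explicit directional formatting characters defined in UAX #9.
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (ends the embedding)

# Every directional formatting character from UAX #9; none of them should
# survive into saved record data.
BIDI_CONTROLS = (
    "\u200e\u200f\u061c"              # LRM, RLM, ALM (marks)
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI (isolates)
)
# str.translate deletes any code point mapped to None.
_STRIP_TABLE = {ord(ch): None for ch in BIDI_CONTROLS}

# Hypothetical pattern for mnemonic-style subfield markers: a '$'
# delimiter followed by a one-character code in a-z or 0-9.
SUBFIELD_MARKER = re.compile(r"\$[a-z0-9]")


def bidi_classes(text):
    """List each character with its Unicode bidi class, for inspection."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def wrap_markers_for_display(field):
    """Wrap every subfield marker in an LRE ... PDF left-to-right embedding."""
    return SUBFIELD_MARKER.sub(lambda m: LRE + m.group(0) + PDF, field)


def strip_bidi_controls(text):
    """Delete all directional formatting characters before saving."""
    return text.translate(_STRIP_TABLE)


if __name__ == "__main__":
    field = "=245  10$aكتاب$cمؤلف"
    # '$' is a weak European Terminator (ET); the code 'a' is strong L.
    print(bidi_classes("$a"))
    shown = wrap_markers_for_display(field)
    print(repr(shown))
    # Round trip: stripping the display characters restores the original.
    assert strip_bidi_controls(shown) == field
```

Stripping every directional character on save is the blunt option: if incoming records can already contain marks such as LRM or RLM, this would remove those too, so tracking which characters were inserted for display would be safer. The pattern would also wrap a literal `$` followed by a letter or digit in the field data.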