OCLC’s Connexion XML — why, oh why?

By reeset / On / In C#, General Computing, MarcEdit

As I’d noted previously (http://blog.reeset.net/archives/479), some early testers had found that the Connexion plug-in that I’d written for MarcEdit stripped the 007.  I couldn’t originally figure out why; it’s just a control field, and their syntax for control fields is pretty straightforward.  However, after looking at a few records with 007 fields, I could see why.  In Connexion, OCLC lets folks code the 007 using delimiters like a normal variable MARC field (which it’s not), and they save it that way, delimiters and all.  For example:

<v007 i2=" " i1=" " im="0">
  <sa>
    <d>s</d>
  </sa>
  <sb>
    <d>d</d>
  </sb>
  <sd>
    <d>f</d>
  </sd>
  <se>
    <d>s</d>
  </se>
  <sf>
    <d>n</d>
  </sf>
  <sg>
    <d>g</d>
  </sg>
  <sh>
    <d>n</d>
  </sh>
  <si>
    <d>n</d>
  </si>
  <sj>
    <d>z</d>
  </sj>
  <sk>
    <d>u</d>
  </sk>
  <sl>
    <d>u</d>
  </sl>
  <sm>
    <d>u</d>
  </sm>
  <sn>
    <d>d</d>
  </sn>
</v007>

I’ll admit, I have no idea why they went with this format.  From my perspective, it’s clunky.  The 007, as a single control field, is fairly easy to parse: it is a short fixed-length string, with the number of bytes determined by the code in byte 0 of the data element.  In this format, you actually have to create 9 different templates for the different possibilities in order to account for different field lengths, byte combinations and delimiter settings.  Honestly, my first impression when looking at this was that it’s a perfect example of how something so simple can become much more difficult than need be.  Personally, I would have been happier had they broken from their MARCXML-like syntax for this one field to create a special 007 element.  Again, this is something that could have been easily abstracted in the XSLT translation, but to be fair, I don’t think that they figured anyone but OCLC’s Connexion team would ever be trying to work with this.
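To make the mapping concrete, here is a minimal C# sketch (not the plug-in’s code) that collapses the delimited form back into a flat 007 string.  It assumes that subfield a maps to byte 0, b to byte 1 and so on, with any missing subfield left blank; that is an inference from the sample above, not something OCLC documents here.  The Connexion007 class and its Flatten method are hypothetical names.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class Connexion007
{
    // Assumption: subfield code 'a' is byte 0, 'b' is byte 1, and so on.
    // Codes that do not appear in the XML are left as blanks.
    public static string Flatten(XElement v007)
    {
        StringBuilder bytes = new StringBuilder();
        foreach (XElement sub in v007.Elements())
        {
            string name = sub.Name.LocalName;          // e.g. "sa", "sb", "sd"
            if (name.Length != 2 || name[0] != 's') continue;

            int position = name[1] - 'a';
            if (position < 0 || position > 25) continue;

            // Pad with blanks up to the position this subfield occupies.
            while (bytes.Length <= position) bytes.Append(' ');

            // Take the first character of the <d> child, or a blank if absent.
            string value = (string)sub.Element("d") ?? " ";
            bytes[position] = value.Length > 0 ? value[0] : ' ';
        }
        return bytes.ToString();
    }
}
```

Fed the sample record above, this sketch produces a 14-character string with a blank in byte 2, since the record has no sc subfield.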

So how am I solving it?  Well, one of the cool things about working with XSLT (and .NET in general) is the ability to use extensions to help fill in missing functionality in the XSLT language (in my case, the script extension element from Microsoft’s XSLT namespace, bound here to the ms prefix).  Since this transformation isn’t one that I’m really sharing (outside the plug-in), I’m not too worried about its portability.  So, what I’ve done is create a number of helper C# functions and embed them within the XSLT document to aid processing.  For example,

<xsl:stylesheet version="1.0"
xmlns:marc="http://www.loc.gov/MARC21/slim"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ms="urn:schemas-microsoft-com:xslt"
 xmlns:osu="urn:oregonstate-edu:xslt"
 extension-element-prefixes="osu">
  <xsl:output method="xml" indent="yes" />
  <ms:script language="C#" implements-prefix="osu">
    <![CDATA[
        // Returns the number of 007 bytes expected for the category of
        // material in byte 0; unrecognized codes fall back to 8.
        public int length(string s) {
          switch (s.ToLower()) {
            case "c": return 14;
            case "d": return 6;
            case "a": return 8;
            case "h": return 13;
            case "m": return 10;
            case "k": return 6;
            case "g": return 9;
            case "r": return 11;
            case "s": return 14;
            case "f": return 10;
            case "v": return 9;
            default:  return 8;
          }
        }
      ]]>
  </ms:script>
 

This is a simple function that I’m using to track the number of elements needed for the processing template.  I don’t want to create 9 different XSLT templates, one for each processing type, so I’m using some embedded C# to simplify the process.  On the plus side, using these embedded scripts makes the translation process much faster on the .NET side (since .NET compiles XSLT to byte code anyway before running any translation process), and this is a technique that I’ve never really had to use before, so I was able to get a little practical experience.  Still don’t like it though.
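One practical wrinkle when loading a stylesheet like this: XslCompiledTransform leaves script blocks disabled unless you ask for them.  The sketch below shows one way to opt in; the post does not say which transform class the plug-in uses, so treat this as an assumption.  ScriptedTransform and its parameter names are placeholders, and script blocks are a .NET Framework feature that newer .NET runtimes reject.

```csharp
using System;
using System.Xml;
using System.Xml.Xsl;

public static class ScriptedTransform
{
    // Builds settings that allow embedded script but not the document() function.
    public static XsltSettings ScriptSettings()
    {
        // Arguments: enableDocumentFunction = false, enableScript = true.
        return new XsltSettings(false, true);
    }

    // Runs a stylesheet containing script blocks against an input file.
    // All three paths are placeholder parameters supplied by the caller.
    public static void Run(string stylesheetPath, string inputPath, string outputPath)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(stylesheetPath, ScriptSettings(), new XmlUrlResolver());
        xslt.Transform(inputPath, outputPath);
    }
}
```

Because enabling script means running whatever C# the stylesheet contains, this only makes sense for stylesheets you ship yourself, which is the situation described here.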

–TR

Dynamically loading and Unloading Assemblies in C#

By reeset / On / In C#

While working on a plugin manager for a program written in C#, I found myself needing to load and unload assemblies dynamically within an application.  In C#, loading assemblies is a fairly easy prospect; one just needs to make use of the System.Reflection namespace.  Something like the following:

System.Reflection.Assembly assembly = System.Reflection.Assembly.LoadFile(@"c:\yourassembly.dll");

However, if you need to unload the assembly, good luck.  The .NET Assembly class doesn’t include an unload method.  If you need to dynamically load and unload assemblies, you need to work with the AppDomain class.  The .NET framework works on an Application Domain model, so for items like plugins (where you may need to load, unload or modify an assembly), you need to create an application domain to load the assemblies into.  This way, when you need to unload an assembly, you use the Unload method found within the AppDomain class.

Of course, when dealing with plugins, you will likely need to create a new application domain for each plugin to be loaded.  This is because you unload the AppDomain, not the assemblies attached to it.  So for my project, I decided to create something much like the TempFileCollection.  In a global class, I created a hash that stores a domain name and the domain object.  Using this method, I can do something like the following:

    string path = cglobal.mglobal.AppPath() + "plugins" + System.IO.Path.DirectorySeparatorChar;
    string[] files = System.IO.Directory.GetFiles(path);

    lstInstalled.Items.Clear();
    foreach (string f in files)
    {
        try
        {
            string fileName = System.IO.Path.GetFileName(f);

            // One AppDomain per plugin, named after the plugin file.
            System.AppDomain domain = System.AppDomain.CreateDomain(fileName);

            // Read the file into memory and load the assembly from the bytes.
            byte[] b = System.IO.File.ReadAllBytes(f);
            domain.Load(b);

            // Find the plugin among the assemblies in the new domain.
            System.Reflection.Assembly[] a = domain.GetAssemblies();
            int index = 0;
            for (int x = 0; x < a.Length; x++)
            {
                if (a[x].GetName().Name + ".dll" == fileName)
                {
                    index = x;
                    break;
                }
            }

            // List the plugin name, version and size on disk.
            System.Windows.Forms.ListViewItem item = new ListViewItem();
            item.Text = a[index].GetName().Name + ".dll";
            item.SubItems.Add(a[index].GetName().Version.ToString());
            item.SubItems.Add(b.Length.ToString());
            lstInstalled.Items.Add(item);

            // Remember the domain so it can be unloaded by name later.
            cglobal.mglobal.domains.Add(fileName, domain);
        }
        catch { }   // files that fail to load are silently skipped
    }
 

Then, if we need to unload the assembly, we can unload the domain that it’s attached to.  Something like:

    string pluginDir = cglobal.mglobal.AppPath() + "plugins" + System.IO.Path.DirectorySeparatorChar;

    for (int x = 0; x < lstInstalled.Items.Count; x++)
    {
        if (lstInstalled.Items[x].Selected)
        {
            try
            {
                string name = lstInstalled.Items[x].Text;
                if (System.IO.File.Exists(pluginDir + name))
                {
                    // Unload the whole domain, drop it from the hash, then delete the file.
                    System.AppDomain.Unload((System.AppDomain)cglobal.mglobal.domains[name]);
                    cglobal.mglobal.domains.Remove(name);
                    System.IO.File.Delete(pluginDir + name);
                }
            }
            catch { }
        }
    }

Seems a little more involved than it has to be, but once you know how it works, it’s not that big of a deal.
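One caveat worth knowing, as I understand the AppDomain.Load documentation: when Load is called on another domain from the host, the returned Assembly object comes back to the host, so the assembly ends up loaded in both domains, and unloading the plugin domain does not remove the host’s copy.  A common way around this is to do the loading inside the new domain through a MarshalByRefObject and pass only simple values back.  The sketch below illustrates that pattern; PluginProbe and PluginHost are hypothetical names, and creating application domains like this only works on the .NET Framework.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper that runs inside the plugin's AppDomain. Only strings
// cross the domain boundary, so no Assembly object is handed to the host.
public class PluginProbe : MarshalByRefObject
{
    public string Describe(string path)
    {
        Assembly asm = Assembly.Load(File.ReadAllBytes(path));
        AssemblyName name = asm.GetName();
        return name.Name + ".dll " + name.Version;
    }
}

public static class PluginHost
{
    // Creates one domain for the plugin file and asks the probe to load it there.
    // The caller keeps the returned domain and later calls AppDomain.Unload on it.
    public static AppDomain LoadIsolated(string path, out string description)
    {
        AppDomain domain = AppDomain.CreateDomain(Path.GetFileName(path));
        try
        {
            PluginProbe probe = (PluginProbe)domain.CreateInstanceAndUnwrap(
                typeof(PluginProbe).Assembly.FullName,
                typeof(PluginProbe).FullName);
            description = probe.Describe(path);
            return domain;
        }
        catch
        {
            // Do not leak the domain if the plugin could not be loaded.
            AppDomain.Unload(domain);
            throw;
        }
    }
}
```

The design choice here is that the host never touches the plugin’s Assembly type directly; anything it needs (name, version, later perhaps a plugin interface) is requested through the probe.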

–TR

.NET 64-bit processor memory issues when using sendmessage to access a winform element

By reeset / On / In C#, MarcEdit, Microsoft

I’m posting this in hopes that it will save someone else a lot of time, or that someone who knows .NET a bit better than I do can provide a better solution.

Problem:

Last week, I had someone ping me regarding MarcEdit and a problem that they were running into with the Editor when running it on a 64-bit version of Windows 2003 Server.  MarcEdit is compiled for any processor, so in theory, the framework should adjust the variable types to the current CPU type and go on its merry way.  And were it not that I have to work with some unmanaged code within my application, I’m sure that this would be the case.  However, when opening the MarcEditor, the user was getting the following error message:

This is odd because I test MarcEdit on every version of Windows from 98 to Vista.  The problem, however, is that I’ve never run the program on a 64-bit version of Windows.

Background:

I did a little bit of research, and found what I thought to be the problem.  The 64-bit version of Windows shares many of the same signatures as its 32-bit counterpart, but one place where the signatures differ is in the messaging functions.  SendMessage, for example, which uses integer-sized parameters to pass values between windows, takes 64-bit values for those parameters on 64-bit Windows and will crash if the wrong data type is passed into the function.  No problem, I fixed the signature issue, but the error message remained.  What I didn’t realize is that this wasn’t the actual problem (though it was a problem).  The real problem seemed to be related to simply accessing the RichTextBox Handle and passing it the callback.  Any time the Handle was touched and passed, this error would be generated.
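The signature difference is easy to see from managed code: IntPtr follows the pointer size of the running process, while int is always 4 bytes.  The sketch below only declares SendMessage with pointer-sized parameters and reports the sizes; it never calls into user32.dll.  PointerWidth is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PointerWidth
{
    // Declaration only: wParam and lParam are pointer-sized, so IntPtr is
    // 4 bytes in a 32-bit process and 8 bytes in a 64-bit process.
    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    public static extern IntPtr SendMessage(IntPtr hWnd, uint msg, IntPtr wParam, IntPtr lParam);

    // Reports the sizes seen by the current process.
    public static string Describe()
    {
        return string.Format("IntPtr.Size: {0}, sizeof(int): {1}, 64-bit process: {2}",
            IntPtr.Size, sizeof(int), IntPtr.Size == 8);
    }
}
```

Declaring wParam or lParam as int compiles fine and happens to work in a 32-bit process, which is why this kind of mismatch tends to surface only on 64-bit systems.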

Solution:

So, Microsoft does make the Enterprise version of Windows 2003 Server available on a trial basis for developers wanting to test their software.  So, I dug up a box with a 64-bit AMD processor and set to installing the software.  Next, I installed SharpDevelop, an open source IDE for .NET, and created a small sample program to isolate the code that was causing me problems.

In my case, the code that was causing the problem is necessary because MARC data is UTF-8 encoded.  Microsoft’s RichText library supports loading plain text (ASCII), Unicode text, text with OLE objects and text in just about any character encoding, including UTF-8.  Unfortunately, the .NET framework only exposes plain text and Unicode text as supported formats.  This means that in order to load UTF-8 data and utilize the component’s streaming nature to minimize the memory footprint during loading, we need to essentially write our own EditStreamCallback function, create the delegates, the EDITSTREAM struct, etc.

And therein lies the rub.  When compiling the code in SharpDevelop, I specified that the code should be targeted specifically for a 64-bit processor.  During the compile, I got two warning messages saying that two core .NET components are compiled specifically for 32-bit processors.  Since the signatures on 64-bit and 32-bit machines are identical, one can generally ignore these compilation warnings, as the framework does its magic.  However, the fact that I’m utilizing functionality from one of these two components within an unmanaged code block causes the problem.  Within .NET (and 64-bit environments in general), a 64-bit process cannot load a library compiled for a 32-bit process.  A 32-bit process can run within a 64-bit environment; 32-bit and 64-bit code just cannot share a process.  My best guess is that this is what was happening: since these two .NET components were compiled specifically for 32-bit processors, my attempts to load them into a 64-bit process and utilize them within an unmanaged code block caused issues.

The solution is a simple one: for the GUI application of MarcEdit (which doesn’t do much anyway), the program simply needs to be compiled to target 32-bit processors.  Now it runs just fine within a 64-bit environment, and will remain so until Microsoft cleans up these two core libraries.  With that said, if anyone has a better way of dealing with this problem, I’d love to hear from you (code is attached, so you can try to make it work).

RichText Code:

Finally, it’s pretty difficult to find example code dealing with the RichText components in C#.  I think this is primarily because most folks that use high-level languages like C# either don’t have a need for it or don’t have the background in C++ to understand what is actually happening at the window procedure level.  Anyway, to that end, I’m posting the source to my small sample program (get it here) that I used to diagnose this problem.  The trick to doing this type of interaction is to avoid the use of integer class variables.  In .NET, you have to remember that you are dealing with managed code, so when you make the call to an API like SendMessage, you should be marshalling all your data and passing it into the function via the IntPtr structure.  The only exception to that with the SendMessage API is the message argument, which Microsoft defines as an unsigned 32-bit integer on all platforms, though for practical purposes, the message argument can be declared as a 32-bit integer.

API/Delegate Declarations

        // Requires: using System; using System.Runtime.InteropServices;

        // Stream format flags and code page for EM_STREAMIN/EM_STREAMOUT.
        private const int SF_USECODEPAGE = 0x020;
        private const int SF_TEXT = 0x001;
        private const int SF_RTF = 0x002;
        private const int CP_UTF8 = 65001;

        private const int WM_SETREDRAW = 0x000B;

        // Rich edit messages.
        private const int WM_USER = 0x400;
        private const int EM_STREAMIN = WM_USER + 73;
        private const int EM_GETEVENTMASK = (WM_USER + 59);
        private const int EM_SETEVENTMASK = (WM_USER + 69);
        private const int EM_STREAMOUT = WM_USER + 74;
        private const int ENM_NONE = 0;
        private const int EM_SETTEXTMODE = WM_USER + 89;

        private const int TM_PLAINTEXT = 1;

        // Rich edit option flags and operations.
        private const int ECO_AUTOWORDSELECTION = 0x00000001;
        private const int ECO_AUTOVSCROLL = 0x00000040;
        private const int ECO_AUTOHSCROLL = 0x00000080;
        private const int ECO_NOHIDESEL = 0x00000100;
        private const int ECO_READONLY = 0x00000800;
        private const int ECO_WANTRETURN = 0x00001000;
        private const int ECO_SAVESEL = 0x00008000;
        private const int ECO_SELECTIONBAR = 0x01000000;
        private const int ECO_VERTICAL = 0x00400000;
        private const int ECOOP_SET = 0x0001;
        private const int ECOOP_OR = 0x0002;
        private const int ECOOP_AND = 0x0003;
        private const int ECOOP_XOR = 0x0004;

        private const int EM_SETOPTIONS = (WM_USER + 77);
        private const int EM_GETOPTIONS = (WM_USER + 78);

        // Callback the control invokes to pull data from the stream.
        delegate IntPtr EditStreamCallback(IntPtr dwCookie, IntPtr pbBuff,
            IntPtr cb, out IntPtr pcb);

        struct EDITSTREAM
        {
            public IntPtr dwCookie;
            public IntPtr dwError;
            public EditStreamCallback pfnCallback;
        }

        // General-purpose form: both parameters passed as IntPtr.
        [DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = false)]
        static extern IntPtr SendMessage(HandleRef hWnd, Int32 Msg,
            IntPtr wParam, IntPtr lParam);

        // Streaming form: lParam points at an EDITSTREAM structure.
        [DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = false)]
        static extern IntPtr SendMessage(HandleRef hwnd, Int32 msg,
            IntPtr wParam, ref EDITSTREAM lParam);

In the declarations, you will see that two forms of SendMessage have been defined: one where the lParam references the EDITSTREAM structure and one where it is a plain IntPtr.  The former is used when streaming data into the RichText window; the latter is used when sending regular messages between controls.  It should be noted that the latter could be removed in .NET 2.0 by making use of the System.Windows.Forms.Message class, which essentially allows you to send messages to controls so long as all arguments can be sent as IntPtrs.
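For anyone who wants to try running this as a true 64-bit process, one avenue worth testing (unverified against MarcEdit, so treat it as a guess) is declaring the native types at their exact widths.  My understanding of richedit.h is that the callback’s cb and pcb are 32-bit LONG values, dwError is a 32-bit DWORD, and the structures are packed on 4-byte boundaries, which would make the layout differ from an all-IntPtr declaration on 64-bit Windows.  RichEditInterop and NativeSize are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class RichEditInterop
{
    // Assumption: LONG cb and LONG* pcb are 32-bit on every Windows platform,
    // so they are declared as int rather than IntPtr.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate int EditStreamCallback(IntPtr dwCookie, IntPtr pbBuff, int cb, out int pcb);

    // Assumption: richedit.h packs its structures on 4-byte boundaries, so the
    // callback pointer follows dwError without padding, even in a 64-bit process.
    [StructLayout(LayoutKind.Sequential, Pack = 4)]
    public struct EDITSTREAM
    {
        public IntPtr dwCookie;                  // DWORD_PTR, pointer-sized
        public int dwError;                      // DWORD, always 32-bit
        public EditStreamCallback pfnCallback;   // function pointer
    }

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    public static extern IntPtr SendMessage(HandleRef hWnd, int msg, IntPtr wParam, ref EDITSTREAM lParam);

    // Size of the structure as the marshaller will lay it out for native code.
    public static int NativeSize()
    {
        return Marshal.SizeOf(typeof(EDITSTREAM));
    }
}
```

If the packing assumption holds, NativeSize should report 12 in a 32-bit process and 20 in a 64-bit one; comparing that against sizeof(EDITSTREAM) in a small C++ program would be a quick way to confirm or refute it.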

After the declarations, the remainder of the code sets up the actual streaming and creates the function that the delegate prototypes.  In this example, I’ve called the function that sets up the stream ReadRichTextStream and the actual streaming callback StreamIn.  These functions would look like the following:

ReadRichTextStream: Accepts a RichTextBox Object and the filename of the file to load.

        private void ReadRichTextStream(System.Windows.Forms.RichTextBox objRich,
            string sfilename)
        {
            string filename = sfilename.ToLower();
            objRich.Text = "";

            // Choose the stream format from the file extension.
            int eType = SF_TEXT;
            if (filename.EndsWith(".mrk") || filename.EndsWith(".mrk8") || filename.EndsWith(".tmp") || filename.EndsWith(".xml"))
            {
                eType = (((CP_UTF8)<<16)|SF_USECODEPAGE|SF_TEXT);
            }
            else if (filename.EndsWith(".bmrk"))
            {
                eType = SF_TEXT;
            }
            else if (filename.EndsWith(".rtf"))
            {
                eType = SF_RTF;
            }
            else if (filename.EndsWith(".txt"))
            {
                eType = SF_TEXT;
            }
            else
            {
                eType = (((CP_UTF8)<<16)|SF_USECODEPAGE|SF_TEXT);
            }

            //this.Redraw = false;
            System.IO.FileStream fs = new System.IO.FileStream(sfilename, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read);
            Application.DoEvents();

            // Wrap the stream in a GCHandle so it can travel through dwCookie.
            System.Runtime.InteropServices.GCHandle gch = System.Runtime.InteropServices.GCHandle.Alloc(fs, System.Runtime.InteropServices.GCHandleType.Normal);
            try
            {
                EDITSTREAM es = new EDITSTREAM();
                es.dwCookie = (IntPtr)gch;
                EditStreamCallback callback = new EditStreamCallback(StreamIn);
                es.pfnCallback = callback;

                SendMessage(new HandleRef(objRich, objRich.Handle), (Int32)EM_STREAMIN, (IntPtr)eType, ref es);
            }
            finally
            {
                // Remember to free the handle and close the file to avoid leaks.
                gch.Free();
                fs.Close();
            }
        }
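The eType value packs two things into one integer: the stream format flags in the low word and, when SF_USECODEPAGE is set, the code page in the high word.  The sketch below just does that arithmetic so the pieces are visible; StreamFormat and its helper methods are hypothetical names, and the constants are copied from the declarations above.

```csharp
using System;

public static class StreamFormat
{
    public const int SF_TEXT = 0x001;
    public const int SF_USECODEPAGE = 0x020;
    public const int CP_UTF8 = 65001;

    // Same expression as in ReadRichTextStream: code page in the high word,
    // format flags in the low word.
    public static int Utf8Text()
    {
        return (CP_UTF8 << 16) | SF_USECODEPAGE | SF_TEXT;
    }

    // Recovers the code page from the high word.
    public static int CodePageOf(int format)
    {
        return (format >> 16) & 0xFFFF;
    }

    // Recovers the format flags from the low word.
    public static int FlagsOf(int format)
    {
        return format & 0xFFFF;
    }
}
```

Because 65001 shifted left by 16 sets the top bit, the packed value is negative as an int, so casting it to IntPtr sign-extends it in a 64-bit process.  Whether the control cares about those upper bits is something I have not verified.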

StreamIn: StreamIn is the callback that actually reads the data from the file and pushes it into the buffer that the RichTextBox supplies, so that the control can display it.

        public IntPtr StreamIn(IntPtr dwCookie, IntPtr pbBuff, IntPtr
            cb, out IntPtr pcb)
        {
            byte[] buffer = new byte[cb.ToInt32()];
            uint result = 0;

            // Recover the FileStream that ReadRichTextStream stored in dwCookie.
            System.IO.FileStream fs = (System.IO.FileStream)((GCHandle)dwCookie).Target;
            try
            {
                pcb = (IntPtr)fs.Read(buffer, 0, cb.ToInt32());

                if (pcb.ToInt32() <= 0)
                {
                    // End of file: report zero bytes so the control stops calling back.
                    pcb = IntPtr.Zero;
                    return (IntPtr)result;
                }
                else
                {
                    // Copy the chunk into the buffer supplied by the control.
                    System.Runtime.InteropServices.Marshal.Copy(buffer, 0, pbBuff, pcb.ToInt32());
                }
            }
            catch
            {
                // A nonzero return value tells the control that the read failed.
                pcb = IntPtr.Zero;
                result = 1;
                return (IntPtr)result;
            }
            // The stream stays open here: the control calls back once per chunk,
            // and ReadRichTextStream closes the file when streaming is done.
            return (IntPtr)result;
        }
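The control calls the callback repeatedly, once per chunk, until a call reports zero bytes or an error, so the stream has to stay open between calls.  The sketch below imitates that loop without any window involved: it round-trips a stream through a GCHandle cookie and pumps it through a callback of the same shape (using int sizes for simplicity).  CookieDemo and Pump are hypothetical names.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

public static class CookieDemo
{
    // Same shape as an edit stream callback, reduced to int sizes for the demo.
    static int StreamIn(IntPtr cookie, IntPtr buffer, int cb, out int pcb)
    {
        // Turn the cookie back into the managed stream it refers to.
        Stream source = (Stream)GCHandle.FromIntPtr(cookie).Target;
        byte[] chunk = new byte[cb];
        pcb = source.Read(chunk, 0, cb);
        if (pcb > 0) Marshal.Copy(chunk, 0, buffer, pcb);
        return 0;   // zero means no error; pcb == 0 signals end of stream
    }

    // Plays the role of the control: keeps calling back until no bytes arrive.
    public static string Pump(Stream source, int chunkSize)
    {
        GCHandle handle = GCHandle.Alloc(source);
        IntPtr native = Marshal.AllocHGlobal(chunkSize);
        MemoryStream received = new MemoryStream();
        try
        {
            int read;
            while (StreamIn(GCHandle.ToIntPtr(handle), native, chunkSize, out read) == 0 && read > 0)
            {
                byte[] managed = new byte[read];
                Marshal.Copy(native, managed, 0, read);
                received.Write(managed, 0, read);
            }
        }
        finally
        {
            // Release the unmanaged buffer and the handle once streaming ends.
            Marshal.FreeHGlobal(native);
            handle.Free();
        }
        return Encoding.UTF8.GetString(received.ToArray());
    }
}
```

Calling Pump with a MemoryStream and a small chunk size returns the original text after several callback rounds, which is a handy way to exercise the callback logic without a RichTextBox.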

Anyway, the gist of all this is that by setting the compile option to target 32-bit processors in the MarcEdit GUI, I’ve been able to solve this issue.  I’m having the user that found the problem verify that I’ve indeed hunted this bug down and squashed it, so as soon as that’s confirmed, I’ll be pushing this fix out with MarcEdit.

–TR