Oct 162014
 

We hear the refrain over and over – we live in a global community.  Socially, politically, economically – the ubiquity of the internet and free/cheap communications has definitely changed the world that we live in.  For software developers, this shift has definitely been felt as well.  My primary domain tends to focus around software built for the library community, but I’ve participated in a number of open source efforts in other domains as well, and while it is easier than ever to make one’s project/source available to the masses, efforts to localize said projects is still largely overlooked.  And why?  Well, doing internationalization work is hard and often times requires large numbers of volunteers proficient in multiple languages to provide quality translations of content in a wide range of languages.  It also tends to slow down the development process and requires developers to create interfaces and inputs that support language sets that they themselves may not be able to test or validate.   

Options

If your project team doesn’t have the language expertise to provide quality internalization support, you have a variety of options available to you (with the best ones reserved for those with significant funding).  These range of tools available to open source projects like: TranslateWiki (https://translatewiki.net/wiki/Translating:New_project) which provides a platform for volunteers to participate in crowd-sourced translation services.  There are also some very good subscription services like Transifex (https://www.transifex.com/), a subscription service that again, works as both a platform and match-making service between projects and translators.  Additionally, Amazon’s Mechanical Turk can be utilized to provide one off translation services at a fairly low cost.  The main point though, is that services do exist that cover a wide spectrum in terms of cost and quality.   The challenge of course, is that many of the services above require a significant amount of match-making, either on the part of the service or the individuals involved with the project and oftentimes money.  All of this ultimately takes time, sometimes a significant amount of time, making it a difficult cost/benefit analysis of determining which languages one should invest the time and resources to support.

Automated Translation

This is a problem that I’ve been running into a lot lately.  I work on a number of projects where the primary user community hails largely from North America; or, well, the community that I interact with most often are fairly English language centric.  But that’s changing — I’ve seen a rapidly growing international community and increasing calls for localized versions of software or utilities that have traditionally had very niche audiences. 

I’ll use MarcEdit (http://marcedit.reeset.net) as an example.  Over the past 5 years, I’ve seen the number of users working with the program steadily increase, with much of that increase coming from a growing international user community.  Today, 1/3-1/2 of each month’s total application usage comes from outside of North America, a number that I would have never expected when I first started working on the program in 1999.  But things have changed, and finding ways to support these changing demographics are challenging.. 

In thinking about ways to provide better support for localization, one area that I found particularly interesting was the idea of marrying automated language transcription with human intervention.  The idea being that a localized interface could be automatically generated using an automated translation tool to provide a “good enough” translation, that could also serve as the template for human volunteers to correct and improve the work.  This would enable support for a wide range of languages where English really is a barrier but no human volunteer has been secured to provide localized translation; but would enable established communities to have a “good enough” template to use as a jump-off point to improve and speed up the process of human enhanced translation.  Additionally, as interfaces change and are updated, or new services are added, automated processes could generate the initial localization, until a local expert was available to provide the high quality transcription of the new content, to avoid slowing down the development and release process.

This is an idea that I’ve been pursing for a number of months now, and over the past week, have been putting into practice.  Utilizing Microsoft’s Translation Services, I’ve been working on a process to extract all text strings from a C# application and generate localized language files for the content.  Once the files have been generated, I’ve been having the files evaluated by native speakers to comment on quality and usability…and for the most part, the results have been surprising.  While I had no expectation that the translations generated through any automated service would be comparable to human mediated translation, I was pleasantly surprised to hear that the automated data is very often, good enough.  That isn’t to say that it’s without its problems, there are definitely problems.  The bigger question has been, do these problems impede the use of the application or utility.  In most cases, the most glaring issue with the automated translation services has been context.  For example, take the word Score.  Within the context of MarcEdit and library bibliographic description, we know score applies to musical scores, not points scored in a game…context.  The problem is that many languages do make these distinctions with distinct words, and if the translation service cannot determine the context, it tends to default to the most common usage of a term – and in the case of library bibliographic description, that would be often times incorrect.  It’s made for some interesting conversations with volunteers evaluating the automated translations – which can range from very good, to down right comical.  But by a large margin, evaluators have said that while the translations were at times very awkward, they would be “good enough” until someone could provide better a better translation of the content.  And what is more, the service gets enough of the content right, that it could be used as a template to speed the translation process.  And for me, this is kind of what I wanted to hear.

Microsoft’s Translation Services

There really aren’t a lot of options available for good free automated translation services, and I guess that’s for good reason.  It’s hard, and requires both resources and adequate content to learn how to read and output natural language.  I looked hard at the two services that folks would be most familiar with: Google’s Translation API (https://cloud.google.com/translate/) and Microsoft’s translation services (https://datamarket.azure.com/dataset/bing/microsofttranslator).  When I started this project, my intention was to work with Google’s Translation API – I’d used it in the past with some success, but at some point in the past few years, Google seems to have shut down its free API translation services and replace them with a more traditional subscription service model.  Now, the costs for that subscription (which tend to be based on number of characters processed) is certainly quite reasonable, my usage will always be fairly low and a little scattershot making the monthly subscription costs hard to justify.  Microsoft’s translation service is also a subscription based service, but it provides a free tier that supports 2 million characters of through-put a month.  Since that more than meets my needs, I decided to start here. 

The service provides access to a wide range of languages, including Klingon (Qo’noS marcedit qaStaHvIS tlhIngan! nuq laH ‘oH Dunmo’?), which made working with the service kind of fun.  Likewise, the APIs are well-documented, though can be slightly confusing due to shifts in authentication practice to an OAuth Token-based process sometime in the past year or two.  While documentation on the new process can be found, most code samples found online still reference the now defunct key/secret key process.

So how does it work?  Performance-wise, not bad.  In generating 15 language files, it took around 5-8 minutes per file, with each file requiring close to 1600 calls against the server, per file.  As noted above, accuracy varies, especially when doing translations of one word commands that could have multiple meanings depending on context.  It was actually suggested that some of these context problems may actually be able to be overcome by using a language other than English as the source, which is a really interesting idea and one that might be worth investigating in the future. 

Seeing how it works

If you are interested in seeing how this works, you can download a sample program which pulls together code copied or cribbed from the Microsoft documentation (and then cleaned for brevity) as well as code on how to use the service from: https://github.com/reeset/C–Language-Translator.  I’m kicking around the idea of converting the C# code into a ruby gem (which is actually pretty straight forward), so if there is any interest, let me know.

–tr

 Posted by at 6:13 pm
Apr 252012
 

One of the hats I wear is as a member of the Independence Library Board.  I love it because I don’t work with public libraries as often as I’d like to in my real job, and honestly, the Independence Public Library is the center of the community.  The Library is a center for adults looking for education opportunities, kids looking for resources, and home to a number of talented librarians that are dedicated to encouraging a love of reading to our community.  It’s one of the few libraries I’ve ever known to have both a children’s and adult reading programs and takes advantage of that in the summer – by having the adults and kids compete against each other to see who logs the most pages (the kids always win). 

Each board meeting is interesting, because as the economy became more difficult for people, more people turned to the library.  Every month, the library sees more circulations, more bodies in the building, more kids, more adults – just more.  And they do it on a budget that doesn’t accurately reflect the impact that they have on the community. 

Anyway, one of the things that the Library has going for it is a very active friends program – and through that group (and some grant funds), the library was able to purchase a number of Laptop computers for circulation within the Library.  The Library currently has some, 8-10 terminals that are always being used and the laptops would provide additional seats, and allow people to work anywhere within the library using the wifi.

The Library setup the laptops using the usual software – DeepFreeze, etc. to provide a fairly locked down environment.  However, what was missing was a customizable timer on the machines.  Essentially, the staff was looking for a way to make it easier for patrons checking out the laptops to avoid fines.  The Laptops circulate for a finite period of time within the building.  Once that time is over, the clock starts ticking for fines.  To avoid confusion, and help make it easier for patrons to know when the clock was running out – I’d offered to work on building a simplified timer/kiosk program. 

The impetus for this work comes from Access 2007 I think.  I had attended the hackfest before the conference and one of the project ideas was an open source timing program.  I had worked on and developed a proof of concept that I passed on.  And while I never worked on the code since – I kept a copy myself.  When we were talking about things that would be helpful, I was reminded of this work. 

Now, unfortunately, I couldn’t use much of the old project at all.  The needs were slightly different – but it helped me have a place to start so that I wasn’t just looking at a blank screen.  So, with idea in hand, I decided to see how much time it would take to whip together an application that could meet the needs. 

I’ll admit, nights like tonight make me happy that I still do more than write code in scripting languages like python and ruby.  Taking about 3 hours, I put together a feature complete application that meets our specific needs.  I’ll be at the Oregon Library Association meeting this week, and if folks find this kind of work interesting, I’ll make it a bit more generic and post the source for anyone that wants to tinker with it.

So what does it do?  It’s pretty simple.  Basically, it’s an application that keeps time for the user and provides some built-in kiosk functionality to prevent the application was being disabled. 

Here are a few of the screen shots:

image
When the program is running, you see the clock situated in the task tray

image
Click on the icon, and see the program menu

Preferences – password protected

image

image

image
Because we have a large Hispanic population, all the strings will need be able to be translated.  This was essentially is just the locked message.  I’ll ensure the others are customizable as well – maybe with an option to just use Google Translate (even though it far, far from perfect) if a need to just get the gist across is the most important.

image
Run an action (both functions require a password)

image
Place your cursor over the icon to get the minutes

image
Information box letting you know you are running out of time

image
Sample lockout screen

In order to run any of the functions, you must authenticate yourself.  In order to disable the lockout screen, you must authenticate yourself.  What’s more, while the program is running, it creates a low-level keyboard hook to capture and pre-process all keystrokes, disabling things like escape keys, the windows key, ctrl+alt+del so that once this screen comes up – a user can not break out of it without shutting off the computer (which would result in needing to log in).  Coupled with DeepFreeze and some group policy settings, my guess is that this will suffice.

The source code itself is a few thousand lines of code, with maybe a 1000 or 1500 lines of actual business logic and the remainder around the UI/threading components.  Short and simple.

Hopefully, I’ll get a chance to do a little testing and get some feedback later this week – but for now – I’m just happy that maybe I can give a little bit back to the community library that gives so much to my family.  And if I hear from anyone that this might be of interest outside my library – I’ll certainly post the code up to github.

–TR

 Posted by at 12:48 am
Mar 082011
 

When working with the 64-bit flavor of Windows, there are a couple of quirks that you just need to accept.  First, when Microsoft designed windows for 64 bit processing, they weren’t going to break legacy applications and second, how this gets done makes absolutely no sense unless you simply have faith that the Operating System will take care of it all for you.

So why doesn’t it make sense?  Well, Windows 64-bit systems are essentially operating systems with two minds.  One the one hand, you have the system designed to run 64-bit processes, but lives within an ecosystem where nearly all windows applications are still designed for 32-bit systems.  So what is an OS designer to do?  Well, you have both versions of the OS running, obviously.  Really, it’s not that simple (or complex) – in reality, Microsoft has a 32 bit emulator that allows all 32 bit applications to run on 64-bit versions of Windows.  However, this is where it gets complicated.  Since 64 bit processes cannot access 32 bit processes and vise versa – you run into a scenerio where Windows must mirror a number of systems components for both 32 and 64 bit processes.  They do this two ways:

  1. In the registry – Microsoft has what I like to think of as a shadow registry that exists for WOW64 bit processes (that’s the 32 bit emulator if you can’t tell) – to register COM objects and components accessible to 32-bit applications.
  2. The presence of a system32 (this is ironically where the 64bit libraries live) and a SysWow64 (this is where the 32 bit libraries live) system folders which replicate large portions of the Windows API and provide system components for 32-bit and 64 bit processes.

So how does this all work together?  Well, Microsoft makes it work through redirection.  Quietly, transparently – the Windows operating system will access the applicable part of the system based on the process type (being either 64 or 32 bit).  The problem arises when one is running a 32-bit process, but want to have access to a dedicated 64 bit folder or registry key.  Because redirection happens at the system level, programming tools, api access, etc. all are redirected to the appropriate folder for the process type.

Presently, I’m in the process of rebuilding MarcEdit, and one of the requirements is that it run natively in either 32 or 64 bit mode.  The problem is that there are a large cohort of users that are working with MarcEdit on 64 bit systems, but have it installed as a 32 bit app.  So, I’ve been tweaking an Installer class that will evaluate the user’s environment (if 32 bit process running in a 64 bit OS) and move the appropriate 64 bit files into their correct locations so that when they run the program for the first time, the C# code compiled to run on Any CPU (so, if it’s a 64 bit OS, it will run natively as a 64 bit app) – the necessary components will be available.

So how do we do this?  Actually, it’s not all that hard.  Microsoft provides two important API for the task: Wow64DisableWow64FsRedirection and Wow64RevertWow64FsRedirection.  Using these two functions, you can temporarily disable redirection within your application.  However, you want to make sure you re-enable when your required access of these components is over (or apparently, the world as we know it will come to and end).   So, here’s my simple example of how this might work:

#region API_To_Disable_File_Redirection_On_64bit_Machines
public static string gslash = System.IO.Path.DirectorySeparatorChar.ToString();
[DllImport("kernel32.dll", SetLastError = true)]
private static extern bool Wow64DisableWow64FsRedirection(ref IntPtr ptr);
[DllImport("kernel32.dll", SetLastError = true)]
private static extern bool Wow64RevertWow64FsRedirection(IntPtr ptr);
#endregion
#region Fix 64bit 
if (System.Environment.Is64BitProcess == false && System.Environment.Is64BitOperatingSystem == true)
{
//This needs to have some data elements moved from the 64bit folder to the System32 directory
try
{
string[] files = System.IO.Directory.GetFiles(AppPath() + "64bit" + gslash);
string windir = Environment.ExpandEnvironmentVariables("%windir%");
string system32dir = Path.Combine(windir, "System32");
if (system32dir.EndsWith(gslash) == false) {
system32dir += gslash;
}

//We need to run off the redirection that happens on 64 bit systems.
IntPtr ptr = new IntPtr();
bool isWow64FsRedirectionDisabled = Wow64DisableWow64FsRedirection(ref ptr);
foreach (string f in files)
{
try
{
string filename = System.IO.Path.GetFileName(f);
if (System.IO.File.Exists(system32dir + filename) == false)
{
    System.IO.File.Copy(AppPath() + "64bit" + gslash + filename, system32dir + filename);
}
}
catch (System.Exception yyy){//System.Windows.Forms.MessageBox.Show(yyy.ToString()); 
}
}
//We need to run the redirection back on
if (isWow64FsRedirectionDisabled)
{
bool isWow64FsRedirectionReverted = Wow64RevertWow64FsRedirection(ptr);
}

}
catch
{}
}
#endregion

 

And that’s it.  Pretty straightforward.  As noted above, the big thing to remember is to Reenable the Redirection.  If you don’t, lord knows what will happen.

 

–TR

 Posted by at 6:04 pm
Mar 022011
 

One of the questions that consistently comes up with the advent of Windows Vista and Windows 7’s use of the UAC is how to run applications or processes without being prompted for a username/password.  There are a number of places online that talk about how to use the C# classes + LogUser API to impersonate a user.  In fact, there is a very nice class written here that makes it quite easy.   However, these options don’t work for everything – and one specific use case is during installation.  So, for example, you’ve written a program and you want to provide automatic updating through the application.  To do the update, you’d like to simply shell to an MSI.  Well, there is a simple way to do this.  Using the System.Diagnostics assembly you can initiate a new process.  So, for example:

System.Security.SecureString password = 
new System.Security.SecureString(); string unsecured_pass = "your-password"; for (int x = 0; x < unsecured_pass.Length; x++) { password.AppendChar(unsecured_pass[x]); } password.MakeReadOnly(); StringBuilder shortPath = new StringBuilder(255); GetShortPathName(@?c:\your file name.msi?, shortPath,
shortPath.Capacity); System.Diagnostics.ProcessStartInfo myProcess =
new System.Diagnostics.ProcessStartInfo(); myProcess.Domain = "my-domain"; myProcess.UserName = "my-admin"; myProcess.Password = password; myProcess.FileName = "msiexec"; myProcess.Arguments = "/i " + shortPath.ToString(); myProcess.UseShellExecute = false; System.Diagnostics.Process.Start(myProcess);

So, using the above code, a program running as a standard user, could initiate a call to an MSI file and run it as though you were an administrator.  So, why does this work?  First, in order for this type of elevation to work, you need to make sure that UseShellExecute is set to fall.  However, if you set it to fall, you cannot execute any process that isn’t an executable (and a .msi isn’t an executable).  So we solve this by calling the msiexec program instead.  When you run an msi installer, this is the parent application that gets called.  So, we call it directly.  Now, since we are calling an executable, we can attach a username/password/domain to the process we are about to spawn.  This allows use to start the program as if we were another user.  And finally, when setting the path to the MSI file we want to run, we need to remember that we have to call that file using the Windows ShortFile Naming convention.  There is an API that will allow you to accomplish this. 

If you have this setup correctly – when the user runs an MSI file, they will not be prompted for a username or password – however, they will still see a UAC prompt asking to continue with the installation.  It looks like this (sorry, had to take a picture):

image

So, while the user no longer has to enter credentials to run the installer, they do have to still acknowledge that they wish to run the process.  But that seems like a livable trade off.

–TR

 Posted by at 2:05 pm
Jan 022008
 

I’ve been thinking a little bit about some of the things that I use MarcEdit for and have been pushing some of this work off my desk to some of the staff in our technical services department.  We actually use MarcEdit quite a bit when it comes to sharing metadata from our Dspace instance with other systems, like OCLC’s WorldCat and our online Catalog.  For example, we use MarcEdit to automatically generate MARC21 records for our theses submitted through Dspace.  The process seems to work fairly well, and has been very easy for our staff to learn.  Should write an article documenting this process and how its working at OSU at some point. 

To that end, I’m writing a plug-in for MarcEdit that may enable me to mainstream the processing of web page archiving in Dspace.  At this point, the process is a bit too manual for my tastes.  Along with spidering a site (using whatever the chosen depth may be), there is this pesky manual step of flattening the site and making the urls relative.  Not a big deal (unless there are file name collisions [which there always are] when reading depths), but it takes time.  So, I spent some time this afternoon and wrote a threaded web crawler.  Seems to work well.  At this point, I just need to add the logic to flatten all paths, and come up with a naming schema to re-write all urls to provide unique file names.  Once I get that down, building the batch import package for Dspace should be fairly trivial.  Not sure how much time I’ll have to work on this over the week/weekend, but would be a pretty cool project to finish I think.  It would certainly allow the library to provide site archiving as a dspace option (at this point, its only done under very special circumstances) and should simplify the process enough to the point that it could probably become a mainstream process. 

Anyway, if I do get a chance to get this finished, I’ll certainly make it available as a plug-in (with source).  Of course, if someone has already developed a simplified process that requires no manual processing after harvest, I would love to hear it.

–TR

Technorati Tags: ,,
 Posted by at 9:38 pm
Dec 262007
 

I took a couple of minutes and made a few changes to MarcEdit and the OCLC plug-in to provide some additional functionality to the plug-in framework and fix an error in the OCLC plug-in. 

Changes:

MarcEdit:

  • One real change.  In the MacroInterfaces.dll (the library that allows the Scripting interface and the Plug-in interface access to the MarcEditor and its functions) I’ve added two new functions: AddButton and RemoveButton.  These two functions allow users to have plug-ins place buttons on the toolbar of the application (at least, on specific windows). 

OCLC Plug-in:

  • I’ve added the code to initiate a button and interact when it is clicked to the MarcEditor toolbar.
  • I’ve corrected the 007 error.  See: http://blog.reeset.net/archives/480

So what will you see with the changes.  Well, the big change you will see is when you initialize the plug-in in the MarcEditor.  Once you have downloaded the update to both MarcEdit and the Plug-in (you need both), you will see the following when the plug-in is executed:

image

The new button added has been highlighted.  This button now acts as the new Save button when you have made your changes to the OCLC data records.  This will move the data back into the OCLC Save File.  Remember — at this point, you will want to make a backup of your Save Files before you make your changes — just in case there are other fields in OCLC’s XML format that are different than I would have expected. 

If you have downloaded the OCLC Plug-in and would like to update it.  At this point, the process isn’t as streamlined as I’d like (I’ll fix that this week while I’m taking some time off to recharge).  Essentially, you need to Uninstall the plug-in (using the plug-in manager or, delete the oclc_helper.dll from the marcedit plug-ins directory (generally, c:\program files\marcedit 5.0\plugins\).  If you uninstall with the plug-in manager, you will need to close and restart MarcEdit.  Then open the plug-in manager and download the new plug-in.  If you delete the library directly from the plug-ins directory — just open MarcEdit, select the plug-in manager and download the plug-in.

I’ve updated a new version of MarcEdit: MarcEdit51_Setup.exe.  I’ve also uploaded a new version of the plug-in (download this through plug-in manager [see above]).  Source can be downloaded from: oclc_helper.zip.

If you have a strong desire to see how this type of interaction is accomplished in C#, please see the following post and sample project file: http://blog.reeset.net/archives/481.

 

–TR

 Posted by at 12:41 am
Dec 242007
 

Example Project Source: PluginProject.zip

Because I’ve been doing a lot of work with MarcEdit and plug-ins, I thought I’d post some sample code for anyone interested in how this might work.  Essentially, the sample project includes 3 parts — a host application, a set of Interfaces and a Shared library.  Making this work requires a couple of important parts. 

First, the host application (either the form or class), need to implement the set of interfaces.  So for example, if interaction with a form in the hosted application was need, you would configure the form to implement a set of interfaces.  This would look like:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

namespace HostApp
{
public partial class Form1 : Form, HostInterfaces.IHost
{
//...
}

This implements the IHost class (link to msdn) — a generic class that allows you to

pass objects between dynamically loaded libraries.  .NET includes a IScript interface that allows for scripting functionality as well. 

Anyway, the interfaces are simply like delegates — they define the visible functions/methods that will be accessible to a foreign assembly.  This is the simpliest file to create.  It looks something like this:


using System;
using System.Collections.Generic;
using System.Text;

namespace HostInterfaces
{
public interface IHost
{
System.Windows.Forms.Label label { get;}
System.Windows.Forms.ToolStripButton AddButton(string caption);
void RemoveButton(System.Windows.Forms.ToolStripButton t);

}
}

Finally, the Dynamic assembly has the ability to work with any function/object within the host application that has been made public through the interface.  For this sample project, I’ve shown how to modify a label (on the host application), add a button to a toolbar and respond to click events from that button. 

The project is a simple one — but should go a long way towards showing how this works.

Cheers,

–TR

Technorati Tags: ,,

 Posted by at 8:57 pm
Dec 212007
 

As I’d noted previously (http://blog.reeset.net/archives/479), some early testers had found that the Connexion plug-in that I’d written for MarcEdit stripped the 007.  I couldn’t originally figure out why — it’s just a control field and their syntax for control fields is pretty straightforward.  However, after looking at a few records with 007 records, I could see why.  In Connexion, OCLC lets folks code the 007 using delimiters like a normal variable MARC field (when its not) — and they save it as such — using delimiters.  For example:

<v007 i2=" " i1=" " im="0">
  <sa>
    <d>s</d>
  </sa>
  <sb>
    <d>d</d>
  </sb>
  <sd>
    <d>f</d>
  </sd>
  <se>
    <d>s</d>
  </se>
  <sf>
    <d>n</d>
  </sf>
  <sg>
    <d>g</d>
  </sg>
  <sh>
    <d>n</d>
  </sh>
  <si>
    <d>n</d>
  </si>
  <sj>
    <d>z</d>
  </sj>
  <sk>
    <d>u</d>
  </sk>
  <sl>
    <d>u</d>
  </sl>
  <sm>
    <d>u</d>
  </sm>
  <sn>
    <d>d</d>
  </sn>
</v007>

I’ll admit — I have no idea why they went with this format.  From my perspective, its clunky.  The 007, as a single control field, is fairly easy to parse as it can have up to 13 bytes, with number of bytes specified 0 byte of the data element.  In this format, you actually have to create 9 different templates for the different possibilities in order to account for different field lengths, byte combinations and delimiter settings.  Honestly, my first impression when looking at this was that its a perfect example of how something so simple can become much more difficult than need be.  Personally, I would have been happier had they broke from their MARCXML like syntax for this one field to create an special 007 element.  Again, this is something that could have been easily abstracted in the XSLT translation — but to be fair, I don’t think that they figured anyone but OCLC’s connexion team would ever be trying to work with this. 

So how I’m solving it?  Well, one of the cool things working with XSLT (and .NET in general) is the ability to use extensions to help fill in missing functionality in the XSLT language (in my case, the ms:script extension in the msxml library).  Since this transformation isn’t one that I’m really sharing (outside the plug-in), I’m not too worried about its portability.  So, what I’ve done is created a number of helper C# functions and embedded them within the xslt document to aid processing.  For example,

<xsl:stylesheet version="1.0"
xmlns:marc="http://www.loc.gov/MARC21/slim"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ms="urn:schemas-microsoft-com:xslt"
 xmlns:osu="urn:oregonstate-edu:xslt"
 extension-element-prefixes="osu">
  <xsl:output method="xml" indent="yes" />
  <ms:script language="C#" implements-prefix="osu">
    <![CDATA[
        public int length(string s) {
          s = s.ToLower();
          if (s=="c") {
             return 14;
          } else if (s=="d") { return 6;}
          else if (s=="a") { return 8;}
          else if (s=="h") { return 13;}
          else if (s=="m") { return 10;}
          else if (s=="k") { return 6;}
          else if (s=="g") { return 9;}
          else if (s=="r") { return 11;}
          else if (s=="s") { return 14;}
          else if (s=="f") { return 10;}
          else if (s=="v") { return 9;}
          else { return 8;}
        }
      ]]>
  </ms:script>
 

This is a simple function that I’m using to track the number of elements needed for the processing template.  This is because I don’t want to create 9 different XSLT templates for each processing type, so I’m using some embedded C# to simplify the process.  On the plus side, using these embedded scripts make the translation process much faster on the .NET side (since .NET compiles xslt to byte code anyway before running any translation process), and this is a technique that I’ve never really had to use before so I was able to get a little practical experience.  Still don’t like it though.

–TR

 Posted by at 10:35 am
Nov 192007
 

While working on a plugin manager for a program written in C#, I found myself with a need to be able to load and unload assemblies dynamically be an application.  In C#, loading assemblies is a fairly easy prospect — one just needs to make use of the System.Reflection class.  Something like the following:

System.Reflection.Assembly assembly = System.Reflection.Assembly.LoadFile(@"c:\yourassembly.dll");

However, if you need to unload the assembly — good luck.  The .NET assembly class doesn’t include an unload method.  If you have a need to be able to dynamically load an unload assemblies, you need to work with the AppDomain class.  The .NET framework works on an Application Domain model, so for items like plugins (where you may need to load, unload or modify an assembly), you need to create an Application Domain manager to load assemblies onto.  This way, when you need to unload an assembly, you use the Unload method found within the AppDomain class. 

Of course, when dealing with plugins, you likely will need to create a new application domain for each plugin to be loaded.  This is because the you unload the appdomain, not the assemblies attached to the domains.  So for my project, I decided to create something much like the TempFileCollection.  In a global class, I decided to create a hash that stories a domain name and the domain object.  Using this method, I can do something like the following:

   1:  string path = cglobal.mglobal.AppPath() + "plugins" + System.IO.Path.DirectorySeparatorChar;
   2:              string[] files = System.IO.Directory.GetFiles(path);
   3:   
   4:              lstInstalled.Items.Clear();
   5:              foreach (string f in files)
   6:              {
   7:                  try
   8:                  {
   9:                      System.AppDomain domain = System.AppDomain.CreateDomain(System.IO.Path.GetFileName(f));
  10:                      System.IO.StreamReader reader = new System.IO.StreamReader(f, System.Text.Encoding.GetEncoding(1252), false);
  11:   
  12:                      byte[] b = new byte[reader.BaseStream.Length];
  13:                      reader.BaseStream.Read(b, 0, System.Convert.ToInt32(reader.BaseStream.Length));
  14:   
  15:                      domain.Load(b);
  16:                      System.Reflection.Assembly[] a = domain.GetAssemblies();
  17:                      int index = 0;
  18:   
  19:                      
  20:                      
  21:   
  22:                      for (int x = 0; x < a.Length; x++)
  23:                      {
  24:                          if (a[x].GetName().Name + ".dll" == System.IO.Path.GetFileName(f))
  25:                          {
  26:                              index = x;
  27:                              break;
  28:                          }
  29:                      }
  30:   
  31:                      System.Windows.Forms.ListViewItem item = new ListViewItem();
  32:   
  33:                      item.Text = a[index].GetName().Name + ".dll";
  34:                      item.SubItems.Add(a[index].GetName().Version.ToString());
  35:                      item.SubItems.Add(reader.BaseStream.Length.ToString());
  36:                      lstInstalled.Items.Add(item);
  37:                      reader.Close();
  38:                      cglobal.mglobal.domains.Add(System.IO.Path.GetFileName(f), domain);
  39:                      
  40:                  }
  41:                  catch { }
  42:              }
 

Then, if we need to unload the assembly, we can unload the domain that its attached to.  Something like:

   1:  for (int x = 0; x < lstInstalled.Items.Count; x++)
   2:              {
   3:                  if (lstInstalled.Items[x].Selected == true) {
   4:                      try {
   5:                          if (System.IO.File.Exists(cglobal.mglobal.AppPath() + "plugins" + System.IO.Path.DirectorySeparatorChar + lstInstalled.Items[x].Text)) {
   6:                              System.AppDomain.Unload((System.AppDomain)cglobal.mglobal.domains[lstInstalled.Items[x].Text]);
   7:                              cglobal.mglobal.domains.Remove(lstInstalled.Items[x].Text);
   8:                              System.IO.File.Delete(cglobal.mglobal.AppPath() + "plugins" + System.IO.Path.DirectorySeparatorChar + lstInstalled.Items[x].Text);
   9:                          }
  10:                      }
  11:                      catch {}
  12:                  }
  13:              }

Seems a little more involved that it has to be, but once you know how it works, its not that big of a deal.

–TR

 Posted by at 1:15 am
Aug 012007
 

I’m posting this in hopes that it will save someone else a lot of time or someone that knows .NET a bit better than I can provide a better solution. 

Problem:

Last week, I had someone ping me regarding MarcEdit and a problem that they were running into with the Editor running it on a 64-bit version of Windows 2003 Server.  MarcEdit is compiled for any processor, so in theory, the framework should adjust the variable types to the current CPU type and go on it’s merry way.  And was it not that I have to work with some unmanaged code within my application, I’m sure that this would be the case.  However, when opening the MarcEditor, the user was getting the following error message:

This is odd because I test MarcEdit on every version of Windows from 98 to Vista.  The problem however, is I’ve never ran the program in a 64-bit version of Windows. 

Background:

I did a little bit of research, and found what I thought to be the problem.  The 64-bit version of windows shares many of the same signatures as its 32-bit counter-part, but one place where the signatures differ is in the Messaging Queue.  SendMessage, for example, which uses integers to pass values between processes had been updated to 64 bit integers and would crash if the wrong data type is sent into the function.  No problem, I fixed the signature issue, but the error message remained.  What I didn’t realize is that this wasn’t the actual problem (though it was a problem).  The real problem seemed to be related to simply accessing the RichTextbox Handle and passing it the callback.  Anytime the Handle was touched and passed, this error would be generated.

Solution:

So, Microsoft does make the Enterprise version of Windows 2003 Server available on a trial basis for developers wanting to test their software.  So, I dug up a box with an AMD-64 bit processor and set to installing the software.  Next, I installed SharpDevelop, an Open Source IDE for .NET.  I created a small sample program to isolate the code that was causing me problems.  In my case, the code that was causing the problem is necessary because of MARC being a UTF8 encoded data format.  Microsoft’s Richtext library supports the loading of plaintext (ASCII), Unicode text, text with OLE objects and text in just about any character format, including UTF8.  Unfortunately, the .NET framework only exposes plaintext and Unicode text as supported formats.  This means that in order to load UTF8 data and utilize the components streaming nature to minimize the memory footprint during loading, we need to essentially write our own EditStreamCallback function, create the delegates, the EDITSTREAM struct, etc.  And in that, there is the rub.  When compiling the code in SharpDevelop, I specified that the code should be targeted specifically for a 64-bit processor.  During compile, I got two warning messages that two core .NET components are compiled specifically for 32-bit processors.  Since the signatures on the 64 and 32 bit machines are identical, one can generally ignore these compilation warnings, as the framework does it’s magic.  However, the fact that I’m utilizing functionality from one of these two components within an unmanaged code block causes the problem.  Within the .NET (and 64-bit environment in general), an 64-bit process cannot load a library compiled for a 32-bit process.  A 32-bit process can run within a 64-bit environment, they just cannot share processes between themselves.  My best guess is that this is what was happening.  Since these two .NET components were compiled specifically for the 32-bit processors, my attempts to load them into a 64-bit process and utilize them within an unmanaged code block caused issues.  The solution is a simply one — for the GUI application of MarcEdit (which doesn’t do much anyway), the program simply needs to be complied to target 32-bit processors.  Now it runs just fine within a 64-bit environment, and will remain so until Microsoft cleans up these two core libraries.  With that said, if anyone has a better way of dealing with this problem (code is attached, so if you can make it work, I’d love to here from you), I’d love to hear about it.

RichText Code:

Finally, it’s pretty difficult to find example code dealing with the Richtext components in C#.  I think this is primarily because most folks that use high level languages like C# either don’t have a need for it or don’t have the background in C++ to understand what is actually happening at the Proc level.  Anyway, to that end, I’m posting the source to my small sample program (get it here) that I used to diagnosis this problem.  The trick to doing this type of interaction is to avoid the use of integer class variables.  In .NET, you have to remember that you are dealing with managed code, so when you make the call to a API like SendMessage, you should be Marshalling all your data, and passing it into the function via the IntPtr structure.  The only exception to that with the SendMessage API is the message argument, which microsoft defines and an unsigned 32-bit integer on all platforms, though for practical purposes, the message argument should be classed as a 32-bit integer.

API/Delegate Declarations

   1:  private const int SF_USECODEPAGE = 0x020;
   2:          private const int SF_TEXT = 0x001;
   3:          private const int SF_RTF = 0x002;
   4:          private const int CP_UTF8 = 65001;
   5:   
   6:          private const int WM_SETREDRAW      = 0x000B;
   7:   
   8:          private const int WM_USER = 0x400;
   9:          private const int EM_STREAMIN = WM_USER + 73;
  10:          private const int EM_GETEVENTMASK   = (WM_USER + 59);
  11:          private const int EM_SETEVENTMASK   = (WM_USER + 69);
  12:          private const int EM_STREAMOUT = WM_USER + 74;
  13:          private const int ENM_NONE =    0;
  14:          private const int EM_SETTEXTMODE        = WM_USER + 89;
  15:   
  16:          private const int TM_PLAINTEXT       = 1;
  17:   
  18:          private const int ECO_AUTOWORDSELECTION = 0x00000001;
  19:          private const int ECO_AUTOVSCROLL = 0x00000040;
  20:          private const int ECO_AUTOHSCROLL = 0x00000080;
  21:          private const int ECO_NOHIDESEL = 0x00000100;
  22:          private const int ECO_READONLY = 0x00000800;
  23:          private const int ECO_WANTRETURN = 0x00001000;
  24:          private const int ECO_SAVESEL = 0x00008000;
  25:          private const int ECO_SELECTIONBAR = 0x01000000;
  26:          private const int ECO_VERTICAL = 0x00400000;
  27:          private const int ECOOP_SET = 0x0001;
  28:          private const int ECOOP_OR = 0x0002;
  29:          private const int ECOOP_AND = 0x0003;
  30:          private const int ECOOP_XOR = 0x0004;
  31:  
  32:          private const int EM_SETOPTIONS = (WM_USER + 77);
  33:          private const int EM_GETOPTIONS = (WM_USER + 78);
  34:   
  35:   
  36:          delegate IntPtr EditStreamCallback(IntPtr dwCookie, IntPtr pbBuff, IntPtr
  37:              cb, out IntPtr pcb);
  38:   
  39:  
  40:          struct EDITSTREAM
  41:          {
  42:              public IntPtr dwCookie;
  43:              public IntPtr dwError;
  44:              public EditStreamCallback pfnCallback;
  45:          }
  46:   
  47:  
  48:   
  49:          [DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = false)]
  50:          static extern IntPtr SendMessage(HandleRef hWnd, Int32 Msg,
  51:                                          IntPtr wParam, IntPtr lParam);
  52:  
  53:          [DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = false)]
  54:          static extern IntPtr SendMessage(HandleRef hwnd, Int32 msg, IntPtr
  55:              wParam,    ref EDITSTREAM lParam);

In the declarations, you will see that two forms of SendMessage have been defined.  One where the lParam references the EDITSTREAM structure and on where it references an IntPtr structure.  The former is used when streaming data into the RichText window, the latter is used when sending regular messages between controls.  It should be noted, the later could be removed in .NET 2.0 by making use of the System.Windows.Forms.Message class, which essentially allows you to send messages to controls so long as all arguments can be sent as IntPtrs.

After the declarations, the remainder of the code is setting up the actual streaming, and creating the function that the delegate prototypes.  In this example, I’ve called the streaming function, ReadRichTextStream and the actual streaming function, StreamIn.  These functions would look like the following:

ReadRichTextStream: Accepts a RichTextBox Object and the filename of the file to load.

   1:          private void ReadRichTextStream(System.Windows.Forms.RichTextBox objRich,
   2:              string sfilename)
   3:          {
   4:  
   5:              string filename = sfilename.ToLower();
   6:              objRich.Text = "";
   7:              int eType = SF_TEXT;
   8:              if (filename.EndsWith(".mrk")|filename.EndsWith(".mrk8")|filename.EndsWith(".tmp")|filename.EndsWith(".xml"))
   9:              {
  10:                  eType = (((CP_UTF8)<<16)|SF_USECODEPAGE|SF_TEXT);
  11:              }
  12:              else if (filename.EndsWith(".bmrk"))
  13:              {
  14:                  eType = SF_TEXT;
  15:              }
  16:              else if (filename.EndsWith(".rtf"))
  17:              {
  18:                  eType = SF_RTF;
  19:              }
  20:              else if (filename.EndsWith(".txt"))
  21:              {
  22:                  eType = SF_TEXT;
  23:              }
  24:              else
  25:              {
  26:                  eType = (((CP_UTF8)<<16)|SF_USECODEPAGE|SF_TEXT);
  27:              }
  28:   
  29:              //this.Redraw = false;
  30:              long b_length = 0;
  31:              System.IO.FileStream fs = new System.IO.FileStream(sfilename, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read);
  32:              b_length = fs.Length;
  33:              Application.DoEvents();
  34:              System.Runtime.InteropServices.GCHandle gch = System.Runtime.InteropServices.GCHandle.Alloc(fs, System.Runtime.InteropServices.GCHandleType.Normal);
  35:              EDITSTREAM es = new EDITSTREAM();
  36:              es.dwCookie = (IntPtr)gch;
  37:              EditStreamCallback callback = new EditStreamCallback(StreamIn);
  38:              es.pfnCallback = callback
  39:  
  40:              SendMessage(new HandleRef(objRich, objRich.Handle), (Int32)EM_STREAMIN, (IntPtr)eType, ref es);
  41:  
  42:              //Remember to free allocated memory to avoid leaks.
  43:              gch.Free();
  44:              fs.Close();
  45:  
  46:  
  47:          }

StreamIn: StreamIn is the function that actually reads the data from the file and pushs the data into the RichTextBox callback to print into the control.

   1:          public IntPtr StreamIn(IntPtr dwCookie, IntPtr pbBuff, IntPtr
   2:              cb, out IntPtr pcb)
   3:          {
   4:              byte[] buffer = new byte[cb.ToInt32()];
   5:              uint result = 0;
   6:   
   7:  
   8:  
   9:  
  10:              System.IO.FileStream fs = (System.IO.FileStream)((GCHandle)dwCookie).Target;
  11:              //pcb = cb;
  12:              try
  13:              {
  14:                  pcb = (IntPtr)fs.Read(buffer, 0, cb.ToInt32());
  15:  
  16:                  if (pcb.ToInt32()<=0)
  17:                  {
  18:                      pcb = IntPtr.Zero;
  19:                      result = 1;
  20:                      return (IntPtr)result;
  21:                  }
  22:                  else
  23:                  {
  24:  
  25:                      System.Runtime.InteropServices.Marshal.Copy(buffer, 0, pbBuff, pcb.ToInt32());
  26:                  }
  27:              }
  28:              catch
  29:              {
  30:                  pcb = IntPtr.Zero;
  31:                  result = 1;
  32:                  return (IntPtr)result;
  33:              }
  34:              fs.Close();
  35:              return (IntPtr)result;
  36:          }

Anyway, the gist of all this, is that by setting the compile option to target 32-bit processors in the MarcEdit gui, I’ve been able to solve this issue.  I’m having the user that found the problem verify that I’ve indeed hunted this bug down and squashed it — so as soon as that’s confirmed, I’ll be pushing this fix out with MarcEdit.

–TR

 Posted by at 12:40 am