Saxon.NET and local file paths with special characters and spaces

I thought I’d post this here in case this can help other folks.  One of the parsers that I like to use is Saxon.Net, but within the .net platform at least, it has problems doing XSLT or XQuery transformations when the files in question have paths with special characters or spaces (or if they reference files via xsl:include statements that live inside paths with special characters or spaces).  The question comes up a lot on the Saxon support site and it sounds like Saxon is actually processing the data correctly.  Saxon is expecting valid URIs, and a URI can’t have a spaces.  Internally, the URI is escaped, but when you process those escaped paths against a local file system, accessing the file will fail.  So, what do I mean — here are two different types of problems I encounter:

  • Path 1: c:\myfile\C#\folder1\test.xsl
  • Path2: c:\myfile\C#\folder 1\test.xsl

When setting up a transformation using Saxon, you setup a XSLTransform.  You can set this up using either a stream, like an XMLReader, or a URI.  But here the problem.  If you create the statement like this:

System.Xml.XmlReader xstream = System.Xml.XmlReader.Create(filepath);
transformer = xsltCompiler.Compile(xstream).Load();

The program can read Path 1, but will always fail on Path 2, and will fail on Path 1 if it includes secondary data.  If rather than using a stream, I use a URI class like:

transformer = xsltCompiler.Compile(new Uri(sXSLT, UriKind.Absolute)).Load();

Both Path’s will break.  On the Saxon list, there was a suggestion to create a sealed class, and to wrap the URI in that class.  So, you’d end up code that looked more like:

transformer = xsltCompiler.Compile(new SaxonUri(new Uri(sXSLT, UriKind.Absolute))).Load();

public sealed class SaxonUri : Uri
    {
        public SaxonUri(Uri wrappedUri)
            : base(GetUriString(wrappedUri), GetUriKind(wrappedUri))
        {
        }
        private static string GetUriString(Uri wrappedUri, bool localuri = false)
        {
            if (wrappedUri == null)
                throw new ArgumentNullException("wrappedUri", "wrappedUri is null.");            
            if (wrappedUri.IsAbsoluteUri) 
                return wrappedUri.AbsoluteUri;
            return wrappedUri.OriginalString;
        }
        private static UriKind GetUriKind(Uri wrappedUri)
        {
            if (wrappedUri == null)
                throw new ArgumentNullException("wrappedUri", "wrappedUri is null.");
            if (wrappedUri.IsAbsoluteUri)
                return UriKind.Absolute;
            return UriKind.Relative;
        }
        public override string ToString()
        {
            if (IsWellFormedOriginalString())
                return OriginalString;
            else if (IsAbsoluteUri)
                return AbsoluteUri;
            return base.ToString();
        }
    }

And this get’s a closer.  Using this syntax, Path 1 doesn’t work, but Path 2 will.  So, you could use an if…then statement to look for spaces in the XSLT file path, and if there are no spaces, open the stream, and if there are, wrap the URI.  Unfortunately, that doesn’t work either — because if you include a reference (like xsl:include) in your XSLT, Path 1 and Path 2 fail, because internally, the BaseURI is set to an escaped version of the URI, and Windows will fail to locate the string.  At which point, you end up feeling like you might be pretty much screwed, but there are still other options but they take more work.  In my case, the solution that I adopted was to create a custom XmlResolver.  This allows me to handle all the URI processing myself, and in the case of the two path statements, I’m interested in handling all local file URIs.  So how does that work:

xsltCompiler.XmlResolver = new CustomeResolver();
transformer = xsltCompiler.Compile(new Uri(sXSLT, UriKind.Absolute)).Load();

internal class CustomeResolver : XmlUrlResolver
    {
        
        public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
        {
            if (absoluteUri.IsFile)
            {
                string filename = absoluteUri.LocalPath;
                if (System.IO.File.Exists(filename)==false) {
                    filename = Uri.UnescapeDataString(filename);
                    if (System.IO.File.Exists(filename)==false)
                    {
                        return (System.IO.Stream)base.GetEntity(absoluteUri, role, ofObjectToReturn);
                    } else
                    {
                        System.IO.Stream myStream = new System.IO.FileStream(filename, System.IO.FileMode.Open);
                        return myStream;
                    }
                } else
                {
                    return (System.IO.Stream)base.GetEntity(absoluteUri, role, ofObjectToReturn);
                }
            }
            else
            {

                return (System.IO.Stream) base.GetEntity(absoluteUri, role, ofObjectToReturn);
            }
        }

By creating your own XmlResolver, you can fix the URI problems and allow Saxon to process both use cases above.

–tr


Posted

in

by

Tags: