Ruby and XML

I’ve been doing quite a bit of programming in Ruby of late, and I have to say I’m really disappointed in Ruby’s XML support. REXML, the built-in Ruby library, well, stinks. I’ve been trying to use it to parse some simple MarcXML records, and what I’m finding is that it’s taking ~0.4-0.5 seconds just to load a file. These are small files. Then there’s processing: use XPath and you pay dearly; use the better-documented convenience functions and, again, you pay dearly.
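For anyone who wants to check numbers like these on their own records, here is a rough sketch of how the load and XPath steps could be timed separately with Ruby’s `Benchmark` module. The sample record, the `time_rexml` helper, and the XPath expression are all made up for illustration (the MARC namespace declaration is left out to keep it short), and timings will vary a lot by machine and Ruby version.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical, simplified MARCXML-style record (namespace omitted for brevity).
SAMPLE_RECORD = <<~XML
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">12345</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Ruby and XML</subfield>
    </datafield>
  </record>
XML

# Hypothetical helper: parses the record with REXML, looks up the 245$a title
# via XPath, and reports wall-clock seconds for each step separately.
def time_rexml(xml)
  doc = nil
  parse_seconds = Benchmark.realtime { doc = REXML::Document.new(xml) }

  title = nil
  xpath_seconds = Benchmark.realtime do
    node = REXML::XPath.first(doc, "//datafield[@tag='245']/subfield[@code='a']")
    title = node&.text
  end

  { title: title, parse_seconds: parse_seconds, xpath_seconds: xpath_seconds }
end

result = time_rexml(SAMPLE_RECORD)
puts format('parse: %.4fs  xpath: %.4fs  title: %s',
            result[:parse_seconds], result[:xpath_seconds], result[:title])
```

Timing the two steps separately makes it easier to tell whether the load or the query is the bigger cost for a given file.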

Anyway, the point of this was that I was looking through some code, trying to figure out why some portions of my Rails app were sluggish. From my investigation, I found two slow spots: one the result of a misunderstanding as to the best way to do an operation in Ruby, the other the REXML processor. So what to do? Well, I’m going the libxml2 way. There is a Ruby gem that lets you work directly with the libxml2 library, and problem solved. In testing today, I found that this library was 30-50% faster in nearly all cases, with the speedup growing the larger the document that was loaded. I’m hoping that once I integrate libxml into my app and gut the REXML code, I’ll continue to see these speed improvements.
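As a rough idea of what the swap might look like, here is a sketch that runs the same lookup through REXML and, if it is installed, through the libxml-ruby gem. This is not the code from my app: the record, the constant names, and the two helper methods are hypothetical, and I’m assuming the gem is loaded with `require 'libxml'` and exposes `LibXML::XML::Parser.string` and `Document#find_first`. The `rescue LoadError` guard is only there so the script still runs without the gem.

```ruby
require 'rexml/document'

begin
  require 'libxml' # assumed to be provided by the libxml-ruby gem
  LIBXML_AVAILABLE = true
rescue LoadError
  LIBXML_AVAILABLE = false
end

# Hypothetical, simplified MARCXML-style record (namespace omitted for brevity).
TITLE_RECORD = <<~XML
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Ruby and XML</subfield>
    </datafield>
  </record>
XML

TITLE_XPATH = "//datafield[@tag='245']/subfield[@code='a']"

# Title lookup with the built-in REXML parser.
def title_with_rexml(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.first(doc, TITLE_XPATH)&.text
end

# The same lookup through libxml2 (only call this when the gem is present).
def title_with_libxml(xml)
  doc = LibXML::XML::Parser.string(xml).parse
  doc.find_first(TITLE_XPATH)&.content
end

puts "REXML:  #{title_with_rexml(TITLE_RECORD)}"
puts "libxml: #{title_with_libxml(TITLE_RECORD)}" if LIBXML_AVAILABLE
```

Keeping each parser behind its own small method makes it easy to compare results first and then wrap both calls in a timer before gutting the REXML version.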

–TR


Comments

One response to “Ruby and XML”

  1. […] While poring over the REXML API docs, I noticed that the REXML::Element.each_element method’s argument was called ‘xpath’. Terry had written about how dreadfully slow XPath queries were with REXML and, as a result, I thought I was avoiding them. When I removed the path arg from the each_element call in one of my methods and just iterated through each child element to see if its name matched, it cut the processing time in half! So, while 12 seconds was certainly no thoroughbred, it was definitely the right track. When I eliminated every xpath in the recursion process, I got it down to about 5 seconds. Add a touch of fragment caching and the natural performance boost of a production vs. development site in rails, and I think we’ve got a “good enough for now” solution. […]
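The change described in that comment can be sketched roughly as follows, assuming a simplified datafield with made-up subfield values. Both helper names are hypothetical. The first passes a path to `each_element`, which REXML treats as an XPath; the second iterates over every child element and compares names by hand. They should return the same values, so any speed difference comes down to the XPath machinery.

```ruby
require 'rexml/document'

# Hypothetical datafield with illustrative subfield values.
DATAFIELD_XML = <<~XML
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Ruby and XML</subfield>
    <note>not a subfield</note>
    <subfield code="c">TR</subfield>
  </datafield>
XML

# Passing a path argument makes REXML evaluate it as an XPath expression.
def subfields_via_xpath(datafield)
  values = []
  datafield.each_element('subfield') { |el| values << el.text }
  values
end

# No argument: visit every child element and filter by name ourselves.
def subfields_via_name_check(datafield)
  values = []
  datafield.each_element { |el| values << el.text if el.name == 'subfield' }
  values
end

datafield = REXML::Document.new(DATAFIELD_XML).root
puts subfields_via_xpath(datafield).inspect
puts subfields_via_name_check(datafield).inspect
```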