from Hpricot to nokogiri

At one time Hpricot, originally by the famous _why, was the most widely used post-REXML HTML/XML parsing solution in Ruby. These days Nokogiri has eclipsed it (for various reasons), but I had some old code that was still using Hpricot. (Some people just starting out might still ‘accidentally’ use Hpricot; you probably don’t want to, for the reason below.)

That worked until now. Hpricot conflicts with Rails 3.1: both of them monkey patch a #to_xs method into String, but with conflicting semantics and arity. (This goes to show that this is probably a bad technique for a gem to use, and we could consider them both at fault, but Rails is the biggest gorilla in the match, so I don’t expect it to blink and change, especially when Hpricot isn’t so popular anymore anyway.)

This manifests as an `ArgumentError: wrong number of arguments (1 for 0)` error when you have the hpricot gem loaded in Rails 3.1 and you try to use the Rails (XML) ‘builder’ or call the #to_xml method on an ActiveRecord object. (Hpricot defines the #to_xs method differently than Rails expects.)
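To make the failure mode concrete, here is a minimal sketch of how two libraries patching the same method name with different arity produce this kind of error. It is not the actual Rails or Hpricot code: `DemoString`, "Library A", "Library B" and `call_like_library_a` are all hypothetical names, and a stand-in class is used so the real String is left alone.

```ruby
# Hypothetical stand-in for String, so this demo leaves the real String alone.
class DemoString
  def initialize(text)
    @text = text
  end

  # "Library A" (hypothetical): to_xs accepts an optional flag.
  def to_xs(escape = true)
    escape ? @text.gsub('&', '&amp;').gsub('<', '&lt;') : @text
  end
end

# "Library B" (hypothetical), loaded later: reopens the class and
# replaces to_xs with a version that takes no arguments.
class DemoString
  def to_xs
    @text.gsub('&', '&amp;').gsub('<', '&lt;')
  end
end

# Library A's own code still passes the flag, so the call now fails.
def call_like_library_a(str)
  str.to_xs(false)
rescue ArgumentError => e
  "ArgumentError: #{e.message}"
end

result = call_like_library_a(DemoString.new('a < b'))
puts result
```

Whichever definition is loaded last wins, which is why the error only shows up when both libraries are present. The exact wording of the message varies between Ruby versions.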

I wouldn’t necessarily call Hpricot ‘abandoned’: at the time of this writing it has a regular commit history, mostly nicksieger merging pull requests from outside developers. However, there aren’t any comments from nicksieger or any other maintainers/committers on the ticket for this particular #to_xs Rails 3.1 conflict, filed two months ago. It appears Hpricot no longer has a developer community with the interest or capacity to deal with this issue, anyhow.

From Hpricot to Nokogiri

So switching from Hpricot to Nokogiri is not too bad; the two are mostly API-compatible.

One difference that I ran into right away, and you might too, is how they handle xpaths from node objects.

 
require 'hpricot'
require 'nokogiri'

xml = <<-EOS
<root>
  <parent>
    <child>
      <grandchild>
      </grandchild>
    </child>

    <child>
    </child>
  </parent>

  <parent>
  </parent>
</root>
EOS

hp_parent = Hpricot::XML(xml).at("//parent")
nk_parent = Nokogiri::XML(xml).at("//parent")

Now, if we run an XPath beginning with “/” on the Ruby object representing a parent node, Hpricot interprets it with the current node treated as the root, whereas Nokogiri still interprets it in the context of the entire document, even though you’re calling it on a particular node.

hp_parent.at('/child')
==> <child>

nk_parent.at('/child')
==> nil

hp_parent.at('/root/parent')
==> nil

nk_parent.at('/root/parent')
==> <parent>

They’ll both behave the same, starting from current node, if you leave out the initial ‘/’.

hp_parent.at('child')
==> <child>

nk_parent.at('child')
==> <child>
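When porting old Hpricot calls that used a single leading “/” on a node, one option is to rewrite those paths into the slash-free form that both libraries treat the same way. The sketch below is only an illustration: `node_relative_path` is a hypothetical helper of my own naming, not part of either gem, and it deliberately leaves “//” paths untouched since those mean something different.

```ruby
# Hypothetical helper, not part of Hpricot or Nokogiri.
# Turns a Hpricot-style node-relative path such as '/child' into 'child',
# which both libraries evaluate starting from the current node.
def node_relative_path(path)
  # Leave '//...' alone; only a single leading slash is rewritten.
  return path if path.start_with?('//')

  path.sub(%r{\A/}, '')
end

puts node_relative_path('/child')
puts node_relative_path('/child/grandchild')
puts node_relative_path('//parent')
```

This only makes sense for paths that were being called on a node object in the Hpricot code; a path called on the document itself should keep its leading slash.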

You might think you could use “./child” too, XPath-style. You can in Nokogiri, and it does the same thing. But Hpricot is broken with respect to “./” xpaths: no matter what comes after “./”, Hpricot will return the starting node.

hp_parent.at("./child")
==> <parent>

nk_parent.at("./child")
==> <child>

hp_parent.at("./child/grandchild")
==> <parent>

nk_parent.at("./child/grandchild")
==> <grandchild>

hp_parent.at("./child/wrong")
==> <parent>

nk_parent.at("./child/wrong")
==> nil

Here’s a little test script I used to demonstrate all this, which also demonstrates a cool hacky technique to write such demonstration scripts which output the code and result.

Special Hpricot xpath predicates

As I continue to find things that break when switching from Hpricot to Nokogiri (they aren’t _exactly_ API-compatible), I’ll try to keep documenting them.

Hpricot supports some custom things in XPath that Nokogiri does not. See “Supported Predicates, but differently” on https://github.com/hpricot/hpricot/wiki/Supported-XPath-Expressions .

For instance, I was using the “:eq(n)” one. For that one at least, Nokogiri seems to support the standard XPath predicate just fine: “[n+1]” (Hpricot’s custom one is 0-based, while actual XPath uses 1-based indexing).
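If you have many “:eq(n)” expressions to convert, the off-by-one shift can be done mechanically. Here is a rough sketch; `eq_to_xpath_position` is a hypothetical helper, not something either library provides, and it assumes the index is a literal non-negative integer.

```ruby
# Hypothetical helper, not part of Hpricot or Nokogiri.
# Rewrites each 0-based ":eq(n)" into a 1-based "[n+1]" XPath predicate.
def eq_to_xpath_position(path)
  path.gsub(/:eq\((\d+)\)/) { "[#{Regexp.last_match(1).to_i + 1}]" }
end

puts eq_to_xpath_position('//li:eq(0)')
puts eq_to_xpath_position('//tr:eq(2)/td:eq(0)')
```

A standard XPath position predicate applies within each step’s context, so it is worth checking the results on real documents rather than assuming the rewritten expression always selects the same node as the Hpricot original.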


2 Responses to from Hpricot to nokogiri

  1. Pingback: loosely coupled components, developer joy, and dependency hell | Bibliographic Wilderness

  2. bhushan says:

    I don’t have the hpricot gem, but I am still facing wrong number of arguments (1 for 0) with the #to_xml method in Rails 3.1. I am using the anemone gem and Ruby 1.8.7. Any insight on this?
