parsing in ruby

As a sort of hobby (although it’ll turn into code I’ll use if I ever succeed), I’ve been trying to write a parser for a Google-like query language, even though I know little about writing parsers. Surprisingly, I haven’t been able to find an existing Ruby one. And it ends up being a little bit tricky if you want to allow everything I think I might want, including, for instance, field declarations at various points in the grammar.

I never got very far with Treetop. I’m not sure why: it seemed like I should be able to figure it out, but I always ran up against a brick wall. I was starting to suspect that parsing is just hard and I just didn’t get it. But then…

I am having MUCH more luck with the new Parslet Ruby gem. Although it’s a PEG parser like Treetop, and its methods don’t look TOO different from Treetop’s, I’m finding it SO much easier to work with. Partly that’s because it easily lets you parse explicitly at a sub-rule level, not always at the root, which makes it easier to get your grammar right interactively and iteratively, and to ‘debug’ where it’s going wrong. (In Treetop, I remain confused about WHICH rule is treated as the root rule.) Partly it’s that it’s much closer to writing plain old Ruby (less magic, so I understand more of what’s going on). Partly it’s that the documentation is pretty darn good. And partly it’s for reasons I don’t totally understand: it just seems to fit my brain better.

If you need to write a parser in Ruby, I recommend checking out Parslet, especially if you’ve tried Treetop and not had luck with it. Aside from getting further with Parslet than I did with Treetop, I also like what you end up with better: there’s a lot less mystery metaprogramming involved, so it’s easier for me to conceptualize what’s going on, and even to see how to make parameterized grammars and such. I can’t totally explain it.

Anyhow, no, I don’t have that google-like query language parser yet. It’s more of a hobby than something I have confidence will turn into something useful. The end goal is then translating to a Solr query, but as I get closer to having the parser written, I’m thinking some non-trivial optimization of the parse tree will be needed if you want to end up with a non-ridiculous Solr query from certain inputs.
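As a rough illustration of the kind of parse-tree cleanup I mean, here is a sketch in plain Ruby. The tree shape (`:op`, `:children`, `:field`, `:term`) and the helper names `flatten_tree` and `to_solr` are made up for this example and are not what Parslet emits; the escaping is deliberately simplistic.

```ruby
# Hypothetical tree shape, invented for this sketch:
#   { op: :and or :or, children: [...] }   boolean node
#   { field: "title", term: "ruby" }       leaf (field is optional)

# Collapse nested nodes that share an operator, so that
# (a AND (b AND c)) becomes (a AND b AND c).
def flatten_tree(node)
  return node unless node[:op]
  children = node[:children].map { |child| flatten_tree(child) }
  children = children.flat_map do |child|
    child[:op] == node[:op] ? child[:children] : [child]
  end
  children.size == 1 ? children.first : { op: node[:op], children: children }
end

# Render the tree as a Solr-style query string.
# Only double quotes and backslashes are escaped here (an assumption,
# real input would need more care).
def to_solr(node)
  if node[:op]
    joiner = " #{node[:op].to_s.upcase} "
    "(" + node[:children].map { |child| to_solr(child) }.join(joiner) + ")"
  else
    escaped = node[:term].gsub(/["\\]/) { |m| "\\" + m }
    quoted = %("#{escaped}")
    node[:field] ? "#{node[:field]}:#{quoted}" : quoted
  end
end

tree = {
  op: :and,
  children: [
    { field: "title", term: "ruby" },
    { op: :and,
      children: [
        { term: "parser" },
        { op: :or, children: [{ term: "peg" }, { term: "treetop" }] }
      ] }
  ]
}

puts to_solr(tree)               # nested ANDs kept as parsed
puts to_solr(flatten_tree(tree)) # redundant nesting removed
```

Flattening same-operator nesting is only the most obvious cleanup; the point is that the tree is easy to rewrite before any query string gets built.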

The reason I’m sort of idly trying to do this in the first place is to support a sufficiently sophisticated query language (arbitrarily complex nested booleans, fielded queries, etc.) on an app backed by Solr, including certain things that neither dismax nor edismax quite supports. Edismax _almost_ supports what I need (it’s missing certain features on ‘fielded’ queries that I want, which are discussed as possible in a Solr ticket but not implemented yet), and it might make more sense to learn enough Java to add them as options to edismax instead of trying to do it in Ruby and transform to the Solr query language. But like I said, it’s a recreational hobby at this point, and trying to write Java is SO not recreational for me.

7 thoughts on “parsing in ruby”

  1. Have a look at the freeform query parser in XTF, it basically does this. If you can hook it up to solr, let me know!

  2. I have no idea what XTF is and am not sure I want to learn, heh. But thanks. I’m really interested in ruby, if it’s not going to be something built into solr.

  3. Just google [xtf freeform]. XTF is written in Java, so you could possibly hook it up to Solr. At the very least it provides a grammar for a Google-like query language. I also have a parser, written in Java, that parses a very similar query language into a CQL parse tree.

  4. Cool, not sure that meets my current needs/interests, but good to know. (Like I said, I kind of avoid writing Java unless there’s a really good reason for it).

    There is a cql parser in ruby, with various methods of converting to SQL queries, that I wrote some parts of. It’s used in Blacklight.

  5. Send me an email if you need more info about the parsers. I would like more library systems to support a query language that looks like google’s; it would be great if we could have that in solr.

  6. You pretty much do have it in Solr (not in the current 1.4 release, but in trunk, or perhaps installed separately into 1.4) with edismax. It just doesn’t quite do what I need for my particular software, which involves mapping from user-entered fields to a set of actual Solr fields, instead of having users enter Solr implementation fields directly. But it is pretty much ‘a query language that looks like Google’s’.

    But like I said, right now I am not interested in writing Java code for a solr query parser, I’m interested in a ruby parser (that I will use to then construct a Solr query from the parse tree, yeah).

    I’m not sure we’re talking about the same things here Erik, even though we think we are! We seem to be talking at cross purposes, somehow.

  7. Hm, you are right. edismax does look pretty close to google syntax now. Thanks for the pointer.
