As a sort of hobby (although it’ll turn into code I’ll use if I ever succeed), I’ve been trying to write a parser for a Google-like query language, even though I know little about writing parsers. Surprisingly, I haven’t been able to find a Ruby one out there already. And it ends up being a little bit tricky if you want to allow everything I think I might want, including, for instance, field declarations at various points in the grammar.
I never got very far with Treetop. I’m not sure why: it seemed like I should be able to figure it out, but I always ran up against a brick wall. I was starting to suspect that parsing is just hard, and I just didn’t get it. But then…
I am having MUCH more luck with the new Parslet Ruby gem. Although it’s a PEG library like Treetop, and its methods don’t look TOO different from Treetop’s, I’m finding it SO much easier to work with. Partly it’s that Parslet easily lets you explicitly parse at a sub-rule level, not always at the root, which makes it easier to get your grammar right interactively and iteratively, and to ‘debug’ where it’s going wrong. (In Treetop, I remain confused about WHICH rule is treated as the root rule.) Partly it’s that it’s much closer to writing plain old Ruby (less magic, so I understand more of what’s going on). Partly it’s that the documentation is pretty darn good. And partly it’s for reasons I don’t totally understand: it just seems to fit my brain better.
If you need to write a parser in Ruby, I recommend checking out Parslet, especially if you’ve tried Treetop and not had luck with it. Aside from getting further with Parslet than I did with Treetop, I also like what you end up with better: there’s a lot less mystery metaprogramming involved, and it’s just easier for me to conceptualize what’s going on and to see how to do things like parameterized grammars. I can’t totally explain it.
Anyhow, no, I don’t have that Google-like query language parser yet. It’s more of a hobby than something I have confidence will turn into something useful. The end goal is to translate the parsed query into a Solr query, but as I get closer to having the parser written, I’m thinking some non-trivial optimization of the parse tree will be needed if you want to end up with a non-ridiculous Solr query from certain inputs.
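As a rough idea of what I mean by optimizing the tree before generating Solr syntax, here is a plain-Ruby sketch. The tree shape (hashes with `:op` and `:children`, or leaf hashes with `:term` and an optional `:field`) and the helper names `flatten_tree` and `to_solr` are all made up for illustration; they are not what my parser emits. It only handles AND and OR, and it skips escaping of special characters entirely.

```ruby
# Hypothetical tree shape, assumed for this sketch only:
#   boolean node: { op: :and or :or, children: [node, ...] }
#   leaf node:    { term: "ruby" } or { field: "title", term: "ruby" }

# Collapse redundant nesting: merge a child into its parent when both use
# the same operator, and unwrap boolean nodes that have a single child.
def flatten_tree(node)
  return node unless node[:op]
  children = node[:children].map { |child| flatten_tree(child) }
  merged = children.flat_map do |child|
    child[:op] == node[:op] ? child[:children] : [child]
  end
  merged.length == 1 ? merged.first : { op: node[:op], children: merged }
end

# Render a tree as a Solr/Lucene-style query string (no escaping done here).
def to_solr(node)
  if node[:op]
    joiner = node[:op] == :and ? ' AND ' : ' OR '
    '(' + node[:children].map { |child| to_solr(child) }.join(joiner) + ')'
  elsif node[:field]
    "#{node[:field]}:#{node[:term]}"
  else
    node[:term]
  end
end

tree = {
  op: :and,
  children: [
    { op: :and, children: [{ term: 'a' }, { term: 'b' }] },
    { op: :or,  children: [{ field: 'title', term: 'c' }] }
  ]
}

puts to_solr(tree)                # => ((a AND b) AND (title:c))
puts to_solr(flatten_tree(tree))  # => (a AND b AND title:c)
```

This only removes redundant parentheses. The optimizations I suspect I’ll really need are less trivial than this, but it shows where such a pass would sit: between the parser and the Solr query generation.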
The reason I’m sort of idly trying to do this in the first place is to support a sufficiently sophisticated query language (arbitrarily complex nested booleans, fielded queries, etc.) in an app backed by Solr, including certain things that neither dismax nor edismax quite supports. Edismax _almost_ supports what I need; it’s missing certain features on ‘fielded’ queries that I want, which are discussed as possible in a Solr ticket but not implemented yet. It might make more sense to learn enough Java to add them as options to edismax instead of trying to do it in Ruby and transform to the Solr query language. But like I said, it’s a recreational hobby at this point, and trying to write Java is SO not recreational for me.