This is mostly a note to myself, but I’m sharing it in case it makes sense to anyone else.
In the back of my mind, I’m continually thinking about how to implement a traditional OPAC ‘browse search’ in Solr. Solr isn’t really designed for this. Mostly the back of my mind has been trying to figure out how to do it with the Solr features already there.
But late tonight now, I figured, eh, maybe I understand Solr enough to try and dive into the solr code, and get the back of my mind thinking about how to actually hack the feature into solr directly.
Traditional browse search lets you do a ‘starts with’ query on a list of “headings”. Those ‘headings’ generally end up as facet values in most people’s Solr implementations.
Ideally, it would improve upon traditional browse search by letting you do a browse search with “filters”, i.e. searching through only the headings attached to bibs that pass a filter (bibs in a certain physical library, say).
So there are _several_ logic paths Solr can take to do faceting, depending on which facet.method you choose, whether the field is multi-valued or single-valued, possibly your facet.sort, and maybe some other factors.
I figured I’d focus on the path I actually need: facet.method=fc, on a multi-valued field, with facet.sort=index and facet.limit set to a positive integer. (And NO facet.prefix set.)
The outcome I want? Well, start with the idea of the built-in facet.offset. I want something kind of like that, but I don’t know the offset I want yet; I want Solr to figure it out for me based on a prefix. So instead of facet.offset, I’m going to give, well, I’m making it up, so let’s call it facet.offset_from_prefix. For facet.offset_from_prefix=X, I want Solr to figure out the offset that would put the FIRST facet value beginning with X as the first value in the returned facet set, or, if there is no facet value beginning with X, whatever facet value is closest to X in sort order. Then I want it to continue as if that had actually been specified as a facet.offset, returning facet values starting from there. AND I want the eventual Solr response to the client to _include_ this calculated offset (so the client can page forward and back if it wants).
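To make that spec concrete for myself, here is a minimal sketch in plain Java, not Solr code. It assumes the facet values are just a sorted in-memory list of distinct strings in Java’s natural String order (which may not match Solr’s index order exactly), and it interprets “closest” as “the first value at or after X”. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Solr.
class BrowseOffsetSketch {

    // Returns the index of the first term that sorts at or after the prefix.
    // Assumes sortedTerms is sorted ascending and contains no duplicates.
    static int offsetFromPrefix(List<String> sortedTerms, String prefix) {
        int pos = Collections.binarySearch(sortedTerms, prefix);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos : -(pos + 1);
    }

    // Returns up to 'limit' terms starting at 'offset', like facet.offset + facet.limit.
    static List<String> page(List<String> sortedTerms, int offset, int limit) {
        int from = Math.min(Math.max(offset, 0), sortedTerms.size());
        int to = Math.min(from + limit, sortedTerms.size());
        return new ArrayList<>(sortedTerms.subList(from, to));
    }
}
```

With the values Adams, Baker, Bakewell, Carter, Dover, the prefix “Bak” gives offset 1, and a page of two from there is Baker, Bakewell. The prefix “Bz” matches nothing, so the offset falls on the next value, Carter, at 3.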
For the conditions we set above, I think the control-flow path will lead us to SimpleFacets#getTermCounts, which will get an UnInvertedField for our facet field and then call UnInvertedField#getCounts on it.
If we look at UnInvertedField#getCounts, an interesting part is the logic for handling facet.prefix. Now, facet.prefix is not what we want, because it changes the overall set of facet values returned. We don’t want to change the overall set; we just want to find the correct _offset_ for a prefix within the overall unchanged set.
Okay, but look at what facet.prefix does: It FIRST _does_ find exactly the offset we want, by using NumberedTermEnum#skipTo/getTermNumber. Aha, this just showed us how to do what we want to do in solr. (We just don’t want to do the NEXT part of what the facet.prefix handling logic does, reset the overall facet value list’s “0” offset to this found offset).
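The difference between the two behaviours is easier for me to see in a toy version. This is a rough sketch in plain Java over a sorted list, not the actual Solr logic; the names are invented, and the binary search only stands in for the term-enum seek.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not how Solr's classes are named or structured.
class PrefixVersusOffsetSketch {

    // Stand-in for seeking to the prefix: index of the first term at or after it.
    static int firstAtOrAfter(List<String> sortedTerms, String prefix) {
        int pos = Collections.binarySearch(sortedTerms, prefix);
        return pos >= 0 ? pos : -(pos + 1);
    }

    // facet.prefix-like behaviour: only terms that start with the prefix are eligible.
    static List<String> prefixStyle(List<String> sortedTerms, String prefix, int limit) {
        List<String> out = new ArrayList<>();
        for (int i = firstAtOrAfter(sortedTerms, prefix);
             i < sortedTerms.size() && out.size() < limit; i++) {
            if (!sortedTerms.get(i).startsWith(prefix)) break; // left the prefix range
            out.add(sortedTerms.get(i));
        }
        return out;
    }

    // Desired behaviour: the prefix only picks the starting point; the full list stays eligible.
    static List<String> offsetStyle(List<String> sortedTerms, String prefix, int limit) {
        int start = firstAtOrAfter(sortedTerms, prefix);
        int end = Math.min(start + limit, sortedTerms.size());
        return new ArrayList<>(sortedTerms.subList(start, end));
    }
}
```

Given Adams, Baker, Bakewell, Carter, Dover, a prefix of “Bak” and a limit of 3, the prefix style stops at Baker, Bakewell, while the offset style keeps going and returns Baker, Bakewell, Carter. The second one is the browse behaviour I’m after.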
So we just need to get UnInvertedField#getCounts to accept a facet.offset_from_prefix param (and change everything up its calling chain so that the param is passed down to it from the URL params). Then, when the param is present, use that NumberedTermEnum logic to get the offset we want, and SET the variable that would have held an explicit user-supplied offset to this found offset. That’s it; now let the rest of the Solr logic continue as normal. (Perhaps raise an exception of some kind if conflicting params were passed in; for instance, facet.offset_from_prefix is kind of incompatible with an ordinary facet.prefix.)
Now the facet values returned will be right for our spec. The only thing that remains is figuring out how to _echo back_ the looked-up offset to the client in the Solr response. I have no idea how to do that, but I trust there should be a not-too-hard way to modify SimpleFacets to include an extra XML element or attribute in its responses, which is I think what would need to be done.
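Here is roughly the shape of the whole plan as a toy, again in plain Java over a sorted list rather than against the real Solr classes. Everything named here (the class, the Result type, the parameter names) is hypothetical; the point is just the three steps: reject conflicting params, overwrite the explicit offset with the looked-up one, and hand the offset back alongside the values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not Solr's API or response format.
class OffsetEchoSketch {

    // Carries the facet values plus the offset that was actually used.
    static final class Result {
        final int offset;
        final List<String> values;

        Result(int offset, List<String> values) {
            this.offset = offset;
            this.values = values;
        }
    }

    // facetPrefix is only inspected for the conflict check; its own filtering
    // is left out of this sketch. Pass null for a param that is not set.
    static Result browse(List<String> sortedTerms, int explicitOffset,
                         String facetPrefix, String offsetFromPrefix, int limit) {
        if (facetPrefix != null && offsetFromPrefix != null) {
            throw new IllegalArgumentException(
                "prefix and offset-from-prefix cannot be combined");
        }
        int offset = explicitOffset;
        if (offsetFromPrefix != null) {
            // Overwrite the explicit offset with the looked-up one.
            int pos = Collections.binarySearch(sortedTerms, offsetFromPrefix);
            offset = pos >= 0 ? pos : -(pos + 1);
        }
        int from = Math.min(Math.max(offset, 0), sortedTerms.size());
        int to = Math.min(from + limit, sortedTerms.size());
        return new Result(from, new ArrayList<>(sortedTerms.subList(from, to)));
    }
}
```

A client that gets back offset 3 for the prefix “Car” with a limit of 2 can ask for the previous page with an ordinary explicit offset of 1, which is the whole reason for echoing the number back.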
So… I totally don’t actually understand what I’m talking about… but I still think I’ve figured out a decent plan.
If anyone actually has any idea what I’m talking about (the intersection between people who understand the solr code, and people who read my blog, may be 0; and on top of that, talking about code in narrative is inevitably confusing, and I’m not sure if this post is actually comprehensible by anyone)…
Does this actually sound like it just might work?
Is there an obvious reason the performance of this will be crap? Since it’s based on logic SimpleFacets already uses (depending on your arguments), I figure it should perform just as well as, well, the equivalent facet.prefix and/or facet.offset queries already would. But if someone who actually understands Solr sees an obvious performance problem, let me know.
While the amount of code that has to be changed is actually fairly minimal, it might affect a buncha classes, since I need to get my new parameters passed all the way down the call chain to the right place, and then get the calculated offset passed all the way back up to make it into a response. Is this going to be a big pain-in-the-butt custom fork/patch version that will be hard to keep in parity with continuing Solr development? (Certainly if the implementation of SimpleFacets#getTermCounts or UnInvertedField#getCounts ever changes significantly, the patch would have to be entirely rewritten.)
Assuming it actually does work, I wonder if there’s any chance of getting a patch like this into mainstream Solr.