I haven’t actually read it yet, but the abstract alone of this D-Lib article makes me think of a recurring problem I keep coming back to. Suppose showing the user all the subjects that matched their query along with the hits is useful (we often describe this as ‘faceted’ display, which I think is actually a misnomer). That might work well when you only have LCSH, but what the heck do you do when you have a corpus involving disparate controlled vocabularies?
Just listing all the controlled terms raw can easily give users misleading ideas in several ways, or just be plain confusing.
And what if some items in the corpus don’t have controlled subject/genre vocab at all?
So on reading that abstract I think, hmm, assuming LCSH is still the most common controlled vocab in your corpus, could you use automated clustering algorithms to map the other items to LCSH, and so actually provide a meaningful list of subjects across your whole corpus?
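To make that idea a bit more concrete, here is a rough sketch of what I have in mind. It is not taken from the article, and strictly speaking it is nearest-centroid matching rather than real clustering: build a bag-of-words profile for each LCSH heading from the items that already carry it, then give each unlabelled item the heading whose profile it most resembles. The function names, the sample records, and the 0.1 similarity threshold are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Crude tokenizer: lowercase, split on whitespace, drop very short words.
    return [w for w in text.lower().split() if len(w) > 2]


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_centroids(labelled):
    # One summed term-count vector per LCSH heading, built from the
    # items that already carry that heading.
    centroids = defaultdict(Counter)
    for text, headings in labelled:
        tokens = Counter(tokenize(text))
        for heading in headings:
            centroids[heading].update(tokens)
    return centroids


def suggest_heading(text, centroids, threshold=0.1):
    # Return the most similar heading, or None if nothing is close enough.
    # The threshold is an arbitrary illustrative value, not a tuned one.
    vec = Counter(tokenize(text))
    best, best_score = None, 0.0
    for heading, centroid in centroids.items():
        score = cosine(vec, centroid)
        if score > best_score:
            best, best_score = heading, score
    return best if best_score >= threshold else None


# Hypothetical sample records: (descriptive text, LCSH headings).
labelled = [
    ("history of roman empire and its emperors", ["Rome--History"]),
    ("decline and fall of the roman republic", ["Rome--History"]),
    ("introduction to organic chemistry reactions", ["Chemistry, Organic"]),
    ("organic synthesis laboratory methods", ["Chemistry, Organic"]),
]
centroids = build_centroids(labelled)

# An item with no LCSH (or with terms from some other vocabulary).
print(suggest_heading("roman emperors and their wars", centroids))
```

A real corpus would need proper term weighting, stopword handling, and some way to treat terms from the other controlled vocabularies as evidence, but the shape of the thing is the same: items without LCSH get pulled toward the headings of the items they look like, and anything that resembles nothing stays unmapped rather than getting a misleading label.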