Return to libraryland

I’m excited to announce this week is my first week working for the Othmer Library division at the Chemical Heritage Foundation. CHF’s name isn’t necessarily optimal at explaining what the organization does: It’s actually an independent history of science institute (not in fact focusing exclusively on chemistry), with a museum, significant archival collection, and ongoing research activities. As they say on their home page, “The Chemical Heritage Foundation is dedicated to explaining a simple truth: science has a past and our future depends on it.” That’s a nice way to put it.

I’ll be working, at least initially, mostly on the Hydra/Sufia stack. CHF has been a significant contributor to this open source stack already (with existing developer Anna Headley, who’s still here at CHF, fortunately for us all!), and I am happy to be at a place that prioritizes open source contributions.  CHF has some really interesting collections (medieval alchemical manuscripts? Oral histories from scientists? That and more), which aren’t available on the web yet — but we’re working on it.

CHF is located in Philadelphia, but I’ll still be in Baltimore, mostly working remotely, with occasional but regular visits. (Conveniently, Philadelphia is only about 100 miles from Baltimore).

And I’m very happy to be back in the library community. It’s a little bit less confusing now if I tell people I’m “a librarian”. Just a little.  I definitely missed being in the library world, and the camaraderie and collaboration of the library open source tech community in my year+ I was mostly absent from it — it really is something special.

I have nothing but good things to say about Friends of the Web, where I spent the past 15 months or so. I’ll miss working with my colleagues there and many aspects of the work environment. They’re really a top-notch design and Rails/React/iOS dev firm; if you need something done in web design or app development (or both!) that you don’t have in-house resources to do, I don’t hesitate to recommend them.

rubyland infrastructure, and a modest sponsorship from honeybadger

Rubyland.news is my hobby project: a ruby RSS/atom feed aggregator.

Previously it was run on entirely free heroku resources — free dyno, free postgres (limited to 10K rows, which dashes my dreams of a searchable archive, oh well). The only thing I had to pay for was the domain. Rubyland doesn’t take many resources because it is mostly relatively ‘static’ and cacheable content, so could get by fine on one dyno. (I’m caching whole pages with Rails “fragment” caching and an in-process memory-based store, not quite how Rails fragment caching was intended to be used, but works out pretty well for this simple use case, with no additional resources required).
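The caching setup is roughly this shape (a sketch; the store size, cache key, and expiry are my own illustrative choices, not rubyland’s actual config):

```ruby
# config/environments/production.rb
# an in-process, memory-based cache store; fine when a single dyno
# serves everything and the cached content is small
config.cache_store = :memory_store, { size: 32.megabytes }

# then in a view, ordinary Rails fragment caching wraps the whole page:
#   <% cache "front-page", expires_in: 10.minutes do %> ... <% end %>
```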

But the heroku free dyno doesn’t allow SSL on a custom hostname.  It’s actually pretty amazing what one can accomplish with ‘free tier’ resources from various cloud providers these days.  (I also use a free tier mailgun account for an MX server to receive @rubyland.news emails, and SMTP server for sending admin notifications from the app. And free DNS from cloudflare).  Yeah, for the limited resources rubyland needs, a very cheap DigitalOcean droplet would also work — but just as I’m not willing to spend much money on this hobby project, I’m also not willing to spend any more ‘sysadmin’ type time than I need — I like programming and UX design and enjoy doing it in my spare ‘hobby’ time, but sysadmin’ing is more like a necessary evil to me. Heroku works so well and does so much for you.

With a very kind sponsorship gift of $20/month for 6 months from Honeybadger, I used the money to upgrade to a heroku hobby-dev dyno, which does allow SSL on custom hostnames. So now rubyland.news is available at https, via letsencrypt.org, with cert acquisition and renewal fully automated by the letsencrypt-rails-heroku gem, which makes it incredibly painless, just set a few heroku config variables and you’re pretty much done.

I still haven’t redirected all http to https, and am not sure what to do about https on rubyland. For one, if I don’t continue to get sponsorship donations, I might not continue the paid heroku dyno, and then wouldn’t have custom-domain SSL available. Also, even with SSL, since the rubyland.news feed often includes embedded <img> tags with their original src, you still get browser mixed-content warnings (which browsers may eventually turn into a full security error page?).  So I’m not sure about the ultimate disposition of SSL on rubyland.news, but for now it’s available on both http and https — so at least I can do secure admin or other logins if I want (haven’t implemented that yet, but an admin interface for approving feed suggestions is on my agenda).

Honeybadger

I hadn’t looked at Honeybadger before myself.  I have used bugsnag on client projects before, and been quite happy with it. Honeybadger looks like basically a bugsnag competitor — its main feature set is about capturing errors from your Rails apps (or other platforms, including non-ruby ones), and presenting them well for your response, with grouping, notifications, status disposition, etc.

I’ve set up honeybadger integration on rubyland.news, to check it out. (Note: “Honeybadger is free for non-commercial open-source projects”, which is pretty awesome, thanks honeybadger!) Honeybadger’s feature set and user/developer experience are looking really good.  It’s got much more favorable pricing than bugsnag for many projects — pricing is just per-app, not per-event-logged or per-seat.  It’s got a pretty similar feature set to bugsnag; in some areas I like how honeybadger does things a lot better, in others I’m not sure.

(I’ve been thinking for a while about wanting to forward all Rails.logger error-level log lines to my error monitoring service, even though they aren’t fatal exceptions/500s. I think this would be quite do-able with honeybadger, might try to rig it up at some point. I like the idea of being able to put error-level logging in my code rather than monitoring-service-specific logic, and have it just work with whatever monitoring service is configured).

So I’d encourage folks to check out honeybadger — yeah, my attention was caught by their (modest, but welcome and appreciated! $20/month) sponsorship, but I’m not being paid to write this specifically, all they asked for in return for sponsorship was a mention on the rubyland.news about page.

Honeybadger also includes some limited uptime monitoring.   The other important piece of monitoring, in my opinion, is request- or page-load time monitoring, with reports and notifications on median and 90th/95th percentile. I’m not sure if honeybadger includes that in any way. (for non-heroku deploys, disk space, RAM, and CPU usage monitoring is also key. RAM and CPU can still be useful with heroku, but less vital in my experience).

Is there even a service that will work well for Rails apps that combines error, uptime, and request time monitoring, with a great developer experience, at a reasonable price? It’s a bit surprising to me that there are so many services that do just one or two of these, and few that combine all of them in one package.  Anyone had any good experiences?

For my library-sector readers, I think this is one area where most library web infrastructure is not yet operating at professional standards. In this decade, a professional website means you have monitoring and notification to tell you about errors and outages without waiting for users to report them, so you can get them fixed as soon as possible. Few library services are operated this way, and it’s time to get up to speed.  While you can run your own monitoring and notification services on your own hardware, in my experience few open source packages are up to the quality of current commercial cloud offerings — and when you run your own monitoring/notification, you run the risk of losing notice of problems because of a misconfiguration of some kind (it’s happened to me!), or a local infrastructure event that takes out both your app and your monitoring/notification (that too!).  A cloud commercial offering makes a lot of sense. While there are many “reasonably” priced options these days, they are admittedly still not ‘cheap’ for a library budget (or lack thereof) — but it’s a price worth paying; it’s what it means to run web sites, apps, and services professionally.

bento_search 1.7.0 released

bento_search is a gem that makes it a breeze to embed external searches in Rails, focusing on search targets and use cases involving ‘scholarly’ or bibliographic citation results.

Bento_search isn’t dead, it just didn’t need much updating. But thanks to some work for a client using it, I had the opportunity to do some updates.

Bento_search 1.7.0 includes testing under Rails 5 (earlier versions probably would have worked fine in Rails 5 already), some additional configuration options, a lot more fleshing out of the EDS adapter, and a new ConcurrentSearcher demonstrating proper use of the new Rails5 concurrency API.  (The older BentoSearch::MultiSearcher is now deprecated.)

See the CHANGES file for full list.

As with all releases of bento_search to date, it should be strictly backwards compatible and an easy upgrade. (Although if you are using Rails earlier than 4.2, I’m not completely confident, as we aren’t currently doing automated testing of those).

ruby VCR, easy trick for easy re-record

I do a lot of work with external HTTP APIs, and I love the vcr gem for writing tests/specs involving them. It records the HTTP interaction, so most of the time the tests run against a recorded interaction, not actually going out to the remote HTTP server.

This makes the tests run faster, it makes more sense on a CI server like Travis, it lets tests run automatically without having to hard-code credentials for authenticated services (make sure to use VCR’s filter_sensitive_data feature; figuring out a convenient way to do that for real-world use cases is a different discussion), and it even lets people without any credentials of their own run the tests, to make minor PRs and such.

But in actual local dev, I sometimes want to be sure I’m running my tests against live data, especially as the exact HTTP requests change while I edit my code. Sometimes I need to do this over and over again in a cycle. Previously, I was doing things like manually deleting the relevant VCR cassette files, to ensure I was running with live data, or to avoid VCR “hey, this is a new request, buddy” errors.

Why did I never think of using the tools VCR already gives us to make it a lot easier on myself?
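VCR’s default_cassette_options already supports different record modes; an ENV variable can pick one per run. A sketch (the cassette dir and webmock hook are illustrative — keep whatever your existing VCR config has):

```ruby
# Let an ENV var override VCR's record mode for a given test run.
#   VCR=all ./bin/rspec           => re-record every cassette hit
#   VCR=new_episodes ./bin/rspec  => record only requests with no cassette yet
def vcr_record_mode(env = ENV)
  env["VCR"] ? env["VCR"].to_sym : :once
end

if defined?(VCR) # guard so this sketch loads even without the vcr gem
  VCR.configure do |config|
    config.cassette_library_dir = "spec/cassettes"
    config.hook_into :webmock
    config.default_cassette_options = { record: vcr_record_mode }
  end
end
```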

Everything normally works as always, but I can just run VCR=all ./bin/rspec to do a run with brand-newly recorded cassettes. Or VCR=all ./bin/rspec some_specific_spec.rb to re-record only that spec, or only the specs I’m working on, etc.

Geez, I should have figured that out years ago. So I’m sharing with you.

Just don’t ask me if it makes more sense to put VCR configuration in spec_helper.rb or rails_helper.rb. I still haven’t figured out what that split is supposed to be about, honestly. I mean, I do sometimes write VCR specs of service objects that have no Rails dependencies…. but I usually just drop it (and all my other config) in rails_helper.rb, and ignore the fact that rspec these days is trying to force us to make a choice whose implications and utility I don’t really understand and don’t want to think about.

never bash again, just ruby

Sometimes I have a little automation task so simple that I think, oh, I’ll just use a bash script for this.

I almost always regret the decision, as it tends to grow more complicated, and I start fighting with bash and realize that I don’t know bash very well, and why do I want to spend time knowing bash well anyway, and some things are painful to do in bash even if you do know bash more, should have just used a ruby script from the start.

I always forget this again, and repeat. Doh.

One thing that drives me to bash for simple cases: when your task consists of a series of shell commands, getting a reasonably behaving script (especially with regard to output and error handling) can be a pain with just backticks or system in a ruby script.

tty-command gem to the rescue!  I haven’t used it yet, but its API looks like exactly what I need to never accidentally start with bash again, with no added pain from starting with ruby.  I will definitely try to remember this next time I think “it’s so simple, just use a bash script”; maybe I can use a ruby script with tty-command instead.
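Since I haven’t used it yet, treat this as a sketch straight from tty-command’s README — the point is that each command is logged as it runs, and failures raise instead of being silently ignored:

```ruby
require "tty-command"

cmd = TTY::Command.new          # logs each command and its output by default
cmd.run("bundle install")       # raises TTY::Command::ExitError on failure

result = cmd.run("git rev-parse HEAD")
sha = result.out.strip          # captured stdout, like backticks but better-behaved
```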

tty-command is one gem in @piotrmurach’s TTY toolkit.

Heroku auto-scaling, and warning, ask first about load testing

Heroku auto-scaling looks like a pretty sweet feature, well-implemented as expected from Heroku. (Haven’t tried it out myself yet, just from the docs).

But…

“We strongly recommend that you simulate the production experience with load testing, and use Threshold Alerting in conjunction with autoscaling to monitor your app’s end-user experience. If you plan to conduct significant load testing, you will need to request written consent from Heroku in advance to prevent being flagged as a denial of service attacker.”

They strongly recommend something that itself requires written consent from Heroku? That’s annoying, and very un-Heroku-like.

I actually recently did some automated load testing of rubyland.news, as well as of a different app I was working on for a client, in order to determine the proper number of puma workers and threads. I hadn’t seen these docs, and it hadn’t occurred to me that I should notify Heroku first.

My load testing was brief, but who knows what’s considered ‘significant’ by Heroku’s automated DoS defenses. Glad I seem to have escaped being flagged. Next time I’ll make sure to request written consent… by filing a support ticket, I guess, as the text links to the support area.

Concurrency in Rails 5.0

My previous posts on concurrency in ActiveRecord have been some of the most popular on this blog (which I’d like to think means concurrency is getting more popular in Rails-land), so I’m going to share what I know about some new concurrency architecture in Rails5 — which is no longer limited to ActiveRecord.

(update: Hours before I started writing this, unbeknownst to me, matthewd submitted a rails PR for a Rails Guide with some really good stuff; I’ve only skimmed it now, but you might wanna go there either before, after, or in lieu of this).

I don’t fully understand the new stuff, but since it’s relatively undocumented at present, and has some definite gotchas, as well as definite potentially powerful improvements — sharing what I got seems helpful. This will be one of my usual “lots of words” posts, get ready!

The new architecture primarily involves ActiveSupport::Reloader (a global one of which is in Rails.application.reloader) and ActiveSupport::Executor (a global one of which is in Rails.application.executor). Also ActiveSupport::Dependencies::Interlock (a global one of which is at ActiveSupport::Dependencies.interlock).

Why you need to know this

You need to know this if you create any threads in a Rails app yourself, beyond the per-request threads a multi-threaded app server like Puma will create for you. Rails takes care of multi-threaded request dispatch (with the right app server), but if you’re doing any kind of what I’ll call “manual concurrency” yourself — Thread.new, any invocations of anything in concurrent-ruby (recommended), probably anything in celluloid (not sure), etc. — you’ve got to pay attention to using the new architecture, both to do what Rails wants and to avoid deadlocks if dev-mode-style class-reloading is happening.

If you’re getting apparent deadlocks in a Rails5 app that does multi-threaded concurrency, it’s probably about this.

If you are willing to turn off dev-mode class-reloading and auto-loading altogether, you can probably ignore this.

What I mean by “dev-mode class-reloading”

Rails5 by default generates your environments/development.rb with config.cache_classes==false, config.eager_load==false. Classes are auto-loaded only on demand (eager_load == false), and are also sometimes unloaded to be reloaded on next access (cache_classes == false). (The details of when/how/which/if they are unloaded are outside the scope of this blog post, but have also changed in Rails5.)

You can turn off all auto-loading with config.cache_classes==true, config.eager_load==true — the Rails5 default in production.  All classes are loaded/require’d en masse on boot, and are never unloaded.  This is what I mean by ‘turn off dev-mode class-reloading and auto-loading altogether’.

The default Rails5 generated environments/test.rb has config.cache_classes==true, config.eager_load==false.  Only load classes on demand with auto-loading (eager_load == false), but never unload them.
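Spelled out, the three combinations above as they appear in the generated environment files:

```ruby
# config/environments/development.rb (Rails5 default): autoload AND reload
config.cache_classes = false
config.eager_load    = false

# config/environments/production.rb (Rails5 default): load all at boot, never unload
config.cache_classes = true
config.eager_load    = true

# config/environments/test.rb (Rails5 default): autoload on demand, never unload
config.cache_classes = true
config.eager_load    = false
```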

I am not sure if there’s any rational purpose for having config.cache_classes = false, config.eager_load = true, probably not.

I think there was a poorly documented config.autoload in previous Rails versions, with confusing interactions with the above two configs; I don’t think it exists (or at least does anything) in Rails 5.

Good News

Prior to Rails 5, Rails dev-mode class-reloading and auto-loading were entirely un-thread-safe. If you were using any kind of manual concurrency, you pretty much had to turn off dev-mode class-reloading and auto-loading. Which was too bad, cause they’re convenient and make dev more efficient. If you didn’t, it might sometimes work, but in development (or possibly test) you’d often see those pesky exceptions involving something like “constant is missing”, “class has been redefined”, or “is not missing constant” — I’m afraid I can’t find the exact errors, but perhaps some of these seem familiar.

Rails 5, for the first time, has an architecture which theoretically lets you do manual concurrency in the presence of class reloading/autoloading, thread-safely. Hooray! This is something I had previously thought was pretty much infeasible, but it’s been (theoretically) pulled off. This for instance theoretically makes it possible for Sidekiq to do dev-mode-style class-reloading — although I’m not sure if the latest Sidekiq release actually still has this feature, or if they had to back it out.

The architecture is based on some clever concurrency patterns, so it theoretically doesn’t impact performance or concurrency measurably in production — or even, for the most part, significantly in development.

While the new architecture most immediately affects class-reloading, the new API is, for the most part, not written in terms of reloading; it’s a higher-level API in terms of signaling what you are doing about concurrency: “I’m doing some concurrency here”, in various ways.  This is great, and should be good for the future of Just Works concurrency in Rails in other ways than class reloading too.  If you are using the new architecture correctly, it theoretically makes ActiveRecord Just Work too, with less risk of leaked connections, without having to pay lots of attention to it. Great!

I think matthewd is behind much of the new architecture, so thanks matthewd for trying to help move Rails toward a more concurrency-friendly future.

Less Good News

While the failure mode for concurrency used improperly with class-reloading in Rails 4 (which was pretty much any concurrency with class-reloading, in Rails 4) was occasional hard-to-reproduce mysterious exceptions — the failure mode for concurrency used improperly with class-reloading in Rails5 can be a reproduces-every-time deadlock. Your app just hangs, and it’s pretty tricky to debug why, especially if you aren’t even considering “class-reloading and new Rails 5 concurrency architecture”, which, why would you?

And all the new stuff is, at this point, completely undocumented.  (update: some docs in rails/rails #27494; I hadn’t seen that before I wrote this.)  So it’s hard to know how to use it right. (I would quite like to encourage an engineering culture where significant changes without docs are considered just as problematic to merge/release as significant changes without tests… but we’re not there yet). (The docs in the Autoloading and Reloading Constants Guide, to which this is very relevant, have not been updated for this ActiveSupport::Reloader stuff, and I think are probably no longer entirely accurate. That would be a good place for some overview docs…).

The new code is a bit tricky and abstract, a bit hard to follow. Some anonymous modules at some points made it hard for me to use my usual already grimace-inducing methods of code-archeology reverse-engineering, where I normally count on inspecting class names of objects to figure out what they are and where they’re implemented.

The new architecture may still be buggy.  Which would not be surprising for the kind of code it is: pretty sophisticated, concurrency-related, every rails request will touch it somehow, trying to make auto-loading/class-reloading thread-safe when even ordinary ruby require is not (I think this is still true?).  See for instance all the mentions of the “Rails Reloader” in the Sidekiq changelog, going back and forth trying to make it work right — not sure if they ended up giving up for now.

The problem with maybe buggy combined with lack of any docs whatsoever — when you run into a problem, it’s very difficult to tell if it’s because of a bug in the Rails code, or because you are not using the new architecture the way it’s intended (a bug in your code). Because knowing the way it’s intended to work and be used is a bit of a guessing game, or code archeology project.

We really need docs explaining exactly what it’s meant to do how, on an overall architectural level and a method-by-method level. And I know matthewd knows docs are needed. But there are few people qualified to write those docs (maybe only matthewd), cause in order to write docs you’ve got to know the stuff that’s hard to figure out without any docs. And meanwhile, if you’re using Rails5 and concurrency, you’ve got to deal with this stuff now.

So: The New Architecture

I’m sorry this is so scattered and unconfident, I don’t entirely understand it, but sharing what I got to try to save you time getting to where I am, and help us all collaboratively build some understanding (and eventually docs?!) here. Beware, there may be mistakes.

The basic idea is that if you are running any code in a manually created thread, that might use Rails stuff (or do any autoloading of constants), you need to wrap your “unit of work” in either Rails.application.reloader.wrap { work } or Rails.application.executor.wrap { work }.  This signals “I am doing Rails-y things, including maybe auto-loading”, and lets the framework enforce thread-safety for those Rails-y things when you are manually creating some concurrency — mainly making auto-loading thread-safe again.

When do you pick reloader vs executor? Not entirely sure, but if you are completely outside the Rails request-response cycle (not in a Rails action method, but instead something like a background job), manually creating your own threaded concurrency, you should probably use Rails.application.reloader.  That will allow code in the block to properly pick up new source under dev-mode class-reloading. It’s what Sidekiq did to add proper dev-mode reloading (not sure what current master Sidekiq is doing, if anything).

On the other hand, if you are in a Rails action method (which is probably already wrapped in a Rails.application.reloader.wrap), I believe you can’t use a (now nested) Rails.application.reloader.wrap without deadlocking things up. So there you use Rails.application.executor.wrap.

What about in a rake task, or rails runner executed script?  Not sure. Rails.application.executor.wrap is probably the safer one — it just won’t get dev-mode class-reloading happening reliably within it (won’t necessarily immediately, or even ever, pick up changes), which is probably fine.

But to be clear, even if you don’t care about picking up dev-mode class-reloading immediately — unless you turn off dev-mode class-reloading and auto-loading for your entire app — you still need to wrap with a reloader/executor to avoid deadlock — if anything inside the block possibly might trigger an auto-load, and how could you be sure it won’t?

Let’s move to some example code, which demonstrates not just the executor.wrap, but some necessary use of ActiveSupport::Dependencies.interlock.permit_concurrent_loads too.

An actual use case I have — I have to make a handful of network requests in a Rails action method, I can’t really push it off to a bg job, or at any rate I need the results before I return a response. But since I’m making several of them, I really want to do them in parallel. Here’s how I might do it in Rails4:
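Something like this, sketched with a fake `fetch` standing in for the real HTTP calls (the names and URLs are illustrative, not from the actual app):

```ruby
# Rails 4 era approach: just do each slow call in its own thread, then join.
def fetch(url)
  sleep 0.05                    # stand-in for a real HTTP request
  "response for #{url}"
end

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

threads = urls.map { |url| Thread.new { fetch(url) } }
results = threads.map(&:value)  # Thread#value blocks until that thread finishes
```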

In Rails4, that would work… mostly. With dev-mode class-reloading/autoloading on, you’d get occasional weird exceptions. Or of course you can turn dev-mode class-reloading off.

In Rails5, you can still turn dev-mode class-reloading/autoloading off and it will still work. But if you have autoload/class-reload on, instead of an occasional weird exception, you’ll get a nearly(?) universal deadlock. Here’s what you need to do instead:
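The same work, restructured for Rails 5. This is a sketch: the pass-through shims at the top exist only so the snippet runs standalone; in a real Rails 5 app you use the real Rails.application.executor and ActiveSupport::Dependencies.interlock, and `fetch` stands in for a real HTTP call.

```ruby
# No-op shims so this sketch runs outside an actual Rails app.
unless defined?(Rails)
  module Rails
    class Executor; def wrap; yield; end; end
    class App; def executor; @executor ||= Executor.new; end; end
    def self.application; @application ||= App.new; end
  end
  module ActiveSupport
    module Dependencies
      class Interlock; def permit_concurrent_loads; yield; end; end
      def self.interlock; @interlock ||= Interlock.new; end
    end
  end
end

def fetch(url)
  sleep 0.05                    # stand-in for a real HTTP request
  "response for #{url}"
end

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

threads = urls.map do |url|
  Thread.new do
    # signal "this thread is doing Rails-y work, maybe autoloading":
    Rails.application.executor.wrap { fetch(url) }
  end
end

# While this (request) thread blocks waiting on the workers, release the
# autoload interlock so the worker threads can acquire it -- without this,
# dev-mode class-reloading can deadlock.
results = ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
  threads.map(&:value)
end
```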

And it should actually work reliably, without intermittent mysterious “class unloaded” type errors like in Rails4.

ActiveRecord?

Previously, one big challenge with using ActiveRecord under concurrency was avoiding leaked connections.

I think that if your concurrent work is wrapped in Rails.application.reloader.wrap do or Rails.application.executor.wrap do, this is no longer a problem — they’ll take care of returning any pending checked-out AR db connections to the pool at end of block.

So you theoretically don’t need to be so careful about wrapping every single concurrent use of AR in a ActiveRecord::Base.connection_pool.with_connection  to avoid leaked connections.

But I think you still can, and it won’t hurt — and it should sometimes lead to shorter, finer-grained checkouts of db connections from the pool, which matters if you potentially have more threads than your AR connection pool size. I am still wrapping in ActiveRecord::Base.connection_pool.with_connection, out of superstition if nothing else.

Under Test with Capybara?

One of the things that makes Capybara feature tests so challenging is that they inherently involve concurrency — there’s a Rails app running in a different thread than your tests themselves.

I think this new architecture could theoretically pave the way to making this all a lot more intentional and reliable, but I’m not entirely sure, not sure if it helps at all already just by existing, or would instead require Capybara to make use of the relevant API hooks (which nobody’s prob gonna write until there are more people who understand what’s going on).

Note though that Rails 4 generated a comment in config/environments/test.rb that says “If you are using a tool that preloads Rails for running tests [which I think means Capybara feature testing], you may have to set [config.eager_load] to true.”  I’m not really sure how true this was even in past versions of Rails (whether it was necessary or sufficient). This comment is no longer generated in Rails 5, and eager_load is still generated to be false … so maybe something improved?

Frankly, that’s a lot of inferences, and I have been still leaving eager_load = true under test in my Capybara-feature-test-using apps, because the last thing I need is more fighting with a Capybara suite that is the closest to reliable I’ve gotten it.

Debugging?

The biggest headache is that a bug in the use of the reloader/executor architecture manifests as a deadlock — and I’m not talking the kind that gives you a ruby ‘deadlock’ exception, but the kind where your app just hangs forever doing nothing. This is painful to debug.

These deadlocks in my experience are sometimes not entirely reproducible, you might get one in one run and not another, but they tend to manifest fairly frequently when a problem exists, and are sometimes entirely reproducible.

First step is experimentally turning off dev-mode class-reloading and auto-loading altogether (config.eager_load = true, config.cache_classes = true), and seeing if your deadlock goes away. If it does, it probably has something to do with not properly using the new Reloader architecture. In desperation, you could just give up on dev-mode class-reloading, but that’d be sad.

Rails 5.0.1 introduces a DebugLocks feature intended to help you debug these deadlocks:

Added new ActionDispatch::DebugLocks middleware that can be used to diagnose deadlocks in the autoload interlock. To use it, insert it near the top of the middleware stack, using config/application.rb:

config.middleware.insert_before Rack::Sendfile, ActionDispatch::DebugLocks

After adding, visiting /rails/locks will show a summary of all threads currently known to the interlock.

PR, or at least initial PR, at rails/rails #25344.

I haven’t tried this yet, I’m not sure how useful it will be, I’m frankly not too enthused by this as an approach.

References

  • Rails.application.executor and Rails.application.reloader are initialized here, I think.
  • Not sure of the design intent of: Executor being an empty subclass of ExecutionWrapper; Rails.application.executor being an anonymous sub-class of Executor (which doesn’t seem to add any behavior either? Rails.application.reloader does the same thing, fwiw); or whether further configuration of the Executor is done in other parts of the code.
  • Sidekiq PR #2457 Enable code reloading in development mode with Rails 5 using the Rails.application.reloader; I believe the code may have been written by matthewd. This is a good intro example of using the architecture as intended (since matthewd wrote/signed off on it), but beware churn in Sidekiq code around this stuff dealing with issues and problems after this commit as well — not sure if Sidekiq later backed out of this whole feature?  But the Sidekiq source is probably a good one to track.
  • A dialog in Rails Github Issue #686 between me and matthewd, where he kindly leads me through some of the figuring out how to do things right with the new arch. See also several other issues linked from there, and links into Rails source code from matthewd.

Conclusion

If I got anything wrong, or you have any more information you think useful, please feel free to comment here — and/or write a blog post of your own. Collaboratively, maybe we can identify if not fix any outstanding bugs, write docs, maybe even improve the API a bit.

While the new architecture holds the promise to make concurrent programming in Rails a lot more reliable — making dev-mode class-reloading at least theoretically possible to do thread-safely, when it wasn’t at all possible before — in the short term, I’m afraid it’s making concurrent programming in Rails a bit harder for me.  But I bet docs will go a long way there.

A class_eval monkey-patching pattern with prepend

Yes, it’s best to avoid “monkey-patching” — changing an already loaded ruby class by reopening the class to add or replace methods.

But sometimes you’ve got no choice, because a dependency just doesn’t give you the API you need to do what you need, or has a bug that hasn’t been fixed in a release you can use yet.

And in some cases I really do think it actually makes sense to make your customization to a dependency in the most forward-compatible way, surgically targeted to avoid replacing or copy-pasting code you _don’t_ want to customize, to make it most likely your code will keep working with future releases of the dependency.

Module#prepend, added in Ruby 2.0, makes it easier to do this kind of surgical intervention, because you can monkey-patch a new method replacing an original implementation, and still call super to call default/original implementation of that very same method. Something you couldn’t do before to methods that were implemented directly in the original class-at-hand (rather than a module/superclass it includes/extends).

But a Module you are going to prepend can’t include “class macros”, class methods like ActiveRecord’s `validates` for instance.  For a module that’s going to be included in the more normal way, ActiveSupport::Concern in Rails can let ‘class macros’ live sensibly in the module — but AS::Concern has no support for prepend, so it’s not going to help here.  (Maybe a PR to Rails? If I had some indication that Rails maintainers might be interested in such a PR, I might try to see if I could make something reasonable, but I hate working on tricky stuff only to have maintainers reject it as something they’re not interested in.)
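If you did want to roll your own, here’s a hedged sketch of the kind of thing I mean, using the Module#prepended hook to run class-level setup when the module is prepended; all names here are invented for illustration:

```ruby
# Invented sketch: the Module#prepended hook runs class-level setup
# against the base class, roughly where AS::Concern's `included` block
# would run for an ordinary include.
module LoggedSave
  def self.prepended(base)
    base.class_eval do
      # 'class macros' could run here against the base class
      def self.logged?
        true
      end
    end
  end

  def save
    "logged: " + super
  end
end

class Record
  def save
    "saved"
  end
end

Record.prepend(LoggedSave)

Record.logged?   # => true
Record.new.save  # => "logged: saved"
```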

You might be able to hack something up yourself with Module#prepended, similar to an implementation one could imagine being part of AS::Concern. But I don’t; I just use plain Ruby. Here’s how I do my class_eval monkey-patches with prepend, trying to keep everything as non-magical as possible, and without losing too much readability compared to when we just used class_eval without Module#prepend.


# Spell out the entire class name: if it's not defined yet,
# we'll get a NameError raise -- we don't want to accidentally
# define it fresh here when we're expecting to be monkey-patching.
Some::Dependency::Foo.class_eval do
  # 'class macros' go here
  validates :whatever, presence: true

  # We want the instance methods inline here for legibility,
  # looking kind of like an ordinary class. But we want
  # to use prepend. And giving it a name rather than an
  # anonymous module can help with stack traces and other debugging.
  # This is one way to do all that:
  prepend(FooExtension = Module.new do
    def existing_method
      if custom_guard_logic
        return false
      end

      super
    end
  end)
end

Last part: I put all these extensions in a directory I create, ./app/extensions

Because of what I’ll show you next, you can call these files whatever you want. I put them in the same directory structure, and with the same name as the original file being patched, but with _extension on the end. So the above would be at ./app/extensions/some/dependency/foo_extension.rb.

And then I put this to_prepare block in my ./config/application.rb, to make sure all these monkey-patch extensions get loaded on dev-mode class-reloading, properly applying to the thing they are patching even if that thing is dev-mode class-reloaded too:

    config.to_prepare do
      # Load any monkey-patching extensions in to_prepare for
      # Rails dev-mode class-reloading.
      Dir.glob(File.join(File.dirname(__FILE__), "../app/extensions/**/*_extension.rb")) do |c|
        Rails.configuration.cache_classes ? require(c) : load(c)
      end
    end

So there you go. This seems to be working for me; I arrived at this pattern in fits and starts, copying techniques from other projects and figuring out what worked best for me.

Segmenting “Catalog” and “Articles” in EDS API

About 4 years ago, I posted a long position paper arguing that a “bento-style” search was appropriate for the institution I then worked at. (I’ve taken to calling it a “search dashboard” approach since then.)  The position paper noted that this recommendation was being made in the face of the actually existing technical/product constraints at the time, as well as with the (very limited) research/evidence we had into relevant user behavior and preferences. (And also because, for that institution at the time, a bento-style search could be implemented without any additional 6-figure software licenses, which some of the alternatives entailed.)

I never would have expected that 4 years later the technical constraint environment would be largely unchanged, and we would not have (so far as I’m aware) any significant additional user research (If anyone knows about any write-ups, please point us to them). But here we are. And “bento style” search has kind of taken over the landscape.

Putting aside the reasons for that, and whether it’s currently the best decision, for a client project I have been implementing a “bento style” search dashboard with several of the components targeting the EDS API.  (The implementation is of course using the bento_search gem; expect a new release in the near future with many enhancements to the EDS adapter.)

The client wanted to separate “Catalog” and “Article” results in separate “bento” boxes — clicking the “see all results” link should take the user to the EDS standard interface, still viewing results limited to “Catalog” and “Articles”. It was not immediately clear how to best accomplish that in EDS.  The distinction could be based on actual source of the indexed records (indexed from local ILS, vs EDS central index), or on format (‘article’ vs ‘monograph and things like that’, regardless of indexing source).  I was open to either solution in exploring possibilities.

I sent a query to the Code4Lib listserv for people doing this with EDS, and discovered that this is indeed a very popular thing to do with EDS, and that people are doing it in a whole variety of somewhat hacky ways.  My tentative conclusion is that the best way might be creating a custom EDS “limiter” corresponding to a “(ZT article)” query, but I’m not sure anyone is actually doing that, and I haven’t tried it yet myself.

Possibilities identified in people’s off-list responses to me:

  • Some people actually just use full un-limited EDS results for “Articles”, even though it’s labelled “Articles”! Obviously not a great solution.

  • Some people set up two different EDS ‘profiles’: one which includes just the Catalog source/database, and one which includes all sources/databases except Catalog.  This works, but I think it doesn’t give the user a great UI for switching back and forth once they’re in the EDS standard interface, or for choosing to search over everything once they’re there — although my client ultimately decided this was good enough, or possibly even preferred, to keep ‘catalog’ and ‘articles’ entirely separate in the UI.
  • One person was actually automatically adding “AND (ZT article)” to the end of the user-entered queries. This actually gives great results. Interestingly, it even returns some results marked with the “Book” format type in EDS — because they are book chapters, which actually seems just right. On the API end, it’s just fine to invisibly add an “AND (ZT article)” to the end of the query. But once we direct to ‘view all results’, redirecting to a query that has “AND (ZT article)” at the end looks sloppy, and doesn’t give the user a good UI for choosing to switch between articles, catalog, and everything once they’re there in the EDS standard interface.

  • Some people are using the EDS “source type” facets, limiting to certain specified ‘article-like’ values.  That doesn’t seem as good as the “(ZT article)” hack, because it won’t include things like book chapters that are rightly included in “(ZT article)”.  But it may be good enough, or the best available option.  And while I believe I can apply that limit fine from the API, I haven’t figured out any way to ‘deep link’ into EDS results with a query that has pre-selected “source type” facet limits.  Not sure if there are any parameters I can add on to the `?direct=true&bquery=your-query-here` “deep link” URL to pre-select source type facets.
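For what it’s worth, the query-suffix hack from the list above is easy to sketch. This is my own illustrative code, not the bento_search API, and the helper names are invented:

```ruby
require "cgi"

# Invented helper: append the "(ZT article)" limiter to a user-entered
# query before sending it to the EDS API.
def eds_article_query(user_query)
  "(#{user_query}) AND (ZT article)"
end

# Invented helper: build an EDS "deep link" URL carrying the query in
# the bquery parameter, in the ?direct=true&bquery=... style.
def eds_deep_link(base_url, query)
  "#{base_url}?direct=true&bquery=#{CGI.escape(query)}"
end

eds_article_query("nanotechnology")
# => "(nanotechnology) AND (ZT article)"
```

The sloppiness mentioned above is visible here: the escaped “AND (ZT article)” suffix ends up in the user-facing URL and search box when you redirect to ‘view all results’.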

Through some communication with an internal EDS developer contact, I learned it ought to be possible to create a custom “limiter” in EDS corresponding to the “AND (ZT article)” hack. I’m not sure if anyone is actually doing this, but it sounds good to me for making an “Articles Only” limiter which can be used in both standard EDS and via the API. The instructions I was given were:

Good news, we can do your best option here.  We’ve got a feature
called “Custom Limiters” that should do the trick.

http://search.ebscohost.com/login.aspx?direct=true&scope=site&site=eds-live&authtype=guest&custid=ericfrier&groupid=main&profile=eds_fgcu%20&bquery=nanotechnology+AND+PZ+Article

Take a look at how this search “pre-selects” the custom limiter and
removes the syntax from the search query.

In order to accomplish this, the library needs to add a custom
limiter for the specific search syntax you’d like to use.  In
this case, this needs to be pasted in the bottom branding of their
EDS profile:

    <script type="text/javascript" src="http://widgets.ebscohost.com/prod/simplekey/customlimiters/limiter.php?modifier=AND%20PZ%20Article&label=Articles%20Only&id=artonly"></script>


This script catches any use of “AND PZ Article” and instead simulates
a limiter based on that search syntax.

I haven’t actually tried this myself yet, but it sounds like it should probably work (modulo the typo “PZ Article” for “ZT Article”, which I think is the right one to use in EDS).  It’s hard to be sure of anything until you try it out extensively with the EDS API, but it sounds good to me.