Blacklight 7: current_user or other request context in SearchBuilder solr query builder

In Blacklight, the “SearchBuilder” is an object responsible for creating a Solr query. A template is generated into your app for customization, and you can write a kind of “plugin” to customize how the query is generated.

You might need some “request context” to do this. One common example is the current_user, for various kinds of access control. For instance, to hide certain objects from Solr query results depending on the user’s permissions, or perhaps to keep certain Solr fields from being searched (in qf or pf params) unless a user is authorized to see/search them.

The way you can do this changed between Blacklight 6 and Blacklight 7. The way to do it in Blacklight 7.1 is relatively straightforward, but I’m not sure if it’s documented, so I’ll explain it here. (Anyone wanting to try to update the blacklight-access_controls or hydra-access-controls gems to work with Blacklight 7 will need to know this).

I was going to start by describing how this worked in Blacklight 6… but I realized I didn’t understand it, and got lost figuring it out. So we’ll skip that. But I believe that in BL 6, controllers interacted directly with a SearchBuilder. I can also say that the way a SearchBuilder got “context” like a current_user in BL6 and previous was a bit ad hoc and messy, without a clear API, and had evolved over time in a kind of “legacy” way.

Blacklight 7 introduces a new abstraction, the somewhat generically named “search service”, normally an instance of Blacklight::SearchService. (I don’t think this is mentioned in the BL 7 Release Notes, but is a somewhat significant architectural change that can break things trying to hook into BL).

Now, controllers don’t interact with the SearchBuilder, but with a “search service”, which itself instantiates and uses a SearchBuilder “under the hood”. In Blacklight 7.0, there was no good way to get “context” to the SearchBuilder, but 7.1.0.alpha has a feature that’s pretty easy to use.

In your CatalogController, define a search_service_context method which returns a hash of whatever context you need available:

class CatalogController < ApplicationController
  include Blacklight::Catalog

  def search_service_context
    { current_user: current_user }
  end

# ...
end

OK, now the Blacklight code will automatically add that to the "search service" context. But how does your SearchBuilder get it?

Turns out, in Blacklight 7, the somewhat confusingly named scope attribute in a SearchBuilder will hold the acting SearchService instance, so in a search builder or mix-in to a search_builder…

def some_search_builder_method
  if scope.context[:current_user]
    # we have a current_user!
  end
end

And that’s pretty much it.

I believe in BL 7 the scope attribute in a SearchBuilder will always be a “search service”; perhaps it would make sense to alias it as “search_service”. To avoid the somewhat ugly scope.context[:current_user], you could put a method in your SearchBuilder that wraps that as current_user, but that would introduce some coupling between that method existing in SearchBuilder and any SearchBuilder extension that needs to use it, so I didn’t go that route.
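If you did want to put it all together anyway, a minimal sketch might look something like the following. The current_user accessor, the staff? check, and the visibility field are hypothetical things made up for illustration; default_processor_chain is the ordinary Blacklight way to register a step that mutates the solr parameters.

class SearchBuilder < Blacklight::SearchBuilder
  include Blacklight::Solr::SearchBuilderBehavior

  self.default_processor_chain += [:restrict_by_permissions]

  # Hypothetical convenience accessor for the search service context.
  def current_user
    scope.context[:current_user]
  end

  # Hypothetical access control: only staff can see non-public records.
  # "visibility_ssi" and "staff?" are made up for illustration.
  def restrict_by_permissions(solr_parameters)
    unless current_user && current_user.staff?
      solr_parameters[:fq] ||= []
      solr_parameters[:fq] << "visibility_ssi:public"
    end
  end
end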

For a PR in our local app that supplies a very simple local SearchBuilder extension, puts it into use, and makes the current_user available in a context, see this PR. 


A terrible Github UI — accidentally shadow a tag with a branch

So we generally like to tag our releases in git, like v1.0.0 or what have you.

Github Web UI has a “tag/branch switcher” widget, which lets you look at a particular branch or tag in the Web UI.

[Screenshot: the GitHub tag/branch switcher widget]

You can see it has separate tabs for “branches” and “tags”. Let’s say you get confused, and type “v1.0.0” (a tag) while the “branches” tab is selected (under the text box).

[Screenshot: typing “v1.0.0” into the switcher with the “branches” tab selected]

It found no auto-complete for “v1.0.0” in “branches” (although there is a tag with that name it would have found if the “tags” tab had been selected), and it “helpfully” offers to create a branch with that name.

Now, if you do that, you’re going to have a new branch, created off master, with the same name as a tag. Which is going to be really confusing. And not what you wanted.

Maybe your muscle memory makes your fingers hit “enter” and you wind up there — but at least it is very clearly identified: it says in fairly big and bold text “Create branch: v1.0.0 (from master)”. So at least it warned you, although even that would be easy to miss in a hurry, when muscle memory has you thinking you know what you’re doing.

That’s not the really evil UI yet.

Now let’s go to GitHub’s “compare” UI, at https://github.com/someorg/someproject/compare

A fairly common thing I at least want to do is look at the compare between two releases, or from last release to master. But the ‘compare’ UI doesn’t have the tabs, it will only list or auto-complete from branches.

[Screenshot: the compare UI chooser, which only lists branches]

In a hurry, going from muscle memory, you type in “v1.0.0” anyway.

[Screenshot: typing “v1.0.0” into the compare UI chooser]

It does say “nothing to show”. But “v1.0.0” shows up in the list anyway. With a pretty obscure icon I’ve never seen before. Do you know what that icon means? It turns out, apparently, it means “Create branch: v1.0.0 (from master)”.

If confused, or in a hurry, or with your muscle memory outpacing your brain, you click on that line — that’s what happens.

Now you’ve got a branch called “v1.0.0”, created off current master, along with a tag “v1.0.0” pointing at a different SHA. Because many UIs treat branches and tags somewhat interchangeably, this is confusing. If you do a git checkout v1.0.0, are you going to get the branch or the tag?

It turns out if you go to a github compare UI, like `https://github.com/someorg/someproject/compare/v1.0.0..master`, Github is going to compare the new branch you accidentally made, not the existing tag (showing nothing in the diff, if master hasn’t changed yet). There is no way to get Github to compare the tag. If you didn’t realize exactly what you did, you’re going to be awfully confused about what the heck is going on.

You’re going to need to figure it out, and delete the branch you just made, which it turns out you can do from the command line with the confusing and dangerous command: `git push origin :refs/heads/v1.0.0`. (The explicit refs/heads/ prefix matters here: since a tag with the same name also exists on the remote, a bare “v1.0.0” would be ambiguous about whether you mean the branch or the tag, while refs/heads/v1.0.0 unambiguously names the branch.)

And that’s how I lost a couple hours to figuring out “what the heck is going on here?”

What should you do if you want the github ‘compare’ web UI for a tag rather than a branch? Turns out, as far as I know, you just need to manually enter the URL https://github.com/org/project/compare/v1.0.0..v1.0.1 or what have you. The actual UI widgets will not get you there. They’ll just get you to a mess.

Am I missing something? It seems like the github web UI is not only failing to provide for what I would think is a pretty common use case (comparing tags), but leading you down a path to disaster when you look for it, no?

Solr Indexing in Kithe

So you may recall the kithe toolkit we are building in concert with our new digital collections app, which I introduced here.

I have completed some Solr Indexing support in kithe. It’s just about indexing, getting your data into Solr. It doesn’t assume Blacklight, but should work fine with Blacklight; there isn’t currently any support in kithe for what you do to provide UX for your Solr index.  You can look at the kithe guide documentation for the indexing features for a walk-through.

The kithe indexing support is based on ActiveRecord callbacks, in particular the after_commit callback. While callbacks get a bad rap, I think they are appropriate here, and note that both the popular sunspot gem (Solr/Rails integration, currently looking for new maintainers) and the popular searchkick gem (ElasticSearch/Rails integration) base their indexing synchronization on AR callbacks too. (There are various ways in kithe’s feature to turn off automatic callbacks temporarily or permanently in your code, like there are in those other two gems too). I spent some time looking at API’s, features, and implementation of the indexing-related functionality in sunspot, and searchkick, as well as other “prior art”, before/while developing kithe’s support.

The kithe indexing support is also based on traject for defining your mappings.
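To make the shape of that concrete, here is a rough sketch of the general pattern (an after_commit hook plus a traject indexer mapping an ActiveRecord model to a Solr document). This is my illustration of the pattern, not kithe’s actual API; the model, Solr field names, core URL, and wiring are all made up:

require "traject"
require "net/http"
require "json"
require "uri"

# A traject indexer whose "source records" are ActiveRecord models rather than
# MARC records; the field names here are made up for illustration.
class WorkIndexer < Traject::Indexer
  configure do
    to_field("id")         { |work, acc| acc << work.id.to_s }
    to_field("title_tsim") { |work, acc| acc << work.title }
  end
end

class Work < ApplicationRecord
  # Keep Solr in sync after any committed create/update/destroy.
  after_commit :update_solr_index

  # A single shared indexer instance, since instantiating a traject Indexer
  # can be slow (see the note on traject below).
  def self.indexer
    @indexer ||= WorkIndexer.new
  end

  def update_solr_index
    body =
      if destroyed?
        { delete: { id: id.to_s } }
      else
        { add: { doc: self.class.indexer.map_record(self) } }
      end

    # Hypothetical Solr core; the softCommit param is just for illustration.
    uri = URI("http://localhost:8983/solr/my_core/update?softCommit=true")
    request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
    request.body = JSON.generate(body)
    Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
  end
end

In kithe the equivalent wiring is packaged up for you, including batching support and ways to turn the automatic callbacks off; see the guide linked above.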

I am very happy with how it turned out, I think the implementation and public API both ended up pretty decent. (I am often reminded of the quote of uncertain attribution “I didn’t have time to write a short letter, so I wrote a long one instead” — it can take a lot of work to make nice concise code).

The kithe indexing support is independent of any other kithe features and doesn’t depend on them. I think it might be worth looking at for anyone writing an app whose persistence is based on ActiveRecord. (If you’re using something ActiveModel-like but not ActiveRecord, it probably doesn’t have after_commit callbacks, but if it has after_save callbacks, we could make the kithe feature optionally use those instead; sunspot and searchkick can both do that).

Again, here’s the kithe documentation giving a tour of the indexing features. 

Note on traject

The part of the architecture I’m least happy with is traject, actually.

Traject was written for a different use case — command-line executed high-volume bulk/batch indexing from file serializations. And it was built for that basic domain and context at the time, with some YAGNI thoughts.

So why try to use it for a different case of event-based few-or-one object sync’ing, integrated into an app?  Well, hopefully it was not just because I already had traject and was the maintainer (‘when all you have is a hammer’), although that’s a risk. Partially because traject’s mapping DSL/API has proven to work well for many existing users. And it did at least lead me to a nice architecture where the indexing code is separate and fairly decoupled from the ActiveRecord model.

And the Traject SolrJsonWriter already had nice batching functionality (and thread-safety, although I didn’t end up using that in the current kithe architecture), which made it convenient to implement batching features in a de-coupled way (just send to a writer that’s batching; the other code doesn’t need to know about it, except for maybe flushing at the end).

And, well, maybe I just wanted to try it. And I think it worked out pretty well, although there are some oddities in there due to traject’s current basic architectural decisions. (Like, instantiating a Traject “Indexer” can be slow, so we use a global singleton in the kithe architecture, which is weird.)  I have some ideas for possible refactors of traject (some backwards compat some not) that would make it seem more polished for this kind of use case, but in the meantime, it really does work out fine.

Note on times to index, legacy sufia app vs our kithe-based app

Our collection, currently in a sufia app, is relatively small. We have about 7,000 Works (some of which are “child works”), 23,000 “FileSets” (which in kithe we call “Assets”), and 50 Collections.

In our existing Sufia-based app, it takes about 6 hours to reindex to Solr on an empty index.

  • Except actually, on an empty index it might take two re-index operations, because of the way sufia indexing is reliant on getting things out of the index to figure out the proper way to index a thing at hand. (We put a lot of work into trying to reorganize the indexing to not require an index in order to index, but I’m not sure if we succeeded, and may ironically have made performance issues with fedora worse with the new patterns?) So maybe 12 hours.
  • Except that 6 hours is just a guess from memory. I tried to do a bulk reindex-everything in our sufia app to reconfirm it — but we can’t actually currently do a bulk reindex at all, because it triggers an HTTP timeout from Fedora taking too long to respond to some API request.
    • If we upgraded to ActiveFedora 12, we could increase the timeout that ActiveFedora is willing to wait for a fedora response for. If we upgraded to ActiveFedora 12.1, it would include this PR, which I believe is intended to eliminate those super long fedora responses. I don’t think it would significantly change our end-to-end indexing time, the bulk of it is not in those initial very long fedora API calls. But I could be wrong. And not sure how realistic it is to upgrade our sufia app to AF 12 anyway.
    • To be fair, if we already had an existing index, but needed to reindex our actual works/collections/filesets because of a Solr config change, we had another routine which could do so in only ~25 minutes.

In our new app, we can run our complete reindexing routine in currently… 30 seconds. (That’s about 300 records/second throughput — only indexing Works and Collections. In past versions as I was building out the indexing I was getting up to 1000 records/second, but I haven’t taken time to investigate what changed, cause 30s is still just fine).

In our sufia app we are backing up our on-disk Solr indexes, because we didn’t want to risk the downtime it would take to rebuild (possibly including fighting with the code to get it to reindex).  In addition to just being more bytes to sling, this leads to ongoing developer time on such things as “did we back up the solr data files in a consistent state? Sync’d with our postgres backup?”, and “turns out we just noticed an error in the backup routine means the backup actually wasn’t happening.” (As anyone who deals with backups of any sort knows can be A Thing).

In the new system, we can just… not do that.  We know we can easily and quickly regenerate the Solr index whenever, from the data in postgres. (And if we upgrade to a new Solr version that requires an index rebuild, no need to figure out how to do so without downtime in a complicated way).

Why is the new system so much faster? I’ve identified three areas I believe are likely contributors, but haven’t actually done much profiling to determine which of these (if any?) are the predominant factors, so I couldn’t say.

  1. Getting things out of fedora (at least under sufia’s usage patterns) is slow. Getting things out of postgres is fast.
  2. We are now only indexing what we need to support search.
    • The only things that show up in our search results are Works and Collections, so that’s all we’re indexing. (Sufia indexes not only FileSets but also some ancillary objects, such as one or two kinds of permission objects, and possibly a variety of other things I’m not familiar with. Sufia is trying to put pretty much everything that’s in fedora into Solr. For Reasons, mainly that it’s hard to query your things in Fedora with Fedora).
    • And we are only indexing the fields we actually need in Solr for those objects. Sufia tries to index a more or less round-trippable representation to Solr, with every property in its own stored solr field, etc. We aren’t doing that anymore. We could put all text in one “text” field, if we didn’t want to boost some higher than others. So we only index to as many fields as need different boosts, plus fields for facets, etc. Only what we need to support the Solr functionality we want.
      • If you want to render your results from only Solr stored fields (as sufia/hyrax do, and blacklight kind of wants you to) you’d also need those stored fields, sufficiently independently addressable to render what you want (or perhaps just in one big serialized JSON?). We are hoping to not use solr stored fields for rendering at all, but even if we end up with Solr stored fields for rendering, it will be just enough that we need for rendering. (For instance, some people using Blacklight are using solr stored fields for the “index”/search results/hits page, but not for the individual record ‘show’ page).
  3. The indexing routines in the new app send updates to Solr in an efficient way, both batching record updates into fewer Solr HTTP update requests, and not sending synchronous Solr “hard commits” at all. (The bulk reindex, like the after_commit indexing, currently sends a softCommit per update request, although this could be configured differently).

So

Check out the kithe guide on indexing support! Maybe you want to use kithe, maybe you’re writing an ActiveRecord-based app and want to consider kithe’s solr indexing support in isolation, or maybe you just want to look at it for API and implementation ideas in your own thing(s).

very rough benchmarking of Solr update batching performance characteristics

In figuring out how I want to integrate a synchronized Solr index into my Rails application, I am doing some very rough profiling/benchmarking of batching Solr adds vs not, just to get a general sense of it.

(This is all _very rough estimates_ and may depend a lot on your environment and Solr setup, including how many records you have in Solr, if Solr is being simultaneously used for queries, etc).

One thing some Solr (or ElasticSearch) integration packages sometimes end up concentrating on is batching multiple index-change-needed events into fewer Solr update requests.

Based on my observations, I think it’s not actually the separate HTTP requests that are expensive. (although I’m benchmarking with a solr on localhost).

But the commits are — if you are doing them. In my benchmarks reindexing a whole bunch of things, if I’m not doing any commits, whether I batch into fewer HTTP update requests to Solr or not has no appreciable effect on speed.

But sending a softCommit per record/update makes it around 2.5x slower.

Sending a (hard) commit per record makes it around 4x slower.

Even without explicit commit directives, if you have your Solr set up to autocommit (soft or hard), it may of course occasionally pause to do some commits, so your measured time may depend on whether you hit one of those.

So if you don’t care about realtime/near-realtime, you may not have to care about batching. I had already gotten the sense from Solr’s documentation that Solr will really like it better if the client never sends commits, but just lets Solr’s autoCommit/autoSoftCommit/commitWithin configuration make sure updates become visible within a certain maximum amount of time. The reason to have the client send commits is generally because you need to guarantee that the updates will be visible to queries as soon as your code doing the update is finished.

The reason so many end up caring about batching updates might not be because individual HTTP requests to solr are a problem, but because too many _commits_ are. So if for some reason it was more convenient, only sending a commit per X records might be just as good as actually batching HTTP requests — if you have to send commits from the client at all.
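For concreteness, here’s roughly the shape of what I was measuring, as a very rough sketch: the core URL and documents are made up, and your numbers will of course vary with your environment:

require "net/http"
require "json"
require "uri"
require "benchmark"

SOLR_UPDATE = "http://localhost:8983/solr/my_core/update" # hypothetical core

# Post an array of documents to Solr's JSON update endpoint, with optional
# request params such as softCommit=true or commit=true.
def post_update(docs, params = {})
  uri = URI(SOLR_UPDATE)
  uri.query = URI.encode_www_form(params) unless params.empty?
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(docs)
  Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
end

docs = 1_000.times.map { |i| { "id" => "doc-#{i}", "title_tsim" => "title #{i}" } }

Benchmark.bm(28) do |bm|
  bm.report("batched, no commits")        { docs.each_slice(100) { |batch| post_update(batch) } }
  bm.report("one-at-a-time, no commits")  { docs.each { |doc| post_update([doc]) } }
  bm.report("one-at-a-time, softCommit")  { docs.each { |doc| post_update([doc], softCommit: true) } }
  bm.report("one-at-a-time, hard commit") { docs.each { |doc| post_update([doc], commit: true) } }
end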

What “search engine” to use in a digital collections Rails app?

Traditional samvera apps have Blacklight, and its Solr index, very tightly integrated into many parts of persistence and discovery functionality, including management interfaces.

In rewriting our digital collections app, we have the opportunity to make other choices. Which of course is both a blessing and a curse; who wants choices?

One thing I know I don’t want is as tight and coupled an integration to Solr as a typical sufia or hyrax app.

We should be able to at least find persisted model items by id (or iterate through all of them), make some automated changes (say correcting a typo), and persist them to storage — without a Solr index existing at all. To the extent a Solr (or other text-search-engine) index exists at all, discrepancies between what’s in the index and what’s in our “real” store should not cause any usual mutation-and-persistence APIs to fail (either with an error or with a wrong side-effect outcome).

Really, I want a back-end interface that can do most if not all things a staff user needs to do in managing the repo, without any Solr index existing at all. Just plain postgres ‘like’ search may sometimes be enough; when it’s not, pg’s full-text indexing features likely are. These sorts of features are not as powerful as a ‘text search engine’ product like lucene or Solr — they aren’t going to do the complicated stemming that Solr does, or probably features like “phrase boosting”. They can give you filters/limits, but not facets in the Solr sense (telling you what terms are present in a given already-restricted search, with term counts).
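For example, for a staff-facing admin list, something as simple as the following may go a long way (a sketch; the Work model and title column are hypothetical):

query = "manuscript"  # whatever the staff user typed

# Simple "contains" matching with ILIKE (sanitize_sql_like is public API as of Rails 5.2):
Work.where("title ILIKE ?", "%#{Work.sanitize_sql_like(query)}%")

# Or postgres full-text search, which gets you stemming, but not Solr-style
# relevance tuning, phrase boosting, or facets:
Work.where("to_tsvector('english', title) @@ plainto_tsquery('english', ?)", query)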

So we almost certainly still want Solr or a similar search engine providing user-facing front-end discovery, for this powerful search experience. We just want it sitting loosely on top of our app, not tightly integrated into every persistence and retrieval operation like it ends up in a typical sufia or hyrax app.

And part of this, for me, is that I only want to have to index in Solr (or similar) what is necessary for discovery/retrieval features, for search. This is how Solr works best. I don’t want to have to store a complete representation of the model instance in Solr, with every field addressable from a Solr result. So, that means, even in the implementation of the front-end UX search experience, I want display of the results to be from my ordinary ActiveRecord model objects (even on the ‘index’ search results page, and certainly on the ‘item detail’ page).  This is in fact how sunspot works — after solr returns a hit list, it takes the db pks (and model names) from the solr response, and then just fetches the results again from the database, in nice efficient SQL, using pre-loading (via SQL joins) etc. This is how one attempt at an elasticsearch-rails integration works too.

Yeah, it’s doing an “extra” fetch from the db, when it theoretically could have gotten everything it needed to display from Solr.  But properly optimized fetches from the db to display one page of search results are pretty quick, certainly faster than what was going on with ActiveFedora in our sufia app anyway, and the developer pain (and subsequent bugs) that can come from trying to duplicate everything in Solr just aren’t worth trying to optimize away the db fetch. There’s a reason popular Solr or ElasticSearch integrations with Rails do the “extra” fetch.
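The basic pattern is simple enough. A rough sketch of the kind of thing I mean (the model and the eager-loaded association are hypothetical):

# Given a parsed Solr JSON response, fetch the actual ActiveRecord objects to
# display, preserving Solr's relevance order and eager-loading whatever the
# view needs. ("leaf_representative" is a made-up association name.)
def records_for(solr_response)
  ids = solr_response.dig("response", "docs").map { |doc| doc["id"] }

  records_by_id = Work.where(id: ids).
    includes(:leaf_representative).
    index_by { |work| work.id.to_s }

  ids.map { |id| records_by_id[id] }.compact
end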

OK, I know what I want (and what I don’t), but what am I going to do? There’s still some choices, curses!

1. Blacklight, intervened in to return actual ActiveRecord models to views?

Blacklight was originally written for “library catalog” use cases where you might not be indexing from a local rdbms at all, you might be indexing from a third party data API, and you might not have representations in a local rdbms, solr is all you’ve got. So it was written to find everything it needs to display the results found in the Solr response.

But can we intervene in Blacklight to instead take the Solr responses, use them to get model names and pks to then fetch from a local rdbms instead? Like sunspot does?

This was my initial plan, and at first I thought I could do it easily. In fact, years ago, when writing a Blacklight catalog app, I had to do something in some ways analogous. We wanted our library catalog to show live checked in/out status for things returned by Solr. But we had no good way to get this into our Solr index in realtime. So, okay, we can’t provide a filter/facet by this value without it in the index, but can we still display it with realtime accuracy?

We could! We wanted to hook into the solr search results process in Blacklight, take the unique IDs from the solr response, and use them to make API calls to our ILS to figure out item status. (checked out, etc). And we wanted to do this in one bulk query (with all the IDs that are on the page of results), not one request per search hit, which would have been a performance problem. (We won’t talk about how our ILS didn’t actually have an API; let’s just say we gave it one).

So I had already done that, and thought the hook points and functions were pretty similar (just look up ‘extra info’ differently, this time the ‘extra info’ is an actual ActiveRecord model associated with each hit, instead of item status info). So I figured I could do it again!

The Blacklight method I had overridden to do that (back in maybe Blacklight 2.x days) was the search_results method, called by the Catalog#index action among other places. Overriding this got every place Blacklight got ‘results’, so we could make sure to hook in to get ‘extra stuff’ on every results fetch. It returned the @response itself, so we could hook into it to enhance the SolrDocuments returned to have extra stuff! Or hey, it’s a controller method, we can even have it set other ivars if we want. A nice clean intervention.

But alas! While I had used this in many-years-ago Blacklight, and it made it all the way to Blacklight 6… this method no longer exists in Blacklight 7, and there’s no great obvious place to override to do similar. It looks like it actually went away in a BL commit a couple years ago, but that commit didn’t make it into a BL release until 7.0. The BL 7 def index action method… doesn’t really have any clear point to intervene to do similar.

Maybe I could do something over in the ‘search_service.search_result’ method, I guess in a custom search_service. It’s not a controller method, so it couldn’t provide its own ivar, but it could still modify the @response to enhance the SolrDocuments in it. There are some more layers of architecture to deal with (and possibly break with future BL releases), and I haven’t yet really figured out what the search_service is and where it comes from! But it could be done.

Or I could try to get a search_results cover method back into BL. (Not sure how amenable BL would be to such a PR.)

Or I could… override even more? The whole catalog index method? Don’t even use the blacklight Catalog controller at all but write my own? Both of those, my intuition based on experience with BL says, there be dragons.

2. Use Solr, but not Blacklight

So as I contemplated the danger of overriding big pieces of BL, I thought, ok, wait, why am I using BL at all, actually?

A couple senior developers at a couple institutions I talked to (I won’t reveal their names in case I’m accidentally misrepresenting them, and to not bring down the heat of bucking the consensus on them) said they were considering just writing ruby code to interact with Solr. (We’re talking about search/discovery here; indexing — getting the data into Solr in the first place — is another topic). They said, gee, what we need in our UI for Solr results just isn’t that complicated, we think maybe it’s not actually that hard to just write the code to do it, maybe easier than fighting with BL, which in the words of one developer has a tendency to “infect” your application, making everything more complicated when you try doing things the “Blacklight way”.

And it’s true we spend a lot of time overriding or configuring Blacklight to turn off features we didn’t think worked right, or we just don’t want in our UX. (Sufia/hyrax have traditionally tried to override Blacklight to make the ‘saved searches’ feature go away for instance). And there’s more features we just don’t use.

Could we just write our own code for issuing queries to Solr, and displaying facets and hits from the results? Maybe. Off-hand, I can think of a couple things we get from BL that are non-trivial.

  1. The “back to search” link on the item detail page. Supplying a state-ful UX on top of a state-less HTTP web app (while leaving the URLs clean) is a pain.  The Blacklight implementation has gone through various iterations and probably still has some weirdness, but my stakeholders have told me this feature is important.
  2. The date range facet with little mini-histogram, provided by blacklight_range_limit. This feature is also implemented kind of crazily (I should know, I wrote the first version, although I’m not currently really the maintainer) — if all you want is a date range limit where you enter a start and end year, without the little bar-graph-ish histogram, that’s actually easy, and I think some people are using blacklight_range_limit when that’s all they want, and could be doing it a lot simpler. But the histogram, with the nice calculated (for the particular result set!) human-friendly range endpoints (it’ll be 10s or 5s or whatever, at the right order of magnitude for your current facetted results!), kind of a pain, and it just works with blacklight_range_limit (although I don’t actually know if blacklight_range_limit works for BL7, it may not).

Probably a few more things I’m not thinking of that I’d run into.

On the plus side, I wouldn’t have to fight with Blacklight to turn off the things I don’t want, or to get it to have the retrieval behavior I want, retrieving hits from my actual rdbms for display.

Hmm.

(While I keep looking at sunspot for ideas — it is/was somewhat popular, so must have at least gotten some developer APIs right for some use cases involving rdbms data searched in Solr — it’s got some awfully complicated implementation, it assumes certain “basic search” use cases as the golden path and definitely has some things I’d have to fight with, and it has a “Looking for maintainers” line on its README, so I’m not really considering actually using it).

3. Should we use ElasticSearch?

Hypothetically, I think Blacklight was abstracted at one point to support ElasticSearch. I’m not totally sure how that went, if anyone is using BL with ES in production or whatever.

But if I wanted to use ElasticSearch, I think I probably wouldn’t try to use it with BL, but as an extension of option 2 above. If I’m going to be writing it myself anyway, might we want to use ElasticSearch instead?

ElasticSearch, like Solr, is an HTTP-API search engine app built on top of lucene. In some ways, I think Solr is a victim of being first. It’s got a lot more… legacy.  And different kinds of deployments it supports. (SolrCloud or not cloud? Managed schema or not? What?) Solr can sometimes seem to me like it gives you a billion ways to do whatever you want to do, but you’ve got to figure out which one to use (and whatever you choose may break some other feature). Whereas ElasticSearch just seems more straightforward. Or maybe that’s just because it seems to have better, clearer documentation. It just seems less overwhelming, and I theoretically am familiar with Solr from years of use (but I always learned just enough to get by).

For whatever reasons, ElasticSearch seems to have possibly overtaken Solr in popularity, seems to be easier to pay someone else for a cloud-hosted PaaS instance of at an affordable price, and seems to just generally be considered a bit easier to get started with.

I’ve talked to some other people in samvera space who are hypothetically considering ElasticSearch too, if they could do whatever they wanted (although I know of nobody actually moving forward with plans).

ElasticSearch at least historically didn’t have all the features and flexibility of Solr, but it’s caught up a lot. Might it have everything we actually need for this digital collections app?

I’m not sure. Historically ES had problems with facets, and while it seems to have caught up a lot… they don’t work quite like Solr’s, and looking around to see if they do everything I need, it seems like there are a couple different ES features that approach Solr’s “facets”, and I’m not totally sure either does what I actually need (ordinary Solr facets: exact term counts, sorted by most-represented-term-first, within a result set).

It might! But really ES’s unfamiliarity is the biggest barrier. I’d have to figure out how to do things with slightly different featureset, and sometimes might find myself up against a brick wall, and am not sure I’d know that for sure until I’m in the middle of it. I have a pretty good sense of what Solr can do at this point, I know what I’m getting into.

(ES also maybe exposes different semantics around lucene ‘commits’? If you need synchronous “realtime” commits immediately visible on the next query, I think maybe you can get that from ES, but I’m not 100% confident, and it’s definitely not ES’s “happy path”. Historically samvera apps have believed they needed this; I’m not sure I do, if I successfully have search engine functionality resting more lightly on the app. But I’m not sure I don’t).

So what will I do?

I’m actually not sure, I’m a bit stumped.

I think going to ElasticSearch is probably too much for me right now, there’s too many balls in the air in rewriting this app to add in search engine software I’m not familiar with that may not have quite the same featureset.

But between using BL and doing it myself… I’m not sure, both offer some risks and possible rewards.

The fact that I can’t use the override-point in BL I was planning to, cause it’s gone from BL 7, annoys me and pushes me a bit more to consider a DIY approach. But I’m not sure if I’m going to regret that later. I might start out trying it and seeing where it gets me… or I might just figure out how to hack the rdbms-retrieval pattern I want into BL, even if it’s not pretty. I know I want to write my display logic in terms of my ActiveRecord models, and with full access to ActiveRecord eager-loading to load any associated records I need (ala sunspot), instead of trying to jam it all into a Solr record in a denormalized fashion. Being able to get out of that was one of the main attractions of escaping from sufia/hyrax in the first place!

Our progress on new digital collections app, and introducing kithe

In September, I wrote a post on a “Proposed Rails-based digital collections developer’s toolkit”

What has happened since then?

Yes, we decided to go ahead with a rewrite of our digital collections app, with the new app not based on Hyrax or Valkyrie, but on a persistence layer based on ActiveRecord (making use of postgres-specific features where appropriate), and exposing ActiveRecord models to the app as a whole.

No, we are not going forward with trying to make that entire “toolkit”, with all the components mentioned there.

But Yes, unlike Alberta, we are taking some functionality and putting it in a gem that can be shared between institutions and applications. That gem is kithe. It includes some sharable modeling/persistence code, like Valkyrie (but with a very different approach than Valkyrie), but also includes some additional fundamental components too.

Scaling back the ambition—and abstraction—a bit

The total architecture outlined in my original post was starting to feel overwhelming to me. After all, we also need to actually produce and launch an app for ourselves, on a “reasonable” timeline, with fairly high chance of success.  I left my conversation with U Alberta (which was quite useful, thank you to the Alberta team!), concerned about potential over-reach and over-abstraction. Abstraction always has a cost and building shared components is harder and more time-consuming than building a custom app.

But, then, also informed by my discussion with Alberta, I realized we basically just had to build a Rails app, and this is something I knew how to do, and we could, as we progressed, jettison anything that didn’t seem actually beneficial for that goal, or didn’t seem feasible at the moment. And, also after discussion with a supportive local team, my anxiety about the project went down quite a bit — we can do this.

Even when writing the original proposal, I knew that some elements might be traps. Building a generalized ACL permissions system in an rdbms-based web app… many have tried, many have fallen. :)  Generalized controllers are hard, because they are a piece very tightly tied to your particular app’s UI flows, which will vary.

So we’ve scaled back from trying to provide a toolkit which can also be “scaffolding” for a complete starter app.  The goals of the original thought-experiment proposal — a toolkit which provides  pieces developers put together when building their own app — are better approached, for now, by scaling back and providing fewer shared tools, which we can make really solid.

After all, building shared code is always harder than building code for your app. You have more use cases to figure out and meet, and crucially, shared code is harder to change because it’s (potentially) got cross-institutional dependents, which you have to not break. For the code I am putting into kithe, I’m trying to make it solidly constructed and well-polished. In purely local code,  I’m more willing to do something experimental and hacky — it’s easy enough (comparatively!) to change local app code later.  As with all software, get something out there that works, iterating, using what you learn. (It’s just that this is a lot harder to do with shared dependencies without pain!)

So, on October 1st, we decided to embark on this project. We’re willing to show you our fairly informal sketch of a work plan, if you’d like to look.

Introducing kithe

But we’re not just building a local app, we are also trying to create some shareable components. While the costs and risks of shared code and abstractions are real,  I ultimately decided that “just Rails” would not get us to the most maintainable code after all. (And of course nothing is really just Rails, you are always writing code and using non-Rails dependencies; it’s a matter of degree, how much your app seems like a “typical” Rails app to developers).

It’s just too hard to model the data we ourselves already needed (including nested/compound/repeated models) in “just” ActiveRecord, especially in a way that lets you work with it sanely as “just” ActiveRecord, and is still performant. (So we use attr_json, which I also developed, for a No-SQLy approach without giving up rdbms or ActiveRecord benefits including real foreign-key-based associations). And in another example, ActiveStorage was not flexible/powerful enough for our file-handling needs (which are of course at the core of our domain!), and I wasn’t enthused about CarrierWave either — it makes sense to me to make some solid high-quality components/abstractions for some of our fundamental business/domain concerns, while being aware of the risks/costs.
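As a tiny illustration of the kind of attr_json modelling I mean (the model and attributes here are hypothetical, not kithe’s actual schema):

class Inscription
  include AttrJson::Model

  attr_json :location, :string
  attr_json :text, :string
end

class Work < ApplicationRecord
  include AttrJson::Record

  # Stored in a postgres jsonb column, but usable as ordinary model
  # attributes, including repeated and nested/compound values.
  attr_json :genre, :string, array: true
  attr_json :inscriptions, Inscription.to_type, array: true
end

You can then read and write these like ordinary attributes (work.genre, work.inscriptions.first.location), without creating a normalized table for every repeatable field.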

So I’ve put into kithe the components I thought seemed appropriate on several considerations:

  • Most valuable to our local development effort
  • Handling the “trickiest” problems, most useful to share
  • Handling common problems, most likely to be shareable; and it’s hard to build a suite of things that work together without some modelling/persistence assumptions, so got to start there.
  • I had enough understanding of the use-cases (local and community) that I thought I could, if I took a reasonable amount of extra time, produce something well-polished, with a good developer experience, and a relatively stable API.

That already includes the following, maybe not 1.0-production-ready, but used in our own in-progress app, and released (well-tested and well-documented) in kithe:

  • A modeling and persistence layer tightly coupled to ActiveRecord, with some postgres-specific features, and recommending use of attr_json, for convenient “NoSQL”-like modelling of your unique business data (in common with existing samvera and valkyrie solutions, you don’t need to build out a normalized rdbms schema for your data). With models that are samvera/PCDM-ish (also like other community solutions).
    • Including pretty slick handling of “representatives”, dealing with the performance issues in figuring out the representative to display, with constant query time (using some pg-specific SQL to look up and set the “leaf” representative on save).
    • Including UUIDs as actual DB pk/fks, but also a friendlier_id feature for shorter public URL identifiers, with logic to automatically create such if you wish.
  • A nice helper for building Rails forms with repeatable complex embedded values. Compare to the relevant parts of hydra-editor, but (I think) lighter and more flexible.
  • A flexible file-handling architecture based on shrine — meaning transparent cloud-storage support out of the box.
    • Along with a new derivatives architecture, which seems to me to have the right level of abstraction and affordances to provide a “polished” experience.
    • All file-handling support based on assuming expensive things happen in the background, and “direct upload” from browser pre-form-submit (possibly to cloud storage)

It will eventually include some solr/blacklight support, including a traject-based indexing setup, and I would like to develop an intervention in blacklight so after solr results are returned, it immediately fetches the “hit” records from ActiveRecord (with specified eager-loading), so you can write your view code in terms of your actual AR models, and not need to duplicate data to solr and logic for dealing with it. This latter is taken from the design of sunspot.

But before we get there, we’re going to spend a little bit of time on purely local features, including export/import routines (to get our data into the new app, with some solid testing/auditing to be confident we got it all), and some locally bespoke workflow support (I think workflow is something that works best just writing the Rails).

We do have an application deployed as demo/staging, with a basic more-than-just-MVP-but-not-done-yet back-end management interface (note: it does not use Solr/Blacklight at all which I consider a feature), but not yet any non-logged-in end-user search front-end. If you’d like a guest login to see it, just ask.

Technical Evaluation So Far

We’ve decided to tie our code to Rails and ActiveRecord. Unlike Valkyrie, which provides a data-mapper/repository pattern abstraction, kithe expects the dependent code to use ActiveRecord APIs (along with some standard models and modelling enhancements kithe gives you).

This means, unlike Valkyrie, our solution is not “persistence-layer agnostic”. Our app, and any potential kithe apps, are tied to Rails/ActiveRecord, and can’t use fedora or other persistence mechanisms. We didn’t have much need/interest in that, we’re happy tying our application logic and storage to ActiveRecord/postgres, and perhaps later focusing on regularly exporting our data to be stored for preservation purposes in another format, perhaps in OCFL.

It’s worth noting that the data-mapper/repository pattern itself, along the lines valkyrie uses, is favored by some people for reasons other than persistence-swappability. In the Rails and ruby web community at large, there is a contingent that thinks the data-mapper/repository pattern is better than what Rails gives you, and gives you a better architecture for maintainable code. Many in this contingent are big on hanami and the dry-rb suite.  (I have never been fully persuaded by this contingent).

And to be sure, in building out our approach over the last 4 months, I sometimes ran right into the architectural issues with Rails “model-based” architecture and some of what it encourages like dreaded callbacks.  But often these were hypothetical problems, “What if someone wanted to do X,” rather than something I actually needed/wanted to do now. Take a breath, return to agility and “build our app”.

And a Rails/ActiveRecord-focused approach has huge advantages too. ActiveRecord associations and eager-loading support are very mature and powerful tools that, when exposed to the app as an API, give you time-tested ways to build your app flexibly and performantly (at least compared to the architectures our community is used to, where avoiding n+1 queries still sometimes seems like an unsolved problem!).  You have a whole Rails ecosystem to rely on, which kithe-dependent apps can just use, making whatever choices they want (use reform or not?) as with most any Rails app, without having to work out as many novel approaches or APIs. (To be sure, kithe still provides some constraints and choices and novelty — it’s a question of degree).

Trying to build up an alternative based on data-mapper/repository, whether in hanami or valkyrie, I think you have a lot of work to do to be competitive with Rails’ mature solutions, sometimes reproducing features already in ActiveRecord or its ecosystem. And it’s not just work that’s “time implementing”, it’s work figuring out the right APIs and patterns. Hanami, for instance, is probably still not as mature as Rails, or as easy to use for a newcomer.

By not having to spend time re-inventing things that Rails already has solutions for, I could spend time on the actual (digital collections) domain-specific components that I wasn’t happy with existing solutions for. Like spending time on creating shareable file handling and derivatives solutions that seem to me to be well-polished, and able to be used for flexible use-cases without feeling like you’re fighting the system or being surprised by it. Components that hopefully can be re-used by other apps too.

I think schneem’s thoughts on “polish” are crucial reading when thinking about the true costs of shared abstractions in our community.  There is a cost to additional abstractions: in initial implementation, ongoing maintenance, developer on-boarding, and just figuring out the right architectures and APIs to provide that polish. Sometimes these costs are worthwhile in delivered benefits, of course.

I’d consider our kithe-based approach to be somewhere in between U Alberta’s approach and Valkyrie, on the dimension of “how closely do we stick to and tie our line to ‘standard’ Rails”.

We are building our own app, not trying to use a shared app or “solution bundle” like Hyrax. I would suggest we share that aspect with both the U Alberta approach and the several institutions building valkyrie-not-hyrax apps. But if you’ve had good experiences with the over-time maintenance costs of Hyrax, and have a use case/context where Hyrax has worked well for you, then that’s great, and there’s never anything wrong with doing what has worked for you.

Overall, 4 months in, while some things have taken longer to implement than I expected, and some unexpected design challenges have been encountered — I’m still happy with the approach we are taking.

If you are considering a based-on-valkyrie-no-hyrax approach, I think you might be in a good position to consider a kithe approach too.

How do we evaluate success?

Locally,

We want to have a replacement app launched in about a year.

I think we’re basically on target, although we might not hit it on the nose, I feel confident at this point that we’re going to succeed with a solid app, in around that timeline. (knock on wood).

When we were considering alternate approaches before committing to this one, we of course tried to compare how long this would take to various other approaches. This is very hard to predict, because you are trying to compare multiple hypotheticals, but we had to make some ballpark guesses (others may have other estimates).

Is this more or less time than it would have taken to migrate our sufia app to current hyrax? I think it’s probably taking more time to do it this new way, but I think migrating our sufia app to current hyrax (with all its custom functionality for current features) would not have been easy or quick — and we weren’t sure current hyrax was a place we wanted to end up.

Is it going to take more or less time than it would have taken to write an app on valkyrie, including any work we might contribute to valkyrie for features we needed? It’s always hard to guess these things, but I’d guess in the same ballpark, although I’m optimistic the “kithe” approach can lead to developer time-savings in the long-run.

(Of course, we hope if someone else wants to follow our path, they can re-use what’s now worked out in kithe to go quicker).

We want it to be an app whose long-term maintenance and continued development costs are good

In our sufia-based app, we found it could be difficult and time-consuming to add some of the features we needed. We also spent a lot of time trying to performance-tune to acceptable levels (and we weren’t alone), or figure out and work towards a manageable and cost-efficient cloud deployment architecture.

I am absolutely confident that our “kithe” approach will give us something with a lower TCO (“total cost of ownership”) than we had with sufia.

Will it be a lower TCO than if we were on the present hyrax (ignoring how we’d get there), with the custom features we needed? I think so, and that current hyrax isn’t different enough from the sufia we are used to — but again this is necessarily a guess, and others may disagree. In the end, technical staff just has to make their best predictions based on experience (individual and community).  Hyrax probably will continue to improve under @no-reply’s steady leadership, but I think we have to make our decisions based on what’s there now, and that potential rosy future also requires continued contribution by the community (like us) if it is to come to fruition, which is real time to be included in TCO too.  I’m still feeling good about the “write our own app” approach vs the “solution bundle”.

Will we get a lower TCO than if we had a non-hyrax valkyrie-based app? Even harder to say. Valkyrie has more abstractions and layers that have real ongoing maintenance costs (that someone has to do), but there’s an argument that those layers will lower your TCO over the long-term. I’m not totally persuaded by that argument myself, and when in doubt am inclined to choose the less-new-abstraction path, but it’s hard to predict the future.

One thing worth noting is that the main thing that forced our hand in doing something with our existing sufia-based app was that it was stuck on an old version of Rails that will soon be out of support, and we thought it would have been time-consuming to update, one way or another.  (When Rails 6.0 is released, probably in the next few months, Rails maintenance policy says nothing before 5.2 will be supported.) Encouragingly, both kithe and its attr_json dependency (also by me) are testing green on Rails 6.0 beta releases — and, I was gratified to see, didn’t take any code changes to do so, they just passed.  (Valkyrie 1.x requires Rails 5.1, but a soon-to-be-released 2.0 is planned to work fine up to Rails 6; latest hyrax requires Rails 5.1 as well, but the hyrax team would like to add 5.2 and 6 soon).

We want easier on-boarding of new devs for succession planning

All developers will leave eventually (which is one reason I think if you are doing any local development, a one-developer team is a bad idea — you are guaranteeing that at some point 100% of your dev team will leave at once).

We want it to be easier to on-board new developers. We share U Alberta’s goal that what we could call a “typical Rails developer” should be able to come on and maintain and enhance the app.

Are we there? Well, while our local app is relatively simple rails code (albeit using kithe APIs), the implementation of kithe and attr_json, which a dev may have to delve into, can get a bit funky, and didn’t turn out quite as simple as I would have liked.

But when I get a bit nervous about this, I reassure myself remembering that:

  • a) Our existing sufia-based app is definitely high-barrier for new devs (an experience not unique to us); I think we can beat that.
    • Also worth pointing out that when we last posted a position, we got no qualified applicants with samvera, or even Rails, experience. We did make a great hire though, someone who knew back-end web dev and knew how to learn new tools; it’s that kind of person that we ideally need our codebase to be accessible to, and the sufia-based one was not.
  • b) Recruiting and on-boarding new devs is always a challenge for any small dev shop, especially if your salaries are not seen as competitive.  It’s just part of the risk and challenge you accept when doing local development as a small shop on any platform. (Whether that is the right choice is out of scope for this post!)

I think our code is going to end up more accessible to actually-existing newly onboarded devs than a customized hyrax-based solution would be. More accessible than Valkyrie? I do think so myself, since I think we have fewer layers of “specialty” stuff than valkyrie, but it’s certainly hard to be sure, and everyone must judge for themselves.

I do think any competent Rails consultancy (without previous LAM/samvera expertise) could be hired to deal with our kithe-based app no problem; I can’t really say if that would be true of a Valkyrie-based app (it might be); I do not personally have confidence it would be true of a hyrax-based app at this point, but others may have other opinions (or experience?).

Evaluating success with the community?

Ideally, we’d of course love it if some other institutions eventually developed with the kithe toolkit, with the potential for sharing future maintenance of it.

Even if that doesn’t happen, I don’t think we’re in a terrible place. It’s worth noting that there has been some non-LAM-community Rails dev interest in attr_json, and occasional PRs; I wouldn’t say it’s in a confidently sustainable place if I left, but I also think it’s code someone else could step into and figure out. It’s just not that many lines of code, it’s well-tested and well-documented, and I’ve tried to be careful with its design — but take a look and decide for yourself! I cannot emphasize enough my belief that if you are doing local development at all (and I think any samvera-based app has always been such), you should have local technical experts doing evaluation before committing to a platform — hyrax, valkyrie, kithe, entirely homegrown, whatever.

Even if no-one else develops with kithe itself, we’d consider it a success if some of the ideas from kithe influence the larger samvera and digital collections/repository communities. You are welcome to copy-paste-modify code that looks useful (It’s MIT licensed, have at it!). And even just take API ideas or architectural concepts from our efforts, if they seem useful.

We do take seriously participating in and giving back to the larger community, and think trying a different approach, so we and others can see how it goes, is part of that. Along with taking the extra time to do it in public and write things up, like this. And we also want to maintain our mutually-beneficial ties to samvera and LAM technologist communities; even if we are using different architectures, we still have lots of use-cases and opportunities for sharing both knowledge and code in common.

Take a look?

If you are considering development of a non-Hyrax valkyrie-based app, and have the development team to support that — I believe you have the development team to support a kithe-based approach too.

I would be quite happy if anyone took a look, and happy to hear feedback and have conversations, regardless of whether you end up using the actual kithe code or not. Kithe is not 1.0, but there’s definitely enough there to check it out and get a sense of what developing with it might be like, and whether it seems technically sound to you. And I’ve taken some time to write some good “guide” overview docs, both for potential “onboarding” of future devs here, and to share with you all.

We have a staging server for our in-development app based on kithe; if you’d like a guest login so you can check it out, just ask and I can share one with you.

Our local app should also probably be pretty easy for you to get installed (with dependencies) from a git checkout, so you can just run it and see how it goes. See: https://github.com/sciencehistory/scihist_digicoll/

Hope to hear from you!

Browser dominance, standards setting, and WHATWG vs W3C

Reda Lemeden writes a warning note about what Chrome’s dominance means for the “Web as an open platform”, in “We Need Chrome No More.”

Lemeden doesn’t mention WHATWG, but in retrospect, I think the practical shifting of web-standards-setting from an at least possibly neutral standards body representing multiple interests (W3C) to a body wholly controlled by browser vendors (WHATWG)… may have been good for speed of “innovation” for a time, but was in the long-term not good for the “Web as an open platform”, in Lemeden’s phrase.  Lemeden writes:

Making matters worse, the blame often lands on other vendors for “holding back the Web”. The Web is Google’s turf as it stands now; you either do as they do, or you are called out for being a laggard.

Indeed, I think it’s the structural politics of WHATWG that make that hard to counter. WHATWG was almost founded on the principles of “not being a laggard” and “doing what we do”. When there were several browser-vendors with roughly equal market power they could counter-balance each other, but when there’s an elephant in the room…

That is, the W3C folks who were accused of “holding back the web” while trying to keep standards-setting from going to the “faster” WHATWG… were perhaps correct all along.

People can disagree, but 10-15 years on, I think we’re overdue for a larger discussion and retrospective evaluation of the consequences of the WHATWG “coup”. I haven’t seen much discussion of this yet.

On code-craft, and writing code for other programmers to use

The New Yorker this week has a profile of the Google programmer pair Jeff Dean and Sanjay Ghemawat (if the annoying phrase “superstar programmer” applies to anyone, it’s probably these guys, who among other things conceived and wrote the original Google MapReduce implementation) that includes some comments I find unusually insightful about some aspects of the craft of writing code. I was going to say “for a popular press piece”, but really even programmers talking to each other don’t talk about this sort of thing much. I recommend the article, but was especially struck by this passage:

At M.I.T., [Sanjay’s] graduate adviser was Barbara Liskov, an influential computer scientist who studied, among other things, the management of complex code bases. In her view, the best code is like a good piece of writing. It needs a carefully realized structure; every word should do work. Programming this way requires empathy with readers. It also means seeing code not just as a means to an end but as an artifact in itself. “The thing I think he is best at is designing systems,” Craig Silverstein said. “If you’re just looking at a file of code Sanjay wrote, it’s beautiful in the way that a well-proportioned sculpture is beautiful.”

…“Some people,” Silverstein said, “their code’s too loose. One screen of code has very little information on it. You’re always scrolling back and forth to figure out what’s going on.” Others write code that’s too dense: “You look at it, you’re, like, ‘Ugh. I’m not looking forward to reading this.’ Sanjay has somehow split the middle. You look at his code and you’re, like, ‘O.K., I can figure this out,’ and, still, you get a lot on a single page.” Silverstein continued, “Whenever I want to add new functionality to Sanjay’s code, it seems like the hooks are already there. I feel like Salieri. I understand the greatness. I don’t understand how it’s done.”

I aspire to write code like this; it’s a large part of what motivates and challenges me.

I think it’s something that (at least for most of us; I don’t know about Dean and Ghemawat) can only be approached and achieved with practice, meaning both time and intention. But I think many of the environments that most working programmers work in are not conducive to this practice, and in some cases are actively hostile to it. I’m not sure what to think or do about that.

It is most important when designing code for re-use, when designing libraries to be used in many contexts and by many people. If you are only writing code for a particular business, “seeing code not just as a means to an end but as an artifact in itself” may not be what’s called for; it really is a means to the end of the business’s purposes, and spending too much time on “the artifact itself” has a lot of overlap with what is often derisively called “bike-shedding”. But when creating an artifact that is intended to be used by lots of other programmers, in lots of other contexts, to build things that meet their own business purposes (say, a Rails… or a samvera), then “empathy with readers” (which is very well-said, and closely related to the next point) and creating an artifact where “it seems like the hooks are already there” are pretty much indispensable to making something that actually increases the efficiency and success of the developers using it.
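
To make the “hooks are already there” idea concrete, here is a minimal, entirely hypothetical Ruby sketch (the class and method names are made up for illustration; they aren’t from kithe or from the profile): a small library class that exposes an intentional override point, so a downstream developer can customize one step without patching or re-implementing the rest.

# Hypothetical library code: a derivative generator that leaves an
# intentional hook (`transform`) for downstream customization.
class DerivativeGenerator
  def initialize(source_path)
    @source_path = source_path
  end

  # The one public entry point callers need.
  def create
    store(transform(@source_path))
  end

  private

  # Intentional hook: subclasses override just this step.
  def transform(path)
    "#{path}: resized to default dimensions" # stand-in for real image work
  end

  def store(result)
    result # stand-in for writing to storage
  end
end

# Downstream app code: because the hook is already there, customization
# is a small subclass rather than a fork or a monkey-patch.
class GreyscaleDerivativeGenerator < DerivativeGenerator
  def transform(path)
    "#{super}, converted to greyscale"
  end
end

GreyscaleDerivativeGenerator.new("photo.tiff").create
# => "photo.tiff: resized to default dimensions, converted to greyscale"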

It’s also not easy even if it is your intention, but without the intention it’s highly unlikely to happen by accident. In my experience TDD can (in some contexts) actually be helpful in accomplishing it, but only if you have the intention, if you start from developer use-cases, and if you do the “refactor” step of “red-green-refactor”. Just “getting the tests to pass” isn’t gonna do it. (And from the profile, I suspect Dean and Ghemawat may not write tests at all; TDD is neither necessary nor sufficient.) That empathy part probably is necessary: understanding what other programmers are going to want to do with your code, how they are going to come to it, and putting yourself in their place, so you can write code that anticipates their needs.
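
As a tiny illustration of what “starting from developer use-cases” can look like (a made-up example, not anything from the profile or from our code): the test is written first, phrased as the API you’d want to call, and the refactor step is where you reshape the internals while the passing test keeps you honest.

require "minitest/autorun"

# Red: the test comes first, written as the call a developer would
# want to make, before the class exists at all.
class SlugTest < Minitest::Test
  def test_turns_a_title_into_a_url_slug
    assert_equal "hello-world", Slug.new("Hello, World!").to_s
  end
end

# Green: the simplest implementation that passes.
# Refactor: the step that's easy to skip -- rename, extract, and
# reshape the internals while the test stays green.
class Slug
  def initialize(title)
    @title = title
  end

  def to_s
    @title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
  end
end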

I’m not sure what to do with any of this, but I was struck by the well-written description of what motivates me in one aspect of my programming work.

“Against software development”

Michael Arntzenius writes:

Beautiful code gets rewritten; ugly code survives.

Just so, generic code is replaced by its concrete instances, which are faster and (at first) easier to comprehend.

Just so, extensible code gets extended and shimmed and customized until under its own sheer weight it collapses, then replaced by a monolith that Just Works.

Just so, simple code grows, feature by creeping feature, layer by backward-compatible layer, until it is complicated.

So perishes the good, the beautiful, and the true.

In this world of local-optimum-seeking markets, aesthetics alone keep us from the hell of the Programmer-Archaeologist.

Code is limited primarily by our ability to manage complexity. Thus,

Software grows until it exceeds our capacity to understand it.

HackerNews discussion. 

Ruby Magic helps sponsor Rubyland News

I have been running the Rubyland.news aggregator for two years now, just as a hobby spare-time thing, because I wanted a ruby blog and news aggregator, wasn’t happy with what was out there then, and thought it would be good for the community to have one.

I am not planning or trying to make money from it, but it does have some modest monthly infrastructure fees that I like getting covered. So I’m happy to report that Ruby Magic has agreed to sponsor Rubyland.news for a modest $20/month for six months.

Ruby Magic is an email list you can sign up for to get occasional emails about ruby. They also have an RSS feed, so I’ve been able to include them on Rubyland.news for some time. I find their articles to often be useful introductions or refreshers on particular topics in ruby language fundamentals. (It tends not to be about Rails; I know some people appreciate non-Rails-focused sources of ruby info.) Personally, I’ve been using ruby for years, and the way I got as comfortable with it as I am is by always asking “wait, how does that work then?” about things I run into, always being curious about what’s going on, what the alternatives are, and what tools are available, starting with the ruby language itself and its stdlib.

These days blogging, on a platform with an RSS feed too, seems to have become a somewhat rarer thing, so I’m also grateful that Ruby Magic articles are available via RSS, so I can include them in rubyland.news. And of course for the modest sponsorship of Rubyland.news, helping to pay infrastructure costs and keep the lights on. As always, I value full transparency in any sponsorship of rubyland.news; I don’t intend it to affect any editorial policies (I was including the Ruby Magic feed already), and I will continue to be fully transparent about any sponsorship arrangements and amounts, so you can judge for yourself (a modest $20/month from Ruby Magic; no commitment beyond a listing on the About page, and this particular post you are reading now, which is effectively a sponsored post).

I also just realized I am two years into Rubyland.news. I don’t keep usage analytics (I was too lazy to set that up, and it’s not entirely clear how to do it when people might be consuming the site as an RSS feed itself), although it’s got 156 followers on its twitter feed (all aggregated content is also syndicated to twitter, which I thought was a neat feature). I’m honestly not sure how useful it is to anyone other than me, or what changes people might want; feedback is welcome!