yes, product owner and technical lead need to be different people

I used to disagree with this conventional wisdom, and think I could be both. I now realize in retrospect that’s because I was in an environment where I basically had no choice but to be both.  Or at least where I didn’t trust anyone who might step in to be product owner to actually take responsibility for it and do it right.

Having had the experience of being de facto the technical lead (only engineer or most experienced engineer of a very very small team) on a project/program with a very responsible and effective de facto product owner, I see I was totally wrong.

The typical argument in favor of these roles being separated is that the technical lead/engineer is too close to the code to really understand the needs of stakeholders (customers, the organization, politics within the organization, whatever) and fill a product owner role. I think that argument definitely has a lot of merit, but if a technical lead also has a lot of domain knowledge, and spends a lot of time with stakeholders (and/or hearing from UX people), and has a lot of skill, couldn’t they maybe do both things effectively in the same person?  And might I be such a person who can pull it off?  It’s a challenge, but maybe.

But. The real reason I’ve seen that this is no good, is that there’s no way for me to stay sane doing that.  There are just too many things to worry about. Instead: One person (“product owner”) to decide what is to be done (in large part consisting of prioritization, and deciding when a feature is “good enough” and when it is not), and another separate person to decide how to do it (the technical lead).

Then I, the technical lead, can spend my time worrying about (or we could say ‘planning’ if I wasn’t such a worrier!) the technical decisions — whether we’re really doing it right, whether this is the most efficient or lowest TCO way to accomplish something, technical debt, choice of dependencies, how to set up our current work to provide the right platform/abstractions for future work, etc.   Without having to worry about if we’ve chosen to do the right things, or being responsible for those decisions, or being called to account for making them “wrong” by internal stakeholders.

I just can’t do both and stay sane. And I suspect this isn’t just me.  I realize now that in the former position, where I was doing both and thought I was doing okay at it, I was not staying sane, and my increasing feelings of loss of control over things affected team and organizational dynamics negatively. It wasn’t healthy for anyone.

To be sure, the product owner and technical lead should be in close communication, and feeding back on each other. I still don’t believe it should be a one-way power dynamic, where the product owner simply sets down the plan and the technical team implements it. The product owner’s decisions should be influenced by feedback from the technical team/lead, on feasibility, estimated costs/time, and even just their own ideas about how to meet stakeholder needs, among other things.  And the product owner should ideally have some high-level conceptual understanding of how the engineering works. But they’ve got to be different people in different roles, so they can each focus on their area of responsibility.


improving citation export in a sufia/hyrax app

Our app is based on Sufia 7, but I believe the relevant parts are mostly true for Hyrax as well; if I know they aren’t, I’ll try to make a note of it.

The out of the box Sufia app offers three citation export links on an individual item (ie Work) page, for: Endnote, Zotero, and Mendeley.

The Zotero and Mendeley links just take you to a page that says:

Exporting to Zotero[Mendeley] is supported via embedded metadata. If Zotero[Mendeley] does not automatically pick up metadata for deposited files, please report the issue via the <%= link_to ‘Contact Form’, sufia.contact_form_index_path %>.

I believe the automatic metadata pickup is supposed to be via COinS.  Putting aside that that is a bit weird UX there, Zotero’s “Save to Zotero” button did do something with “Embedded Metadata”, but didn’t really pick up all the metadata we’d want. I think this is because we hadn’t properly configured all our local custom metadata fields to work with COinS, which I believe in Sufia is done via Rails i18n, and in hyrax by a different mechanism.

I didn’t get to the bottom of this, because either way, COinS isn’t really granular/specific enough to get all the metadata we have as good as it can be for a reference management application: there’s no way to say type “Manuscript”, or provide archival arrangement/location (box/folder).  I’m not sure if there’s a way to send abstract or subject/keywords (which users appreciate included in their export to reference manager, even though they aren’t part of a citation), and the link I used to use to check what fields are available in standard OpenURL metadata (on which COinS is based) is giving me 404 errors from OCLC today.  Oh, and did I mention that COinS (if not OpenURL itself) is kind of an abandonware standard? The site that documents the standard is currently only available in the Internet Archive Wayback Machine.

The EndNote export was also not including all of our possible metadata as well as it could. I’m not sure where I’d customize this for our local fields, perhaps I need to override the Sufia::SolrDocument::Export class; not really sure what’s going on there. But looking at that class suggests that the format it’s calling “EndNote” is this one, which I think is now more commonly called “Endnote Tagged Format” (although I can’t find a reference for that), as distinct from Endnote XML, which I’m also having trouble finding documentation for.

Rather than trying to get each of these existing logic paths working, we decided to initially replace with…

Replace with RIS for everyone

RIS is the closest thing to a “lingua franca” among reference management software. While it is also an abandoned standard (wikipedia links to this capture), pretty much every reference management software can handle it, and in fairly compatible/standard ways — I think mainly due to every new reference management software trying to be compatible with the current market leader at the point it was introduced, all the way back to the no-longer-existing software that originated RIS.

For the same reasons, it seems to be relatively close to the internal data models of most reference management software.  It’s annoying in some ways, including (did we mention) that it’s an unmaintained abandonware standard, there are (undocumented) minor differences between how different software handles it on import, and the same ‘tag’ in RIS can be interpreted differently depending on the ‘type’ of the reference. (Oh, and there’s a limited number of ‘types’, not suitable to the full diversity of the modern digital archive, or even for all types found in modern reference management software!).

But it’s way more expressive than COinS, and close to as expressive as Endnote Tagged Format (probably just as good for the actual metadata we have), and there’s not much better.

And it’s super convenient to be able to write one export which will work with all reference management software, rather than spend extra time (we can’t necessarily afford) to do a custom export for every possible software (and over the past decade the “popular” software has changed several times, and may vary in different disciplines — but they all do RIS).

When I asked in the Zotero forums (the Zotero people are great and tend to understand the ecosystem way beyond just their software, as domain experts in a way many of us don’t) if there was a better format to use for a ‘generic’ import to multiple reference management systems, or even a better format to use just for Zotero, @adamsmith replied:

There is indeed no useful bibliographic exchange format. It’s a fairly ridiculous situation. You’ll get the best import into Zotero using Zotero RDF, but a) that isn’t well documented and b) it’ll probably be replaced with a JSON-LD/ based schema in the not-too-distant future, so I wouldn’t invest heavily in implementing it. Endnote XML is marginally better documented and, by virtue of being XML, more robust, so that might be worth it. BibLaTeX is very precise and exceedingly well documented, but I don’t think many tools other than Zotero do very well importing it (and I don’t know _how_ well Zotero does — most people use this the other way from Zotero to BibLaTeX).

(EndNote XML didn’t look to me significantly more powerful or convenient than RIS for the sorts of data we have, although it’s more straightforward in some ways. Not sure if it has as universal adoption).

In general, if you download an RIS file, and double-click on it, it will open in your installed reference software of choice (or, as in Firefox, depending on your browser and browser preferences, open immediately in your reference manager software without having to find the file and double-click on it). If you have the Zotero Chrome extension installed, it will (at first ask to) “intercept” an RIS download (with proper MIME/IANA content-type header) and immediately send it to Zotero, even though Chrome doesn’t ordinarily do that.

So, rather than figure out how the current Sufia citation export stuff worked to make it work better for us and/or try to improve or expand it, we decided to try replacing the built-in stuff with our own RIS implementation.

Our implementation

I basically just created a ruby class that can take one of our Sufia ‘work’ models, and translate it to RIS — not really all that hard.  Thinking of working towards something shareable, I did split my implementation into a base class that sets up some tools for defining mappings, and a concrete sub-class that defines the mappings.
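To give a rough idea of that base-class/sub-class split, here is a minimal sketch. It is not the code from our app: the class names, the `ris_tag` mapping helper and the `Work` struct are all made up for illustration, and the RIS output is simplified.

```ruby
# Hypothetical base class: a tiny DSL for declaring RIS tag mappings.
class RisSerializerBase
  RIS_LINE_END = "\r\n"

  # Each subclass keeps its own list of [tag, block] pairs.
  def self.mappings
    @mappings ||= []
  end

  # Declare that RIS `tag` is filled from the block's return value
  # (a string, an array of strings, or nil to skip the tag).
  def self.ris_tag(tag, &block)
    mappings << [tag, block]
  end

  def initialize(work)
    @work = work
  end

  # Fallback type when nothing better is known (MANSCPT, as described in this post).
  def ris_type
    "MANSCPT"
  end

  def to_ris
    lines = ["TY  - #{ris_type}"]
    self.class.mappings.each do |tag, block|
      Array(block.call(@work)).compact.each { |value| lines << "#{tag}  - #{value}" }
    end
    lines << "ER  - "
    lines.join(RIS_LINE_END) + RIS_LINE_END
  end
end

# Hypothetical concrete subclass: only the mappings live here.
class WorkRisSerializer < RisSerializerBase
  ris_tag("TI") { |w| w.title }
  ris_tag("AU") { |w| w.creators }
  ris_tag("AV") { |w| w.archival_location }
  ris_tag("VL") { |w| w.archival_location }
  ris_tag("M2") { |_w| "Courtesy of Science History Institute" }
end

# Stand-in for a Sufia work model, just so the sketch runs on its own.
Work = Struct.new(:title, :creators, :archival_location, keyword_init: true)

example = Work.new(title: "Lab notebook", creators: ["Doe, Jane"],
                   archival_location: "Box 2, Folder 5")
puts WorkRisSerializer.new(example).to_ris
```

The point of the split is that a different project would presumably only need to write its own small mapping subclass.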

I originally intended to allow the mappings to look up attributes based on RDF predicates, which might theoretically make it possible to share mappings with more likely chance of working across projects. But I see now I never actually implemented that feature, oops. (And it’s unclear how/if this kind of rdf-predicate-to-model-attribute lookup would work in a valkyrie-based app like planned hyrax 3.0, or if it would be possible to make it work in a standard way).

Then just register the RIS mime type; hook into a CurationConcerns method to have the work show method deliver the RIS using our serializer; generate an on-page link to that action in our already customized view; and that’s pretty much it.

Some interesting parts:

  • In our mappings, we put archival location information in both “AV” and “VL” tags, because in my experimentation different software seemed to at least sometimes use each.
  • In RIS “M2” field (“Miscellaneous 2” says RIS), which Zotero imports as “Extra” (and Endnote I think something similar), we put our recommended “Courtesy of Science History Institute” statement, as well as any rights information we have.
  • When we can’t determine a great RIS “type” for the citation, we default to “MANSCPT” (Manuscript), some advice I found suggested this tends to be the one that will most reliably get archival-relevant fields and output citation formats in reference management software, and much/most of our content is unpublished in a mass edition for general distribution (whether technically a ‘manuscript’ or not).
  • We create a filename for the downloaded RIS file that includes the first three words of the title as well as the internal ID.  Users validate they appreciate this, so they can figure out what the file is on their disk if needed. Refactored some of the code I was previously using for derivative download names to do similar, to be reusable in this context.
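The filename idea from the last bullet could look something like the following sketch. The method name, the lowercasing and the underscore separator are my assumptions for illustration, not necessarily what our app does.

```ruby
# Hypothetical helper: first few words of the title, plus the internal ID.
def ris_download_filename(title, id, word_count: 3)
  # Keep only letter/digit runs so the result is safe as a filename.
  words = title.to_s.downcase.scan(/[[:alnum:]]+/).first(word_count)
  [*words, id].join("_") + ".ris"
end

puts ris_download_filename("Letter from Marie Curie to a colleague", "abc123")
# prints "letter_from_marie_abc123.ris"
```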

You can take a look at the PR with initial implementation of this feature in our app if you like.  Reviewing it now, looks like that PR accidentally ended up with a new file that is unrelated, and really from a different feature, at `app/views/application/_query_constraint_as_form.html.erb`, oops sorry.  I see now too there is only a limited “smoke test” spec for “converts without raising any exceptions”, so it goes.

How did it turn out? Future improvement?

As I write this, we just now deployed to production, but we earlier did some user testing with several users in a feature demo.  In general, we found out that users have pretty low expectations when it comes to citation export, they are used to it not working perfectly, and most users asked found our system to work at least as well as their expectations of an automated reference export, and often better.  I feel good about the RIS direction as an efficient use of developer time to get pretty decent citation export feature.

There are a couple of outstanding issues:

Child works

We have some things that are ‘works’ in sufia, but are really excerpts from the “work” that should be cited in the reference.  At the moment sometimes we have that ‘parent’ work stored in our Sufia repo, we sometimes don’t.  Our RIS export feature never takes it into account though, it always exports the citation as if it’s a standalone thing based on the title of the ‘work’ in sufia, even if there’s really a parent ‘container’ work that the reference should be based on.  This is a bit hard to get right for both metadata reasons (we might not have sufficient machine-readable metadata in all cases to determine correct citation), and technical reasons (sufia doesn’t make it super easy to get access to parent information in an efficient/performant way).

Zotero toolbar button

If you actually click on the “Export citation” button, it generally gets into Zotero fine. (On Chrome, need Zotero plugin installed; on Firefox with plugin or need to tell Firefox the first time to open .ris with Zotero). But if you have the Zotero browser plugin installed, you have a “Save to Zotero” button in toolbar.  Using this one imports into Zotero as a “web page” (rather than correct citation type for the reference; our users generally wanted reference types based on the original item, not ‘web page’), and with stunted/limited metadata.  (In our case Zotero is picking up “Embedded Metadata” from somewhere, not sure in what format, it was not intentional by me; but if it were not the metadata would be no better).

One of our test users tried this, and was disappointed.

Zotero supports a couple generic options for getting the “Save to Zotero” button to pick up embedded metadata.  COinS, as mentioned, isn’t really expressive enough for our metadata. I’m not sure what they mean about “META tags”, but possibly that only applies to RDF? (And I would not be thrilled about figuring out the right RDF vocab for Zotero to pick up, and doing the translation). That seems to leave unAPI, from which we could actually expose/re-use our now-existing RIS, great. UnAPI is another kind of abandoned standard, and based on kind of a mis-use of HTML too with possible accessibility concerns. ☹️  It wouldn’t be that hard to implement, but even easier would be if Zotero would just pick up HTML <link rel="alternate" type="type"> tags for Zotero-recognized types. Zotero doesn’t do that at present, but when I asked, there seems to be some support for the idea of it doing so, with some details (as well as implementation!) to be worked out. (Also, can I say again I love how responsive Zotero devs are on the Zotero forums?).

Of course, if we had no “Export citation” button deployed, the “Save to Zotero” button provided by Zotero plugin would still be there, and still behave unsatisfactorily.

But Deployed

Based on consultation with potential users, we didn’t consider either of these problems severe enough to delay release of our RIS export button, although we’ve made a note of them as possible future improvement to prioritize.  You can see the RIS export feature in action in our current production system, on any individual item page, such as this one, look for the “Export citation” button.


attachment filename downloads in non-ascii encodings, ruby, s3

You tell the browser to force a download, and pick a filename for the browser to ‘save as’ with a Content-Disposition header that looks something like this:

Content-Disposition: attachment; filename="filename.tiff"

Depending on the browser, it might open up a ‘Save As’ dialog with that being the default, or might just go ahead and save to your filesystem with that name (Chrome, I think).

If you’re having the user download from S3, you can deliver an S3 pre-signed URL that specifies this header — it can be a different filename than the actual S3 key, and even different for different users, for each pre-signed URL generated.

What if the filename you want is not strictly ascii? You might just stick it in there in UTF-8, and it might work just fine with modern browsers — but I was doing it through the S3 content-disposition download, and it was resulting in S3 delivering an XML error message instead of the file, with the message “Header value cannot be represented using ISO-8859-1.response-content-disposition”.

Indeed, my filename in this case happened to have a Φ (greek phi) in it, and indeed this does not seem to exist as a codepoint in ISO-8859-1. (How do I know? In ruby, try `"Φ".encode("ISO-8859-1")`, which raises an error.) ISO-8859-1 is perhaps the (standard? de facto?) default for HTTP headers, as well as what S3 expects. If it was unicode that could be trans-coded to ISO-8859-1, would S3 have done that for me? Not sure.
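That check can be wrapped in a tiny helper if you want to test filenames up front. This is just a sketch, and `representable_in_latin1?` is a name I made up:

```ruby
# Returns true if every character of the string exists in ISO-8859-1.
def representable_in_latin1?(string)
  string.encode("ISO-8859-1")
  true
rescue Encoding::UndefinedConversionError
  false
end

puts representable_in_latin1?("résumé.tiff") # prints "true": é exists in ISO-8859-1
puts representable_in_latin1?("Φ.tiff")      # prints "false": Φ does not
```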

But what’s the right way to do this?  Googling/Stack-Overflowing around, I got different answers including “There’s no way to do this, HTTP headers have to be ascii (and/or ISO-8859-1)”, “Some modern browsers will be fine if you just deliver UTF-8 and change nothing else” [maybe so, but S3 was not], and a newer form that looks like filename*=UTF-8''#{uri-encoded utf8} [no double quotes allowed, even though they ordinarily are in a content-disposition filename], which however will break older browsers (maybe just leading to them ignoring the filename rather than actually breaking hard?).

The golden answer appears to be in this stackoverflow answer: you can provide a content-disposition header with both a filename=$ascii_filename (where $ascii_filename is ascii or maybe can be ISO-8859-1?), followed by a filename*=UTF-8'' sub-header. And modern browsers will use the UTF-8 one, and older browsers will use the ascii one. At this point, are any of these “older browsers” still relevant? Don’t know, but why not do it right.

Here’s how I do it in ruby, taking input and preparing a) a version that is straight ascii, replacing any non-ascii characters with _, and b) a version that is UTF-8, URI-encoded.

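require "erb"

# a) plain-ASCII fallback: any character outside US-ASCII becomes "_"
ascii_filename = filename.encode("US-ASCII", undef: :replace, replace: "_")
# b) UTF-8, percent-encoded (URI.encode was removed in Ruby 3.0, so use ERB::Util.url_encode)
utf8_uri_encoded_filename = ERB::Util.url_encode(filename)

something["Content-Disposition"] = "attachment; filename=\"#{ascii_filename}\"; filename*=UTF-8''#{utf8_uri_encoded_filename}"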

Seems to work. S3 doesn’t complain. I admit I haven’t actually tested this on an “older browser” (not sure how old one has to go, IE8?), but it does the right thing (include the “Φ” in filename) on every modern browser I tested on MacOS, Windows (including IE10 on Windows 7), and Linux.

One year of the aggregator

It’s been a year since I launched my sort of modern take on a “planet” style aggregator of ruby news and blog RSS/atom feeds.

Is there still a place for an RSS feed aggregator in a social media world? I think I like it, and find it a fun hobby/side project regardless. And I’m a librarian by training and trade, and just feel an inner urge to collect, aggregate, and distribute information, heh. But do other people find it useful? Not sure!  You can (you may or may not have known) follow on twitter instead, and it’s currently got 86 followers, that’s probably a good sign. I don’t currently track analytics on visits to the http page. It’s also possible to follow through its own aggregated RSS feed, which would be additionally hard to track.

Do you use it or like it? I’d love for you to let me know.

Thoughts on a year of developing/maintaining

I haven’t actually done too much maintenance, it kind of just keeps on chugging. Which is great.  I had originally planned to add a bunch of features, mainly including an online form to submit suggested feeds to include, and an online admin interface for me to approve and otherwise manage feeds. Never got to it, haven’t really needed it — it would take a lot of work over the no-login-no-admin-screen thing that’s there now, and adding feeds with a rake task has worked out fine. heroku run rake feeds:add[http://some/feed.rss], no problem.  So just keep feeling free to email me if you have a suggestion please. So far, I don’t get too many such suggestions, but I myself keep an eye on /r/reddit and add blogs when I see an interesting post from one of them there. I haven’t yet removed any feeds, but maybe I should; inactivity doesn’t matter too much, but feeds sometimes drift to no longer be so much about ruby.

If I was going to do anything at this point, it’d probably be trying to abstract the code a bit so I can use it for other aggregators, with their own names and CSS etc.

It’s kind of fun to have a very simple Rails app for a change. I’m not regretting using Rails here, I know Rails, and it works fine here (no performance problems, I’m just caching everything aggressively with Rails fragment caching, I don’t even bother with a CDN. Unless I set up cloudflare and forgot? I forget. The site only has like 4 pages!). I can do things like my first upgrade of an app to Rails 5.1 in a very simple but real testbed. (It was surprisingly not quite as trivial as I thought even to upgrade this very simple app from rails 5.0 to 5.1. Of course, that ended up not being just Rails 5.1, but doing things like switching to heroku’s supported free-for-hobby-dyno SSL endpoint (the hacky way it was doing it before no longer worked with rails 5.1), and other minor deferred maintenance.  Took a couple hours probably.)

It’s fun working with RSS/Atom feeds, I enjoy it. Remember that dream of a “Web 2.0” world that was all about open information sharing through APIs?  We didn’t really get that, we got walled garden social media instead. (More like gated plantations than walled gardens actually, a walled garden sounds kind of nice and peaceful).

But somehow we’ve still got RSS and Atom, and they are still in fairly widespread use. So I get to kind of pretend I’m still in that world. They are in fairly widespread use… but usually as a sort of forgotten unmaintained stepchild.  There are lacks of specification in the specifications that will never be filled in, and we get to deal with it. (Can a ‘title’ be HTML, or must it be plain text?  If it’s HTML, is there any way to know it is? Nope, not really). I run into all kinds of weirdness — can links in a feed be relative urls? If so, they are supposed to be… relative to what? You might think the feed url… but that’s not always how they go. I get to try to work around them all, which is kinda fun. Or sometimes ‘fun’.
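For the relative-URL case, the mechanical part is easy with ruby’s stdlib; the hard part is guessing which base to use. A minimal sketch, where `absolutize` is a made-up helper name and the caller has to decide what the base is (feed URL, site URL, or something else):

```ruby
require "uri"

# Resolve a link found in a feed entry against whatever base URL the
# caller has decided is the right one. Absolute links pass through as-is.
def absolutize(link, base_url)
  URI.join(base_url, link).to_s
rescue URI::Error
  # Unparseable link or base: hand back the original and let the caller cope.
  link
end

feed_url = "https://example.com/blog/feed.xml"
puts absolutize("/posts/1", feed_url)                # prints "https://example.com/posts/1"
puts absolutize("posts/1", feed_url)                 # prints "https://example.com/blog/posts/1"
puts absolutize("https://other.example/x", feed_url) # prints "https://other.example/x"
```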

I wish people would offer more tagged/subsection feeds, those seem pretty rare still. I wish medium would offer feeds that worked at all, they don’t really — medium has feeds for a person, but they include both posts and comments with no ways to distinguish, and are thus pretty useless for an aggregator. (I don’t want your out of context two-line comments in my aggregator).

I also get to do fun HTTP/REST kind of stuff — one of the reasons I chose to use Rails with a database as a backend, so I can keep state, is so I can actually do conditional GET requests of feeds and only fetch if a feed has changed. Around 66% of the feed URLs actually provide etags or last-modified so I can try. Then every once in a while I see a feed which reports “304 Not Modified” but it’s a lie, there is new content, the server is just broken. I usually just ignore em.
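A conditional GET is just an ordinary GET with the validators you saved from last time sent back as request headers. Here’s a rough sketch with plain Net::HTTP; it is not the aggregator’s actual code, and the method names and returned hash shape are invented for illustration:

```ruby
require "net/http"
require "uri"

# Build a GET that asks the server to answer 304 if nothing has changed
# since the ETag / Last-Modified values we stored on the previous fetch.
def conditional_get_request(url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI(url))
  request["If-None-Match"] = etag if etag
  request["If-Modified-Since"] = last_modified if last_modified
  request
end

# Perform the fetch. Returns { changed: false } on a 304, otherwise the
# body plus the new validators to store for next time.
def fetch_feed(url, etag: nil, last_modified: nil)
  uri = URI(url)
  request = conditional_get_request(url, etag: etag, last_modified: last_modified)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  return { changed: false } if response.is_a?(Net::HTTPNotModified)

  { changed: true, body: response.body,
    etag: response["ETag"], last_modified: response["Last-Modified"] }
end
```

Keeping the stored ETag and Last-Modified per feed is exactly the kind of state that needs a database behind it.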

Keeping state also lets me refuse to let a site post-date its entries to keep em at the top of the list, and generally lets me keep the aggregated list in a consistent and non-changing order even if people change the dates on their posts. Oh, dealing with dates is another ‘fun’ thing, people deliver dates in all sorts of formats, with and without timezones, with and without times (just dates), and I’ve got to try to normalize them all somewhat to keep things in a somewhat expected and persistent newest-on-top order. (State is also helpful here, because I can know when I last fetched a feed, and what entries are actually new since then, to help me guess a “real” timestamp for screwy or timestamp-missing entries).
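The basic shape of that guessing can be sketched like this. It is a simplification and not the real code: `effective_timestamp` is a hypothetical name, and real feeds need more format handling than Time.parse gives you.

```ruby
require "time"

# Pick the timestamp to sort an entry by, given the raw date string from
# the feed (possibly nil or garbage) and the time we first fetched the entry.
def effective_timestamp(raw_date, fetched_at:)
  parsed = begin
    raw_date && Time.parse(raw_date)
  rescue ArgumentError
    nil
  end

  # No usable date in the feed: fall back to when we first saw the entry.
  return fetched_at if parsed.nil?

  # Post-dated entry: never sort it later than the moment we fetched it.
  [parsed, fetched_at].min
end

fetched = Time.utc(2017, 6, 15, 12, 0, 0)
puts effective_timestamp("2017-06-01T10:00:00Z", fetched_at: fetched).utc.iso8601
puts effective_timestamp("2099-01-01T00:00:00Z", fetched_at: fetched).utc.iso8601
```

Note that date-only strings are interpreted by Time.parse in the local timezone, which is one more thing a real implementation would have to decide about.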

Anyway, it’s both fun and “fun”.

Modest Sponsorship from Honeybadger

The aggregator is hosted on heroku, cause it’s easy, and even fun, and this is a side project. Its costs are low (one hobby dyno, a free postgres that I might upgrade to the lowest tier paid one at some point). Costs are low, but there are costs.

Fortunately covered by a modest $20/month sponsorship from Honeybadger. I think it’s important to be open about exactly how much they are paying, so you can decide for yourself if it’s likely influencing the aggregator’s editorial decisions or whatever, and just so everything is transparent. I don’t think it is, I do include honeybadger’s Developer Blog in the aggregator, but I think I’d stop if it started looking spammy.

When they first offered the modest sponsorship, I had no experience with honeybadger. But since then I’ve been using it both for the aggregator (which has very few, approaching zero, uncaught exceptions) and a day job project (which has plenty). I’ve liked using it, I definitely recommend checking it out.  Honeybadger definitely keeps developing, adding and refining features, if there’s any justice I think it’ll be as successful in the market as bugsnag.  I think I like it better than bugsnag, although it’s been a while since I used bugsnag now. I think honeybadger pricing tends to be better than bugsnag’s, although it depends on your needs and sizes. They also offer a free “micro” plan for projects that are non-commercial open source, although you gotta email them to ask for it. Check em out!

Performance on a many-membered Sufia/Hyrax show page

We still run Sufia 7.3, haven’t yet upgraded/migrated to hyrax, in our digital repository. (These are digital repository/digital library frameworks, for those who arrived here and are not familiar; you may not find the rest of the very long blog post very interesting. :))

We have a variety of ‘manuscript’/’scanned 2d text’ objects, where each page is a sufia/hyrax “member” of the parent (modeled based on PCDM).  Sufia was originally designed as a self-deposit institutional repository, and I didn’t quite realize this until recently, but sufia/hyrax is now known to still have a variety of problems, especially performance-related ones, with works with many members. But it mostly works out.

The default sufia/hyrax ‘show’ page displays a single list of all members on the show page, with no pagination. This is also where admins often find members to ‘edit’ or do other admin tasks on them.

For our current most-membered work, that’s 473 members, 196 of which are “child works” (each of which is only a single fileset–we use child works for individual “interesting” pages we’d like to describe more fully and have show up in search results independently).  In stock sufia 7.3 on our actual servers, it could take 4-6 seconds to load this page (just to get response from server, not including client-side time).  This is far from optimal (or even ‘acceptable’ in standard Rails-land), but… it works.

While I’m not happy with that performance, it was barely acceptable enough that before getting to worrying about that, our first priority was making the ‘show’ page look better to end-users.  Incorporating a ‘viewer’, launched by clicks on page thumbs, more options in a download menu, bigger images with an image-forward kind of design, etc. As we were mostly just changing sizes and layouts and adding a few more attributes and conditionals, I didn’t think this would affect performance much compared to the stock.

However, just as we were about to reach a deadline for a ‘soft’ mostly-internal release, we realized the show page times on that most-membered work had deteriorated drastically. To 12 seconds and up for a server response, no longer within the bounds of barely acceptable. (This shows why it’s good to have some performance monitoring on your app, like New Relic or Skylight, so you have a chance to notice performance degradation as a result of code changes as soon as it happens. Although we don’t actually have this at present.)

We thus embarked on a week+ of most of our team working together on performance profiling to figure out what was up and, I’m happy to say, fixing it, perhaps even getting slightly better perf than stock sufia in the end. Some of the things we found definitely apply to stock sufia and hyrax too, others may not, we haven’t spent the time to completely compare and contrast, but I’ll try to comment with my advice.

When I see a major perf degradation like this, my experience tells me it’s usually one thing that’s caused it. But that wasn’t really true in this case, we had to find and fix several issues. Here’s what we found, how we found it, and our local fixes:

N+1 Solr Queries

The N+1 query problem is one of the first and most basic performance problems many Rails devs learn about. Or really, many web devs (or those using SQL or similar stores) generally.

It’s when you are showing a parent and its children, and end up doing an individual db fetch for every child, one-per-child. Disastrous performance wise, you need to find a way to do a single db fetch that gets everything you want instead.

So this was our first guess. And indeed we found that stock sufia/hyrax did do n+1 queries to Solr on a ‘show’ page, where n is the number of members/children.

If you were just fetching with ordinary ActiveRecord, the solution to this would be trivial, adding something like .includes(:members) to your ActiveRecord query.  But of course we aren’t, so the solution is a bit more involved, since we have to go through Solr, and actually traverse over at least one ‘join’ object in Solr too, because of how sufia/hyrax stores these things.
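To make the shape of the problem concrete without dragging in Solr, here is a toy illustration. `FakeIndex` is a stand-in I made up that only counts round trips; it has nothing to do with how sufia/hyrax actually talks to Solr.

```ruby
# Stand-in for a search index that counts how many round trips it receives.
class FakeIndex
  attr_reader :query_count

  def initialize(docs)
    @docs = docs # { id => document hash }
    @query_count = 0
  end

  # One round trip per call, however many ids are asked for.
  def find_all(ids)
    @query_count += 1
    @docs.values_at(*ids).compact
  end
end

docs = (1..200).to_h { |i| ["member#{i}", { "id" => "member#{i}" }] }
member_ids = docs.keys

# N+1 style: one query per member.
per_member = FakeIndex.new(docs)
member_ids.each { |id| per_member.find_all([id]) }

# Batched style: one query for all members.
batched = FakeIndex.new(docs)
batched.find_all(member_ids)

puts per_member.query_count # prints 200
puts batched.query_count    # prints 1
```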

Fortunately Princeton University Library already had a local solution of their own, which folks in the always helpful samvera slack channel shared with us, and we implemented locally as well.

I’m not a huge fan of overriding that core member_presenters method, but it works and I can’t think of a better way to solve this.

We went and implemented this without even doing any profiling first, cause it was a low-hanging fruit. And were dismayed to see that while it did improve things measurably, performance was still disastrous.

Solrizer.solr_name turns out to be a performance bottleneck?(!)

I first assumed this was probably still making extra fetches to solr (or even fedora!), that’s my experience/intuition for most likely perf problem. But I couldn’t find any of those.

Okay, now we had to do some actual profiling. I created a test work in my dev instance that had 200 fileset members. That is fewer than our slowest work in production, but I hoped it would be enough to find some bottlenecks. The way I usually start is clumsy and manual: deleting parts of my templates to see which deletions make things faster. I don’t know if this is really a technique I’d recommend, but it’s my habit.

This allowed me to identify that the biggest perf problem at this point was not in fetching the member presenters but in rendering them. But as I deleted parts of the partial for rendering each member, I couldn’t find any part that sped things up drastically; deleting any part just sped things up in proportion to how much I deleted. Weird. Time for profiling with ruby-prof.

I wrapped the profiling just around the portion of the template I had already identified as the problem area. I like the RubyProf::GraphHtmlPrinter report from ruby-prof for this kind of work. (One of these days I’m going to experiment with GraphViz or something compatible, but I haven’t yet.)

Surprisingly, the top culprit for taking up time was — Solrizer.solr_name. (We use Solrizer 3.4.1; I don’t believe as of this date newer versions of solrizer or other dependencies would fix this).

It makes sense Solrizer.solr_name is called a lot. It’s called basically every time you ask for any attribute from your Solr “show” presenter. I also saw it being called when generating an internal app link to a show page for a member, perhaps because that requires attributes. Anything you have set up to delegate …, to: :solr_document probably  also ends up calling Solrizer.solr_name in the SolrDocument.

While I think this would be a problem in even stock Sufia/Hyrax, it explains why it could be more of a problem in our customization — we were displaying more attributes and links, something I didn’t expect would be a performance concern; especially attributes for an already-fetched object oughta be quite cheap. Also explains why every part of my problem area seemed to contribute roughly equally to the perf problem, they were all displaying some attribute or link!

It makes sense to abstract the exact name of the Solr field (which is something like title_ssim), but I wouldn’t expect this call to be much more expensive than a hash lookup (which can usually be done thousands of times in 1ms).  Why is it so much slower? I didn’t get that far. Instead I hackily patched Solrizer.solr_name to cache based on arguments, so all calls after the first with the same arguments would be just a hash lookup.
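To illustrate the general shape of that kind of workaround, here is a minimal sketch of caching a method's result keyed on its arguments. It is not our actual patch and not Solrizer's code: FieldNamer is a made-up stand-in for the slow method, its suffix logic is invented for the example, and ArgumentCache is a hypothetical name. It assumes the wrapped method is pure (same arguments always give the same answer) and that callers don't mutate the returned string.

```ruby
# Hypothetical stand-in for a slow name-resolving method.
# The suffix logic is invented for illustration; it is NOT Solrizer's.
module FieldNamer
  def self.calls
    @calls ||= 0
  end

  def self.solr_name(field, *opts)
    @calls = calls + 1 # count real invocations so caching is observable
    suffix = opts.include?(:facetable) ? "sim" : "tesim"
    "#{field}_#{suffix}"
  end
end

# Wrapper that remembers results per argument list, so every call after
# the first with the same arguments is just a hash lookup.
module ArgumentCache
  def solr_name(*args)
    @solr_name_cache ||= {}
    # fetch with a block only runs the original method on a cache miss
    @solr_name_cache.fetch(args) { @solr_name_cache[args] = super }
  end
end

# Prepend onto the singleton class so the wrapper runs before the
# original module-level method and can reach it via super.
FieldNamer.singleton_class.prepend(ArgumentCache)
```

Using prepend rather than reopening the method keeps the original implementation intact and reachable through super, which makes the patch easy to remove later. The cache is never invalidated, which is only acceptable because field names don't change while the app is running.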

I don’t think this would be a great upstream PR, it’s a workaround. Would be better to figure out why Solrizer.solr_name is so slow, but my initial brief forays there didn’t reveal much, and I had to return to our app.

Because while this did speed up my test case by a few hundred ms, my test case was still significantly slower compared to an older branch of our local app with better performance.

Using QuestioningAuthority gem in ways other than intended

We use the gem commonly referred to as “Questioning Authority”, but actually released as a gem called qa, for most of our controlled vocabularies, including “rights”.  We wanted to expand the display of “rights” information beyond just a label; we wanted a nice graphic and a user-facing shortened label.

It seemed clever some months ago to just add this additional metadata to the licenses.yml file already being used by our qa-controlled vocabulary.  Can you then access it using the existing qa API?  Some reverse-engineering led me to using

It worked great… except after taking care of Solrizer.solr_name, this was the next biggest timesink in our perf profile. Specifically it seemed to be calling slow YAML.load a lot. Was it reloading the YAML file from disk on every call? It was!  And we were displaying licensing info for every member.

I spent some time investigating the qa gem. Was there a way to add caching and PR it upstream? A way that would be usable in an API that would give me what I wanted here? I couldn’t quite come up with anything without pretty major changes.  The QA gem wasn’t really written for this use case, it is focused pretty laser-like on just providing auto-complete to terms, and I’ve found it difficult in the past to use it for anything else. Even in its use case, not caching YAML is a performance mistake, but since it would usually be done only once per request it wouldn’t be disastrous.

I realized, heck, reading from a YAML file is not a complicated thing. I’m going to leave the data in licenses.yml to keep it DRY, but I’m just going to write my own cover logic to read the YAML in a perf-friendly way.
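Roughly, the idea looks like the sketch below: parse the file once, index the terms by id, and serve every later lookup from memory. This is an illustration and not our actual code. LicenseTerms is a hypothetical class name, and the file layout (a top-level terms list whose entries have an id plus whatever extra keys you added) is an assumption about how the YAML is arranged.

```ruby
require "yaml"

# Hypothetical reader for a licenses.yml-style file. Assumes a layout like:
#   terms:
#     - id: http://example.org/some-license
#       term: Some License
#       short_label: Some          # extra keys are assumptions
class LicenseTerms
  def initialize(path)
    @path = path
  end

  # Look up one term's full hash of metadata by id; nil if unknown.
  def find(id)
    terms_by_id[id]
  end

  private

  # Parse the YAML on first use only, then reuse the in-memory index.
  def terms_by_id
    @terms_by_id ||= YAML.safe_load(File.read(@path))
                         .fetch("terms")
                         .to_h { |term| [term["id"], term] }
                         .freeze
  end
end
```

For the caching to help across a whole page (or across requests), the LicenseTerms instance has to be long-lived, for example held in a constant or a class-level memoized accessor. The trade-off is that edits to the YAML file are not picked up until the process restarts.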

That trimmed off a nice additional ~300ms out of 2-3 seconds for my test data, but the code was still significantly slower compared to our earlier branch of local app.

[After I started drafting this post, Tom Johnson filed an issue on QA on the subject.]

Sufia::SufiaHelperBehavior#application_name is also slow

After taking care of that one, the next thing taking up the most time in our perf profile was, surprisingly, Sufia::SufiaHelperBehavior#application_name (I think Hyrax equivalent is here and similar).

We were calling that #application_name helper twice per member… just in a data-confirm attr on a delete link! `Deleting #{file_set} from #{application_name} is permanent. Click OK to delete this from #{application_name}, or Cancel to cancel this operation. ` 

If the original sufia code didn’t have this, or only had application_name once instead of twice, that could explain a perf regression in our local code, if application_name is slow. I’m not sure if it did or not, but this was the biggest bottleneck in our local code at this time either way.

Why is application_name so slow? This is another method I might expect would be fast enough to call thousands of times on a page, in the cost vicinity of a hash lookup. Is I18n.t just slow to begin with, such that you can’t call it 400 times on a page?  I doubt it, but it’s possible. What’s hiding in that super call, that is called on every invocation even if no default is needed?  Not sure.

At this point, several days into our team working on this, I bailed out and said, you know what, we don’t really need to tell them the application name in the delete confirm prompt.

Again, significant speed-up, but still significantly slower than our older faster branch.

Too Many Partials

I was somewhat cheered, several days in, to be into actual generic Rails issues, and not Samvera-stack-specific ones. Because after fixing above, the next most expensive thing identifiable in our perf profile was a Rails ‘lookup_template’ kind of method. (Sorry, I didn’t keep notes or the report on the exact method).

As our HTML for displaying “a member on a show page” got somewhat more complex (with a popup menu for downloads and a popup for admin functions), to keep the code more readable we had extracted parts to other partials. So the main “show a member thumb” type partial was calling out to three other partials. So for 200 members, that meant 600 partial lookups.

Seeing that line in the profile report reminded me, oh yeah, partial lookup is really slow in Rails.  I remembered that from way back, and had sort of assumed they would have fixed it in Rails by now, but nope. In production configuration templates are compiled once, but every render partial: is still a live slow lookup, that I think even needs to check the disk in its partial lookup (touching disk is expensive!).

This would be a great thing to fix in Rails, it inconveniences many people. Perhaps by applying some kind of lookup caching, perhaps similar to what Bootsnap does for $LOAD_PATH and require, but for template lookup paths. Or perhaps by enhancing the template compilation so the exact result of template lookups are compiled in and only need to be done on template compilation.  If either of these were easy to do, someone would probably have done them already (but maybe not).

In any event, the local solution is simple, if a bit painful to code legibility. Remove those extra partials. The main “show a member” partial is invoked with render collection, so only gets looked-up once and is not a problem, but when it calls out to others, it’s one lookup per render every time.  We inlined one of them, and turned two more into helper methods instead of partials. 

At this point, I had my 200-fileset test case performing as well as or better than our older-more-performant-branch, and I was convinced we had it!  But we deployed to staging, and it was still significantly slower than our more-performant-branch for our most-membered work. Doh! What was the difference? Ah right, our most-membered work has 200 child works, my test case didn’t have child works.

Okay, new test case (it was kinda painful to figure out how to create a many-hundred-child-work test case in dev, and very slow with what I ended up with). And back to ruby-prof.

N+1 Solr queries again, for representative_presenter

Right before our internal/soft deadline, we had to at least temporarily bail out of using riiif for tiled image viewer and other derivatives too, for performance reasons.  (We ultimately ended up not using riiif, you can read about that too).

In the meantime, we added a feature switch to our app so we could have the riiif-using code in there, but turn it on and off.  So even though we weren’t really using riiif yet (or perf testing with riiif), there was some code in there preparing for riiif, that ended up being relevant to perf for works with child-works.

For riiif, we need to get a file_id to pass to riiif. And we also wanted the image height and width, so we could use lazysizes-aspectratio so the image would be taking up the proper space on the screen even if waiting for a slow riiif server to deliver it. (lazysizes for lazy image loading, and lazysizes-aspectratio, which can be used even without lazy loading, are highly recommended; they work great.)

We used polymorphism, for a fileset member, the height, width and original_file_id were available directly on the solr object fetched corresponding to the member. But for a child work, it delegated to representative_presenter to get them. And representative_presenter, of course, triggered a solr fetch. Actually, it seemed to trigger three solr fetches, so you could actually call this a 3n+1 query!

If we were fetching from ActiveRecord, the solution to this would possibly be as simple as adding something like .includes("members", "members.representative") . Although you’d have to deal with some polymorphism there in some ways tricky for AR, so maybe that wouldn’t work out. But anyway, we aren’t.

At first I spent some time thinking through if there was a way to bulk-eager-load these representatives for child works similarly to what you might do with ActiveRecord. It was tricky, because the solr data model is tricky, the polymorphism, and solr doesn’t make “joins” quite as straightforward as SQL does.  But then I figured, wait, use Solr like Solr.   In Solr it’s typical to “de-normalize” your data so the data you want is there when you need it.

I implemented code to index a representative_file_id, representative_width, and representative_height directly on a work in Solr. At first it seemed pretty straightforward.  Then we discovered it was missing some edge cases (a work that has as its representative a child work, that has nothing set as its representative?), and that there was an important omission: if a work has a child work as a representative, and that child work changes its representative (which now applies to the first work), the first work needs to be reindexed to have it. So changes to one work need to trigger a reindex of another. After around 10 more frustrating dev hours, some tricky code (which reduces indexing performance but better than bad end-user performance), some very-slow and obtuse specs, and a very weary brain, okay, got that taken care of too. (this commit may not be the last word, I think we had some more bugfixes after that).
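The core of the de-normalizing step can be sketched in plain Ruby, independent of the real indexer. This is a simplified illustration under assumptions, not the code from our commit: Member and both method names are hypothetical, real objects would come from Fedora/Solr rather than an in-memory hash, and the reindex-the-parent trigger described above is left out entirely. It only shows following a chain of representatives down to a fileset while tolerating the edge cases (nothing set, or a loop).

```ruby
require "set"

# Hypothetical in-memory stand-in for works and filesets.
Member = Struct.new(:id, :kind, :representative_id, :file_id, :width, :height,
                    keyword_init: true)

# Follow representative pointers from a work until a fileset is reached.
# Returns nil when the chain dead-ends (no representative set) or loops.
def resolve_representative(member, by_id, seen = Set.new)
  return nil if member.nil? || !seen.add?(member.id) # add? is nil on a repeat
  return member if member.kind == :file_set
  resolve_representative(by_id[member.representative_id], by_id, seen)
end

# Fields to copy onto the work's own index document, so rendering a
# child work needs no extra fetch for its representative.
def representative_fields(work, by_id)
  rep = resolve_representative(work, by_id)
  return {} unless rep
  {
    "representative_file_id" => rep.file_id,
    "representative_width"   => rep.width,
    "representative_height"  => rep.height
  }
end
```

The cost moves from read time to write time: every index of a work now walks the chain, and (not shown here) a change to a child's representative has to reindex any work whose chain passes through it.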

After a bulk reindex to get all these new values — our code is even a little bit faster than our older-better-performing-branch. And, while I haven’t spent the time to compare it, I wouldn’t be shocked if it’s actually a bit faster than the Stock sufia.  It’s not fast, still 4-5s for our most-membered-work, but back to ‘barely good enough for now’.

Future: Caching? Pagination?

My personal rules of thumb in Rails are that a response over 200ms is not ideal, over 500ms it’s time to start considering caching, and over 1s (uncached) I should really figure out why and make it faster even if there is caching.  Other Rails devs would probably consider my rules of thumb to already be profligate!

So 4s is still pretty slow. Very slow responses like this not only make the user wait, but load down your Rails server, filling up its processing queue and causing even worse problems under multi-user use. It’s not great.

Under a more standard Rails app, I’d definitely reach for caching immediately. View or HTTP caching is a pretty standard technique to make your Rails app as fast as possible, even when it doesn’t have pathological performance.

But the standard Rails html caching approaches use something they call ‘russian doll caching’, where the updated_at timestamp on the parent is touched when a child is updated. The issue is making sure the cache for the parent page is refreshed when a child displayed on that page changes.

class Product < ApplicationRecord
  has_many :games
end

class Game < ApplicationRecord
  belongs_to :product, touch: true
end

With touch set to true, any action which changes updated_at for a game record will also change it for the associated product, thereby expiring the cache.

ActiveFedora tries to be like ActiveRecord, but it does not support that “touch: true” on associations used in the example for russian doll caching. It might be easy to simulate with an after_save hook or something — but updating records in Fedora is so slow. And worse, I think (?) there’s no way to atomically update just the updated_at in fedora, you’ve got to update the whole record, introducing concurrency problems. I think this could be a whole bunch of work.

jcoyne in slack suggested that instead of russian-doll-style with touching updated_at, you could assemble your cache key from the updated_at values from all children.  But I started to worry about child works, this might have to be recursive, if a child is a child work, you need to include all its children as well. (And maybe File children of every FileSet?  Or how do fedora ‘versions’ affect this?).  It could start getting pretty tricky.  This is the kind of thing the russian-doll approach is meant to make easier, but it relies on quick and atomic touching of updated_at.
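To make the recursive version of that suggestion concrete, here is a minimal sketch in plain Ruby. It is an exploration of the idea and not something we have implemented: Node is a hypothetical stand-in for a work or fileset that exposes an id, an updated_at, and its children, and the "show/" key prefix is arbitrary. It ignores the open questions about File children and fedora versions.

```ruby
require "digest"
require "time"

# Hypothetical stand-in for a work or fileset with nested members.
Node = Struct.new(:id, :updated_at, :children, keyword_init: true)

# Collect [id, timestamp] pairs for a node and all of its descendants,
# depth-first, so a change anywhere in the tree changes the list.
def timestamps(node)
  own = [[node.id, node.updated_at.getutc.iso8601(6)]]
  own + Array(node.children).flat_map { |child| timestamps(child) }
end

# Build a cache key from a digest of every descendant's id and updated_at.
def cache_key(node)
  digest = Digest::SHA256.hexdigest(timestamps(node).flatten.join("|"))
  "show/#{node.id}/#{digest}"
end
```

Because ids are part of the digest, adding or removing a member also changes the key. The catch is that computing the key means fetching a timestamp for every descendant on every request, so it only pays off if that fetch is much cheaper than rendering the page.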

We’ll probably still explore caching at some point, but I suspect it will be much less straightforward to work reliably than if this were a standard rails/AR app. And the cache failure mode of showing end-users old not-updated data is, I know from experience, really confusing for everyone.

Alternately or probably additionally, why are we displaying all 473 child images on the page at once in the first place?  Even in a standard Rails app, this might be hard to do performantly (although I’d just solve it with cache there if it was the UX I wanted, no problem). Mostly we’re doing it just cause stock sufia did it and we got used to it. Admins use ctrl-f on a page to find a member they want to edit. I kind of like having thumbs for all pages right on the page, even if you have to scroll a lot to see them (was already using lazysizes to lazy load the images only when scrolled to).  But some kind of pagination would probably be the logical next step, that we may get to eventually. One or more of:

  • Actual manual pagination. Would probably require a ‘search’ box on titles of members for admins, since they can’t use ctrl-f anymore.
  • Javascript-based “infinite scroll” (not really infinite) to load a batch at a time as user scrolls there.
  • Or using similar techniques, but actually load everything with JS immediately on page load, but a batch at a time.  Still going to use the same CPU on the server, but quicker initial page load, and splitting up into multiple requests is better for server health and capacity.

Even if we get to caching or some of these, I don’t think any of our work above is wasted. You don’t want to use these techniques to work around performance bottlenecks on the server; in my opinion you want to fix easily-fixable (once you find them!) performance bottlenecks or performance bugs on the server first, as we have done.

Another approach would be not rendering some/all of this HTML on the server at all, but switching to some kind of JS client-side rendering (react etc.). There are plusses and minuses to that approach, but it takes our team into kinds of development we are less familiar with. Maybe we’ll experiment with it at some point.

Thoughts on the Hydra/Samvera stack

So. I find Sufia and the samvera stack quite challenging, expensive, and often frustrating to work with. Let’s get that out of the way. I know I’m not alone in this experience, even among experienced developers, although I couldn’t say if it’s universal.

I also enjoy and find it rewarding and valuable to think about why software is frustrating and time-consuming (expensive) to work with, what makes it this way, and how did it get this way, and (hardest of all), what can be done or done differently.

If you’re not into that sort of discussion, please feel free to drop out now. Myself, I think it’s an important discussion to have. Developing a successful collaborative open source shared codebase is hard, there are many things that we (or anybody) have not figured out, and I think it can take some big-picture discussion and building of shared understanding to get better at it.

I’ve been thinking about how to have that discussion in as productive a way as possible. I haven’t totally figured it out — wanting to add this piece in but not sure how to do it kept me from publishing this blog post for a couple months after the preceding sections were finished — but I think it is probably beneficial to ground and tie the big picture discussion in specific examples — like the elements and story above. So I’m adding it on.

I also think it’s important to tell beginning developers working with Samvera, if you are feeling frustrated and confused, it’s probably not you, it’s the stack. If you are thinking you must not be very good at programming or assuming you will have similar experiences with any development project — don’t assume that, and try to get some experience in other non-samvera projects as well.

So, anyhow, this experience of dealing with performance problems on a sufia ‘show’ page makes me think of a couple bigger-picture topics:  1) The continuing cost of using a less established/bespoke data store layer (in this case Fedora/ActiveFedora/LDP) over something popular with many many developer hours already put into it like ActiveRecord, and 2) The idea of software “maturity”.

In this post, I’m actually going to ignore the first other than that, and focus on the second “maturity”.

Software maturity: What is it, in general?

People talk about software being “mature” (or “immature”) a lot, but googling around I couldn’t actually find much in the way of a good working definition of what is meant by this. A lot of what you find googling is about the “Capability Maturity Model”. The CMM is about organizational processes rather than product, it came out of the context of defense department contractors (a very different context than collaborative open source), and I find its language somewhat bureaucratic.  It also has plenty of critique.  I think organizational process matters, and CMM may be useful to our context, but I haven’t figured out how to make use of CMM to speak about software maturity in the way I want to here, so I won’t speak of it again here.

Other discussions I found also seemed to me kind of vague, hand-wavy, or self-referential, in ways I still didn’t know how to make use of to talk about what I wanted.

I actually found a random StackOverflow answer I happened across to be more useful than most. I found its focus on usage scenarios and shared understanding to be stimulating:

I would say, mature would add the following characteristic to a technology:

  1. People know how to use it, know its possibilities and limitations
  2. People know what the typical usage scenarios are, patterns, what are good usage scenarios for this technology so that it shows its best
  3. People have found out how to deal with limitations/bugs, there is a community knowledge and help out there
  4. The technology is trusted enough to be used not only by individuals but in productive commercial environment as well

In this way of thinking about it, mature software is software where there is shared understanding about what the software is for, what patterns of use it is best at and which are still more ‘unfinished’ and challenging; where you’re going to encounter those, and how to deal with them.  There’s no assumption that it does everything under the sun awesomely, but that there’s a shared understanding about what it does do awesomely.

I think the unspoken assumption here is that for the patterns of use the software is best at, it does a good job of them, meaning it handles the common use cases robustly with few bugs or surprises. (If it doesn’t even do a good job of those, that doesn’t seem to match what we’d want to call ‘maturity’ in software, right? A certain kind of ‘ready for use’; a certain assumption you are not working on an untested experiment in progress, but on something that does what it does well.).

For software meant as a tool for developing other software (any library or framework; I think sufia qualifies), the usage scenarios are at least as much about developers (what they will use the software for and how) as they are about the end-users those developers are ultimately developing software for.

Unclear understanding about use cases is perhaps a large part of what happened to me/us above. We thought sufia would support ‘manuscript’ use cases (which means many members per work if a page image is a member, which seems the most natural way to set it up) just fine. It appears to have the right functionality. Nothing in its README or other ‘marketing’ tells you otherwise. At the time we began our implementation, it may very well be that nobody else thought differently either.

At some point though, a year+ after the org began implementing the technology stack believing it was mature for our use case, and months after I started working on it myself, understanding began to build that this use case would have trouble in sufia/hyrax. We started realizing (and realizing that maybe other developers had already realized) that it wasn’t really ready for prime time with many-membered works, and would take lots of extra customization and workarounds to work out.

The understanding of what use cases the stack will work painlessly for, and how much pain you will have in what areas, can be something still being worked out in this community, and what understanding there is can be unevenly distributed, and hard to access for newcomers. The above description of software maturity as being about shared understanding of usage scenarios speaks to me; from this experience it makes sense to me that that is a big part of ‘software maturity’, and that the samvera stack still has challenges there.

While it’s not about ‘maturity’ directly, I also want to bring in some of what @schneems wrote about in a blog post about “polish” in software and how he tries to ensure it’s present in software he maintains.

Polish is what distinguishes good software from great software. When you use an app or code that clearly cares about the edge cases and how all the pieces work together, it feels right.…

…User frustration comes when things do not behave as you expect them to. You pull out your car key, stick it in the ignition, turn it…and nothing happens. While you might be upset that your car is dead (again), you’re also frustrated that what you predicted would happen didn’t. As humans we build up stories to simplify our lives, we don’t need to know the complex set of steps in a car’s ignition system so instead, “the key starts the car” is what we’ve come to expect. Software is no different. People develop mental models, for instance, “the port configuration in the file should win” and when it doesn’t happen or worse happens inconsistently it’s painful.

I’ve previously called these types of moments papercuts. They’re not life threatening and may not even be mission critical but they are much more painful than they should be. Often these issues force you to stop what you’re doing and either investigate the root cause of the rogue behavior or at bare minimum abandon your thought process and try something new.

When we say something is “polished” it means that it is free from sharp edges, even the small ones. I view polished software to be ones that are mostly free from frustration. They do what you expect them to and are consistent…

…In many ways I want my software to be boring. I want it to harbor few surprises. I want to feel like I understand and connect with it at a deep level and that I’m not constantly being caught off guard by frustrating, time stealing, papercuts.

This kind of “polish” isn’t the same thing as maturity — schneems even suggests that most software may not live up to his standards of “polish”.

However, this kind of polish is a continuum.  On the dark opposite side, we’d have hypothetical software, where working with it is about near constant surprises, constantly “being caught off guard by frustrating, time-stealing papercuts”, software where users (including developer-users for tools) have trouble developing consistent mental models, perhaps because the software is not very consistent in its behavior or architecture, with lots of edge cases and pieces working together unexpectedly or roughly.

I think our idea of “maturity” in software does depend on being somewhere along this continuum toward the “polished” end. If we combine that with the idea about shared understanding of usage scenarios and maturity, we get something reasonable. Mature software has shared understanding about what usage scenarios it’s best at, generally accomplishing those usage scenarios painlessly and well. At least in those usage scenarios it is “polished”, people can develop mental models that let them correctly know what to expect, with frustrating “papercuts” few and far between.

Mature software also generally maintains backwards compatibility, with backwards breaking changes coming infrequently and in a well-managed way — but I think that’s a signal or effect of the software being mature, rather than a cause.  You could take software low on the “maturity” scale, and simply stop development on it, and thereby have a high degree of backwards compat in the future, but that doesn’t make it mature. You can’t force maturity by focusing on backwards compatibility, it’s a product of maturity.

So, Sufia and Samvera?

When trying to figure out how mature software is, we are used to taking certain signals as sort of proxy evidence for it.  There are about 4 years between the release of sufia 1.0 (April 2013) and Sufia 7.3 (March 2017; beyond this point the community’s attention turned from Sufia to Hyrax, which combined Sufia and CurationConcerns). Much of sufia is of course built upon components that are even older: ActiveFedora 1.0 was Feb 2009, and the hydra gem was first released in Jan 2010. This software stack has been under development for 7+ years,  and is used by several dozens of institutions.

Normally, one might take these as signs predicting a certain level of maturity in the software. But my experience has been that it was not as mature as one might expect from this history or adoption rate.

From the usage scenario/shared understanding bucket, I have not found as high a degree as I might have expected of easily accessible shared understanding of “know how to use it, know its possibilities and limitations,” “know what the typical usage scenarios are, patterns, what are good usage scenarios for this technology so that it shows its best.”  Some people have this understanding to some extent, but this knowledge is not always very clear to newcomers or outsiders, and not what they may have expected. As in this blog post, things I may assume are standard usage scenarios that will work smoothly may not be.   Features I or my team assumed were long-standing, reliable, and finished sometimes are not.

On the “polish” front, I honestly do feel like I am regularly “being caught off guard by frustrating, time stealing, papercuts,” and finding inconsistent and unparallel architecture and behavior that makes it hard to predict how easy or successful it will be to implement something in sufia; past experience is no guarantee of future results, because similar parts often work very differently. It often feels to me like we are working on something at a more proof-of-concept or experimental level of maturity, where you should expect to run into issues frequently.

To be fair, I am using sufia 7, which has been superseded by hyrax (1.0 released May 2017, first 2.0 beta released Sep 2017, no 2.0 final release yet), which in some cases may limit me to older versions of other samvera stack dependencies too. Some of these rough edges may have been filed off in hyrax 1/2, one would expect/hope that every release is more mature than the last. But even with Sufia 7, being based on technology with 4-7 years of development history and adopted by dozens of institutions, one might have expected more maturity. Hyrax 1.0 was only released a few months ago after all.  My impression/understanding is that hyrax 1.0 by intention makes few architectural changes from sufia (although it may include some more bugfixes), and upcoming hyrax 2.0 is intended to have more improvements, but still most of the difficult architectural elements I run into in sufia 7 seem to be mostly the same when I look at hyrax master repo. My impression is that hyrax 2.0 (not quite released) certainly has improvements, but does not make huge maturity strides.

Does this mean you should not use sufia/hyrax/samvera? Certainly not (and if you’re reading this, you’ve probably already committed to it at least for now), but it means this is something you should take account of when evaluating whether to use it, what you will do with it, and how much time it will take to implement and maintain.  I certainly don’t have anything universally ‘better’ to recommend for a digital repository implementation, open source or commercial. But I was very frustrated by assuming/expecting a level of maturity that I then personally did not find to be delivered.  I think many organizations are also surprised to find sufia/hyrax/samvera implementation to be more time-consuming (which also means “expensive”, staff time is expensive) than expected, including by finding features they had assumed were done/ready to need more work than expected in their app; this is more of a problem for some organizations than others.  But I think it pays to take this into account when making plans and timelines.   Again, if you (individually or as an institution) are having more trouble setting up sufia/hyrax/samvera than you expected, it’s probably not just you.

Why and what next?

So why are sufia and other parts of the samvera stack at a fairly low level of software maturity (for those who agree they are)?  Honestly, I’m not sure. What can be done to get things more mature and reliable and efficient (low TCO)?  I know even less.  I do not think it’s because any of the developers involved (including myself!) have anything but the best intentions and true commitment, or because they are “bad developers.” That’s not it.

Just some brainstorms about what might play into sufia/samvera’s maturity level. Other developers may disagree with some of these guesses, either because I misunderstand some things, or just due to different evaluations.

  • Digital repositories are just a very difficult or groundbreaking domain, and it just necessarily would take this number of years/developer-hours to get to this level of maturity. (I don’t personally subscribe to this really, but it could be)


  • Fedora and RDF are both (at least relatively) immature technologies themselves, that lack the established software infrastructure and best practices of more mature technologies (at the other extreme, SQL/rdbms, technology that is many decades old), and building something with these at the heart is going to be more challenging, time-consuming, and harder to get ‘right’.


  • I had gotten the feeling, from working with the code and from off-hand comments by developers who had been at it longer, that Sufia had actually taken a significant step backwards in maturity at some point in the past. At first I thought this was about the transition from fedora/fcrepo 3 to 4. But from talking to @mjgiarlo (thanks buddy!), I now believe it wasn’t so much about that as about some significant rewriting that happened between Sufia 6 and 7. The rewrite aimed to: take sufia from an app focused on a self-deposit institutional repository with individual files to a more generalized app involving ‘works’ with ‘members’ (based on the newly created PCDM model); use data in Fedora that would be compatible with other apps like Islandora (a goal that has not been achieved and looks to me increasingly unrealistic); and explode the code into many smaller-purpose, hypothetically decoupled component dependencies that could be recombined into different apps (an approach that, based on outcomes, was later reversed in some ways in Hyrax).
    • This took a very significant number of developer hours, literally over a year or two. These were hours that were not spent on making the existing stack more mature.
    • But so much was rewritten and reorganized that I think it may have actually been a step backward in maturity (both in terms of usage scenarios and polish), not only for the new usage scenarios, but even for what used to be the core usage scenario.
    • So much was re-written, and expected usage scenarios changed so much, that it was almost like creating an entirely new app (including entirely new parts of the dependency stack), so the ‘clock’ in judging how long Sufia (and some but not all other parts of the current dependency stack) has had to become mature really starts with Sufia 7 (first released 2016), rather than sufia 1.0.
    • But it wasn’t really a complete rewrite, “legacy” code still exists, some logic in the stack to this day is still based on assumptions about the old architecture that have become incorrect, leading to more inconsistency, and less robustness — less maturity.
    • The success of this process, in terms of maturity and ‘total cost of ownership’, is, I think… mixed at best. And I think some developers are still dealing with some burnout as fallout from the effort.


  • Both sufia and the evolving stack as a whole have tried to do a lot of things and fit a lot of usage scenarios. Our reach may have exceeded our grasp. If an institution comes with a new usage scenario (for end-users or for how they want to use the codebase), whether with a PR or just a desire, the community very rarely says no, and almost always tries to make the codebase accommodate it, perhaps, in retrospect, without sufficient regard for the cost of added complexity. This comes out of a community-minded and helpful motivation to say ‘yes’. But it can lead to a lack of clarity about which usage scenarios the stack excels at, or even a lack of any usage scenarios that are very polished, in the face of ever-expanding ambition. Limited developer resources are part of this, yes, but increased software complexity also has costs that can’t be handled easily, or sometimes at all, simply by adding developers (see The Mythical Man-Month).


  • Related, I think: sufia/samvera developers have often aspired to make software that can be installed and used by institutions without Rails developers, without having to write much or any code. This has not really been accomplished, or if it has, only in the sense that you need samvera developer(s) who are or become proficient in our bespoke stack, instead of just Rails developers. (Our small institution found we needed 1-2 developers plus 1 devops).  While motivated by the best intentions (to reduce Total Cost of Ownership for small institutions), the added complexity in pursuit of this ambitious and still unrealized goal may have ironically led to less maturity and increased TCO for institutions of all sizes.


  • I think most successful, mature open source software projects probably have one lead developer/architect (or a small team of them) providing a vision of which usage scenarios are in or out, and a consistent architecture to accomplish them, with the authority and willingness to sometimes say ‘no’ when they think code might be taking the project in the wrong direction on the maturity axis. Samvera, due to some combination of practical resource limitations and ideology, has often not had this.


  • ActiveRecord is enormously complex software which took many, many developer-hours to get to its current level of success and maturity. (I actually like AR okay myself).  The thought that its API could be copied and reimplemented as ActiveFedora, with far fewer developer-hours, without encountering a substantial and perhaps insurmountable “maturity gap” may in retrospect have been mistaken. (See above about how basing an app on Fedora makes maturity harder to achieve).


What to do next, or different, or instead?  I’m not sure!  On the plus side we have a great community of committed and passionate developers, and institutions interested in cooperating to help each other.

I think improvements start with acknowledging the current level of maturity, collectively and in a public way that reaches non-developer stakeholders, decision-makers, and funders too.

We should be intentional about being transparent about the level of maturity of the stack and the challenges it presents, resisting any urge to “market” samvera or de-emphasize those challenges; doing so is a disservice to people evaluating or making plans based on the stack, and to the existing community too. We don’t all have to agree about this either; I know some developers and institutions do have a similar analysis to mine here (but surely with some differences), others may not. But we have to be transparent and public about our experiences, to all layers of our community as well as external to it. We all have to see clearly what is, in order to make decisions about what to do next.

Personally, I think we need to be much more modest about our goals and the usage scenarios (both developer and end-user) we can support. This is not necessarily something that will be welcome to decision-makers and funders, who have reasons to always want to add on more instead.  But this is why we need to be transparent about where we truly currently are, so decision-makers can operate based on an accurate understanding of our current challenges and problems as well as our successes.


Consider TTY::Command for all your external process/shell out needs in ruby

When writing a ruby app, I regularly have the need to execute and wait for an external non-ruby “command line” process. Sometimes I think of this as a “shell out”, but in truth depending on how you do it a shell (like bash or sh) may not be involved at all, the ruby process can execute the external process directly.  Typical examples for me are the imagemagick/graphicsmagick command line.

(Which is incidentally, I think, what the popular ruby minimagick gem does, just execute an external process using IM command line. As opposed to rmagick, which tries to actually use the system C IM libraries. Sometimes “shelling out” to command line utility is just simpler and easier to get right).

There are a few high-level ways built into ruby to execute external processes easily. Including the simple system and  backticks (`), which is usually what most people start with, they’re simple and right there for you!

But I think many people end up finding what I have: the most common patterns I want in a “launch and wait for external command line process” function are difficult with system and backticks.  I definitely want the exit value; I usually want to raise an exception if the exit value isn’t 0 (unix for “success”).   I usually want to suppress stdout/stderr from the external process (instead of having it end up in my own process’s stdout/stderr and/or logs), but I want to capture them in a string (sometimes separate strings for stdout/stderr), because in an error condition I do want to log them and/or include them in an exception message. And of course there’s making sure you are safe from command injection vulnerabilities.

Neither system nor backticks will actually give you all this.  You end up having to use Open3.popen3 to get full control. And it ends up pretty confusing, verbose, and tricky to make sure you’re doing what you want, without accidentally deadlocking in some of the weirder combinations. In part that’s because popen3 is just an old-school low-level C-style OS API being exposed to you in ruby.
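To make that wish list concrete, here is a minimal sketch of it using only the standard library's Open3.capture3, which wraps popen3 and reads both pipes for you. The helper name run_quietly is made up for illustration, and it runs a ruby subprocess only so the example is self-contained; it is not meant as a full replacement for anything.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper (name invented for this sketch): run a command without
# a shell, keep its stdout/stderr out of our own output, and raise with the
# captured output if the exit status is non-zero.
def run_quietly(*argv)
  # Passing the command and its arguments as separate strings means no shell
  # ever parses them, which is what protects against command injection.
  stdout, stderr, status = Open3.capture3(*argv)
  unless status.success?
    raise "#{argv.first} failed (exit #{status.exitstatus}):\n#{stdout}\n#{stderr}"
  end
  [stdout, stderr]
end

# Use the current ruby binary as a stand-in for a real external tool.
out, _err = run_quietly(RbConfig.ruby, "-e", "puts 'hello'")
```

It works, but every variation (stdin, ENV, timeouts, not raising) means growing the helper yourself.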

The good news is @piotrmurrach’s TTY::Command will do it all for you. It’s got the right API to easily express the common use-cases you actually have, succinctly and clearly, and taking care of the tricky socket/blocking stuff for you.

One common use case I have is:  execute an external process. Do not let it output to stderr/stdout, but do capture the stderr/stdout in string(s). If the command fails,  raise with the captured stdout/stderr included (that I intentionally didn’t output to logs, but I wanna see it on error). Do it all with proper protection from command injection attack, of course.

TTY::Command.new(printer: :null).run('vips', 'dzsave', input_file_path_string)

Woah, done! run will already:

If the command fails (with a non-zero exit code), a TTY::Command::ExitError is raised. The ExitError message will include: the name of command executed; the exit status; stdout bytes; stderr bytes

Does exactly what I need, cause, guess what, what I need is a very common use case and piotr recognized that, prob from his own work.

Want to not raise on the error, but still detect it and log stdout/stderr? No problem.

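# run! (with the bang) returns a result instead of raising on a non-zero exit
result = TTY::Command.new(printer: :null).run!("vips", "dzsave", whatever)
if result.failed?
  $stderr.puts("Our vips thing failed!!! with this output:\n #{result.stdout} #{result.stderr}")
end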

If you want to not raise on error but still detect it, pass ENV, a bunch of other things, TTY::Command has got ya. Supply stdin too? No prob.  Supply a custom output formatter, so stuff goes to stdout/stderr but properly colorized/indented for your own command line utility, to look all nice and consistent with your other output? Yup. You even get a dry-run mode!

Ordinary natural rubyish options for just about anything I can think of I might want to do, and some things I hadn’t realized I might want to do until I saw em doc’d as options in TTY::Command. Easy-peasy.

In the past, I have sometimes ended up writing bash scripts when I’m writing something that calls a lot of external processes, cause bash seems like the suitable fit for that; it can be annoying and verbose to do a lot of that how you want in a ruby script. Inevitably the bash script grows to the point that I’m looking up non-trivial parts of bash (I’m not an expert), and fighting with them, and regretting that I used bash.  In the future, when I have the thought “this might be best in bash”, I plan to try using just ruby with TTY::Command. I think it’ll lessen the pain of lots of external processes in ruby to where there’s no reason to even consider using bash.


Gem dependency use among Sufia/Hyrax apps

I have a little side project that uses the GitHub API (and a little bit of the rubygems API) to analyze what gem dependencies and versions (from among a list of ‘interesting’ ones) are being used in a list of open GitHub repos with `Gemfile.lock`s. I wrote it out of curiosity regarding sufia/hyrax apps, but I think it could turn into a useful tool for any ruby open source community with common dependencies to see what the community is up to.

It’s far from done, it just generates an ASCII report, and is missing many features I’d like. There are things I’m curious about that it doesn’t report on yet, like history of dependency use, how often do people upgrade a given dependency. And I’d like an interactive HTML interface that lets you slice and dice the data a bit (of people using a given gem, how many are also using another gem, etc).  And then maybe set it up so it’s on the public web and regularly updates itself.

But it’s been a couple of months since I’ve worked on it, and I thought just the current snapshot in limited ASCII report format was useful enough that I should share a report.

The report, intentionally, for now, does not tell you which repos are using which dependencies, it just gives aggregate descriptive statistics. (Although you could of course manually find that out from their open Gemfile.locks). I wanted to avoid seeming to ‘call out’ anyone for using old versions or whatever. Although it would be useful to know, so you can, say, get in touch with people using the same things or same versions as you, I wanted to get some community feedback first.  Thoughts on if it should?

I got the list of repos from various public lists of sufia or hyrax repos. Some things on the lists didn’t actually have open github repos at that address anymore — or had an open repo, but without a Gemfile.lock! Can only analyze with a Gemfile.lock in the repo. But I don’t really know which of these repos are in production, and which might be not yet, no longer, or never were.  If you have a repo you’d like me to add or remove from the list, let me know! Also any other things you might want the report to include or questions you might want to let it help you answer. Or additional ‘interesting’ gems you’d like included in the report?

I do think it’s pretty cool that the combination of machine-readable Gemfile.lock and the GitHub API lets us do some pretty cool stuff here! If I get around to writing an interactive HTML interface, I’m thinking of trying to do it all in static file Javascript. That would require rewriting some of the analysis tools I’ve already written in ruby, in JS, but might be a good project to experiment with, say, vue.js. I don’t have much fancy new-gen JS experience, and this is a nice isolated thing for trying it out.
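To give a feel for why a machine-readable Gemfile.lock makes this kind of report easy, here is a simplified sketch of the per-repo step: read a lockfile, note whether each gem comes from a git checkout, a local path, or a rubygems release, and bucket it into a major.minor series. This is not the actual tool's code. The parsing is hand-rolled for illustration (real tooling would more likely lean on Bundler's own lockfile parser), and the sample lockfile and the gem name some_engine are invented.

```ruby
require "rubygems"

# Matches a locked spec line, e.g. "    rails (4.2.7.1)" (exactly four spaces).
SPEC_LINE = /\A {4}(\S+) \(([^)]+)\)\z/

# Returns { "gem name" => { source:, version:, series: } } for a lockfile string.
def locked_gems(lockfile_text)
  section = nil
  gems = {}
  lockfile_text.each_line(chomp: true) do |line|
    if line.match?(/\A[A-Z][A-Z ]*\z/)
      section = line # top-level header such as GIT, PATH, GEM, DEPENDENCIES
    elsif %w[GIT PATH GEM].include?(section) && (m = SPEC_LINE.match(line))
      version = m[2][/\A[^-]+/] # drop any platform suffix like "-java"
      major, minor = Gem::Version.new(version).segments
      gems[m[1]] = { source: section.downcase.to_sym,
                     version: version,
                     series: "#{major}.#{minor || 0}.x" }
    end
  end
  gems
end

# Invented sample data, shaped like a real Gemfile.lock.
SAMPLE_LOCK = <<~LOCK
  GIT
    remote: https://github.com/example/some_engine.git
    revision: abc123
    specs:
      some_engine (2.0.0)
        rails (>= 4.2)

  GEM
    remote: https://rubygems.org/
    specs:
      rails (4.2.7.1)
      rake (12.0.0)

  PLATFORMS
    ruby

  DEPENDENCIES
    rails (~> 4.2)
    some_engine!
LOCK

gems = locked_gems(SAMPLE_LOCK)
non_release = gems.reject { |_name, info| info[:source] == :gem }.keys
```

Aggregating these hashes across all the repos is then just counting.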

I am not sure what to read into these results. They aren’t necessarily good or bad, they just are a statement of what things are, which I think is interesting and useful in itself, and helps us plan and coordinate. I do think it’s worth recognizing that when developers in the community are on old major versions of shared dependencies, it increases the cost for them to contribute back upstream, makes it harder to do as part of “scratching their own itch”, and probably decreases such contributions.  I also found it interesting how many repos use unreleased straight-from-github versions of some dependencies (17 of 28 do at least once), as well as the handful of gems that are fairly widely used in production but still don’t have a 1.0 release.

And here’s the ugly ascii report!

38 total input URLs, 28 with fetchable Gemfile.lock
total apps analyzed: 28
with dependencies on non-release (git or path) gem versions: 17
  with git checkouts: 16
  with local path deps: 1
Date of report: 2017-08-30 15:11:20 -0400

Repos analyzed:

Gems analyzed:


  apps without dependency: 0
  apps with dependency: 28 (100%)

  git checkouts: 0
  local path dep: 0

  3.x (3.0.0 released 2010-08-29): 1 (4%)
    3.2.x (3.2.0 released 2012-01-20): 1 (4%)

  4.x (4.0.0 released 2013-06-25): 16 (57%)
    4.0.x (4.0.0 released 2013-06-25): 2 (7%)
    4.1.x (4.1.0 released 2014-04-08): 1 (4%)
    4.2.x (4.2.0 released 2014-12-20): 13 (46%)

  5.x (5.0.0 released 2016-06-30): 11 (39%)
    5.0.x (5.0.0 released 2016-06-30): 8 (29%)
    5.1.x (5.1.0 released 2017-04-27): 3 (11%)

  Latest release: 5.1.4.rc1 (2017-08-24)

  apps without dependency: 20 (71%)
  apps with dependency: 8 (29%)

  git checkouts: 4 (50%)
  local path dep: 0

  1.x (1.0.1 released 2017-05-24): 4 (50%)
    1.0.x (1.0.1 released 2017-05-24): 4 (50%)

  2.x ( released unreleased): 4 (50%)
    2.0.x ( released unreleased): 4 (50%)

  Latest release: 1.0.4 (2017-08-22)

  apps without dependency: 10 (36%)
  apps with dependency: 18 (64%)

  git checkouts: 8 (44%)
  local path dep: 0

  0.x (0.0.1.pre1 released 2012-11-15): 1 (6%)
    0.1.x (0.1.0 released 2013-02-04): 1 (6%)

  3.x (3.0.0 released 2013-07-22): 1 (6%)
    3.7.x (3.7.0 released 2014-02-07): 1 (6%)

  4.x (4.0.0 released 2014-08-21): 2 (11%)
    4.1.x (4.1.0 released 2014-10-31): 1 (6%)
    4.2.x (4.2.0 released 2014-11-25): 1 (6%)

  5.x (5.0.0 released 2015-06-06): 1 (6%)
    5.0.x (5.0.0 released 2015-06-06): 1 (6%)

  6.x (6.0.0 released 2015-03-27): 6 (33%)
    6.0.x (6.0.0 released 2015-03-27): 2 (11%)
    6.2.x (6.2.0 released 2015-07-09): 1 (6%)
    6.3.x (6.3.0 released 2015-08-12): 1 (6%)
    6.6.x (6.6.0 released 2016-01-28): 2 (11%)

  7.x (7.0.0 released 2016-08-01): 7 (39%)
    7.0.x (7.0.0 released 2016-08-01): 1 (6%)
    7.1.x (7.1.0 released 2016-08-11): 1 (6%)
    7.2.x (7.2.0 released 2016-10-01): 4 (22%)
    7.3.x (7.3.0 released 2017-03-21): 1 (6%)

  Latest release: 7.3.1 (2017-04-26)

  apps without dependency: 21 (75%)
  apps with dependency: 7 (25%)

  git checkouts: 1 (14%)
  local path dep: 1 (14%)

  1.x (1.0.0 released 2016-06-22): 7 (100%)
    1.3.x (1.3.0 released 2016-08-03): 2 (29%)
    1.6.x (1.6.0 released 2016-09-14): 3 (43%)
    1.7.x (1.7.0 released 2016-12-09): 2 (29%)

  Latest release: 2.0.0 (2017-04-20)

  apps without dependency: 11 (39%)
  apps with dependency: 17 (61%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2013-10-04): 9 (53%)
    0.3.x (0.3.0 released 2014-06-20): 1 (6%)
    0.8.x (0.8.0 released 2016-07-07): 1 (6%)
    0.10.x (0.10.0 released 2016-08-16): 3 (18%)
    0.11.x (0.11.0 released 2017-01-04): 4 (24%)

  1.x (1.0.0 released 2017-03-22): 8 (47%)
    1.2.x (1.2.0 released 2017-06-23): 8 (47%)

  Latest release: 1.2.0 (2017-06-23)

  apps without dependency: 3 (11%)
  apps with dependency: 25 (89%)

  git checkouts: 2 (8%)
  local path dep: 0

  0.x (0.0.1 released 2013-06-13): 3 (12%)
    0.5.x (0.5.0 released 2014-08-27): 3 (12%)

  1.x (1.0.0 released 2015-01-30): 6 (24%)
    1.0.x (1.0.0 released 2015-01-30): 4 (16%)
    1.2.x (1.2.0 released 2016-01-21): 2 (8%)

  2.x (2.0.0 released 2016-04-28): 1 (4%)
    2.0.x (2.0.0 released 2016-04-28): 1 (4%)

  3.x (3.1.0 released 2016-08-09): 15 (60%)
    3.1.x (3.1.0 released 2016-08-09): 6 (24%)
    3.3.x (3.3.1 released 2017-05-04): 9 (36%)

  Latest release: 3.3.2 (2017-05-23)

  apps without dependency: 1 (4%)
  apps with dependency: 27 (96%)

  git checkouts: 0
  local path dep: 0

  5.x (5.0.0 released 2012-12-11): 1 (4%)
    5.4.x (5.4.0 released 2013-02-06): 1 (4%)

  6.x (6.0.0 released 2013-03-28): 1 (4%)
    6.5.x (6.5.0 released 2014-02-18): 1 (4%)

  7.x (7.0.0 released 2014-03-31): 3 (11%)
    7.2.x (7.2.0 released 2014-07-18): 3 (11%)

  9.x (9.0.1 released 2015-01-30): 6 (22%)
    9.1.x (9.1.0 released 2015-03-06): 2 (7%)
    9.2.x (9.2.0 released 2015-07-08): 2 (7%)
    9.5.x (9.5.0 released 2015-11-11): 2 (7%)

  10.x (10.0.0 released 2016-06-08): 16 (59%)
    10.0.x (10.0.0 released 2016-06-08): 1 (4%)
    10.3.x (10.3.0 released 2016-09-02): 3 (11%)
    10.4.x (10.4.0 released 2017-01-25): 4 (15%)
    10.5.x (10.5.0 released 2017-06-09): 8 (30%)

  Latest release: 10.5.0 (2017-06-09)

  apps without dependency: 1 (4%)
  apps with dependency: 27 (96%)

  git checkouts: 0
  local path dep: 0

  5.x (5.0.0 released 2012-12-11): 1 (4%)
    5.4.x (5.4.0 released 2013-02-06): 1 (4%)

  6.x (6.0.0 released 2013-03-28): 1 (4%)
    6.5.x (6.5.0 released 2014-02-18): 1 (4%)

  7.x (7.0.0 released 2014-03-31): 3 (11%)
    7.2.x (7.2.0 released 2014-07-18): 3 (11%)

  9.x (9.0.0 released 2015-01-30): 6 (22%)
    9.1.x (9.1.0 released 2015-03-06): 2 (7%)
    9.2.x (9.2.0 released 2015-07-08): 2 (7%)
    9.5.x (9.5.0 released 2015-11-11): 2 (7%)

  10.x (10.0.0 released 2016-06-08): 16 (59%)
    10.0.x (10.0.0 released 2016-06-08): 1 (4%)
    10.3.x (10.3.0 released 2016-09-02): 3 (11%)
    10.4.x (10.4.0 released 2017-01-25): 4 (15%)
    10.5.x (10.5.0 released 2017-06-09): 8 (30%)

  Latest release: 10.5.0 (2017-06-09)

  apps without dependency: 13 (46%)
  apps with dependency: 15 (54%)

  git checkouts: 1 (7%)
  local path dep: 0

  0.x (0.0.1 released 2015-06-05): 15 (100%)
    0.12.x (0.12.0 released 2016-05-24): 1 (7%)
    0.14.x (0.14.0 released 2016-09-06): 2 (13%)
    0.15.x (0.15.0 released 2016-11-30): 2 (13%)
    0.16.x (0.16.0 released 2017-03-02): 10 (67%)

  Latest release: 0.16.0 (2017-03-02)

  apps without dependency: 2 (7%)
  apps with dependency: 26 (93%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2013-07-23): 4 (15%)
    0.0.x (0.0.1 released 2013-07-23): 1 (4%)
    0.1.x (0.1.0 released 2014-05-10): 3 (12%)

  1.x (1.0.0 released 2015-01-30): 6 (23%)
    1.0.x (1.0.0 released 2015-01-30): 1 (4%)
    1.1.x (1.1.0 released 2015-03-27): 3 (12%)
    1.2.x (1.2.0 released 2016-05-18): 2 (8%)

  3.x (3.0.0 released 2015-10-07): 16 (62%)
    3.1.x (3.1.0 released 2016-05-10): 3 (12%)
    3.2.x (3.2.0 released 2016-11-17): 7 (27%)
    3.3.x (3.3.0 released 2017-06-15): 6 (23%)

  Latest release: 3.3.2 (2017-08-17)

  apps without dependency: 3 (11%)
  apps with dependency: 25 (89%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2013-09-17): 25 (100%)
    0.3.x (0.3.0 released 2013-10-24): 25 (100%)

  Latest release: 0.3.3 (2015-10-15)

  apps without dependency: 13 (46%)
  apps with dependency: 15 (54%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2015-06-05): 15 (100%)
    0.8.x (0.8.0 released 2016-05-12): 1 (7%)
    0.9.x (0.9.0 released 2016-08-31): 14 (93%)

  Latest release: 0.9.0 (2016-08-31)

  apps without dependency: 17 (61%)
  apps with dependency: 11 (39%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2013-04-18): 11 (100%)
    0.2.x (0.2.0 released 2014-06-25): 11 (100%)

  Latest release: 0.2.2 (2015-08-14)

  apps without dependency: 10 (36%)
  apps with dependency: 18 (64%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2012-06-15): 1 (6%)
    0.1.x (0.1.0 released 2012-12-21): 1 (6%)

  1.x (1.0.0 released 2013-05-10): 10 (56%)
    1.1.x (1.1.0 released 2013-10-01): 10 (56%)

  2.x (2.0.2 released 2016-04-20): 7 (39%)
    2.0.x (2.0.2 released 2016-04-20): 1 (6%)
    2.1.x (2.1.0 released 2016-08-17): 6 (33%)

  Latest release: 2.1.0 (2016-08-17)

  apps without dependency: 3 (11%)
  apps with dependency: 25 (89%)

  git checkouts: 3 (12%)
  local path dep: 0

  0.x (0.1.0 released 2013-09-24): 25 (100%)
    0.6.x (0.6.0 released 2014-07-31): 1 (4%)
    0.7.x (0.7.0 released 2014-12-10): 1 (4%)
    0.8.x (0.8.0 released 2015-02-27): 5 (20%)
    0.10.x (0.10.0 released 2016-04-04): 5 (20%)
    0.11.x (0.11.0 released 2016-12-31): 1 (4%)
    0.12.x (0.12.0 released 2017-03-01): 2 (8%)
    0.13.x (0.13.0 released 2017-04-30): 2 (8%)
    0.14.x (0.14.0 released 2017-07-07): 8 (32%)

  Latest release: 0.14.0 (2017-07-07)

  apps without dependency: 1 (4%)
  apps with dependency: 27 (96%)

  git checkouts: 1 (4%)
  local path dep: 0

  2.x (2.0.0 released 2012-11-30): 1 (4%)
    2.1.x (2.1.0 released 2013-01-18): 1 (4%)

  3.x (3.0.0 released 2013-03-28): 25 (93%)
    3.1.x (3.1.0 released 2013-05-03): 1 (4%)
    3.3.x (3.3.0 released 2014-07-17): 7 (26%)
    3.4.x (3.4.0 released 2016-03-14): 17 (63%)

  4.x (4.0.0 released 2017-01-26): 1 (4%)
    4.0.x (4.0.0 released 2017-01-26): 1 (4%)

  Latest release: 4.0.0 (2017-01-26)

  apps without dependency: 12 (43%)
  apps with dependency: 16 (57%)

  git checkouts: 0
  local path dep: 0

  0.x (0.1.0 released 2015-12-01): 16 (100%)
    0.5.x (0.5.0 released 2016-06-08): 1 (6%)
    0.6.x (0.6.0 released 2016-09-01): 15 (94%)

  Latest release: 0.6.2 (2017-03-28)

  apps without dependency: 1 (4%)
  apps with dependency: 27 (96%)

  git checkouts: 0
  local path dep: 0

  5.x (5.0.0 released 2012-12-11): 1 (4%)
    5.4.x (5.4.0 released 2013-02-06): 1 (4%)

  6.x (6.0.0 released 2013-03-28): 1 (4%)
    6.5.x (6.5.0 released 2014-02-18): 1 (4%)

  7.x (7.0.0 released 2014-03-31): 3 (11%)
    7.2.x (7.2.0 released 2014-07-18): 3 (11%)

  9.x (9.0.0 released 2015-01-30): 6 (22%)
    9.1.x (9.1.0 released 2015-03-06): 2 (7%)
    9.2.x (9.2.0 released 2015-07-08): 2 (7%)
    9.5.x (9.5.0 released 2015-11-11): 2 (7%)

  10.x (10.0.0 released 2016-06-08): 16 (59%)
    10.0.x (10.0.0 released 2016-06-08): 1 (4%)
    10.3.x (10.3.0 released 2016-09-02): 3 (11%)
    10.4.x (10.4.0 released 2017-01-25): 4 (15%)
    10.5.x (10.5.0 released 2017-06-09): 8 (30%)

  Latest release: 10.5.0 (2017-06-09)

  apps without dependency: 0
  apps with dependency: 28 (100%)

  git checkouts: 0
  local path dep: 0

  4.x (4.0.0 released 2012-11-30): 2 (7%)
    4.0.x (4.0.0 released 2012-11-30): 1 (4%)
    4.7.x (4.7.0 released 2014-02-05): 1 (4%)

  5.x (5.0.0 released 2014-02-05): 10 (36%)
    5.5.x (5.5.0 released 2014-07-07): 2 (7%)
    5.9.x (5.9.0 released 2015-01-30): 1 (4%)
    5.11.x (5.11.0 released 2015-03-17): 1 (4%)
    5.12.x (5.12.0 released 2015-03-24): 1 (4%)
    5.13.x (5.13.0 released 2015-04-10): 1 (4%)
    5.14.x (5.14.0 released 2015-07-02): 2 (7%)
    5.18.x (5.18.0 released 2016-01-21): 2 (7%)

  6.x (6.0.0 released 2016-01-21): 16 (57%)
    6.3.x (6.3.0 released 2016-07-01): 1 (4%)
    6.7.x (6.7.0 released 2016-09-27): 5 (18%)
    6.10.x (6.10.0 released 2017-05-17): 6 (21%)
    6.11.x (6.11.0 released 2017-08-10): 4 (14%)

  Latest release: 6.11.0 (2017-08-10)

  apps without dependency: 4 (14%)
  apps with dependency: 24 (86%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2014-02-05): 24 (100%)
    0.1.x (0.1.0 released 2014-09-05): 2 (8%)
    0.3.x (0.3.0 released 2015-03-18): 2 (8%)
    0.4.x (0.4.0 released 2015-04-10): 5 (21%)
    0.6.x (0.6.0 released 2016-07-07): 4 (17%)
    0.7.x (0.7.0 released 2017-01-24): 1 (4%)
    0.8.x (0.8.0 released 2017-02-07): 10 (42%)

  Latest release: 0.8.0 (2017-02-07)

  apps without dependency: 24 (86%)
  apps with dependency: 4 (14%)

  git checkouts: 0
  local path dep: 0

  5.x (5.0.0 released 2014-02-11): 1 (25%)
    5.0.x (5.0.0 released 2014-02-11): 1 (25%)

  6.x (6.0.0 released 2016-01-26): 3 (75%)
    6.0.x (6.0.0 released 2016-01-26): 1 (25%)
    6.1.x (6.1.0 released 2017-02-17): 2 (50%)

  Latest release: 6.2.0 (2017-08-29)

  apps without dependency: 11 (39%)
  apps with dependency: 17 (61%)

  git checkouts: 0
  local path dep: 0

  2.x (2.0.0 released 2012-11-30): 2 (12%)
    2.1.x (2.1.0 released 2013-07-22): 2 (12%)

  5.x (5.0.0 released 2014-03-18): 9 (53%)
    5.1.x (5.1.0 released 2014-06-05): 7 (41%)
    5.2.x (5.2.0 released 2015-10-12): 2 (12%)

  6.x (6.0.0 released 2016-01-22): 6 (35%)
    6.0.x (6.0.0 released 2016-01-22): 1 (6%)
    6.1.x (6.1.0 released 2016-09-28): 2 (12%)
    6.2.x (6.2.0 released 2016-12-13): 3 (18%)

  Latest release: 6.3.1 (2017-06-15)

  apps without dependency: 1 (4%)
  apps with dependency: 27 (96%)

  git checkouts: 1 (4%)
  local path dep: 0

  5.x (5.0.0 released 2012-11-30): 1 (4%)
    5.6.x (5.6.0 released 2013-02-02): 1 (4%)

  6.x (6.0.0 released 2013-03-28): 1 (4%)
    6.7.x (6.7.0 released 2013-10-29): 1 (4%)

  7.x (7.0.0 released 2014-03-31): 3 (11%)
    7.1.x (7.1.0 released 2014-07-18): 3 (11%)

  9.x (9.0.0 released 2015-01-30): 6 (22%)
    9.0.x (9.0.0 released 2015-01-30): 1 (4%)
    9.1.x (9.1.0 released 2015-04-16): 1 (4%)
    9.4.x (9.4.0 released 2015-09-03): 1 (4%)
    9.7.x (9.7.0 released 2015-11-30): 2 (7%)
    9.8.x (9.8.0 released 2016-02-05): 1 (4%)

  10.x (10.0.0 released 2016-06-08): 3 (11%)
    10.0.x (10.0.0 released 2016-06-08): 1 (4%)
    10.3.x (10.3.0 released 2016-11-21): 2 (7%)

  11.x (11.0.0 released 2016-09-13): 13 (48%)
    11.1.x (11.1.0 released 2017-01-13): 2 (7%)
    11.2.x (11.2.0 released 2017-05-18): 4 (15%)
    11.3.x (11.3.0 released 2017-06-13): 3 (11%)
    11.4.x (11.4.0 released 2017-06-28): 4 (15%)

  Latest release: 11.4.0 (2017-06-28)

  apps without dependency: 9 (32%)
  apps with dependency: 19 (68%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2015-02-14): 1 (5%)
    0.3.x (0.3.0 released 2015-07-14): 1 (5%)

  1.x (1.0.1 released 2015-08-06): 3 (16%)
    1.0.x (1.0.1 released 2015-08-06): 1 (5%)
    1.1.x (1.1.0 released 2016-05-10): 2 (11%)

  2.x (2.0.0 released 2016-11-29): 15 (79%)
    2.0.x (2.0.0 released 2016-11-29): 8 (42%)
    2.2.x (2.2.0 released 2017-05-25): 7 (37%)

  Latest release: 2.2.0 (2017-05-25)

  apps without dependency: 3 (11%)
  apps with dependency: 25 (89%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2014-04-29): 25 (100%)
    0.2.x (0.2.0 released 2014-07-01): 3 (12%)
    0.6.x (0.6.0 released 2015-01-14): 2 (8%)
    0.7.x (0.7.0 released 2015-05-14): 7 (28%)
    0.11.x (0.11.0 released 2016-08-25): 13 (52%)

  Latest release: 0.11.0 (2016-08-25)

  apps without dependency: 6 (21%)
  apps with dependency: 22 (79%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2013-07-31): 22 (100%)
    0.2.x (0.2.0 released 2014-12-11): 1 (5%)
    0.3.x (0.3.0 released 2015-04-03): 1 (5%)
    0.4.x (0.4.0 released 2015-09-18): 4 (18%)
    0.5.x (0.5.0 released 2016-03-08): 3 (14%)
    0.6.x (0.6.0 released 2016-08-11): 6 (27%)
    0.7.x (0.7.0 released 2017-06-12): 7 (32%)

  Latest release: 0.7.0 (2017-06-12)

  apps without dependency: 10 (36%)
  apps with dependency: 18 (64%)

  git checkouts: 0
  local path dep: 0

  1.x (1.0.0 released 2013-01-22): 12 (67%)
    1.1.x (1.1.0 released 2013-12-06): 7 (39%)
    1.99.x (1.99.0 released 2015-10-31): 5 (28%)

  2.x (2.0.0 released 2016-04-11): 6 (33%)
    2.2.x (2.2.0 released 2017-01-23): 6 (33%)

  Latest release: 2.2.3 (2017-08-27)

  apps without dependency: 21 (75%)
  apps with dependency: 7 (25%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.1 released 2013-11-14): 2 (29%)
    0.2.x (0.2.0 released 2015-11-10): 2 (29%)

  1.x (1.0.0 released 2017-02-01): 5 (71%)
    1.4.x (1.4.0 released 2017-04-11): 3 (43%)
    1.5.x (1.5.0 released 2017-07-20): 2 (29%)

  Latest release: 1.5.1 (2017-08-01)

  apps without dependency: 24 (86%)
  apps with dependency: 4 (14%)

  git checkouts: 1 (25%)
  local path dep: 0

  0.x (0.1.0 released 2016-05-13): 4 (100%)
    0.1.x (0.1.0 released 2016-05-13): 2 (50%)
    0.2.x (0.2.0 released 2017-05-03): 2 (50%)

  Latest release: 0.2.0 (2017-05-03)

  apps without dependency: 26 (93%)
  apps with dependency: 2 (7%)

  git checkouts: 2 (100%)
  local path dep: 0

  2.x ( released unreleased): 2 (100%)
    2.0.x ( released unreleased): 2 (100%)

  No rubygems releases

  apps without dependency: 28 (100%)
  apps with dependency: 0

  git checkouts: 0
  local path dep: 0

  Latest release: 0.6.0 (2017-08-02)

  apps without dependency: 27 (96%)
  apps with dependency: 1 (4%)

  git checkouts: 0
  local path dep: 0

  0.x (0.0.2 released 2015-01-16): 1 (100%)
    0.0.x (0.0.2 released 2015-01-16): 1 (100%)

  Latest release: 0.0.3 (2015-01-21)

  apps without dependency: 26 (93%)
  apps with dependency: 2 (7%)

  git checkouts: 0
  local path dep: 0

  0.x (0.1.0 released 2017-03-30): 2 (100%)
    0.2.x (0.2.0 released 2017-03-30): 2 (100%)

  Latest release: 0.2.2 (2017-08-07)

  apps without dependency: 27 (96%)
  apps with dependency: 1 (4%)

  git checkouts: 1 (100%)
  local path dep: 0

  0.x (0.0.1.pre released 2014-02-21): 1 (100%)
    0.9.x (0.9.0 released 2014-10-27): 1 (100%)

  Latest release: 0.9.1 (2014-12-09)

full-res pan-and-zoom JS viewer on a sufia/hyrax app

Our Digital Collections web app  is written using the samvera/hydra stack, and is currently based on sufia 7.3.

The repository currently has around 10,000 TIFF scanned page and photographic images. They are stored (for better or worse) as TIFFs with no compression, and can be 100MB and up, each. (Typically around 7500 × 4900 pixels). It’s a core use case for us that viewers be able to pan-and-zoom on the full-res images in the browser. OpenSeadragon is the premier open source solution to this, although some samvera/hydra stack apps use other JS projects that wrap OpenSeadragon with more UI/UX, like UniversalViewer.   All of our software is deployed on AWS EC2 instances.

OpenSeadragon works by loading ‘tiles’: sub-regions of the source image, at the appropriate zoom level, for what’s in the viewport. In the samvera/hydra community it seems to be popular to use an image server that complies with the IIIF Image API as a tile source, but OpenSeadragon (OSD) can work with a variety of types of tile sources, with an easy plug-in architecture for adding your own too.
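To make the tile idea concrete, here is a rough Python sketch of the URLs a viewer would request from an IIIF Image API 2.x server to cover a whole image at one zoom level. The URL pattern (`{identifier}/{region}/{size}/{rotation}/{quality}.{format}`) is from the IIIF Image API; the host name, identifier, and 512-pixel tile size are assumptions made up for illustration, not what our app uses.

```python
import math

def iiif_tile_urls(base_url, identifier, width, height, tile_size=512, scale_factor=1):
    """Return IIIF Image API 2.x URLs for every tile covering the image at one zoom level.

    scale_factor is the downsampling factor: 1 is full resolution, 2 is half size, etc.
    """
    region_size = tile_size * scale_factor  # source pixels covered by one tile
    urls = []
    for y in range(0, height, region_size):
        for x in range(0, width, region_size):
            # Edge tiles are clipped to the image bounds.
            w = min(region_size, width - x)
            h = min(region_size, height - y)
            out_w = math.ceil(w / scale_factor)  # delivered tile width in pixels
            urls.append(f"{base_url}/{identifier}/{x},{y},{w},{h}/{out_w},/0/default.jpg")
    return urls

# Hypothetical server and identifier, using our typical 7500 x 4900 image size.
full_res = iiif_tile_urls("https://images.example.org/iiif", "page-0001", 7500, 4900)
print(len(full_res), "tiles at full resolution")
print(full_res[0])
```

A viewer only asks for the tiles that intersect the current viewport, so a real session requests a small subset of these at any moment, but each one is a separate HTTP request to the tile source.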

Our team ended up spending 4 or 5 weeks investigating various options, finding various levels of suitability, before arriving at a solution that was best for us. I’m not sure if our needs and environment are different than most others in the community; if others are having better success with some solutions than we did; if others are using other solutions not investigated by us. At any rate, I share our experiences hoping to give others a head-start. It can sometimes be difficult for me/us to figure out what use cases, options, or components in the samvera/hyrax stack are mature, production-battle-tested, and/or commonly used, and which are at more of the proof-of-concept stage.

As tile sources for OpenSeadragon, we investigated, experimented with, and developed at least basic proofs of concept for: riiif, imgix, cantaloupe, and pre-generated “Deep Zoom Image” tiles to be stored on AWS S3. We found riiif unsuitable; imgix worked but was going to have some fairly high latency for users; cantaloupe worked pretty well, but may take fairly expensive server resources to handle heavy load; the pre-generated DZI images actually ended up being what we chose to put into production, with excellent performance and maintainability for minimal cost.

Details on each:

riiif

A colleague and I learned about riiif at advanced hydra camp in May in Minneapolis. riiif is a Rails engine that lets you add a IIIF server to a new or existing Rails app. It was easy to set up a simple proof of concept at the workshop. It was easy to incorporate authorization and access controls for your images. I left with the impression that riiif was more-or-less a community standard, and was commonly used in production.

So we started out our work assuming we would be using riiif, and got to work on implementing it.

Knowing that tile generation would likely be CPU-intensive, disk-IO-intensive, and RAM-intensive, we decided at the outset that the actual riiif IIIF image server would live on a different server than our main Rails app, so it could be scaled independently and wouldn’t have resource contention with the main app. I included the riiif stuff in the same Rails app and repo, but used some rails routing definition tricks so that our main app server(s) would refuse to serve riiif IIIF routes, and the “image server” would refuse to serve anything but IIIF routes.

Since the riiif image server was obviously also not on the same server as our fedora repo, and shared disk can be expensive and/or unreliable in AWS-land, riiif would be using its HTTPFileResolver to fetch the originals from fedora. Since our originals are big and we figured this would be slow, we planned to give it enough disk space to cache all of them. We additionally worked up code to ‘ping’ the riiif server with an ‘info’ request for all of our images, forcing it to download them to its local cache on bootstrapped startup or future image uploads, thereby “pre-loading” them.
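The ‘ping’ idea is simple enough to sketch. This is not our actual code (ours lives in the Rails app); it is a minimal Python illustration, assuming the image server exposes the standard IIIF `{identifier}/info.json` endpoint. The base URL and identifiers are hypothetical, and the `fetch` parameter exists only so the sketch can be exercised without a real server.

```python
from urllib.parse import quote
from urllib.request import urlopen

def info_url(base_url, identifier):
    """Build the IIIF info.json URL; the identifier must be URL-encoded."""
    return f"{base_url.rstrip('/')}/{quote(identifier, safe='')}/info.json"

def preload(base_url, identifiers, fetch=None):
    """Request info.json for each identifier so the server pulls the original
    into its local cache. Returns {identifier: True/False} for success."""
    if fetch is None:
        def fetch(url):
            # Generous timeout: the server may be downloading a 100MB+ original.
            with urlopen(url, timeout=120) as resp:
                return resp.status
    results = {}
    for identifier in identifiers:
        try:
            results[identifier] = fetch(info_url(base_url, identifier)) == 200
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            results[identifier] = False
    return results

# Hypothetical internal host name, mount path, and identifier.
print(info_url("http://riiif.internal.example/image-service", "abc/123"))
```

Running something like `preload(...)` over every file id after a deploy, and over a single id after each upload, is the “pre-loading” described above.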

Later, in experiments with other tools, I think we saw that downloading even huge files from a fedora on one AWS EC2 to another EC2 on same account is actually pretty quick (AWS internal network is awfully fast), and this may have been a premature optimization. However, it did allow us to do some performance testing knowing that time to download originals from fedora was not a factor.

riiif worked fine in development, and even worked okay with only one user at a time in a deployed staging app. (Sometimes you had to wait 2-3 seconds for viewer tiles, which is not ideal, but not as disastrous as it got…).

But when we did a test with 3 or 4 (manual, human) users simultaneously opening viewers (on the same or different objects), things got very rough. People waiting 10+ seconds for tiles, or even timing out on OpenSeadragon’s default 30 second timeout.

We spent a lot of time trying to diagnose and trying different things. Among other things, we tried having riiif use GraphicsMagick instead of ImageMagick. When testing image operations individually, GM did perform around 50% faster than IM, and I recommend using it instead of IM wherever appropriate. We also tried increasing the size of our EC2 instance. We tried an m4.large, then a c4.xlarge, and then also keeping our data on a RAID-arrayed EBS trying to increase disk access speeds.   But things were still disastrous under multi-user simultaneous use.

Originally, we were using riiif additionally for generating thumbnails for our ‘show’ pages and search results pages. I had the impression from a slack conversation this was a common choice, and it certainly is convenient if you already have an image server to use it this way. (Also makes it super easy to generate multiple resolutions for responsive srcset attribute). At one point in trying to get riiif working better, we turned off riiif for thumbs, using riiif only on the viewer, to significantly reduce load on the image server. But still, no dice.

After much investigation, we saw that CPU use would often go to 99-100%, and disk IO levels were also through the roof during concurrent use tests. (RAM was actually okay). When we ran a manual command-line imagemagick conversion on the server at the same time as concurrent use testing, operations that would only take a few seconds on an otherwise unloaded server sometimes took 30+ seconds. We gave up on riiif before diagnosing exactly what was going on (this kind of lower-level OS/infrastructure profiling and diagnosis is kinda hard!), but my guess is saturated disk IO. If you look at what riiif does, this seems plausible. Riiif will do a separate shell-out to an imagemagick or graphicsmagick command line operation for every image request, if the derivative requested is not yet cached.

If you open up an OpenSeadragon viewer, OSD can start out by sending requests for a dozen+ different tiles. Each one, with a riiif-backed tile source, will result in an independent shell-out to the imagemagick/graphicsmagick command line tool, with one of our 100MB+ source image TIFFs as input argument. With several people concurrently using the viewer, this could be several dozen imagemagick shell-outs, each trying to use a 100MB+ file on disk as input. You could easily imagine this filling up even a fairly fat disk IO pipeline, and/or all of these processes fighting for access to various OS concurrency locks involved in reading from the file system, etc. But this is just a hypothesis supported by what we observed; we didn’t totally nail down a diagnosis.
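The back-of-envelope arithmetic behind that hypothesis looks something like this. It is a deliberately pessimistic sketch: it assumes every tile request is a cache miss and every shell-out reads the whole source file, and it ignores any help from the OS page cache. The viewer and tile counts are the rough figures from our manual test, not measurements.

```python
def concurrent_source_reads(viewers, tiles_per_viewer, source_mb):
    """Worst case: one image-processing process per uncached tile request,
    each reading the entire source image as input."""
    processes = viewers * tiles_per_viewer
    total_mb = processes * source_mb
    return processes, total_mb

processes, total_mb = concurrent_source_reads(viewers=4, tiles_per_viewer=12, source_mb=100)
print(f"{processes} simultaneous shell-outs, up to {total_mb / 1000:.1f} GB of source image input")
```

Even if the real number is a fraction of this, it is easy to see how a handful of human users could swamp a single server.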

At some point, after spending a week+ on trying to solve this, and with deadlines looming, we decided to explore the other tile source alternatives we’ll discuss, even without being entirely sure what was going on with riiif.

It may be that other people have more success with riiif. Perhaps they are using smaller original sources; or running on AWS EC2 may have exacerbated things for us; or we just had bad luck for some as of yet undiscovered reason.

But we got curious how many people were using riiif in production, and what their image corpus looked like. We asked on samvera-tech listserv, and got only a handful of answers, and none of them were using riiif! I have an in-progress side-project I’m working on that gives some dependency-use statistics for public github repos holding hydra/samvera apps — of 20 public repos I had listed, 3 of them had riiif as a dependency, but I’m not sure how many of those are in production.   I did happen to talk to one other developer on samvera slack who confirmed they were having similar problems to ours. Still interested in hearing from anyone that is using riiif successfully in production, and if so what your source images are like, and how many concurrent viewers it can handle.


Cantaloupe

Wanting to try alternatives to riiif, the obvious choice was another IIIF server. There aren’t actually a whole lot of mature, reliable, open source IIIF server options; I think IIIF as a technology hasn’t caught on much outside of library/cultural heritage digital repositories. We knew from the samvera-tech listserv question that Loris (written in python) was a popular choice in the community, but Loris is currently looking for a new maintainer, which gave us pause.

We eventually decided on Cantaloupe, written in Java, as the best bet to try first. While it being in Java gave us a bit of concern, as nobody on the team is much of a Java dev, even without being Java experts we could tell looking at the source code that it was impressively clean and readable. The docs are good, and revealed attention to performance issues. The Github repo has all the signs of being an active and well-maintained project.

Cantaloupe was also having a bit of performance trouble when we tried using it for show-page thumbnails (we have a thumb for every page in a work on our ‘show’ page, as in default sufia/hyrax, and our works can have 200+ pages). So we decided fairly early on that we’d just keep using a pre-generated derivative for thumbs, and stick to our priority use case, the viewer, in our tests of all our alternatives from here on out.

And then, Cantaloupe did pretty well, so long as it had a powerful and RAM-ful enough EC2.

Cantaloupe lets you configure caching separately for originals and derivatives, but even with no caching turned on, it was somehow doing noticeably better than riiif. Max 1-2 second wait times even with 3-4 simultaneous viewers. I’m not really sure how it pulled off doing better, but it did!

Our largest image is a whopping 1GB TIFF. When we asked cantaloupe to generate derivatives for that, it unfortunately crashed with a Java OOM, and was then unresponsive until it was manually restarted. Once we gave the server and cantaloupe more RAM, it handled that fine too (although our EC2 was getting more expensive). We hadn’t quite figured out how to approach defining how many simultaneous viewers we needed to support and how much EC2 was necessary to do that before moving on to other alternatives.

We started out running cantaloupe on an EC2 c4.xlarge (4 core Xeon E5 and 7.5 GB RAM), and ended up with a m4.2xlarge (8 core and 32 GB RAM) which could handle our 1G image, and generally seemed to handle load better with lower latency.  We used the JAI image processor for Cantaloupe. (It seemed to perform better than the Java2D processor; Cantaloupe also supports ImageMagick or GraphicsMagick as a processor, but we didn’t get to trying those out much).


Auth

If you’re not using riiif, and have images meant to be only available to certain/all logged-in users, you need to think about auth. With any external image server, you could do auth by proxying all access through your rails app, which would check auth in the usual way. But I worry about taking up web worker processes/threads in the main app with dozens of image requests. It would be better to keep the servers entirely separate.

There also might be a way to have apache/nginx proxying directly, rather than through the rails app, which would make me more comfortable, but you’d have to figure out how to use a custom auth plugin for apache or nginx.

Cantaloupe also has the very nice option of writing your own custom auth in ruby (even though the server is Java; thanks JRuby!), so we could actually check the existing Rails session (just make sure the cantaloupe server knows your Rails secret key, and figure out the right classes to call to decrypt and load data from the session cookie), and then Fedora/Solr to check auth in the usual samvera way.

Any live checking of auth before delivering an image tile is of course going to increase image response latency.

These were the options we thought of, but we didn’t get to any of them before ultimately deciding to choose pre-generated tile images.

However, Cantaloupe was definitely our second choice (and first choice if we really were to need a full IIIF server). It for sure looked like it could have worked well, although at potentially somewhat expensive AWS charges.

imgix

Imgix is a commercial cloud-hosted image server. I had a good opinion of it from using it for thumbnail-generation on ecommerce projects while working at Friends of the Web last year. Imgix pricing is pretty affordable.

Imgix does not conform to IIIF API, but it does do pretty much all the image operations that IIIF can do, plus more. Definitely everything we needed for an OpenSeadragon tile source.

I figured, let’s give it a try, get out of the library/cultural-heritage silo, and use a popular, successful, well-regarded commercial service operating in the general web app space.

OpenSeadragon cannot use imgix as a tile source out of the box, but OSD makes it pretty easy to write your own tile source. In a day I had a working proof of concept for an OSD imgix tile source, and in a couple more had filed off all the rough edges.

It totally worked. But. It was slow. Imgix was willing to take our 100MB TIFF sources, but it was clear this was not really the use case it was designed for. It was definitely slow downloading our original sources from our web app: the difference, I guess, between downloading directly from fedora on the same AWS subnet, and downloading via our Rails app from who knows where. (I did have to make a bugfix/improvement to samvera code to make sure HTTP headers were delivered quicker for a streaming download, to avoid timing out imgix. Once that was done, no more imgix timeout problems.) We tried pinging it to “pre-load” all originals as we had been doing with riiif, but as a cloud service, and one not expecting originals to be so huge, we had no control over when imgix purged originals from cache, and we found it did sometimes purge not-recently-accessed originals fairly quickly.

Also imgix has a (not really unreasonable) 512MB max for original images; our one 1G TIFF was not gonna be possible (and of course, that’s the one you really want a pan-and-zoom viewer for, it’s huge!).

On the plus side:

  • with imgix, we don’t need to worry about the infrastructure at all, it’s outsourced. We don’t need to plan some redundancy for max uptime or scaling for heavy use, they’re already doing it.
  • The response times are unlikely to change under heavy use, it’s already a cloud-scale service designed for heavy use.
  • Can handle the extra load of using it for thumbs too, just as well as it can for viewer tiles.
  • Our cost estimates had it looking cheaper (by 50%+) than hosting our own Cantaloupe on an EC2.
  • Once originals and derivatives being accessed (say tiles for a given viewer) were cached, it was lightning fast, reliably just 10s of ms for a tile image. But again, you have no control over how long these stay in cache before being purged.

For non-public images, imgix offers signed-and-expiring URLs. The downside of these is you kind of lose HTTP cacheability of your images, since the URL changes whenever the expiration changes. And imgix doesn’t provide any way to authenticate itself when fetching your originals, so if they’re not public you would have to put in some filters recognizing imgix’s IP addresses (which are subject to change, although they’re good at giving you advance notice), and let them in to private images.
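To show why signed-and-expiring URLs cost you cacheability, here is a generic sketch of the technique using an HMAC. To be clear, this is not imgix’s actual signing scheme or API; the parameter names (`expires`, `s`), the secret, and the path are all made up for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

def _signature(path, expires_at, secret):
    message = f"{path}?expires={expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def sign_url(path, secret, expires_at):
    """Append an expiry timestamp and a signature covering path + expiry."""
    query = urlencode({"expires": expires_at, "s": _signature(path, expires_at, secret)})
    return f"{path}?{query}"

def verify_url(url, secret, now):
    """Accept only an untampered URL whose expiry is still in the future."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        supplied = params["s"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, expires_at, secret)
    return hmac.compare_digest(supplied, expected) and now < expires_at

secret = b"hypothetical-shared-secret"
url_a = sign_url("/tiles/page-0001.tif", secret, expires_at=1_500_000_600)
url_b = sign_url("/tiles/page-0001.tif", secret, expires_at=1_500_001_200)
print(url_a != url_b)  # same image, different URL each time the expiry moves
```

Because browsers and CDNs key their caches on the full URL, every fresh expiry produces a cache miss for what is really the same image. One common mitigation is to round the expiry to a coarse time bucket so URLs stay stable for a while.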

But ultimately the latency was just too high. For images where the originals were cached but not the derivatives, it could take 1-4 seconds to get our tile derivatives; if the originals were not cached, it could take 10 seconds or more. (One option would be trying to give it a much smaller lzw or zip compressed TIFF as a source, instead of our uncompressed originals, cutting down transfer time for fetching originals. But I think this would be unlikely to improve latency sufficiently, and we moved on to pre-generated DZI. We would need to give imgix a lossless full-res original of some kind, because full-res zoom is the whole goal here!)

I think imgix is potentially a workable last resort (and I still love it for creating thumbs for more reasonably sized sources), but it wasn’t as good an option as other alternatives explored for this use case, a tile source for enormous TIFFs.

Pre-Generated Deep Zoom Tiles

Eventually we came back to an earlier idea we originally considered, but then assumed would be too expensive and abandoned/forgot about. When I realized Cantaloupe was recommending pyramidal TIFFs, which require some preprocessing/prerendering anyway, I figured: why not go all the way and preprocess/prerender every tile, and store them somewhere (say, cheap S3)? OpenSeadragon has a number of tile sources it supports that are or can be pre-generated images, including the first format it ever supported, Deep Zoom Image (file suffix .dzi). (I had earlier done a side-project using OpenSeadragon and Deep Zoom tiles to put the awesome Beehive Collective Mesoamerica Resiste poster online).

But then we learned about riiif and it looked cool and we got on that tip, and kind of assumed pre-generating so many images would be unworkable. It took us a while to get back to exploring pre-generated Deep Zoom tiles.  But it actually works great (of course we had learned a lot of domain knowledge about manipulating giant images and tile sources at this point!).

We use vips (rather than imagemagick or graphicsmagick) to generate all the tiles. vips is really optimized for speed, CPU and RAM usage, and if you’re creating all the tiles at once vips can read the source image in once and slice it up into tiles.  We do this as a background job, that we have running on a different server than the Rails app; the built-in sufia bg jobs still run on our single rails app server. (In sufia 7.3, out of the box they can’t run on a different server without a shared file system; I think this may have been improved in as-yet-unreleased-hyrax-master).
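For a sense of what “all the tiles” means, here is a small Python sketch of the Deep Zoom pyramid: each level halves the dimensions of the one above it, down to a single pixel, and each level is cut into fixed-size tiles stored as `{name}_files/{level}/{col}_{row}.jpg`. The 256-pixel tile size is an assumption for illustration and the sketch ignores tile overlap, so real counts will depend on the settings you give vips.

```python
import math

def dzi_levels(width, height, tile_size=256):
    """Return (level, width, height, tile_count) for every level of a Deep Zoom pyramid."""
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)  # the top level is full resolution
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        tiles = math.ceil(w / tile_size) * math.ceil(h / tile_size)
        levels.append((level, w, h, tiles))
    return levels

def tile_key(name, level, col, row, fmt="jpg"):
    """Storage key for one tile, following the Deep Zoom directory layout."""
    return f"{name}_files/{level}/{col}_{row}.{fmt}"

levels = dzi_levels(7500, 4900)
for level, w, h, tiles in levels[-4:]:
    print(f"level {level}: {w} x {h} px, {tiles} tiles")
print("total tiles:", sum(tiles for *_, tiles in levels))
print(tile_key("page-0001", 13, 0, 0))  # hypothetical image name
```

Most of the tiles live in the top two or three levels, which is why generating everything in one pass over the source image, as vips does, is so much cheaper than answering each tile request separately.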

We hook into the sufia/hyrax actor stack to trigger the bg job on file upload. A small EC2 (t2.medium, 4 GB RAM, 2 core CPU) with five resque workers can handle the dzi creation much faster than the existing actor stack can trigger them when doing a bulk ingest. (The existing actor stack is slow, and the dzi creation can’t happen until the file is actually in fedora, so that the job can retrieve it from fedora to make it into dzi tiles. So DZIs can’t be generated any faster than sufia can do the ingests no matter what.) The files are uploaded to an S3 bucket.

We also provide a rake task to create the .dzi files for all Files in our fedora repo, for initial bootstrapping or if corruption ever needs to be fixed, etc. For our 8000-file staging server, running the creation task on our EC2 t2.medium, it takes around 7 hours to create and upload them all to S3 (I use some multi-threaded concurrency in the uploading), and results in ~3.2 million S3 keys taking up approx 32GB.
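Breaking those totals down per file makes them easier to reason about. The sketch below just divides the numbers quoted above; the S3 price per GB-month is an assumed figure plugged in for illustration (check current pricing for your region), and request and transfer charges are ignored.

```python
def summarize(files, s3_keys, total_gb, hours, usd_per_gb_month):
    """Per-file averages derived from the bulk-run totals (decimal units)."""
    return {
        "keys_per_file": s3_keys / files,
        "kb_per_key": total_gb * 1_000_000 / s3_keys,
        "mb_per_file": total_gb * 1000 / files,
        "seconds_per_file": hours * 3600 / files,
        "storage_usd_per_month": total_gb * usd_per_gb_month,
    }

stats = summarize(
    files=8000,
    s3_keys=3_200_000,
    total_gb=32,
    hours=7,
    usd_per_gb_month=0.023,  # ASSUMED S3 standard storage price, not from our bill
)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

So on average each original turns into about 400 small objects totalling around 4MB, takes a few seconds to process and upload, and the whole tile set costs on the order of a dollar a month to store under that assumed price.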

Believe it or not, this is actually the cheapest option, taking account of S3 storage and our bg jobs EC2 instance for dzi creation (that we’ll probably try to move more bg jobs to in the future). Cheaper than imgix, cheaper than our own Cantaloupe on an EC2 big enough to handle it.

If you have 800K or 8 million images instead of 8000, it’ll get more complicated/expensive. But S3 is so cheap, and a spot-priced temporary fleet of EC2s to do a mass dzi-creation ingest is cheap enough, that you might be surprised how affordable it is. Alas fedora makes it a lot less straightforward to parallelize ingest than if it were a more conventional stack, but there’s probably some way to do it. Unless/until fedora itself becomes your bottleneck. There are costs to our stack.

It handles our 1GB original source just fine (takes about 30 seconds to create all tiles for this one). It’s also definitely fast for the end-user. Once the tiles are pre-generated, it’s just a request from S3, which I’m seeing take around 40-80ms in Chrome Dev Tools. Under really heavy load (I’m guessing 100+ users concurrently using the viewer), or to reduce latency beyond that 40-80ms, the standard approach would be to put a CDN in front of S3, probably either Amazon’s own CloudFront, or CloudFlare. This should be simple and affordable. But since this is already reliably faster than any of our other options, and can handle way more concurrent load without a CDN than they can, we aren’t worrying about it for now. When/if we want to add a CDN, it oughta only be a couple clicks and a hostname change to implement.

And of course, there’s very little server maintenance to deal with, once the files are generated, they’re just static assets on S3, there’s nothing to “crash” really. (Well, except S3 itself, which happens very occasionally. If you wanted to be very safe, you’d mirror your S3 bucket to another AWS region or another cloud provider). Just one pretty standard and easily-scalable bg-job-running server for creating the DZIs on image upload.

We’re still punting on auth for now. From talking on the slack channel, that seems to be a common choice with auth and IIIF image servers. One dev told me they just didn’t allow non-public images to be viewed in the image viewer (or via their image server) at all, which I guess works if your non-public images are all really just in-progress-in-workflow and only viewable to staff. As is true for us here. Another dev told me they just don’t worry about it: no links will be generated to non-public images, but they’ll be there via the image server without auth. That again works if your non-public images aren’t actually sensitive or legally un-shareable, they’re just in-process-not-quite-ready. Which is also true for us, for now anyway. (I would not ever count on “nobody knows the URL”-based-security for actual sensitive or legally un-shareable content, for anything where it actually matters if someone happens to come across it. For our current and foreseeable future content, it doesn’t really. Knock on wood. It does make me nervous!).

There are some auth options with the S3 approach, read about them as well as some internal overview documentation of what we’ve done in our code here, or see our PR  for initial implementation of the pre-generated-DZI-on-S3 approach for our complete solution.  Pre-generated DZI on S3 is indeed the approach we are going with.

IIIF vs Not

Some readers may have noticed that two of the alternatives we examined are not IIIF servers, and the one we ended up with — pre-generated DZI tiles — is not a dynamic image server at all. You may be reacting with shocked questions: You can do that? But what are you missing? Can you still use UniversalViewer? What about those other IIIF things?

Well, the first thing we miss is truly dynamic image generation. We instead need to pre-generate all the image derivatives we need upon image upload, including the DZI tiles. If we wanted a feature like, say, user enters a number of pixels and we deliver a JPG scaled to the user-specified width, a dynamic image server would be a lot more convenient. But I only thought of that feature when brainstorming for things that would be hard without a dynamic image server; it’s not something we are likely to prioritize. For thumbs and downloads at various preset sizes, pre-generating should work just fine with regards to performance and cost, especially with a bg job running on its own jobs server and derivatives stored on S3 (neither happens out of the box on sufia, but may in latest unreleased-master hyrax).

So, UniversalViewer. UniversalViewer uses OpenSeadragon to do the actual pan-and-zoom viewer.  Mirador  seems to as well. I think OpenSeadragon is pretty much the only viable open source JS pan-and-zoom viewer, which is fine, because OSD is pretty great.  UV, I believe, just wraps OSD in some additional UI/UX, with some additional features like table of contents viewing, downloads, etc.

We decided, even when we were still planning on using riiif, to not use UniversalViewer but instead develop directly with OpenSeadragon. Some of the fancier UV features we didn’t really need right now, and it was unclear if it would take more time to customize UV UX to our needs, or just develop a relatively light-weight UI of our own on top of OSD.

As these things do, our UI took slightly longer to develop than estimated, but it was still relatively quick, and we’re pretty happy with what we’ve got. It is still subject to changes as we continue to user-test, but developing our own gives us a lot more control of the UI/UX to respond to such. Later that control turned out to be useful in other, non-visual UX ways too: in our DZI implementation, we put something in our front-end that, if the dzi file is not available on S3, automatically degrades to a smaller not-very-zoomable image with an apology/warning message. I’m not sure if I would have wanted to try and hack that into UV.

So using OpenSeadragon directly, we don’t need to give it an IIIF Image API URL; we can give it anything it handles (or that you write a plugin for), and it works just fine. It required no extra work in our front-end to use DZI instead of IIIF, and no code changes except giving it a URL pointing to a different thing. (We did do some extra work to add some feature toggles so we could switch between various back-ends easily). The format of your tile source, so long as OSD can handle it, is a very loosely coupled dependency.

But what if you want to use UV or Mirador? (And we might in the future ourselves, if we need features they provide and we discover they are non-trivial to develop in our homegrown UI).  They take IIIF as input, right?

To be clear, we need to distinguish between the IIIF Image API (the one where a server provides image derivatives on demand), and the IIIF Manifest spec. The Manifest spec, part of the IIIF Presentation API, defines a JSON-ld file that “represents a single object and any intellectual work or works embodied within that object…  includes the descriptive, rights and linking information for the object… embeds the sequence(s) of canvases that should be rendered to the user.”

It’s the IIIF Manifest that is input to UV or Mirador. Normally these tools would extract one or more IIIF Image API URLs out of the Manifest, and just hand them to OpenSeadragon. Do they do anything else with an IIIF Image API url except hand it to OSD? I’m not sure, but I don’t think so. So if they just handed any other URI that OSD can handle to OSD, it should work fine? I think so.

An IIIF Manifest doesn’t actually need to include an IIIF Image API url.  “If a IIIF Image API service is available for the image, then a link to the service’s base URI should be included.” If. And an IIIF Manifest can include any other sort of image resource, from any external service,  identified by a uri in @context field.  So you can include a link to the .dzi file in the IIIF Manifest now, completely legally, the same IIIF Manifest you’d do otherwise just with a .dzi link instead of an IIIF Image API link — you’d just have to choose a @context URI to identify it as a DZI. Perhaps ``, although that might not be the most reliable URI identifier. But, really, we could be just as standards-compliant as ever and put a DZI URL in the IIIF Manifest instead of an IIIF Image API URL.

Of course, as with all linked data work, standards-compliant doesn’t actually make it interoperable. We need mutually-recognizable shared vocabulary. UV or Mirador would have to recognize the image resource URL supplied in the Manifest as being a DZI, or at any rate at least as something that can be passed to OSD. As far as I know UV or Mirador won’t do this now. It should probably be pretty trivial to get them to, though, perhaps by supporting configuration for “recognize this @context uri as being something you can pass to OSD.”  If we in the future have need for UV/Mirador (or IIIF Manifests), I’d look into getting them to do that, but we don’t right now.

What about these other tools that take IIIF Manifests and aggregate images from different sites?  Probably the same deal, they just gotta recognize an agreed-upon identifier meaning “DZI format”, and know they can pass such to OpenSeadragon.

I’m not sure if any such tools currently exist used for real work or even real recreation, rather than as more of a demo. I’ll always choose to achieve greatness rather than mediocrity for our current actual real prioritized use cases, above engineering for a hoped-for-but-uncertain future. Of course, when you can do both without increasing expense or sacrificing quality for your present use cases, that’s great too, and it’s always good to keep an eye out for those opportunities.

But I’m feeling pretty good about our DZI choice at the moment. It just works so well, cheaply, with minimal expected ongoing maintenance, compared to other options — and works better for end-users too, with reliable nearly instantaneous delivery of tiles even under heavy load. Now, if you have a lot more images than us, the cost-benefit calculus may end up different. Especially because a dynamic image server scales (gets more expensive) with number of concurrent users/requests more or less regardless of number of images, while the pre-gen DZI solution gets more expensive with more images more or less regardless of concurrent request level. If you have a whole lot of images (say two orders of magnitude bigger than our 10K), your app typically gets pretty low use (and you don’t care about it supporting lots of concurrent use), and maybe additionally if your original source images aren’t nearly as large as ours, pre-gen DZI might not be such a sweet spot. However, you might be surprised, pre-generated DZI is in the end just such a simple solution, and S3 storage is pretty affordable.

“Small functions considered harmful”

From a blog post by Cindy Sridharan.

Remind you of any codebases you’ve worked with lately?

Some people seem so enamored with small functions that the idea of abstracting any and every piece of logic that might seem even nominally complex into a separate function is something that is enthusiastically cheered on.

I’ve worked on codebases inherited from folks who’d internalized this idea to such an unholy extent that the end result was pretty hellish and entirely antithetical to all the good intentions the road to it was paved with. In this post, I hope to explain why some of the oft-touted benefits don’t always pan out the way one hopes and the times when some of the ideas can actually prove to be counterproductive.

I think blindly following Rubocop’s dictatorial micro-advice without a human thinking about the macro-level and “does this make the code more readable/maintainable” (and “what are the use-cases for flexibility, what dimensions do we expect to change or be changed? And how do we provide for that?”) can contribute to this.

My main problem with DRY is that it forces one into abstractions — nested and premature ones at that. Inasmuch as it’s impossible to abstract perfectly, the best we can do abstract well enough insofar as we can. Defining “well enough” is hard and is contingent on a large number of factors, some of them being:

— the nature of the assumptions underpinning the abstraction and how likely (and for how long) they are likely to hold water
— the extent to which the layers of abstractions underlying the abstraction in question are prone to remain consistent (and correct) in their implementation and design
— the flexibility and extensibility of both the underlying abstractions as well as any abstraction built on top of the abstraction in question currently
— the requirements and expectations of any future abstractions that might be built on top of the abstraction in question

…DRYing up code to the fullest extent possible right now would mean depriving our future selves of the flexibility to accommodate any changes that might be required. It’s akin to trying to find the perfect fit, when what we really should be optimizing for is to allow ourselves enough leeway to make the inevitable changes that will be required sooner or later.

Or as Sandi Metz has said, “duplication is far cheaper than the wrong abstraction”.

As a result, the cognitive overhead of processing the verbose function (and variable) names, mapping them into the mental model I’ve been building so far, deciding which functions to dig deeper into and which to skim, and piecing together the puzzle to uncover the “big picture” becomes rather difficult.

…This has already been stated before but it bears reiterating — an explosion of small functions, especially one line functions, makes the codebase inordinately harder to read. This especially hurts those for whom the code should’ve been optimized for in the first place — newcomers….

…Simple code isn’t necessarily the easiest code to write, and rarely is it ever the DRYest code. It takes an enormous amount of careful thought, attention to detail and care to arrive at the simplest solution that is both correct and easy to reason about. What is most striking about such hard-won simplicity is that it lends itself to being easily understood by both old and new programmers, for all possible definitions of “old” and “new”.

Actually one of the best essays on code architecture I’ve seen, and one that matches my own experiences. I recommend reading the whole thing.

on hooking into sufia/hyrax after file has been uploaded

Our app (not yet publicly accessible) is still running on sufia 7.3. (A digital repository framework based on Rails, also known in other versions or other drawings of lines as hydra, samvera, and hyrax).

I had a need to hook into the point after a file has been added to fedora, to do some post-processing at that point.

(Specifically, we are trying to run a riiif instance on another server, without a shared file system (shared file systems are expensive and/or tricky on AWS). So the riiif server needs to copy the original image asset down from fedora. Since our original images are uncompressed TIFFs that average around 100MB, this is somewhat slow, and we want the riiif server to “pre-load” at least the originals, if not the derivatives it will create. So after a new image is uploaded, we want to ‘ping’ the riiif server with an info request, causing it to download the original, so that the file is already there waiting for conversion requests and at least that download step is out of the way. But it can’t pull down the file until it’s in fedora, so we need to wait until after fedora has it to ping. Phew.)
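A minimal sketch of what that ‘ping’ might look like, assuming the riiif server exposes the standard IIIF Image API `info.json` endpoint. The base URL, the mount path, and the method names `riiif_info_url` and `ping_riiif` are all hypothetical, made up for illustration; the real path depends on how riiif is mounted in the app.

```ruby
require "net/http"
require "uri"

# Hypothetical base URL for the separate riiif server (an assumption).
RIIIF_BASE_URL = "https://riiif.example.org/image-service"

# Build the IIIF info.json URL for a file id. The id is percent-encoded
# so that ids containing slashes still form a single path segment.
def riiif_info_url(file_id, base_url: RIIIF_BASE_URL)
  "#{base_url}/#{URI.encode_www_form_component(file_id)}/info.json"
end

# Issue the info request. The response body is not needed here; the
# request is only meant to prompt the riiif server to fetch the
# original from fedora. Returns true on a 2xx response.
def ping_riiif(file_id, base_url: RIIIF_BASE_URL)
  response = Net::HTTP.get_response(URI(riiif_info_url(file_id, base_url: base_url)))
  response.is_a?(Net::HTTPSuccess)
end
```

Since the originals are large, a call like `ping_riiif` would probably belong in a background job rather than inline in the request cycle.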

Here are the cooperating objects in Sufia 7.3 that lead to actual ingest in Fedora. As far as I can tell. Much thanks to @jcoyne for giving me some pointers as to where to look to start figuring this out.

Keep in mind that I believe “actor” is just hydra/samvera’s name for a service object involved in handling ‘update a persisted thing’. Don’t get it confused with the concurrency notion of an ‘actor’, it’s just an ordinary fairly simple ruby object (although it can and often does queue up an ActiveJob for further processing).

The sufia default actor stack at ActorFactory includes the Sufia::CreateWithFilesActor.


  • AttachFilesToWork job does some stuff, but then calls out to a CurationConcerns::Actors::FileSetActor#create_content (we are using curation_concerns 1.7.7 with sufia 7.3), at least if it was a direct file upload (I think that is what this means). If the file was a `CarrierWave::Storage::Fog::File` (not totally sure in what circumstances it would be), it instead kicks off an ImportUrlJob. But we’ll ignore that for now; I think the FileSetActor is the one my codepath is following.





  • We are using hydra-works 0.16.0. AddFileToFileSet I believe actually finishes things off synchronously without calling out to anything else related to ‘get this thing into fedora’. Although I don’t really totally understand what the code does, honestly.
    • It does call out to Hydra::PCDM::AddTypeToFile, which is confusingly defined in a file called add_type.rb, not add_type_to_file.rb. (I’m curious how that doesn’t break things terribly, but didn’t look into it).


So in summary, we have six cooperating objects involved in following the code path of “how does a file actually get added to fedora”. They go across 3-4 different gems (sufia, curation_concerns, hydra-works, and maybe hydra-pcdm, although that one might not be relevant here). Some of the classes involved inherit from, mix in, or have references to classes from other gems. The path involves at least two background jobs (sometimes more in some paths?): a background job that queues up another background job (and maybe more).

That’s just trying to follow the path involved in “get this uploaded file into fedora”. Some of those cooperating objects also call out to other cooperating objects (and maybe queue background jobs?) to do other things, involving a half dozenish additional cooperating objects and maybe one or two more gem dependencies. But I didn’t trace those, this was enough!

I’m not certain how much this changed in hyrax (1.0 or 2.0). At the very least there’d be one fewer gem dependency involved (since Sufia and CurationConcerns were combined into Hyrax). But I kind of ran out of steam for comparing and contrasting here, although it would be good to prepare for the future with whatever I do.

Oh yeah, what was I trying to do again?

Hook into the point “after the thing has been successfully ingested in fedora” and put some custom code there.

So… I guess… that would be hooking into the ::IngestFileJob (located in CurationConcerns), and doing something after it’s completed. It might be nice to use the ActiveJob after_perform callback for this. I hadn’t known about that callback and haven’t used it before. We’d need to get at least the file_set arg passed into it, which the docs say maybe you can get from the passed-in job.arguments? That’s a weird way to do things in ruby (why doesn’t an ActiveJob instance just hold its arguments as ordinary instance state? I dunno), but okay! Or, of course, we could just monkey-patch perform, override-and-call-super style, to get a hook.

Or we could maybe hook into Hydra::Works::AddFileToFileSet instead, since I think that does the actual work. There are no callbacks there, so that’d just be monkey-patch-and-call-super on #call, I guess.
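To make “override-and-call-super” concrete, here is a minimal plain-ruby sketch using Module#prepend. The `IngestFileJob` below is a stand-in defined only so the example runs on its own, with a simplified `perform` signature that is an assumption; in the app it would be the class shipped by curation_concerns. `AfterIngestHook` and its `CALLS` log are hypothetical names.

```ruby
# Stand-in for the upstream job class (NOT the real implementation;
# the real class comes from the gem and takes more arguments).
class IngestFileJob
  def perform(file_set, filepath)
    "ingested #{filepath} into #{file_set}"
  end
end

# Hypothetical local hook module. Because it is prepended, its #perform
# runs first, and `super` dispatches to the original method with the
# same arguments.
module AfterIngestHook
  CALLS = [] # records which file_sets the hook has seen

  def perform(file_set, *rest)
    result = super      # run the original ingest first
    CALLS << file_set   # placeholder for the real post-processing
    result              # preserve the original return value
  end
end

IngestFileJob.prepend(AfterIngestHook)
```

The same shape should apply to a class-level method like a `call` on AddFileToFileSet, by prepending onto the singleton class instead, but I haven’t tried that against the real gem.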

This definitely seems a little bit risky, for a couple different reasons.

  • There’s at least one place where a potentially different path is followed, if you’re uploading a file that ends up as a CarrierWave::Storage::Fog::File instead of a CarrierWave::SanitizedFile.  Maybe there are more I missed? So configuration or behavior changes in the app might cause my hook to be ignored, at least in some cases.


  • Forward-compatibility seems unreliable. Will this complicated graph of cooperating instances get refactored? Has it already in future versions of Hyrax? If it gets refactored, will it mean the object I hook into no longer exists (not even with a different namespace/name), or exists but isn’t called in the same way? In some of those failure modes, it might be an entirely silent failure where no error is ever raised, and the code I’m trying to insert just never gets called. Which is sad. (Sure, one could try to write a spec for this to warn you… think about how you’d do that. I still am.) Between IngestFileJob and AddFileToFileSet, is one ‘safer’ to hook into than the other? Hard to say. Researching the hyrax master branch might give me some clues.
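One partial guard against the silent-failure case, sketched in plain ruby: check at boot (or in a spec) that the class and method being hooked still exist, and raise loudly if not. The helper name `assert_hook_target!` and the `StandInIngestJob` class are hypothetical. Note the limits: this only catches “the target was removed or renamed”, not “the target still exists but is no longer on the code path”.

```ruby
# Stand-in class so the sketch runs on its own; in the app the target
# would be a class supplied by one of the gems.
class StandInIngestJob
  def perform(file_set); end
end

# Hypothetical helper: look up the constant by name and confirm the
# instance method we intend to wrap is still defined. Raises NameError
# if the constant is gone, RuntimeError if the method is gone.
def assert_hook_target!(const_name, method_name)
  klass = Object.const_get(const_name)
  unless klass.method_defined?(method_name) || klass.private_method_defined?(method_name)
    raise "#{const_name}##{method_name} no longer exists; the hook would never run"
  end
  klass
end

assert_hook_target!("StandInIngestJob", :perform)
```

Catching the “exists but never called” case probably needs an integration-style spec that actually uploads a file and asserts the hook fired, which is slower and more brittle.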

I guess I’ll probably still do one of these things, or find another way around it. (A colleague suggested there might be an entirely different place to hook into instead, not the ‘actor stack’, but maybe in other code around the controller’s update action).

What are the lessons?

I don’t mean to cast any aspersions on the people who put in a lot of work, very well-intentioned work, conscientious work, to get hydra/samvera/sufia/hyrax where it is, being used by lots of institutions. I don’t mean to say that I could or would have done differently if I had been there when this code was written — I don’t know that I could or would have.

And, unfortunately, I’m not saying I have much idea of what changes to make to this architecture now, in the present environment, given backwards-compatibility concerns and given that I’m still on code one or two major versions (and a name change) behind current development. That lag makes the local benefit from any work I put into careful upstream PRs a lot more delayed, for a lot more work. I’m not alone here: there’s a lot of dispersion in what versions of these shared dependencies people are using, which adds a lot of cost to our shared development. So no, I don’t really have an answer! My brain is pretty tired after investigating what it’s already doing. Trying to make a shared architecture which is easily customizable like this is hard, no way around it. (ActiveSupport::Callbacks are trying to do something pretty analogous to the ‘actor stack’, and are one of the most maligned parts of Rails.)

But I don’t think that should stop us from some evaluation. Going forward, making architecture that works well for us is aided immensely by understanding how what we’ve done before has worked out.

If the point of the “Actor stack” was to make it easy/easier to customize code in a safe/reliable way (meaning reasonably forward-compatible)–and I believe it was–I’m not sure it can be considered a success. We gotta start with acknowledging that.

Is it better than what it replaced? I’m not sure, I wasn’t there for what it replaced. It’s probably harder to modify in the shared codebase going forward than the presumably simpler thing it replaced, though. I can say I’d personally much rather have just one or two methods, or one ActiveJob, that I just hackily monkey-patch to do what I want, and that, if it breaks in a future version, will break in a simple way, or at least take less time and brain to figure out. That wouldn’t be a great architecture, but I’d prefer it to what’s there now, I think. Of course, it’s a pendulum, and the grass is always greener: if I had that, I’d probably be wanting something cleaner, and maybe arrive at something like the ‘actor stack’. But we’re all here now with what we’ve got, so we can at least consider that this may have gone in some unuseful directions.

What are those unuseful directions? I think, not just in the actor stack, but in many parts of hydra, there’s an ethos that breaking things into many very small single-purpose classes/instances is the way to go, then wiring them all together. Ideally with lots of dependency injection so you can switch ’em in and out. This reminds me of the stereotypical, much-maligned architecture of the Java community that people often satirize and demonize, and there’s a reason it’s satirized and demonized. It doesn’t… quite work out.

To pull this off well, you’ve got to have fairly stable APIs, which are also consistent and easily comprehensible and semantically reasonable. That goes especially for a shared library/gem codebase, which I think has different considerations than a local bespoke codebase, mainly that API stability is more important because you can’t just search-and-replace all consumers in one codebase when the API changes. With stable APIs you can replace or modify one part, and have some confidence you know what it’s doing, when it will be called, and that it will keep doing this for at least a few months of future versions. To have fairly stable and comfortable APIs, you need to actually design them carefully, and think about developer use cases. How are developers intended to intervene in here to customize? And you’ve got to document those. And those use cases also give you something to evaluate later: did it work for those use cases?

It’s just not borne out by experience that if you make everything into as small single-purpose classes as possible and throw them all together, you’ll get an architecture which is infinitely customizable. You’ve got to think about the big picture. Simplicity matters, but simplicity of the architecture may be more important than simplicity of the individual classes. Simplicity of the API is definitely more important than simplicity of internal non-public implementation. 

When in doubt about whether you’ve got a solid, stable, comfortable API, fewer cooperating classes with clearly defined interfaces may be preferable to more classes that each only have a few lines. In this regard, rubocop-based development may steer us wrong: too much attention to the micro level, not enough to the forest.

To do this, you’ve got to be careful, and intentional, and think things through, and consider developer use cases, and maybe go slower and support fewer use cases. Or you wind up with an architecture that not only does not easily support customization, but is very hard to change or improve, because there are so many interrelated, coupled, cooperating parts that changing any of them requires changes to lots of them, and breaks lots of dependent code in local apps in the process. You can actually make forwards-compatible-safe code harder, not easier.

And this gets even worse when the cooperating objects in a data flow are spread across multiple gem dependencies, as they often are in the hydra/samvera stack. If a change in one requires a change in another, now you’ve got dependency compatibility nightmares to deal with too, making it even harder (rather than easier, as was the original goal) for existing users to upgrade to new versions of dependencies, as well as harder to maintain all these dependencies. It’s a nice idea, small dependencies which can work together, but again, it only works if they have very stable and comfortable APIs. Which again requires care and consideration of developer use cases. (Just as the Java community gives us a familiar cautionary lesson about over-architecture, I think the Javascript community gives us a familiar cautionary lesson about ‘dependency hell’. The path to code hell is often paved with good intentions.)

The ‘actor stack’ is not the only place in hydra/samvera that suffers from some of these challenges, as I think most developers in the stack know. It’s been suggested to me that one reason there’s been a lack of careful, considered, intentional architecture in the stack is pressure from institutions and managers to get things done: “why are you spending so much time without new features?” (I know from personal experience that this pressure, despite the best intentions, can be even stronger when working as a project-based contractor, and much of the stack was written by those in that circumstance.)

If that’s true, that may be something that has to change. Either a change to those pressures — or resisting them by not doing rearchitectures under those conditions. If you don’t have time to do it carefully, it may be better not to commit the architectural change and new API at all.  Hack in what you need in your local app with monkey-patches or other local code instead. Counter-intuitively, this may not actually increase your maintenance burden or decrease your forward-compatibility!  Because the wrong architecture or the wrong abstractions can be much more costly than a simple hack, especially when put in a shared codebase. Once a few people have hacked it locally and seen how well it works for their use cases, you have a lot more evidence to abstract the right architecture from.

But it’s still hard!  Making a shared codebase that does powerful things, that works out of the box for basic use cases but is still customizable for common use cases, is hard. It’s not just us. I worked last year with spree/solidus, which has an analogous architectural position to hydra/samvera, also based on Rails, but in ecommerce instead of digital repositories. And it suffers from many of the same sorts of problems, even leading to the spree/solidus fork, where the solidus team thought they could do better… and they have… maybe… a little.  Heck, the challenges and setbacks of Rails itself can be considered similarly.

Taking account of this challenge may mean scaling back our aspirations a bit, and going slower. It may not be realistic to think you can be all things to all people. It may not be realistic to think you can make something that can be customized safely by experienced developers and by non-developers just writing config files (that last one is a lot harder). Not every use case a participant or would-be participant has can be officially or comfortably supported by the codebase. Use cases and goals have to be identified, lines have to be drawn. Which means there has to be a decision-making process for who draws them and how they are drawn, re-drawn, and adjudicated too, whether that’s a single “benevolent dictator” person or institution like many open source projects have (for good or ill), or something else. (And it’s still hard to do that, it’s just that there’s no way around it.)

And finally, the touchiest evaluation of all for the hydra/samvera project. But the hydra project is 5-7 years old, long enough to evaluate some basic premises. I’m talking about the twin closely related requirements which have been more or less assumed by the community for most of the project’s history:

1) That the stack has to be based on fedora/fcrepo, and

2) that the stack has to be based on native RDF/linked data, or even coupled to RDF/linked data at all.

I believe these were uncontroversial assumptions rather than entirely conscious decisions (edit 13 July, this may not be accurate and is a controversial thing to suggest among some who were around then. See also @barmintor’s response.), but I think it’s time to look back and wonder how well they’ve served us, and I’m not sure it’s well.  A flexible powerful out-of-the-box-app shared codebase is hard no matter what, and the RDF/fedora assumptions/requirements have made it a lot harder, with a lot more uncharted territory to traverse, best practices to be invented with little experience to go on, more challenging abstractions, less mature/reliable/performant components to work with.

I think a lot of the challenges and breakdowns of the stack are attributable to those basic requirements. Again, I’m really not blaming a lack of skill or competence on the part of the developers (and certainly not a lack of good intentions!). Looking at the ‘actor stack’ in particular, it would need to do much simpler things if it was an ordinary ActiveRecord app with paperclip (or better yet shrine); it would be able to lean harder on mature shared-upstream paperclip/shrine to do common file handling operations; it would have a lot less code in it, and less code is always easier to architect and improve than more code. And meanwhile, the actually realized business/institutional/user benefits of these commitments, now after several+ years of work put into it, are still unclear to me. If this is true or becomes the consensus, and an evaluation of the fedora/rdf commitments and foundation does not look kindly upon them… where does that leave us, with what options?