UAlberta had a sufia 6 app in production that was a pretty stock “institutional repository holding PDFs. Around Fall 2015, they started trying to “catch up” to sufia 7 with PCDM etc. — to get features they believed would make it easier to incorporate more ‘digital collections’ content, and to just avoid stale non-maintained dependencies.
In Summer 2017, after having spent almost two years trying to get on sufia 7, with mounting frustrations and still seeming far from the finish line — and after having hired a few non-library-archives-museum-experienced but experienced Rails developers — the University of Alberta libraries development team decided on a radical new approach. They decided it wasn’t clear what Sufia was giving them to justify the difficulty they were having with it. They started over, trying to keep things as close to “ordinary Rails” as possible.
At that time, Fedora still was an institutional requirement. So they didn’t toss out all of the samvera stack. They decided that they’d chop off the trunk as close to the bottom as they could while still getting tools for working with fedora, and to them that meant a hydra-works dependency, but few other hyrax dependencies. They basically started the app over.
Within about 6 months of that effort (Early spring 2018) with approximately two full-time developers, they were live with their app (jupiter repo), and have been happy with it so far. But they also still haven’t gotten to the originally planned content beyond the IR-type PDFs, the scanned monographs, newspapers, etc. And have had some developer turnover. (Hey, they’re currently hiring y’all).
The jupiter app implementation
My understanding of how their app works is based on just an hour conversation, plus a few hours spent looking at their source code and internal docs — I may get some things wrong!
Jupiter seems to be to be a pretty simple app, a fairly basic idea of an “institutional repository”. Most of the items are single PDFs, without children/members. The software does support some items being composed of a list of files — but no “child works”. The metadata seems relatively simple; some repeatable strings, but no nested/compound objects (ie, an attribute whose values are multi-property objects themselves). While there is some self-deposit, there is no complicated workflow, basically just an edit form.
The permissions model is relatively simple. Matt Barnett, a lead developer for much of this process (who was there for our conversation, but left the team soon after that) told me that originally some internal stakeholders had been pushing for a more complex permissions model. But knowing what a tar-pit trying to implement ACLs could be, Matt pushed back, and they ultimately implemented a simple model: There are owners who can edit the things they own, and admins who can edit everything, and that’s about it. By virtue of their campus SSO system, they got “shared accounts” for free, so people could log into a shared account when they needed to share edit privs.
They had been using hydra-deratives for their simple derivative needs (maybe just a single thumbnail for each PDF?), but when ActiveStorage, part of Rails, was released, they began switching the code to that (may or may not be merged into master/deployed in repo yet as this gets published).
Fedora is still there, modeled with hydra-works. The indexing to solr is whatever was built into hydra-works. They just wrote their own straightforward forms with simple_form. They also do a lot of CSV-based ingest, which they just wrote code for, like even sufia users would I think.
They use UUID primary keys.
Their app does index to solr — using the general ActiveFedora indexing methods, I think, solrizer and all. You can see that their indexer is pretty stock, it mostly just calls “super”.
All of their objects exist as ActiveRecord “draft” objects while they are being edited, through more or less ordinary Rails mechanisms. When they have multi-valued fields, they use postgres json arrays, rather than actual normalized schema (which would suggest a different table). I’m not sure what they need to do to get this to work with forms and controller updates. These active record objects seem to use something custom for collection memberships, rather than actual active record associations. So in these regards it’s not quite a totally ordinary activerecord modelling.
The objects have a life in activerecord, but are mirrored to fedora at certain life cycle points — I believe this is also what gets them into solr (using samvera/active-fedora solr indexing code). The public-facing front-end is based entirely on data from solr — but not using Blacklight, simply writing Rails code to issue queries and handle responses to Solr (with Rsolr I think).
A brief overview of their architecture, by Matt before he left, focusing especially on the persistence stuff (since that’s the least “rails”-y and most custom stuff), can be found in their public repo, here. Warning, it is pretty negative about samvera/sufia/active_fedora, gird yourself. You can see there they have done a couple custom local things to make the ActiveFedora objects and classes to some extent mimic ActiveRecord, to make using them in Rails easier, trying to encapsulate the fedora-specific stuff inside API boundaries. While at a high level this is what ActiveFedora’s goal is — their implementation is simpler, smaller, less powerful and custom-fit to what they need. We can just say they’re happier with how their local implementation turned out. They also explicitly wrote it to support potential future elimination of fedora altogether.
Matt said if he had to do it over, he might have pushed harder on stripping fedora out too, and just having everything in postgres. And that is something the team still plans to look at seriously for the future.
So what does “just a rails app” mean? And how do you deal with increased complexity of your requirements?
The most useful thing for me in the conversation was that Matt pushed back on my outline of a potential plan, saying I was still undertaking too much abstraction.
The U Alberta team suggested that I should keep it even simpler, with less DRY abstraction (and thus less tools that could be shared between institutions/apps), and more just building your app for what you need right now. You can see some of this push-back, specifically in the context of what U Alberta needs, in another document he wrote before he left Alberta in the jupiter repo, on notes for future expansion. It is really worth reading, to see an argument from even more extreme simplicity, from a developer experienced with Rails but not “infected” with “how libraries do things” But beware, it’s not shy about our community shibboleths.
We developers (and we library folks) are really drawn the abstraction, generalization, and shared tools that meet as many needs as possible. It can sometimes lead us astray. It is actually very common advice in software engineering to stick to what you actually need today, for your app you are developing (you know your business/user needs and which are the highest priorities to business value, right?). “Do the simplest thing that could possibly work”, “You aren’t gonna need it.” It keeps us honest.
However, I also think it’s possible to code yourself into a corner this way, where your app was fine for exactly what you needed then, but when you need one more thing… you can find you need to re-write large parts of it to accommodate. In some ways this is my experience with current samvera stack, early fundamental architectural decisions pen us in when we have new use cases. That kind of problem stays smaller when you avoid harder-to-change shared code, but I don’t it goes away entirely. Trying to play for the future always entails some “YAGNI” risk, but the more domain knowledge and experience you have… maybe you can do better at seeing where you are going and planning for it?
Just some of the specific component plans Matt was skeptical of…
attr_json vs. Just Plain ActiveRecord schemas
The jupiter app already has an activerecord implementation which isn’t strictly “ordinary” activerecord, in the sense they serialize multi-valued/repeatable fields to json arrays, rather than needing a separate table for each attribute as an actual normalized schema would require. Plus the logic I don’t entirely understand but think might not be ordinary AR associations around collection and “community” membership.
So this already gets you away from the strict “ordinary Rails” path — I’m not sure how the JSON array fields are handled by form submission, or querying if you wanted to do querying (it’s possible all their querying is solr-based, which is familiar to samvera-land, and also unusual for “ordinary rails”).
At my institution, we already have the need for complex repeatable data–a simple example would be repeatable “inscription” notations, each of which has the text of the inscription and the location in the book. So not just an array of strings, but perhaps an array of hashes. Weiwei Shi (Digital Initiatives Applications Librarian) suggested in a follow-up message, “We can use the JSON data type to support a more complex data structure as needed” — that is, if I understand it, they are contemplating actual postgres representation somewhat similar to what I am with attr_json, if they end up needing complex json. Matt’s second document tries to draw a line between how they are doing things in “more-or-less completely standard Rails models” and the way I was proposing to do things — I’m not sure I actually see such a great distinction, the representations in postgres to me seem pretty similar, neither of which is standard Active Record patterns.
They do have each attribute in a separate column, whereas I was contemplating putting them all in a single json column. Their approach does have advantages for avoiding update race conditions (or needing optimistic locking to avoid them). I perhaps should consider that, as an extra feature to attr_json. Although either way you get columns you can’t necessarily use ordinary ActiveRecord querying or form-based update with.
Everyone seems to doubt that attr_json is really going to work, ha. The skepticism towards newly invented non-trivial dependencies is justified, but I can point out attr_json is a lot fewer lines of code than ActiveFedora, or even Valkyrie — I think it is a risk, but it’s scoped carefully and implemented robustly, and I can’t figure out any other way I’m confident would be simpler to actually meet our modeling needs — once you start doing this complex json stuff, I think you’ll find that it doesn’t behave like “ordinary rails” — for forms/updates, validations, etc. — and rather than hack it out on a case by case basis, it makes a lot of sense to me to solve the problem with something like attr_json, encapsulating the “not really like ordinary ActiveRecord” stuff as much as possible.
The other option of course would be an actual normalized schema, with one table per attribute. For our “inscriptions” that table might have two columns (text and location), for a simple repeatable alternate title it might only have one. It’s going to be a mess to try to prevent n+1 queries and keep db access performant. I am encouraged I’m not on an insane track by the fact that even the U Alberta is using JSON serializations in postgres, not actually ordinary normalized data — I think as your data gets more complex (not just array of primitives, but need serialization as arrays of hashes), you’re really going to want something like attr_json. But maybe I’m wrong.
And for better or worse, I have trouble giving up entirely on the idea of some shared tools to make things easier for others in the community too — because it’s fun and rewarding, and why should we all constantly re-invent the wheel? But it’s good to be reminded of the dangers that lie in that direction.
I’m not sure if Matt mentioned this specifically, but I realize I have added a lot of non “basic ActiveRecord” complexity to the data modelling my plan in order to support the PCDM-ish association modeling, where a work has “members” and the members can be either works of themselves (which can have multiple members) or single file objects, and they all need to be kept in order.
U Alberta’s app doesn’t have that. A work can have a list of files, the end.
At my institution I actually spent a while trying to convince stakeholders that we didn’t need that either, but it was the one thing I could make no headway on — eventually they convinced me we did, to accomplish our actual business goals.
If you need this, I can’t figure out any way to get there in an “ActiveRecord-first”-ish direction, except either single-table-inheritance or polymorphic associations. Both of which are some of the oddest and least reliable corners of ActiveRecord. Of the two, I think STI is probably least weird and most likely to do more of standard use cases minimizing number of db queries. (Valkryie’s approach is somewhat similar in how it uses the DB to single-table inheritance, but without actually using that AR feature).
Matt thought that shrine might do more than ActiveStorage now, but history shows things built into Rails will probably expand and get better. (Yes, but it’s unclear to me how to make audio or video “variants” or derivatives with ActiveStorage, which my place of work predicts to need very shortly. If we are really ruthless about only what we need right now, are we going to have to just rewrite it as soon as we need another thing? There are no easy answers, “YAGNI” is simpler when it’s all about software you are writing yourself and not dependencies… but there are grey areas too).
But I’m not certain about this, after trying to help shrine developers enhance the versions/derivatives functionality to better support some flexibility we need as to storage locations and point-in-time of creation. The answer may just be trying to write an app which adds on locally to do exactly what it needs (whether in terms of shrine or ActiveStorage), without trying to make a shareable general purpose tool?
Matt was very suspicious of using Blacklight at all, he found that it was quite easy for them to just write the UI they needed based on Solr responses. And their app certainly is as good as many sufia/hyrax apps (it even has an actual search-within-the-collection feature on collection pages, which I think our sufia 7 app didn’t, although I think latest hyrax does).
But remember my inability to entirely give up on the idea of a shareable toolkit? I really would like something that serves as “scaffolding” that gives you an app out of the box with basic file ingest, metadata edit, and search out of the box. And Blacklight is a way to do this. And I think my plan to segregate Blacklight from the rest of the app (it should be a dependency you can switch out) by immediately fetching records from postgres corresponding to solr search results — may be able to keep Blacklight from “infecting” the rest of the app with Blacklight-isms, as Matt was worried it could.
How simple is simple?
It was useful to have Matt call my bluff to some extent: What I have been hypothetically proposing isn’t really “just plain rails”. But it’s a lot closer than current samvera, or even valkyrie.
I think some of the valkyrites think valkyrie’s departures from “ordinary Rails” are a a positive, that they can use different patterns to do better than Rails… which is not a unique idea to them… but I think is a bit hubristic, to think you can invent something better (and easier to onboard new developers with?) than Rails. (I also wonder if the valkyrites, if freed from the need to support fedora too, would simply use hanami?)
The same charges of hubris can be brought to my initial sketch of plans though — it was useful to be challenged from the “left” of “you’re still not simple enough” by Matt. I am so used to thinking about my in-formation plans as a/the simple alternative to, well, samvera or even valkyrie… it was a refreshing and needed corrective to be talking to Matt who thought my plans were still too much abstraction, not as simple as possible, not sticking close enough to implementing only what was needed for my business needs. On the one hand, it makes me worried he’s right; on the other, it makes me more comfortable to be in a nice middle ground of moderation with people advocating things on both sides or me, both heavier-weight and lighter-weight, sharing more code with the LAM digital collections community on one side, and sharing basically none on the other.
Really, “just plain rails” or “just plain [any code]” is to some extent a mirage, or an aspiration. We’re always writing code when we build a Rails app. We’re always using some dependencies. While there can be a false economy in trying to share all your code in hopes of minimizing the amount of code that has to be written in aggregate (it often doesn’t work out that way because building good re-usable abstractions is hard) — there can also of course be a false economy in never using anyone elses dependency, and “not invented here” syndrome. And if you’re writing it yourself — it’s writing abstraction layers that are potentially giving you not-worth-it complexity, whether you keep them in the app or make them into a gem. But abstraction layers are also what allow us to do complex things that we can still comprehend as humans — when it works.
Software is still a craft. As Alberta needs to add additional features, with their aspirations to add a lot more digital-collections-like content — it’s going to take software craftsmanship to figure out how to keep it simple. What I like about U Alberta’s approach is they realize this. They realize they are an internal development shop, and need to let developers do what developers do — rather than have non-technical stakeholders making technical decisions for non-technical reasons. (At one point someone said: After having been ‘burned’ before, they are very suspicious of using common/shared software, vs. just writing their app — which is part of their skepticism towards attr_json — I think they’re not wrong).
One thing letting an internal development shop excel entails is figuring out how to recruit and retain a solid development team with limited budget, which is one reason Alberta is trying to be so ruthless about keeping it simple and “standard”. One phrase I heard repeated was “industry-standard onboarding”, which I think also implies needing to be accessible to relatively “junior” new hires, which requires keeping your stack simple and standard. (That is, traditional-samvera or valkyrie-using institutions do not necessarily have any less challenge here and may have more, as for instance Steven Anderson of BPL argued)
(But I wonder if on-boarding a new developer to an existing project that has a very small dev team is going to be challenging across the industry! I am not convinced that “Where the Rails community has a diversity of opinions on an approach, we should prefer the approach espoused by the Rails core team” (from a Matt/Alberta manifesto) always and necessarily leads to the simplest code or easiest to on-board new developers with. sometimes you can build a monster in the pursuit of not doing something novel…. the irony, right? But it’s always worth considering the trade-offs).
I definitely might end up re-orienting. For instance, Matt reminded me of something I knew but tried to forget even when writing out my notes for a possible plan: A generalized permissions/ACL system is a craggy shore that many ships have crashed upon. Should I just write for my app the permissions we need instead? After doing some more business analysis to figure out what they are? Perhaps. More broadly, if we end up trying to implement this “toolkit” and I’m running into troubles and worrying our reach exceeded our grasp — retreat to just the app good enough for what we need right now is always a valid escape hatch.
U Alberta’s story, where they’ve been working on this app with a very different approach for over a year, and so far are happy — is another good data point reminding us that dissatisfaction with the samvera stack is not new, especially institutions that have developers with wider Rails experience have been suspicious of the value propositions of fedora and samvera for some time. And that there are a variety of approaches being tried. We all need community to bounce our ideas off of and get feedback, especially those of us who operate in 2-4 person development shops need more than we may get internally. I’m so glad they were willing to spend some time talking to me. And I highly encourage reading all of Matt/U Alberta’s somewhat iconoclastic analysis docs, as one way of considering other perspectives. I’m not sure if I can find the time, but I’d kind of like to “onboard” myself into their codebase, and understand how it works better as one example.
Thanks to the whole U Alberta team, and especially Peter Binkley, Weiwei Shi, and Matt Barnett, for spending time explaining what they were up to to me. Thanks to Peter and Weiwei for reviewing this post for any terrible errors. All remaining mistakes and wrong opinions are my own.