On the present and future of samvera technical architectures

Here where I work, we have a digital collections app (live; source) based on sufia 7.4. This is not sustainable for the long term, as the community’s development efforts have largely moved from sufia to its replacement, hyrax, and the latest version of Rails that sufia runs on is 5.0, which will eventually be end-of-lifed (exact schedule unknown).

Upgrading/migrating to hyrax would be the ‘obvious’ path, but it would take some significant work; we aren’t super happy with the sufia/hyrax architecture; and this turns out to be a time of some transition in the samvera community.

In figuring out what’s going on and identifying and evaluating available options, I’ve had to do quite a bit of research. So I wanted to share my evaluation and analysis with the community, to hopefully help others understand the lay of the land, but also to explain why we are considering some new approaches. As I’ve been doing this, I have begun to develop opinions on how to move forward here, and I’m leaning towards a novel approach not based on existing community consensus. I’ve done my best to present information objectively and to identify the parts that are my own judgements/evaluations, but I’ll be up front about my present bias/leanings so you can judge for yourself or be cautious.

Also, while there has been recent exciting work on changing and improving governance processes and structures in Samvera, this post will focus only on the software products and technical architectures in the samvera community, “the stack”.

The Challenging Near Past/Present

I think it’s important to be clear about some of the challenges of the current software stack, to understand why people are going in some different directions, and how to evaluate those directions.

The current situation is that many people and teams (probably not all, but more than a few) working with sufia/hyrax and the samvera stack have found it very challenging in a variety of ways. Here are some I know about, many from personal experience, which you may have seen me address in past blog posts too.

Performance can be a significant problem, at several different parts of the stack. Some I have encountered:

⇒ Saving a Work can take 10 or more seconds in our app. That is perhaps only an inconvenience for a single work, but it can add up to a real problem in higher-order functions that save multiple works, in bulk ingests, or in test suites. (It also increases the cost of logic that saves multiple times where one save could conceivably have worked, as I have encountered in the stack.)

⇒ So far in our attempts to make a feature to let you change a fileset into a child work (delete the fileset, create a work at the same position in the members list, with some attributes copied over), the operation can take five minutes to complete. We are in the midst of quite a bit of developer work to try to figure out what’s going on and whether we can improve it. This feature is taking several weeks to develop because of underlying stack complexity.

⇒ Our app with stock sufia code had several “n+1 query” problems, where on display a separate Solr query was being done for each item displayed (on results page, or child items on a work detail page), making response time unacceptably slow. When using ActiveRecord this has well-understood and easy fixes, but with this stack it took some somewhat complex local hacking to fix.
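To make the “n+1 query” shape concrete, here is a minimal plain-Ruby sketch. It does not use Solr, sufia, or any real API: `FakeIndex`, `find`, and `find_many` are hypothetical stand-ins that only count round trips, on the assumption that each call would be one request to the index.

```ruby
# Hypothetical in-memory stand-in for a search index; it only counts round trips.
class FakeIndex
  attr_reader :query_count

  def initialize(docs)
    @docs = docs # { id => document hash }
    @query_count = 0
  end

  # One round trip that returns a single document.
  def find(id)
    @query_count += 1
    @docs.fetch(id)
  end

  # One round trip that returns many documents, in the order requested.
  def find_many(ids)
    @query_count += 1
    ids.map { |id| @docs.fetch(id) }
  end
end

member_ids = (1..20).map { |i| "fs#{i}" }
docs = member_ids.to_h { |id| [id, { id: id, title: "Page #{id}" }] }
docs["work1"] = { id: "work1", member_ids: member_ids }

# n+1 style: one query for the work, then one more per member displayed.
naive = FakeIndex.new(docs)
work = naive.find("work1")
naive_titles = work[:member_ids].map { |id| naive.find(id)[:title] }

# Batched style: one query for the work, one for all of its members.
batched = FakeIndex.new(docs)
work = batched.find("work1")
batched_titles = batched.find_many(work[:member_ids]).map { |doc| doc[:title] }

puts "naive:   #{naive.query_count} queries"   # naive:   21 queries
puts "batched: #{batched.query_count} queries" # batched: 2 queries
```

Both versions produce the same titles; only the number of round trips differs, and that difference grows with the number of items on the page.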

⇒ Re-indexing (to solr) our corpus consisting of ~6400 GenericWorks and ~18500 FileSets can take from 3 hours to 9+ hours, depending on nature of indexing, and even after some extensive optimization work. Comparing the 1.25/second best case to industry standards, it doesn’t look good.  For instance, indexing MARC to Solr using traject, people routinely get from a couple hundred to 1000+ records/s.

Trying to customize or add features to a sufia/hyrax app can be quite complicated; some find they are spending as much or more time trying to figure out how to integrate with shared stack code (without creating large forwards-compat problems on upgrades) as they spend on the actual ‘business logic’.

⇒ This isn’t really about asking for more features to be built-in/configurable in Sufia/Hyrax. No matter how much is built in, our use cases vary enough that people will always want to change things in local ways or add custom local features, and sufia/hyrax and the rest of the stack have always been meant to support this.

Some organizations have tried but had problems attracting or retaining Rails developers (with Rails experience but without library/samvera experience).  These developers can find the samvera stack unnecessarily complex considering the problems it solves.

The cost of keeping your app up to date with new versions of stack dependencies can be great enough that many institutions wind up staying on old versions of shared dependencies.  My attempts at analyzing this appear to show a pretty big spread among sufia/hyrax and other dependency versions in repos “in the wild”.  (Here where I am, we are on sufia 7.4 — after valiantly trying to stay up to date, we decided we had to stick there to meet our launch deadlines).

ActiveFedora was intended to be a kind of port of ActiveRecord, with a close-to-API-compatible modelling/persistence layer (not including querying). But ActiveRecord is an incredibly complicated stack with literally years of developer time put into it, and is constantly evolving itself. What we’ve ended up with in AF has been found by many to be unreliable, with unpredictable performance characteristics and general behavior, very difficult to figure out how to use ‘correctly’, and with a very complex architecture that is hard to debug.

Parts of the stack, especially in sufia/hyrax, often seem not as mature as expected; there are bugs in features one thought were long-standing in the app; there isn’t necessarily clear and accurate shared understanding about what things are present in the code already, and what things need more work, or are likely to have lots of edge case bugs. This may be because of the several times there have been major refactorings to the sufia/hyrax codebase (fedora 3 to 4; an institutional repo focused app to more general; sufia to hyrax; etc). (It should be noted that the documentation working group is working on building out a better recorded shared understanding of features.)

When thinking about this, I often go back to Richard Schneeman’s post on “polish” in software:

I’ve previously called these types of moments papercuts. They’re not life threatening and may not even be mission critical but they are much more painful than they should be. Often these issues force you to stop what you’re doing and either investigate the root cause of the rogue behavior or at bare minimum abandon your thought process and try something new.

When we say something is “polished” it means that it is free from sharp edges, even the small ones. I view polished software to be ones that are mostly free from frustration. They do what you expect them to and are consistent.

My experience  building an app to meet local needs using the samvera stack has often been at the other end of this continuum — near constant “papercuts”, sharp edges, frustrations, and “yak-shaving” investigations of the root causes of some unexpected behavior. My experience is that the software often does not do what I expect, or behave consistently.

I think sometimes when I discuss these issues, non-engineers think I’m just talking about programmers’ personal experience/emotions, that the code isn’t “fun” to work with. Now, I do think the affective result on your programmers’ day-to-day matters, how your programmers feel — burn-out is a real thing — but certainly no more than the pleasantness and efficacy of day-to-day work for all other non-programmer staff too; and we don’t expect it all to be “fun”, that’s why it’s a job.

But the reason this matters to your organization isn’t primarily because of how it makes programmers feel. It’s because all of the foregoing significantly increases the cost of launching and maintaining your software. Organizations find it takes much longer, or many more engineers, than expected to get to first launch. Adding what even the engineers might have expected would be a fairly simple feature can take order(s) of magnitude more time than expected. Bugs can appear which are enormously time-consuming to track down and fix, if they can feasibly be fixed at all by your engineers. In addition to cost/schedule, this can also simply affect your chances and levels of successfully meeting your business needs, both in initial launch and ongoing maintenance and development.

And when making technical choices, that’s what matters to an organization above all else: meeting business needs as efficiently and cost-effectively as possible (including staff time and number of staff; staff is the biggest cost for most of us). And to many, it wasn’t clear that current directions were getting them there. Building and maintaining a samvera-stack based app that met local business needs well has seemed to some very expensive.

These observations are not unique to me; there has been a growing recognition of these challenges in the samvera development community. It has led to new samvera processes working to improve the situation gradually and steadily (for instance, the “Component Maintenance Working Group”, the Hyrax maintenance working group and the “Road Map Interest Group”), but it has also led others to think it’s time to explore new architectural approaches and more drastic software changes.

Valkyrie: A new approach

Princeton University Libraries had an app called plum supporting their digital collections. It was:

  • A hydra app based on curation_concerns and some fairly old hydra dependency versions (not sufia/hyrax).
  • Staff-only editing/workflow. No self-deposit.
  • Used for metadata/asset management (with fedora 4 back-end); it had no public interface of its own, and (meta)data was vended to other public-facing app(s).

As outlined in two blog posts on a PUL Systems blog, they ran into some pretty severe performance problems. They spent significant development effort improving performance, both locally and in PRs back to hyrax and the stack.

In a presentation at Samvera Virtual Connect 2018, Esmé Cowles (presentation begins at 40:00 in video) said Princeton’s eventual direction (valkyrie) was motivated “not just because of performance problems, but because while we were working on those problems, we were constantly butting up against the complexity of the stack… That complexity was impeding us doing what we wanted to do to work on performance.”

While frustration with performance or legibility of the inherited architecture was not new to either Princeton or others, Princeton reached a point where they decided that the “inherited” stack was simply not tenable for their needs, and that they had basically no choice but to depart from it if they wanted to achieve their business goals. Additionally, as the performance problems were centered on Fedora (as well as the ActiveFedora architecture), they decided the best path was to move away from Fedora as the persistent store, and towards using the postgres rdbms.

We could imagine responding to that by writing either a bespoke local app or a shared toolkit/framework simply based on postgres. But Princeton really prioritized not separating from the samvera community, and based on that, decided instead to build a persistence abstraction that would allow the developer to switch between multiple back-ends (mainly targeting fedora or postgres, both likely in concert with solr), using the same class/method-level APIs for both.

That is what valkyrie is. It is just a modeling/persistence layer.  As far as what it does, valkyrie could be roughly compared to ActiveFedora or ActiveRecord.  It is not a “solution bundle”. It pretty much only addresses API(s) for modelling metadata and saving those models, whether to fedora, to postgres, or to other hypothetical future back-ends.  The rest of the business logic in a digital collections or institutional repository application would come from somewhere other than valkyrie, whether shared gems or local code.

Princeton proposed an official hydra/samvera working group to work on valkyrie, and got significant interest from other developers active in samvera community. valkyrie became a samvera community project, and as I write this is housed in the samvera-labs grouping.

Valkyrie uses a “Repository/Data Mapper” architecture that is different in some ways from Rails’ ActiveRecord design, and seems to be inspired by Hanami’s repository/data mapper implementation.  Valkyrie also uses some of the dry-rb libraries that are also used by hanami.   Valkyrie also requires the use of the reform form object library, generally in the form of the ChangeSet reform sub-class specialization.
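As a rough illustration of the repository/data mapper idea (not of valkyrie’s actual API), here is a minimal plain-Ruby sketch. Every name in it (`Book`, `BookRepository`, `MemoryAdapter`, `JsonStringAdapter`, `write`, `read`) is hypothetical. The point is only that the model object knows nothing about storage, and that the same repository calls work against two differently implemented back-ends.

```ruby
require "json"
require "securerandom"

# A plain value object: it holds attributes and knows nothing about storage.
Book = Struct.new(:id, :title, keyword_init: true)

# Hypothetical back-end 1: keeps attribute hashes in memory.
class MemoryAdapter
  def initialize
    @rows = {}
  end

  def write(id, attributes)
    @rows[id] = attributes.dup
  end

  def read(id)
    @rows.fetch(id)
  end
end

# Hypothetical back-end 2: serializes to JSON strings, standing in for a
# store with a different on-disk representation.
class JsonStringAdapter
  def initialize
    @blobs = {}
  end

  def write(id, attributes)
    @blobs[id] = JSON.generate(attributes)
  end

  def read(id)
    JSON.parse(@blobs.fetch(id), symbolize_names: true)
  end
end

# The repository maps between model objects and whatever the adapter stores.
class BookRepository
  def initialize(adapter)
    @adapter = adapter
  end

  def save(book)
    id = book.id || SecureRandom.uuid
    @adapter.write(id, { title: book.title })
    Book.new(id: id, title: book.title)
  end

  def find(id)
    attributes = @adapter.read(id)
    Book.new(id: id, title: attributes[:title])
  end
end

# The calling code is identical for both back-ends.
[MemoryAdapter.new, JsonStringAdapter.new].each do |adapter|
  repository = BookRepository.new(adapter)
  saved = repository.save(Book.new(title: "From plum to figgy"))
  puts "#{adapter.class}: #{repository.find(saved.id).title}"
end
```

In ActiveRecord, by contrast, the model object itself carries `save` and the query methods; here persistence is a separate object that can be swapped.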

In building out the main modelling and persistence abstraction to meet planned use cases, other particular-to-valkyrie abstractions were required, like ChangeSets (I don’t entirely understand them, but I think someone building an app based on valkyrie is going to have to), and others that may normally stay “under the hood”, like OptimisticLockToken.

Valkyrie is not fundamentally based on linked data/RDF; its models are not defined in terms of linked data. The valkyrie fedora metadata adapter requires a mapping from model attributes to RDF predicates so that resources can be serialized to fedora; other external RDF serializations would require a similar mapping.
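To show what such an attribute-to-predicate mapping amounts to, here is a small plain-Ruby sketch. It is not the valkyrie fedora adapter’s code: the `PREDICATES` table and `to_ntriples` helper are hypothetical, the Dublin Core Terms predicates are just an example choice, and the output is simplified N-Triples that escapes only backslashes and double quotes.

```ruby
# Hypothetical mapping from model attribute names to RDF predicate URIs.
PREDICATES = {
  title:   "http://purl.org/dc/terms/title",
  creator: "http://purl.org/dc/terms/creator"
}.freeze

# Serialize a hash of attributes as N-Triples lines with plain string literals.
# Simplification: only backslashes and double quotes are escaped.
def to_ntriples(subject_uri, attributes)
  attributes.flat_map do |name, values|
    predicate = PREDICATES.fetch(name) # raises KeyError for unmapped attributes
    Array(values).map do |value|
      escaped = value.to_s.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }
      "<#{subject_uri}> <#{predicate}> \"#{escaped}\" ."
    end
  end
end

puts to_ntriples(
  "http://example.org/works/1",
  { title: ["A Book"], creator: ["Ada", "Grace"] }
)
```

The model itself stays a plain set of attributes; RDF only appears at the serialization boundary, which is why each RDF target needs its own mapping.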

valkyrie “bespoke” apps

Princeton is live with figgy, their plum-replacement app based on valkyrie. figgy kind of served as a ‘demonstration/proof-of-concept app’ throughout valkyrie development, and still serves that role to some extent: it is, I believe, the only valkyrie-based app in production, and it is written by the same group of developers most central to valkyrie development.

Figgy is a rewrite of plum to meet the same basic usage parameters. It is not technically a git fork/branch of plum, but some business logic was ported from plum.

Figgy does not use a samvera “solution bundle” (such as hyrax). It uses only a few existing samvera-community dependencies as component building blocks where it makes sense (mainly hydra-editor and hydra-derivatives, see their Gemfile.lock). Existing pre-valkyrie components that can be used with a valkyrie-based app will generally be de-coupled enough that they can also be easily swapped out if the need ever arises. (Personally, my experience with hydra-derivatives for my own local needs would not lead me to follow their lead in using hydra-derivatives! But perhaps porting code that uses hydra-derivatives from plum to figgy made sense as a starting point.)

Figgy then has a lot of local/bespoke architecture custom-fitted for its business needs, sometimes based on existing general Rails dependencies. One major example is custom local workflow logic/architecture based on the popular aasm (“acts as state machine”) gem. It also vends changes to the other apps serving as front-ends using a RabbitMQ-based eventing system, also more-or-less locally designed.
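For readers who haven’t worked with state-machine-based workflow, the general idea can be sketched in plain Ruby as below. This deliberately does not use the aasm gem’s DSL, and it is not figgy’s workflow: the states, events, and the `Workflow` class are all made up for illustration.

```ruby
class InvalidTransition < StandardError; end

# Hypothetical workflow: the states and events are invented for this sketch.
class Workflow
  # current state => { event => next state }
  TRANSITIONS = {
    draft:     { submit: :in_review },
    in_review: { approve: :published, send_back: :draft },
    published: {}
  }.freeze

  attr_reader :state

  def initialize(state = :draft)
    @state = state
  end

  # Move to the next state, or raise if the event is not allowed from here.
  def fire(event)
    target = TRANSITIONS.fetch(@state)[event]
    raise InvalidTransition, "cannot #{event} from #{@state}" unless target

    @state = target
  end
end

workflow = Workflow.new
workflow.fire(:submit)
workflow.fire(:approve)
puts workflow.state # published

begin
  Workflow.new.fire(:approve) # a draft cannot be approved directly
rescue InvalidTransition => e
  puts "rejected: #{e.message}"
end
```

A gem like aasm adds callbacks, guards, and persistence integration on top of this basic table of allowed transitions.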

The other known valkyrie app in development is Penn State Library’s cho. I know less about cho, but my understanding is that it is not yet in production, and takes some very original/innovative architectural approaches — it is largely based on ingesting and editing via CSVs (rather than interactive web-based GUIs), including being able to dynamically define metadata schemas based on CSV.  Cho seems to use few existing samvera components on top of valkyrie, perhaps even fewer than figgy; mainly hydra-characterization.

Where is valkyrie at now

Valkyrie has been under development for around 2 years, and has probably hundreds of developer-hours of work. As I write this a 1.2.0 version has an imminent release.  While valkyrie is already being used in production by princeton’s figgy, features that some might expect, need, or want for generalized use are still being developed on an ongoing basis. The 1.2.0 release (as I write this still in pre-release) adds some significant features, including: The ability to store single-values (rather than arrays of values) in properties; Optimistic locking; and Guaranteed persistently-ordered values (the first value in a list stays the first value in the list).
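Of those features, optimistic locking is perhaps the least self-explanatory, so here is a minimal plain-Ruby sketch of the general technique. It is not valkyrie’s implementation: `VersionedStore`, `Record`, and `lock_version` are hypothetical names (the last borrowed from ActiveRecord’s convention), and the store is just an in-memory hash.

```ruby
class StaleObjectError < StandardError; end

# Hypothetical in-memory store that keeps a version counter per record.
class VersionedStore
  Record = Struct.new(:id, :attributes, :lock_version)

  def initialize
    @records = {}
  end

  # Return a copy, so callers edit their own snapshot of the record.
  def find(id)
    stored = @records.fetch(id)
    Record.new(stored.id, stored.attributes.dup, stored.lock_version)
  end

  # Accept the write only if the caller's snapshot is still the current version.
  def save(record)
    current = @records[record.id]
    if current && current.lock_version != record.lock_version
      raise StaleObjectError, "record #{record.id} was changed by someone else"
    end

    next_version = record.lock_version.to_i + 1
    @records[record.id] = Record.new(record.id, record.attributes.dup, next_version)
    find(record.id)
  end
end

store = VersionedStore.new
store.save(VersionedStore::Record.new("w1", { title: "Draft" }, nil))

first_editor  = store.find("w1") # both editors load version 1
second_editor = store.find("w1")

first_editor.attributes[:title] = "First edit"
store.save(first_editor) # accepted; stored version becomes 2

second_editor.attributes[:title] = "Second edit"
begin
  store.save(second_editor) # still holds version 1, so it is rejected
rescue StaleObjectError => e
  puts "rejected: #{e.message}"
end
```

Nothing is locked while the editors work; the conflict is only detected at save time, which is what makes the locking “optimistic”.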

To some extent, as is common for open source, features are added to valkyrie when apps using valkyrie need them and the developers of those apps spend the time to add them to valkyrie.  While the valkyrie team is thinking to the future and trying to generalize for others, right now it’s primarily the needs of figgy and cho driving prioritization.  For instance, an Issue to suggest providing a generalized solution in valkyrie to “n+1 query” problems (a problem pretty central to my experience and concerns, as discussed above, but maybe not initially figgy or cho) was recently created, after it came up in figgy development.

If you need something that is conceptually part of the modelling/persistence layer but isn’t really built into valkyrie, you often still have an option to add it, which often involves going “under the hood” and adding custom logic to the valkyrie adapters or custom queries. So you may have to reckon with architectural components/complexity ‘under the hood’ to meet such needs, and it likely also means that you’d have to re-implement that logic if you switched storage layers (from fedora to postgres or vice versa).

For instance, at present if you wanted values to be indexed to solr as a numeric type instead of string/text (so it could be sorted or range-facetted in solr), Trey Pendragon told me “you’d need to add a custom indexer to the solr adapter.” One should probably be cautious of assuming what features or use-case-supports are or aren’t already built out in valkyrie (like any other relatively complex dependency still reaching for maturity).
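As a sketch of what such a custom indexer might look like in spirit, here is a plain-Ruby example. It is an assumption-heavy illustration rather than valkyrie’s actual indexer interface: the `YearIndexer` class, its constructor, and the use of a plain hash as the resource are hypothetical, and the `_isi` and `_tesim` field suffixes only mean “integer” and “text” if the Solr schema in use defines dynamic fields that way.

```ruby
# Hypothetical indexer: turns a resource's attributes into a Solr document hash.
# The "_isi" / "_tesim" suffixes are assumed dynamic-field names from the schema.
class YearIndexer
  def initialize(resource)
    @resource = resource # a plain hash stands in for a model object here
  end

  def to_solr
    doc = { "title_tesim" => Array(@resource[:title]) }

    # Only index a numeric year when the value parses cleanly as an integer,
    # so that sorting and range queries never see junk values.
    year = Integer(@resource[:year].to_s, 10, exception: false)
    doc["year_isi"] = year if year
    doc
  end
end

puts YearIndexer.new({ title: "Annual report", year: "1923" }).to_solr
puts YearIndexer.new({ title: "Undated letter", year: "circa 1923" }).to_solr
```

The first document gets an integer `year_isi` field that Solr could sort or range-facet on; the second is indexed without it rather than with an unparseable value.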

You can watch what things are being considered and prioritized for future valkyrie development in the open valkyrie waffle board.

Milestones in valkyrie and figgy history

Some personal analysis and evaluation — Valkyrie

Princeton and others investing in Valkyrie began from the requirement of being able to provide a stable consistent API on top of underlying data that could be stored either in Fedora or Postgres.

If you start from that requirement, the Valkyrie architecture is exactly where you are reasonably going to end up; this is an appropriate way of approaching that requirement (which typical Rails apps are not capable of fulfilling).

However (in my own opinion/experience/evaluation, like everything in this section), there is a significant cost to building the abstractions to make that possible. Every abstraction has a cost: in implementation, ongoing maintenance, and cognitive burden and ongoing work of developers using the abstraction and integrating it with others.  Building successful (efficient, polished, good TCO) abstractions as shared code between apps with diverse needs can be especially challenging.

Valkyrie is a fairly significant abstraction. Its development necessarily involves significant work to figure out the right APIs and create implementations for features that, if you were simply assuming an rdbms (or even postgres specifically) and using ActiveRecord, might just already be there. In addition to the basic mechanics of persistence, these include ordered values; optimistic locking; associations, joins and eager-loading to handle n+1 queries; and Rails’ recommended “Russian-Doll Caching” with automatic touching of parents. In ActiveRecord these are not just already there, but very mature, with well-understood community knowledge about strengths, weaknesses, work-arounds, and best-practice usage patterns. In valkyrie, all of these things, if they are to be used, need to be designed and implemented (and iterated to maturity and polish), with the increased challenge of making them work well for each of the persistence back-ends valkyrie intends to support.

Whether these costs are worth it depends on how important or beneficial the foundational requirement is, as well as how well the abstractions meet developer use cases. In part, this is hard to be absolutely sure about in advance — both the ultimate benefits and the ultimate costs can to some extent only be estimated/predicted and not known with certainty in advance of the actual development and community use.

Will valkyrie actually result in shared codebases/dependencies between postgres-using and fedora-using applications in our community?  At this point, there are not many examples already existing, it’s more a design goal for the future. I think it’s hard to know to what extent this prediction of the future will pan out.

How one evaluates the value proposition of the valkyrie effort also depends on the value one places on the base goal/requirement of supporting both fedora and postgres as persistence back-ends. It may be worth asking in what circumstances does fedora actually make sense, and how widespread are these circumstances?  I believe few (none?) of the current developers/institutions investing in Valkyrie are actually planning on using fedora, or missing it.   The requirement to support the possibility of back-end agnosticism may be less about the technical needs of anyone investing in valkyrie, and more about the social/political situation in our community, which has always used fedora before, and where simply moving to a non-fedora solution seemed originally too big a jump to be comprehensible as staying within the community.

⇒ (While there was some initial hope that the performance problems could be significantly improved while still using fedora by using valkyrie with, say, a non-active-fedora-based means of accessing fedora — so far only relatively minor improvements have been seen via this route, not enough to resolve the performance issues that led to valkyrie. It’s possible future implementations of the fedora APIs, whether from the fcrepo implementation or other, will do differently; predicting the future is always a gamble).

The valkyrie enthusiasts have been wisely careful not to express any judgement over the commitments of other institutions to fedora (we each have different business needs) — however, many of us beyond valkyrie have been increasingly questioning what value fedora brings us at what costs for some time, and I think it’s worth considering in exactly what conditions using fedora actually makes sense, and how common these conditions are.

If the eventual result is that most/all codebases using Valkyrie are using postgres rather than fedora — and I think that’s a real possibility — that is a significant cost we paid in development to make other things possible, and a significant ongoing cost we’ll continue to bear in maintaining and developing against the abstractions that were designed for that. (Or in a subsequent costly switch to not using them).

Esmé suggests that another benefit of valkyrie can be in hiring/retaining/onboarding developers, especially Rails developers from outside our development community, and that “following the patterns those developers know makes it easier to hire Rails developers and have them be productive and happy, (instead of being frustrated by ActiveFedora and Fedora more broadly).”

I share that concern and goal, but it is not totally clear to me how much valkyrie achieves  there — choosing to write to the Valkyrie API instead of ActiveRecord arguably takes us substantially outside of patterns that Rails developers know. While it seems safe to believe it will result in some level of improvement over previous stack,  when I look at figgy code I am not that confident in predicting to what extent a figgy-style app will be legible to the typical Rails developer, or escape what we could call the “weird custom community architecture” problem.

For myself, it’s not clear that the costs of developing (and developing against) the valkyrie abstraction will bear benefits justifying it. Might there be other ways to meet our individual as well as shared business/domain needs more efficiently in total-cost-of-development-and-ownership?  Might there be other ways for different teams to try different things while staying part of a community of practice?  Is the valkyrie approach actually necessary or sufficient for allowing people using different back-ends (or other architectures) to share other domain logic?

It is hard to answer these questions with surety; they rely on estimations and predictions of future events and comparing hypothetical counter-factuals. But based on experience of the challenges that come with complex and newer/less-mature architectures, I’m interested in trying to be ruthless about minimizing the number and complexity of abstractions/architectures, trying to find the simplest architecture possible to optimize our efficiency/productivity. “As simple as possible, but no simpler.” A significant abstraction layer to make both fedora and postgres possible does not excite me, when that’s not a requirement I think important for our local business needs.

However, one thing worth commenting on is that I am actually totally happy with the valkyrites demonstrating the viability and sense of writing a “bespoke” app (which can still be based, where possible, on shared components), instead of trying to use a pre-built application/“solution bundle” that you customize according to its customization points. Providing the latter in a high-quality, mature, efficiency-increasing way is hard, especially when the developer community has diverse needs, and I personally suspect that a much wider swath of business cases will be better served by the ‘component’ approach than has often been recognized in our community.

I suspect that using the hydra/samvera stack has almost always required local programming expertise, it has never (for most institutions) provided a “shrinkwrap” install-and-go experience. I appreciate the “bespoke” valkyrie apps openly trying to demonstrate that at least in some cases an explicit and acknowledged component-based put-it-together-yourself approach may be more productive (as Esmé in particular has pointed out).

The two current real-world valkyrie demonstration apps actually differ from what I see as the recent “consensus path” in two significant ways: in using the valkyrie persistence layer, and in explicitly embracing an “assemble-components” approach instead of a “customize-pre-built-solution” approach.

A Hyrax based on Valkyrie?

Okay, so we talked about valkyrie, and “bespoke” apps using valkyrie — what about the idea of refactoring/enhancing Hyrax to be based on valkyrie?

It is my impression that those initiating valkyrie, via a samvera working group, hoped this would be the ultimate outcome, believing it was important for keeping those with “bespoke” valkyrie-based apps connected to and participating in the wider community, as well as being a contribution the valkyrie effort could make to institutions wanting to stay on hyrax but with persistence layer flexibility.

As the valkyrie working group work wrapped up, even before the “final report” was released actually, there seemed to be general community consensus on this, and I believe a community decision was made to commit to this, although I’m not certain.

Work to switch hyrax over to valkyrie was begun, and significant development-hours were put into it. At one point it was believed there would be a hyrax version 3.0 based on valkyrie released around May 2018.

However, that phase of effort didn’t reach the goal-line (a release of hyrax based on valkyrie, or even a merge into master) before work mostly halted. I believe the valkyrie branch in the hyrax repo has the product of that work; the last commit there is from March 6, 2018. I think it’s very hard to estimate how much work was remaining on that branch to get to a release (most of us have experienced the phenomenon where the “last 5%” can become more like half of total development effort). Some of the developers who were primarily involved in that work seem, at least for the moment, to be no longer spending as much development time on hyrax generally; and as other hyrax development has continued, that branch would need to be reconciled with current master.

Since that work, Tom Johnson (@no-reply) has taken over as formal “technical lead” of hyrax, meaning technical architect in this case.

I asked on slack about the current thinking on future Hyrax and valkyrie. Tom provided some info on his current plans and thinking in messages in the #hyrax channel of the samvera slack, dated August 13 2018 12:22PM and 12:34PM (eastern). (Alas, we may have no slack archives).

– Moving away from `ActiveFedora` and toward a backend-agnostic persistence technology is viewed as on the critical path for Hyrax’s success

– The community’s capacity to maintain `ActiveFedora` is quickly diminishing, in part because the software is challenging to maintain and in part because the development personnel best equipped to maintain it have shifted to other projects (including but not limited to Valkyrie)

– Valkyrie is the presumptive replacement; this is the case largely because of key community members succeeding at delivering (and generally being happy developing) applications based on it.

– We are committed to making this transition without it looking like a stop-the-world-and-rewrite-the application affair for existing adopters.

That is (this interpretation/re-wording also based on further discussion in slack channel and PMs), some kind of work to make Hyrax have a backend-agnostic persistence layer is in the plans, and it is presumed this will involve Valkyrie.

But it will likely not involve simply refactoring Hyrax to use valkyrie instead of ActiveFedora, which was the approach of that original valkyrie branch. Tom is committed to making future Hyrax releases less disruptive to existing adopters, and that original approach would be the kind of “stop the world” rewrite involving significant backwards-incompatibilities that has been disruptive in the past. It probably will involve re-using/porting/copy-pasting code (as well as ideas) from that existing valkyrie branch, but probably will not be based on that branch in the repo.

Instead, there will probably (these are Tom’s current thoughts, not official plans) be a first step to create an architecture within Hyrax “that is open to Valkyrie, but ships using active fedora by default”. Then a period of “getting an advanced guard trying to build apps based on this [which] can and should provide a lot of useful information about how platform support needs to work.” Then later, “a transition to valkyrie-by-default and removing AF would then be based on what we learn and demand[s] from adopters.”

Tom plans to share some of these road-map recommendations at Samvera Connect in October, at which point some of this will presumably start becoming somewhat more formalized and official as plans.

I think it’s very hard to predict calendar timelines for all this. If you were waiting for the end-point, a hyrax version that just uses valkyrie (and allows postgres as a backend thusly) out-of-the-box, supported/documented/tested… I personally would predict it could be 1-2 years, but others may have much more optimistic estimates; one answer is just that it’s very difficult to predict at this point, depending on so much including what developers/institutions are interested in contributing to what extent.  We can be sure it won’t be May 2018.  :)

Note well the current Valkyrie fedora adapter does not store things in fedora in a way compatible with current hyrax/sufia modelling. Either a new adapter (with some challenges) needs to be created, or there would have to be data migration.

Some personal analysis and evaluation

I totally understand and support @no-reply’s priority to keep Hyrax stable, with changes being iterative and backwards-compatible, no “stop the world” changes — this is one of the biggest challenges facing Hyrax users I think, and it makes sense to prioritize it.

And I understand how that leads to his current thinking on how to approach valkyrie — by providing the architecture where valkyrie can be optionally switched in as a simultaneous alternative to what’s already there, which for at least a time remains there.

But this leads to a kind of ironic/counter-intuitive outcome.  Valkyrie is already an abstraction layer for swappable persistence back-ends.  For reasons that are indeed sensible in overall hyrax context, we’ve arrived at a proposal to add more architecture (more abstraction) on top, to make valkyrie itself swappable in or out (at the point you swap it in, you can then use it to swap actual back-ends). A persistence abstraction API to let us use another persistence abstraction API beneath it.

Abstraction has costs, including to legibility of the codebase.  One might wonder: if you’re going to put in this extra hyrax-specific persistence-swappability architecture anyway, does it even make sense to swap to valkyrie as the happy path supported option, or should you swap directly to postgres instead and skip valkyrie?  There might be various reasons it really does make sense, but it’s got a cost.

So in evaluating hyrax-on-valkyrie, I think we start out with all the pros and cons outlined in the valkyrie analysis section above.

On top of that we have the pros and cons of hyrax itself. How you’ll evaluate those depends on your experience with or otherwise evaluation of hyrax generally. There will be significant advantages for people who have found that hyrax has features they need, for whom using those features via hyrax (including any customization) has worked out well and seemed like an efficient path compared to alternatives, and who want to switch to a postgres-based persistence store.

I have not had a great experience with sufia. I’ve found it very challenging to deal with the existing architecture when implementing the customizations and features we need. When I’ve tried to look at what has changed in hyrax I don’t expect significant improvements for my business cases. On the other hand, there has been code added which increases architectural complexity for me without giving me features relevant to my needs (adminsets, nested collections).   Of course hyrax will continue to improve, especially under Tom’s excellent technical leadership, which I have a lot of faith in.  But the community’s historic value on new features over architectural rehabilitation comes from structural pressures that will have to be resisted or changed. And even within the realm of architecture rehab, an investment in hyrax-on-valkyrie (while it might be a totally reasonable and appropriate priority) is development hours not spent on improving the other parts of hyrax architecture that have gotten in my way and lowered our efficiency (raised TCO) of running sufia. It is also an investment which may have to temporarily increase architectural complexity/number of abstractions.

I am concerned that hyrax may have painted itself into a corner where it could be quite a while until the problems with fundamental architectural aspects of hyrax that I have run into become better; a while until the app’s architecture becomes more legible with the minimal amount of abstraction/architecture needed for its goals, instead of more complex with more layers of abstraction as a bridge to get there.  Doing this in a way that minimizes upgrade pain will make it take even longer/more effort, but not doing that is not desirable/feasible either; I believe Tom is making the right decision to prioritize minimizing upgrade/backwards-incompat pain in hyrax.

But my experiences with sufia have not been positive enough to excite me about trying to upgrade my app to present hyrax, or about a hyrax based on valkyrie or postgres but otherwise largely similar to and backwards-compatible with the current hyrax release. If you take out the persistence parts that are proposed to change, and the business logic components where I have had a lot of trouble using them to meet my local needs, I’m not sure how much of hyrax is left. From my experience, I am not enthused about investing lots more in hyrax (whether that’s contributing to the shared codebase, or work on upgrading-or-rewriting our app from sufia 7.4 to a recent hyrax version and continuing to maintain it). I’d be more excited about trying to find a more efficient way to invest development time that could ultimately get us to a happy place quicker, both in terms of our local app and shareable components.

What if there’s another way? (my “fantasy plan”)

Let’s say valkyrie (and apps and architecture built from it) starts from the basic non-negotiable requirement: Allow code using fedora or postgres as a persistence back-end to use the same persistence APIs; and then adds on some subsidiary goals, including sticking closer to common Rails patterns where possible.

What if instead we started from the basic requirement: Stick as close to standard Rails patterns as possible, with as few and as simple additional abstractions as we can; as simple as we can while still not requiring re-invention of the wheel in digital collections use cases?

How would we do this, what would it look like? Like valkyrie, we’ll start from modelling/persistence.

We could consider really just putting all our metadata in a standard normalized database schema. But that’s going to result in some really complex and challenging to work with rdbms schemas, for the kinds of metadata schemas we use; for instance, with frequent repeatable fields, and apps that need to handle multiple “types” of objects in the same app.

Let’s rule that out.  What’s a next step up in complexity, still straying as little from standard Rails as possible, with as few and as simple new abstractions as possible? Is there a way where we still use ActiveRecord, but we aren’t required to create normalized rdbms schemas for our complex/various/evolving metadata schemas?

Recently some rdbms have developed features to allow “schemaless” use by storing json in a column. Really, you could always have used rdbms this way by serializing complex structured data to text columns, but the additional features, especially in postgres, make this even more attractive.  (Although keep in mind that our legacy fedora-based architecture basically stores complex data as a blob without indexing or querying capabilities either; although this is part of what makes it challenging to work with).

While ActiveRecord has basic support for storing arbitrary json-able hashes in MySQL or postgres json columns, the individual data elements aren’t really “first-class” objects and are missing many standard AR modelling/persistence features.

I developed the attr_json gem as an experiment to see if I could add more seamless support for many of the standard AR model features, for individual attributes serialized to json(b), sticking as close to how AR normally works as possible. attr_json supports typing, complex/nested objects, standard Rails-style form support, dirty-tracking, and some limited postgres-jsonb query support. This allows you to use many standard Rails patterns and approaches with individual attributes serialized to json in the rdbms.
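To make the general idea concrete, here is a minimal plain-Ruby sketch of "typed attributes backed by one JSON blob". To be clear, this is not attr_json's actual API or implementation, and it does not touch ActiveRecord or a database at all; the module name `JsonBackedAttributes`, the `json_attribute` macro, and the `Work` class are all hypothetical names I made up for illustration. It only shows the shape of the pattern: individual attributes look like ordinary accessors, get cast on assignment, can be repeatable, and are all serialized into the single string that a json(b) column would hold.

```ruby
require "json"

# Hypothetical illustration only: NOT the attr_json gem's API or internals.
# Typed accessors are backed by one hash, which is serialized to a single
# JSON string (standing in for what a json/jsonb column would store).
module JsonBackedAttributes
  # Very small set of casters for the sketch; a real library needs far more.
  CASTERS = {
    string:  ->(v) { v.to_s },
    integer: ->(v) { Integer(v) }
  }.freeze

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare an attribute stored inside the JSON hash. `array: true` makes
    # it repeatable, which our metadata schemas need frequently.
    def json_attribute(name, type, array: false)
      caster = CASTERS.fetch(type)
      define_method(name) { json_attributes[name.to_s] }
      define_method("#{name}=") do |value|
        json_attributes[name.to_s] =
          array ? Array(value).map(&caster) : caster.call(value)
      end
    end
  end

  # The in-memory hash holding every declared attribute.
  def json_attributes
    @json_attributes ||= {}
  end

  # The string that would be written to the single json column.
  def to_json_column
    JSON.generate(json_attributes)
  end

  # Restore attributes from a previously stored JSON string.
  def load_json_column(json)
    @json_attributes = JSON.parse(json)
    self
  end
end

class Work
  include JsonBackedAttributes

  json_attribute :title, :string
  json_attribute :creators, :string, array: true
  json_attribute :year, :integer
end

work = Work.new
work.title = "Annual report"
work.creators = ["Smith, A.", "Jones, B."]
work.year = "1923" # a form would submit a string; it is cast to Integer

stored = work.to_json_column
reloaded = Work.new.load_json_column(stored)
```

The point of the sketch is that adding a new metadata field or a new "type" of object is a one-line declaration in the model rather than a schema migration. Everything that makes the real thing useful in Rails (form support, dirty-tracking, nested models, jsonb querying) is left out here.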

attr_json has received some limited attention from the wider rails community. A handful of rails developers have communicated with me in github issues, one or two are already using it in production, and it has 32 ‘stars‘ and 5 watchers on github, almost all apparently from developers not from the LAM/samvera community.  While I’d like even more attention and collaboration, this is still encouraging, and all reviews so far have been very positive.

What if we were to try to build out a developer’s toolkit for creating digital collections/repository applications, beginning from attr_json (in ActiveRecord and probably requiring postgres) as the central modelling/persistence layer, but not stopping there, trying to launch with an expanded toolkit addressing other app and business needs too?

This is what I’ve been calling my “fantasy plan”.  I think it could provide a way to build digital collections/repo apps with a better developer experience and overall lower TCO (both in building out the toolkit and building apps based on it) than other options. Of course, success isn’t guaranteed and it also has risks. This is not a plan I or my institution are committed to at this point, but we are considering it.

In addition to modelling/persistence, the next most core area of functionality in our domain, I’d suggest, is handling bytestreams/digital assets, both originals and derivatives. My fantasy plan developer’s toolkit would be based on shrine for this area, possibly with custom shrine plugins.  shrine’s own goal of being a toolkit for file/attachment handling, giving you components and primitives you can assemble into exactly what you need, leads me to judge it well-suited for our use, since flexibility around how to handle bytestreams/assets (including storage platforms) is so core to our domain requirements.

I have more ideas about building out this “developer’s toolkit”, and analysis of the potential benefits and risks of this approach, but I’ll save them for a follow-up post focusing on this possible plan specifically. 

But is this Samvera? The spreading out of the community

I think we are at a moment where, like it or not, different institutions are trying different things.

Even just within the new “based on valkyrie” approach (which people are valiantly trying to make a community consensus), we have both “bespoke” apps and the potential future possibility of “solution bundles”.

There is experimentation and divergent planning going on apart from this too.

Christina Harlow of Stanford recently presented at ELAG in Prague on Stanford’s current planning to re-architect their digital collection/repository system(s) in a project called TACO. (slides; video; See 8:35 in video for some brief evaluation of hyrax for their needs).  If I understand the current plans (and I may not!) they are planning an architecture which is substantially written in Go (not rails or even ruby); which does not involve Fedora; which is not based on RDF/linked data at the basic persistence level; and I think which is not planned to involve samvera shared code, although it may involve Blacklight.   Stanford clearly has a much more complex environment  than many of us, requiring a fairly complex architecture to keep it more sane than it had become — although they were running into some of the same problems of architectural legibility and performance discussed above, just at a larger scale (“scale” more in terms of diverse and complex business requirements and collections than necessarily scale of documents or users/use). [update September 5 2018, more info/documentation on Stanford’s approach is being made available here.]

In 2016 at Hydra Connect, Steven Anderson, then of Boston Public Library, gave an 8-minute lightning talk presentation called “I love you fedora, but it’s over”, about their plans to move to a non-fedora non-samvera stick-close-to-Rails kind of architecture. (slides; video).  He mentioned experiencing some of the same problems with architectural legibility and performance with the existing stack that we’ve discussed previously, and arrived at a plan similar in some ways to my “fantasy plan” above. So there have been rumblings on this for a while now; I hadn’t seen this presentation until now, but feel a real affinity with it.  Steven left BPL shortly after this talk though, and Eben English (who is still at BPL) tells me the plans basically stalled out at that point. BPL is currently still using their previously existing app based on active-fedora 8.0 and fedora 3.8 (no sufia), and is awaiting some additional hiring before determining future plans.

In one sense, the samvera community has for years been less homogenous than our aspirations or self-images. Actual samvera-based apps in production have become very spread out amongst various versions of various samvera gems seen as consensus path at various times in samvera history: just hydra-head and active-fedora, curation_concerns, sufia, hyrax, etc., all at various recent and historical versions (and both fedora 3 and fedora 4). (Plus other branches of effort with varying levels of code-sharing and varying possible futures, like Avalon and Hyku).

There does seem to be a recent increase in heterogeneity of plans though. What does this mean for Samvera? Samvera (née hydra) has always been described as a community, not a users’ group.  (Perhaps a community of practice?).   We are a community of people working on similar problems; sharing knowledge; developing and sharing techniques; developing shared understandings of options and patterns available, and best practices; and looking for opportunities to collaborate on development and use of shared software.

To be sure, if people go in different technical/software directions, it makes this more challenging, but it doesn’t make it impossible. We don’t all need to be using the same software to be such a community (and even just all using Rails is actually a significant opportunity for code-sharing).  One of the things I missed most in my year outside of library world, in a more for-profit world, was the way that here in non-profit library-archive-museum-land, our peers are collaborators not competitors.  And I think most of the individuals and institutions involved in the community don’t want to lose this, and want to find ways to maintain our community ties and benefits even if our software becomes more heterogenous. I think we can do it. We are a community, not a users’ group.

In some ways, I think the increase in software diversity in our community indicates some good things going on. Some institutions are realizing that the current stack wasn’t working well for them, and going back to “first principles” of technical decision-making — in being clear about their local business needs/requirements, and analyzing the path most likely to meet those well and efficiently. And diverse investigations of approaches will give our community more information and knowledge.

Personally, I think samvera community efforts have been hampered by people/institutions making technical plans influenced in part by what they think other people want, what they think “everyone else” is doing, or occasionally even where grant money is available.  The “self-interest” in “enlightened self-interest” sometimes got short shrift.  (To be clear, this is just one factor among many. No matter what, creating a shared codebase in this kind of domain is hard and comes with no guarantees of success).  Institutions going back to their local business needs/requirements and using local technical expertise to try diverse approaches can strengthen our community with more knowledge and experience and options, compared to an attempt at a monoculture.

And also to be clear, we couldn’t be here without what has gone before. That many found the “consensus” stack wasn’t meeting their needs does not mean the community was a failure. None of these new approaches would be possible without all that’s been learned — by individuals, institutions, and the community — about our individual and shared use cases, requirements, approaches, options, dead-ends, patterns, etc. We were literally figuring out the domain together, and we will continue to do so. Plus what we’ve all learned individually, institutionally, and as a community about software architecture and design in those years. Plus the additional software tools that have come to exist giving us radically new options (the first hydra release was prior to Rails 3.0!!)

It does mean that we’re in a time where those with the capacity to do so have to actually go back to those first principles of 1) evaluating our local business needs 2) using technical expertise to evaluate the best way to meet them (short and long term), taking into account 3) opportunities for collaboration, which can mutually benefit all collaborators.   To the extent that there are institutions that have this capacity, but where decision making on choice of software platforms is not being led by people with the technical expertise to make technical decisions, and decisions are being made on other than technical grounds…  it is not serving us, and in the best case this new situation will force the issue, and we’ll all come out stronger.  In any event, these are exciting times, and I think we have some opportunities for real advancement in the maturity of the software we use and provide.

Feedback welcome

I may have gotten some things wrong; my subjective evaluations and analyses can be disagreed with. Discussion and feedback are very welcome: as comments here, as blog responses, in Slack, wherever you like is good with me.

I am also of course interested in connecting with developers or institutions who may be interested in the “Rails-first” developer’s toolkit approach we are considering, which I’ll say more about in a follow-up post.


Thanks for early comments from Eddie Rubeiz, Dan Sanford, Anna Headley, Trey Pendragon, and Tom Johnson. All errors and all opinions are solely my own. 
