Our progress on new digital collections app, and introducing kithe

In September, I wrote a post on a “Proposed Rails-based digital collections developer’s toolkit”.

What has happened since then?

Yes, we decided to go ahead with a rewrite of our digital collections app. The new app is not based on Hyrax or Valkyrie, but on a persistence layer built on ActiveRecord (making use of postgres-specific features where appropriate), and it exposes ActiveRecord models to the app as a whole.

No, we are not going forward with trying to make that entire “toolkit”, with all the components mentioned there.

But yes, unlike U Alberta, we are taking some functionality and putting it in a gem that can be shared between institutions and applications. That gem is kithe. It includes some shareable modeling/persistence code, like Valkyrie (but with a very different approach than Valkyrie), and also some additional fundamental components.

Scaling back the ambition—and abstraction—a bit

The total architecture outlined in my original post was starting to feel overwhelming to me. After all, we also need to actually produce and launch an app for ourselves, on a “reasonable” timeline, with a fairly high chance of success. I left my conversation with U Alberta (which was quite useful; thank you to the Alberta team!) concerned about potential over-reach and over-abstraction. Abstraction always has a cost, and building shared components is harder and more time-consuming than building a custom app.

But then, also informed by my discussion with Alberta, I realized we basically just had to build a Rails app. This is something I knew how to do, and as we progressed we could jettison anything that didn’t seem actually beneficial for that goal or feasible at the moment. And, after discussion with a supportive local team, my anxiety about the project went down quite a bit: we can do this.

Even when writing the original proposal, I knew that some elements might be traps. Building a generalized ACL permissions system in an rdbms-based web app… many have tried, many have fallen. :) Generalized controllers are hard, because controllers are very tightly tied to your particular app’s UI flows, which will vary from app to app.

So we’ve scaled back from trying to provide a toolkit which can also be “scaffolding” for a complete starter app. The goals of the original thought-experiment proposal (a toolkit which provides pieces developers put together when building their own app) are better approached, for now, by providing fewer shared tools, which we can make really solid.

After all, building shared code is always harder than building code for your app. You have more use cases to figure out and meet, and crucially, shared code is harder to change because it (potentially) has cross-institutional dependents, which you must not break. For the code I am putting into kithe, I’m trying to make it solidly constructed and well-polished. In purely local code, I’m more willing to do something experimental and hacky; it’s easy enough (comparatively!) to change local app code later. As with all software: get something out there that works, then iterate, using what you learn. (It’s just that this is a lot harder to do painlessly with shared dependencies!)

So, on October 1st, we decided to embark on this project. We’re willing to show you our fairly informal sketch of a work plan, if you’d like to look.

Introducing kithe

But we’re not just building a local app; we are also trying to create some shareable components. While the costs and risks of shared code and abstractions are real, I ultimately decided that “just Rails” would not get us to the most maintainable code after all. (And of course nothing is really “just Rails”: you are always writing code and using non-Rails dependencies. It’s a matter of degree, of how much your app seems like a “typical” Rails app to developers.)

It’s just too hard to model the data we ourselves already needed (including nested/compound/repeated models) in “just” ActiveRecord, especially in a way that lets you work with it sanely as “just” ActiveRecord and is still performant. (So we use attr_json, which I also developed, for a NoSQL-y approach without giving up rdbms or ActiveRecord benefits, including real foreign-key-based associations.) In another example, ActiveStorage was not flexible or powerful enough for our file-handling needs (which are of course at the core of our domain!), and I wasn’t enthused about CarrierWave either. It makes sense to me to build some solid, high-quality components and abstractions for some of our fundamental business/domain concerns, while being aware of the risks and costs.
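To make the “NoSQL-y” idea concrete, here is a minimal plain-Ruby sketch, assuming no gems at all: ordinary scalar attributes stay as columns, while repeated, compound values (a list of creators, each with a name and a role) are serialized into a single JSON column and rebuilt as typed value objects on load. The names `Creator`, `WorkRecord`, and `json_attributes` are hypothetical illustrations; this is not attr_json's actual API, which integrates with ActiveRecord and postgres jsonb columns.

```ruby
require "json"

# Hypothetical value object for a repeatable, compound field.
Creator = Struct.new(:name, :role, keyword_init: true)

# Hypothetical model: scalar attributes would be ordinary columns, while the
# repeated/compound data is stored in one JSON column (jsonb in postgres).
class WorkRecord
  attr_accessor :id, :title, :creators

  def initialize(id:, title:, creators: [])
    @id = id
    @title = title
    @creators = creators
  end

  # The string that would be written to the JSON column.
  def json_attributes
    JSON.generate("creators" => creators.map(&:to_h))
  end

  # Rebuild typed value objects when loading a row back from the database.
  def self.load(id:, title:, json_attributes:)
    data = JSON.parse(json_attributes)
    creators = data.fetch("creators", []).map do |h|
      Creator.new(name: h["name"], role: h["role"])
    end
    new(id: id, title: title, creators: creators)
  end
end
```

The point of the design is that application code sees `work.creators` as a list of real objects, not as hashes or a separate join table, while the database still stores one row per work.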

So I’ve put into kithe the components that seemed appropriate based on several considerations:

  • Most valuable to our local development effort
  • Handling the “trickiest” problems, most useful to share
  • Handling common problems, most likely to be shareable; and since it’s hard to build a suite of things that work together without some modelling/persistence assumptions, we had to start there.
  • I had enough understanding of the use-cases (local and community) that I thought I could, if I took a reasonable amount of extra time, produce something well-polished, with a good developer experience, and a relatively stable API.

Kithe already includes the following, released (well-tested and well-documented) and used in our own in-progress app, though maybe not yet 1.0-production-ready:

  • A modeling and persistence layer tightly coupled to ActiveRecord, with some postgres-specific features, and recommending use of attr_json for convenient “NoSQL”-like modelling of your unique business data (in common with existing Samvera and Valkyrie solutions, you don’t need to build out a normalized rdbms schema for your data). Its models are Samvera/PCDM-ish (also like other community solutions).
    • Including pretty slick handling of “representatives”: figuring out which representative to display takes constant query time (using some pg-specific SQL to look up and set the “leaf” representative on save), avoiding the usual performance issues.
    • Including UUIDs as actual DB pks/fks, but also a friendlier_id feature for shorter public URL identifiers, with logic to create them automatically if you wish.
  • A nice helper for building Rails forms with repeatable complex embedded values. Compare to the relevant parts of hydra-editor, but (I think) lighter and more flexible.
  • A flexible file-handling architecture based on shrine, which means transparent cloud-storage support out of the box.
    • Along with a new derivatives architecture, which seems to me to have the right level of abstraction and affordances to provide a “polished” experience.
    • All file-handling support assumes expensive operations happen in the background, and uses “direct upload” from the browser before form submission (possibly to cloud storage).
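The “leaf representative” idea above can be sketched in plain Ruby. A work’s representative may itself be a work, so finding the asset to actually display means walking a chain, one lookup per hop. Resolving the chain once at save time and caching the result means display only reads a stored id. The post says kithe does this with pg-specific SQL; this is only an in-memory illustration, and `Member`, `MemberStore`, and `resolve_leaf` are hypothetical names. A real implementation also has to update works whose chain passes through a member that changes, which this sketch omits.

```ruby
# Hypothetical in-memory stand-in for a members table. Assets are leaves;
# works point at a representative member, which may itself be a work.
Member = Struct.new(:id, :kind, :representative_id, :leaf_representative_id,
                    keyword_init: true)

class MemberStore
  attr_reader :lookups

  def initialize
    @rows = {}
    @lookups = 0
  end

  # Each call stands in for one database query.
  def find(id)
    @lookups += 1
    @rows.fetch(id)
  end

  # Walk the representative chain until reaching an asset (a leaf).
  # Returns nil for a missing representative or a cycle.
  def resolve_leaf(member)
    seen = {}
    current = member
    while current.kind == :work
      return nil if current.representative_id.nil? || seen[current.id]

      seen[current.id] = true
      current = find(current.representative_id)
    end
    current.id
  end

  # "On save": compute and cache the leaf, so display never walks the chain.
  def save(member)
    @rows[member.id] = member
    member.leaf_representative_id = resolve_leaf(member)
    member
  end
end
```

With this arrangement a results page listing many works reads `leaf_representative_id` directly, so the number of lookups no longer grows with the depth of nesting.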

It will eventually include some solr/blacklight support, including a traject-based indexing setup. I would also like to develop an intervention in blacklight so that, after solr results are returned, it immediately fetches the “hit” records from ActiveRecord (with specified eager-loading). That way you can write your view code in terms of your actual AR models, without needing to duplicate data to solr, or the logic for dealing with it. This latter idea is borrowed from the design of sunspot.
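The “fetch the hits from ActiveRecord” step mostly comes down to one batched query plus restoring Solr’s relevance order, since a SQL `WHERE id IN (...)` query returns rows in no guaranteed order. Here is a minimal sketch, assuming a hypothetical `FakeTable` in place of a real model and a hypothetical `records_for_solr_hits` helper; in a Rails app the query would be roughly `Model.where(id: ids).includes(...)`, with ActiveSupport’s `index_by` building the lookup hash.

```ruby
# Hypothetical stand-in for a table. `where_ids` mimics a single
# `WHERE id IN (...)` query, which returns rows in no guaranteed order.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  def where_ids(ids)
    @query_count += 1
    @rows.select { |row| ids.include?(row[:id]) }.shuffle
  end
end

# Given ids in Solr's relevance order, fetch all records in one query and
# restore that order. Ids that Solr returned but the DB no longer has are dropped.
def records_for_solr_hits(table, solr_ids)
  by_id = table.where_ids(solr_ids).to_h { |row| [row[:id], row] }
  solr_ids.filter_map { |id| by_id[id] }
end
```

Dropping stale ids quietly is one design choice; an app might instead log them, since they indicate the Solr index is out of sync with the database.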

But before we get there, we’re going to spend a little bit of time on purely local features, including export/import routines (to get our data into the new app, with some solid testing/auditing to be confident we have done so), and some locally bespoke workflow support (I think workflow is best handled by just writing the Rails).

We do have an application deployed as demo/staging, with a basic more-than-just-MVP-but-not-done-yet back-end management interface (note: it does not use Solr/Blacklight at all, which I consider a feature), but not yet any public (non-logged-in) end-user search front-end. If you’d like a guest login to see it, just ask.

Technical Evaluation So Far

We’ve decided to tie our code to Rails and ActiveRecord. Unlike Valkyrie, which provides a data-mapper/repository pattern abstraction, kithe expects the dependent code to use ActiveRecord APIs (along with some standard models and modelling enhancements kithe gives you).

This means, unlike Valkyrie, our solution is not “persistence-layer agnostic”. Our app, and any potential kithe apps, are tied to Rails/ActiveRecord, and can’t use fedora or other persistence mechanisms. We didn’t have much need for or interest in that; we’re happy tying our application logic and storage to ActiveRecord/postgres, and perhaps later focusing on regularly exporting our data in another format for preservation purposes, perhaps OCFL.

It’s worth noting that the data-mapper/repository pattern itself, along the lines Valkyrie uses, is favored by some people for reasons other than persistence-swappability. In the Rails and Ruby web community at large, there is a contingent that thinks the data-mapper/repository pattern is better than what Rails gives you, and yields a better architecture for maintainable code. Many in this contingent are big on Hanami and the dry-rb suite. (I have never been fully persuaded by this contingent.)
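For readers who haven’t worked with both, here is a minimal sketch of the difference in shape, using a shared in-memory hash instead of a database. `ArWork`, `WorkEntity`, and `InMemoryWorkRepository` are hypothetical names; neither half is ActiveRecord’s or Valkyrie’s real API, just the general pattern each one follows.

```ruby
# Shared in-memory "database" so both styles run without any gems.
DB = Hash.new { |h, k| h[k] = {} }

# Active Record style: the model object carries both data and
# persistence behavior, so callers just call `save` and `find` on it.
class ArWork
  attr_accessor :id, :title

  def initialize(id:, title:)
    @id = id
    @title = title
  end

  def save
    DB[:ar_works][id] = { title: title }
    self
  end

  def self.find(id)
    row = DB[:ar_works].fetch(id)
    new(id: id, title: row[:title])
  end
end

# Data-mapper/repository style: the entity is plain data, and a separate
# object handles persistence, so a different repository could be swapped
# in without touching the entity.
WorkEntity = Struct.new(:id, :title, keyword_init: true)

class InMemoryWorkRepository
  def save(entity)
    DB[:repo_works][entity.id] = entity.to_h
    entity
  end

  def find(id)
    WorkEntity.new(**DB[:repo_works].fetch(id))
  end
end
```

The repository version buys swappability and a clean separation at the cost of an extra layer every caller has to go through; the Active Record version is more direct, and is what kithe asks dependent apps to use (via real ActiveRecord).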

And to be sure, in building out our approach over the last 4 months, I sometimes ran right into the architectural issues with Rails’ “model-based” architecture and some of what it encourages, like the dreaded callbacks. But often these were hypothetical problems (“What if someone wanted to do X?”) rather than something I actually needed or wanted to do now. Take a breath, return to agility, and “build our app”.

And a Rails/ActiveRecord-focused approach has huge advantages too. ActiveRecord associations and eager-loading support are mature, time-tested tools which, when exposed to the app as an API, let you build your app flexibly and performantly (at least for the architectures our community is used to, where avoiding n+1 queries still sometimes seems like an unsolved problem!). You have a whole Rails ecosystem to rely on, which kithe-dependent apps can just use, making whatever choices they want (use reform or not?) as with most any Rails app, without having to work out as many novel approaches or APIs. (To be sure, kithe still imposes some constraints, choices, and novelty; it’s a question of degree.)
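The n+1 problem mentioned above, and what eager-loading does about it, can be shown with a query-counting stand-in. Assume a hypothetical `CountingDB` where every method call counts as one query, and assume one file per work: listing N works with their representative file costs N+1 queries if each file is looked up individually, but 2 queries if all the files are fetched in one batch. ActiveRecord’s `includes` does this batching for you; the sketch below does it by hand to show the mechanics.

```ruby
# Hypothetical database stand-in: every method call counts as one query.
class CountingDB
  attr_reader :queries

  def initialize(works, files)
    @works = works
    @files = files
    @queries = 0
  end

  def all_works
    @queries += 1
    @works
  end

  # One query per work: the source of the n+1 pattern.
  def file_for(work_id)
    @queries += 1
    @files.find { |f| f[:work_id] == work_id }
  end

  # One query for all works at once, like `WHERE work_id IN (...)`.
  def files_for(work_ids)
    @queries += 1
    @files.select { |f| work_ids.include?(f[:work_id]) }
  end
end

# 1 query for the works + 1 query per work = n+1 queries.
def list_naive(db)
  db.all_works.map { |w| [w[:title], db.file_for(w[:id])&.fetch(:name)] }
end

# 1 query for the works + 1 batched query for their files = 2 queries.
def list_eager(db)
  works = db.all_works
  files = db.files_for(works.map { |w| w[:id] }).to_h { |f| [f[:work_id], f] }
  works.map { |w| [w[:title], files[w[:id]]&.fetch(:name)] }
end
```

Both functions return the same listing; only the query count differs, and for a results page of 50 works that is 51 queries versus 2.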

If you try to build up an alternative based on data-mapper/repository, whether in Hanami or Valkyrie, I think you have a lot of work to do to be competitive with Rails’ mature solutions, sometimes reproducing features already in ActiveRecord or its ecosystem. And it’s not just work that’s “time implementing”; it’s work figuring out the right APIs and patterns. Hanami, for instance, is probably still not as mature as Rails, or as easy for a newcomer to use.

By not having to spend time re-inventing things that Rails already has solutions for, I could spend time on our actual (digital collections) domain-specific components, where I wasn’t happy with existing solutions. For example, I spent time creating shareable file-handling and derivatives solutions that seem to me well-polished, and able to serve flexible use-cases without you feeling like you’re fighting the system or being surprised by it. These are components that hopefully can be re-used by other apps too.

I think schneems’ thoughts on “polish” are crucial reading when thinking about the true costs of shared abstractions in our community. There is a cost to additional abstractions: in initial implementation, ongoing maintenance, developer on-boarding, and just figuring out the right architectures and APIs to provide that polish. Sometimes these costs are worthwhile in delivered benefits, of course.

I’d consider our kithe-based approach to be somewhere in between U Alberta’s approach and Valkyrie, along the dimension of “how closely do we stick to, and tie ourselves to, ‘standard’ Rails”.

We are building our own app, not trying to use a shared app or “solution bundle” like Hyrax. I would suggest we share that aspect with both the U Alberta approach and the several institutions building valkyrie-not-hyrax apps. But if you’ve had good experiences with the over-time maintenance costs of Hyrax, and you have a use case/context where Hyrax has worked well for you, then that’s great; there’s never anything wrong with doing what has worked for you.

Overall, 4 months in, while some things have taken longer to implement than I expected, and we have encountered some unexpected design challenges, I’m still happy with the approach we are taking.

If you are considering a based-on-valkyrie-no-hyrax approach, I think you might be in a good position to consider a kithe approach too.

How do we evaluate success?

Locally,

We want to have a replacement app launched in about a year.

I think we’re basically on target. We might not hit it on the nose, but I feel confident at this point that we’re going to succeed with a solid app in around that timeline (knock on wood).

When we were considering alternate approaches before committing to this one, we of course tried to compare how long this one would take against various others. This is very hard to predict, because you are comparing multiple hypotheticals, but we had to make some ballpark guesses (others may have other estimates).

Is this more or less time than it would have taken to migrate our sufia app to current hyrax? I think it’s probably taking more time to do it this new way, but migrating our sufia app to current hyrax (with all its custom functionality for current features) would not have been easy or quick, and we weren’t sure current hyrax was a place we wanted to end up.

Is it going to take more or less time than it would have taken to write an app on valkyrie, including any work we might contribute to valkyrie for features we needed? It’s always hard to guess these things, but I’d guess in the same ballpark, although I’m optimistic the “kithe” approach can lead to developer time-savings in the long-run.

(Of course, we hope if someone else wants to follow our path, they can re-use what’s now worked out in kithe to go quicker).

We want it to be an app whose long-term maintenance and continued development costs are manageable

In our sufia-based app, we found it could be difficult and time-consuming to add some of the features we needed. We also spent a lot of time trying to performance-tune to acceptable levels (and we weren’t alone), or figure out and work towards a manageable and cost-efficient cloud deployment architecture.

I am absolutely confident that our “kithe” approach will give us something with a lower TCO (“total cost of ownership”) than we had with sufia.

Will it be a lower TCO than if we were on the present hyrax (ignoring how to get there), with the custom features we needed? I think so, since current hyrax isn’t different enough from the sufia we are used to, but again this is necessarily a guess, and others may disagree. In the end, technical staff just have to make their best predictions based on experience (individual and community). Hyrax will probably continue to improve under @no-reply’s steady leadership, but I think we have to make our decisions based on what’s there now. That potential rosy future also requires continued contribution by the community (like us) if it is to come to fruition, which is real time that belongs in TCO too. I’m still feeling good about the “write our own app” approach vs a “solution bundle”.

Will we get a lower TCO than if we had a non-hyrax valkyrie-based app? Even harder to say. Valkyrie has more abstractions and layers that have real ongoing maintenance costs (that someone has to pay), but there’s an argument that those layers will lower your TCO over the long term. I’m not totally persuaded by that argument myself, and when in doubt I am inclined to choose the less-new-abstraction path, but it’s hard to predict the future.

It’s worth noting that the main thing that forced our hand in doing something with our existing sufia-based app is that it was stuck on an old version of Rails that will soon be out of support, and we thought updating it would be time-consuming one way or another. (When Rails 6.0 is released, probably in the next few months, the Rails maintenance policy says nothing before 5.2 will be supported.) Encouragingly, both kithe and its attr_json dependency (also by me) are testing green on Rails 6.0 beta releases, and I was gratified to see that this didn’t take any code changes; they just passed. (Valkyrie 1.x requires Rails 5.1, but a soon-to-be-released 2.0 is planned to work fine up to Rails 6; the latest hyrax requires Rails 5.1 as well, but the hyrax team would like to add 5.2 and 6 soon.)

We want easier on-boarding of new devs for succession planning

All developers will leave eventually. (This is one reason I think that if you are doing any local development, a one-developer team is a bad idea: you are guaranteeing that at some point 100% of your dev team will leave at once.)

We want it to be easier to on-board new developers. We share U Alberta’s goal that what we could call a “typical Rails developer” should be able to come on and maintain and enhance the app.

Are we there? Well, while our local app is relatively simple Rails code (albeit using kithe APIs), the implementation of kithe and attr_json, which a dev may have to delve into, can get a bit funky, and didn’t turn out quite as simple as I would have liked.

But when I get a bit nervous about this, I reassure myself remembering that:

  • a) Our existing sufia-based app is definitely high-barrier for new devs (an experience not unique to us); I think we can definitely beat that.
    • Also worth pointing out that when we last posted a position, we got no qualified applicants with samvera, or even Rails, experience. We did make a great hire though, someone who knew back-end web dev and knew how to learn new tools; it’s that kind of person that we ideally need our codebase to be accessible to, and the sufia-based one was not.
  • b) Recruiting and on-boarding new devs is always a challenge for any small dev shop, especially if your salaries are not seen as competitive.  It’s just part of the risk and challenge you accept when doing local development as a small shop on any platform. (Whether that is the right choice is out of scope for this post!)

I think our code is going to end up more accessible to the newly on-boarded devs we will actually have than a customized hyrax-based solution would be. More than Valkyrie? I do think so myself, since I think we have fewer layers of “specialty” stuff than Valkyrie, but it’s certainly hard to be sure, and everyone must judge for themselves.

I do think any competent Rails consultancy (without previous LAM/samvera expertise) could be hired to deal with our kithe-based app no problem; I can’t really say if that would be true of a Valkyrie-based app (it might be); I do not personally have confidence it would be true of a hyrax-based app at this point, but others may have other opinions (or experience?).

Evaluating success with the community?

Ideally, we’d of course love it if some other institutions eventually developed with the kithe toolkit, with the potential for sharing future maintenance of it.

Even if that doesn’t happen, I don’t think we’re in a terrible place. It’s worth noting that there has been some non-LAM-community Rails dev interest in attr_json, and occasional PRs. I wouldn’t say it’s in a confidently sustainable place if I left, but I also think it’s code someone else could step into and figure out. It’s just not that many lines of code, it’s well-tested and well-documented, and I’ve tried to be careful with its design. But take a look and decide for yourself! I cannot emphasize enough my belief that if you are doing local development at all (and I think any samvera-based app has always been such), you should have local technical experts doing evaluation before committing to a platform: hyrax, valkyrie, kithe, entirely homegrown, whatever.

Even if no-one else develops with kithe itself, we’d consider it a success if some of the ideas from kithe influence the larger samvera and digital collections/repository communities. You are welcome to copy-paste-modify code that looks useful (It’s MIT licensed, have at it!). And even just take API ideas or architectural concepts from our efforts, if they seem useful.

We do take seriously participating in and giving back to the larger community, and think trying a different approach, so we and others can see how it goes, is part of that. Along with taking the extra time to do it in public and write things up, like this. And we also want to maintain our mutually-beneficial ties to samvera and LAM technologist communities; even if we are using different architectures, we still have lots of use-cases and opportunities for sharing both knowledge and code in common.

Take a look?

If you are considering development of a non-Hyrax valkyrie-based app, and have the development team to support that — I believe you have the development team to support a kithe-based approach too.

I would be quite happy if anyone took a look, and happy to hear feedback and have conversations, regardless of whether you end up using the actual kithe code or not. Kithe is not 1.0, but there’s definitely enough there to check it out and get a sense of what developing with it might be like, and whether it seems technically sound to you. And I’ve taken some time to write some good “guide” overview docs, both for potential “onboarding” of future devs here, and to share with you all.

We have a staging server for our in-development app based on kithe; if you’d like a guest login so you can check it out, just ask and I can share one with you.

Our local app should also probably be pretty easy for you to install (with dependencies) from a git checkout, so you can just run it and see how it goes. See: https://github.com/sciencehistory/scihist_digicoll/

Hope to hear from you!
