on the dangers of having too MUCH time/resources available in development

When you have a deadline for a release, and limited development resources to get there, you are forced to be ruthless about which features you develop, and about identifying the business value of each of them. You have to think about “Minimum Viable Product” (with “viable” being a real part of that phrase).

And you have to ruthlessly think about the “business value” (whether that’s in terms of meeting end-user needs, or your organization’s needs), and make sure you are focusing on the features that provide maximum value most efficiently. That is, trying to minimize developer time and other expense while maximizing value.

(These are not usually totally quantifiable things, at least in advance, so this isn’t actually just an equation. Developer time may be quantifiable in retrospect, but is only an estimate before the fact. “Value” may be quantifiable if you are a for-profit concern who can measure value by revenue, but often again not until after the fact. Although there are ways to use A/B testing and such to make a minimal investment and discover whether more investment is justified. I’m more used to working in non-commercial environments where value isn’t really easily quantifiable at all — but it’s still there).

You have to do this because you have an enormous “wishlist” of things various stakeholders want, and you know you can’t possibly accomplish them all and get your product out by the deadline — whether that’s a hard deadline, or just in the sense of getting the product out ever, or not waiting long past when you could have gotten it out, in pursuit of accomplishing everything you possibly could.

Well, maybe you don’t have to, but it’s generally more clear to everyone involved that you ought to, when you are worried about getting the product out the door soon enough to meet deadlines or stakeholder expectations, when there’s actually some stress involved and you aren’t totally confident this will be easy to do.  It’s more clear that the only way to keep from going crazy and meet your goals is to apply a bit of ruthlessness to evaluating possible things you could do in terms of business value.

(If you are using any kind of agile, scrum, or buzzword-less process/division of labor that involves a “product owner” role, then it’s the product owner’s responsibility and authority to be making these decisions. About what work to choose to do next, in order to maximize value. In consultation with all sorts of stakeholders, as well as the rest of the development team (the “product owner” is part of the development team too), of course).

But okay, let’s say you met that deadline, your product is out, and let’s say that it was quite successful, by whatever means you measure success, quantifiable or not. (Again my experience is mostly in non-commercial environments where success is not as obviously or easily quantifiable as just “revenue”, but there’s still some kind of success, even if it’s just the judgements of internal and external stakeholders and end-users — if there wasn’t such a thing as success vs less success, why were you even doing it in the first place? :) ).

Now the pressure is off — you made it to the finish line!  You still maybe have a giant list of things some stakeholders thought of as possible things to do, or things they would like to do, or someone’s pet thing, or whatever. Or maybe you don’t, and the possible things to develop just come up popcorn-style.  Either way you’re no longer so worried about it, maybe it seems like you have all the time in the world.

But, I suggest, the continued success of your product/project still relies on bringing that ruthlessness to evaluating business value, even when it’s less obvious that there are external constraints forcing it. The continued success of your product, and your and your team’s sanity.

If you lose your focus on that, the product starts going all discombobulated. You can end up doing things that actually hurt business value. You can have stakeholders wondering why their pet feature wasn’t implemented, but someone else’s was, and turning it into a political battle, and realizing too late that you can’t really explain/justify the choices made.  You can end up spending weeks on something you thought would only take a couple days, when really there was no reason to be doing it in the first place (or you would have quickly realized after a few days when the true scope became apparent) if you had been ruthless about business value.  You can spend a lot of time “rearranging deck chairs,” which ultimately can be bad for the morale of both the development team and other stakeholders.

I actually hate working under time-pressure and resource-stress (who doesn’t?), but in fact the time-pressure and resource-stress of trying to get across the finish line provided some useful focus that resulted in a better product and better morale. (Towards the end of writing this, I found a blog post titled “Can we truly be agile in maintenance projects” which suggests this may not be a unique thought).

But I think the solution to keeping a focus on business value without these stressors is probably the same as it was with them:  The agile practice of managing/grooming your “backlog” and “sprint planning”. 

We actually have one benefit in an “already crossed the first finish line”,  “maintenance and continued development” scenario:  There is less pressure to un-agilely plot out everything for the next 6+ months (ie, to get across that first finish line). It should actually be a bit easier to do things “agilely”.  (One of the insights of agile development, I think, is that all that really matters to the development team is what are we doing right now.  This isn’t an absolute, sometimes you gotta know what’s coming down the line, or make long-term plans. But at the end (or rather the beginning) of the day, one human can really only be doing one thing at a time, and what ultimately matters is choosing what that is).

If you keep that backlog management practice even in a “maintenance” phase, then it becomes obvious that you can’t do it without maintaining a focus on business value.

Personally, I don’t necessarily think it’s important to always have your entire backlog ordered (as some scrum “manuals” will insist). What’s important is that when you plan what you’re going to work on now (ie, the next “sprint”, whether that’s a day, a week, or a month), the things you put in there have been evaluated for business value (with the product owner having ultimate responsibility and authority for that). Which requires having enough of your backlog prioritized/ordered enough that you don’t need to always be looking at the entire backlog and re-litigating it before every sprint (cause that’s going to drive you crazy and suck all your time), but what practices you need to make that happen can be contextual.

If you keep a focus on “what are we doing now/next”, and keep your agile practice of determining this with a “ruthless” focus on maximizing business value (which requires being able to articulate the business value of what you choose; which requires continual pursuit of the contextual knowledge — user-testing; internal mission and strategy; internal ‘politics’) — even in the absence of high-stress deadlines and resource crunches — I think you can increase your likelihood of continuing to produce a great product,  being able to explain/justify to stakeholders why you did X and not Y, and keeping the development team sane and their work rewarding.


BrowseEverything in Sufia, and refactoring the ingest flow

[With diagram of some Sufia ingest classes]

So, our staff that ingests files into our Sufia 7.4-based repository regularly needs to ingest dozens of 100MB+ TIFFs. For our purposes here, we’re considering uploading a bunch of “children” (in our case usually page images) of a single “work”, through the work edit page.

Trying to upload so much data through the browser ends up being very not great — even with the fancy JS immediately-upload-with-progress-bar code in Sufia. Takes an awful long time (hours; in part cause browsers’ 3-connections-per-host limit is a bottleneck compared to how much network bandwidth you could get), need to leave your browser open the whole time, and it actually locks up your browser from interacting with our site in any other tabs (see again 3-connections-per-host limit).

The solution would seem to be getting the files on some network-accessible storage, and having the app grab them right from there. browse_everything was already included in sufia, so we decided to try that. (Certainly another solution would be having workflow put the files on some network-accessible storage to begin with, but there were Reasons).

After a bunch of back-and-forths, for local reasons we decided to use AWS S3. And a little windows doohickey that gives windows users a “folder” they can drop things into, that will be automatically uploaded to S3. They’ve got to wait until the upload is complete before the things are available in the repo UI. (But it goes way faster than upload through browser, doesn’t lock up your browser, you don’t even need to leave your browser open, or your computer on at all, as the windows script is actually running on a local network server).  When they do ask the sufia app to ingest, the sufia app (running on EC2) can get the files from S3 surprisingly quickly — in-region AWS network is pretty darn fast.

Browse_everything doesn’t actually work in stock Sufia 7.4

The first barrier is, it turns out browse_everything doesn’t actually work in Sufia 7.4, the feature was broken.

(Normally when I do these things, I try to see what’s been fixed/changed in hyrax: To see if we can backport hyrax fixes;  to get a sense of what ‘extra’ work we’re doing by still being in sufia; and to report to you all. But in this case, I ended up just getting overwhelmed and couldn’t keep track. I believe browse_everything “works” in Hyrax, but may still have problems/bugs, not sure, read on.)

ScholarSphere had already made browse-everything work with their sufia 7.x, by patching various parts of sufia, as I found out from asking in Slack and getting helpful help from PSU folks, so that could serve as a model.  The trick was _finding_ the patches in the scholarsphere source code, but it was super helpful to not have to re-invent the wheel when I did. Sometimes after finding a problem in my app, I’d have a better sense of which files to look at in ScholarSphere for relevant patches.

Browse-everything S3 Plugin

Aside from b-e integration on the sufia side, the S3 plugin for browse-everything also had some problems.  The name of the file(s) you choose in the b-e selector didn’t show up in the sufia edit screen after you selected it, because the S3 b-e adapter wasn’t sending it. I know some people have told me they’re using b-e with S3 in hyrax (the successor to sufia) — I’m not sure how this is working. But what I did is just copy-and-paste the S3 adapter to write a custom local one, and tell b-e to use that.

The custom local one includes a fix for the file name thing (PR’d to browse-everything), and also makes the generated S3 public links have a configurable expires_in (PR’d to browse-everything) — which I think you really want for S3 use with b-e, to keep them from timing out before the bg jobs get to them.

Both of those PR’s have been merged to b-e, but not included in a release yet. It’s been a while since a b-e release (As I write this latest b-e is 0.15.1 in Dec 2017; also can we talk about why 0.15.1 isn’t just re-released as 1.0 since it’s being used in prod all over the place?).  Another fix in b-e which isn’t in prod yet, is a fix for directories with periods in them, which I didn’t notice until after we had gone live with our implementation, and then back-ported in as a separate PR.

Instead of back-porting this stuff in as patches, one could consider using b-e off github ‘master’. I really really don’t like having dependencies to particular un-released git trees in production. But the other blocker for that course of action is that browse-everything master currently has what I consider a major UX regression.  So back-port patches it is, as I get increasingly despondent about how hard it’s gonna be to ever upgrade-migrate our sufia 7.4 app to (some version of) hyrax.

The ole temp file problem

Another problem is that the sufia ImportUrlJob creates some files as ruby Tempfiles, which means the file on disk can/will be deleted by Tempfile code whenever its reference gets garbage collected. But those files were expected to stay around for other code, potentially background jobs, to process.  And bg jobs run in entirely different ruby processes; they aren’t keeping a reference to the Tempfile that would keep it from being deleted.
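
To illustrate the failure mode (this is a toy sketch, not Sufia’s actual code): the Tempfile’s finalizer deletes the on-disk file once the Ruby object is garbage collected, so a bare path handed to another process can point at nothing by the time that process looks for it.

require "tempfile"
require "net/http"

def download_to_temp(url)
  tmp = Tempfile.new(["import", File.extname(url)])
  tmp.binmode
  tmp.write(Net::HTTP.get(URI(url)))
  tmp.flush
  tmp.path   # only the path string escapes; nothing keeps a reference to `tmp`
end

path = download_to_temp("https://example.org/some.tiff")
GC.start
# once the Tempfile object has been collected, the file at `path` is gone,
# and a bg job in another process that was handed `path` will not find it
File.exist?(path)   # => quite possibly false

Writing to an ordinary file under the working directory, and deleting it explicitly when the whole pipeline is done with it, avoids the surprise.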

In some cases the other things expecting the file are able to re-download it from fedora if it’s not there (via the WorkingDirectory class), which is a performance issue maybe, but works. But in other cases, they just 500.

I’m not sure why that wasn’t a problem all along for us, maybe the S3 ingest changed timing to make it so? It’s also possible it still wasn’t a problem, I just mistakenly thought it was causing the problems I was having, but I noticed the problem code-reading trying to figure out the mysterious problems we were having, so I went ahead and fixed it in our custom ImportUrlJob.

Interestingly, while the exact problem I had had already been fixed in Hyrax, a subsequent code-change in Hyrax re-introduced a similar Tempfile problem in another way, which was then fixed again by mbklein. That fix is only in Hyrax 2.1.0.

But then the whole Sufia/Hyrax ingest architecture…

At some point I had browse-everything basically working, but… if you tried to ingest say 100 files via S3, you would have to wait a long time for your browser to get a response back. In some cases timing out.

Why? Because while a bunch of things related to ingest are done in background jobs, the code in sufia tried to create all the FileSet objects and attach them to the Work in Sufia::CreateWithRemoteFilesActor, which ends up being called in the foreground, during the request-response loop.  (I believe this is the same in Hyrax, not positive). (This is not how “local”/”uploaded” files are handled).

And this is a very slow thing to do in Sufia. Whether that’s because of Fedora, ActiveFedora, the usage patterns of ActiveFedora in sufia/hyrax… I think it’s a combo of all of them. The code paths being used sometimes do slow things once per new file that really could be done just once for the work. But even fixing that, it still ain’t really speedy.

At this point (or maybe after a day or two of unsuccessfully hacking things, I forget), I took a step back, and spent a day or two getting a handle on the complete graph of classes involved in this ingest process, and diagraming it.

[Diagram: sufia7.4_ingest_28Jun2018 (the stock Sufia 7.4 ingest class graph)]

You may download XML you can import into draw.io to edit, if you’d like to repurpose for your own uses, for updating for Hyrax, local mods, whatever.  

This has changed somewhat in Hyrax, but I think many parts are still substantially the same.

A few thoughts.

If I’m counting right, we have nine classes/objects involved in: Creating some new “child” objects, attaching an uploaded file to each one (setting a bit of metadata based on original file name), and then attaching the “child” objects to a parent (copying a bit of metadata from parent). (This is before any characterization or derivatives).

This seems like a lot. If we were using ActiveRecord and some AR file attachment library (CarrierWave, or I like the looks of shrine) this might literally be less than 9 lines of code.
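
Purely for contrast, a hand-wavy sketch of what that could look like with ActiveRecord plus shrine; the Work/children models and the `file` attachment are hypothetical, nothing like this exists in Sufia:

# assumes a Work AR model with has_many :children, and a shrine
# attachment (`file`) defined on the child model
uploaded_files.each do |io|
  child = work.children.create!(title: io.original_filename)  # metadata from original file name
  child.update!(file: io)                                      # shrine writes the bytes to storage
end
work.update!(representative_id: work.children.first.id) if work.representative_id.nil?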

Understanding why it ended up this way might require some historical research. My sense is that: A) The operations being done are so slow (again, whether due to Fedora, AF, or Sufia architecture) that things had to be broken up into multiple jobs that might not have to be otherwise. B) A lot of stuff was added by people really not wanting to touch what was already there (cause they didn’t understand it, or cause it was hard to get a grasp on what backwards incompat issues might arise from touching it), so new classes were added on top to accommodate new use cases even if a “greenfield” implementation might result in a simpler object graph (and less code duplication, more DRY).

But okay, it’s what we got in Sufia. Another observation though is that the way ‘local’ files (ie “uploaded” files, via HTTP, to a dir the web app can access) and ‘remote’ files (browse-everything) are handled is not particularly parallel/consistent, the work is divided up between classes in pretty different ways for the two paths. I suspect this may be due to “B” above.

And if you get into the implementations of the various classes involved, there seem to be some things, the same things, being done _multiple times_ across different classes. Which doesn’t help when the things are very slow (if they involve saving a Work).  Again I suspect (B) above.

So, okay, at this point I hubristically thought, okay, let’s just rewrite some parts of this to make more sense, at least to my view of what makes sense. (What was in Hyrax did not seem to me to be substantially different in the ways relevant here). Partially cause I felt it would be really hard to figure out and fix the remaining bugs or problems in the current code, which I found confusing, and its lack of parallelism between local/remote file handling meant a problem could be fixed in one of those paths and not in the other, which did things very differently.

Some of my first attempts involved not having a class that created all the new “filesets” and attached them to the parent work.  If we could just have a job for each new file, that created a fileset for that file and attached it to the work, we’d be fitting into the ActiveJob architecture better — where you ideally want a bunch of fairly small and quick and ideally idempotent jobs, not one long-running job doing a lot of things.
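
Roughly the shape I was going for; the class and method names here are illustrative, not actual Sufia API:

class AddRemoteFileToWorkJob < ActiveJob::Base
  queue_as :ingest

  def perform(work_id, remote_url, user_id)
    work = ActiveFedora::Base.find(work_id)
    file_set = FileSet.create!(label: File.basename(remote_url))

    # attach the new fileset to the parent work...
    work.ordered_members << file_set
    work.save!   # ...which, it turns out, is where the trouble starts (see below)

    # ...then hand off to whatever downloads/characterizes the actual bytes
  end
end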

The problem I ran into there, is that every time you add a member to a ‘Work’ in the Sufia/Fedora architecture, you actually need to save that Work, and do so by updating a single array of “all the members”.  So if a bunch of jobs are running concurrently trying to add members to the same Work at once, they’re going to step on each other’s toes. Sufia does have a “locking” mechanism in place (using redlock), so they shouldn’t actually overwrite each other’s data. But if they each have to wait in line for the lock, the concurrency benefits are significantly reduced — and it still wouldn’t really be playing well with ActiveJob architecture, which doesn’t expect jobs to be just sitting there waiting for a lock blocking the workers.  Additionally, in dev, I was sometimes getting some of these jobs timing out trying to get the lock (which may have been due to using SQLite3 in dev, and not an issue if I was using pg, which I’ve since switched to in dev to match prod).

After a few days of confusion and banging my head against the wall here, I returned to something more like stock sufia where there is one mega-job that creates and associates all the filesets. But it does it in some different ways than stock sufia, in a couple places having to use “internal” Sufia API — with the goal of _avoiding_ doing slow/expensive things multiple times (save the work once with all new filesets added as members, instead of once for each member as stock code did), and getting the per-file jobs queued as soon as possible under the constraints.
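
The gist of where it ended up, sketched loosely (not our literal code): create the FileSets first, attach them all to the work with a single save, then queue the cheap per-file jobs. (acquire_lock_for is the redlock-based lock helper mentioned above; the per-file job name and arguments are made up.)

file_sets = remote_urls.map do |url|
  FileSet.create!(label: File.basename(url))
end

acquire_lock_for(work.id) do
  file_sets.each { |fs| work.ordered_members << fs }
  work.save!   # one expensive Work save for the whole batch, not one per file
end

file_sets.zip(remote_urls).each do |file_set, url|
  IngestRemoteFileJob.perform_later(file_set.id, url)   # per-file work goes to bg jobs ASAP
end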

I also somewhat reduced the number of different bg jobs. There was at least one place in stock code where a bg job existed only to decide which of two other possible bg jobs it really wanted to invoke, and then perform_later on them. I had my version of a couple jobs do a perform_now instead — I wanted to re-use the logic locked in the two ActiveJob workers being dispatched, but there was no reason to have a job that existed only for milliseconds whose purpose was only to queue up another job, it could call that existing logic synchronously instead.
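
In other words (with invented job names), instead of a dispatcher job whose only work is to enqueue one of two other jobs, the caller picks the class and runs its logic inline:

# before: DispatchIngestJob.perform_later(file_set, source), a job that only queues another job
# after: reuse the same worker logic, but run it synchronously in the caller
job_class = source.remote? ? ImportRemoteFileJob : IngestLocalFileJob
job_class.perform_now(file_set, source)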

I also refactored to try to make “uploaded” (local) vs “remote” file ingest much more consistently parallel — IMO it makes it easier to get the code right, with less code, and easier to wrap your head around.

Here’s a diagram of where my architecture ended up:

[Diagram: sufia-7.4-scihist-custom-28Jun2018.png (our customized ingest class graph)]

Did it work?

So I began thinking we had a solution to our staff UX problem that would take “a couple days” to implement, because it was “already a Sufia feature” to use browse-everything from S3.

In fact, it took me 4-5 weeks+ (doing some other parts of my job in those weeks, but with this as the main focus).  Here’s the PR to our local app.

It involved several other fixes and improvements that aren’t mentioned in this report.

We found several bugs in our implementation — or in sufia/cc — both before we actually merged and after we merged (even though we thought we had tested all the use cases extensively, there were some we hadn’t until we got to real world data — like the periods-in-directory-names b-e bug).

In general, I ran into something I’ve run into before — not only does sufia have lots of parts, but they are often implicitly tightly coupled, with one part assuming that other parts do things in a certain way, such that if that certain way changes, the first part breaks. And none of these assumptions are documented (or probably even intentional or conscious on the part of the code writers).

Another thing I think happens is that there can sometimes be bugs in ActiveFedora that the particular way the current (eg) Sufia implementation is written doesn’t hit. Then you change the code in certain ways that probably ought to be fine, and now you hit bugs that were actually always there, but nobody noticed since the shared implementation didn’t exercise them.

Some time after we deployed the new feature, we ran into a bug that I eventually traced to an ActiveFedora bug (one I totally don’t understand myself), which had already been fixed and available in AF 11.5.2 (thanks so much to Tom Johnson for, months ago, backporting the fix to AF 11.x, not just in 12.x).  We had been running ActiveFedora 11.1.6. After some dependency hell of getting a consistent dependency tree with AF 11.5.2, it seems to have fixed the problem without breaking anything else or requiring any other code changes (AF appears to have not actually introduced backwards incompats between these minor version releases, which is awesome).

But what’s a mystery to me (well, along with what the heck is up with that bug, which I don’t understand at all in the AF source), is why we didn’t encounter this bug before, why were the functions working just fine with AF 11.1.6 until recently? It’s a mystery, but my wild guess is that the changes to order and timing of how things are done in my ingest refactor made us hit an AF bug that the previous stock Sufia usage had not.

I can’t hide it cause I showed you the PR: I did not write automated tests for the new ingest functionality. Which in retrospect was a mistake. Partially because I’m not great at writing tests; partially because when I started it was so experimental and seemed like it could be a small intervention, and the implementation kept changing, so having to keep changing tests could have been a slowdown. But also partially cause I found it overwhelming to figure out how to write tests here, it honestly gave me anxiety to think about it.  There are so many fairly tightly coupled moving parts, that all had to change, in a coordinated fashion, and many of them were ActiveJob workers.

Really there’s probably no way around that but writing some top-level integration tests, but those are so slow in sufia, and difficult to write sometimes too. (Also we have a bunch of different paths that probably all need testing; one of our bugs ended up being with a case where someone had chosen a ‘format’ option in the ‘batch create’ screen, something I hadn’t been thinking to test manually and wouldn’t have thought to test automated-ly either. Likewise the directory-containing-a-period bug. And the more separate paths to test, the more tests, and when you’re doing it in integration tests… your suite gets so, so slow.)  But we do plan to add at least some happy path integration tests; we’ve already got a unit of work written out and prioritized for soonish, cause I don’t want this to keep breaking if we change code again without being caught by tests.

So… did it work?  Well, our staff users can ingest from S3 now, and seems to have successfully made their workflow much more efficient, productive, and less frustrating, so I guess I’d say yes!

What does this say about still being on Sufia and upgrade paths?

As reported above, I did run into a fair number of bugs in the stack that would have been fixed if we had been on Hyrax already.  Whenever this happens, it rationally makes me wonder “Is it an inefficient use of our developer time that we’re still on Sufia dealing with these, should we have invested developer time in upgrading to Hyrax already?”

Until roughly March 2018, that wouldn’t have really been an option, it wasn’t even a question. At an earlier point in the two-to-three-ish year implementation process (mostly before I even worked here), we had been really good at keeping our app up to date with new dependency releases. Which is why we are on Sufia 7.4 at least.

But at some point we realized that getting off that treadmill was the only way we were going to hit our externally-imposed deadlines for going live. And I think we were right there. But okay, since March, it’s more of an open book at the moment — and we know we can’t stay on Sufia 7.4.0 forever. (It doesn’t work on Rails 5.2 for one, and Rails before 5.2 will be EOL’d before too long).  So okay the question/option returns.

I did spend 4-5 weeks on implementing this in our sufia app. I loosely and roughly and wild-guessedly “estimate” that upgrading from our Sufia 7.4 app all the way to Hyrax 2.1 would take a lot longer than 4-5 weeks. (2, 3, 4 times as long?)

But of course this isn’t the only time I’ve had to fight with bugs that would have been fixed in Hyrax, it adds up.

But contrarily, quite a few of these bugs or other architecture issues corrected here are not fixed in Hyrax yet either. And a couple are fixed in Hyrax 2.1.0, but weren’t in 2.0.0, which was where Hyrax was when I started this.  And probably some new bugs too. Even if we had already been on Hyrax before I started looking at “ingest from S3”, it would not have been the “couple day” implementation I naively assumed. It would have been somewhere in between that and the 4-5 week+ implementation, not really sure where.

Then there’s the fact that even if we migrate/upgrade to Hyrax 2.1 now… there’s another big backwards-incompatible set of changes slated to come down the line for a future Hyrax version already, to be based on “valkyrie” instead.

So… I’m not really sure. And we remain not really sure what’s going to become of this Sufia 7.4 app that can’t just stay on Sufia 7.4 forever. We could do the ‘expected’ thing and upgrade to hyrax 2.1 now, and then upgrade again when/if future-valkyrie-hyrax comes out. (We could also invest time helping to finish future-valkyrie-hyrax). Or we could actually contribute code towards a future (unexpected!) Sufia release (7.5 or 8 or whatever) that works on Rails 5.2 — not totally sure how hard that would be.

Or we could basically rewrite the app (copying much of the business logic of course; that’s easier for the business logic we managed to write in ways less coupled to sufia) — either based on valkyrie-without-sufia (as some institutions have already done for new apps, I’m not sure if anyone has ported a sufia or hyrax app there yet; it would essentially be an app rewrite to do so) or…. not.  If it would be essentially an app rewrite to go to valkyrie-without-hyrax anyway (and unclear at this point how close to an app rewrite to go to a not-yet-finished future hyrax-with-valkyrie)…

We have been doing some R&D into what an alternate digital collections/repo architecture could look like, not necessarily based on Valkyrie — my attr_json gem is part of that, although it doesn’t represent a commitment to actually use that gem here at MPOW in the future; we’re just exploring different things.

Deep-dive into hydra-derivatives

(Actually first wrote this in November, five months ago, getting it published now…)

In our sufia 7.4 digital repository, we wanted to add some more derivative thumbnails and download JPGs from our large TIFF originals: 3-4 sizes of JPG to download, and 3 total sizes of thumbnail for the three sizes in our customized design, with each of them having a 2x version for srcset too. But we also wanted to change some of the ways the derivatives-creation code worked in our infrastructure.

1. Derivatives creation is already in a bg ActiveJob, but we wanted to run it on a different server than the rails app server. While the built-in job was capable of this, downloading the original from fedora, in our experience, in at least some circumstances, it left behind that temporary download instead of removing it when done. Which caused problems especially if you had to do bulk derivatives creation of already uploaded items.

  • Derivative-creating bg jobs ought not to be fighting over CPU/RAM with our Rails server, and also ought to be able to be on a server separately properly sized and scaled for the amount of work to be done.

2. We wanted to store derivatives on AWS S3

  • All our stuff is deployed on AWS, storing on S3 is over the long-term cheaper than storing on an Elastic Block Storage ‘local disk’.
  • If you ever wanted to horizontally scale your rails server, “local disk” storage (when delivered through a rails controller as sufia 7 does it) requires some complexity, probably a shared file system, which can be expensive and/or unreliable on AWS.
  • If we instead deliver directly from S3 to browsers, we take that load off the Rails server, which doesn’t need it. (This does make auth more challenging, we decided to punt on it for now, with the same justification and possible future directions as we discussed for DZI tiles).
  • S3 is just a storage solution that makes sense for a whole bunch of JPGs and other assets you are going to deliver over the web, it’s what it’s for.

3. Ideally, it would be great to tweak the TIFF->JPG generation parameters a bit. The JPGs should preferably be progressive JPGs, for instance; they weren’t out of the stock codebase. The parameters might vary somewhat between JPGs intended as thumbnails and on-screen display, vs JPGs intended as downloads. The thumb ones should ideally use some pretty aggressive parameters to reduce size, such as removing embedded color profiles. (We ended up using vips instead of imagemagick; there’s a rough sketch of the kind of vips call involved just after this list).

4. Derivatives creation seemed pretty slow, it would be nice to speed it up a bit, if there were opportunities discovered to do so. This was especially inconvenient if you had to generate or re-generate one or more derivatives for all objects already existing in the repo. But could also be an issue even with routine operation, when ingesting many new files at once.
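
As promised above, here’s roughly the kind of vips call involved for a thumbnail, using the ruby-vips gem; the exact size and quality settings are illustrative, not our production values:

require "vips"

# Q/interlace/strip ask for moderate quality, a progressive (interlaced) JPG,
# and stripped metadata (including embedded color profiles)
thumb = Vips::Image.thumbnail("original.tiff", 525)   # width in pixels
thumb.jpegsave("thumb.jpg", Q: 85, interlace: true, strip: true)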

I started with a sort of “deep-dive” into seeing what Sufia (via hydra-derivatives) was doing already. I was looking for possible places to intervene, and also to see what it was doing, so if I ended up reimplementing any of it I could duplicate anything that seemed important.  I ultimately decided that I would need to customize or override so many parts of the existing stack, it made sense to just replace most of it locally. I’ll lead you through both those processes, and end with some (much briefer than usual) thoughts.

Deep-dive into Hydra Derivatives

We are using Sufia 7.4, and CurationConcerns 1.7.8. Some of this has changed in Hyrax, but I believe the basic architecture is largely similar. I’ll try to make a note of parts I know have changed in Hyrax. (links to hyrax code will be to master at the time I write this, links to Sufia and CC will be to the versions we are using).

CreateDerivativesJob

We’ll start at the top with the CurationConcerns CreateDerivativesJob. (Or similar version in Hyrax).  See my previous post for an overview of how/when this job gets scheduled.  Turns out the execution of a CreateDerivativesJob is hard-coded into the CharacterizeJob, you can’t choose to have it run a different job or none at all. (Same in hyrax).

The first thing this does is acquire a file path to the original asset file, with `CurationConcerns::WorkingDirectory.find_or_retrieve(file_id, file_set.id, filepath)`. CurationConcerns::WorkingDirectory (or see in hyrax) checks to see if the file is already there in an expected place inside CurationConcerns.working_directory, and if not copies it to the working directory from a fedora fetch,  using a Hydra::PCDM::File object.

Because it’s using the Hydra::PCDM::File object’s #content API, it fetches the entire fedora file into memory, before writing it to the CurationConcerns.working_directory.  For big files, this uses a lot of RAM temporarily, but more distressing to me is the additional latency, to first fetch the thing into RAM and then stream RAM to disk, instead of streaming right to disk. While the CurationConcerns::WorkingDirectory code seems to have been written originally to try to stream, with a copy_stream_to_working_directory method in terms of streams, the current implementation just turns a full in-memory string into a StringIO instead.  The hyrax implementation is the same.
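
For contrast, a minimal sketch of streaming straight to disk, chunk by chunk, never holding the whole file in RAM (the fedora URI, auth details, and working_path variable are assumptions for illustration):

require "net/http"

uri = URI(file_set.original_file.uri.to_s)   # fedora URI of the original
Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(fedora_user, fedora_password)
  http.request(request) do |response|
    File.open(working_path, "wb") do |out|
      response.read_body { |chunk| out.write(chunk) }   # stream, don't buffer the whole file
    end
  end
end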

Back to the CreateDerivativesJob, we now have a filename to a copy of the original asset in the ‘working directory’.  I don’t see any logic here to clean up that copy, so perhaps this is the source of the ‘temporary file buildup’ my team has sometimes seen.  I’m not sure why we only sometimes see it, or if there are other parts of the stack meant to clean this up later in some cases. I’m not sure if the contract of `CurationConcerns::WorkingDirectory#find_or_retrieve` is to always return a temporary file that the caller is meant to clean up when done, if it’s always safe to assume the filename returned can be deleted by caller; or if instead future actors are meant to use it and/or clean it up.

The CreateDerivativesJob does an acquire_lock_for: I think this is probably left over from when derivatives were actually stored in fedora; now that they are not, this seems superfluous (and possibly expensive, not sure). And indeed it’s gone from the hyrax version, so that’s probably true.

Later, the CreateDerivativesJob reindexes the fileset object (first doing a file_set.reload, I think that’s from fedora, not solr?), and in some cases its parent.   This is a potentially expensive operation — which matters especially if you’re, say, trying to reindex all derivatives. Why does it need a reindex? Well, sufia/hyrax objects in the Solr index have a relative URL to thumbnails in a `thumbnail_path_ss` field (a design our app no longer uses).  But thumbnail paths in sufia/hyrax are consistently predictable from file_set_id, of the form /downloads/#{file_set_id}?file=thumbnail.  Maybe the reindex dates from before this was true? Or maybe it’s just meant to register “yes, a thumbnail is there now”, so the front-end can tell the difference between missing and absent thumb?  (I’d rather just keep that out of the index and handle thumbs not present at expected URLs with some JS.)

I tried removing the index update from my locally overridden CreateDerivativesJob, and discovered one reason it is there. In normal operation, this is the only time a parent work gets reindexed after a fileset is added to it that will be marked as its representative fileset. And it needs to get reindexed to have the representative_id and such.  I added it to AddFileToFileSet instead, where it belongs. Phew!

So anyway, how are the derivatives actually created?  Just by calling file_set.create_derivatives(filename), passing in the actual local (working directory) file path. A method on the model object doesn’t seem quite right for this (you might want different derivatives in different contexts for the same model), but it works. Hyrax is making the same call.  Hyrax introduces a DerivativeService class not present in Sufia/CC, which I believe is meant to support easier customization.

FileSet#create_derivatives

FileSet#create_derivatives is defined in a module that gets mixed into your FileSet class. It branches on the mime type of your original, running different (hard-coded) classes from the hydra-derivatives gem depending on type.  For images, that’s:

Hydra::Derivatives::ImageDerivatives.create(filename,
 outputs: [{ label: :thumbnail, 
             format: 'jpg', 
             size: '200x150>', 
             url: derivative_url('thumbnail') }])

You can see it passes in the local filepath again, as well as some various options in an outputs keyword arg — including a specified url of the to-be-created derivative — as a single hash inside an array for some reason. derivative_url uses a derivative_path_factory, to get a path (on local FS?), and change it into a file: url — so this is really more of a path than a URL, it’s apparently not actually the eventual end-user-facing URL, but just instructions for where to write the file. The derivative_path_factory is a DerivativePath, which uses CurationConcerns.config.derivatives_path, to decide where to put it — it seems like there’s a baked-in assumption (passed through several layers) that  destination will  be on a local filesystem on the machine running the job.

Hyrax actually changes this somewhat — the relevant create_derivatives method seems to have moved to the FileSetDerivativeService — it works largely the same, although the different code to run for each mime-type branch has been moved to separate methods, perhaps to make it easier to override. I’m not quite sure how/where FileSet#create_derivatives is defined (Hyrax’s CreateDerivativesJob still calls it), as the Hyrax::FileSet::Derivatives module doesn’t seem to mix it in anymore. But FileSet#create_derivatives presumably ends up calling #create_derivatives on the FileSetDerivativeService somehow.  Since I was mainly focusing on our code using Sufia/CC, I left the train here. The Hyrax version does have a cleanup_derivatives method as a before_destroy, presumably on the FileSet itself, which is about cleaning up derivatives if a fileset is deleted (did the sufia version not do that at all?). Hyrax seems to still be using the same logic from hydra_derivatives to actually do derivatives creation.

Since I was mostly interested in images, I’m going to specifically dive in only to the Hydra::Derivatives::ImageDerivatives code.  Both Hyrax and Sufia use this. Our Sufia 7.4 app is using hydra-derivatives 3.2.1. At the time of this writing, hydra-derivatives latest release is 3.3.2, and hyrax does require 3.3.x, so a different minor version than what I’m using.

Hydra::Derivatives::ImageDerivatives and cooperators

If we look at Hydra::Derivatives::ImageDerivatives (same in master and 3.2.1) — there isn’t much there. It sets a self.processor_class to Processors::Image, inherits from Runner, and does something to set a format: png as a default argument.

The superclass Hydra::Derivatives::Runner has some business logic for being a derivative processor. It has a class-wide output_file_service defaulting to whatever is configured as Hydra::Derivatives.output_file_service.  And a class-wide source_file_service defaulting to Hydra::Derivatives.source_file_service.  It fetches the original using the source file service. For each arg hash passed in (now we understand why that argument was an array of hashes), it just sends it to the configured processor class, along with the output_file_service:  The processor_class seems to be responsible for using the passed-in output_file_service to actually write output.  While it also passes in the source_file_service, this seems to be ignored:  The source file itself has already been fetched and had its local file system path passed in directly, and I did not find anything using the passed-in source_file_service.  (This logic seems the same between 3.2.1 and current master).

In my Sufia app, Hydra::Derivatives.output_file_service is CurationConcerns::PersistDerivatives — which basically just writes it to local file system, again using a derivative_path_factory set to DerivativePath.  The derivative_path_factory in PersistDerivatives probably has to match the one up in FileSet#create_derivatives — I guess if you changed the derivative_path_factory in your FileSet but not here (or vice versa), bad things would probably happen?  And Hydra::Derivatives.source_file_service is CurationConcerns::LocalFileService, which does nothing but open the local file path passed in and return a File object. Hyrax has pretty much the same PersistDerivatives and LocalFileService services; I would guess they are also the defaults, although I haven’t checked.

I’d guess this architecture was designed with the intention that if you wanted to get a source file from somewhere other than local file system, you’d set a custom  source_file_service.   But even though Sufia and Hyrax do get a source file from somewhere else, they don’t customize the source_file_service, they fetch from fedora a layer up and then just pass in a local file that can be handled by the LocalFileService.

Okay, but what about actually creating derivatives?

So okay, the actual derivative generation though, recall, was handled by the processor_class dependency, hard-coded to Processors::Image.

Hydra::Derivatives::Processors::Image I think is the same in hydra-derivatives 3.2.1 and current master. It uses MiniMagick to do its work. It will possibly change the format of the image. And possibly set (or change?) its quality (which mostly only affects JPGs I think, maybe PNGs too). Then it will run a layer flatten operation on the image.  And resize it.  Recall that #create_derivatives actually passed in an imagemagick-compatible argument for desired size, size: '200x150>', so create_derivatives is actually assuming that the Hydra::Derivatives::ImageDerivatives.create will be imagemagick-based, or understand imagemagick-type size specifications; there’s some coupling here.

MiniMagick actually does its work by shelling out to command-line imagemagick (or optionally graphicsmagick, which is more or less API-compatible with imagemagick). A line in the MiniMagick README makes me concerned about how many times MiniMagick is writing temporary files:

MiniMagick::Image.open makes a copy of the image, and further methods modify that copy (the original stays untouched). We then resize the image, and write it to a file. The writing part is necessary because the copy is just temporary, it gets garbage collected when we lose reference to the image.

I’m not sure if that would apply to the flatten command too. Or even the format and quality directives?  If the way MiniMagick is being used, files are written/read multiple times, that would definitely be an opportunity for performance improvements, because these days touching the file system is one of the slowest things one can do. ImageMagick/GraphicsMagick/other-similar are definitely capable of doing all of these operations without interim temporary file system writes in between each, I’m not certain if Hydra::Derivatives::Processors::Image use of MiniMagick is doing so.

It’s not clear to me how to change what operations Hydra::Derivatives::Processors::Image does — let’s say you want to strip extra metadata for a smaller thumb, as for instance Google suggests; how would you do that? I guess you’d write your own class to use as a processor_class. It could sub-class Hydra::Derivatives::Processors::Image or not (really no need for a sub-class I don’t think, what it’s doing is pretty straightforward).  How would you set your custom processor to be used?  I guess you’d have to override the relevant line in Hydra::Derivatives::ImageDerivatives. Or perhaps you should instead provide your own class to replace Hydra::Derivatives::ImageDerivatives, and have that used instead? Which in Sufia would probably be by overriding FileSet#create_derivatives to call your custom class.   Or in Hyrax, there’s that newer Hyrax::DerivativeService stuff; perhaps you’d change your local FileSet to use a different DerivativeService, which seems at least more straightforward (alas, I’m not on Hyrax). If you did this, I’m not sure if it would be recommended for you to re-use pieces of the existing architecture as components (and in what way), or just write the whole thing from scratch.
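
For what it’s worth, here’s that Sufia route sketched out as a hedged guess; MyImageDerivatives is a hypothetical stand-in for whatever custom pipeline you’d write, and the mime-type branching mirrors what I believe the stock module does:

module MyApp
  module FileSetDerivativesOverride
    def create_derivatives(filename)
      case mime_type
      when *self.class.image_mime_types
        # swap in a custom pipeline instead of Hydra::Derivatives::ImageDerivatives
        MyImageDerivatives.create(
          filename,
          outputs: [
            { label: :thumbnail, format: 'jpg', size: '200x150>',
              url: derivative_url('thumbnail') }
          ]
        )
      else
        super   # let the stock code handle other mime types
      end
    end
  end
end

FileSet.prepend(MyApp::FileSetDerivativesOverride)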

Some Brief Analysis and Decision-making

So I actually wanted to change nearly every part of the default pipeline here in our app.

Reading: I want to continue reading from fedora, being sure to stream it from fedora to local file system as a working copy.

Cleanup: I want to make sure to clean up the temporary working copy when you’re done with it, which I know in at least some cases was not being done in our out of the box code. Maybe to leave it around for future ‘actor’ steps? In our actual app, downloading from one EC2 to another on the same local AWS network is very speedy, I’d rather just be safe and clean it up even if it means it might get downloaded again.

Transformation:  I want to have different image transformation options. Stripping metadata, interlaced JPGs, setting color profiles. Maybe different parameters for images to be used as in-browser thumbs vs downloadable files. (See advice about thumb parameters from Google, or vips). Maybe using a non-ImageMagick processor (we ended up with vips).

Output: I want to write to S3, because it makes sense to store assets like this there, especially but not only if you’re deploying on AWS already like we are.  Of course, you’d have to change the front-end to find the thumbs (and/or downloads) at a separate URL still, more on that later.

So, there are many parts I wanted to customize. And for nearly all of them, it was unclear to me the ‘right’/intended/best way to customize in the current architecture. I figured, okay then, I’m just going to completely replace CreateDerivativesJob with my own implementation.

The good news is that worked out pretty fine — the only place this is coupled to the rest of sufia at all, is in sufia knowing what URLs to link to for thumbs (which I suspect many people have customized already, for instance to use an IIIF server for thumbs instead of creating them statically, as the default and my new implementation both do). So in one sense that is an architectural success!

Irony?

Sandi Metz has written about the consequences of “the wrong abstraction”, sometimes paraphrased as “the wrong abstraction is worse than no abstraction.”

hydra-derivatives, and parts of sufia/hyrax that use it, have a pretty complex cooperating object graph, with many cooperating objects and several inheritance hierarchies.  Presumably this was done intending to support flexibility, customization, and maintainability, that’s why you do such things.

Ironically, adding more cooperating objects (that is, abstractions), can paradoxically inhibit flexibility, customizability, or maintainability — if you don’t get it quite right. With more code, there’s more for developers to understand, and it can be easy to get overwhelmed and not be able to figure out the right place to intervene for a change  (especially in the absence of docs). And changes and improvements to the codebase can require changes across many different accidentally-coupled objects in concert, raising the cost of improvements, especially when crossing gem boundaries too.

If the lines between objects, and the places objects interface with each other, aren’t drawn quite right to support needed use cases, you may sometimes have to customize or override or change things in multiple places now (because you have more places) to do what seems like one thing.

Some of this may be at play in hydra_derivatives and sufia/hyrax’s use of them.  And I think some of it comes from people adding additional layers of abstraction to try to compensate for problems in the existing ones, instead of changing the existing ones (Why does one do this? For backwards compat reasons? Because they don’t understand the existing ones enough to touch them? Organizational boundaries? Quicker development?)

It would be interesting to do a survey to see how often hooks in hydra_derivatives that seem to have been put there for customization have actually been used, or what people are doing instead/in addition for the customization they need.

Getting architecture right (the right abstractions) is not easy, and takes more than just good intentions. It probably takes pretty good understanding of the domain and expected developer usage scenarios; careful design of object graphs and interfaces to support those scenarios; documentation of such to guide future users and developers. Maybe ideally starting some working individual examples in local ‘bespoke’ codebases that are only then abstracted/generalized to a shared codebase (which takes time).  And with all that, some luck and skill and experience too.

The number of different cooperating objects you have involved should probably be proportional to how much thinking and research you’ve done about usage scenarios to support and how the APIs will support them — when in doubt keep it simpler and less granular.

What We Did

This article previous to here, I wrote about 5 months ago. Then I sat on it until now… for some reason the whole thing just filled me with a sort of psychic exhaustion, can’t totally explain it. So looking back to code I wrote a while ago, I can try to give you a very brief overview of our code.

Here’s the PR, which involves quite a bit of code, as well as building on top of some existing custom local architecture.

We completely override the CreateDerivativesJob#perform method, to just call our own “service” class to create derivatives (extracted into a service object instead of being inline in the job!), if our ENV variables are configured to use our new-fangled store-things-on-S3 functionality.  Otherwise we call super — but try to clean up the temporary working files that the built-in code was leaving lying around to fill up our file system.
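
In outline it looks something like this; the service class name, ENV variable, and the exact perform signature are paraphrased from memory rather than copied from the PR:

module CreateDerivativesJobOverride
  def perform(file_set, file_id, filepath = nil)
    if ENV["DERIVATIVES_S3_BUCKET"].present?
      # our new path: vips transforms plus S3 upload, all in a service object
      CreateDerivativesOnS3Service.new(file_set, file_id).call
    else
      super
    end
  ensure
    # stock code could leave the working copy behind; remove it if it's still around
    File.unlink(filepath) if filepath && File.exist?(filepath)
  end
end

CreateDerivativesJob.prepend(CreateDerivativesJobOverride)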

Our derivatives-creating service is relatively straightforward.  Creating a bunch of derivatives and storing them in S3 is not something particularly challenging.

We made it harder for ourselves by trying to support derivatives stored on S3 or in local file system, based on config — partially because it’s convenient to not have to use S3 in dev and test, and partially thinking about generalizing to share with the community.

Also, there needs to be a way for front-end code to get urls to derivatives of course, and really this should be tied into the derivatives creation, something hydra-derivatives appears to lack.  And in our case, we also need to add our derivatives meant to be offered as downloads to our ‘downloads’ menu, including in our custom image viewer. So there’s a lot of code related to that, including some refactoring of our custom image viewer.

One neat thing we did is (at least when using S3, as we do in production) deliver our downloads with a content-disposition header specifying a more human-friendly filename, including the first few words of the title.
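
With aws-sdk-s3 that’s just an extra option at upload time; the key layout and the way the title gets truncated are simplified here for illustration:

friendly_name = work.title.first.to_s.split(/\s+/).first(4).join("_") + ".jpg"

s3_bucket.object("#{file_set.id}/download_full.jpg").upload_file(
  local_jpg_path,
  content_type: "image/jpeg",
  content_disposition: "attachment; filename=\"#{friendly_name}\""
)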

Generalizing? Upstream? Future?

I knew from the start that what I had wasn’t quite good enough to generalize for upstream or other shareable dependency.  In fact, in the months since I implemented it, it hasn’t worked out great even for me, additional use cases I had didn’t fit neatly into it, my architecture has ended up overly complex and confusing.

Abstracting/generalizing to share really requires even more care and consideration to get the right architecture, compared to having something that works well enough for your app. In part, because refactoring something only used by one app is a lot less costly than with a shared dependency.

Initially, some months ago, even knowing what I had was not quite good enough to generalize, I thought I had figured out enough and thought about enough to be able to spend more time to come up with something that would be a good generalized shareable dependency.  This would only be worth spending time on if there seemed a good chance others would want to use it of course.

I even had a break-out session at Samvera Connect to discuss it, and others who showed up agreed that the current hydra-derivatives API was really not right (including at least one who was involved in writing it originally), and that a new try was due.

And then I just… lost steam to do it.  In part overwhelmed by community things; the process of doing a samvera working group, the uncertainty of knowing whether anyone would really switch from hydra-derivatives to use a new thing, of whether it could become the thing in hyrax (with the hyrax valkyrie refactor already going on, how does this affect it?), etc.

And in part, I just realized…. the basic challenge here is coming up with the right API and architecture to a) allow choice of back-end storage (S3, local file system, etc), with b) URL generation, and ideally API for both streaming bytes from the storage location and downloading the whole thing, regardless of back-end storage. This is the harder part architecturally than just actually creating the derivatives. And… nothing about this is particularly unique to the domain of digital collections/repositories, isn’t there something already existing we could just use?

My current best bet is shrine.  It already handles those basic things above with a really nice very flexible decoupled architecture.  It’s a bit more confusing to use than, say, carrierwave (or the newer built-into-Rails ActiveStorage), but that’s because it’s a more flexible decoupled-components API, which is probably worth it so we can do exactly what we want with it, build it into our own frameworks. (More flexibility is always more complexity; I think ActiveStorage currently lacks the flexibility we need for our community’s use cases).   Although it works great with Rails and ActiveRecord, it doesn’t even depend on Rails or ActiveRecord (the author prefers hanami I think), so quite possibly could work with ActiveFedora too.

But then the community (maybe? probably?) seems to be… at least in part… moving away from ActiveFedora too. Could you integrate shrine, to support derivatives, with valkyrie in a back-end independent way? I’m sure you could, I have no idea how the best way would be to do so, how much work it would be, the overall cost/benefit, or still if anyone would use it if you did.

So I’m not sure I’m going to be looking at shrine myself in a valkyrie context. (Although I think the very unsuitable hydra-derivatives is the only relevant shared dependency anyone is currently using with valkyrie, and presumably what hyrax 3 will still be using, and I still think it’s not really… right).

But I am going to be looking at shrine more — I’ve already started talking to the shrine author about what I see as my (and my understanding of our community’s) needs for features for derivatives (which shrine currently calls “versions”), and I think I’m going to try to do some R&D on a new shrine plugin that meets my/our needs better. I’m not sure I’ll end up wanting to try to integrate it with valkyrie and/or hyrax, or with some new approaches I’ve been thinking on and doing some R&D on, which I hope to share more about in the medium-term future.

Another round of citation features in a sufia app

I reported before on our implementation of an RIS export feature in our sufia 7.4 app.

Since then, we’ve actually nearly completely changed our implementation. Why? Well, it started with us moving on to our next goal: on-page human-readable citation. This was something our user analysis had determined portions of our audience/users wanted.

Turns out that what seemed “good enough” metadata for an RIS export (meeting or exceeding user expectations; users were used to citation exports not being that great, and having to hand-edit them themselves) seemed not at all good enough when actually placed on the page as a human-readable citation (in Chicago format).

We ended up first converting our internal metadata to citeproc-json format/schema. Then using that intermediate metadata as a source for our RIS export, as well as for conversion to human-readable citation with citeproc-ruby.  The conversion/production happens at display-time, from data in our Solr index, which required us to add some data to the Solr index that wasn’t previously there.
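
The rendering side, in a stripped-down form, using citeproc-ruby and csl-styles; the item below is an invented example, and our real code builds the citeproc-json hash from Solr fields:

require "citeproc"
require "csl/styles"

# one item in citeproc-json form (normally assembled from our Solr data)
item = {
  "id"     => "example",
  "type"   => "book",
  "title"  => "An Example Treatise",
  "author" => [{ "family" => "Stayner", "given" => "Heinrich" }],
  "issued" => { "date-parts" => [[1540]] }
}

processor = CiteProc::Processor.new(style: "chicago-note-bibliography", format: "html")
processor.import([item])
processor.render(:bibliography, id: "example")
# => an array containing one human-readable, Chicago-style citation string

The same intermediate citeproc-json data also feeds our RIS export, as described above.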

On metadata and citations

Turns out getting the right machine-interpretable metadata for a really correct citation is pretty tricky.

It occurs to me that if citations are a serious use case, you should probably consider them when designing your metadata schema in the first place, to make sure you have everything you need in machine-readable/interpretable form. (As unrealistic as this suggestion sounds for many actual projects in our sector.) Otherwise you can find you simply don't have what you need for a reasonable citation.

We ended up adding a few metadata fields, including a “source” field for items in our digital collection that are excerpts from works (which are not in our collection), and need the container work identified in the citation.

In other cases, an excerpt is an independent work in our repo, but also has a 'child' relationship to a parent, which is its container for purposes of citation. But in yet other cases, there's a work with a 'parent' work that is for organizational/arrangement purposes only, and is not a container for purposes of citation — but our metadata leaves the software no way to know which is which. (In this case we just treat them all like containers for purposes of citation, and tolerate the occasional not-really-correct-ness, as the "incorrect" citations still unambiguously identify the thing cited).

We also implemented a bunch of heuristics to convert various "just string" fields to parsed metadata. For instance, our author (or publisher) names, while from FAST and other library vocabularies, are just in our system as plain single strings. The system doesn't even record the original authority identifier. (I think this is typical for a sufia/hyrax app: while they use the qa gem to load terms, even if the gem supplies identifiers from the original vocabulary, they aren't recorded.)

So, the name `Stayner, Heinrich, -1548` needs to be displayed in some parts of the citation (first author, for instance) as Stayner, Heinrich, but in other parts (second author, or publisher) as Heinrich Stayner, and in no case includes the dates in the citation, so we've got to try parsing it.  Which is harder than you'd think, with all the stuff that can go into an AACR2-style name heading (question marks, or the word "approximately", or sometimes the word "active", other idiosyncrasies).  And then a corporate name like an imaginary design firm Jones, Smith, Garcia is never actually Garcia Jones, Smith or anything like that.
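Just to illustrate the flavor of it (this is not our actual code, and a real implementation needs to handle many more cases), a heuristic along these lines might look like:

```ruby
# Hypothetical sketch of the kind of heuristic involved -- not our production code.
# Tries to split an AACR2-ish personal name heading into family/given parts,
# dropping trailing date info like "-1548", "1507?-1548", or "approximately 1500-1548".
def parse_personal_name(heading)
  name = heading.sub(/,?\s*(active\s+|approximately\s+)?\d{0,4}\??\s*-\s*(approximately\s+)?\d{0,4}\??\s*\z/, "")
  family, given = name.split(",", 2).map { |part| part&.strip }
  { family: family, given: given }
end

parse_personal_name("Stayner, Heinrich, -1548")
# => { family: "Stayner", given: "Heinrich" }
# Fine for "Heinrich Stayner" in inverted-name positions; but a corporate name like
# "Jones, Smith, Garcia" would be mangled by this, so you also need logic to decide
# whether a heading is personal or corporate in the first place.
```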

Then there’s turning our dates from a custom schema into something that fits what a citation expects.

Our heuristics get good enough — in fact, I think our automatically-generated human-readable citations end up as good as or better than anything else I've seen automatically generated on the web, including from major publishers — but they are definitely far from perfect, and have errors in lots of edge cases. Hopefully all errors that don't change or confuse the identification of the thing cited, which of course is the point.

CSL, CSL-json, and ruby-citeproc

CSL, the Citation Style Language, is a system for automatically generating human-readable citations according to XML stylesheets for various citation formats/styles.

While I believe CSL originally came out of Zotero, some code has been extracted from it (and is open source, like Zotero itself), and the standard itself published as an independent standard. Whether via that code, or via the schema/standard implemented in various other code (open source and not), it has been adopted by other software packages too (like Mendeley, which is not open source).

One part of CSL is a json format (defined with a json schema) to represent an individual "work to be cited".  This also originally came from Zotero, and doesn't seem to have a totally universal name yet, or a ton of documentation.  The schema in the repo is called "csl-data.json," but I've also seen this format referred to as just "csl-json", as well as "citeproc-json" (with or without the hyphens).  It also has even more adoption beyond Zotero — it is one of the standard formats that CrossRef (and other DOI resolvers?) can return.  The common IANA/MIME "Content-Type" is `application/vnd.citationstyles.csl+json`, but historically another (incorrect?) form has sometimes been used, `application/citeproc+json`. Some of the names/content type(s) might confuse you into thinking this is a JSON representation of a CSL style (describing a citation format/style like "Chicago" or "MLA"), but it's not; it's a format for metadata about a particular "work to be cited".  I kind of like to call it "csl-data-json" (after the schema URL) to avoid confusion.

Even apart from JSON serialization, this is a useful schema in that it separates out the fields one actually needs to generate a citation (including machine-readable individual sub-elements for parts of a name or date).  Its best available documentation, in addition to the JSON schema itself, seems to be this document, written for the original JavaScript implementation and not entirely applicable to generic implementations.
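For a sense of the shape, here's a made-up work-to-be-cited in that schema, written as a Ruby hash (the field names are real csl-data variables; the values are invented):

```ruby
# Field names come from the csl-data schema; the values are invented for illustration.
csl_data = {
  "id"              => "example-work",
  "type"            => "chapter",
  "title"           => "An excerpt from a larger work",
  "container-title" => "The larger work it was excerpted from",
  "author"          => [
    { "family" => "Stayner", "given" => "Heinrich" }  # structured name parts, not one display string
  ],
  "publisher"       => "Heinrich Stayner",
  "publisher-place" => "Augsburg",
  "issued"          => { "date-parts" => [[1548]] }    # structured date, not a display string
}
```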

There is, amazingly, a ruby CSL processor in the citeproc-ruby gem.  Not only can it take input in csl-json and format it as an individual citation in a desired style, but, as a standard CSL processor, it can also format a complete bibliography and footnotes in the context of a complete document (where some citation styles call for appropriate ibid use in the context of multiple citations, etc).  I was only interested in formatting an individual citation though.
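Basic use looks roughly like this (a sketch; the exact requires and style id you want may differ, and see below about reusing parsed styles for performance):

```ruby
require "citeproc"
require "csl/styles"   # packaged CSL style and locale files

# A minimal work-to-be-cited, shaped like the csl-data example above.
csl_data = {
  "id"     => "example-work",
  "type"   => "book",
  "title"  => "An example early printed work",
  "author" => [{ "family" => "Stayner", "given" => "Heinrich" }],
  "issued" => { "date-parts" => [[1548]] },
}

processor = CiteProc::Processor.new(style: "chicago-author-date", format: "html")
processor.import([CiteProc::Item.new(csl_data)])
processor.render(:bibliography, id: "example-work").first
# => a single formatted citation string for that work
```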

Initially, I wasn’t completely sure the citeproc-ruby gem would work out for me, for performance or other reasons. But I still decided to split processing into two steps: translating our internal metadata into a csl-json compatible format, and then formatting a human readable citation. This two step process just makes sense for manageable code, trying to avoid an unholy mess of nested if-elsifs all jumbled together. And gives you clear separation if you need to generate in multiple human-readable styles, or change your mind about what style(s) to generate. The csl-json schema is great for an intermediate format even if you are going to format as human-readable by non-CSL means, as it’s been road-tested and proven as having the right elements you need to generate a citation.

However, I did end up using citeproc-ruby in the end.  @inkshuk, its author, was amazingly helpful and giving with my questions on the GH issues. Initially it looked like there were some extreme performance problems, but using an alternate citeproc-ruby API to avoid re-loading/parsing XML style documents from disk every time (with one PR by me to make this work for locale XML docs too) avoided those.

Citeproc-ruby can’t yet handle formatting of date ranges in a citation (inkshuk has started on the first steps to an implementation in response to my filed issue).  So when I have a date range in a work-to-be-cited, I just format it myself in my own ruby code, and include it in the csl-data-json as a date “literal”.

CSL is amazing, and using a CSL processor handles all sorts of weird idiosyncratic edge cases for you. (One example: if a title already includes double quotes, but is to be double-quoted in the citation, it changes the internal double quotes to single quotes for you. There are so many of these that you're never going to think of them all up front in a custom cobbled-together unholy mess of an if-elsif implementation.)

Also, while I didn’t do it, you could hypothetically customize some of the existing styles in CSL XML if you need to for local context needs. I believe citeproc-ruby even gives you a way to override parts of an existing style in ruby code.

The particular and peculiar challenges of sufia/hyrax/samvera

There are two main, er, idiosyncrasies of the sufia/hyrax/samvera architecture that provided additional challenges. One: the difficulty of efficiently determining the parent work of a work-in-hand, and (in sufia but not hyrax) the collection(s) that contain a work. Two: the split architecture between Solr index data (used at display time) and fedora data (used at index time), and the need to write very different code to get at data in each of these sources/times.

Initially, I was worried about citeproc-ruby performance. So I started out having our sufia app generate the human-readable citation at index time and store it as text/html in the Solr index, so at display time it would just have to be retrieved and inserted on the page. Really, even if it only takes 10ms to format a citation, wouldn't it be better to not add 10ms to the page delivery time? (Granted, 10ms may be nothing to many slow sufia/hyrax apps.)

However, to generate citations in our context, we need access to both the containing collection (for archival arrangement/location, when the item is an archival item) and the parent work, which serves as the "container" for citation purposes. These are very slow to get out of fedora. (This has changed/improved for fetching parent collections, but not parent works, in hyrax; we're still on sufia.) Like, with our data and infrastructure, it was taking multiple seconds to get the answer from fedora to "what are the parent work(s) for this item-in-hand" (even trying to use the fedora API feature that seemed suited for this, whose name I now forget).  While one can accommodate more slowness at index time than at display time, several seconds per item was outside our tolerance — re-indexing our ~20K item collection already can take many hours on an empty solr index as it is.

So you want to get that info from the Solr index instead of fedora; but trying to access the Solr index within the indexing operation leads you to all sorts of problems when generating an initial index, around whether the index already contains enough to answer the question you need answered to index the item-in-hand. We want our indexing operation to always be usable starting from an empty index, for fault-recovery purposes among others.  And even ignoring this issue, I found that the sufia 'actor stack' actually led to the right info not being in the Solr index at the right time for a particular item-in-hand-to-index, when changing the parent or collection membership of item(s).

Stopping myself as I got into trying to debug the actor stack yet again, I decided to switch to a pure display-time approach: just generate the citation on demand, from the solr index.  At that point I already had a map-metadata-to-csl-json implementation based on doing it at index time with info from fedora.  When I wrote it, I hadn't actually left my options open to switch to display time — so I had to rewrite the thing to retrieve slightly different info in slightly different ways, from the Solr index at display time, using a sufia "show presenter".

I also had to add some things to our Solr index so they could be used at display time — we had been including in our solr index only the dates-of-work as display strings for our pages, but the citation metadata transformer needed all our original structured date metadata so it could determine how best to convert it (differently) into dates for inclusion in a citation. (I stored our original data objects serialized to JSON, and then have the presenter "re-hydrate" them into our original ruby model objects without touching fedora.)
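Roughly like this, as a sketch with made-up field and model names, not the actual code from our PR:

```ruby
# At index time: stash the structured date objects in Solr as serialized JSON.
# (Hypothetical Solr field name, model, and accessor.)
def to_solr(solr_doc = {})
  super.tap do |doc|
    doc["date_of_work_json_ssm"] = object.date_of_work.map { |d| d.attributes.to_json }
  end
end

# At display time, in the show presenter: "re-hydrate" model objects from Solr,
# without touching fedora at all.
def date_of_work_models
  Array(solr_document["date_of_work_json_ssm"]).map do |json|
    DateOfWork.new(JSON.parse(json))
  end
end
```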

Premature Abstraction

In our original implementation, I tried to provide a sort of generic “serialize to RIS”  base class, thinking it would make our code more readable, and potentially be of general use.

However, even originally it didn't work out quite as well as I'd hoped (we needed custom logic more often than we could use the "built-in" automatic mappings in the base class), and in fact this new implementation abandons it entirely. Instead, it first maps to the CSL-json schema/format, and then the RIS serializer mostly just extracts the needed fields from there. (We wanted to take advantage of our improved citation data for the on-screen human-readable citation to improve the RIS export too, of course.)

No harm, no foul in our local codebase. You learn more about your requirements and you learn more about how particular architectural solutions work out, and you change your mind about implementation decisions and change them. This is a normal thing.

But if I had jumped to, say, adding my "RIS serializer base" abstraction to some shared codebase (say the hyrax gem, or even some kind of samvera-citations gem), it probably would have ended up not as generally useful as I thought at the time (it's not even a good match for our needs/use case, it turns out!).  And it's much harder to change your mind about an abstraction in a shared codebase, which many people may be relying upon, and which can't be changed without backwards-incompatibility problems. (Problems that in a local codebase aren't nearly as serious: you just change all your code in your repo, commit it, and you're done; no need to worry about versioning or coordinating the work of various developers using the shared code.)

It’s good to remember to be even more cautious with abstractions in shared code in general.  Ideally, abstractions in shared code (ie, a gem) should be based on a good understanding of the domain from some experience, and have been proven in one (or better more) individual app(s) over some amount of time, before being enshrined into a shared codebase. The first abstraction that seems to be working well for you in a particular codebase may not stand the test of time and diverse requirements/use cases, and “the wrong abstraction can be worse than no abstraction at all”—and the wrong abstraction can be very expensive and painful to undo in a gem/shared codebase.

Our implementation

You can see the Pull Request here.  (It’s possible there were some subsequent bug fixes postdating the PR).

We have a class called CitableAttributes, which takes a display-time ‘work show presenter’ (which as above has been customized to have access to some original component models), and formats it into data compatible with csl-data-json (retrievable via individual public accessors), as well as an actual JSON document that is csl-data-json.

Our RISSerializer uses a CitableAttributes object to extract individual metadata fields and put them in the right place in an RIS document. It also needs its own logic for some things that aren't quite the same in RIS and csl-data-json (a different 'type' vocabulary, no ability to describe date ranges machine-readably).  We wanted to take advantage of all the logic we had for transforming the metadata to something suitable for citations, to improve the RIS exports too.

Oh, one more interesting thing. We decided for photographs of “realia” (largely from our Museum‘s collection), it was more appropriate and useful to cite them as photographs (taken by us, dated the date of the photo), rather than try to cite “realia” itself, which most citation styles aren’t really set up to do, and some here thought was inappropriate for these objects as seen in our website anyhow. So we have some custom logic to determine when an item in our collection is such, and cite appropriately using some clever OO polymorphism. This logic now carries over to the RIS export, hooray.

And a simple Rails helper just uses a CitableAttributes to get a csl-data-json, and then feeds it to citeproc-ruby objects to convert to the human-readable Chicago-style citation we want on screen.
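The glue is roughly this shape (a sketch; CitableAttributes is our real class, but the helper name, the `to_csl_json` accessor, and the style id are approximations):

```ruby
module CitationHelper
  def formatted_citation(work_presenter)
    citable  = CitableAttributes.new(work_presenter)   # builds the csl-data-json for the work
    csl_data = JSON.parse(citable.to_csl_json)         # `to_csl_json` is a stand-in accessor name

    processor = CiteProc::Processor.new(style: "chicago-author-date", format: "html")
    processor.import([CiteProc::Item.new(csl_data)])
    processor.render(:bibliography, id: csl_data["id"]).first.html_safe
  end
end
```

(In real life you don't want to construct and re-parse a Processor and style on every call; that's exactly the performance problem mentioned above, so you'd cache the parsed style or use the alternate API mentioned earlier.)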

There are definitely still a variety of idiosyncratic edge cases it gets not quite right, from weird punctuation to semantics. But I believe it’s still actually one of the best on-screen automatically-generated human-readable citation implementations around!

Some live diverse examples:

brief intermezzo of library fanfic

[Screenshot 2018-03-27 00.53.02]

Answer:

Team Alpha, enter through loading dock, proceed via stairway 3 to level B, and secure Recent Arrivals Fiction. Team Omega from main entrance, vault over the turnstiles, and take Circulation. If there’s any resistance, try not to hurt anyone, but as always, the books come first — but I don’t think there will be, we have some friends on the inside.

Comrades, once we’re in, we’re holding it and not leaving. We can run this library better than those bastards ever did, and read all the authors we want — for the rest of our lives! Shortened though those lives may now be, we know they will be more fulfilling than even centuries upon centuries with only one author.

Okay, synchronize your watches, on 3, 2, 1, mark.

another round of hydra/samvera community dependency analysis

It’s time for another round of running my tool to see what community dependencies, and which versions, Samvera community apps are using.  (Last done in August 2017.)

This time I’m adding any samvera community apps I can find, not limited to sufia/hyrax or even valkyrie.  There are now 43 apps total analyzed, a significant increase over the 28 we had before, so numbers between the two reports are not directly comparable.

Still, the majority of apps analyzed use Sufia, Hyrax, or Valkyrie. Of the 43 apps analyzed, 17 (40%) use sufia, 11 (26%) use hyrax, and 2 (5%) use valkyrie.  (Might have one or two blacklight-only apps that snuck into the corpus too).

Dates of Last Commit

As before, just because a public repo exists doesn’t necessarily mean it’s in production. It could be an old version no longer in production, an experiment that never went anywhere or was never meant for production, or an in-progress app intended to eventually be in production. While my “research question” is really about apps actually in production (or perhaps in progress toward it), I don’t know of any good way to limit to this set without lots and lots of out-of-band research.

But to provide a bit more context, I’ve added a feature to summarize the last time an app in a given dependency-version-use category was updated.  Just because an app hasn’t been updated in years doesn’t necessarily mean it’s not in production — some people (for better or worse) may have apps in production they haven’t touched in years. But an app that has been touched recently we at least know is “current”, whether in production, in-development with a production goal, or an experiment.

It’s only giving summary statistics right now, but we can see that there are definitely apps that received commits in 2018 which are still using old dependencies, including:

  • Sufia 6.6
  • hydra-editor 1.x (2.0 was released two years ago)
  • hydra-head/hydra-core 6.4 (latest release 10.5, a 6.x release last made in 2014)
  • active-fedora 6.7.x, 7.0.x, and 9.11.x (at least 3 apps, latest active-fedora is 11.x)

There are definitely apps out there currently being developed and using pretty old dependencies (not a surprise), but I’m not sure how many apps this is in total, and it makes me curious to learn more about them.

I could write more sophisticated aggregate analysis, but this isn’t the first time I’ve kind of wanted to see the list of apps using, say, active-fedora 7.x, so I could go investigate them and learn more about them — what are they, what other dependencies do they have, etc?

But for now, my tool still reports only aggregate info, never listing specific repo URLs (not even to me).  I don’t want anyone to feel individually shamed for their old dependencies, so I’m avoiding any non-aggregate data for now. I may eventually add it, though, when I really want to learn more in ways that it would make easier.

Major Version Bumps

I’m really curious about how often community apps upgrade to a new major version of dependencies like Sufia, ActiveFedora, RSolr, Blacklight, or even Rails.  Of the apps using, say, Sufia 7.x, how many were created with Sufia 7.x initially, and how many were created with a 6.x or previous version and then upgraded?

I started on tooling to answer this, which we can do by fetching every single commit that touched a Gemfile.lock and analyzing them, but it requires an awful lot of requests to the GitHub API and some analysis code. I haven’t gotten the tool to the point where it can answer exactly my questions yet, but I do have a raw count of how many apps have in their history at least one major-version upgrade of an “interesting” gem.
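The gist of the approach is something like this (a sketch using the octokit gem; the repo name and the gem being checked are placeholders):

```ruby
require "octokit"
require "bundler"
require "base64"

client = Octokit::Client.new(access_token: ENV["GITHUB_TOKEN"], auto_paginate: true)
repo   = "some-org/some-samvera-app"   # placeholder

# Every commit that touched the lockfile, oldest first...
commits = client.commits(repo, path: "Gemfile.lock").reverse

# ...and the version of one "interesting" gem at each of those points.
versions = commits.map do |commit|
  blob     = client.contents(repo, path: "Gemfile.lock", ref: commit.sha)
  lockfile = Bundler::LockfileParser.new(Base64.decode64(blob.content))
  spec     = lockfile.specs.find { |s| s.name == "sufia" }
  spec && spec.version
end.compact

# The app did a major version bump if the major segment ever changes across its history.
did_major_bump = versions.map { |v| v.segments.first }.uniq.size > 1
```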

Number of apps that did a major version bump of the listed dependency at least once:

active-fedora: 20
hydra-head: 15
hydra-access-controls: 15
solrizer: 4
blacklight: 11
blacklight_advanced_search: 7
sufia: 8
hydra-core: 15
hydra-batch-edit: 6
rails: 15
hydra-derivatives: 6
hydra-editor: 13
active_fedora-noid: 14
curation_concerns: 8
rsolr: 6
active-triples: 9
blacklight_range_limit: 2
qa: 9
hyrax: 5
hydra-role-management: 5
riiif: 5
linkeddata: 2
pul_uv_rails: 1

Doesn’t tell us a huge amount, but tells us a little.

20 of the 42 apps that use active-fedora upgraded it at least once — 25 of the 42 apps that use a-f are on 11.x, so I’d suspect the 20 upgraders come largely from within those ranks.

About half (8 of the 17) of the sufia-using apps have done a major version bump at least once. Only 7 sufia-using apps are on the latest/last 7.x; I don’t have analysis of the cross-over, but we know at least one app has done a sufia major version bump in its history yet still hasn’t made it to 7.x. (Of course, others could have gone on to hyrax.)  (Exploring this kind of thing is what tempts me to reveal the actual repo ids/urls, to make it easier to explore manually.)

And 15 of the apps have done a Rails major version bump. All 43 apps analyzed use Rails. This is actually a bit smaller than I might have guessed. I suspect many of the apps that haven’t upgraded were created on Rails 4.x and remain there. Rails 4.2 (33% of analyzed apps) is still receiving patches for “major security” issues (but not “minor security” issues or other bugs); I think this will remain true even after Rails 5.2 is released, up until Rails 6.0 is released.  26% of apps analyzed are on Rails 4.1 or earlier, which does not receive updates even for major security vulnerabilities.  46% of apps are on Rails 5.x, which appears to be up from the August analysis, although since we increased our corpus they aren’t directly comparable.

Now vs. August

The corpus is different so we can’t compare directly (we added more apps, which may have dependencies that aren’t like the ones we had before), but we can still do a bit of comparison careful to remember limitations.

Sufia versions remain dispersed. In both sets, Sufia-using apps are about split between 1/3rd 7.x, 1/3rd 6.x, and 1/3rd earlier.

active-fedora use is still fairly dispersed, but the number of apps using the most recent 11.x has gone up to 60% from 48%. Because of the different corpora, those numbers aren’t directly comparable, but it seems like a good sign. Still plenty of apps using earlier active-fedoras, of course, including a substantial number on 7.x and earlier. Zero apps under analysis use the latest active-fedora, 12.0.x.

The ldp gem, used by 74% of apps analyzed, still has a latest release of 0.7.0, no 1.0 release.

I don’t seem to have included rsolr in the August analysis for some reason, but have here.  Rsolr usage is still predominantly (72%) 1.x, rather than 2.x (2.0.0 was released in May 2017). I think sufia may not be compatible with rsolr 2.x.

Amongst the corpus, there are now 11 apps using hyrax and 17 using sufia. In August’s analysis, we had 8 apps using hyrax and 17 using sufia. So there doesn’t appear to have been anything like a massive migration from sufia to hyrax in the past 6 months.

The full results

Still in ugly ascii format, and perhaps getting hard to interpret with so much data. What we really need is some fancy visualizations (with various cross-tabs), but I’m not sure when/if I’ll get there. I did try to make the output clearer about some things I think were misleading/confusing some people before.

57 total input URLs, 43 with fetchable Gemfile.lock
total apps analyzed: 43
with dependencies on non-release (git or path) gem versions: 22
  with git checkouts: 22
  with local path deps: 0
Date of report: 2018-03-06 17:09:56 -0500


Repos analyzed:

https://github.com/psu-stewardship/scholarsphere
https://github.com/VTUL/data-repo
https://github.com/gwu-libraries/gw-sufia
https://github.com/gwu-libraries/scholarspace
https://github.com/duke-libraries/course-assets
https://github.com/ualbertalib/HydraNorth
https://github.com/ualbertalib/Hydranorth2
https://github.com/aic-collections/aicdams-lakeshore
https://github.com/osulp/Scholars-Archive
https://github.com/durham-university/collections
https://github.com/OregonShakespeareFestival/osf_digital_archives
https://github.com/cul/ac3_sufia
https://github.com/galterlibrary/digital-repository
https://github.com/sciencehistory/chf-sufia
https://github.com/vecnet/vecnet-dl
https://github.com/vecnet/dl-discovery
https://github.com/osulibraries/dc
https://github.com/uclibs/scholar_uc
https://github.com/uvalib/Libra2
https://github.com/samvera-labs/hyku
https://github.com/pulibrary/plum
https://github.com/curationexperts/laevigata
https://github.com/csuscholarworks/bravado
https://github.com/UVicLibrary/Vault
https://github.com/mlibrary/heliotrope
https://github.com/pulibrary/figgy
https://github.com/psu-libraries/cho
https://github.com/OregonDigital/oregondigital
https://github.com/uohull/archivesphere
https://github.com/ndlib/curax
https://github.com/nulib/donut
https://github.com/WGBH/hydradam2-app
https://github.com/KelvinSmithLibrary/absolute
https://github.com/TuftsUniversity/tdl_on_hyrax
https://github.com/TuftsUniversity/tdl_f4
https://github.com/TuftsUniversity/tdl
https://github.com/TuftsUniversity/tufts-image-library
https://github.com/TuftsUniversity/mira_ng
https://github.com/digital-york/dlibingest
https://github.com/wulib-wustl-edu/avalon
https://github.com/Digital-Repository-of-Ireland/dri-app
https://github.com/ucsblibrary/alexandria
https://github.com/avalonmediasystem/avalon


Gems analyzed:

rails
hyrax
sufia
valkyrie
curation_concerns
qa
hydra-editor
hydra-head
hydra-core
hydra-works
hydra-derivatives
hydra-file_characterization
hydra-pcdm
hydra-role-management
hydra-batch-edit
browse-everything
solrizer
blacklight-access_controls
hydra-access-controls
blacklight
blacklight-gallery
blacklight_range_limit
blacklight_advanced_search
active-fedora
active_fedora-noid
active-triples
ldp
linkeddata
riiif
iiif_manifest
pul_uv_rails
mirador_rails
osullivan
bixby
orcid
rsolr



rails:
  apps without dependency: 0
  apps with dependency: 43 (100%)
  latest release: 5.2.0.rc1 (2018-01-30)

  git checkouts: 0
  local path dep: 0

  3.x: 2 (5%)
    first 3.x release: 2010-08-29 (3.0.0)
    latest app commits: min=Mar-2016 median=Mar-2017 max=Mar-2018
    3.2.x: 2 (5%)
      first 3.2.x release: 2012-01-20 (3.2.0)
      latest app commits: min=Mar-2016 median=Mar-2017 max=Mar-2018

  4.x: 21 (49%)
    first 4.x release: 2013-06-25 (4.0.0)
    latest app commits: min=Feb-2014 median=Aug-2017 max=Mar-2018
    4.0.x: 4 (9%)
      first 4.0.x release: 2013-06-25 (4.0.0)
      latest app commits: min=Feb-2014 median=Apr-2015 max=Oct-2015
    4.1.x: 3 (7%)
      first 4.1.x release: 2014-04-08 (4.1.0)
      latest app commits: min=Jan-2015 median=Jan-2016 max=Aug-2017
    4.2.x: 14 (33%)
      first 4.2.x release: 2014-12-20 (4.2.0)
      latest app commits: min=Mar-2015 median=Dec-2017 max=Mar-2018

  5.x: 20 (47%)
    first 5.x release: 2016-06-30 (5.0.0)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    5.0.x: 11 (26%)
      first 5.0.x release: 2016-06-30 (5.0.0)
      latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    5.1.x: 9 (21%)
      first 5.1.x release: 2017-04-27 (5.1.0)
      latest app commits: min=Jul-2017 median=Mar-2018 max=Mar-2018



hyrax:
  apps without dependency: 32 (74%)
  apps with dependency: 11 (26%)
  latest release: 2.1.0.beta1 (2018-02-28)

  git checkouts: 4 (36%)
  local path dep: 0

  1.x: 3 (27%)
    first 1.x release: 2017-05-24 (1.0.1)
    latest app commits: min=Dec-2017 median=Feb-2018 max=Mar-2018
    1.0.x: 3 (27%)
      first 1.0.x release: 2017-05-24 (1.0.1)
      latest app commits: min=Dec-2017 median=Feb-2018 max=Mar-2018

  2.x: 8 (73%)
    first 2.x release: 2017-11-09 (2.0.0)
    latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018
    2.0.x: 6 (55%)
      first 2.0.x release: 2017-11-09 (2.0.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018
    2.1.x: 2 (18%)
      first 2.1.x release: 2018-02-28 (2.1.0.beta1)
      latest app commits: min=Jan-2018 median=Feb-2018 max=Mar-2018



sufia:
  apps without dependency: 26 (60%)
  apps with dependency: 17 (40%)
  latest release: 7.4.1 (2017-10-10)

  git checkouts: 6 (35%)
  local path dep: 0

  0.x: 1 (6%)
    first 0.x release: 2012-11-15 (0.0.1.pre1)
    latest app commits: Mar-2016
    0.1.x: 1 (6%)
      first 0.1.x release: 2013-02-04 (0.1.0)
      latest app commits: Mar-2016

  3.x: 2 (12%)
    first 3.x release: 2013-07-22 (3.0.0)
    latest app commits: min=Feb-2014 median=Jul-2014 max=Nov-2014
    3.5.x: 1 (6%)
      first 3.5.x release: 2013-12-05 (3.5.0)
      latest app commits: Feb-2014
    3.7.x: 1 (6%)
      first 3.7.x release: 2014-02-07 (3.7.0)
      latest app commits: Nov-2014

  4.x: 2 (12%)
    first 4.x release: 2014-08-21 (4.0.0)
    latest app commits: min=Jan-2015 median=Jun-2015 max=Oct-2015
    4.1.x: 1 (6%)
      first 4.1.x release: 2014-10-31 (4.1.0)
      latest app commits: Jan-2015
    4.2.x: 1 (6%)
      first 4.2.x release: 2014-11-25 (4.2.0)
      latest app commits: Oct-2015

  6.x: 5 (29%)
    first 6.x release: 2015-03-27 (6.0.0)
    latest app commits: min=Mar-2015 median=Aug-2017 max=Feb-2018
    6.0.x: 1 (6%)
      first 6.0.x release: 2015-03-27 (6.0.0)
      latest app commits: Mar-2015
    6.2.x: 1 (6%)
      first 6.2.x release: 2015-07-09 (6.2.0)
      latest app commits: Dec-2017
    6.3.x: 1 (6%)
      first 6.3.x release: 2015-08-12 (6.3.0)
      latest app commits: Sep-2016
    6.6.x: 2 (12%)
      first 6.6.x release: 2016-01-28 (6.6.0)
      latest app commits: min=Aug-2017 median=Nov-2017 max=Feb-2018

  7.x: 7 (41%)
    first 7.x release: 2016-08-01 (7.0.0)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    7.0.x: 1 (6%)
      first 7.0.x release: 2016-08-01 (7.0.0)
      latest app commits: Feb-2017
    7.2.x: 3 (18%)
      first 7.2.x release: 2016-10-01 (7.2.0)
      latest app commits: min=Feb-2017 median=May-2017 max=Mar-2018
    7.4.x: 3 (18%)
      first 7.4.x release: 2017-09-07 (7.4.0)
      latest app commits: min=Feb-2018 median=Feb-2018 max=Mar-2018



valkyrie:
  apps without dependency: 41 (95%)
  apps with dependency: 2 (5%)
  latest release: 1.0.0.rc1 (2018-03-02)

  git checkouts: 2 (100%)
  local path dep: 0

  0.x: 2 (100%)
    first 0.x release: 2017-07-06 (0.0.0)
    latest app commits: min=Mar-2018 median=Mar-2018 max=Mar-2018
    0.1.x: 2 (100%)
      first 0.1.x release: 2017-09-26 (0.1.0)
      latest app commits: min=Mar-2018 median=Mar-2018 max=Mar-2018



curation_concerns:
  apps without dependency: 31 (72%)
  apps with dependency: 12 (28%)
  latest release: 2.0.0 (2017-04-20)

  git checkouts: 1 (8%)
  local path dep: 0

  1.x: 12 (100%)
    first 1.x release: 2016-06-22 (1.0.0)
    latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018
    1.3.x: 1 (8%)
      first 1.3.x release: 2016-08-03 (1.3.0)
      latest app commits: Feb-2017
    1.6.x: 3 (25%)
      first 1.6.x release: 2016-09-14 (1.6.0)
      latest app commits: min=Feb-2017 median=May-2017 max=Mar-2018
    1.7.x: 8 (67%)
      first 1.7.x release: 2016-12-09 (1.7.0)
      latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018



qa:
  apps without dependency: 16 (37%)
  apps with dependency: 27 (63%)
  latest release: 2.0.1 (2018-02-22)

  git checkouts: 1 (4%)
  local path dep: 0

  0.x: 15 (56%)
    first 0.x release: 2013-10-04 (0.0.1)
    latest app commits: min=Nov-2014 median=Jan-2018 max=Mar-2018
    0.0.x: 1 (4%)
      first 0.0.x release: 2013-10-04 (0.0.1)
      latest app commits: Feb-2018
    0.3.x: 1 (4%)
      first 0.3.x release: 2014-06-20 (0.3.0)
      latest app commits: Nov-2014
    0.5.x: 1 (4%)
      first 0.5.x release: 2015-04-17 (0.5.0)
      latest app commits: Aug-2017
    0.8.x: 1 (4%)
      first 0.8.x release: 2016-07-07 (0.8.0)
      latest app commits: Feb-2017
    0.10.x: 1 (4%)
      first 0.10.x release: 2016-08-16 (0.10.0)
      latest app commits: Mar-2018
    0.11.x: 10 (37%)
      first 0.11.x release: 2017-01-04 (0.11.0)
      latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018

  1.x: 5 (19%)
    first 1.x release: 2017-03-22 (1.0.0)
    latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018
    1.2.x: 5 (19%)
      first 1.2.x release: 2017-06-23 (1.2.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018

  2.x: 7 (26%)
    first 2.x release: 2017-10-05 (2.0.0)
    latest app commits: min=Dec-2017 median=Mar-2018 max=Mar-2018
    2.0.x: 7 (26%)
      first 2.0.x release: 2017-10-05 (2.0.0)
      latest app commits: min=Dec-2017 median=Mar-2018 max=Mar-2018



hydra-editor:
  apps without dependency: 11 (26%)
  apps with dependency: 32 (74%)
  latest release: 3.4.0.beta (2018-03-05)

  git checkouts: 2 (6%)
  local path dep: 0

  0.x: 3 (9%)
    first 0.x release: 2013-06-13 (0.0.1)
    latest app commits: min=Jan-2015 median=Oct-2015 max=Aug-2017
    0.5.x: 3 (9%)
      first 0.5.x release: 2014-08-27 (0.5.0)
      latest app commits: min=Jan-2015 median=Oct-2015 max=Aug-2017

  1.x: 5 (16%)
    first 1.x release: 2015-01-30 (1.0.0)
    latest app commits: min=Mar-2015 median=Aug-2017 max=Feb-2018
    1.0.x: 3 (9%)
      first 1.0.x release: 2015-01-30 (1.0.0)
      latest app commits: min=Mar-2015 median=Sep-2016 max=Dec-2017
    1.2.x: 2 (6%)
      first 1.2.x release: 2016-01-21 (1.2.0)
      latest app commits: min=Aug-2017 median=Nov-2017 max=Feb-2018

  2.x: 1 (3%)
    first 2.x release: 2016-04-28 (2.0.0)
    latest app commits: Feb-2017
    2.0.x: 1 (3%)
      first 2.0.x release: 2016-04-28 (2.0.0)
      latest app commits: Feb-2017

  3.x: 23 (72%)
    first 3.x release: 2016-08-09 (3.1.0)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    3.1.x: 7 (22%)
      first 3.1.x release: 2016-08-09 (3.1.0)
      latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    3.2.x: 1 (3%)
      first 3.2.x release: 2017-04-13 (3.2.0)
      latest app commits: Apr-2017
    3.3.x: 15 (47%)
      first 3.3.x release: 2017-05-04 (3.3.1)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018



hydra-head:
  apps without dependency: 2 (5%)
  apps with dependency: 41 (95%)
  latest release: 11.0.0.rc1 (2018-01-17)

  git checkouts: 2 (5%)
  local path dep: 0

  5.x: 1 (2%)
    first 5.x release: 2012-12-11 (5.0.0)
    latest app commits: Mar-2016
    5.4.x: 1 (2%)
      first 5.4.x release: 2013-02-06 (5.4.0)
      latest app commits: Mar-2016

  6.x: 4 (10%)
    first 6.x release: 2013-03-28 (6.0.0)
    latest app commits: min=Feb-2014 median=Jul-2016 max=Mar-2018
    6.4.x: 3 (7%)
      first 6.4.x release: 2013-10-17 (6.4.0)
      latest app commits: min=Feb-2014 median=Feb-2018 max=Mar-2018
    6.5.x: 1 (2%)
      first 6.5.x release: 2014-02-18 (6.5.0)
      latest app commits: Nov-2014

  7.x: 4 (10%)
    first 7.x release: 2014-03-31 (7.0.0)
    latest app commits: min=Jan-2015 median=Dec-2015 max=Aug-2017
    7.2.x: 4 (10%)
      first 7.2.x release: 2014-07-18 (7.2.0)
      latest app commits: min=Jan-2015 median=Dec-2015 max=Aug-2017

  8.x: 1 (2%)
    first 8.x release: 2015-02-26 (8.0.0)
    latest app commits: Sep-2015
    8.1.x: 1 (2%)
      first 8.1.x release: 2015-03-27 (8.1.0)
      latest app commits: Sep-2015

  9.x: 6 (15%)
    first 9.x release: 2015-01-30 (9.0.1)
    latest app commits: min=Mar-2015 median=Oct-2017 max=Mar-2018
    9.1.x: 1 (2%)
      first 9.1.x release: 2015-03-06 (9.1.0)
      latest app commits: Mar-2015
    9.2.x: 2 (5%)
      first 9.2.x release: 2015-07-08 (9.2.0)
      latest app commits: min=Sep-2016 median=Apr-2017 max=Dec-2017
    9.5.x: 2 (5%)
      first 9.5.x release: 2015-11-11 (9.5.0)
      latest app commits: min=Aug-2017 median=Nov-2017 max=Feb-2018
    9.10.x: 1 (2%)
      first 9.10.x release: 2016-04-19 (9.10.0)
      latest app commits: Mar-2018

  10.x: 25 (61%)
    first 10.x release: 2016-06-08 (10.0.0)
    latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018
    10.0.x: 1 (2%)
      first 10.0.x release: 2016-06-08 (10.0.0)
      latest app commits: Feb-2017
    10.3.x: 2 (5%)
      first 10.3.x release: 2016-09-02 (10.3.0)
      latest app commits: min=Jan-2018 median=Feb-2018 max=Mar-2018
    10.4.x: 5 (12%)
      first 10.4.x release: 2017-01-25 (10.4.0)
      latest app commits: min=Feb-2017 median=Apr-2017 max=Mar-2018
    10.5.x: 17 (41%)
      first 10.5.x release: 2017-06-09 (10.5.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018



hydra-core:
  apps without dependency: 2 (5%)
  apps with dependency: 41 (95%)
  latest release: 11.0.0.rc1 (2018-01-17)

  git checkouts: 2 (5%)
  local path dep: 0

  5.x: 1 (2%)
    first 5.x release: 2012-12-11 (5.0.0)
    latest app commits: Mar-2016
    5.4.x: 1 (2%)
      first 5.4.x release: 2013-02-06 (5.4.0)
      latest app commits: Mar-2016

  6.x: 4 (10%)
    first 6.x release: 2013-03-28 (6.0.0)
    latest app commits: min=Feb-2014 median=Jul-2016 max=Mar-2018
    6.4.x: 3 (7%)
      first 6.4.x release: 2013-10-17 (6.4.0)
      latest app commits: min=Feb-2014 median=Feb-2018 max=Mar-2018
    6.5.x: 1 (2%)
      first 6.5.x release: 2014-02-18 (6.5.0)
      latest app commits: Nov-2014

  7.x: 4 (10%)
    first 7.x release: 2014-03-31 (7.0.0)
    latest app commits: min=Jan-2015 median=Dec-2015 max=Aug-2017
    7.2.x: 4 (10%)
      first 7.2.x release: 2014-07-18 (7.2.0)
      latest app commits: min=Jan-2015 median=Dec-2015 max=Aug-2017

  8.x: 1 (2%)
    first 8.x release: 2015-02-26 (8.0.0)
    latest app commits: Sep-2015
    8.1.x: 1 (2%)
      first 8.1.x release: 2015-03-27 (8.1.0)
      latest app commits: Sep-2015

  9.x: 6 (15%)
    first 9.x release: 2015-01-30 (9.0.0)
    latest app commits: min=Mar-2015 median=Oct-2017 max=Mar-2018
    9.1.x: 1 (2%)
      first 9.1.x release: 2015-03-06 (9.1.0)
      latest app commits: Mar-2015
    9.2.x: 2 (5%)
      first 9.2.x release: 2015-07-08 (9.2.0)
      latest app commits: min=Sep-2016 median=Apr-2017 max=Dec-2017
    9.5.x: 2 (5%)
      first 9.5.x release: 2015-11-11 (9.5.0)
      latest app commits: min=Aug-2017 median=Nov-2017 max=Feb-2018
    9.10.x: 1 (2%)
      first 9.10.x release: 2016-04-19 (9.10.0)
      latest app commits: Mar-2018

  10.x: 25 (61%)
    first 10.x release: 2016-06-08 (10.0.0)
    latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018
    10.0.x: 1 (2%)
      first 10.0.x release: 2016-06-08 (10.0.0)
      latest app commits: Feb-2017
    10.3.x: 2 (5%)
      first 10.3.x release: 2016-09-02 (10.3.0)
      latest app commits: min=Jan-2018 median=Feb-2018 max=Mar-2018
    10.4.x: 5 (12%)
      first 10.4.x release: 2017-01-25 (10.4.0)
      latest app commits: min=Feb-2017 median=Apr-2017 max=Mar-2018
    10.5.x: 17 (41%)
      first 10.5.x release: 2017-06-09 (10.5.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018



hydra-works:
  apps without dependency: 20 (47%)
  apps with dependency: 23 (53%)
  latest release: 0.17.0 (2018-02-15)

  git checkouts: 0
  local path dep: 0

  0.x: 23 (100%)
    first 0.x release: 2015-06-05 (0.0.1)
    latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018
    0.12.x: 1 (4%)
      first 0.12.x release: 2016-05-24 (0.12.0)
      latest app commits: Feb-2017
    0.14.x: 1 (4%)
      first 0.14.x release: 2016-09-06 (0.14.0)
      latest app commits: Mar-2018
    0.15.x: 2 (9%)
      first 0.15.x release: 2016-11-30 (0.15.0)
      latest app commits: min=Feb-2017 median=Feb-2017 max=Feb-2017
    0.16.x: 16 (70%)
      first 0.16.x release: 2017-03-02 (0.16.0)
      latest app commits: min=Apr-2017 median=Jan-2018 max=Mar-2018
    0.17.x: 3 (13%)
      first 0.17.x release: 2018-02-15 (0.17.0)
      latest app commits: min=Mar-2018 median=Mar-2018 max=Mar-2018



hydra-derivatives:
  apps without dependency: 6 (14%)
  apps with dependency: 37 (86%)
  latest release: 3.4.1 (2018-01-25)

  git checkouts: 2 (5%)
  local path dep: 0

  0.x: 6 (16%)
    first 0.x release: 2013-07-23 (0.0.1)
    latest app commits: min=Feb-2014 median=Jun-2015 max=Feb-2018
    0.0.x: 2 (5%)
      first 0.0.x release: 2013-07-23 (0.0.1)
      latest app commits: min=Feb-2014 median=Jul-2014 max=Nov-2014
    0.1.x: 4 (11%)
      first 0.1.x release: 2014-05-10 (0.1.0)
      latest app commits: min=Jan-2015 median=Dec-2015 max=Feb-2018

  1.x: 6 (16%)
    first 1.x release: 2015-01-30 (1.0.0)
    latest app commits: min=Mar-2015 median=Oct-2017 max=Mar-2018
    1.0.x: 1 (3%)
      first 1.0.x release: 2015-01-30 (1.0.0)
      latest app commits: Mar-2015
    1.1.x: 2 (5%)
      first 1.1.x release: 2015-03-27 (1.1.0)
      latest app commits: min=Sep-2016 median=Apr-2017 max=Dec-2017
    1.2.x: 3 (8%)
      first 1.2.x release: 2016-05-18 (1.2.0)
      latest app commits: min=Aug-2017 median=Feb-2018 max=Mar-2018

  3.x: 25 (68%)
    first 3.x release: 2015-10-07 (3.0.0)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    3.1.x: 2 (5%)
      first 3.1.x release: 2016-05-10 (3.1.0)
      latest app commits: min=Feb-2017 median=Jul-2017 max=Dec-2017
    3.2.x: 8 (22%)
      first 3.2.x release: 2016-11-17 (3.2.0)
      latest app commits: min=Feb-2017 median=Sep-2017 max=Mar-2018
    3.3.x: 10 (27%)
      first 3.3.x release: 2017-06-15 (3.3.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018
    3.4.x: 5 (14%)
      first 3.4.x release: 2018-01-11 (3.4.0)
      latest app commits: min=Jan-2018 median=Mar-2018 max=Mar-2018



hydra-file_characterization:
  apps without dependency: 8 (19%)
  apps with dependency: 35 (81%)
  latest release: 0.3.3 (2015-10-15)

  git checkouts: 0
  local path dep: 0

  0.x: 35 (100%)
    first 0.x release: 2013-09-17 (0.0.1)
    latest app commits: min=Feb-2014 median=Jan-2018 max=Mar-2018
    0.3.x: 35 (100%)
      first 0.3.x release: 2013-10-24 (0.3.0)
      latest app commits: min=Feb-2014 median=Jan-2018 max=Mar-2018



hydra-pcdm:
  apps without dependency: 20 (47%)
  apps with dependency: 23 (53%)
  latest release: 0.11.0 (2018-01-11)

  git checkouts: 0
  local path dep: 0

  0.x: 23 (100%)
    first 0.x release: 2015-06-05 (0.0.1)
    latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018
    0.8.x: 1 (4%)
      first 0.8.x release: 2016-05-12 (0.8.0)
      latest app commits: Feb-2017
    0.9.x: 9 (39%)
      first 0.9.x release: 2016-08-31 (0.9.0)
      latest app commits: min=Feb-2017 median=Jul-2017 max=Mar-2018
    0.10.x: 8 (35%)
      first 0.10.x release: 2017-09-06 (0.10.0)
      latest app commits: min=Dec-2017 median=Feb-2018 max=Mar-2018
    0.11.x: 5 (22%)
      first 0.11.x release: 2018-01-11 (0.11.0)
      latest app commits: min=Jan-2018 median=Mar-2018 max=Mar-2018



hydra-role-management:
  apps without dependency: 24 (56%)
  apps with dependency: 19 (44%)
  latest release: 1.0.0 (2017-11-02)

  git checkouts: 0
  local path dep: 0

  0.x: 14 (74%)
    first 0.x release: 2013-04-18 (0.0.1)
    latest app commits: min=Jan-2016 median=Dec-2017 max=Mar-2018
    0.1.x: 3 (16%)
      first 0.1.x release: 2013-09-24 (0.1.0)
      latest app commits: min=Jan-2016 median=Feb-2018 max=Mar-2018
    0.2.x: 11 (58%)
      first 0.2.x release: 2014-06-25 (0.2.0)
      latest app commits: min=Sep-2016 median=Dec-2017 max=Mar-2018

  1.x: 5 (26%)
    first 1.x release: 2017-11-02 (1.0.0)
    latest app commits: min=Feb-2018 median=Mar-2018 max=Mar-2018
    1.0.x: 5 (26%)
      first 1.0.x release: 2017-11-02 (1.0.0)
      latest app commits: min=Feb-2018 median=Mar-2018 max=Mar-2018



hydra-batch-edit:
  apps without dependency: 26 (60%)
  apps with dependency: 17 (40%)
  latest release: 2.1.0 (2016-08-17)

  git checkouts: 0
  local path dep: 0

  0.x: 1 (6%)
    first 0.x release: 2012-06-15 (0.0.1)
    latest app commits: Mar-2016
    0.1.x: 1 (6%)
      first 0.1.x release: 2012-12-21 (0.1.0)
      latest app commits: Mar-2016

  1.x: 9 (53%)
    first 1.x release: 2013-05-10 (1.0.0)
    latest app commits: min=Feb-2014 median=Oct-2015 max=Feb-2018
    1.1.x: 9 (53%)
      first 1.1.x release: 2013-10-01 (1.1.0)
      latest app commits: min=Feb-2014 median=Oct-2015 max=Feb-2018

  2.x: 7 (41%)
    first 2.x release: 2016-04-20 (2.0.2)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    2.0.x: 1 (6%)
      first 2.0.x release: 2016-04-20 (2.0.2)
      latest app commits: Feb-2017
    2.1.x: 6 (35%)
      first 2.1.x release: 2016-08-17 (2.1.0)
      latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018



browse-everything:
  apps without dependency: 9 (21%)
  apps with dependency: 34 (79%)
  latest release: 0.15.1 (2017-12-09)

  git checkouts: 3 (9%)
  local path dep: 0

  0.x: 34 (100%)
    first 0.x release: 2013-09-24 (0.1.0)
    latest app commits: min=Jan-2015 median=Jan-2018 max=Mar-2018
    0.6.x: 2 (6%)
      first 0.6.x release: 2014-07-31 (0.6.0)
      latest app commits: min=Jan-2015 median=May-2015 max=Sep-2015
    0.7.x: 1 (3%)
      first 0.7.x release: 2014-12-10 (0.7.0)
      latest app commits: Oct-2015
    0.8.x: 3 (9%)
      first 0.8.x release: 2015-02-27 (0.8.0)
      latest app commits: min=Mar-2015 median=Sep-2016 max=Dec-2017
    0.10.x: 4 (12%)
      first 0.10.x release: 2016-04-04 (0.10.0)
      latest app commits: min=Feb-2017 median=Nov-2017 max=Mar-2018
    0.11.x: 2 (6%)
      first 0.11.x release: 2016-12-31 (0.11.0)
      latest app commits: min=Feb-2017 median=Feb-2017 max=Feb-2017
    0.12.x: 1 (3%)
      first 0.12.x release: 2017-03-01 (0.12.0)
      latest app commits: Apr-2017
    0.13.x: 3 (9%)
      first 0.13.x release: 2017-04-30 (0.13.0)
      latest app commits: min=May-2017 median=Jul-2017 max=Jan-2018
    0.14.x: 6 (18%)
      first 0.14.x release: 2017-07-07 (0.14.0)
      latest app commits: min=Dec-2017 median=Feb-2018 max=Mar-2018
    0.15.x: 12 (35%)
      first 0.15.x release: 2017-10-11 (0.15.0)
      latest app commits: min=Jan-2018 median=Mar-2018 max=Mar-2018



solrizer:
  apps without dependency: 1 (2%)
  apps with dependency: 42 (98%)
  latest release: 4.1.0 (2017-11-07)

  git checkouts: 2 (5%)
  local path dep: 0

  2.x: 1 (2%)
    first 2.x release: 2012-11-30 (2.0.0)
    latest app commits: Mar-2016
    2.1.x: 1 (2%)
      first 2.1.x release: 2013-01-18 (2.1.0)
      latest app commits: Mar-2016

  3.x: 39 (93%)
    first 3.x release: 2013-03-28 (3.0.0)
    latest app commits: min=Feb-2014 median=Jan-2018 max=Mar-2018
    3.1.x: 4 (10%)
      first 3.1.x release: 2013-05-03 (3.1.0)
      latest app commits: min=Feb-2014 median=Jul-2016 max=Mar-2018
    3.3.x: 7 (17%)
      first 3.3.x release: 2014-07-17 (3.3.0)
      latest app commits: min=Jan-2015 median=Oct-2015 max=Dec-2017
    3.4.x: 28 (67%)
      first 3.4.x release: 2016-03-14 (3.4.0)
      latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018

  4.x: 2 (5%)
    first 4.x release: 2017-01-26 (4.0.0)
    latest app commits: min=Mar-2018 median=Mar-2018 max=Mar-2018
    4.0.x: 1 (2%)
      first 4.0.x release: 2017-01-26 (4.0.0)
      latest app commits: Mar-2018
    4.1.x: 1 (2%)
      first 4.1.x release: 2017-11-07 (4.1.0)
      latest app commits: Mar-2018



blacklight-access_controls:
  apps without dependency: 16 (37%)
  apps with dependency: 27 (63%)
  latest release: 0.7.0.rc1 (2018-01-12)

  git checkouts: 0
  local path dep: 0

  0.x: 27 (100%)
    first 0.x release: 2015-12-01 (0.1.0)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    0.2.x: 1 (4%)
      first 0.2.x release: 2015-12-04 (0.2.0)
      latest app commits: Mar-2018
    0.5.x: 1 (4%)
      first 0.5.x release: 2016-06-08 (0.5.0)
      latest app commits: Feb-2017
    0.6.x: 25 (93%)
      first 0.6.x release: 2016-09-01 (0.6.0)
      latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018



hydra-access-controls:
  apps without dependency: 1 (2%)
  apps with dependency: 42 (98%)
  latest release: 11.0.0.rc1 (2018-01-17)

  git checkouts: 2 (5%)
  local path dep: 0

  5.x: 1 (2%)
    first 5.x release: 2012-12-11 (5.0.0)
    latest app commits: Mar-2016
    5.4.x: 1 (2%)
      first 5.4.x release: 2013-02-06 (5.4.0)
      latest app commits: Mar-2016

  6.x: 4 (10%)
    first 6.x release: 2013-03-28 (6.0.0)
    latest app commits: min=Feb-2014 median=Jul-2016 max=Mar-2018
    6.4.x: 3 (7%)
      first 6.4.x release: 2013-10-17 (6.4.0)
      latest app commits: min=Feb-2014 median=Feb-2018 max=Mar-2018
    6.5.x: 1 (2%)
      first 6.5.x release: 2014-02-18 (6.5.0)
      latest app commits: Nov-2014

  7.x: 4 (10%)
    first 7.x release: 2014-03-31 (7.0.0)
    latest app commits: min=Jan-2015 median=Dec-2015 max=Aug-2017
    7.2.x: 4 (10%)
      first 7.2.x release: 2014-07-18 (7.2.0)
      latest app commits: min=Jan-2015 median=Dec-2015 max=Aug-2017

  8.x: 1 (2%)
    first 8.x release: 2015-02-26 (8.0.0)
    latest app commits: Sep-2015
    8.1.x: 1 (2%)
      first 8.1.x release: 2015-03-27 (8.1.0)
      latest app commits: Sep-2015

  9.x: 6 (14%)
    first 9.x release: 2015-01-30 (9.0.0)
    latest app commits: min=Mar-2015 median=Oct-2017 max=Mar-2018
    9.1.x: 1 (2%)
      first 9.1.x release: 2015-03-06 (9.1.0)
      latest app commits: Mar-2015
    9.2.x: 2 (5%)
      first 9.2.x release: 2015-07-08 (9.2.0)
      latest app commits: min=Sep-2016 median=Apr-2017 max=Dec-2017
    9.5.x: 2 (5%)
      first 9.5.x release: 2015-11-11 (9.5.0)
      latest app commits: min=Aug-2017 median=Nov-2017 max=Feb-2018
    9.10.x: 1 (2%)
      first 9.10.x release: 2016-04-19 (9.10.0)
      latest app commits: Mar-2018

  10.x: 26 (62%)
    first 10.x release: 2016-06-08 (10.0.0)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    10.0.x: 1 (2%)
      first 10.0.x release: 2016-06-08 (10.0.0)
      latest app commits: Feb-2017
    10.3.x: 2 (5%)
      first 10.3.x release: 2016-09-02 (10.3.0)
      latest app commits: min=Jan-2018 median=Feb-2018 max=Mar-2018
    10.4.x: 5 (12%)
      first 10.4.x release: 2017-01-25 (10.4.0)
      latest app commits: min=Feb-2017 median=Apr-2017 max=Mar-2018
    10.5.x: 18 (43%)
      first 10.5.x release: 2017-06-09 (10.5.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018



blacklight:
  apps without dependency: 0
  apps with dependency: 43 (100%)
  latest release: 6.14.1 (2018-01-30)

  git checkouts: 0
  local path dep: 0

  4.x: 5 (12%)
    first 4.x release: 2012-11-30 (4.0.0)
    latest app commits: min=Feb-2014 median=Mar-2016 max=Mar-2018
    4.0.x: 1 (2%)
      first 4.0.x release: 2012-11-30 (4.0.0)
      latest app commits: Mar-2016
    4.4.x: 1 (2%)
      first 4.4.x release: 2013-09-17 (4.4.0)
      latest app commits: Feb-2018
    4.5.x: 2 (5%)
      first 4.5.x release: 2013-10-24 (4.5.0)
      latest app commits: min=Feb-2014 median=Feb-2016 max=Mar-2018
    4.7.x: 1 (2%)
      first 4.7.x release: 2014-02-05 (4.7.0)
      latest app commits: Nov-2014

  5.x: 12 (28%)
    first 5.x release: 2014-02-05 (5.0.0)
    latest app commits: min=Jan-2015 median=Feb-2017 max=Mar-2018
    5.5.x: 3 (7%)
      first 5.5.x release: 2014-07-07 (5.5.0)
      latest app commits: min=Jan-2015 median=Oct-2015 max=Jan-2016
    5.7.x: 1 (2%)
      first 5.7.x release: 2014-08-28 (5.7.0)
      latest app commits: Aug-2017
    5.10.x: 1 (2%)
      first 5.10.x release: 2015-03-06 (5.10.0)
      latest app commits: Sep-2015
    5.11.x: 1 (2%)
      first 5.11.x release: 2015-03-17 (5.11.0)
      latest app commits: Mar-2015
    5.12.x: 1 (2%)
      first 5.12.x release: 2015-03-24 (5.12.0)
      latest app commits: Aug-2017
    5.14.x: 2 (5%)
      first 5.14.x release: 2015-07-02 (5.14.0)
      latest app commits: min=Sep-2016 median=Apr-2017 max=Dec-2017
    5.18.x: 2 (5%)
      first 5.18.x release: 2016-01-21 (5.18.0)
      latest app commits: min=Aug-2017 median=Nov-2017 max=Feb-2018
    5.19.x: 1 (2%)
      first 5.19.x release: 2016-08-30 (5.19.0)
      latest app commits: Mar-2018

  6.x: 26 (60%)
    first 6.x release: 2016-01-21 (6.0.0)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    6.3.x: 1 (2%)
      first 6.3.x release: 2016-07-01 (6.3.0)
      latest app commits: Feb-2017
    6.7.x: 6 (14%)
      first 6.7.x release: 2016-09-27 (6.7.0)
      latest app commits: min=Feb-2017 median=Aug-2017 max=Mar-2018
    6.9.x: 2 (5%)
      first 6.9.x release: 2017-05-02 (6.9.0)
      latest app commits: min=Feb-2018 median=Feb-2018 max=Feb-2018
    6.10.x: 3 (7%)
      first 6.10.x release: 2017-05-17 (6.10.0)
      latest app commits: min=May-2017 median=Jul-2017 max=Jan-2018
    6.11.x: 4 (9%)
      first 6.11.x release: 2017-08-10 (6.11.0)
      latest app commits: min=Dec-2017 median=Feb-2018 max=Mar-2018
    6.12.x: 3 (7%)
      first 6.12.x release: 2017-11-14 (6.12.0)
      latest app commits: min=Dec-2017 median=Jan-2018 max=Mar-2018
    6.13.x: 2 (5%)
      first 6.13.x release: 2017-12-06 (6.13.0)
      latest app commits: min=Feb-2018 median=Feb-2018 max=Mar-2018
    6.14.x: 5 (12%)
      first 6.14.x release: 2018-01-09 (6.14.0)
      latest app commits: min=Jan-2018 median=Mar-2018 max=Mar-2018



blacklight-gallery:
  apps without dependency: 16 (37%)
  apps with dependency: 27 (63%)
  latest release: 0.9.0 (2017-11-28)

  git checkouts: 0
  local path dep: 0

  0.x: 27 (100%)
    first 0.x release: 2014-02-05 (0.0.1)
    latest app commits: min=Jan-2015 median=Jan-2018 max=Mar-2018
    0.1.x: 3 (11%)
      first 0.1.x release: 2014-09-05 (0.1.0)
      latest app commits: min=Jan-2015 median=Oct-2015 max=Aug-2017
    0.3.x: 1 (4%)
      first 0.3.x release: 2015-03-18 (0.3.0)
      latest app commits: Mar-2015
    0.4.x: 4 (15%)
      first 0.4.x release: 2015-04-10 (0.4.0)
      latest app commits: min=Sep-2016 median=Oct-2017 max=Feb-2018
    0.6.x: 2 (7%)
      first 0.6.x release: 2016-07-07 (0.6.0)
      latest app commits: min=Feb-2017 median=Aug-2017 max=Mar-2018
    0.7.x: 1 (4%)
      first 0.7.x release: 2017-01-24 (0.7.0)
      latest app commits: Feb-2017
    0.8.x: 9 (33%)
      first 0.8.x release: 2017-02-07 (0.8.0)
      latest app commits: min=May-2017 median=Feb-2018 max=Mar-2018
    0.9.x: 7 (26%)
      first 0.9.x release: 2017-11-28 (0.9.0)
      latest app commits: min=Jan-2018 median=Mar-2018 max=Mar-2018



blacklight_range_limit:
  apps without dependency: 37 (86%)
  apps with dependency: 6 (14%)
  latest release: 6.3.2 (2017-12-21)

  git checkouts: 1 (17%)
  local path dep: 0

  5.x: 1 (17%)
    first 5.x release: 2014-02-11 (5.0.0)
    latest app commits: Aug-2017
    5.0.x: 1 (17%)
      first 5.0.x release: 2014-02-11 (5.0.0)
      latest app commits: Aug-2017

  6.x: 5 (83%)
    first 6.x release: 2016-01-26 (6.0.0)
    latest app commits: min=Feb-2017 median=Mar-2018 max=Mar-2018
    6.0.x: 2 (33%)
      first 6.0.x release: 2016-01-26 (6.0.0)
      latest app commits: min=Feb-2017 median=Aug-2017 max=Mar-2018
    6.1.x: 1 (17%)
      first 6.1.x release: 2017-02-17 (6.1.0)
      latest app commits: Mar-2018
    6.2.x: 1 (17%)
      first 6.2.x release: 2017-08-29 (6.2.0)
      latest app commits: Mar-2018
    6.3.x: 1 (17%)
      first 6.3.x release: 2017-12-07 (6.3.0)
      latest app commits: Mar-2018



blacklight_advanced_search:
  apps without dependency: 23 (53%)
  apps with dependency: 20 (47%)
  latest release: 6.3.1 (2017-06-15)

  git checkouts: 1 (5%)
  local path dep: 0

  2.x: 5 (25%)
    first 2.x release: 2012-11-30 (2.0.0)
    latest app commits: min=Feb-2014 median=Mar-2016 max=Mar-2018
    2.1.x: 4 (20%)
      first 2.1.x release: 2013-07-22 (2.1.0)
      latest app commits: min=Feb-2014 median=Jul-2015 max=Mar-2018
    2.2.x: 1 (5%)
      first 2.2.x release: 2014-03-05 (2.2.0)
      latest app commits: Feb-2018

  5.x: 8 (40%)
    first 5.x release: 2014-03-18 (5.0.0)
    latest app commits: min=Jan-2015 median=May-2016 max=Feb-2018
    5.1.x: 6 (30%)
      first 5.1.x release: 2014-06-05 (5.1.0)
      latest app commits: min=Jan-2015 median=Dec-2015 max=Dec-2017
    5.2.x: 2 (10%)
      first 5.2.x release: 2015-10-12 (5.2.0)
      latest app commits: min=Aug-2017 median=Nov-2017 max=Feb-2018

  6.x: 7 (35%)
    first 6.x release: 2016-01-22 (6.0.0)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    6.0.x: 1 (5%)
      first 6.0.x release: 2016-01-22 (6.0.0)
      latest app commits: Feb-2017
    6.1.x: 1 (5%)
      first 6.1.x release: 2016-09-28 (6.1.0)
      latest app commits: Mar-2018
    6.2.x: 3 (15%)
      first 6.2.x release: 2016-12-13 (6.2.0)
      latest app commits: min=Feb-2017 median=May-2017 max=Feb-2018
    6.3.x: 2 (10%)
      first 6.3.x release: 2017-06-13 (6.3.0)
      latest app commits: min=Feb-2018 median=Feb-2018 max=Mar-2018



active-fedora:
  apps without dependency: 1 (2%)
  apps with dependency: 42 (98%)
  latest release: 12.0.1 (2018-01-12)

  git checkouts: 3 (7%)
  local path dep: 0

  5.x: 1 (2%)
    first 5.x release: 2012-11-30 (5.0.0)
    latest app commits: Mar-2016
    5.6.x: 1 (2%)
      first 5.6.x release: 2013-02-02 (5.6.0)
      latest app commits: Mar-2016

  6.x: 3 (7%)
    first 6.x release: 2013-03-28 (6.0.0)
    latest app commits: min=Feb-2014 median=Nov-2014 max=Mar-2018
    6.7.x: 3 (7%)
      first 6.7.x release: 2013-10-29 (6.7.0)
      latest app commits: min=Feb-2014 median=Nov-2014 max=Mar-2018

  7.x: 5 (12%)
    first 7.x release: 2014-03-31 (7.0.0)
    latest app commits: min=Jan-2015 median=Jan-2016 max=Feb-2018
    7.0.x: 1 (2%)
      first 7.0.x release: 2014-03-31 (7.0.0)
      latest app commits: Feb-2018
    7.1.x: 4 (10%)
      first 7.1.x release: 2014-07-18 (7.1.0)
      latest app commits: min=Jan-2015 median=Dec-2015 max=Aug-2017

  8.x: 1 (2%)
    first 8.x release: 2015-01-14 (8.0.0)
    latest app commits: Sep-2015
    8.0.x: 1 (2%)
      first 8.0.x release: 2015-01-14 (8.0.0)
      latest app commits: Sep-2015

  9.x: 6 (14%)
    first 9.x release: 2015-01-30 (9.0.0)
    latest app commits: min=Mar-2015 median=Oct-2017 max=Mar-2018
    9.0.x: 1 (2%)
      first 9.0.x release: 2015-01-30 (9.0.0)
      latest app commits: Mar-2015
    9.4.x: 1 (2%)
      first 9.4.x release: 2015-09-03 (9.4.0)
      latest app commits: Dec-2017
    9.7.x: 2 (5%)
      first 9.7.x release: 2015-11-30 (9.7.0)
      latest app commits: min=Aug-2017 median=Nov-2017 max=Feb-2018
    9.8.x: 1 (2%)
      first 9.8.x release: 2016-02-05 (9.8.0)
      latest app commits: Sep-2016
    9.11.x: 1 (2%)
      first 9.11.x release: 2016-04-15 (9.11.0)
      latest app commits: Mar-2018

  10.x: 1 (2%)
    first 10.x release: 2016-06-08 (10.0.0)
    latest app commits: Feb-2017
    10.0.x: 1 (2%)
      first 10.0.x release: 2016-06-08 (10.0.0)
      latest app commits: Feb-2017

  11.x: 25 (60%)
    first 11.x release: 2016-09-13 (11.0.0)
    latest app commits: min=Feb-2017 median=Feb-2018 max=Mar-2018
    11.1.x: 4 (10%)
      first 11.1.x release: 2017-01-13 (11.1.0)
      latest app commits: min=Feb-2017 median=Mar-2017 max=Mar-2018
    11.2.x: 2 (5%)
      first 11.2.x release: 2017-05-18 (11.2.0)
      latest app commits: min=May-2017 median=Aug-2017 max=Dec-2017
    11.3.x: 4 (10%)
      first 11.3.x release: 2017-06-13 (11.3.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018
    11.4.x: 2 (5%)
      first 11.4.x release: 2017-06-28 (11.4.0)
      latest app commits: min=Jan-2018 median=Jan-2018 max=Jan-2018
    11.5.x: 13 (31%)
      first 11.5.x release: 2017-10-12 (11.5.0)
      latest app commits: min=Dec-2017 median=Mar-2018 max=Mar-2018



active_fedora-noid:
  apps without dependency: 16 (37%)
  apps with dependency: 27 (63%)
  latest release: 2.2.0 (2017-05-25)

  git checkouts: 0
  local path dep: 0

  0.x: 1 (4%)
    first 0.x release: 2015-02-14 (0.0.1)
    latest app commits: Dec-2017
    0.3.x: 1 (4%)
      first 0.3.x release: 2015-07-14 (0.3.0)
      latest app commits: Dec-2017

  1.x: 4 (15%)
    first 1.x release: 2015-08-06 (1.0.1)
    latest app commits: min=Sep-2016 median=Nov-2017 max=Mar-2018
    1.0.x: 1 (4%)
      first 1.0.x release: 2015-08-06 (1.0.1)
      latest app commits: Sep-2016
    1.1.x: 3 (11%)
      first 1.1.x release: 2016-05-10 (1.1.0)
      latest app commits: min=Aug-2017 median=Feb-2018 max=Mar-2018

  2.x: 22 (81%)
    first 2.x release: 2016-11-29 (2.0.0)
    latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018
    2.0.x: 14 (52%)
      first 2.0.x release: 2016-11-29 (2.0.0)
      latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018
    2.2.x: 8 (30%)
      first 2.2.x release: 2017-05-25 (2.2.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018



active-triples:
  apps without dependency: 6 (14%)
  apps with dependency: 37 (86%)
  latest release: 1.0.0 (2017-11-08)

  git checkouts: 0
  local path dep: 0

  0.x: 26 (70%)
    first 0.x release: 2014-04-29 (0.0.1)
    latest app commits: min=Jan-2015 median=Aug-2017 max=Mar-2018
    0.2.x: 4 (11%)
      first 0.2.x release: 2014-07-01 (0.2.0)
      latest app commits: min=Jan-2015 median=Dec-2015 max=Aug-2017
    0.4.x: 1 (3%)
      first 0.4.x release: 2014-10-24 (0.4.0)
      latest app commits: Sep-2015
    0.6.x: 1 (3%)
      first 0.6.x release: 2015-01-14 (0.6.0)
      latest app commits: Mar-2015
    0.7.x: 6 (16%)
      first 0.7.x release: 2015-05-14 (0.7.0)
      latest app commits: min=Sep-2016 median=Oct-2017 max=Mar-2018
    0.11.x: 14 (38%)
      first 0.11.x release: 2016-08-25 (0.11.0)
      latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018

  1.x: 11 (30%)
    first 1.x release: 2017-11-08 (1.0.0)
    latest app commits: min=Dec-2017 median=Mar-2018 max=Mar-2018
    1.0.x: 11 (30%)
      first 1.0.x release: 2017-11-08 (1.0.0)
      latest app commits: min=Dec-2017 median=Mar-2018 max=Mar-2018



ldp:
  apps without dependency: 11 (26%)
  apps with dependency: 32 (74%)
  latest release: 0.7.0 (2017-06-12)

  git checkouts: 0
  local path dep: 0

  0.x: 32 (100%)
    first 0.x release: 2013-07-31 (0.0.1)
    latest app commits: min=Mar-2015 median=Jan-2018 max=Mar-2018
    0.2.x: 1 (3%)
      first 0.2.x release: 2014-12-11 (0.2.0)
      latest app commits: Mar-2015
    0.4.x: 4 (12%)
      first 0.4.x release: 2015-09-18 (0.4.0)
      latest app commits: min=Sep-2016 median=Oct-2017 max=Feb-2018
    0.5.x: 2 (6%)
      first 0.5.x release: 2016-03-08 (0.5.0)
      latest app commits: min=Feb-2017 median=Aug-2017 max=Mar-2018
    0.6.x: 6 (19%)
      first 0.6.x release: 2016-08-11 (0.6.0)
      latest app commits: min=Feb-2017 median=May-2017 max=Mar-2018
    0.7.x: 19 (59%)
      first 0.7.x release: 2017-06-12 (0.7.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018



linkeddata:
  apps without dependency: 19 (44%)
  apps with dependency: 24 (56%)
  latest release: 3.0.1 (2018-02-10)

  git checkouts: 0
  local path dep: 0

  1.x: 13 (54%)
    first 1.x release: 2013-01-22 (1.0.0)
    latest app commits: min=Jan-2015 median=Feb-2017 max=Mar-2018
    1.1.x: 9 (38%)
      first 1.1.x release: 2013-12-06 (1.1.0)
      latest app commits: min=Jan-2015 median=Jan-2016 max=Feb-2018
    1.99.x: 4 (17%)
      first 1.99.x release: 2015-10-31 (1.99.0)
      latest app commits: min=Sep-2016 median=Aug-2017 max=Mar-2018

  2.x: 11 (46%)
    first 2.x release: 2016-04-11 (2.0.0)
    latest app commits: min=Jul-2017 median=Mar-2018 max=Mar-2018
    2.0.x: 1 (4%)
      first 2.0.x release: 2016-04-11 (2.0.0)
      latest app commits: Mar-2018
    2.2.x: 10 (42%)
      first 2.2.x release: 2017-01-23 (2.2.0)
      latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018



riiif:
  apps without dependency: 32 (74%)
  apps with dependency: 11 (26%)
  latest release: 2.0.0 (2018-02-23)

  git checkouts: 0
  local path dep: 0

  0.x: 3 (27%)
    first 0.x release: 2013-11-14 (0.0.1)
    latest app commits: min=Jan-2016 median=Feb-2018 max=Feb-2018
    0.0.x: 1 (9%)
      first 0.0.x release: 2013-11-14 (0.0.1)
      latest app commits: Jan-2016
    0.2.x: 2 (18%)
      first 0.2.x release: 2015-11-10 (0.2.0)
      latest app commits: min=Feb-2018 median=Feb-2018 max=Feb-2018

  1.x: 7 (64%)
    first 1.x release: 2017-02-01 (1.0.0)
    latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018
    1.2.x: 1 (9%)
      first 1.2.x release: 2017-04-07 (1.2.0)
      latest app commits: Mar-2018
    1.4.x: 5 (45%)
      first 1.4.x release: 2017-04-11 (1.4.0)
      latest app commits: min=Jul-2017 median=Jan-2018 max=Mar-2018
    1.5.x: 1 (9%)
      first 1.5.x release: 2017-07-20 (1.5.0)
      latest app commits: Mar-2018

  2.x: 1 (9%)
    first 2.x release: 2018-02-23 (2.0.0)
    latest app commits: Mar-2018
    2.0.x: 1 (9%)
      first 2.0.x release: 2018-02-23 (2.0.0)
      latest app commits: Mar-2018



iiif_manifest:
  apps without dependency: 36 (84%)
  apps with dependency: 7 (16%)
  latest release: 0.4.0 (2018-02-28)

  git checkouts: 1 (14%)
  local path dep: 0

  0.x: 7 (100%)
    first 0.x release: 2016-05-13 (0.1.0)
    latest app commits: min=Jul-2017 median=Feb-2018 max=Mar-2018
    0.1.x: 1 (14%)
      first 0.1.x release: 2016-05-13 (0.1.0)
      latest app commits: Mar-2018
    0.2.x: 1 (14%)
      first 0.2.x release: 2017-05-03 (0.2.0)
      latest app commits: Jul-2017
    0.3.x: 4 (57%)
      first 0.3.x release: 2017-10-02 (0.3.0)
      latest app commits: min=Dec-2017 median=Feb-2018 max=Mar-2018
    0.4.x: 1 (14%)
      first 0.4.x release: 2018-02-28 (0.4.0)
      latest app commits: Mar-2018



pul_uv_rails:
  apps without dependency: 39 (91%)
  apps with dependency: 4 (9%)
  latest release: 2.0.1 (2017-12-15)

  git checkouts: 2 (50%)
  local path dep: 0

  2.x: 4 (100%)
    first 2.x release: 2017-12-15 (2.0.1)
    latest app commits: min=Dec-2017 median=Feb-2018 max=Mar-2018
    2.0.x: 4 (100%)
      first 2.0.x release: 2017-12-15 (2.0.1)
      latest app commits: min=Dec-2017 median=Feb-2018 max=Mar-2018



mirador_rails:
  apps without dependency: 43 (100%)
  apps with dependency: 0
  latest release: 0.7.0 (2017-10-12)

  git checkouts: 0
  local path dep: 0



osullivan:
  apps without dependency: 42 (98%)
  apps with dependency: 1 (2%)
  latest release: 0.0.3 (2015-01-21)

  git checkouts: 0
  local path dep: 0

  0.x: 1 (100%)
    first 0.x release: 2015-01-16 (0.0.2)
    latest app commits: Feb-2018
    0.0.x: 1 (100%)
      first 0.0.x release: 2015-01-16 (0.0.2)
      latest app commits: Feb-2018



bixby:
  apps without dependency: 39 (91%)
  apps with dependency: 4 (9%)
  latest release: 1.0.0 (2018-02-13)

  git checkouts: 0
  local path dep: 0

  0.x: 4 (100%)
    first 0.x release: 2017-03-30 (0.1.0)
    latest app commits: min=Jan-2018 median=Mar-2018 max=Mar-2018
    0.2.x: 1 (25%)
      first 0.2.x release: 2017-03-30 (0.2.0)
      latest app commits: Mar-2018
    0.3.x: 3 (75%)
      first 0.3.x release: 2017-10-03 (0.3.0)
      latest app commits: min=Jan-2018 median=Mar-2018 max=Mar-2018



orcid:
  apps without dependency: 42 (98%)
  apps with dependency: 1 (2%)
  latest release: 0.9.1 (2014-12-09)

  git checkouts: 1 (100%)
  local path dep: 0

  0.x: 1 (100%)
    first 0.x release: 2014-02-21 (0.0.1.pre)
    latest app commits: Feb-2018
    0.9.x: 1 (100%)
      first 0.9.x release: 2014-10-27 (0.9.0)
      latest app commits: Feb-2018



rsolr:
  apps without dependency: 0
  apps with dependency: 43 (100%)
  latest release: 2.1.0 (2017-11-15)

  git checkouts: 2 (5%)
  local path dep: 0

  1.x: 31 (72%)
    first 1.x release: 2011-01-06 (1.0.0)
    latest app commits: min=Feb-2014 median=Aug-2017 max=Mar-2018
    1.0.x: 16 (37%)
      first 1.0.x release: 2011-01-06 (1.0.0)
      latest app commits: min=Feb-2014 median=Jun-2016 max=Mar-2018
    1.1.x: 15 (35%)
      first 1.1.x release: 2016-02-11 (1.1.1.pre1)
      latest app commits: min=Feb-2017 median=Jan-2018 max=Mar-2018

  2.x: 12 (28%)
    first 2.x release: 2017-05-01 (2.0.0)
    latest app commits: min=May-2017 median=Feb-2018 max=Mar-2018
    2.0.x: 4 (9%)
      first 2.0.x release: 2017-05-01 (2.0.0)
      latest app commits: min=May-2017 median=Nov-2017 max=Mar-2018
    2.1.x: 8 (19%)
      first 2.1.x release: 2017-11-15 (2.1.0)
      latest app commits: min=Dec-2017 median=Feb-2018 max=Mar-2018

 

yes, product owner and technical lead need to be different people

I used to disagree with this conventional wisdom, and think I could be both. I now realize in retrospect that’s because I was in an environment where I basically had no choice but to be both.  Or at least where I didn’t trust anyone who might step in to be product owner to actually take responsibility for it and do it right.

Having had the experience of being de facto the technical lead (only engineer or most experienced engineer of a very very small team) on a project/program with a very responsible and effective de facto product owner, I see I was totally wrong.

The typical argument in favor of these roles being separated is that the technical lead/engineer is too close to the code to really understand the needs of stakeholders (customers, the organization, politics within the organization, whatever) and fill a product owner role. I think that argument definitely has a lot of merit, but if a technical lead also has a lot of domain knowledge, and spends a lot of time with stakeholders (and/or hearing from UX people), and has a lot of skill, couldn’t they maybe do both things effectively in the same person?  And might I be such a person who can pull it off?  It’s a challenge, but maybe.

But. The real reason I’ve seen that this is no good, is that there’s no way for me to stay sane doing that.  There are just too many things to worry about. Instead: One person (“product owner”) to decide what is to be done (in large part consisting of prioritization, and deciding when a feature is “good enough” and when it is not), and another separate person to decide how to do it (the technical lead).

Then I, the technical lead, can spend my time worrying about (or we could say ‘planning’ if I wasn’t such a worrier!) the technical decisions — whether we’re really doing it right, whether this is the most efficient or lowest TCO way to accomplish something, technical debt, choice of dependencies, how to set up our current work to provide the right platform/abstractions for future work, etc.   Without having to worry about if we’ve chosen to do the right things, or being responsible for those decisions, or being called to account for making them “wrong” by internal stakeholders.

I just can’t do both and stay sane. And I suspect this isn’t just me.  I realize now that in the former position where I was doing both and thought I was doing okay at it — I was not staying sane, and my increasing feelings of loss of control over things affected team and organizational dynamics negatively. It wasn’t healthy for anyone.

To be sure, the product owner and technical lead should be in close communication, feeding back on each other. I still don’t believe it should be a one-way power dynamic, where the product owner simply sets down the plan and the technical team implements it. The product owner’s decisions should be influenced by feedback from the technical team/lead, on feasibility, estimated costs/time, and even just their own ideas about how to meet stakeholder needs, among other things. And the product owner should ideally have some high-level conceptual understanding of how the engineering works. But they’ve got to be different people in different roles, so they can each focus on their own area of responsibility.

improving citation export in a sufia/hyrax app

Our app is based on Sufia 7, but I believe the relevant parts are mostly true for Hyrax as well; if I know they aren’t, I’ll try to make a note of it.

The out of the box Sufia app offers three citation export links on an individual item (ie Work) page, for: Endnote, Zotero, and Mendeley.

The Zotero and Mendeley links just take you to a page that says:

Exporting to Zotero[Mendeley] is supported via embedded metadata. If Zotero[Mendeley] does not automatically pick up metadata for deposited files, please report the issue via the <%= link_to ‘Contact Form’, sufia.contact_form_index_path %>.

I believe the automatic metadata pickup is supposed to be via COinS.  Putting aside that that’s a bit of weird UX, Zotero’s “Save to Zotero” button did do something with “Embedded Metadata”, but didn’t really pick up all the metadata we’d want. I think this is because we hadn’t properly configured all our local custom metadata fields to work with COinS, which I believe in Sufia is done via Rails i18n, and in hyrax by a different mechanism.

I didn’t get to the bottom of this, because either way, COinS isn’t really granular/specific enough to get all the metadata we have across as well as it could be for a reference management application — there’s no way to say type “Manuscript”, or provide archival arrangement/location (box/folder).  I’m not sure if there’s a way to send abstract or subject/keywords (which users appreciate having included in their export to a reference manager, even though they aren’t part of a citation) — and the link I used to use to check what fields are available in standard OpenURL metadata (on which COinS is based) is giving me 404 errors from OCLC today.  Oh, and did I mention that COinS (if not OpenURL itself) is kind of an abandonware standard? The site that documents the standard is currently only available in the Internet Archive Wayback Machine.

The EndNote export was also not including all of our possible metadata as well as it could. I’m not sure where I’d customize this for our local fields; perhaps I need to override the Sufia::SolrDocument::Export class; not really sure what’s going on there. But looking at that class suggests that the format it’s calling “EndNote” is this one, which I think is now more commonly called “Endnote Tagged Format” (although I can’t find a reference for that), as distinct from Endnote XML, which I’m also having trouble finding documentation for.

Rather than trying to get each of these existing logic paths working, we decided to initially replace with…

Replace with RIS for everyone

RIS is the closest thing to a “lingua franca” among reference management software. While it is also an abandoned standard (wikipedia links to this archive.org capture), pretty much every reference management software can handle it, and in fairly compatible/standard ways — I think mainly due to every new reference management software trying to be compatible with the current market leader at the point it was introduced, all the way back to the no-longer-existing software that originated RIS.

For the same reasons, it seems to be relatively close to the internal data models of most reference management software.  It’s annoying in some ways: (did we mention?) it’s an unmaintained abandonware standard; there are (undocumented) minor differences between how different software handles it on import; and the same ‘tag’ in RIS can be interpreted differently depending on the ‘type’ of the reference. (Oh, and there’s a limited number of ‘types’, not suitable to the full diversity of the modern digital archive, or even to all the types found in modern reference management software!)

But it’s way more expressive than COinS, and close to as expressive as Endnote Tagged Format (probably just as good for the actual metadata we have), and there’s not much better.

And it’s super convenient to be able to write one export which will work with all reference management software, rather than spend extra time (we can’t necessarily afford) to do a custom export for every possible software (and over the past decade the “popular” software has changed several times, and may vary in different disciplines — but they all do RIS).

When I asked in the Zotero forums (the Zotero people are great and tend to understand the ecosystem way beyond just their software, as domain experts in a way many of us don’t) if there was a better format to use for a ‘generic’ import to multiple reference management systems, or even a better format to use just for Zotero, @adamsmith replied:

There is indeed no useful bibliographic exchange format. It’s a fairly ridiculous situation. You’ll get the best import into Zotero using Zotero RDF, but a) that isn’t well documented and b) it’ll probably be replaced with a JSON-LD/schema.org based schema in the not-too-distant future, so I wouldn’t invest heavily in implementing it. Endnote XML is marginally better documented and, by virtue of being XML, more robust, so that might be worth it. BibLaTeX is very precise and exceedingly well documented, but I don’t think many tools other than Zotero do very well importing it (and I don’t know _how_ well Zotero does — most people use this the other way from Zotero to BibLaTeX).

(EndNote XML didn’t look to me significantly more powerful or convenient than RIS for the sorts of data we have, although it’s more straightforward in some ways. Not sure if it has as universal adoption).

In general, if you download an RIS file and double-click on it, it will open in your installed reference manager of choice (or, depending on your browser and browser preferences, as in Firefox, open immediately in your reference manager without you having to find the file and double-click it). If you have the Zotero Chrome extension installed, it will (at first ask to) “intercept” an RIS download (with the proper MIME/IANA content-type header) and immediately send it to Zotero, even though Chrome doesn’t ordinarily do that.

So, rather than figure out how the current Sufia citation export stuff worked to make it work better for us and/or try to improve or expand it, we decided to try replacing the built-in stuff with our own RIS implementation.

Our implementation

I basically just created a ruby class that can take one of our Sufia ‘work’ models, and translate it to RIS — not really all that hard.  Thinking of working towards something shareable, I did split my implementation into a base class that sets up some tools for defining mappings, and a concrete sub-class that defines the mappings.
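
A minimal sketch of the idea, with hypothetical class, attribute, and tag choices (our real mappings are more numerous and the names differ), might look something like this:

# Base class: a tiny DSL for declaring "this RIS tag comes from this attribute".
class RisSerializerBase
  def self.ris_field(tag, &block)
    (@ris_fields ||= []) << [tag, block]
  end

  def self.ris_fields
    @ris_fields || []
  end

  def initialize(work)
    @work = work
  end

  # RIS is just lines of "TAG  - value", terminated with an "ER" line.
  def to_ris
    lines = self.class.ris_fields.flat_map do |tag, block|
      Array(instance_exec(@work, &block)).compact.map { |value| "#{tag}  - #{value}" }
    end
    (lines + ["ER  - "]).join("\r\n")
  end
end

# Concrete subclass declares the mappings for our work model.
class WorkRisSerializer < RisSerializerBase
  ris_field("TY") { |work| "MANSCPT" }              # TY comes first; our fallback type, see below
  ris_field("TI") { |work| work.title.first }
  ris_field("AU") { |work| work.creator }           # multi-valued attribute: one AU line per creator
  ris_field("PY") { |work| work.date_created.first }
end

# WorkRisSerializer.new(work).to_ris  => an RIS string, ready to send as a download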

I originally intended to allow the mappings to look up attributes based on RDF predicates, which might theoretically make it possible to share mappings with more likely chance of working across projects. But I see now I never actually implemented that feature, oops. (And it’s unclear how/if this kind of rdf-predicate-to-model-attribute lookup would work in a valkyrie-based app like planned hyrax 3.0, or if it would be possible to make it work in a standard way).

Then just register the RIS mime type; hook into a CurationConcerns method to have the work show method deliver the RIS using our serializer; generate an on-page link to that action in our already customized view; and that’s pretty much it.
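
For the plumbing, the plain-Rails version of what’s going on is roughly this (the real app hooks into the relevant CurationConcerns show method rather than defining its own controller; the class and route names here are just illustrative):

# config/initializers/mime_types.rb
# application/x-research-info-systems is the content type conventionally used for RIS.
Mime::Type.register "application/x-research-info-systems", :ris

# A show action responding to that format; the on-page "Export citation" link
# just points at the same show route with format: :ris.
class WorksController < ApplicationController
  def show
    @work = Work.find(params[:id])
    respond_to do |format|
      format.html
      format.ris do
        send_data WorkRisSerializer.new(@work).to_ris,
                  filename: "#{@work.id}.ris",
                  type: "application/x-research-info-systems",
                  disposition: "attachment"
      end
    end
  end
end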

Some interesting parts:

  • In our mappings, we put archival location information in both “AV” and “VL” tags, because in my experimentation different software seemed to at least sometimes use each.
  • In RIS “M2” field (“Miscellaneous 2” says RIS), which Zotero imports as “Extra” (and Endnote I think something similar), we put our recommended “Courtesy of Science History Institute” statement, as well as any rights information we have.
  • When we can’t determine a great RIS “type” for the citation, we default to “MANSCPT” (Manuscript); some advice I found suggested this tends to be the type that most reliably gets archival-relevant fields and output citation formats in reference management software, and much/most of our content is unpublished in a mass edition for general distribution (whether technically a ‘manuscript’ or not).
  • We create a filename for the downloaded RIS file that includes the first three words of the title as well as the internal ID. Users confirmed they appreciate this, so they can figure out what the file is on their disk if needed. We refactored some of the code we were already using for derivative download filenames so it could be reused in this context. (A sample record illustrating these choices follows this list.)
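
Here’s what such a record might come out looking like (all values here are made up):

TY  - MANSCPT
TI  - Letter regarding laboratory apparatus
AU  - Doe, Jane
PY  - 1923
AV  - Box 3, Folder 12
VL  - Box 3, Folder 12
M2  - Courtesy of Science History Institute.
ER  - 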

You can take a look at the PR with initial implementation of this feature in our app if you like.  Reviewing it now, looks like that PR accidentally ended up with a new file that is unrelated, and really from a different feature, at `app/views/application/_query_constraint_as_form.html.erb`, oops sorry.  I see now too there is only a limited “smoke test” spec for “converts without raising any exceptions”, so it goes.

How did it turn out? Future improvement?

As I write this, we just now deployed to production, but we earlier did some user testing with several users in a feature demo.  In general, we found that users have pretty low expectations when it comes to citation export; they are used to it not working perfectly, and most users we asked found our system to work at least as well as their expectations of an automated reference export, and often better.  I feel good about the RIS direction as an efficient use of developer time to get a pretty decent citation export feature.

There are a couple of outstanding issues:

Child works

We have some things that are ‘works’ in sufia, but are really excerpts from the “work” that should actually be cited in the reference.  At the moment we sometimes have that ‘parent’ work stored in our Sufia repo, and sometimes we don’t.  Our RIS export feature never takes it into account though: it always exports the citation as if it were a standalone thing based on the title of the ‘work’ in sufia, even if there’s really a parent ‘container’ work that the reference should be based on.  This is a bit hard to get right, for both metadata reasons (we might not have sufficient machine-readable metadata in all cases to determine the correct citation) and technical reasons (sufia doesn’t make it super easy to get access to parent information in an efficient/performant way).

Zotero toolbar button

If you actually click on the “Export citation” button, it generally gets into Zotero fine. (On Chrome, you need the Zotero plugin installed; on Firefox, either the plugin, or you need to tell Firefox the first time to open .ris files with Zotero.) But if you have the Zotero browser plugin installed, you also have a “Save to Zotero” button in the toolbar.  Using that one imports into Zotero as a “web page” (rather than the correct citation type for the reference; our users generally wanted reference types based on the original item, not ‘web page’), and with stunted/limited metadata.  (In our case Zotero is picking up “Embedded Metadata” from somewhere, not sure in what format; it was not intentional on my part, but even if it were not, the metadata would be no better.)

One of our test users tried this, and was disappointed.

Zotero supports a couple generic options for getting the “Save to Zotero” button to pick up embedded metadata.  COinS, as mentioned, isn’t really expressive enough for our metadata. I’m not sure what they mean by “META tags”, but it possibly applies only to RDF? (And I would not be thrilled about figuring out the right RDF vocab for Zotero to pick up, and doing the translation). That seems to leave unAPI, from which we could actually expose/re-use our now-existing RIS, great. UnAPI is another kind of abandoned standard, and based on kind of a mis-use of HTML too, with possible accessibility concerns. ☹️  It wouldn’t be that hard to implement, but even easier would be if Zotero would just pick up HTML <link rel="alternate" type="type"> tags for Zotero-recognized types. Zotero doesn’t do that at present, but when I asked, there seemed to be some support for the idea of it doing so, with some details (as well as implementation!) to be worked out. (Also, can I say again I love how responsive the Zotero devs are on the Zotero forums?)

Of course, if we had no “Export citation” button deployed, the “Save to Zotero” button provided by Zotero plugin would still be there, and still behave unsatisfactorily.

But Deployed

Based on consultation with potential users, we didn’t consider either of these problems severe enough to delay release of our RIS export button, although we’ve made a note of them as possible future improvements to prioritize.  You can see the RIS export feature in action in our current production system, on any individual item page, such as this one; look for the “Export citation” button.

 

attachment filename downloads in non-ascii encodings, ruby, s3

You tell the browser to force a download, and pick a filename for the browser to ‘save as’ with a Content-Disposition header that looks something like this:

Content-Disposition: attachment; filename="filename.tiff"

Depending on the browser, it might open up a ‘Save As’ dialog with that being the default, or might just go ahead and save to your filesystem with that name (Chrome, I think).

If you’re having the user download from S3, you can deliver an S3 pre-signed URL that specifies this header — it can be a different filename than the actual S3 key, and even different for different users, for each pre-signed URL generated.
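
With the ruby aws-sdk-s3 gem, that looks roughly like the following (the bucket, key, region, and filename here are all made up):

require "aws-sdk-s3"

# A time-limited GET url that tells S3 to serve the object with a forced-download
# Content-Disposition header, independent of the actual S3 key.
object = Aws::S3::Resource.new(region: "us-east-1")
                          .bucket("my-bucket")
                          .object("originals/some-opaque-key.tiff")

url = object.presigned_url(
  :get,
  expires_in: 3600,
  response_content_disposition: 'attachment; filename="filename.tiff"'
)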

What if the filename you want is not strictly ascii? You might just stick it in there in UTF-8, and it might work just fine with modern browsers — but I was doing it through the S3 content-disposition download, and it was resulting in S3 delivering an XML error message instead of the file, with the message “Header value cannot be represented using ISO-8859-1.response-content-disposition”.

Indeed, my filename in this case happened to have a Φ (greek phi) in it, and indeed that codepoint does not seem to exist in ISO-8859-1 (how do I know? In ruby, try `"Φ".encode("ISO-8859-1")` and watch it raise). ISO-8859-1 is perhaps the (standard? de facto?) default encoding for HTTP headers, and it’s what S3 expects. If it were unicode that could be trans-coded to ISO-8859-1, would S3 have done that for me? Not sure.

But what’s the right way to do this?  Googling/Stack-overflowing around, I got different answers including “There’s no way to do this, HTTP headers have to be ascii (and/or ISO-8859-1)”, “Some modern browsers will be fine if you just deliver UTF-8 and change nothing else” [maybe so, but S3 was not], and a newer form that looks like filename*=UTF-8''#{uri-encoded utf8} [no double quotes allowed, even though they ordinarily are in a content-disposition filename] — but which will break older browsers (maybe just leading to them ignoring the filename rather than actually breaking hard?).

The golden answer appears to be in this stackoverflow answer — you can provide a content-disposition header with both a filename=$ascii_filename (where $ascii_filename is ascii, or maybe can be ISO-8859-1?), followed by a filename*=UTF-8'' sub-header. Modern browsers will use the UTF-8 one, and older browsers will use the ascii one. At this point, are any of these “older browsers” still relevant? Don’t know, but why not do it right.

Here’s how I do it in ruby, taking input and preparing a) a version that is straight ascii, replacing any non-ascii characters with _, and b) a version that is UTF-8, URI-encoded.

require "uri"

# a) straight-ascii version: any non-ascii characters become "_"
ascii_filename = file_name.encode("US-ASCII", undef: :replace, replace: "_")
# b) UTF-8, percent-encoded for the filename* parameter (URI.encode is obsolete in newer rubies)
utf8_uri_encoded_filename = URI.encode(file_name)

something["Content-Disposition"] = "attachment; filename=\"#{ascii_filename}\"; filename*=UTF-8''#{utf8_uri_encoded_filename}"

Seems to work. S3 doesn’t complain. I admit I haven’t actually tested this on an “older browser” (not sure how old one has to go, IE8?), but it does the right thing (includes the “Φ” in the filename) on every modern browser I tested on MacOS, Windows (including IE10 on Windows 7), and Linux.

One year of the rubyland.news aggregator

It’s been a year since I launched rubyland.news, my sort of modern take on a “planet” style aggregator of ruby news and blog RSS/atom feeds.

Is there still a place for an RSS feed aggregator in a social media world? I think I like it, and find it a fun hobby/side project regardless. And I’m a librarian by training and trade, and just feel an inner urge to collect, aggregate, and distribute information, heh. But do other people find it useful? Not sure!  You can (you may or may not have known) follow rubyland.news on twitter instead, and it’s currently got 86 followers; that’s probably a good sign. I don’t currently track analytics on visits to the http rubyland.news page. It’s also possible to follow rubyland.news through its own aggregated RSS feed, which would be additionally hard to track.

Do you use it or like it? I’d love for you to let me know.

Thoughts on a year of developing/maintaining rubyland.news

I haven’t actually done too much maintenance, it kind of just keeps on chugging. Which is great.  I had originally planned to add a bunch of features, mainly including an online form to submit suggested feeds to include, and an online admin interface for me to approve and otherwise manage feeds. Never got to it, haven’t really needed it — it would take a lot of work over the no-login-no-admin-screen thing that’s there now, and adding feeds with a rake task has worked out fine. heroku run rake feeds:add[http://some/feed.rss], no problem.  So just keep feeling free to email me if you have a suggestion please. So far, I don’t get too many such suggestions, but I myself keep an eye on /r/reddit and add blogs when I see an interesting post from one of them there. I haven’t yet removed any feeds, but maybe I should; inactivity doesn’t matter too much, but feeds sometimes drift to no longer be so much about ruby.

If I was going to do anything at this point, it’d probably be trying to abstract the code a bit so I can use it for other aggregators, with their own names and CSS etc.

It’s kind of fun to have a very simple Rails app for a change. I’m not regretting using Rails here; I know Rails, and it works fine here (no performance problems, I’m just caching everything aggressively with Rails fragment caching, I don’t even bother with a CDN. Unless I set up cloudflare and forgot? I forget. The site only has like 4 pages!). I can do things like my first upgrade of an app to Rails 5.1 in a very simple but real testbed. (It was surprisingly not quite as trivial as I thought even to upgrade this very simple app from rails 5.0 to 5.1. Of course, that ended up not being just Rails 5.1, but doing things like switching to heroku’s supported free-for-hobby-dyno SSL endpoint (the hacky way it was doing it before no longer worked with rails 5.1), and other minor deferred maintenance. Took a couple hours probably.)

It’s fun working with RSS/Atom feeds, I enjoy it. Remember that dream of a “Web 2.0” world that was all about open information sharing through APIs?  We didn’t really get that, we got walled garden social media instead. (More like gated plantations than walled gardens actually, a walled garden sounds kind of nice and peaceful).

But somehow we’ve still got RSS and Atom, and they are still in fairly widespread use. So I get to kind of pretend I’m still in that world. They are in fairly widespread use… but usually as a sort of forgotten unmaintained stepchild.  There are gaps in the specifications that will never be filled in, and we get to deal with it. (Can a ‘title’ be HTML, or must it be plain text?  If it’s HTML, is there any way to know it is? Nope, not really). I run into all kinds of weirdness — can links in a feed be relative urls? If so, they are supposed to be… relative to what? You might think the feed url… but that’s not always how they go. I get to try to work around them all, which is kinda fun. Or sometimes ‘fun’.

I wish people would offer more tagged/subsection feeds, those seem pretty rare still. I wish medium would offer feeds that worked at all, they don’t really — medium has feeds for a person, but they include both posts and comments with no way to distinguish them, and are thus pretty useless for an aggregator. (I don’t want your out of context two-line comments in my aggregator).

I also get to do fun HTTP/REST kind of stuff — one of the reasons I chose to use Rails with a database as a backend, so I can keep state, is so I can actually do conditional GET requests of feeds and only fetch if a feed has changed. Around 66% of the feed URLs actually provide etags or last-modified so I can try. Then every once in a while I see a feed which reports “304 Not Modified” but it’s a lie, there is new content, the server is just broken. I usually just ignore em.
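
The conditional GET part is just standard HTTP: remember each feed’s ETag and/or Last-Modified from the last fetch, send them back as If-None-Match / If-Modified-Since, and skip parsing on a 304. A rough sketch with net/http (the attribute names on the feed record are hypothetical):

require "net/http"
require "uri"

# `feed` is assumed to be a DB-backed record with url/etag/last_modified columns,
# the latter two saved from the previous successful fetch.
def fetch_if_changed(feed)
  uri = URI(feed.url)
  request = Net::HTTP::Get.new(uri)
  request["If-None-Match"]     = feed.etag if feed.etag
  request["If-Modified-Since"] = feed.last_modified if feed.last_modified

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end

  return nil if response.is_a?(Net::HTTPNotModified) # 304: nothing new, skip it

  # Save the validators for next time, then hand the body off for feed parsing.
  feed.update(etag: response["ETag"], last_modified: response["Last-Modified"])
  response.body
end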

Keeping state also lets me refuse to let a site post-date its entries to keep em at the top of the list, and generally lets me keep the aggregated list in a consistent and non-changing order even if people change their dates on their posts. Oh, dealing with dates is another ‘fun’ thing, people deliver dates in all sorts of formats, with and without timezones, with and without times (just dates), I got to try to normalize them all somewhat to keep things in a somewhat expected and persistent newest-on-top order. (in which state is also helpful, because I can know when I last fetched a feed, and what entries are actually new since then, to help me guess a “real” timestamp for screwy or timestamp-missing entries).
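
That normalization amounts to a fallback chain, something like this sketch (method and attribute names are hypothetical): try to parse whatever date string the feed gives us, and if it’s missing or unparseable, fall back to when we first saw the entry, which keeps the list in a stable newest-on-top order.

require "time"

def guessed_timestamp(raw_date, first_seen_at)
  Time.parse(raw_date.to_s).utc  # handles most common date formats; raises on blank/garbage input
rescue ArgumentError
  first_seen_at                  # our own record of when the entry first appeared in the feed
end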

Anyway, it’s both fun and “fun”.

Modest Sponsorship from Honeybadger

Rubyland.news is hosted on heroku, cause it’s easy, and even fun, and this is a side project. Its costs are low (one hobby dyno, a free postgres that I might upgrade to the lowest-tier paid one at some point). Costs are low, but there are costs.

Fortunately covered by a modest $20/month sponsorship from Honeybadger. I think it’s important to be open about exactly how much they are paying, so you can decide for yourself if it’s likely influencing rubyland.news’s editorial decisions or whatever, and so everything is transparent. I don’t think it is; I do include honeybadger’s Developer Blog in the aggregator, but I think I’d stop if it started looking spammy.

When they first offered the modest sponsorship, I had no experience with honeybadger. But since then I’ve been using it both for rubyland.news (which has very few approaching zero uncaught exceptions) and a day job project (which has plenty). I’ve liked using it, I definitely recommend checking it out.  Honeybadger definitely keeps developing, adding and refining features, if there’s any justice I think it’ll be as successful in the market as bugsnag.  I think I like it better than bugsnag, although it’s been a while since I used bugsnag now. I think honeybadger pricing tends to be better than bugsnag’s, although it depends on your needs and sizes. They also offer a free “micro” plan for projects that are non-commercial open source, although you gotta email them to ask for it. Check em out!