Large collections in JS in the browser

Developers from the New York Times have released some open source software meant for displaying and managing large digital content collections, and doing so client-side, in the browser with JS.

Developed for journalism, this has some obvious potential relevance to the business of libraries too, right?  Large collections (increasingly digital), that’s what we’re all about, ain’t it?

Pourover and Tamper

Today we’re open-sourcing two internal projects from The Times:

  • PourOver.js, a library for fast filtering, sorting, updating and viewing large (100k+ item) categorical datasets in the browser, and
  • Tamper, a companion protocol for compressing categorical data on the server and decompressing in your browser. We’ve achieved a 3–5x compression advantage over gzipped JSON in several real-world applications.

Collections are important to developers, especially news developers. We are handed hundreds of user submitted snapshots, thousands of archive items, or millions of medical records. Filtering, faceting, paging, and sorting through these sets are the shortest paths to interactivity, direct routes to experiences which would have been time-consuming, dull or impossible with paper, shelves, indices, and appendices….

…The genesis of PourOver is found in the 2012 London Olympics. Editors wanted a fast, online way to manage the half a million photos we would be collecting from staff photographers, freelancers, and wire services. Editing just hundreds of photos can be difficult with the mostly-unimproved, offline solutions standard in most newsrooms. Editing hundreds of thousands of photos in real-time is almost impossible.

Yep, those sorts of tasks sound like things libraries are involved in, or would like to be involved in, right?

The actual JS does some neat things with figuring out how to incrementally and just-in-time send delta’s of data, etc., and some good UI tools. Look at the page for more.

I am increasingly interested in what ‘digital journalism’ is up to these days. They are an enterprise with some similarities to libraries, in that they are an information-focused business which is having to deal with a lot of internet-era ‘disruption’.    Journalistic enterprises are generally for-profit (unlike most of the libraries we work in), but still with a certain public service ethos.  And some of the technical problems they deal with overlap heavily with our area of focus.

It may be that the grass is always greener, but I think the journalism industry is rising to the challenges somewhat better than ours is, or at any rate is putting more resources into technical innovation. When was the last time something that probably took as many developer-hours as this stuff, and is of potential interest outside the specific industry, came out of libraries?

Leave a comment