Update Aug 2010: I’ve done some time-distribution visualization with Flot, Solr, and Blacklight, not quite what’s contemplated in this post, but it’s kinda neat. https://bibwild.wordpress.com/2010/07/29/cool-range-limitprofile-function-in-blacklight/
So I’ve been thinking for a while about visualizing time distribution in an OPAC view. Things in our catalog generally have a year they were published, or a range of years for a serial; and sometimes are about a particular time period too.
The MIT Simile timeline widget is one way to do this, and the one I’ve heard people considering. But I can’t figure out how the timeline widget could scale very well to a set of thousands, hundreds of thousands, or millions of ‘points’, either visually or technologically. And I’m not sure how flexible its JavaScript API is for customizing things for our use case: Simile seems to have a lot of cool features for when you have points in time that are very granular (days or even seconds), which isn’t really our use case here. (Although they do have an example of a somewhat less granular data set. I find that example somewhat klunkier than their front page example, though, and it still doesn’t have all that many data points on it.) But Simile is certainly one option.
Then I noticed that the new Google interface has made its own timeline visualization more prominent. This one is a bit more suited for low-granularity data like years (although it will also display high-granularity time data). But... it’s really pretty clunky. You can click on a division of that timeline to ‘drill down’, but it gets kind of confusing, seems to me, when you do that. I’m honestly kind of surprised that Google couldn’t/didn’t do better. (Maybe they were trying to absolutely minimize the JavaScript required?)
(Also, Google seems to mix together the date a web page was published and dates mentioned in the web page as data points, which seems kind of an odd choice. But that’s a different topic; here I’m mostly thinking about interfaces for visualizing a timeline of dates, not how you choose which dates to put on the timeline.)
But thinking about how I might duplicate the Google-style timeline visualization, I went searching for jQuery plugins (or other JavaScript libraries) for timeline visualization that could achieve more or less what Google does.
What I wound up finding was flot, which is not for timeline visualization specifically; it’s a general-purpose data visualization jQuery plugin. And man, is it super neat! Incredibly powerful and flexible, but with a very simple, concise, easy-to-use API, and incredibly slick-looking visualizations too. (I think a good principle of any kind of API design, or really any kind of system design at all, is that simple things should be simple to do; more complicated things can be more complicated to do, although they should still be as simple as you can make them. Flot does well here.)
Imagine this type of visualization (seriously, click on that link, it’s pretty sweet) applied to catalog timeline data. I like the two linked charts (an overview and a zoom-in, similar to the Simile version and to what Google kind of sort of klunkily does), and you can make selections in either one (click and drag to make a selection; there’s also drag-panning). And view source to see how few, simple lines of JS were required to draw that. Wow!
Just add some labelled vertical lines (which flot is quite capable of). Now, when you make a selection, you could get an immediately updated list of bib results in another part of the screen (bottom or side). And/or, when you mouse over (or click on) a particular year (or range, depending on zoom level), you could get a pop-up window listing the bibs from the time you clicked on.
Totally do-able with flot. Wow, flot is neat!
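A minimal sketch of the data side of that idea, in TypeScript, assuming each bib on hand has already been reduced to a single publication year. The `Bib` shape and both function names are hypothetical, not part of flot or any OPAC. `yearHistogram` produces per-year counts in the [x, y] pair format flot uses for series data, and `bibsInRange` returns the bibs inside a selected x-axis range, which is what you’d show in the results list when a selection is made.

```typescript
// Hypothetical shape for one catalog hit; not taken from any real OPAC API.
interface Bib {
  id: string;
  title: string;
  year: number; // single publication year
}

// Flot series data is an array of [x, y] points.
type Point = [number, number];

// Count bibs per publication year, sorted by year, ready to plot as bars.
function yearHistogram(bibs: Bib[]): Point[] {
  const counts = new Map<number, number>();
  for (const b of bibs) {
    counts.set(b.year, (counts.get(b.year) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => a[0] - b[0]);
}

// Bibs whose year falls inside a selected x-axis range (inclusive).
// Selection bounds may be fractional, e.g. 1953.4 to 1967.8.
function bibsInRange(bibs: Bib[], from: number, to: number): Bib[] {
  const lo = Math.min(from, to);
  const hi = Math.max(from, to);
  return bibs.filter((b) => b.year >= lo && b.year <= hi);
}

// Example with made-up data:
const sample: Bib[] = [
  { id: "b1", title: "First", year: 1950 },
  { id: "b2", title: "Second", year: 1962 },
  { id: "b3", title: "Third", year: 1950 },
];
const bars = yearHistogram(sample); // [[1950, 2], [1962, 1]]
const picked = bibsInRange(sample, 1953.4, 1967.8); // just b2
```

Flot’s selection plugin reports a drag-selection in data coordinates, which are usually fractional, so the filter compares years against the raw bounds rather than assuming whole years.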
It’s not entirely clear to me how you’d deal with items that have a range of dates instead of one particular date in that visualization, though (like a serial, or a book about the 18th century). An ‘item’ with a range instead of a fixed date is something the Simile widget is set up for, but neither the Google version nor any of the flot examples shows it. Still, if you can think of how to do it visually, I bet flot is flexible enough to let you do it.
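One possible way to handle ranged items, sketched in TypeScript. This is an assumption of mine, not something flot provides, and `DatedItem` and `spreadOverYears` are hypothetical names: spread each item’s weight evenly over the years it covers, so a serial running 1900 to 1909 adds 0.1 to each of those ten years and every item still contributes a total of 1 to the chart. The output is again flot-style [x, y] pairs.

```typescript
// Hypothetical: an item dated by an inclusive range of years.
interface DatedItem {
  id: string;
  start: number; // first year covered
  end: number; // last year covered; equal to start for a single-year item
}

// One possible approach (an assumption, not a flot feature): spread each item's
// weight evenly across the years it covers, so every item contributes a total
// of 1 and a ten-year serial adds 0.1 to each of its years.
function spreadOverYears(items: DatedItem[]): [number, number][] {
  const weights = new Map<number, number>();
  for (const item of items) {
    if (item.end < item.start) continue; // skip malformed ranges
    const span = item.end - item.start + 1;
    for (let y = item.start; y <= item.end; y++) {
      weights.set(y, (weights.get(y) ?? 0) + 1 / span);
    }
  }
  return Array.from(weights.entries()).sort((a, b) => a[0] - b[0]);
}

// Example: a single-year book plus a serial running 2000-2003.
const series = spreadOverYears([
  { id: "book", start: 2001, end: 2001 },
  { id: "serial", start: 2000, end: 2003 },
]);
// series: [[2000, 0.25], [2001, 1.25], [2002, 0.25], [2003, 0.25]]
```

The alternative, counting a ranged item at full weight in every year it covers, is simpler but lets one long-running serial swamp everything around it.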
Maybe some day I’ll get to play around with that. Not any time soon, I don’t think, sadly. Sometimes I feel like I am continually building the basic, boring parts of my systems up to a bare level of competence; just when I think I’ve got that done and can finally start doing some really cool stuff on the platform I’ve built, nope, there’s a different system that I’ve got to work on getting to the level of a basic, robust, competent platform. Oh well, some day.
As soon as I read this I got excited and rewrote the engine that harvests subjects from my OPAC search results to generate a “tag” cloud, so that it also harvests the publication dates. Now my problem is that I have the data, but only for the search results visible on the screen. I’m limited to client-side manipulations.
Best case scenario, I’ve got 50 dates in an array to work with. Is that a useful data set for generating a timeline? If there were 800 search results, what good is a timeline representing only the 50 results on my screen? D’oh.
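Assuming the dates scraped from the results screen arrive as free-text strings (the formats in the comments, like `c1998` or `[1875?]`, are illustrative guesses), a small TypeScript helper can reduce each one to a four-digit year before charting. `extractYear` and `yearsOnPage` are hypothetical names, not the commenter’s code, and the 1500 to 2099 window is an arbitrary assumption.

```typescript
// Hypothetical helper: pull a four-digit year (1500-2099, an arbitrary window)
// out of a free-text date string such as "c1998", "[1875?]" or "1967-1972".
// Returns the first such year not embedded in a longer run of digits, or null.
function extractYear(dateText: string): number | null {
  const m = /(?:^|\D)(1[5-9]\d\d|20\d\d)(?!\d)/.exec(dateText);
  return m ? Number(m[1]) : null;
}

// Collect years from every date string on the current page, skipping undated items.
function yearsOnPage(dateTexts: string[]): number[] {
  const years: number[] = [];
  for (const text of dateTexts) {
    const year = extractYear(text);
    if (year !== null) years.push(year);
  }
  return years;
}
```

For a range like “1967-1972” this keeps only the first year; a ranged item could instead be handed to a range-aware chart.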
So wait, your tag cloud is just a cloud of tags on that particular screen? So if there were 800 results, you only get tags from the first 10-50 in the cloud? Do you have an idea of how/whether people find this useful? Ah, III, huh? You certainly could make a little mini “spark line” bar chart or something on the page for just that page of results, but I’m not sure how useful it would be either.
Treating subjects as tags is already problematic; then you add the fact that I’m doing it in JavaScript and limited to what’s on screen, and you’d think it might end up useless. In fact it turns out to work rather well. If there are 50 or more results on my first screen of relevance-ranked keyword search results (say what you will about III, their relevance ranking algorithm is surprisingly good), I get hundreds of subjects. When de-duped, weighted, and presented as a tag cloud, they actually offer a useful path from keyword searches to (usually) relevant subject headings.
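The de-dupe-and-weight step might look like this TypeScript sketch, assuming the subject headings have already been scraped from the visible results into an array of strings. The normalization (trimming, dropping a trailing period) and the linear font-size scale are assumptions for illustration, not a description of the commenter’s engine; `buildTagCloud` and `Tag` are hypothetical names.

```typescript
// Hypothetical tag-cloud entry.
interface Tag {
  subject: string;
  count: number;
  size: number; // font size in px
}

// De-duplicate subject headings scraped from the visible results and scale
// each one's font size linearly between minPx and maxPx by how often it occurred.
function buildTagCloud(subjects: string[], minPx = 10, maxPx = 28): Tag[] {
  const counts = new Map<string, number>();
  for (const raw of subjects) {
    // Assumed normalization: trim, and drop a trailing period (subject strings often end with one).
    const key = raw.trim().replace(/\.$/, "");
    if (key === "") continue;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const values = Array.from(counts.values());
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  return Array.from(counts.entries())
    .map(([subject, count]) => ({
      subject,
      count,
      // When every count is equal there is nothing to scale, so use minPx.
      size: hi === lo ? minPx : minPx + ((count - lo) / (hi - lo)) * (maxPx - minPx),
    }))
    .sort((a, b) => b.count - a.count || a.subject.localeCompare(b.subject));
}
```

Because it only sees what’s on screen, the weights describe the current page of results, not the whole result set.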
I think this is a pretty nice example [Keyword = Boeing] of what you get when all the results fit on a single results screen. In this example of multi-page results [Keyword = Ford] you can really see that the subject content of each page of results varies greatly from the last. This is where having the whole data set could help.
That is essentially what III does in Encore, though I suppose they get to harvest the subjects from the whole results set. I can think of a few ways to call up and scrape more pages of results, but the performance hit would be terrible.
The spark line presentation of the dates is an interesting idea; I think I’ll have to play with a generator.
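A spark line generator for just the dates on one page could be as small as this TypeScript sketch, which buckets the years into a fixed number of equal-width bins and draws each bin as a Unicode block character, with a blank for empty bins. The bin count, the character ramp, and the name `yearSparkline` are arbitrary, hypothetical choices.

```typescript
// Eight Unicode block characters, from lowest to highest bar.
const BARS = "▁▂▃▄▅▆▇█";

// Bucket the years from one page of results into `bins` equal-width bins
// spanning the earliest to latest year, and draw each bin as one character
// (a space for an empty bin). Returns "" when there are no years.
function yearSparkline(years: number[], bins = 10): string {
  if (years.length === 0) return "";
  const min = Math.min(...years);
  const max = Math.max(...years);
  const width = (max - min + 1) / bins;
  const counts = new Array<number>(bins).fill(0);
  for (const y of years) {
    const i = Math.min(bins - 1, Math.floor((y - min) / width));
    counts[i]++;
  }
  const peak = Math.max(...counts);
  return counts
    .map((c) => (c === 0 ? " " : BARS[Math.round((c / peak) * (BARS.length - 1))]))
    .join("");
}
```

Rendering as text keeps it trivial to drop into an existing results page without a charting library.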
I am actively attempting (thanks to summer break!) to code up a timeline with flot right now. It sounds very, very similar (with tags, etc.) to all these ideas floating about. However, I’m failing (at the moment) to zoom on timelines. Whenever I get a working demo, I’m sure I’ll post it, and hopefully trackback here.
Have you had any luck thus far?
FYI, my data set is all life-based activities: weather temperatures, RSS feeds, files edited, email counts, etc. I hope to eventually use a spam filter/AI to extract the content topics of all of ’em. Ambitious, I know.
I haven’t actually written any code yet, but did you try modeling your code on the example I linked to? It does provide a zoom example, and it’s only a few lines of code.
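The heart of that kind of zoom is just re-plotting with new x-axis bounds taken from the selection. Here is a hedged TypeScript sketch of that step, assuming year-valued x data. `xaxis.min` and `xaxis.max` are ordinary flot axis options; `zoomToSelection`, `minSpan`, and the snapping to whole years are hypothetical additions.

```typescript
// Minimal subset of flot plot options used here: the x-axis min/max bounds.
interface AxisOptions {
  min?: number;
  max?: number;
}
interface PlotOptions {
  xaxis?: AxisOptions;
  [key: string]: unknown;
}

// Turn a drag-selection (from, to) into options for re-plotting the detail
// chart zoomed to that range. Snaps outward to whole years and enforces a
// minimum span so a tiny drag can't zoom in below year granularity.
function zoomToSelection(base: PlotOptions, from: number, to: number, minSpan = 5): PlotOptions {
  let lo = Math.floor(Math.min(from, to));
  let hi = Math.ceil(Math.max(from, to));
  if (hi - lo < minSpan) {
    const mid = (lo + hi) / 2;
    lo = Math.floor(mid - minSpan / 2);
    hi = lo + minSpan;
  }
  // Copy rather than mutate, so the base options can be reused to reset the zoom.
  return { ...base, xaxis: { ...base.xaxis, min: lo, max: hi } };
}
```

In the page itself, the flot selection plugin’s `plotselected` handler receives a `ranges` object, so the call would look roughly like `$.plot(placeholder, data, zoomToSelection(options, ranges.xaxis.from, ranges.xaxis.to))`, which follows the same pattern as flot’s own zoom example minus the snapping.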
Tip: http://vis.stanford.edu/protovis/
JavaScript-backed visualizations that try to work up some abstractions, rather than being just another game programming library. I kind of like this concept.
(Of course, I chimed in without fully comprehending your problem. But Protovis does an admirable job of offering a new sort of API for JS dataviz.)