Vendor optical disc format promoted as ‘archival’?

Anyone in the digital archivist community want to weigh in on this, or provide citations to reviews or evaluations?

I’m not sure exactly who the market actually is for these “Archival Discs.” If it were actually those professionally concerned with long-term reliable storage, I would expect the press release to include some information on what leads them to believe the media will be especially reliable long-term, compared to other optical media — which it doesn’t seem to.

Which makes me wonder how much of the ‘archival’ is purely marketing. I guess the main novelty here is just the larger capacity?

Press Release: “Archival Disc” standard formulated for professional-use next-generation optical discs

Tokyo, Japan – March 10, 2014 – Sony Corporation (“Sony”) and Panasonic Corporation (“Panasonic”) today announced that they have formulated “Archival Disc”, a new standard for professional-use, next-generation optical discs, with the objective of expanding the market for long-term digital data storage*.

Optical discs have excellent properties to protect themselves against the environment, such as dust-resistance and water-resistance, and can also withstand changes in temperature and humidity when stored. They also allow inter-generational compatibility between different formats, ensuring that data can continue to be read even as formats evolve. This makes them robust media for long-term storage of content. Recognizing that optical discs will need to accommodate much larger volumes of storage going forward, particularly given the anticipated future growth in the archive market, Sony and Panasonic have been engaged in the joint development of a standard for professional-use next-generation optical discs.

Posted in General | Leave a comment

A Proquest platform API

We subscribe to a number of databases via Proquest.

I wanted an API for having my software execute fielded searches against a Proquest database — specifically Dissertations and Theses, in my current use case — and get back structured, machine-interpretable results.

I had vaguely remembered hearing about such an API, but was having trouble finding any info about it.

It turns out that such an API does exist — though you’ll have trouble finding any documentation for it, or even any evidence on the web that it exists, and you’ll have trouble getting information about it from Proquest support too. Hooray.

You may occasionally see it called the “XML Gateway” in some Proquest documentation materials (although Proquest support doesn’t necessarily know this term). It was probably intended for, and used by, federated search products — which makes me realize: oh yeah, if I have any database that’s used by a federated search product, it’s probably got some kind of API.

And it’s an SRU endpoint.

(Proquest may also support z39.50, but at least some Proquest docs suggest they recommend you transition to the “XML Gateway” instead of z39.50, and I personally find it easier to work with than z39.50.)

Here’s an example query:

http://fedsearch.proquest.com/search/sru/pqdtft?operation=searchRetrieve&version=1.2&maximumRecords=30&startRecord=1&query=title%3D%22global%20warming%22%20AND%20author%3DCastet

For me, coming from an IP address recognized as ‘on campus’ for our general Proquest access, no additional authentication is required to use this API. I’m not sure whether we had them activate the “XML Gateway” for us at some prior point, likely for a federated search product, or if it’s just this way for everyone.

The path component after “/sru/”, “pqdtft”, is the database code for Proquest Dissertations and Theses. I’m not sure where you find a list of these database codes in general; but if you’ve made a successful API request to the endpoint, there will be a <diagnosticMessage> element near the end of the response listing all the database codes you have access to (without corresponding full English names, though — you kind of have to guess).

The value of the ‘query’ parameter is a valid CQL query, as usual for SRU. It can be a bit tricky figuring out how to express what you want in CQL, but the CQL standard docs are decent, if you spend a bit of time with them to learn CQL.

Unfortunately, there seems to be no SRU “explain” response available from Proquest to tell you what fields and operators are available. But guessing often works: “title”, “author”, and “date” are all available. I’m not sure exactly how ‘date’ works and need to experiment more, although queries like `date > 1990 AND date <= 2010` appear at first to work.

The CQL query param above un-escaped is:

title="global warming" AND author=Castet
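Putting the pieces together, here’s a minimal Ruby sketch of building such a searchRetrieve URL. The base URL and parameters are copied from the example request above; the helper name is my own, and form-encoding turns spaces into `+` (equivalent to `%20` in a query string):

```ruby
require "uri"

# Build an SRU searchRetrieve URL for the Proquest "XML Gateway".
# Base URL and parameter names match the example request above.
def proquest_sru_url(db_code, cql_query, max_records: 30, start_record: 1)
  params = {
    "operation"      => "searchRetrieve",
    "version"        => "1.2",
    "maximumRecords" => max_records,
    "startRecord"    => start_record,
    "query"          => cql_query
  }
  "http://fedsearch.proquest.com/search/sru/#{db_code}?#{URI.encode_www_form(params)}"
end

url = proquest_sru_url("pqdtft", 'title="global warming" AND author=Castet')
```

Fetch that URL with Net::HTTP, curl, or whatever you like; the response body is MARCXML.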

Responses seem to be in MARCXML, and that seems to be the only option.

It looks like you can tell if a full text is available (on Proquest platform) for a given item, based on whether there’s an 856 field with second indicator set to “0” — that will be a URL to full text. I think. It looks like.
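As a sketch of that heuristic — using stdlib REXML rather than the ruby-marc gem, and glossing over the MARCXML namespace, which a real implementation would need to account for:

```ruby
require "rexml/document"

# Pull candidate full-text URLs out of a MARCXML record: 856 fields
# with second indicator "0", subfield 'u'. (Real MARCXML lives in the
# http://www.loc.gov/MARC21/slim namespace; ignored in this sketch.)
def fulltext_urls(marcxml)
  urls = []
  doc = REXML::Document.new(marcxml)
  doc.elements.each("//datafield") do |df|
    next unless df.attributes["tag"] == "856" && df.attributes["ind2"] == "0"
    df.elements.each("subfield") do |sf|
      urls << sf.text if sf.attributes["code"] == "u"
    end
  end
  urls
end
```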

Did I mention if there are docs for any of this, I don’t have them?

So, there you go, a Proquest search API!

I also posted this to the code4lib listserv, and got some more useful details and hints from Andrew Anderson.

Oh, and if you want to link to a document you found this way, one approach that seems to work is to take the Proquest document ID from the MARC 001 field in the response, and construct a URL like `http://search.proquest.com/pqdtft/docview/$DOCID$`. It links to full text if that’s available, otherwise to a citation page. Note the `pqdtft` code in the URL, again meaning ‘Proquest Dissertations and Theses’ — the same db I was searching to find the doc ID.
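That URL template is trivial, but for completeness, a hedged sketch (the helper name is mine, and the pattern is only observed behavior, not documented API):

```ruby
# Construct a Proquest 'docview' link from a database code and the
# document ID taken from the MARC 001 control field.
def proquest_docview_url(db_code, doc_id)
  "http://search.proquest.com/#{db_code}/docview/#{doc_id}"
end
```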


Job in Systems department here at JHU

We have a job open where I work. The position will support ILL (ILLiad), reserves (Ares), and EZproxy software, as well as programming to integrate and improve UX for those areas of library workflow and others.


Johns Hopkins University has an immediate opening for a Software Engineer position in the Sheridan Libraries and Museums.  This exciting opportunity is located at the Homewood Campus in Baltimore, Maryland.  The incumbent will primarily be responsible for administering, developing and maintaining library systems to support three main services for all of the Johns Hopkins libraries: electronic reserves, inter-library loan and access to licensed resources.  The incumbent integrates supported library systems, such as ILLiad, Ares and EZproxy, with other systems at the university (ie. JHED directory, Shibboleth and Blackboard), in the library (ie. Library management system, Horizon) and with 3rd party licensed resources (ie. Ebscohost and JSTOR).  The incumbent works as a member of the enterprise applications team in the Library Systems department.

For additional information about the position and to apply, visit http://jobs.jhu.edu .  Locate Job # 60195  and click “Apply.”  To be considered for this position, you must complete an online application.

Qualifications:
– Bachelor’s degree required
– Five years of related work experience with computer systems and applications.
– Experience with Windows Server, IIS, MS SQL Server
– Progressive experience with programming language
– Knowledge of library systems, such as ILLiad, Ares, EZproxy, Horizon, etc.

Johns Hopkins University is an equal opportunity/affirmative action employer committed to recruiting, supporting, and fostering a diverse community of outstanding faculty, staff, and students.  All applicants who share this goal are encouraged to apply.


‘hamburger’ button usability

There’s a UI element that seems to have really caught on, which I’m skeptical of.

A button in a navbar that looks like three horizontal lines, sometimes called the ‘hamburger’ button.  (thanks @dbs).

It often opens a sidebar of additional options, and was possibly first used for that purpose, so it’s sometimes also referred to simply as the ‘side navigation’ pattern.

As used prominently on Bloomberg.com:

bloomberg-expando

Or the new (and generally awesome) redesign of nytimes.com:

nytimes-navbar

Or the facebook mobile app (but, as far as I can tell, not the facebook website):

facebook-mobile

However, bootstrap uses this same icon for something other than sliding out a sidebar of options. It’s still in the navbar — on small screens only, and typically on the right side rather than the left as in the previous examples — and it still makes some additional options appear, but by default not really in a sidebar style. Make your browser window narrow on getbootstrap.com to see and play with it:

bootstrap-small-screen-expando

Whether used for a sidebar pullout or not, I am a bit skeptical of the usability here. Do users really know what this means? Do they click on it?

Of course, users might come to recognize the ‘hamburger’ as more and more sites use it — certainly expert users like most of my readers are already quite familiar with it, but I suspect that many less sophisticated users haven’t caught on yet.

I haven’t been able to find any actual user-testing of the usability of these devices — I wonder why Nielsen hasn’t tackled it yet. But there are some other surveys of use and personal musings on it (made hard to find by lack of consensus term for the ‘hamburger’), here’s one good one from smashingmagazine, with some variations too, from over a year ago.

It may or may not be a problem for achieving recognition that the ‘hamburger’ is sometimes on the left and sometimes on the right. (Or that apparently Android has standardized on three-dots-with-swipe instead of three horizontal lines?)

Today I noticed that nytimes.com was providing popup hints on first page load (until a cookie says you’ve seen them), suggesting that the nytimes has some reason to believe users don’t notice the link or know what to do with it in their fairly new page design:

nytimes-prompt

A popup prompt like that seems at best a workaround for unclear UX, not a good solution. (I am also amused by “O.K.” rather than the more usual “OK” — following the nytimes style book, I’d guess?)

If you are scrolled all the way up on nytimes.com, the navbar changes in several ways, one of which is including a label on the ‘hamburger’ button:

nytimes-navbar-scrolledup

I wonder why they thought they didn’t have room for labelling it “Sections” ordinarily, but did when you are scrolled all the way up?

It might make sense to always include the label — if the screen is wide enough.

Of course, on narrow screens down to the several hundred pixels you get on a smartphone, you’ve got to do some things to compress your navbars as much as possible, which is perhaps the origin of the hamburger, and why bootstrap uses it.

I am particularly not fond of the bootstrap pattern, as far as what happens after you click it: the weird expansion of buttons in the navbar, rather than an actual sliding-out sidebar. This is a different issue than “will the users know to click on the button” — I think what happens when you click the button is just aesthetically displeasing. In some bootstrap-using apps, where the designers haven’t done enough to keep it clean, I think it can be so aesthetically displeasing that it reaches the point of confusing, although I’m not finding a good example.

I’m not sure what the alternatives are. In some cases, there may be very application-specific solutions that can serve as alternatives to the hamburger — maybe you don’t need so many options accessible from the navbar at all, or can use completely different devices to get them?

Otherwise, I’d say try to always label your ‘hamburger’ if possible, nytimes-style — it seems even the smallest screen should have room for a one-word label, no? And if you do need to remove the label on tiny screens, it seems safer and clearer to still include it on larger screens, for designs where the ‘hamburger’ is present even there.

And as far as what happens when you click it: I much prefer an actual sidebar slide-out to bootstrap’s navbar-expand-slidedown, but you don’t get a sidebar slide-out for free with bootstrap, and it may be tricky to implement reliably in CSS/JS at all. Anyone know of any good open source reusable implementations of an actual sidebar slide-out?


Royal Library of Denmark goes live with Umlaut

The Royal Library of Denmark has gone live with an Umlaut implementation.

They’ve done some local UI customizations, including multi-lingualization. (We hope to get the i18n stuff merged into Umlaut core).

You can see their start page here.

At most libraries, though, users more often reach Umlaut as the target of OpenURL linking from search platforms, rather than starting at the Umlaut start page. I’m not sure if the Royal Library’s use cases are typical in that way or not. The Royal Library’s Google Scholar preferences still seem to point directly to their SFX instance, not to their Umlaut instance. (And Google Scholar makes it increasingly hard for users to find and use this preference anyhow, honestly.)


Anecdote for when metrics go bad

More and more parts of our society are metric-obsessed these days.

While I’m in theory in favor of science, scientific approaches based on observation, data-driven decisions, etc., I am also more and more cautious of the ways that metrics can be abused, gamed, interpreted incorrectly, and generally be insufficient proxies for the thing they’re meant to measure. There are a bunch of possible reasons this can happen.

Just an anecdote, here’s a real phone call I just got:

“Hi, this is [X] calling from [Honda Dealership]. We’d like to thank you for bringing your car in for service the other day. You may be receiving a survey from Honda about your satisfaction. We just wanted to let you know that if you rate us 100% in all categories, then your next oil change will be free.”

Ah, yeah. Clearly the survey is from corporate Honda America or whatever; the phone call was from the dealer; and I’d guess that both the dealer as a whole and the individual staff at the dealer have various kinds of compensation pegged to these survey results.

(Incidentally, after seeing how over-priced the service was for the oil/filter change, air filter changes in both cabin and engine, wiper refills, and checks of various lines and fluids — it’s very unlikely I’ll be going to the dealer again. It was at least 2x what it probably should have been; I am feeling serious pain in the wallet.)

This kind of blatant manipulation certainly isn’t the only way that metrics-based approaches can end up misleading.  Talk to any teacher about “School Reform”, for issues with similar “performance-based compensation” — the problem isn’t just intentional abuse, but that the operationalized measurements may simply not validly measure what you really want to measure, for all sorts of reasons.  Inept application of statistical methods (which I think is awfully common even among scientists, let alone among Car Companies and Libraries) can compound the issue.

Then a larger philosophical issue is how choosing to look at things only in terms of quantitative measurements affects our judgments and perspectives. Someone recently recommended Michel Foucault to me on this topic, although I’m not sure which work of his would be most pertinent.

Use metrics for sure, but don’t be ruled by them, and don’t assume that just because something is reported to you in a quantitative fashion that automatically means it’s objective, accurate, or valid. (Or that any of those categories, ‘objective’, ‘accurate’, or ‘valid’, are simple yes-or-no, rather than questions of degree).


An items out page

Our ‘items out’ page is one of the most viewed pages in our ‘catalog’. This makes sense: people need to see when their items are due, and to (try to) renew their items, which is also done on this page.

Since it’s one of the most used, it makes sense to spend some time trying to make it nice and usable.

We are in control of the items out page in our Rails (Blacklight-powered) ‘catalog’ app, which is a pretty thin UI layer on top of our ILS. We use a combination of screen-scraping-like techniques and other horrible esoterica to provide our own UI layer. I think it’s worth it, given how important this page is to our users, and it’s mostly worked out.

Well, at least we’re mostly in control of the UI, we are constricted in many ways by the underlying ILS, as well as by policies and business processes (that are, as a rule, much harder to change than technology, as most of my readers know!).

Anyhow, one of the first things we added to this page, which I’m still very proud of, is the addition of ‘relative time’ due dates like “1 week”, “6 months”, or “2 days”, using the Rails distance_of_time_in_words_to_now helper.
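That Rails helper does the formatting for us. For readers outside Rails, here’s a rough plain-Ruby approximation of the behavior we rely on — an assumption-laden sketch, not the real helper, which has much finer-grained rules:

```ruby
# A rough approximation of Rails' distance_of_time_in_words_to_now as
# used for due dates: round to whole days, then report days, weeks, or
# months.
def relative_due_in_words(due_time, now = Time.now)
  days = ((due_time - now) / 86_400.0).round
  if days < 1
    "today"
  elsif days < 7
    "#{days} #{days == 1 ? 'day' : 'days'}"
  elsif days < 30
    weeks = (days / 7.0).round
    "#{weeks} #{weeks == 1 ? 'week' : 'weeks'}"
  else
    months = (days / 30.0).round
    "#{months} #{months == 1 ? 'month' : 'months'}"
  end
end
```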

Here is a very early preliminary iteration of a someday-in-the-future new version of the page, part of an upgrade of our app to use a new version of Blacklight, and Bootstrap 3.x for the styling.

Screenshot 2014-01-16 14.32.14

That is actually using an html <table> for layout, which I think is probably appropriate in this case — it is essentially tabular data. (It would be fairly straightforward to do it without a <table>, especially if you are willing to use CSS “display:table” (and don’t care about IE7), but I think a table is actually probably right.)

Ah, but part of this redesign (again, for the benefit of my local colleagues reading this, I will emphasize very early preliminary work) is to make sure the website works well on small screens, touch screens, and small touch screens — you know, ‘mobile’. And I fully expect the ‘items out’ screen to be used just as much, or more, on mobile as on desktop.

What do you do with that page on a very small screen? How about something along these basic lines:

Screenshot 2014-01-16 14.36.43

The first screenshot and the second screenshot are actually the same HTML, only transformed by responsive CSS, using basically the technique described in this blog post. 

The HTML starts out <table> and is responsively transformed to block and inline-block display by CSS at small screen widths.

In retrospect, it would probably have been a better idea to start out with the block-display ‘mobile’ version, and use CSS media queries to responsively change to a table at larger sizes; we may iterate in that direction. (There’s a note at the end of the responsive table blog post with a link to an example of a mobile-first responsive approach.)

That wouldn’t work in IE7, as it requires CSS “display:table” to responsively turn something into a table display — or at least it’d be harder; you’d have to emulate a table with just sized divs. But once you’ve gone Bootstrap3, IE7 is pretty much a lost cause anyway. (Bootstrap3 says that while IE7 isn’t officially supported, it ‘should look and behave well enough’ in IE7. But I suspect that if you develop a site in bootstrap3 without testing in IE7 constantly as you iterate, most such sites are going to end up not working in IE7 and needing very significant reworking to do so.)

But there would be some benefits to ‘mobile first’ here. It would mean that any browser which can’t handle media queries, or can’t handle setting “display:block” on <tr>’s and <td>’s, would get the more basic ‘small screen’ version. That’s probably a better fallback than getting the table version even on a small screen, and definitely better than completely messing up a browser that can’t handle “display:block” on table elements — which the responsive table blog post warns can be a problem in IE9 and down. (Yeah, we still got to support IE9!)

It would also be more consistent with bootstrap3, which is itself designed ‘mobile first’, with initial CSS appropriate for small screens and CSS media queries changing display on larger screens. Bootstrap probably chose to go this way for similar reasons; and once it has, and you are building out your own CSS on top of it, it pays to be consistent with Bootstrap. I’ve found that, when developing on a bootstrap3 framework, following my previous habit of designing for a large screen and responsively changing for smaller screens can produce weird interactions with the bootstrap3 mobile-first CSS. Best to follow bootstrap3’s lead when using bootstrap3.


ruby 1.9.3 end-of-lifed

The ruby community has a pretty extreme forward-looking stance. Generally, the ruby development community (working on ruby itself as well as on open source ruby code) seems to prioritize innovation and improvement over backwards compatibility and maintenance of old code. It would be nice to think you can do both, but it’s unrealistic: development resources and ingenuity are limited, and you have to dial one side back to dial the other side up. The ruby community tends to move the dial relatively far toward the ‘innovation over maintenance’ end.

This has plusses and minuses.

On the positive side, we get to keep moving forward with better APIs and better code — in the stdlib, and in our open source dependencies. The fact that pretty much everyone made the somewhat difficult jump from ruby 1.8.7 to 1.9.3 within just a couple of years is pretty amazing. Compare to Python 2->3. (A separate post could be written about what led to ruby’s 1.8->1.9 success; it’s not only a matter of will.) At this point, while there are still people with legacy 1.8.7 apps for various reasons, one of those reasons is hardly ever that an open source dependency won’t work on any later ruby. That’s pretty amazing.

The negative, of course, is that sometimes it seems like I spend way too much time running on a treadmill just keeping my existing code current, as new versions of ruby or of dependencies are released that are (in large ways or small) not entirely backwards compatible. It feels like just yesterday that I updated all my apps from ruby 1.8.7 to 1.9.3 (a challenging and ‘expensive’ process), and now 1.9.3’s got a year of life left. And some people who can’t afford that time are still stuck on old codebases which are no longer compatible with current stdlib or open source releases, and don’t receive upstream security patches anymore. (This can especially be an issue with large ‘enterprise’ infrastructure. For instance, Twitter still runs at least some ruby 1.8.7-based code.)

Fortunately, at least, I think lots of people did take some lessons from how time-consuming and painful the ruby 1.8.7->1.9 migration was (and the similarly timed Rails2->3 migration), and are trying to make future migrations less painful.

I think it likely that the migration from 1.9.3 to 2.0 or 2.1 (I’ll probably skip all the way to 2.1, why not) will be relatively painless. In fact, my biggest concerns are about the task of smoothly managing the transition of installed/used rubies on my deployment and staging machines, rather than about the application software itself. I guess I’ll probably go to Rails4 at the same time. (I can’t figure out when Rails 3.x will stop receiving even security patches — maybe when Rails5 comes out? Not sure when that is either. But it worries me.)

https://www.ruby-lang.org/en/news/2014/01/10/ruby-1-9-3-will-end-on-2015/

Support for Ruby version 1.9.3 will end on February 23, 2015.
Today we are announcing our plans for the future of Ruby version 1.9.3.

Currently this branch is in maintenance mode, and will remain so until February 23, 2014.

After February 23 2014, we will only provide security fixes for 1.9.3 until February 23 2015, after which all support will end for 1.9.3.

We highly recommend you upgrade to Ruby 2.1 or 2.0.0 as soon as possible.


HackerNews down, unwisely returning http 200 for outage message

So HackerNews is currently down. It happens, we’ll probably find out why when it returns.

For a while yesterday, all HN URLs were returning error messages from CloudFlare, apparently their CDN. But today, all HN URLs are returning an apparently intentional outage message, “Sorry for the downtime. We hope to be back soon.”

But they are returning an HTTP 200 “OK” status code with this message. From all URLs.

This seems like a big mistake. You are telling any interested software (such as, say, Google): “All is well, and this is the proper content for the URL you requested.” For every single URL. Google might index this content. Google might decide that since there are a bazillion URLs at your hostname that all have the same content, relevancy/pagerank decisions should be made on that basis (probably harming your visibility — why would a big website with a million URLs, all of which say “Sorry for the downtime. We hope to be back soon”, be given good visibility by a search engine?). Etc.

I don’t know if Google in particular really does this; perhaps Google is smart enough to deal with improper 200’s for error pages, somehow using heuristics to guess that it’s really a temporary error that should be ignored. Not sure how that would work, but Google is often clever, I dunno.

But it’s still a bad idea. Don’t return 200’s for error pages. Use an appropriate response code for temporary outages. Google itself seems to suggest using 503 Service Temporarily Unavailable, which makes a lot of sense. If you can’t do this for some reason, perhaps you could use a 307 Temporary Redirect to redirect to an outage message — you’re saying it’s a ‘temporary’ redirect which shouldn’t be considered long-term content by indexers and such. (A 301 Permanent Redirect, or a 404 Not Found, seems just as bad as a 200).
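As a sketch of the right behavior — here in Ruby, as a minimal Rack-compatible maintenance app (the constant name and the Retry-After value are my own choices, not anything HN or CloudFlare uses):

```ruby
# A minimal Rack-compatible app for an outage page: returns 503
# Service Unavailable with a Retry-After hint, so crawlers know the
# condition is temporary and won't index or cache the message.
MAINTENANCE_APP = lambda do |env|
  body = "<html><body>Sorry for the downtime. We hope to be back soon.</body></html>"
  [
    503,
    { "Content-Type" => "text/html", "Retry-After" => "3600" },
    [body]
  ]
end
```

You could run something like this with `rackup`, or have your front-end proxy serve the equivalent during planned downtime; the status code, not the page body, is what software agents act on.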

In HackerNews’s case, it may actually be CloudFlare returning the 200, through misconfiguration or a poorly-thought-out feature on CloudFlare’s part. Either way, it seems like a bad idea.

Use HTTP response codes responsibly, and software agents consuming your web page will be happier!  And there are some software agents (like Google), you really want to keep happy.

Actually, now that I look at those headers — those cache control headers seem unwise too. Am I wrong, or is the response telling agents they can cache the “Sorry for the downtime” message for 10 years? That doesn’t seem wise either, does it?

$ curl -i https://news.ycombinator.com
HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Mon, 06 Jan 2014 15:22:04 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=[omitted]; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.ycombinator.com; HttpOnly
Last-Modified: Mon, 06 Jan 2014 13:14:48 GMT
Vary: Accept-Encoding
Expires: Thu, 04 Jan 2024 13:14:48 GMT
Cache-Control: max-age=315352364
Cache-Control: public
CF-RAY: [omitted]

<html>
<head>
  <link rel="stylesheet" type="text/css" href="/news.css">
  <link rel="shortcut icon" href="/favicon.ico">
  <title>Hacker News</title>
</head>
<body>
  <center>
    <table border="0" cellpadding="0" cellspacing="0" width="85%" bgcolor="#f6f6ef">
      <tr>
        <td bgcolor="#ff6600">
          <table border="0" cellpadding="0" cellspacing="0" width="100%" style="padding:2px">
            <tr>
              <td style="width:18px;padding-right:4px">
                <a href="http://ycombinator.com">
                  <img src="/y18.gif" width="18" height="18" style="border:1px #ffffff solid;" />
                </a>
              </td>
              <td style="line-height:12pt; height:10px;">
                <span class="pagetop"> <b><a href="/news">Hacker News</a></b>
                </span>
              </td>
            </tr>
          </table>
        </td>
      </tr>
      <tr style="height:10px"></tr>
      <tr>
        <td>
          Sorry for the downtime. We hope to be back soon.
        </td>
      </tr>
      <tr>
        <td>
          <img src="s.gif" height="10" width="0" />
          <table width="100%" cellspacing="0" cellpadding="1">
            <tr>
              <td bgcolor="#ff6600"></td>
            </tr>
          </table>
          <br />
        </td>
      </tr>
    </table>
  </center>
</body>
</html>
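Doing the arithmetic on that Cache-Control header from the response above confirms the suspicion:

```ruby
# max-age from the response above, in seconds
max_age = 315_352_364
years = max_age / (365.0 * 24 * 60 * 60)
# years.round(1) => 10.0 — agents are told they may cache the outage
# page for roughly ten years.
```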

CC 4.0 updated for use with data?

I had a previous post on how Creative Commons licenses aren’t (weren’t?) suitable for licensing data, for several reasons. 

Charles Nepote helpfully commented on that post linking to a December announcement on changes in CC 4.0 related to data licensing.

It appears that CC agreed that past CC licenses were problematic for data use, and is attempting to address that — mainly by explicitly addressing ‘database rights’ in addition to copyright. The legality of CC licenses was previously premised on the licensor having copyright in the thing licensed, and on the rights to grant licenses and enforce restrictions through copyright. But in many jurisdictions (including the U.S.) there may not be any copyright existing over ‘data’, while in some jurisdictions (but not the U.S.) there may (instead or in addition) be certain legal ‘database rights’.

So it looks like CC 4.0 tries to invoke database rights as the basis for licensing, in contexts where database rights may exist but copyright does not.

Additionally, according to the announcement, CC 4.0 tries to be more flexible with how ‘attribution’ requirements can be complied with, in ways that will make it more reasonable for data uses. I’m not sure if this is represented in the actual license legal language, or just in the FAQ’s and other supporting documents.

I haven’t spent a lot of time looking over the changes myself, and have no opinion on how effective or suitable they are. I continue to have some concerns about data ‘licensing’ in the U.S., where some things we think of as ‘data’ will not be copyrightable (those considered by the courts to be mainly ‘factual’ information — which may or may not be what you or I would consider ‘mainly factual information’). And in the U.S., there is no such thing as a distinct ‘database right’ at all.

If you have neither copyright nor ‘database rights’ over data, then you really have no legal ability to enforce restrictions on its use at all, and trying to convince people you do anyway is really a form of copyfraud — over-reaching by content ‘owners’ (or controllers) trying to restrict the rights of the public beyond what the law intends. I think we should be encouraging wider recognition of the public’s existing rights to use certain things (like data which is not copyrightable) without permission, rather than encouraging content controllers to try to convince people they need permission when the law doesn’t support that.

The CC data FAQ does try to recognize this, with statements like “If you are not exercising an exclusive right held by the database maker, then you do not need to rely on the license to mine.” This is great. As is their stated effort to make sure that “CC license terms and conditions are not triggered by uses permitted under any applicable exceptions and limitations to copyright, nor do license terms and conditions apply to elements of a licensed work that are in the public domain. This also means that CC licenses do not contractually impose restrictions on uses of a work where there is no underlying copyright.” This is great too!

But CC doesn’t offer much practical guidance on figuring out when this is the case.  Nor is it probably feasible to offer such guidance, as it’s a complicated legal question which can differ by jurisdiction. But I’d rather we were encouraging and supporting people to expand their use of legally unencumbered data, rather than providing tools which encourage treating possibly unencumbered data as if it were legally controlled.

For that reason, I continue to support strongly considering CC0 or equivalent releases or dedications (rather than licenses proper) for data, which simply release the data into the public domain in jurisdictions where it would otherwise be protected, while acknowledging that in some jurisdictions no release may have been needed at all; I think it’s better for all of us.

However, I’m glad that CC is at least recognizing some of the issues and attempting to address them. Previous warnings about previous versions of CC being unsuitable for data didn’t seem to impede their widespread use for data anyway! It’s definitely worth reading the post about CC 4.0 and open data, as well as the Creative Commons Data Guidance page it links to.

I’m also quite pleased to see in that guidance page that “CC does not recommend use of its NonCommercial (NC) or NoDerivatives (ND) licenses on databases intended for scholarly or scientific use.” — I guess that leaves attribution (BY) and “share alike” (SA)?  As well as recommending against trying to license “some rather than all of the rights they have in a database.” If people stick to these and similar recommendations, using only a “BY” license and expecting the more liberal conceptions of attribution compliance, my concerns will be ameliorated. (“SA” is still very tricky, and could create license incompatibilities where you can’t combine data that at some point came from a CC-SA licensed database with data from other sources with other licenses — if the data is controlled by copyright or database rights in the first place, which it may not be even if originators try to tell you it is!)

In general, I appreciate CC’s intentions toward data in the CC 4.0 licenses, and consider them a step forward — although I’d still strongly advise considering whether you can simply allow your data to be used unencumbered, rather than attempting to impose restrictions on its use, recognizing that it may have no legal protection anyway, and that it may be difficult to determine whether it does, especially in the U.S. But if you do decide you need to impose restrictions, following CC’s advice on how to do this, and using CC 4.0 to do so (preferably just “BY”), seems probably a good way to go.
