So, according to the Chronicle of Higher Education, a vendor that sold cloud storage for research data had a crash in which it lost a bunch of its customers’ data. (Thanks to Dr. Sarah Roberts for the pointer.)
This is a disaster for many of the researchers.
I think the Chronicle tells exactly the wrong story with their emphasis and headline though: “Hazards of the Cloud: Data-Storage Service’s Crash Sets Back Researchers.”
Hazards of the cloud, you think? The alternative to storing your research data in ‘the cloud’ is… researchers keeping it themselves on local file storage?
Your typical person, even your typical scientist, just keeping their own files on their own hard drives… I do not think they are capable of doing this with a higher level of reliability than a competent IT organization or business specializing in this. Of course, some organizations are more competent than others. It sounds like SocioCultural Research Consultants, with their Dedoose product, is especially incompetent if they don’t have any recoverable backups of their customers’ data.
But even though trusting a third party with your data is scary because they might be incompetent… leaving individual researchers to fend for themselves when it comes to storing research data is a recipe for disaster. Storing data reliably takes skilled experts to do right; researchers in other fields are not qualified to do it on their own. (It turns out evaluating a third-party vendor’s competence is also tricky!)
And it’s not just reliability. It’s security. 2013 and 2014 look like the years when the IT industry collectively realized that security is really hard to do right. And when we’re talking research data, “security” means confidentiality and privacy of research subjects. Depending on the nature and risk of the research, a security breach can mean embarrassment or much, much worse for your research participants.
I’m an IT professional, which just means I know enough to know I wouldn’t even trust myself to keep high-risk research data secure. I’d want storage and security specialists involved. Individual researchers? Entrusting this task to overworked grad students? Forget it.
This is not a hazard of the cloud. This is a hazard of digital research data. It doesn’t go away if everyone avoids “the cloud.” I am confident that research data stored on local hard drives on research team members’ desks or laptops (possibly as multiple copies on multiple team members’ laptops) is, by and large, going to be less secure than research data stored by a competent professional third-party entity specializing in this task.
“The cloud” — if that means a remote server managed by someone else (and that’s pretty much all ‘the cloud’ means in this context) — is part of the solution, not the nature of the problem. When that ‘someone else’ is a competent expert entity.
Ideally, I think, universities should be providing this service for their affiliated researchers, rather than leaving them to fend for themselves, whether in local storage or in individual agreements with vendors. In fact, it would make a lot of sense for university libraries in particular to be providing this service. University libraries have started thinking about how to play a role in preserving research data for archival/historical purposes. The best way to be positioned to do this is to play a role in storing the data in the first place, a service that researchers have an immediate need for and a direct interest in.
I’m not sure I trust universities or university libraries to be able to provide secure and reliable data storage either, though. Universities have a tendency to underspend on and under-provision IT projects, compared to what’s really necessary for a high-quality, reliable product. It would probably make sense for universities to pool their resources in consortiums to create data storage services architected and staffed by competent professionals (compensated well enough to attract highly skilled people). So we’d be back to ‘the cloud’ after all, if perhaps a university-owned ‘cloud service’. But it’s not the cloud that’s a hazard; the hazard is that storing data reliably and securely is a non-trivial task that takes professional specialists to get right.