A few years ago, it seemed as if everyone was talking about the semantic web as the next big thing. What happened? Are there still startups working in that space? Are people still interested?
Note that “linked data” refers to essentially the same technologies as “semantic web”. It’s something of a rebranding of “semantic web”, with some minor changes in focus.
The top-rated comment in the Hacker News discussion says, in part:
A bit of background, I’ve been working in environments next to, and sometimes with, large scale Semantic Graph projects for much of my career — I usually try to avoid working near a semantic graph program due to my long histories of poor outcomes with them.
I’ve seen uncountably large chunks of money put into KM projects that go absolutely nowhere and I’ve come to understand and appreciate many of the foundational problems the field continues to suffer from. Despite a long period of time, progress in solving these fundamental problems seem hopelessly delayed.
The semantic web as originally proposed (Berners-Lee, Hendler, Lassila) is as dead as last year’s roadkill, though there are plenty out there that pretend that’s not the case. There’s still plenty of groups trying to revive the original idea, or like most things in the KM field, they’ve simply changed the definition to encompass something else that looks like it might work instead.
The reasons are complex but it basically boils down to: going through all the effort of putting semantic markup with no guarantee of a payoff for yourself was a stupid idea.
The entire comment, and really the entire thread, is worth a read. There seems to be a lot of energy in libraryland behind trying to produce “linked data”, and I think it’s important to pay attention to what’s going on in the larger world here.
Especially because much of the stated motivation for library “linked data” seems to have been: “Because that’s where non-library information management technology is headed, and for once let’s do what everyone else is doing and not create our own library-specific standards.” It turns out that may or may not be the case. If your motivation for library linked data was “so we can be like everyone else,” that motivation may simply not hold up: everyone else doesn’t seem to be heading there in the way people hoped a few years ago.
On the other hand, some of the reasons that semantic web/linked data have not caught on are commercial and have to do with business models.
One of the reasons that whole thing died was that existing business models simply couldn’t be reworked to make it make sense. If I’m running an ad driven site about Cat Breeds, simply giving you all my information in an easy to parse machine readable form so your site on General Pet Breeds can exist and make money is not something I’m particularly inclined to do. You’ll notice now that even some of the most permissive sites are rate limited through their API and almost all require some kind of API key authentication scheme to even get access to the data.
It may be that libraries and other civic organizations, without business models predicated on competition, are a better fit for implementing semantic web technologies. And the sorts of data that libraries deal with (bibliographic and scholarly) may be better suited to semantic data than general commercial business data is. It may also be that, at the moment, libraries, cultural heritage, and civic organizations make up the majority of entities exploring linked data.
Still, the coarsely stated conclusion of that top-rated HN comment is worth repeating:
going through all the effort of putting semantic markup with no guarantee of a payoff for yourself was a stupid idea.
Putting data into linked data form simply because we’ve been told that “everyone is doing it”, without carefully understanding the use cases such reformatting is supposed to benefit and making sure that it actually benefits them, risks incurring great expense for no payoff. Especially when everyone is not, in fact, doing it.
Taking the same data you already have and reformatting it as “linked data” does not necessarily add much value. If it was poorly controlled, poorly modelled, or incomplete data before, it still is in RDF. You can potentially add more value, and enable more uses of your data, by improving its quality than by working to reformat it as linked data/RDF. The idea that simply reformatting it as RDF would add significant value was predicated on an ecology of software and services built to use linked data, software and services exciting enough that making your data available to them would result in added value. That ecology has not really materialized, and it’s hardly clear that it will (and to the extent it does, it may only be because libraries and cultural heritage organizations create it; we are unlikely to get a free ride on more general tools from a wider community).
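To make the data-quality point concrete, here is a minimal Python sketch. It assumes a hypothetical set of legacy records in which one author was entered three different ways, and a hypothetical `example.org` namespace for record URIs; only the Dublin Core terms URIs are real. A mechanical conversion to N-Triples yields valid RDF, but the three author strings remain three unrelated values.

```python
# Hypothetical legacy records: the same person entered three different ways,
# i.e. "poorly controlled" data with no authority control.
records = [
    {"id": "b1", "title": "The Adventures of Tom Sawyer", "author": "Twain, Mark"},
    {"id": "b2", "title": "Adventures of Huckleberry Finn", "author": "Twain, Mark, 1835-1910"},
    {"id": "b3", "title": "Life on the Mississippi", "author": "Clemens, Samuel"},
]

BASE = "http://example.org/record/"  # hypothetical namespace, for illustration only
DCT = "http://purl.org/dc/terms/"    # Dublin Core terms vocabulary


def literal(value: str) -> str:
    """Serialize a string as an N-Triples literal (escapes backslashes and quotes)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(record: dict) -> list[str]:
    """Mechanically map one record to N-Triples lines; no cleanup is done."""
    subject = f"<{BASE}{record['id']}>"
    return [
        f"{subject} <{DCT}title> {literal(record['title'])} .",
        f"{subject} <{DCT}creator> {literal(record['author'])} .",
    ]


triples = [t for r in records for t in to_ntriples(r)]
print("\n".join(triples))

# The RDF is syntactically fine, but the creator values are still three
# unrelated strings: nothing in the graph says they name the same person.
creators = {r["author"] for r in records}
print(len(creators))  # 3
```

Linking those values would take authority work, such as reconciling each string to a single identifier for the person. That is the data-quality improvement itself, not a side effect of serializing to RDF.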
But please do share your data
To be clear, I still strongly advocate taking the data you do have and making it freely available under open (or public domain) license terms, in whatever formats you’ve already got it in. If your data is valuable, developers will find a way to use it, and simply making the data you’ve already got available is much less expensive than trying to reformat it as linked data. It also lets you find out whether anyone is interested in it. If nobody’s interested in your data as it is, I think it’s unlikely the amount of interest will be significantly greater after you model it as “linked data”. The ecology simply hasn’t arisen to make using linked data any easier or more valuable than using anything else (in many contexts and cases, it’s in fact more troublesome and challenging than less abstract formats).
Following the bandwagon vs doing the work
Part of the problem is that modelling data is inherently a context-specific act. There is no universally applicable model, and I’m talking here about the ontological level of entities and relationships: what objects you represent in your data as distinct entities, and how they are related. Whether you model it as RDF or just as custom XML, the way you model the world may or may not be useful to, or even usable by, those in different contexts, domains, and businesses. See “Schemas aren’t neutral” in the short essay by Cory Doctorow linked to from that HN comment. But some of the linked data promise is premised on the idea that your data will be useful nearly universally, and readily integrated with data from other contexts and domains.
These are not insoluble problems; they are interesting problems, and they are problems that libraries as professional information organizations rightly should be interested in working on. Semantic web/linked data technologies may very well play a role in the solutions (although it’s hardly clear that they are THE answer).
It’s great for libraries to be interested in working on these problems. But working on these problems means actually working on these problems: spending resources on investigation and R&D, and on staff with the right expertise and portfolio. It does not mean blindly following the linked data bandwagon because you (erroneously) believe it’s already been judged the right way to go by people outside of (and, by implication, “smarter than”) libraries. It has not been.
For individual linked data projects, it means being clear about what specific benefits they are supposed to bring to use cases you care about, in both the short and long term, and what other outside dependencies may be necessary to make those benefits happen, and focusing on those too. It means understanding all your technical options and weighing them in a cost/benefit/risk analysis, rather than automatically assuming RDF/semantic web/linked data, and as much of it as possible.
It means being aware of the costs and the hoped-for benefits, and making wise decisions about how best to allocate resources to maximize the chances of achieving those benefits. Blindly throwing resources into taking your same old data and sharing it as “linked data”, because you’ve heard it’s the thing to do, does not in fact help.