“Is the semantic web still a thing?”

A post on Hacker News asks:

A few years ago, it seemed as if everyone was talking about the semantic web as the next big thing. What happened? Are there still startups working in that space? Are people still interested?

Note that “linked data” refers to essentially the same technologies as the “semantic web”; it is more or less the newer branding for the same idea, with some minor changes in focus.

The top-rated comment in the discussion says, in part:

A bit of background, I’ve been working in environments next to, and sometimes with, large scale Semantic Graph projects for much of my career — I usually try to avoid working near a semantic graph program due to my long histories of poor outcomes with them.

I’ve seen uncountably large chunks of money put into KM projects that go absolutely nowhere and I’ve come to understand and appreciate many of the foundational problems the field continues to suffer from. Despite a long period of time, progress in solving these fundamental problems seem hopelessly delayed.

The semantic web as originally proposed (Berners-Lee, Hendler, Lassila) is as dead as last year’s roadkill, though there are plenty out there that pretend that’s not the case. There’s still plenty of groups trying to revive the original idea, or like most things in the KM field, they’ve simply changed the definition to encompass something else that looks like it might work instead.

The reasons are complex but it basically boils down to: going through all the effort of putting semantic markup with no guarantee of a payoff for yourself was a stupid idea.

The entire comment, and really the entire thread, are worth a read. There seems to be a lot of energy in libraryland behind trying to produce “linked data”, and I think it’s important to pay attention to what’s going on in the larger world here.

Especially because much of the stated motivation for library “linked data” seems to have been: “Because that’s where non-library information management technology is headed, and for once let’s do what everyone else is doing and not create our own library-specific standards.” It turns out that may or may not be the case: if your motivation for library linked data was “so we can be like everyone else,” that motivation may simply not be accurate, because everyone else doesn’t seem to be heading there in the way people hoped a few years ago.

On the other hand, some of the reasons that semantic web/linked data have not caught on are commercial and have to do with business models.

One of the reasons that whole thing died was that existing business models simply couldn’t be reworked to make it make sense. If I’m running an ad driven site about Cat Breeds, simply giving you all my information in an easy to parse machine readable form so your site on General Pet Breeds can exist and make money is not something I’m particularly inclined to do. You’ll notice now that even some of the most permissive sites are rate limited through their API and almost all require some kind of API key authentication scheme to even get access to the data.

Libraries and other civic organizations, without business models predicated on competition, may be a better fit for implementing semantic web technologies. And the sorts of data that libraries deal with (bibliographic and scholarly) may be better suited to semantic data than general commercial business data is. At the moment, libraries, cultural heritage, and civic organizations may well be the majority of entities exploring linked data.

Still, the coarsely stated conclusion of that top-rated HN comment is worth repeating:

going through all the effort of putting semantic markup with no guarantee of a payoff for yourself was a stupid idea.

Putting data into linked data form simply because we’ve been told that “everyone is doing it,” without carefully understanding the use cases such reformatting is supposed to benefit and making sure that it actually does, risks incurring great expense for no payoff. Especially when everyone is not, in fact, doing it.

GIGO

Taking the same data you already have and reformatting it as “linked data” does not necessarily add much value. If it was poorly controlled, poorly modelled, or incomplete data before, it still is in RDF. You can potentially add more value, and enable more uses of your data, by improving data quality than by working to reformat it as linked data/RDF. The idea that simply reformatting data as RDF would add significant value was predicated on an ecology of software and services built to use linked data, software and services exciting enough that making your data available to them would result in added value. That ecology has not really materialized, and it’s hardly clear that it will (and to the extent it does, it may only be because libraries and cultural heritage organizations create it; we are unlikely to get a free ride on more general tools from a wider community).
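
As a rough illustration, here is a minimal sketch (assuming the Python rdflib library; the record and its values are invented): a sparse, poorly controlled record is exactly as sparse and poorly controlled after it is serialized as RDF.

```python
# Minimal sketch, assuming rdflib; the namespace and record values are invented.
# The point: converting an incomplete, uncontrolled record to RDF does not make
# it any more complete or better controlled than it was in its original format.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC

EX = Namespace("http://example.org/records/")

g = Graph()
record = EX["b1234"]

# The same uncontrolled strings we started with, now as RDF triples:
g.add((record, DC.title, Literal("cat breeds -- the complete guide???")))
g.add((record, DC.creator, Literal("Smith, J.")))  # which J. Smith? Still unknown.
# Dates, identifiers, subject headings: still missing, in RDF as before.

print(g.serialize(format="turtle"))
```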

But please do share your data

To be clear, I still highly advocate taking the data you do have and making it freely available under open (or public domain) license terms, in whatever formats you’ve already got it in. If your data is valuable, developers will find a way to use it, and simply making the data you’ve already got available is much less expensive than trying to reformat it as linked data. You can also find out whether anyone is interested in it. If nobody’s interested in your data as it is, I think it’s unlikely that interest will be significantly greater after you model it as “linked data.” The ecology simply hasn’t arisen to make using linked data any easier or more valuable than using anything else (in many contexts and cases it’s in fact more troublesome and challenging than less abstract formats).

Following the bandwagon vs doing the work

Part of the problem is that modelling data is inherently a context-specific act. There is no universally applicable model, and I’m talking here about the ontological level of entities and relationships: what objects you represent in your data as distinct entities, and how they are related. Whether you model it as RDF or just as custom XML, the way you model the world may or may not be useful, or even usable, by those in different contexts, domains, and businesses. See “Schemas aren’t neutral” in the short essay by Cory Doctorow linked to from that HN comment. But some of the linked data promise is premised on the idea that your data will be both useful and integratable nearly universally with data from other contexts and domains.
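
To make the point about the ontological level concrete, here is a small hypothetical sketch (again assuming rdflib, with invented names): two reasonable models of the same bibliographic fact, one treating the author as a plain string, the other treating the author as a distinct entity. Neither is wrong in the abstract; which one is useful depends on the consuming context.

```python
# Hypothetical sketch, assuming rdflib; names like ex:book1 are invented.
# The same fact ("this book was written by Octavia E. Butler") modeled two ways.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC, FOAF, RDF

EX = Namespace("http://example.org/")

# Model A: the creator is just a string attached to the book.
model_a = Graph()
model_a.add((EX.book1, DC.creator, Literal("Octavia E. Butler")))

# Model B: the creator is a distinct entity with its own identity and
# properties, which other datasets could link to and extend.
model_b = Graph()
model_b.add((EX.book1, DC.creator, EX.person1))
model_b.add((EX.person1, RDF.type, FOAF.Person))
model_b.add((EX.person1, FOAF.name, Literal("Octavia E. Butler")))

# A consumer written against Model A (expecting literal creator values)
# gets nothing useful from Model B, and vice versa, even though both are RDF.
for g in (model_a, model_b):
    print(g.serialize(format="turtle"))
```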

These are not insoluble problems; they are interesting problems, and they are problems that libraries, as professional information organizations, rightly should be interested in working on. Semantic web/linked data technologies may very well play a role in the solutions (although it’s hardly clear that they are THE answer).

It’s great for libraries to be interested in working on these problems. But working on these problems means actually working on them: it means spending resources on investigation and R&D, and on staff with the right expertise and portfolio. It does not mean blindly following the linked data bandwagon because you (erroneously) believe it’s already been judged the right way to go by people outside of (and, by implication, ‘smarter than’) libraries. It has not been.

For individual linked data projects, it means being clear about what specific benefits they are supposed to bring to use cases you care about, short and long term, and what outside dependencies may be necessary to make those benefits happen, and focusing on those too. It means understanding all your technical options and weighing them in a cost/benefit/risk analysis, rather than automatically assuming RDF/semantic web/linked data, and as much of it as possible.

It means being aware of the costs and the hoped-for benefits, and making wise decisions about how best to allocate resources to maximize the chances of achieving those benefits. Blindly throwing resources into taking your same old data and sharing it as “linked data,” because you’ve heard it’s the thing to do, does not in fact help.

8 thoughts on ““Is the semantic web still a thing?””

  1. Understanding RDF was a part of my job early on at OCLC. I wasn’t able to convince myself that the use cases/value propositions were there. There has been a continuing stream of nifty things that people have done with RDF/Linked Data, but I don’t recall ever seeing an example where the RDF approach worked better/faster than alternatives. (Except in narrowly constrained cases.) Nonetheless, I have relented to the tide of Linked Data, because I really don’t have a choice.

  2. (copied from a comment on the y-combinator thread, where the poster complained about the notion of hierarchies. I did not mention that even Watson used ontologies to some extent, though the bulk of the tech is statistical + free text parsing using sophisticated adjacency grammars.)

    Yes, and no, Minister.

    Hierarchy will not die off because it is a core part of the way that the mind organizes concepts; for example, prototype effects imply several different levels.

    Folk Taxonomies are hierarchical, though the depth is usually smaller than that of scientific taxonomies and the principles of organization are usually different.

    There can be many ways of arranging the same concepts, and although it *is* possible to show that some are incorrect, it is in general impossible to show that one and only one Ontology is correct; this follows from the indeterminacy of translation (Google Gavagai!)

    Attempts to force an alien Ontology onto subject matter experts break them. They just stop performing at an expert level.

    It is usually possible to develop a suite of ontologies that are logically interoperable, but this requires experts who have skills from a variety of disciplines AND who are capable of deferring to the SMEs on how they see the world. It may be necessary to have intermediate mapping ontologies, but if the ontologists working with the different communities of interest are careful, these mappings can avoid losing meaning.

    Tagging as ad-hoc keywords does not work for data interoperability; it also usually fails to achieve good recall. Flat lists of controlled terms are usually difficult to apply unless they are very small. When Thomas Vander Wal coined the term Folksonomy, it was intended to cover the same kinds of structures as folk taxonomies. Its subsequent application to describe unstructured lists of tags was a misappropriation.

    RDF and OWL added some extra problems. They were designed without much input from people with actual use cases, and optimised for the wrong things.
    Some things were dumbed down because some people didn’t understand what had gone before, and could not understand why the old folk were kicking up such a fuss.
    Other things were constrained in order to make OWL DL decidable, even though the resulting worst-case complexity of 2-NEXPTIME means that implementations have to work on special cases, check for too-slow results, and/or limit expressivity to sub-profiles just in order to work.

    Other design decisions did not consider the human factors of using OWL.
    It is very difficult to explain to people why restricting the range of a property for a specific class is handled by adding an unnamed superclass. It is also difficult to explain the Open World Assumption and the non-unique name assumption.
    It is also hard to explain why, in a web environment with no standard world-closing mechanism, making everything monotonic is necessary.
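
    For illustration, a minimal sketch of that “unnamed superclass” pattern, assuming the Python rdflib library (the class and property names are invented): restricting what ex:holds may point to, for the class ex:Library only, is expressed by declaring ex:Library a subclass of an anonymous owl:Restriction class.

```python
# Minimal sketch, assuming rdflib; ex:Library, ex:holds, ex:Book are invented.
# A local range restriction in OWL is modeled as an anonymous ("unnamed")
# owl:Restriction class that the named class is declared a subclass of.
from rdflib import BNode, Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()

restriction = BNode()  # the unnamed superclass
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, EX.holds))
g.add((restriction, OWL.allValuesFrom, EX.Book))
g.add((EX.Library, RDFS.subClassOf, restriction))

print(g.serialize(format="turtle"))
```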

    It is especially hard to justify the restriction of RDF to binary predicates: some predicates are intrinsically of higher arity; just because higher-arity predicates can be misused, and it is possible to reduce everything to binary, does not make it desirable.
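
    As a concrete illustration of the workaround this forces, a hedged sketch (again assuming rdflib, with invented names): an intrinsically three-place fact has to be routed through an intermediate node.

```python
# Minimal sketch, assuming rdflib; the vocabulary here is invented.
# "Reviewer alice gave book1 the rating 4" is a three-place fact, so with
# only binary predicates it must be reified through an intermediate node.
from rdflib import BNode, Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

review = BNode()  # stand-in node for the higher-arity predicate
g.add((review, EX.reviewer, EX.alice))
g.add((review, EX.item, EX.book1))
g.add((review, EX.rating, Literal(4)))

print(g.serialize(format="turtle"))
```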

    Having a model that does not match the existing experience of UML modelers, database designers, or users of other KR systems causes real problems for real users.

    Nevertheless there is baby in the bathwater, and it can become soup. It just might not look the same.

    The schema.org efforts are limited to the extent that they are barely semantic (a conscious decision by danbri and guha); unfortunately some of the discussions by others show a visceral disdain for any questions as to what vocabulary choices would actually mean that is almost anti-semantism.

  3. Linked data dominates cataloging and bib standard discussion lists as if it’s a foregone conclusion. Yet other content providers, like journal databases, pretend the conversation doesn’t exist. It would be a good idea for libraries to work with journal publishers for seamless access in the catalog. It does happen, but only rarely. It’s expensive. People look to libraries for this access, but libraries have no money. Libraries are thought of as backward because they don’t offer this. I wondered too if the semantic web is still a thing. Maybe it’s not, because it involves sharing, which many entities don’t like to do.
