So, a true-story example.
OCLC has the really cool xID services: xISSN, xOCLCnum, etc. Xiaoming Liu has done a really great job with them.
One thing the xID service doesn’t really tell us about, though, is relationships between collected works and individual items. “The Lord of the Rings” is one workset-group in WorldCat, and “The Two Towers” (published individually) is another. I think this is the correct way to model this. But it would be so useful if xID could tell you, when you asked for an identifier corresponding to LOTR, that that workset group has a relationship to another one, the one for The Two Towers.
It turns out this is kind of tricky, because we don’t have quite the right structured metadata in our inherited AACR2-MARC. But fortunately, we do have SOME info in there, because as Xiaoming mentions in a comment here:
Especially by taking advantage of a cataloging rule about ISBN numbers in sets/volume: “If you are cataloging a multivolume monograph, enter both the set number and the individual volume numbers, if available. Enter the number for the set first.”
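To make that concrete, here is a rough sketch of the kind of heuristic the cataloging rule suggests. This is my own guess at an approach, not how xID works. It assumes each record has already been reduced to its ISBNs in catalog order, and the function name `infer_set_volume_links`, the record ids, and the placeholder ISBN strings are all made up for illustration. Since plenty of records carry several ISBNs for other reasons (hardcover and paperback, say), the sketch only links a "volume" ISBN that also shows up by itself on another record.

```python
from collections import defaultdict


def infer_set_volume_links(records):
    """Guess set -> volume links from ISBN order (hypothetical heuristic).

    `records` maps a record id to its ISBNs in catalog order. Following
    the cataloging rule quoted above, the first ISBN on a multi-ISBN
    record is treated as the set ISBN and the rest as candidate volumes.
    """
    # ISBNs that appear alone on some record, i.e. published individually.
    standalone = {isbns[0] for isbns in records.values() if len(isbns) == 1}

    links = defaultdict(set)
    for isbns in records.values():
        if len(isbns) < 2:
            continue  # nothing to infer from a single-ISBN record
        set_isbn, candidates = isbns[0], isbns[1:]
        for isbn in candidates:
            # Only trust a candidate that is also cataloged by itself.
            if isbn in standalone:
                links[set_isbn].add(isbn)
    return dict(links)


# Placeholder identifiers, not real ISBNs.
records = {
    "rec1": ["isbn-lotr-set", "isbn-fellowship", "isbn-two-towers", "isbn-return"],
    "rec2": ["isbn-two-towers"],
    "rec3": ["isbn-fellowship"],
    "rec4": ["isbn-hobbit-hardcover", "isbn-hobbit-paperback"],
}
links = infer_set_volume_links(records)
```

With this toy data, only the LOTR set gets links, and only to the two volumes that also have records of their own. The point is that even a crude version like this is only useful, or even testable, against a corpus the size of WorldCat.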
Now, I know this is something that Xiaoming is interested in improving, and probably OCLC institutionally is too. But OCLC has limited resources, xID doesn’t actually bring in much if any income of its own, and they have to prioritize things. It’s hard to predict when OCLC might get to this; it could be a while. Can’t fault OCLC for having limited resources. Everyone does.
But maybe someone else wants to work on an algorithm for doing this. Maybe they come up with something good. Maybe they want to provide such a service. As test data for their development, and to make the finished service useful, they’d need a big corpus, like WorldCat. Maybe OCLC would give them permission to use the WorldCat corpus that way, if they are willing to sign away certain rights on what to do with it, and if OCLC doesn’t think it threatens WorldCat’s business model. But even having to ask and negotiate is a barrier to agile experimentation and innovation. There are plenty of people doing interesting stuff without enough time; they don’t have time for legal negotiations with OCLC and shouldn’t need to engage in them. That barrier doesn’t serve us.
Open Source vs. Open Data
To take things a step further, beyond open data to open source: if OCLC’s own code behind xID were open source, then other people could collaborate with Xiaoming on the code itself. Even someone who did have access to the WorldCat corpus would not have to re-invent Xiaoming’s work.
This is the benefit of openness, and the cost of non-openness. (Is openness even a word?) Non-openness means all our eggs are in OCLC’s basket, and we have to wait for them to innovate (which I know they are genuinely interested in doing, but they are but one organization with limited resources and its own necessary business priorities). Openness leads to innovation.
Geese — eggs (golden), and baskets
Now, well-intentioned people in OCLC might worry: Okay, sure, if we make all that stuff open, in the immediate term it might lead to more innovation. But it might also kill OCLC, and then there’s no OCLC to provide that useful code and aggregated data that you want to be open. You kill the goose that lays the golden eggs. The argument would be: better to have the golden eggs in one basket than not at all. (Metaphor win!)
It is the challenge facing OCLC, and its member libraries, to figure out how to make OCLC sustainable with openness. I don’t think it’s impossible. I think OCLC realizes it needs to switch to a service-based business model instead of a model based on monopolized assets, and I think OCLC leaders think it can be done. [And yes, this also means OCLC being sustainable in the presence of actual competitors, instead of an effective monopoly. I'm less sure that OCLC leaders think this can be done, but I do, and I think it is to our benefit.]
We just need to do it right quick if we are going to innovate to stay relevant, because we need openness to do so. Without openness, the goose’s days are numbered anyhow. I really really don’t want to destroy OCLC (really), but if I had to pick between OCLC and open data, it’d be a hard choice. And I don’t think we have a lot of time.