
ONIX For Serials Coverage June 21, 2007

Posted by jrochkind in Practice, cataloging, catalogs.

The ONIX For Serials Coverage standard is out.

While it was mainly designed to be used within the ONIX SOH and SPS formats, they wisely decided to publish it as a free-standing schema too: “The Coverage Statement may also be used to express holdings or coverage in XML structures other than those specified in ONIX for Serials.”

I think this is a great idea, in keeping with the ‘mix and match’ incipient semantic web we find ourselves in. If you look at the standard, it is really a very nice way to describe serial holdings coverage, in ways very amenable to machine calculation: for instance, to answer the question “Is this particular issue X within the holdings?”, or to combine various holdings statements into one contiguous, human-displayable statement. This is something our current systems have trouble doing, because we don’t store the necessary data in machine-actionable ways.
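
To make the ‘machine calculation’ point concrete, here is a minimal sketch in Python. It does not use the actual ONIX Coverage schema or its element names; it assumes a hypothetical, simplified model in which each holdings statement is a range of (volume, issue) points, optionally open-ended. Once coverage is stored that way, both of the questions above take only a few lines:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical simplified model, NOT the ONIX Coverage structure:
# a point in a serial's run, compared as (volume, issue).
Point = tuple[int, int]


@dataclass(frozen=True)
class CoverageRange:
    start: Point
    end: Optional[Point] = None  # None means "to the present" (open-ended)

    def contains(self, point: Point) -> bool:
        if point < self.start:
            return False
        return self.end is None or point <= self.end


def in_holdings(ranges: list[CoverageRange], point: Point) -> bool:
    """Is this particular issue within the holdings?"""
    return any(r.contains(point) for r in ranges)


def merge(ranges: list[CoverageRange]) -> list[CoverageRange]:
    """Combine overlapping statements into non-overlapping ranges for display."""
    merged: list[CoverageRange] = []
    for r in sorted(ranges, key=lambda r: r.start):
        if merged:
            last = merged[-1]
            if last.end is None or r.start <= last.end:
                # Overlap: extend the previous range; an open end always wins.
                if last.end is None or r.end is None:
                    new_end = None
                else:
                    new_end = max(last.end, r.end)
                merged[-1] = CoverageRange(last.start, new_end)
                continue
        merged.append(r)
    return merged


def display(r: CoverageRange) -> str:
    start = f"v.{r.start[0]}:no.{r.start[1]}"
    end = "present" if r.end is None else f"v.{r.end[0]}:no.{r.end[1]}"
    return f"{start}-{end}"


holdings = [
    CoverageRange((1, 1), (5, 4)),   # e.g. one supplier's statement
    CoverageRange((4, 1), (10, 2)),  # an overlapping statement from another
    CoverageRange((12, 1)),          # an open-ended current subscription
]
print(in_holdings(holdings, (7, 3)))   # True
print(in_holdings(holdings, (11, 1)))  # False
print([display(r) for r in merge(holdings)])
# ['v.1:no.1-v.10:no.2', 'v.12:no.1-present']
```

Real serials are messier than this toy model (numbering quirks, supplements, gaps), which is part of why a carefully designed standard structure beats rolling your own.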

While the standard says it’s “designed to convey information about online serial resources from suppliers – such as hosting services, publication access management services, agents or publishers – to end customers in subscribing libraries,” there’s really nothing about it that’s limited to that context.

If anyone is writing software where they need to store or exchange serials coverage data, I’d encourage them to check out ONIX For Serials Coverage. It’s very elegant, and seems to me to have just the right level of complexity and flexibility to do what it needs to do, without being overly abstract/complex/flexible. It should be quite easy to work with. Hats off to the standards writers here.

“Computational thinking” June 18, 2007

Posted by jrochkind in cataloging, programming.

Yeah, I hate it when people just “me too” something someone else blogged, but I’m doing it anyway, bringing it into a slightly different context.

Jon Udell talks about Jeannette Wing’s concept of “computational thinking”, and points to a podcast on it (which I haven’t listened to, no. But that’s another topic).

This idea of a “computer science perspective”, based in large part on the foundational idea of ‘abstraction’ (from which, I think, come ‘refactoring’ and ‘separation of concerns’), is one I’ve been thinking about for a while. I’m pleased to see that Wing has put a name on it too, and is exploring what exactly it means.

I learned this way of looking at things through a computer science degree and some years of programming experience, but I don’t think that’s the only way to learn it. I think there’s a way to learn it without actually learning to program or being a programmer or computer scientist.

And it’s precisely this kind of perspective (“computational thinking”) that I think the 21st-century cataloger or metadata librarian absolutely needs in order to understand how what they do does, and can, fit into the digital landscape. I’ve wondered before whether it would be possible to design some kind of curriculum in what I thought of as ‘computer science perspective’ that wasn’t particularly technical and was not about teaching programming or computer science itself. I wonder if Wing is exploring that idea with ‘computational thinking’, as she too seems to think it’s a way of thinking that’s useful to more than just computer scientists.

This article looks like a good place to start on Wing’s “Computational Thinking” idea. I think if you read it, you will immediately be convinced: “Yes! Catalogers do need to be able to think that way in the 21st century!” [Indeed, I should make clear that many catalogers already do, or can, or _almost_ can think that way; it’s not TOO different from a certain type of ‘cataloging thinking’.]

“Computational thinking is a way of solving problems, designing systems, and understanding human behavior that draws on concepts fundamental to computer science. Computational thinking is thinking in terms of abstractions, invariably multiple layers of abstraction at once. Computational thinking is about the automation of these abstractions. The automaton could be an algorithm, a Turing machine, a tangible device, a software system –or the human brain. Recursively, it could be a network of these. Computational thinking gives us the power to scale beyond our imagination.”

Tor and off-campus testing June 12, 2007

Posted by jrochkind in Practice.

Do you have software that behaves differently for an IP range recognized as ‘on campus’ than for one ‘off campus’? Such as, say, your database vendors’ websites? Or your proxies? Or other local software?

I do. It’s a pain to test, or to reproduce someone’s problem when there’s a problem report.

Tor (the onion router) works great for this! It’s a package actually intended to allow anonymous web surfing by redirecting your web traffic all over the place, but that means your traffic appears to come from somewhere else, which is exactly what I needed in order to appear ‘off campus’ for testing. The Windows installer worked almost flawlessly (I had to install the Firefox plugin manually for some reason, even though the installer was supposed to do it). It’s great!
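
Why does appearing to come from another address suffice? In most of this software, the on/off campus decision boils down to checking the client’s IP address against a list of campus ranges. Here is a minimal sketch of that kind of check using Python’s standard ipaddress module. The ranges and the ‘direct’/‘proxy-login’ labels are hypothetical placeholders (a documentation block and a private block), not anyone’s real campus network or any vendor’s actual logic:

```python
import ipaddress

# Hypothetical campus ranges; substitute whatever your vendors/proxy recognize.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_on_campus(client_ip: str) -> bool:
    """True if the client address falls inside any campus range."""
    addr = ipaddress.ip_address(client_ip)
    # Membership across IPv4/IPv6 versions simply returns False.
    return any(addr in net for net in CAMPUS_NETWORKS)


def access_mode(client_ip: str) -> str:
    """Hypothetical policy: on-campus users go straight in, others must log in."""
    return "direct" if is_on_campus(client_ip) else "proxy-login"


print(access_mode("192.0.2.45"))   # an on-campus address
print(access_mode("203.0.113.7"))  # stands in for a Tor exit relay's address
```

When you browse through Tor, the vendor sees the address of a Tor exit relay rather than yours, so a check like this lands in the off-campus branch.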

Thanks to jaron on #code4lib for the idea.

Ex Libris’ ‘URM strategy’, and the future of library software June 11, 2007

Posted by jrochkind in business, catalogs.

I was at the ELUNA conference this past week.

Among other things, Oren Beit-Arie from Ex Libris gave a presentation on their “Unified Resource Management” strategy for their products. I’m afraid I can’t find any good marketing materials from EL on this online, let alone Oren’s slideshow, which would be great. But here’s what I took away from it (anyone else, feel free to correct me).

A) Get Ex Libris products working together in an integrated way, on one common data model, as ‘services’ layered on top of a common database. Buy the services you want; all the services work well together in an integrated fashion. (There was a nifty boxes-and-lines diagram here illustrating the architecture.)

B) An openness to… openness. The individual services should be mix-and-matchable with services from other providers (vendors or open source).
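
Here’s my own rough sketch of what ‘A’ plus ‘B’ could mean in code. This is not Ex Libris’s design (I haven’t seen any technical details), and every class and name below is hypothetical. The idea is that services depend only on an agreed interface to the shared data, so one provider’s front end can be swapped for another’s without touching the back end:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Record:
    """Hypothetical common data model shared by every service."""
    id: str
    title: str


class RecordStore(Protocol):
    """The agreed interface any back end must provide."""
    def search(self, text: str) -> list[Record]: ...


class InMemoryStore:
    """A stand-in back end; a real one would sit on the common database."""
    def __init__(self, records: list[Record]) -> None:
        self._records = list(records)

    def search(self, text: str) -> list[Record]:
        needle = text.lower()
        return [r for r in self._records if needle in r.title.lower()]


class VendorFrontEnd:
    """One provider's discovery layer: results in stored order."""
    def __init__(self, store: RecordStore) -> None:
        self.store = store

    def results(self, query: str) -> list[str]:
        return [r.title for r in self.store.search(query)]


class OpenSourceFrontEnd:
    """A competing discovery layer over the same back end: sorted titles."""
    def __init__(self, store: RecordStore) -> None:
        self.store = store

    def results(self, query: str) -> list[str]:
        return sorted(r.title for r in self.store.search(query))


store = InMemoryStore([
    Record("1", "Serials Review"),
    Record("2", "Library Journal"),
    Record("3", "Journal of Academic Librarianship"),
])
# Either front end works against the same store, unchanged.
for front_end in (VendorFrontEnd(store), OpenSourceFrontEnd(store)):
    print(front_end.results("journal"))
```

The code is the easy part; the hard part is getting multiple vendors to agree on that shared interface and data model.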

Now, I have no doubt that ‘A’, even by itself, is the right strategy for architecting library software. The current divisions between products and ‘modules’ are not always rational, leading to both user interface problems (lack of integration; users have to look in different places for things that should be together) and workflow problems (again, staff have to enter things multiple times, and work with multiple products/modules/screens to accomplish what is one task in their workflow).

The architecture Oren was talking about is right from a technical perspective, and it’s indubitably right from a business perspective to give customers what they need and want. In fact, this strategy is no doubt in part a response to competitor Serials Solutions, whose products more or less already work like this. SerSol recently rebranded their products as the ‘360 suite’ to emphasize this level of integration, with the important caveat that SerSol’s suite does NOT include a traditional ILS or most of the functions and interfaces traditionally housed there, which is, of course, the most complicated and difficult part.

But it’s Part B that is really exciting. SerSol doesn’t do that.

Now, achieving part A alone will be a technical challenge. But on top of that, making all the pieces mix-and-matchable with other vendors’ products? That’s an even bigger technical challenge (does it mean the common data schema needs to be some kind of standard shared across vendors?). It’s also a political challenge and a business challenge for Ex Libris.

Will they be able to get other vendors to cooperate? (The incipient emergence of real-world stable open source library software makes this more likely, and provides some projects that are likely to cooperate regardless of what the traditional ones do).

Will they be able to actually make this work for themselves, actually commit to it, and not be scared of what it could mean for their own bottom line? Now, Ex Libris has always been one of the vendors most comfortable with openness, really trying to provide software that works with their competitors’ instead of relying on lock-in. Without discounting the ‘ideological’ motivations that are probably present in the company, this is also due in part to their business position. (There are always ‘material conditions’ helping to determine things, sez this Marxist.) Their strongest product was SFX, NOT an ILS, and their ILS product always had a relatively weak market share in North America anyway. So they had no choice but to promote interoperability.

It’s clear that they want Primo, their new ‘front end’ product, to interoperate with everyone else’s stuff, so other vendors’ customers can still buy Primo.

But do they really want all their other products, including back-ends, to interoperate with OTHER vendors’ front-ends and back-ends too? So I could even (gasp!) choose to use a Primo competitor with their back-end products? Are they really going to be willing and able to pull this off? It’s going to take serious technical resources, and in the long term, they’re not going to be able to justify such resources to themselves if it results in lowering their own ‘lock in’. Or are they?

What Ex Libris says they want to sell me is indeed what I want to buy. Will they be able to make it happen? Will they be able to do it in time for it to matter? The URM ‘strategy’ [Oren was clear that it’s not a ‘product’, it’s a ‘strategy’] is just being born, and the timeline was “next 5 years or more”.

Time will tell. But I am encouraged that they seem to have a strategy which makes sense from a technical perspective, not just a business perspective, something we are not used to expecting from our vendors, and which we all desperately need. [Of course, a vendor with a great technical idea that goes out of business helps nobody either; I’m not saying the business can be sacrificed to the technical.]

The Purposes of ‘Subject’ Vocabularies June 6, 2007

Posted by jrochkind in Theory, cataloging.

LCSH, LCC, DDC, Ulrich’s subject headings, BISAC, Ranganathan’s Colon Classification, Bliss Classification (2), Amazon’s subject headings: all are examples of ‘subject’ controlled vocabulary.

I put ‘subject’ in quotes because in reality most, if not all, of these examples include terms to capture ‘aboutness’ as well as terms to capture discipline (i.e., perspective) and genre (and in some cases form, format, and intended audience). (Yes, Dewey sometimes captures ‘aboutness’ and LCSH sometimes captures disciplinary perspective. Take a look.)

I have been interested for a while in exploring the purposes of these types of vocabulary. I think they are not as clear and simple as we might be used to assuming. I wrote a (too) long paper about it in library school, which I’ll attach here. I actually wrote it before I had seen NCSU’s Endeca implementation, and I’d have written it differently afterward, but I think this discussion is very relevant to understanding effective use of controlled vocabularies in faceted navigation. Recent discussion on NGC4Lib regarding these types of vocabularies further emphasizes, to me, the importance of considering their functions.

In my paper, I argue that when we look at these vocabularies from the perspective of function or purpose, the traditional line between ‘classification’ and ‘subject vocabulary’ isn’t actually that clear; instead we have a number of purposes (not just two) which a given vocabulary may serve better or worse.

The paper is awfully long, so I’ll also now summarize my suggestion for an initial draft taxonomy of functions. (These functions admittedly overlap in some ways, but I still think ) (The next step, determining which features of a vocabulary fit which functions or purposes, is only touched upon in the paper.)

Google’s algorithms June 3, 2007

Posted by jrochkind in General.

There’s a very interesting article about Google’s relevancy ranking algorithms in today’s NYT Business section: “Google Keeps Tweaking Its Search Engine” by Saul Hansell, New York Times, June 3, 2007. (Annoyingly, WordPress.com doesn’t let me put a COinS in my blog post! Argh! Sorry.)
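
For those who haven’t run into it: a COinS is just an empty HTML span with class Z3988 whose title attribute holds an OpenURL ContextObject in key/encoded-value form, which tools like Zotero can pick up from the page. Here is a minimal sketch of what a COinS for the NYT citation might look like, generated with only Python’s standard library. Using the journal format with rft.genre=article for a newspaper story is my assumption about the best fit:

```python
from html import escape
from urllib.parse import urlencode


def coins_span(fields: dict[str, str]) -> str:
    """Build a COinS span: an OpenURL KEV ContextObject in a span's title."""
    kev = urlencode({
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        **fields,
    })
    # The KEV string contains '&', so it must be HTML-escaped in the attribute.
    return f'<span class="Z3988" title="{escape(kev)}"></span>'


span = coins_span({
    "rft.genre": "article",
    "rft.atitle": "Google Keeps Tweaking Its Search Engine",
    "rft.jtitle": "New York Times",
    "rft.date": "2007-06-03",
    "rft.aulast": "Hansell",
    "rft.aufirst": "Saul",
})
print(span)
```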

This article has a sub-text (well, not too sub) about how insanely awesome Google is, and how much further ahead of everyone else they are. No doubt getting press like that is part of the reason Google gave the reporter access to this department, which is usually cloaked in trade secrecy.

Still, that’s definitely part of the story. It’s important to remember/realize that Google’s relevancy ranking algorithms are very sophisticated and complex, and constantly getting more so, in order to give us the simplicity of the good results we see. Our simplistic conception of ‘page rank’ is just one increasingly small part of the whole set of algorithms. So, no, we can’t “just copy what Google does” (not least, but not only, because we are dealing with a different data domain than Google).
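
For reference, here’s the ‘simplistic conception of page rank’ I mean: the textbook power-iteration formulation, sketched in Python. The damping factor of 0.85 is the commonly cited default, and the tiny link graph is made up. This is not Google’s actual code; per the article, whatever they run today uses this kind of signal as one increasingly small part of a much larger set of algorithms:

```python
def pagerank(
    links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """Textbook PageRank by power iteration over a small link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the "random jump" share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "c", with the most inbound links, comes out on top here; a library catalog has no comparable link graph, which is one concrete way our data domain differs.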

The solution to what we need isn’t just waiting out there in the open for us to copy. The solution(s) are waiting for us to discover and invent. On the other hand, of course we want to pay attention to what we can learn from Google and what Google does (in broad principles and, where we can figure them out, specific details) as we figure it out.

Some choice quotes: