We’ve all got a lot of data in MARC (the fact that that statement makes sense shows that MARC is effectively a data vocabulary, not just a transmission standard, but anyway, moving on), and we need to sling it around between applications, including, for many of us, the “next generation” discovery tools that need to index it.
Marc21 binary format is the ‘native’ marc transmission format for our data. It’s got some benefits; it’s a ‘lowest common denominator’ that systems we work with are most likely to produce and consume; it’s fairly fast to de-serialize (I was going to say ‘parse’, but ‘deserialize’ is probably more accurate for a format like Marc21).
However binary Marc21 has got some significant problems too:
- If your programming language of choice doesn’t already have a robust, well-performing, free library for serializing/deserializing Marc21, it’s kind of a bear to write one. It’s a very weird format in some ways (offset data encoded as ascii numerals?), and an overly complex data format by contemporary standards. And just because you think you have a library available doesn’t necessarily mean that open source library is as robust or well-performing as you might hope.
- Just because an existing system (like an ILS) says it outputs Marc21 doesn’t necessarily mean it outputs legal Marc21. If some records are structurally illegal in certain ways, they may not be de-serializable on the other end, or may require more complex and less-well-performing de-serialization code on the other end. The weirdness and complexity of the Marc21 format (see above) contributes to this prevalence of non-compliant output.
- Perhaps most significantly, binary Marc21 has a maximum length. A legal binary Marc21 record can’t be any larger than 99999 bytes (just under 100KB). While this must have seemed larger than you’d ever want in the 1960s, currently it’s often not large enough for us — especially when you try to include ‘item’ information in a marc bib record (which isn’t standard, but is often done for various reasons).
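Both of those structural oddities — offsets stored as ascii numerals, and the hard 99999-byte ceiling — live in the record’s 24-byte leader. A minimal sketch in Python (just pulling a couple values from a hand-built leader, not a real parser) shows where the limit comes from: the record length occupies exactly five ASCII digit positions, so it simply cannot express a number larger than 99999.

```python
def leader_fields(record: bytes) -> dict:
    """Pull a few structural values out of a MARC21 leader (bytes 0-23)."""
    leader = record[:24].decode("ascii")
    return {
        # Record length is five ASCII digits -- hence the 99999-byte maximum.
        "record_length": int(leader[0:5]),
        "status": leader[5],
        # Base address of data: also ASCII digits, an offset into the record.
        "base_address": int(leader[12:17]),
    }

# A tiny hand-built example leader (only the numeric slots matter here).
fake_leader = b"00714cam a2200205 a 4500"
info = leader_fields(fake_leader)
print(info["record_length"])  # 714
print(info["base_address"])   # 205
```

A real deserializer also has to walk the directory that follows the leader — more ASCII-numeral tags, lengths, and offsets — which is a big part of why writing a robust one from scratch is a bear.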
To get around these problems, many people choose to work with MarcXML instead of binary Marc21 when they can. And MarcXML does get around the problems listed above pretty well, but involves a couple trade-offs which in some circumstances don’t matter, but in others do:
- A MarcXML file generally has a much larger file size than its equivalent Marc21.
- A MarcXML file is often significantly slower to deserialize than its equivalent Marc21.
In many cases, those issues don’t matter at all. But in some cases, they are unfortunate. (Like when you are exporting, re-indexing, and re-storing your entire multi-million-record Marc corpus).
So some people came up with the idea of marc in Json. If you can serialize marc in xml, why not do something very similar to serialize marc in Json in a standard way? Json is much more compact than XML and typically faster to parse, while still being a standard used beyond the library world (meaning there are general-purpose tools to support it and validate it), and without the issues of binary Marc21, including length limits.
In fact, I know of a couple people who independently had this idea of marc-json, but Bill Dueber did a little proto- mini- spec for a standard way to do marc in json, so different people writing tools can inter-operate.
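To make the idea concrete, here’s a sketch of what a MARC record might look like in Json. The exact key names here (“leader”, “fields”, “ind1”, “subfields”) are my own illustrative assumption, not necessarily what Bill’s proto-mini-spec prescribes — the point is just that the whole record maps onto plain Json structures with no length limits and no hand-rolled parsing.

```python
import json

# An illustrative (assumed, not spec-official) Json shape for a MARC record.
record = {
    "leader": "00714cam a2200205 a 4500",
    "fields": [
        # Control field: tag mapped straight to its value.
        {"001": "12345"},
        # Data field: tag mapped to indicators plus an ordered subfield list.
        {"245": {
            "ind1": "1",
            "ind2": "0",
            "subfields": [{"a": "A made-up title :"}, {"b": "an example."}],
        }},
    ],
}

# Any stock Json library can serialize and deserialize it -- no custom code.
serialized = json.dumps(record)
round_tripped = json.loads(serialized)
print(round_tripped["fields"][1]["245"]["subfields"][0]["a"])
```

Because it round-trips through any standard Json library, the “write a robust deserializer yourself” problem from binary Marc21 just disappears.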
I encourage anyone dealing with these issues to consider marc-json per Bill’s proto-mini-spec. I plan/hope to!