Traject MARC->Solr indexer release

Jonathan Rochkind (Johns Hopkins) and Bill Dueber (University of Michigan) are happy to announce a robust, feature-complete beta release of “traject,” a tool for indexing MARC data to Solr.

traject, in the vein of SolrMarc, allows you to define your indexing rules using simple macro and translation files. However, traject runs under JRuby and is “ruby all the way down,” so you can easily provide additional logic by simply requiring Ruby files.
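To give a feel for what "rules plus translation maps, with plain Ruby when you need it" can look like, here is a toy sketch. It is not traject's implementation or its exact API: the `MiniIndexer` class, the hash-based record, and the key names such as `"245a"` and `"008lang"` are all assumptions made up for illustration. See the sample configuration file for the real syntax.

```ruby
# Toy stand-in for a rule-based indexer. MiniIndexer is hypothetical and
# is NOT traject's implementation; it only illustrates the general idea.
class MiniIndexer
  def initialize
    @rules = []
  end

  # Register a rule: the block receives a record and returns value(s).
  def to_field(name, &extractor)
    @rules << [name, extractor]
    self
  end

  # Apply every rule to one record and collect the output document.
  def map_record(record)
    @rules.each_with_object({}) do |(name, extractor), doc|
      values = Array(extractor.call(record)).compact
      (doc[name] ||= []).concat(values) unless values.empty?
    end
  end
end

# A translation map is just a lookup table from coded values to labels.
language_map = { "eng" => "English", "fre" => "French" }.freeze

# Assumption for brevity: a record is a plain Hash of "tag+subfield" => values.
# Real MARC records are considerably richer than this.
indexer = MiniIndexer.new
indexer.to_field("title") { |rec| rec["245a"] }
indexer.to_field("language") do |rec|
  Array(rec["008lang"]).map { |code| language_map.fetch(code, code) }
end
# Additional logic is just another block of ordinary Ruby.
indexer.to_field("title_sort") do |rec|
  Array(rec["245a"]).map { |t| t.downcase.sub(/\A(the|an?)\s+/, "") }
end

record = { "245a" => ["The Hobbit"], "008lang" => ["eng"] }
doc = indexer.map_record(record)
```

Because each rule is an ordinary block, anything you can `require` (a helper module, a gem, a lookup table loaded from a file) can take part in building the output document.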

There’s a sample configuration file to give you a feel for traject.

You can view the code on GitHub, and easily install it as a (JRuby) gem using “gem install traject”.

traject is in beta, and we are hoping for feedback from more testers before a 1.0.0 release, but it is already being used in production to generate the HathiTrust (metadata-lookup) Catalog. traject was developed using a test-driven approach and has undergone both continuous integration and an extensive benchmarking/profiling period to keep it fast. It is also well covered by high-quality documentation.

Feedback is very welcome on all aspects of traject including documentation, ease of getting started, features, any problems you have, etc.

What we think makes traject great:

  • It’s all just well-crafted and documented ruby code; easy to program, easy to read, easy to modify (the whole code base is only 6400 lines of code, more than a third of which is tests)
  • Fast. Traject by default indexes using multiple threads, so you can use all your cores!
  • Decoupled from specific readers/writers, so you can use ruby-marc or marc4j to read, and write to solr, a debug file, or anywhere else you’d like with little extra code.
  • Designed so it’s easy to test your own code and distribute it as a gem
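The "multiple threads" and "decoupled readers/writers" points above can be pictured with a generic sketch. This is an illustration of the pattern only, not traject's code: `run_pipeline` and `ArrayWriter` are hypothetical names, a "reader" here is assumed to be anything that responds to `each`, and a "writer" anything that responds to `put`.

```ruby
# Illustrative only: a reader -> mapper -> writer pipeline with a small
# thread pool. None of these names come from traject; they are hypothetical.

# A writer that collects documents in memory (think "debug output").
class ArrayWriter
  attr_reader :docs

  def initialize
    @docs = []
    @lock = Mutex.new
  end

  # Called from several worker threads, so guard the shared array.
  def put(doc)
    @lock.synchronize { @docs << doc }
  end
end

def run_pipeline(reader, writer, threads: 4, &mapper)
  # Bounded queue, so the reader cannot run far ahead of the workers.
  queue = SizedQueue.new(threads * 2)

  workers = Array.new(threads) do
    Thread.new do
      # pop returns nil once the queue is closed and drained, ending the loop.
      # (This sketch therefore assumes records are never nil or false.)
      while (record = queue.pop)
        writer.put(mapper.call(record))
      end
    end
  end

  reader.each { |record| queue << record }
  queue.close
  workers.each(&:join)
  writer
end

# Any Enumerable can stand in for a reader; here, a range of integers.
writer = run_pipeline(1..100, ArrayWriter.new, threads: 4) do |n|
  { "id" => n, "square" => n * n }
end
```

Swapping the reader or the writer means passing a different object, and the mapping block stays the same. Note that output order is not guaranteed with several workers, and that threads only run on multiple cores at once under an implementation such as JRuby; on MRI the global interpreter lock serializes Ruby code.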

We’re hoping to build up an ecosystem around traject and encourage people to ask questions and contribute code (either directly to the project or via releasing plug-in gems).
