Architecting for shareability
I’ve often talked about the need, when sharing open source projects, to keep any local configurations, customizations, or overrides cleanly separate from the shared code. This makes it much easier to update to new versions of the shared code. You’ll still have to test all your localizations to see whether any non-backwards-compatible changes were introduced that require changes on your end, but you won’t have to puzzle out which lines of code WERE local and which weren’t in the first place.
And when you as a local developer fix a bug in the shared code, you can easily send a patch back to the core distribution, because you’ve kept your local customizations clearly separated, so you know which changes were the bug fix to core code and which were local customizations. Ditto if you decide one of your local customizations can and should be turned into a generalized feature to contribute back to the shared code.
So this is really important for sustainable collaborative open source. But it can actually be a bit tricky to architect a shared project to support (and ideally strongly encourage) this. It requires, for instance, factoring out configuration options for things that will commonly vary into isolated areas. It is, in my opinion, a lot easier (perhaps only really feasible) to do this flexibly with object-oriented code (I think it’s actually one of the main benefits of OO), and it can be easier still when you’re working with a framework that gives you features to support it.
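To make "factoring configuration into isolated areas" a bit more concrete, here is a minimal Python sketch of the idea: the shared code ships defaults, and a local file holds only the options a given install changes. The option names and values (`site_name`, `per_page`, and so on) are made up for illustration and do not come from any of the projects discussed here.

```python
from copy import deepcopy

# Hypothetical defaults that would ship with the shared code.
SHARED_DEFAULTS = {
    "site_name": "Shared App",
    "search": {"per_page": 10, "default_sort": "relevance"},
}

# Hypothetical local file: only the options this install changes.
LOCAL_OVERRIDES = {
    "site_name": "JH Search",
    "search": {"per_page": 25},
}


def merge_config(defaults, overrides):
    """Return a new dict with overrides layered on top of defaults, recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested section: merge it rather than replacing it wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


config = merge_config(SHARED_DEFAULTS, LOCAL_OVERRIDES)
```

Because the local file only names what differs, a new release of the shared code can add or change defaults without anyone having to hand-merge a copied config file.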
Rails, while a nice framework, didn’t (until recently) have really clear best practices for separating shared and local code in complex shared apps, perhaps because most clever Rails developers build apps for their own “dot com” type businesses rather than shared apps, so they didn’t really have the use case. Individual Rails libraries can be shared as gems, a generally great system. And you could share some amount of logic in Rails plugins (but not, very easily, MVC code). But until recently the Rails party line was not to even try sharing more integrated functionality. I always disagreed with this: for the kinds of stuff we work on and want to share in the library world (sharing with people who are not necessarily experienced developers but will still need to do some local customization), it would just make things too expensive and difficult. The solution is not to refrain from sharing high-level integrated code, but to make sure the high-level integrated code can be very easily and cleanly customized and localized. Thankfully the Rails community seems to be coming around to that position: with Rails3’s awesome support for ‘engines’ as gems, and awesomely improved gem dependency management, Rails now actually gives you some great support for this.
[And Xerxes, which I contribute to but which was mainly architected by David Walker, is awesomely architected for keeping local and shared code separate, even in PHP. If it can be done in PHP, it can be done anywhere? It is object-oriented PHP, though.]
But no matter what, it takes some real planning and real work to architect your application for clean separation of local and shared code. It’s just that, in my (and not just my) opinion and experience, it’s necessary if you want a moderately complicated project to actually be sustainable through cross-institution collaboration. It gets easier the more experienced you are as a developer in general, and at doing this kind of architecture in particular.
At this point I’ve participated in three projects started by someone else (Umlaut, Xerxes, Blacklight), where I joined and spent significant time and thought helping to improve their architectures for local/shared separation. It’s not just altruism; it’s resources my employer had to invest (in the form of my time) to increase the chances of these open source projects being sustainable for us and others over the long haul.
Bill Dueber and I tried starting a wiki page with some hints, guidelines, and ideas on how to create this kind of architecture, although it didn’t really go anywhere. It may or may not be helpful.
But more recently Tod Olson at UChicago asked me for some details specifically on the mechanics of using revision control systems (svn, git, etc.) with projects that have both a local and a shared component.
I think my attempt to answer his question below may prove useful to people trying to wrap their heads around how you actually keep local and shared code separate.
Three case studies, overview and mechanics of revision control
Exactly how I handle things at the revision control level depends on the app, its architecture, and the version control system involved. But all of these apps are architected to keep local files in separate directories from shared files, one way or another. That’s the key, really: if you’ve done that, the rest becomes plausible.
The slightly different layouts affect how I deal with each app, although all are similar in that the shared code is a checkout from a shared repo and the local code is a checkout from a local repo.
Xerxes is a PHP app. It’s set up so the local config and overrides are isolated in their own directory, separate from the shared code. I have it set up like this:
- jhsearch/
  - jhsearch-app/ [my local files, in my local svn]
  - xerxes/ [shared xerxes app, checked out from svn]
My xerxes dir is an svn checkout from the shared xerxes repo. To update it, I can run “svn up”, or update to a specific tagged release in the repo, etc. My jhsearch-app dir is an svn checkout from my local repo. I actually have a dev copy of jhsearch in an entirely separate location; I make changes to my local jhsearch-app there, commit them to the local repo once tested, and then check them out from the local repo in the production location.
Local information in the jhsearch-app folder includes a config file (which references the location of the shared xerxes dir); some XSL templates that the Xerxes app merges in as local overrides to its standard XSL templates; optionally, some custom PHP classes that are referenced in the config file to handle actions, on an action-by-action basis; and an actions ‘controller’ mapping, which Xerxes nicely architects so that it too only needs to contain local overrides of the default Xerxes controllers (for instance, if you want to use an aforementioned custom PHP class, which can be, and often is, a subclass of something in the xerxes shared code).
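The pattern described above (a default action mapping in shared code, a local mapping that lists only the overrides, and local handler classes that subclass shared ones) can be sketched roughly as follows. This is Python rather than PHP, and it is not Xerxes’s actual API or config format; every class and mapping name here is invented for illustration.

```python
class SearchAction:
    """Stands in for a default action handler shipped in the shared code."""

    def run(self, query):
        return f"results for {query}"


class RecordAction:
    """A second hypothetical shared handler, left untouched locally."""

    def run(self, query):
        return f"record {query}"


# Default action mapping, owned by the shared project.
SHARED_ACTIONS = {"search": SearchAction, "record": RecordAction}


class LocalSearchAction(SearchAction):
    """Local subclass: reuse the shared behaviour, change only what differs."""

    def run(self, query):
        # Local tweak: normalize the query, then defer to the shared logic.
        return super().run(query.strip().lower())


# Local mapping: contains ONLY the overrides, not a copy of the defaults.
LOCAL_ACTIONS = {"search": LocalSearchAction}


def resolve_action(name):
    """Prefer a local override for this action; fall back to the shared default."""
    handler_class = LOCAL_ACTIONS.get(name, SHARED_ACTIONS[name])
    return handler_class()
```

Here `resolve_action("search")` hands back the local subclass, while `resolve_action("record")` falls through to the shared handler, so the local directory never has to duplicate anything it does not change.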
One downside of this setup is that no repo history records exactly which snapshot of my local app was tested with exactly which snapshot of shared Xerxes. But it works out okay anyway. Since they are both svn, you could use svn externals here instead, but I don’t: since the two directories (local and shared) are siblings rather than one being the parent of the other, it’s easy to just make them two completely separate svn checkouts. It also means you could use svn for one of them and git for the other, but I use svn for both.
Blacklight is a Rails plugin that works with Rails 2.x apps. That means it lives in a specific directory of your local Rails app.
- jhu-app/
  - config/ [local stuff]
  - app/ [local stuff]
  - [many more]
  - vendor/plugins/blacklight/ [this is the shared blacklight code]
jhu-app is my local app, and is a local git repo. The only part of the whole jhu-app tree that is shared code is the vendor/plugins/blacklight directory (which has lots of its own subdirs and files, of course). I use git’s submodule feature to connect just vendor/plugins/blacklight to an external repo, the blacklight shared repo. So I can peg vendor/plugins/blacklight to whatever commit point of the blacklight repo I want, master/trunk or a tagged release, and in this case the git submodule feature takes care of maintaining a history of exactly what snapshot of shared blacklight goes with a given snapshot of the jhu-app.
Another way to do it would be to have vendor/plugins/blacklight _not_ be an actual live repo checkout, but just an “export” snapshot: plain files contained in my local repo, which I update to the desired latest version of BL whenever I want. This would probably be necessary if, for instance, I wanted to control my local app in svn even though the BL repo is git.
In Blacklight, your local app has at a minimum some configuration (your solr and rdbms connection info; some mappings between your solr index and how you want it to display/search in the app, etc.). Optionally, you can override individual Rails view templates (the Rails “engines” feature takes care of letting local templates override ones from the plugin). Optionally, you can also override blacklight plugin classes on a method-by-method basis, which is how lots of BL local customization happens (also just part of Rails, mostly). The Blacklight project also provides several optional add-on plugins, which you generally install in vendor/plugins too.
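The template override behaviour boils down to a lookup order: check the local app’s views first, and fall back to the plugin’s copy only if there is no local one. The Python sketch below illustrates just that ordering. It is not how Rails implements view resolution, and the directory layout and template file names inside `demo` are assumptions made up for the example.

```python
import tempfile
from pathlib import Path


def find_template(name, search_roots):
    """Return the first existing file called `name`, checking roots in order."""
    for root in search_roots:
        candidate = Path(root) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(name)


def demo():
    """Build a throwaway local/shared layout and resolve two templates."""
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / "app" / "views"
        shared = Path(tmp) / "vendor" / "plugins" / "blacklight" / "app" / "views"
        for directory in (local, shared):
            directory.mkdir(parents=True)

        # The shared plugin ships both templates; the local app overrides one.
        (shared / "index.html.erb").write_text("shared index")
        (shared / "show.html.erb").write_text("shared show")
        (local / "show.html.erb").write_text("local show")

        roots = [local, shared]  # local first, so local wins when present
        names = ("index.html.erb", "show.html.erb")
        return {name: find_template(name, roots).read_text() for name in names}
```

Calling `demo()` resolves the index template from the shared plugin directory and the show template from the local override, which is exactly the split you want: you only carry local copies of the templates you actually changed.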
Once we upgrade BL to working in Rails3, it’ll be possible to make the Blacklight plugin a ‘gem’, which will make dependency management between your local app, BL, and any other plugins/gems you have (including BL add-ons) even smoother, using the standard ruby/rails gem facilities.
Umlaut was the first app I seriously worked on architecting like this, and it’s definitely the hackiest, both because I was still learning, and because it was written back in the day when Rails plugins weren’t really powerful enough to do what I needed (and before I became friends with git too). But I still managed to separate local code from shared code in a pretty workable way (and apparently well enough to make it feasible for Scot Dalton at NYU to share enhancements back with me, and to keep his local install up to date; we have successfully avoided forking).
In Umlaut, instead of the shared code being isolated to a rails plugin, the entire Umlaut app is in fact the shared code. But certain specific directories within it (and only those directories, generally) contain your local code.
- Umlaut/ [shared code]
  - app/ [still shared code]
    - views/
      - local/ [LOCAL]
  - lib/ [still shared code]
    - local/ [LOCAL code]
  - config/ [still shared code]
    - umlaut_config/ [LOCAL code]
So here we have local code in a bunch of directories scattered throughout the Umlaut directory, which is overall a shared code directory. But it is still a specific set of directories. What I do here is more complicated. The overall Umlaut directory is an svn checkout from the Umlaut repository. But each local directory is ignored by svn (via the svn:ignore property), so it’s not treated as part of the checkout from the shared umlaut repo. I do still want to put each of those local directories in my local repo, though.
So each local directory, separately, is an svn checkout from my local svn. Because svn lets you check out just a subdir from a repo, they’re all actually in one repo, as different subdirs. That means I’ve got to check out $my_svn/umlaut_config to Umlaut/config/umlaut_config, check out $my_svn/lib_local to Umlaut/lib/local, etc. There are actually some utilities that I bundle with the Umlaut shared code to make this easy, letting you run just one command that will then go checkout/update/commit all of your various local directories.
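To give a feel for what a "one command for all the local directories" helper does, here is a rough Python sketch that only builds and prints the svn commands rather than running them. It is not the actual utility bundled with Umlaut. The two directory mappings come from the description above; the repository URL and the function name `plan_commands` are placeholders I made up.

```python
import shlex

# Subdir in the local repo -> where it lives inside the shared Umlaut checkout.
# Only the two mappings named in the text are listed; a real install has more.
LOCAL_DIRS = {
    "umlaut_config": "config/umlaut_config",
    "lib_local": "lib/local",
}


def plan_commands(action, repo_url, umlaut_root, local_dirs=LOCAL_DIRS):
    """Build (but do not run) one svn command per local directory."""
    commands = []
    for repo_subdir, target in local_dirs.items():
        path = f"{umlaut_root}/{target}"
        if action == "checkout":
            commands.append(["svn", "checkout", f"{repo_url}/{repo_subdir}", path])
        elif action == "update":
            commands.append(["svn", "update", path])
        else:
            raise ValueError(f"unsupported action: {action}")
    return commands


# Placeholder URL standing in for $my_svn; prints the commands for review.
for command in plan_commands("checkout", "https://svn.example.org/my_svn", "Umlaut"):
    print(shlex.join(command))
```

Keeping the directory mapping in one place is the useful part: the same table drives checkout and update (and could drive commit), so nobody has to remember which local directory goes where.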
It’s definitely kind of hacky/messy, but the best I could come up with at the time. Now, at some point I want to rejigger it as a Rails3 plugin-gem, as discussed with Blacklight.
When you can’t cleanly separate…
…you might still be able to figure out some way to keep track of which code is shared and which is local by using svn or git branches. I’ve thought about this a little, but my gut feeling is that this approach would end up a pretty big mess, and I haven’t experimented with it or thought about it extensively. But it might be the only path you’ve got if you have an existing, really big app which is not architected to keep local code and shared code separate, you don’t have the resources to rearchitect it, but you still want to try your best to keep from forking and to be able to share fixes and enhancements with partners.