Naomi Dushay posted a great essay on the Blacklight dev listserv just now on how you make sure an open source project is amenable to continued collaboration and sharing of code, and doesn’t end up ‘forking’. Some of her concerns go beyond that, to how you build successful stable open source in general.
Synthesizing her excellently stated points with some of my own thoughts, I’d divide this into several issues:
1) Code Architecture
2) Release Management
If we don’t separate general code from site-specific code for our OSS projects, then every site using the software creates an individual implementation silo when they customize the code.
Indeed. I spend most of my development time at the moment on two open source projects, Umlaut and Xerxes. In both cases, I was not the original programmer, and in both cases, I came on as a significant collaborator. (In Umlaut’s case, I am now basically the ‘owner’ of the project; in Xerxes’ case, David Walker is still the owner, thankfully). In both cases, I found myself needing to re-architect the code to support separation of shared code (which I often call ‘distribution’ or ‘distro’ code) and site-specific code (which I generally call ‘localized’ or ‘local’ code). (This applies to any files on disk, including templates/views and configuration, not just ‘code’ in the strict sense.)
Here are some general architectural principles I have applied in those projects:
1. Make as much as possible (where reasonable) configurable in ‘config’ files. (Really, this is about balancing ‘reasonable’ against ‘as much as possible’. If the project is actively developed, more config flexibility can always be added later.)
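As a sketch of what this can look like in Ruby (the file contents, keys, and constant names here are invented for illustration, not actual Umlaut or Xerxes code), the distro can ship a full set of defaults, and a local config file overrides only the keys a site cares about:

```ruby
require 'yaml'

# Hypothetical distro defaults; in a real app these might live in
# something like config/distro_defaults.yml.
DISTRO_DEFAULTS = {
  "app_name" => "Umlaut",
  "search"   => { "per_page" => 20, "timeout" => 10 }
}

# Hypothetical local overrides, as a site might write them in a
# local config file. Only what differs from the distro is stated.
local_yaml = <<~YAML
  search:
    per_page: 50
YAML
local_config = YAML.safe_load(local_yaml) || {}

# Deep-merge so local keys win, but untouched distro defaults survive.
def deep_merge(distro, local)
  distro.merge(local) do |_key, d, l|
    d.is_a?(Hash) && l.is_a?(Hash) ? deep_merge(d, l) : l
  end
end

CONFIG = deep_merge(DISTRO_DEFAULTS, local_config)
```

The deep merge matters: a shallow `merge` would let a local `search:` section silently wipe out distro defaults like `timeout` that the site never mentioned.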
2. Make logic locally modifiable using standard object-oriented design patterns, so you can substitute your own local classes with local custom logic, where appropriate. If the shared code is well architected, you should be able to do this without copying and pasting a lot of logic into the ‘local’ code, instead just writing what’s different, by either sub-classing or delegating to distro/shared code. (There are often better ways to do this without going crazy with the Factory pattern as Java tends to, although that’s one way.)
3. Have configuration parameters where you specify the names of your local classes that will be used instead of distro classes, where desired. (This can apply to views/templates, as well as controllers, and possibly models).
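Points 2 and 3 fit together; here is a minimal Ruby sketch of the combination (all class and key names are hypothetical illustrations). The local class sub-classes the distro class and writes only what’s different, and a config parameter names which class the app should actually instantiate:

```ruby
# Hypothetical distro class with behavior a site might want to change.
class DistroRecordFormatter
  def format(title)
    title.strip
  end
end

# A local subclass writes only what's different, delegating the
# shared logic back to the distro class via super.
class LocalRecordFormatter < DistroRecordFormatter
  def format(title)
    super.upcase  # invented local policy: upcase titles
  end
end

# Config names the class to use; distro code never hard-codes
# the local class name.
config = { "record_formatter_class" => "LocalRecordFormatter" }
formatter = Object.const_get(config["record_formatter_class"]).new
```

Because the distro code only ever looks up the configured name, a site that is happy with the default simply leaves `"DistroRecordFormatter"` in its config and never touches distro files.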
4. Have all localized code and configuration be in standard locations in their own _separate_ directories, to make separate source control feasible. Ideally, localized code and shared code are each in separate single directory trees (rather than interspersed in a commingled directory tree), to support this. The standard locations for localized code should be included in the app’s search paths so it finds stuff there automatically (if this is appropriate to the programming environment).
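In Ruby terms, one way to get that “finds stuff there automatically” behavior is to put the local tree ahead of the distro tree on the load path, so a plain `require` picks up the localized copy when one exists. This sketch simulates the two trees with temp directories (the directory names and file are invented for illustration):

```ruby
require 'tmpdir'
require 'fileutils'

# Simulate two separate trees: distro code and localized overrides.
root = Dir.mktmpdir
distro_dir = File.join(root, "distro", "lib")
local_dir  = File.join(root, "local", "lib")
FileUtils.mkdir_p(distro_dir)
FileUtils.mkdir_p(local_dir)

# The same filename exists in both trees; the local one should win.
File.write(File.join(distro_dir, "banner.rb"), 'BANNER = "distro"')
File.write(File.join(local_dir,  "banner.rb"), 'BANNER = "local"')

# Local dir is unshifted last, so it sits first on the load path and
# require finds localized files before distro ones.
$LOAD_PATH.unshift(distro_dir)
$LOAD_PATH.unshift(local_dir)

require "banner"
```

Because the two trees never share a directory, each can live in its own source control repository, which is exactly what makes pulling distro updates painless.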
In Rails specifically, I think the Engines plug-in is of value in implementing this kind of architecture.
Naomi identifies a few principles that I would group under ‘release management’.
1. The trunk is the latest *proven working* version of the code.
2. A “release” is a well-tested (beyond “proven working”) version warranting a “freeze”.
3. A “beta” is a candidate for release: it is “proven working” and we want in situ tests before we consider it a release.
4. All unstable development occurs in a branch, not the trunk.
In general, this is pretty consistent with the approach outlined by Karl Fogel in his excellent chapter on release management in his book on open source development, which is available for free on the web (thanks Karl).
I’m not sure about the trunk including only “proven working” code, though. I think it’s a good idea that the trunk doesn’t contain any code that raises exceptions or prevents the app from running, but I think it’s probably okay for it to have new features (or old features being modified) that aren’t necessarily completely “working” yet. It’s important that what checks out of svn compiles and runs, so other developers can work on it, but I’m not sure it needs to be “proven working”–that’s what the distinction between releases and trunk is for.
But anyway, this general kind of release management is very important for any project with more than one (or maybe two) institutions collaborating on it or using it.
Automated testing may not be specifically related to the collaborative aspect we’re talking about, but it’s important in general, and Naomi makes a good point that it may be even more important for distributed collaborative development, so you can have a higher standard of trust for collaborators’ code.
I, for one, am not keen on three-steps-forward-two-steps-backward development we risk without tests. We, as a community, need to practice test driven development from now on. If it’s worth testing once, it’s worth adding it to the test suite to ensure it doesn’t break in the future.
This approach requires excellent automated test suites and continuous integration. (Hmmm … test driven development …)
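As a minimal sketch of what “if it’s worth testing once, it’s worth adding it to the test suite” can look like, Ruby’s bundled Minitest is enough to start (the `normalize_issn` helper here is an invented example, not real Umlaut or Xerxes code):

```ruby
require 'minitest/autorun'

# Hypothetical helper: normalize an ISSN to the canonical NNNN-NNNN form.
def normalize_issn(raw)
  digits = raw.to_s.upcase.gsub(/[^0-9X]/, "")
  return nil unless digits.length == 8
  "#{digits[0, 4]}-#{digits[4, 4]}"
end

# Once this behavior has been checked by hand, it goes in the suite,
# so a future refactor can't silently break it.
class NormalizeIssnTest < Minitest::Test
  def test_accepts_spaces_and_existing_hyphen
    assert_equal "0028-0836", normalize_issn("0028 0836")
    assert_equal "1533-290X", normalize_issn("1533-290x")
  end

  def test_rejects_wrong_length
    assert_nil normalize_issn("12345")
  end
end
```

A suite like this is also what makes continuous integration meaningful: there has to be something to run before each commit.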
“Continuous integration”, as I understand it, means running your automated test suite before every commit to the repository, and not committing code that breaks tests. So this is what Naomi was getting at with her #1 principle for “trunk contains a proven working copy”. Again, I’m not sure I agree with taking “proven working” too strongly, but I do agree that we should have tests, and we should not commit code that breaks tests.
It should be noted that doing these things is indeed work. It will take more time to do these things right, than not. But it will ultimately pay off in sustainable shared collaborative open source. It is these things that ultimately will keep your “total cost of ownership” for an open source project reasonable and appropriate. It’s these things that will ultimately separate successful open source, which thrives and grows and attracts collaborators, from unsuccessful open source, which becomes unsustainable, withers, and dies.