Resolving relative URLs to base in ruby

You ever have to resolve relative URLs in ruby?  I did.

It’s not clear if the stdlib URI can do this — I think not, universally.

But the awesome addressable gem can, and in a tested RFC 3986 way no less!  It wasn’t immediately obvious to me which addressable API was the right one to use, but it’s simply join, also aliased as +.

base = Addressable::URI.parse("")
base + "foo.html"
# => #<Addressable::URI:0x3ff9964aabe4 URI:>

base = Addressable::URI.parse("")
base + "relative_file.xml"
# => #<Addressable::URI:0x3ff99648bc80 URI:>

base = Addressable::URI.parse("")
base + "//newhost/somewhere.jpg"
# => #<Addressable::URI:0x3ff9960c9ebc URI:https://newhost/somewhere.jpg>

base = Addressable::URI.parse("")
base + "../up-one-level.html"
# => #<Addressable::URI:0x3fe13ec5e928 URI:>

It looks like the route_to and route_from methods can be used to go the other way, make a URL into a relative URL, relative to some base. But I haven’t played with them.
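For comparison, here’s a minimal sketch of the same kinds of resolution using the stdlib’s `URI.join`. The example.org base URL and the `resolve_with_stdlib` helper are made up for illustration. I’ve only checked simple cases like these, not the full set of RFC 3986 edge cases that addressable tests for.

```ruby
require "uri"

# Hypothetical base URL, used only for illustration.
EXAMPLE_BASE = "https://example.org/some/dir/page.html"

# Hypothetical helper: resolve a relative reference against a base
# using the stdlib's URI.join, returning the result as a String.
def resolve_with_stdlib(base, relative)
  URI.join(base, relative).to_s
end

# A bare filename replaces the last path segment of the base.
puts resolve_with_stdlib(EXAMPLE_BASE, "foo.html")
# https://example.org/some/dir/foo.html

# A scheme-relative reference keeps only the scheme of the base.
puts resolve_with_stdlib(EXAMPLE_BASE, "//newhost/somewhere.jpg")
# https://newhost/somewhere.jpg

# "../" climbs one directory before appending the filename.
puts resolve_with_stdlib(EXAMPLE_BASE, "../up-one-level.html")
# https://example.org/some/up-one-level.html
```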


Rails5 (and earlier?) ActiveRecord::Enum supports strings in db

I’ve often wanted to use ActiveRecord::Enum for the features it provides like query methods and value restrictions — but wanted to map to strings in the database, rather than integers. Whatever hypothetical performance or storage advantage you get from using integers, I’d rather trade it for more easily human-readable database values, and avoiding the Enum ‘gotcha’ where db values depend on the order the enumerated values are listed in your model in the basic Enum usage case.

So I took a look at the source to see how hard it would be to add, and it looked like it wouldn’t be hard at all; it looked kind of like it should work already!  So okay, let’s add some tests and see if there are any problems. Find the test file in master, and hey, look at that: it already sets up a test model with string values in master, and tests ’em.

The tests were added in this commit from (rails committer?) sgrif, which is in Rails 5.0.0.beta2 and on. The commit message suggests it’s fixing a bug, not adding a new feature, and indeed only adds 7 new lines to enum.rb source.

It says it ‘fixes’ a (not directly merged) Pull Request #23190, which itself says it fixes Issue #23187, with all of this code reverting each other and other commits. These all have pretty terse comments, but seem to be about db-defined default values in Enums, rather than String vs Integer. But if they were only meant to solve that problem, why did @sgrif add tests for string values here?

So maybe Enum has supported String mappings for quite a while? But maybe buggily and without tests? Or was it a feature added in Rails 5, but buggy until 5.0.0.beta2? I actually tried it out in Rails 4.2.7, and it seemed to work in SQLite3. Including with db default values. But maybe with edge case bugs in at least some db adapters, that are fixed in Rails 5?

It’s not in the docs at all.

In my opinion, committing new code without updating docs to match is just as much a sin as committing new code without tests. I’m not sure if everyone (or Rails core) shares my opinion.  But it’s not like this was an out of the way doc reference that someone wouldn’t have noticed, it’s the module header docs themselves which are missing a reference to non-integer values!  But maybe the committer considered this a bugfix rather than a new feature so didn’t think they needed to consider doc updates. Hard to say. The mysteries of Rails!

I commented in github asking, cause I’m curious. 

In any event, it seems safe to use Enum with String values in Rails 5 — there are test cases and everything. And likely will even work in Rails 4.2.7, although there are no test cases in Rails 4.2.x and there may be edge case bugs I didn’t run into? The docs should really be updated. If you’re looking to have your name in the commit logs for Rails, maybe you want to update the docs? Or maybe I’ll get around to it.

update Sep 7

So, interestingly, this also seems to work just fine with a database column that’s been defined as a Postgres Enumerated Type. The Rails PostgreSQL Guide says: “Currently there is no special support for enumerated types. They are mapped as normal text columns.”

However, in fact, since ActiveRecord maps Postgres enumerated types as string/text, so long as ActiveRecord::Enum supports string/text values as it seems to, it looks like you can use ActiveRecord::Enum as a cover for it on the Rails side, getting restricted values (on the Rails side), query methods, automatic scopes, etc.  Which is pretty nice. I tested this on Rails 4.2.7 and 5.0 and was unable to find any problems.

However, the bad news is that the Rails team has said in response to a doc PR: “This is left out of the documentation because while it happens to work, this isn’t something that we want to explicitly support, and reserve the right to break in the future.”

So that is what it is. This is a weird situation, because it doesn’t make sense to write a third-party plugin for Rails to add this feature — it wouldn’t include any code at all, because the feature already works just fine. (I suppose I could copy and paste the Enum implementation into the plugin and give it different names? That seems silly though). It’s there, it works fine — but may go away in the future, so use at your own risk.

I gotta say I’m mystified by the Rails decision-making approach these days. They throw in a variety of officially supported features that nobody but dhh understands any use cases for (thread_mattr_accessor anyone?), but a feature that lots of people have asked for, that is actually already present and working with test cases? That one they say they don’t want to explicitly support. (And it’s not like something being doc’d and ‘supported’ usually stops Rails from removing it in the next version anyway!)  Go figure.  I assume if dhh ever ran into the cases we have where he wanted this, it would magically become ‘explicitly supported’.

Hmm, maybe I will just copy the enum.rb implementation into my own gem, I dunno.





Mythical Man-Months et al

I’ve never actually read Fred Brooks’ Mythical Man-Month, but have picked up many of its ideas by cultural osmosis.  I think I’m not alone: it’s a book that’s very popular by reputation, but perhaps not actually very influential in terms of its ideas being internalized by project managers and architects.

Or as Brooks himself said:

Some people have called the book the “bible of software engineering.” I would agree with that in one respect: that is, everybody quotes it, some people read it, and a few people go by it.

Ha. I should really get around to reading it, I routinely run into things that remind me of the ideas I understand from it that I’ve just sort of absorbed (perhaps inaccurately).

In the meantime, here’s another good quote from Brooks to stew upon:

The ratio of function to conceptual complexity is the ultimate test of system design.

Quite profound really. Software packages that are terribly frustrating to work with can, I think, almost always be described in those terms: the ratio of function to conceptual complexity is far, far too low.  That is nearly(?) the definition of a frustrating-to-work-with software package.


bittorrent for sharing enormous research datasets says:

We’ve designed a distributed system for sharing enormous datasets – for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.

There are data sets listed from researchers at several respected universities, including the University of Michigan and Stanford.


technical debt/technical weight

Bart Wronski writes a blog post about “technical weight”, a concept related to but distinct from “technical debt.”  I can associate some of what he’s talking about to some library-centered open source projects I’ve worked on.

Technical debt… or technical weight?

…What most post don’t cover is that recently huge amount of technical debt in many codebases comes from shifting to naïve implementations of agile methodologies like Scrum, working sprint to sprint. It’s very hard to do any proper architectural work in such environment and short time and POs usually don’t care about it (it’s not a feature visible to customer / upper management)…


…I think of it as a property of every single technical decision you make – from huge architectural decisions through models of medium-sized systems to finally way you write every single line of code. Technical weight is a property that makes your code, systems, decisions in general more “complex”, difficult to debug, difficult to understand, difficult to change, difficult to change active developer.…


…To put it all together – if we invested lots of thought, work and effort into something and want to believe it’s good, we will ignore all problems, pretend they don’t exist and decline to admit (often blaming others and random circumstances) and will tend to see benefits. The more investment you have and heavier is the solution – the more you will try to stay with it, making other decisions or changes very difficult even if it would be the best option for your project.…






UC Berkeley Data Science intro to programming textbook online for free

Looks like a good resource for library/information professionals who don’t know how to program, but want to learn a little bit of programming along with (more importantly) computational and inferential thinking, to understand the technological world we work in. As well as those who want to learn ‘data science’!

Data are descriptions of the world around us, collected through observation and stored on computers. Computers enable us to infer properties of the world from these descriptions. Data science is the discipline of drawing conclusions from data using computation. There are three core aspects of effective data analysis: exploration, prediction, and inference. This text develops a consistent approach to all three, introducing statistical ideas and fundamental ideas in computer science concurrently. We focus on a minimal set of core techniques that they apply to a vast range of real-world applications. A foundation in data science requires not only understanding statistical and computational techniques, but also recognizing how they apply to real scenarios.

For whatever aspect of the world we wish to study—whether it’s the Earth’s weather, the world’s markets, political polls, or the human mind—data we collect typically offer an incomplete description of the subject at hand. The central challenge of data science is to make reliable conclusions using this partial information.

In this endeavor, we will combine two essential tools: computation and randomization. For example, we may want to understand climate change trends using temperature observations. Computers will allow us to use all available information to draw conclusions. Rather than focusing only on the average temperature of a region, we will consider the whole range of temperatures together to construct a more nuanced analysis. Randomness will allow us to consider the many different ways in which incomplete information might be completed. Rather than assuming that temperatures vary in a particular way, we will learn to use randomness as a way to imagine many possible scenarios that are all consistent with the data we observe.

Applying this approach requires learning to program a computer, and so this text interleaves a complete introduction to programming that assumes no prior knowledge. Readers with programming experience will find that we cover several topics in computation that do not appear in a typical introductory computer science curriculum. Data science also requires careful reasoning about quantities, but this text does not assume any background in mathematics or statistics beyond basic algebra. You will find very few equations in this text. Instead, techniques are described to readers in the same language in which they are described to the computers that execute them—a programming language.


How to see if current version of a gem is greater than X

I sometimes need to do this, and always forget how. I want to see the currently loaded version of a gem, and see if it’s greater than a certain version X.

Mainly because I’ve monkey-patched that gem, and want to either automatically stop monkey patching it if a future version is installed, or more likely output a warning message “Hey, you probably don’t need to monkey patch this anymore.”

I usually forget the right rubygems API, so I’m leaving this partially as a note to myself.

Here’s how you do it.

# If some_gem_name is at 2.0 or higher, warn that this patch may
# not be needed. Here's a URL to the PR we're back-porting: <URL>
if Gem.loaded_specs["some_gem_name"].version >= Gem::Version.new('2.0')
  msg = "\n\nPlease check and make sure this patch is still needed " \
        "at #{__FILE__}:#{__LINE__}\n\n"
  $stderr.puts msg
  Rails.logger.warn msg
end

Whenever I do this, I always include the URL to the github PR that implements the fix we’re monkey-patch back-porting, in a comment right next to this check.

The `$stderr.puts` is there to make sure the warning shows up in the console when running tests.
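One thing to watch: `Gem.loaded_specs` is a hash keyed by gem name, so it returns nil for a gem that isn’t loaded, and calling `.version` on nil raises NoMethodError. Here’s a minimal sketch of a guard. `loaded_gem_version` is a hypothetical helper name of my own, not a rubygems API.

```ruby
require "rubygems"

# Hypothetical helper (not part of rubygems): returns the
# Gem::Version of a loaded gem, or nil if that gem is not loaded.
def loaded_gem_version(name)
  spec = Gem.loaded_specs[name]
  spec && spec.version
end

# For a gem that is not loaded we get nil back instead of an error,
# so the version check can be skipped safely.
version = loaded_gem_version("some_gem_name")
if version && version >= Gem::Version.new("2.0")
  $stderr.puts "Please check and make sure this patch is still needed"
end
```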

Unfortunately:

Gem::Version.new("1.4.0.rc1") >= Gem::Version.new("1.4")
# => false

I really want the warning to trigger if I’m using a pre-release too. Hmm.

Aha! Perusing the docs, this seems like it’ll work:

if Gem.loaded_specs["some_gem_name"].version.release >= Gem::Version.new('2.0')

`Gem::Version#release` trims off the prerelease tags.
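To make the difference concrete, here’s a small sketch comparing a prerelease version with and without `release`. `patch_obsolete?` is a hypothetical helper name I made up for this example, and the version strings are just samples.

```ruby
require "rubygems"

# Hypothetical helper (not part of rubygems): true when the given
# version, ignoring any prerelease suffix, is at or past `fixed_in`.
def patch_obsolete?(loaded_version, fixed_in)
  Gem::Version.new(loaded_version).release >= Gem::Version.new(fixed_in)
end

rc = Gem::Version.new("1.4.0.rc1")

puts rc.prerelease?                          # true
puts rc >= Gem::Version.new("1.4")           # false, prereleases sort before the release
puts rc.release                              # 1.4.0
puts rc.release >= Gem::Version.new("1.4")   # true

puts patch_obsolete?("1.4.0.rc1", "1.4")     # true
puts patch_obsolete?("1.3.9", "1.4")         # false
```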


Handy introspection for debugging Rails routes

I always forget how to do this, so I’m leaving this here partly as a note to myself. From Zobie’s Blog and Mike Blyth’s Stack Overflow answer.


 routes = Rails.application.routes
 # figure out what route a path maps to:
 routes.recognize_path "/station/index/42.html"
 #  => {:controller=>"station", :action=>"index", :format=>"html", :id=>"42"}
 # or get an ActionController::RoutingError

 # figure out what url is generated for params, what url corresponds
 # to certain controller/action/parameters...
 routes.generate :controller => :station, :action=> :index, :id=>42

If you have an isolated Rails engine mounted, its paths seem to not be accessible from the
`Rails.application.routes` router. You may need to try that specific engine’s router, like `Spree::Core::Engine.routes`.

It seems to me there’s got to be a way to get the actual ‘master’ router that’s actually used
for recognizing incoming urls, since there’s got to be one that sends to the mounted engine
routes as appropriate based on paths. But I haven’t figured out how to do that.


GREAT presentation on open source development

I highly recommend Schneems’ presentation on “Saving Sprockets”, which he has also turned into a written narrative. Not so much for what it says about Sprockets, but for what it says about open source development.

I won’t say I agree with 100% of it, but probably 85%+, and some of the stuff I agree with is really important and useful. Schneems analyzes what’s going on very well and figures out how to say it very well.

Some of my favorite points:

“To them, I ask: what are the problems? Do you know what they are? Because we can’t fix what we can’t define, and if we want to attempt a re-write, then a re-write would assume that we know better. We still have the same need to do things with assets, so we don’t really know better.”

A long term maintainer is really important, coders aren’t just inter-changeable widgets:

“While I’m working on Sprockets, there’s so many times that I say “this is absolutely batshit insane. This makes no sense. I’m going to rip this all out. I’m going to completely redo all of this.” And then, six hours later, I say “wow, that was genius,” and I didn’t have the right context for looking at the code. Maintainers are really historians, and these maintainers, they help bring context. We try to focus on good commit messages and good pull requests. Changelog entries. Please keep a changelog, btw. But none of that compares to having someone who’s actually there. A story is worth 1000 commit messages. For example, you can’t exactly ask a commit message a question, like, “hey, did you consider trying to uh…” and the commit message is like, “uh, I’m a commit message.” It doesn’t store the context about the conversations around that”

“These are all different people with very different needs who need different documentation. Don’t make them hunt down the documentation that they need. When I started working on Sprockets, somebody would ask, “is this expected?” and I would say honestly, “I don’t know, you tell me. Was it happening before?” And through doing that research, I put together some guides, and eventually we could definitively say what was expected behavior. The only way that I could make those guides make sense is if I split them out, and so, we have a guide for “building an asset processing framework”, if you’re building the next Rails asset pipeline, or “end user asset generation”, if you are a Rails user, or “extending Sprockets” if you want to make one of those plugins. It’s all right there, it’s kind of right at your fingertips, and you only need to look at the documentation that fits your use case, when you need it.

We made it easier for developers to find what they need. Also, it was a super useful exercise for me as well. One thing I love about these guides is that they live in the source and not in a wiki, because documentation is really only valid for one point in time.”

I also really like the concept that figuring out how to support or fix someone else’s code (which is really all ‘legacy’ means) is an exercise in a sort of code archeology.  I’ve been doing that a lot lately, along with figuring out how to use someone else’s code that isn’t documented sufficiently.  It’s sort of fun sometimes, but better to have better docs.


Really slow rspec suite? Use the fuubar formatter!

I am working on a ‘legacy’-ish app that unfortunately has a pretty slow test suite (10 minutes+).

I am working on some major upgrades to some dependencies, that require running the full test suite or a major portion of it iteratively lots of times. I’m starting with a bunch of broken tests, and whittling them down.

It was painful. I was getting really frustrated with the built-in rspec formatters. I’d see an ‘f’ in the output, but wouldn’t know which test had failed until the whole suite finished. Or I could control-c or run with --fail-fast to see the first failed tests as they happened, but that interrupts the suite, so I’d never see the later failures.

Then I found the fuubar rspec formatter.  Perfect!

  • A progress bar makes the suite seem faster psychologically even though it isn’t. There’s reasons a progress bar is considered good UI for a long-running task!
  • Outputs failed specs as they happen, but keeps running the whole suite. For a long-running suite, this lets me start investigating a failure as it happens without having to wait for the suite to finish, while still letting the suite finish to see the total picture of how I’m doing and what other sorts of failures I’m getting.

I recommend fuubar, it’s especially helpful for slow suites. I had been wanting something like this for a couple months, and wondering why it wasn’t a built-in formatter in rspec — just ran across it now in a reddit thread (started by someone else considering writing such a formatter who didn’t know fuubar already existed!).  So I write this blog post to hopefully increase exposure!
