“Internet Archive Successfully Fends Off Secret FBI Order”


A DECADE AGO, the FBI sent Brewster Kahle, founder of the Internet Archive, a now-infamous type of subpoena known as a National Security Letter, demanding the name, address and activity record of a registered Internet Archive user. The letter came with an everlasting gag order, barring Kahle from discussing the order with anyone but his attorney — not even his wife could know.

But Kahle did eventually talk about it, calling the order “horrendous,” after challenging its constitutionality in a joint legal effort with the Electronic Frontier Foundation and the American Civil Liberties Union. As a result of their fight, the FBI folded, rescinding the NSL and unsealing associated court records rather than risk a ruling that their surveillance orders were illegal. “This is an unqualified success that will help other recipients understand that you can push back on these,” Kahle told reporters once the gag order was lifted.

The bureau continued to issue tens of thousands of NSLs in subsequent years, but few recipients followed in Kahle’s footsteps. Those who did achieved limited but important transparency gains; as a result of one challenge, a California District Court ruled in 2013 that the everlasting gag orders accompanying NSLs are unconstitutional, and last year Congress passed a law forcing the FBI to commit to periodically reviewing such orders and rescinding them when a gag is no longer necessary to a case.

Now, Kahle and the archive are notching another victory, one that underlines the progress their original fight helped set in motion. The archive, a nonprofit online library, has disclosed that it received another NSL in August, its first since the one it received and fought in 2007. Once again it pushed back, but this time events unfolded differently: The archive was able to challenge the NSL and gag order directly in a letter to the FBI, rather than through a secretive lawsuit. In November, the bureau again backed down and, without a protracted battle, has now allowed the archive to publish the NSL in redacted form.…

Posted in General

“Harvesting Government History, One Web Page at a Time”


With the arrival of any new president, vast troves of information on government websites are at risk of vanishing within days. The fragility of digital federal records, reports and research is astounding.

No law protects much of it, no automated machine records it for history, and the National Archives and Records Administration announced in 2008 that it would not take on the job.

“Large portions of dot-gov have no mandate to be taken care of,” said Mark Phillips, a library dean at the University of North Texas, referring to government websites. “Nobody is really responsible for doing this.”

Enter the End of Term Presidential Harvest 2016 — a volunteer, collaborative effort by a small group of university, government and nonprofit libraries to find and save valuable pages now on federal websites. The project began before the 2008 elections, when George W. Bush was serving his second term, and returned in 2012.

It recorded, for example, the home page of the United States Central Command on Sept. 16, 2008, and the State Department’s official blog on Feb. 13, 2013. The pages are archived on servers operated by the project, and are available to anyone.

The ritual has taken on greater urgency this year, Mr. Phillips said, out of concern that certain pages may be more vulnerable than usual because they contain scientific data toward which Mr. Trump and some of his allies have expressed hostility or contempt.

Posted in General

Three articles on information ethics and power

Today I happened to come across three very good articles which, to me, all seemed to form a theme: ethical and political considerations of information and information technology.

First, Weaponized data: How the obsession with data has been hurting marginalized communities

Consider contexts and who is driving the data: The problem of people who are not from the affected communities making decisions for those who are is very prevalent in our field, and the work around data is no exception. Who created the data? Was the right mix of people involved? Who interpreted the data? The rallying cry among marginalized communities is “Stop talking about us without us,” and this applies to data collection and interpretation.

I think there are deeper things to be said about ‘weaponized data’ too, things that have been rattling around in my brain for a while; this essay is a useful contribution to the mix.

For more on measurement and data as a form of power and social control, and not an ‘objective’ or ‘neutral’ thing at all, see James C. Scott’s Seeing Like a State, and the works of Michel Foucault.

Second, from Business Insider, Programmers are having a huge discussion about the unethical and illegal things they’ve been asked to do by Julie Bort.

I’m not sure I buy the conclusion that “what developers really need is an organization that governs and regulates their profession like other industries have” — professional licensure for developers, where you can’t pay someone to write a program unless they’re licensed? I don’t think that’s going to work, and it’s kind of the opposite of the democratization of making software, which I think is actually important.

But requiring pretty much any IT program anywhere to include 3 credits of ethics would be a good start, and is something academic credentialing organizations can easily do.

“We rule the world,” he said. “We don’t know it yet. Other people believe they rule the world but they write down the rules and they hand them to us. And then we write the rules that go into the machines that execute everything that happens.”

I don’t think that means we “rule the world”. It means we’re tools, but increasingly important and powerful ones. Be careful whose rule you are complicit with.

Thirdly and lastly but not leastly, a presentation by Tara Robertson, Not all information wants to be free. (Thanks for the link, Sean Hannan, via facebook.)

I can’t really find a pull quote to summarize this one, but it’s a really incredible lecture you should go and read. Several case studies in how ‘freeing information’ can cause harm, to privacy, safety, cultural autonomy, and dignity.

This is not a topic I’ve spent a lot of time thinking about, and Robertson provides a very good entry to it.

The original phrase “information wants to be free” was not, of course, meant to say that people wanted information to be free. Quite the opposite: it was that many people, especially people in positions of power, did not want information to be free — but it is very difficult to keep information under wraps; it tends toward being free anyway.

But yes, especially people in positions of power — the hacker assumption was that the digital-era acceleration of information’s tendency toward unrestricted distribution would be a net gain for freedom and popular power. Sort of the “wikileaks thesis”, eh? I think the past 20 years have definitely dashed the hacker-hippy techno-utopianism of Stewart Brand and Mondo 2000 against a dystopian world of state panopticon, corporate data mining (see the first essay on data as a form of power, eh?), information-overload distraction and information-bubble ignorance.

Information may want to be free, but the powerful aren’t the only ones that are harmed when it becomes so.

Still, while it perhaps makes sense for a librarian’s conference closing lecture, I can’t fully get behind Robertson’s conclusion:

I’d like to ask you to listen to the voices of the people in communities whose materials are in the collections that we care for. I’d also like to invite you to speak up where and when you can. As a profession we need to travel the last mile to build relationships with communities and listen to what they think is appropriate access, and then build systems that respect that.

Yes, and no. Communities’ ideas of “appropriate access” can be stifling and repressive too, as the geeks and queers and weirdos who grew up to be hackers and librarians know well. Just because “freeing” information can do and has done real harm to the vulnerable, it doesn’t mean the more familiar story of censorship as a form of political control by the powerful isn’t also often true.

In the end, all three of these essays I encountered today, capped off by Robertson’s powerful essay, remind us that information is power, and, like all power, its formation and expression and use are never neutral; it has real consequences, for good and ill, intended and unintended. Those who work with information need to think seriously about their ethical responsibilities with regard to the power they wield.

Posted in General

Rubyland: A new ruby news and blog feed aggregator

So I thought there should be a site aggregating ruby rss/atom feeds. As far as I’m aware, there hasn’t been a well-maintained one for a couple of years now.

So in my spare time, on my own, I made one that works the way I wanted: http://www.rubyland.news.

The source is open at github.

I’ve got a few more features planned still.

It’s running on a free heroku dyno with a free postgres. This works out — the CPU needs of an RSS aggregator are not very high. But the free tier does limit things in some ways, such as no SSL/https. If any organization is interested in sponsoring rubyland with a modest contribution to pay for hosting costs and make more things possible, get in touch.

Most people seem to approach feed aggregators with a tool that produces static HTML. I decided to make a dynamic site instead, to make certain things possible or easier, and to use the tools I knew. But since the content is of course mostly static, there’s a lot of caching going on: Rails fragment caching over the entire page, as well as etags delivered to browsers.
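The etag half of that is easy to illustrate in plain ruby. This isn’t the Rubyland code, just a toy sketch of the conditional-GET mechanism it relies on (the `respond` helper and the page body are made up):

```ruby
require "digest"

# Toy illustration of HTTP conditional GET with etags: the server
# fingerprints the (mostly static) page body, and a client that sends
# the same fingerprint back gets an empty 304 instead of the full page.
def respond(page_body, if_none_match: nil)
  etag = %("#{Digest::MD5.hexdigest(page_body)}")
  if if_none_match == etag
    { status: 304, body: nil, etag: etag }       # client's cached copy is still good
  else
    { status: 200, body: page_body, etag: etag } # send the page plus its etag
  end
end

first = respond("<html>feed page</html>")
first[:status]  # => 200
repeat = respond("<html>feed page</html>", if_none_match: first[:etag])
repeat[:status] # => 304
```

The fragment-caching half lives in Rails views and needs a running app, so it isn’t sketched here.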

Some other interesting features of the code include: flexbox for responsive display with zero media queries, which was fun (although I think I’ll have to add a media query for a UI element I’m going to add soon); the reddit API for live comment counts on /r/ruby; and feedjira providing a great assist in dealing with feed idiosyncrasies.

But beyond the code (which was fun to write), I’m hoping the Rubyland aggregator can be a valuable resource for rubyists and help (re-)strengthen the ruby online community, which is in a bit of a weird state these days.

Posted in General

flexbox is so nice there’s really no need for ‘grid’ frameworks anymore

That’s really all I got to say. I guess I should start tweeting or something.

Posted in General

Resolving relative URLs to base in ruby

You ever have to resolve relative URLs in ruby?  I did.

It’s not clear if the stdlib URI can do this — I think not, universally.

But the awesome addressable gem can — in a tested, RFC 3986-compliant way, no less! Although it wasn’t immediately obvious to me what the correct addressable API to use was. But it’s simply join, also aliased as +.

base = Addressable::URI.parse("http://example.com")
base + "foo.html"
# => #<Addressable::URI:0x3ff9964aabe4 URI:http://example.com/foo.html>

base = Addressable::URI.parse("http://example.com/path/to/file.html")
base + "relative_file.xml"
# => #<Addressable::URI:0x3ff99648bc80 URI:http://example.com/path/to/relative_file.xml>

base = Addressable::URI.parse("https://example.com/path")
base + "//newhost/somewhere.jpg"
# => #<Addressable::URI:0x3ff9960c9ebc URI:https://newhost/somewhere.jpg>

base = Addressable::URI.parse("http://example.com/path/subpath/file.html")
base + "../up-one-level.html"
# => #<Addressable::URI:0x3fe13ec5e928 URI:http://example.com/path/up-one-level.html>

It looks like the route_to and route_from methods can be used to go the other way: make a URL into a relative URL, relative to some base. But I haven’t played with them.

Posted in ruby

Rails5 (and earlier?) ActiveRecord::Enum supports strings in db

I’ve often wanted to use ActiveRecord::Enum for the features it provides like query methods and value restrictions — but wanted to map to strings in the database, rather than integers. Whatever hypothetical performance or storage advantage you get from using integers, I’d rather trade it for more easily human-readable database values, and avoiding the Enum ‘gotcha’ where db values depend on the order the enumerated values are listed in your model in the basic Enum usage case.

So I took a look at the source to see how hard it would be to add — and it looked like it wouldn’t be hard at all; it looked like it should work already! So okay, let’s add some tests and see if there are any problems. Find the test file in master — and hey, look at that, it already sets up a test model with string values in master, and tests them.

The tests were added in this commit from (rails committer?) sgrif, which is in Rails 5.0.0.beta2 and on. The commit message suggests it’s fixing a bug, not adding a new feature, and indeed only adds 7 new lines to enum.rb source.

It says it ‘fixes’ a (not directly merged) Pull Request #23190, which itself says it fixes Issue #23187, with various commits reverting each other along the way. They all have pretty terse comments, but seem to be about db-defined default values in Enums, rather than String vs Integer. But if they were only meant to solve that problem, why did @sgrif add tests for string values here?

So maybe Enum has supported String mappings for quite a while? But maybe buggily and without tests? Or was it a feature added in Rails 5, but buggy until 5.0.0.beta2? I actually tried it out in Rails 4.2.7, and it seemed to work in SQLite3. Including with db default values. But maybe with edge case bugs in at least some db adapters, that are fixed in Rails 5?

It’s not in the docs at all.

In my opinion, committing new code without updating docs to match is just as much a sin as committing new code without tests. I’m not sure if everyone (or Rails core) shares my opinion. But it’s not like this was an out-of-the-way doc reference that someone wouldn’t have noticed; it’s the module header docs themselves that are missing any reference to non-integer values! But maybe the committer considered this a bugfix rather than a new feature, so didn’t think they needed to consider doc updates. Hard to say. The mysteries of Rails!

I commented on github asking, ’cause I’m curious.

In any event, it seems safe to use Enum with String values in Rails 5 — there are test cases and everything. And it will likely even work in Rails 4.2.7, although there are no test cases in Rails 4.2.x and there may be edge-case bugs I didn’t run into. The docs should really be updated. If you’re looking to have your name in the commit logs for Rails, maybe you want to update the docs? Or maybe I’ll get around to it.

update Sep 7

So, interestingly, this also seems to work just fine with a database column that’s been defined as a Postgres Enumerated Type. The Rails PostgreSQL Guide says: “Currently there is no special support for enumerated types. They are mapped as normal text columns.”

However, in fact, because ActiveRecord maps Postgres enumerated types as string/text, so long as ActiveRecord::Enum supports string values (as it seems to), you can use ActiveRecord::Enum as a cover for them on the Rails side, getting restricted values (on the Rails side), query methods, automatic scopes, etc. Which is pretty nice. I tested this on Rails 4.2.7 and 5.0 and was unable to find any problems.
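As a sketch, here’s a hypothetical migration pairing the two (the table, column, and type names are all made up, and remember Rails reserves the right to break this):

```ruby
# Hypothetical migration: a Postgres enumerated type backing a column
# that ActiveRecord will see as plain text.
class AddStatusToOrders < ActiveRecord::Migration[5.0]
  def up
    execute "CREATE TYPE order_status AS ENUM ('pending', 'shipped');"
    add_column :orders, :status, :order_status, default: "pending"
  end

  def down
    remove_column :orders, :status
    execute "DROP TYPE order_status;"
  end
end

# On the model side, since the column reads back as a string, a
# string-valued ActiveRecord::Enum can sit right on top of it:
class Order < ActiveRecord::Base
  enum status: { pending: "pending", shipped: "shipped" }
end
```

The db then rejects values outside the Postgres enum, while Rails gives you `pending?`, `shipped!`, and the automatic scopes.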

However, the bad news is that the Rails team has said, in response to a doc PR: “This is left out of the documentation because while it happens to work, this isn’t something that we want to explicitly support, and reserve the right to break in the future.”

So that is what it is. It’s a weird situation, because it doesn’t make sense to write a third-party plugin for Rails to add this feature — it wouldn’t include any code at all, because the feature already works just fine. (I suppose I could copy and paste the Enum implementation into the plugin and give it different names? That seems silly, though.) It’s there, it works fine — but it may go away in the future, so use at your own risk.

I gotta say I’m mystified by the Rails decision-making approach these days. They throw in a variety of officially supported features that nobody but dhh understands any use cases for (thread_mattr_accessor, anyone?), but a feature that lots of people have asked for, that is actually already present and working with test cases — that, they say they don’t want to explicitly support. (And it’s not like something being doc’d and ‘supported’ usually stops Rails from removing it in the next version anyway!) Go figure. I assume if dhh ever ran into the cases we have where he wanted this, it would magically become ‘explicitly supported’.

Hmm, maybe I will just copy the enum.rb implementation into my own gem, I dunno.




Posted in General

Mythical Man-Months et al

I’ve never actually read Fred Brooks’ Mythical Man-Month, but have picked up many of its ideas by cultural osmosis. I think I’m not alone; it’s a book that’s very popular by reputation, but perhaps not actually very influential in terms of its ideas actually being internalized by project managers and architects.

Or as Brooks himself said:

Some people have called the book the “bible of software engineering.” I would agree with that in one respect: that is, everybody quotes it, some people read it, and a few people go by it.

Ha. I should really get around to reading it; I routinely run into things that remind me of ideas from it that I’ve just sort of absorbed (perhaps inaccurately).

In the meantime, here’s another good quote from Brooks to stew upon:

The ratio of function to conceptual complexity is the ultimate test of system design.

Quite profound, really. Software packages that are terribly frustrating to work with can, I think, almost always be described in those terms: the ratio of function to conceptual complexity is far, far too low. That is nearly(?) the definition of a frustrating-to-work-with software package.

Posted in General

bittorrent for sharing enormous research datasets

academictorrents.com says:

We’ve designed a distributed system for sharing enormous datasets – for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.

There are data sets listed from researchers at several respected universities, including the University of Michigan and Stanford.

Posted in General

technical debt/technical weight

Bart Wronski writes a blog post about “technical weight”, a concept related to but distinct from “technical debt.” I can connect some of what he’s talking about to some library-centered open source projects I’ve worked on.

Technical debt… or technical weight?

…What most posts don’t cover is that recently a huge amount of technical debt in many codebases comes from shifting to naïve implementations of agile methodologies like Scrum, working sprint to sprint. It’s very hard to do any proper architectural work in such an environment and on such short timelines, and POs usually don’t care about it (it’s not a feature visible to the customer / upper management)…


…I think of it as a property of every single technical decision you make – from huge architectural decisions through models of medium-sized systems to finally way you write every single line of code. Technical weight is a property that makes your code, systems, decisions in general more “complex”, difficult to debug, difficult to understand, difficult to change, difficult to change active developer.…


…To put it all together – if we invested lots of thought, work and effort into something and want to believe it’s good, we will ignore all problems, pretend they don’t exist and decline to admit (often blaming others and random circumstances) and will tend to see benefits. The more investment you have and heavier is the solution – the more you will try to stay with it, making other decisions or changes very difficult even if it would be the best option for your project.…





Posted in General