Maybe beware of microservices

In a comment on my long, um, diatribe last year about linked data, Eric Hellman suggested “I fear the library world has no way to create a shared technology roadmap that can steer it away from dead ends that at one time were the new shiny,” and I responded “I think there’s something to what you suggest at the end, the slow-moving speed of the library community with regard to technology may mean we’re stuck responding with what seemed to be exciting future trends…. 10+ years ago, regardless of how they’ve worked out since. Perhaps if that slow speed were taken into account, it would mean we should stick to well established mature technologies, not “new shiny” things which we lack the agility to respond to appropriately.”

I was reminded of this recently when running across a blog post about “Microservices”, which I also think were very hyped 5-10 years ago, but lately are approached with a lot more caution in the general software engineering industry, as a result of hard-earned lessons from practice.

Sean Kelly, in Microservices? Please, Don’t, does write about some of the potential advantages of microservices, but, as you’d expect from the title, mainly focuses on pitfalls engineers have learned through working with microservice architectures. He warns:

When Should You Use Microservices?

“When you’re ready as an engineering organization.”

I’d like to close by going over when it could be the right time to pivot to this approach (or, if you’re starting out, how to know if this is the right way to start).

The single most important step on the path to a solid, workable approach to microservices is simply understanding the domain you’re working in. If you can’t understand it, or if you’re still trying to figure it out, microservices could do more harm than good. However, if you have a deep understanding, then you know where the boundaries are and what the dependencies are, so a microservices approach could be the right move.

Another important thing to have a handle on is your workflows – specifically, how they might relate to the idea of a distributed transaction. If you know the paths each category of request will make through your system and you understand where, how, and why each of those paths might fail, then you could start to build out a distributed model of handling your requests.

Alongside understanding your workflows is monitoring your workflows. Monitoring is a subject greater than just “Microservice VS Monolith,” but it should be something at the core of your engineering efforts. You may need a lot of data at your fingertips about various parts of your systems to understand why one of them is underperforming, or even throwing errors. If you have a solid approach for monitoring the various pieces of your system, you can begin to understand your systems behaviors as you increase its footprint horizontally.

Finally, when you can actually demonstrate value to your engineering organization and the business, then moving to microservices will help you grow, scale, and make money. Although it’s fun to build things and try new ideas out, at the end of the day the most important thing for many companies is their bottom line. If you have to delay putting out a new feature that will make the company revenue because a blog post told you monoliths were “doing it wrong,” you’re going to need to justify that to the business. Sometimes these tradeoffs are worth it. Sometimes they aren’t. Knowing how to pick your battles and spend time on the right technical debt will earn you a lot of credit in the long run.

Now, I think many library and library industry development teams actually are pretty okay at understanding the domain and workflows. With the important caveat that ours tend to end up so complex (needlessly or not) that they can be very difficult to understand, and often change — which is a pretty big caveat, given Kelly’s warning.

But monitoring?  In library/library industry projects?  Years (maybe literally a decade) behind the software industry at large.  Which I think is actually just a pointer to a general lack of engineering capabilities (whether skill or resource based) in libraries (especially) and the library industry (including vendors, to some extent).

Microservices are a complicated architecture. They are something to do not only when there’s a clear benefit you’re going to get from them, but when you have an engineering organization that has the engineering experience, skill, resources, and coordination to pull off sophisticated software engineering feats.

How many library engineering organizations do you think meet that?  How many library engineering organizations can even be called ‘engineering organizations’?

Beware when people tell you microservices are the new thing or “the answer.” In the industry at large, people and organizations have been burned by biting off more than they can chew in a microservice-based architecture, even starting with more sophisticated engineering organizations than most libraries or many library sector vendors have.


“Internet Archive Successfully Fends Off Secret FBI Order”

https://theintercept.com/2016/12/01/internet-archive-fends-off-secret-fbi-order-in-latest-victory-against-nsls/

A DECADE AGO, the FBI sent Brewster Kahle, founder of the Internet Archive, a now-infamous type of subpoena known as a National Security Letter, demanding the name, address and activity record of a registered Internet Archive user. The letter came with an everlasting gag order, barring Kahle from discussing the order with anyone but his attorney — not even his wife could know.

But Kahle did eventually talk about it, calling the order “horrendous,” after challenging its constitutionality in a joint legal effort with the Electronic Frontier Foundation and the American Civil Liberties Union. As a result of their fight, the FBI folded, rescinding the NSL and unsealing associated court records rather than risk a ruling that their surveillance orders were illegal. “This is an unqualified success that will help other recipients understand that you can push back on these,” Kahle told reporters once the gag order was lifted.

The bureau continued to issue tens of thousands of NSLs in subsequent years, but few recipients followed in Kahle’s footsteps. Those who did achieved limited but important transparency gains; as a result of one challenge, a California District Court ruled in 2013 that the everlasting gag orders accompanying NSLs are unconstitutional, and last year Congress passed a law forcing the FBI to commit to periodically reviewing such orders and rescinding them when a gag is no longer necessary to a case.

Now, Kahle and the archive are notching another victory, one that underlines the progress their original fight helped set in motion. The archive, a nonprofit online library, has disclosed that it received another NSL in August, its first since the one it received and fought in 2007. Once again it pushed back, but this time events unfolded differently: The archive was able to challenge the NSL and gag order directly in a letter to the FBI, rather than through a secretive lawsuit. In November, the bureau again backed down and, without a protracted battle, has now allowed the archive to publish the NSL in redacted form.…


“Harvesting Government History, One Web Page at a Time”

http://www.nytimes.com/2016/12/01/nyregion/harvesting-government-history-one-web-page-at-a-time.html

With the arrival of any new president, vast troves of information on government websites are at risk of vanishing within days. The fragility of digital federal records, reports and research is astounding.

No law protects much of it, no automated machine records it for history, and the National Archives and Records Administration announced in 2008 that it would not take on the job.

“Large portions of dot-gov have no mandate to be taken care of,” said Mark Phillips, a library dean at the University of North Texas, referring to government websites. “Nobody is really responsible for doing this.”

Enter the End of Term Presidential Harvest 2016 — a volunteer, collaborative effort by a small group of university, government and nonprofit libraries to find and save valuable pages now on federal websites. The project began before the 2008 elections, when George W. Bush was serving his second term, and returned in 2012.

It recorded, for example, the home page of the United States Central Command on Sept. 16, 2008, and the State Department’s official blog on February 13, 2013. The pages are archived on servers operated by the project, and are available to anyone.

The ritual has taken on greater urgency this year, Mr. Phillips said, out of concern that certain pages may be more vulnerable than usual because they contain scientific data for which Mr. Trump and some of his allies have expressed hostility or contempt.


Three articles on information ethics and power

Today I happened to come across three very good articles which to me all seemed to form a theme: Ethical and political considerations of information and information technology.

First, Weaponized data: How the obsession with data has been hurting marginalized communities

Consider contexts and who is driving the data: The problem of people not from the communities affected making decisions for those who are is very prevalent in our field, and the work around data is no exception. Who created the data? Was the right mix of people involved? Who interpreted the data? The rallying cry among marginalized communities is “Stop talking about us without us,” and this applies to data collection and interpretation.

I think there are deeper things to be said about ‘weaponized data’ too, things that have been rattling around in my brain for a while; this essay is a useful contribution to the mix.

For more on measurement and data as a form of power and social control, and not an ‘objective’ or ‘neutral’ thing at all, see James C. Scott’s Seeing Like a State, and the works of Michel Foucault.

Second, from Business Insider, Programmers are having a huge discussion about the unethical and illegal things they’ve been asked to do by Julie Bort.

I’m not sure I buy the conclusion that “what developers really need is an organization that governs and regulates their profession like other industries have” — professional licensure for developers, so you can’t pay someone to write a program unless they’re licensed? I don’t think that’s going to work, and it’s kind of the opposite of the democratization of making software, which I think is actually important.

But requiring pretty much any IT program anywhere to include 3 credits of ethics would be a good start, and is something academic credentialing organizations can easily do.

“We rule the world,” he said. “We don’t know it yet. Other people believe they rule the world but they write down the rules and they hand them to us. And then we write the rules that go into the machines that execute everything that happens.”

I don’t think that means we “rule the world.” It means we’re tools. But increasingly important and powerful ones. Be careful whose rule you are complicit with.

Thirdly and lastly but not leastly, a presentation by Tara Robertson, Not all information wants to be free. (Thanks for the link, Sean Hannan, via Facebook.)

I can’t really find a pull quote to summarize this one, but it’s a really incredible lecture you should go and read. Several case studies in how ‘freeing information’ can cause harm, to privacy, safety, cultural autonomy, and dignity.

This is not a topic I’ve spent a lot of time thinking about, and Robertson provides a very good entry to it.

The original phrase “information wants to be free” was not, of course, meant to say that people wanted information to be free. Quite the opposite: it was that many people, especially people in positions of power, did not want information to be free — but it is very difficult to keep information under wraps; it tends toward being free anyway.

But yes, especially people in positions of power — the hacker assumption was that the digital-era acceleration of information’s tendency toward unrestricted distribution would be a net gain for freedom and popular power. Sort of the “wikileaks thesis,” eh? I think the past 20 years have definitely dashed the hacker-hippy techno-utopianism of Stewart Brand and Mondo 2000 in a dystopian world of state panopticon, corporate data mining (see the first essay on data as a form of power, eh?), information-overload distraction, and information-bubble ignorance.

Information may want to be free, but the powerful aren’t the only ones that are harmed when it becomes so.

Still, while it perhaps makes sense for a librarian’s conference closing lecture, I can’t fully get behind Robertson’s conclusion:

I’d like to ask you to listen to the voices of the people in communities whose materials are in the collections that we care for. I’d also like to invite you to speak up where and when you can. As a profession we need to travel the last mile to build relationships with communities and listen to what they think is appropriate access, and then build systems that respect that.

Yes, and no. Communities’ ideas of “appropriate access” can be stifling and repressive too, as the geeks and queers and weirdos who grew up to be hackers and librarians know well. Just because “freeing” information can do and has done real harm to the vulnerable, it doesn’t mean the more familiar story of censorship as a form of political control by the powerful isn’t also often true.

In the end, all three of the essays I encountered today, capped off by Robertson’s powerful one, remind us that information is power — and, like all power, its formation, expression, and use are never neutral; it has real consequences, for good and ill, intended and unintended. Those who work with information need to think seriously about their ethical responsibilities with regard to the power they wield.


Rubyland: A new ruby news and blog feed aggregator

So I thought there should be a site aggregating ruby RSS/Atom feeds. As far as I’m aware, there hasn’t been a well-maintained one for a couple of years now.

So, in my spare time and on my own, I made one that works the way I wanted: http://www.rubyland.news.

The source is open on github.

I’ve got a few more features planned still.

It’s running on a free heroku dyno with a free postgres. That works out — the CPU needs of an RSS aggregator are not very high — but it does limit things in some ways, such as no SSL/https. If any organization is interested in sponsoring rubyland with a modest contribution to pay for hosting costs and make more things possible, get in touch.

Most people seem to approach feed aggregators with a tool that produces static HTML. I decided to make a dynamic site instead, to make certain things possible or easier, and to use the tools I knew. But since the content is of course mostly static, there’s a lot of caching going on: Rails fragment caching over the entire page, as well as etags delivered to browsers.
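For the curious, here’s a minimal sketch of what those two layers can look like in a Rails app — the controller and model names are hypothetical illustrations, not the actual rubyland code:

# HTTP-level caching: set an ETag so browsers (and proxies) can
# re-validate with a cheap 304 Not Modified instead of a full download.
class EntriesController < ApplicationController
  def index
    @entries = Entry.order(published_at: :desc).limit(40)
    fresh_when etag: @entries
  end
end

# View-level fragment caching, wrapping the whole page body:
#
#   <% cache @entries do %>
#     ... rendered list of entries ...
#   <% end %>

The fragment cache saves re-rendering the page on every request; the etag saves re-sending it to browsers that already have it.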

Some other interesting features of the code include: flexbox for responsive display with zero media queries, which was fun (although I think I’ll have to add a media query for a UI element I’m going to add soon); the reddit API for a live comment count from /r/ruby; and feedjira, providing a great assist in dealing with feed idiosyncrasies.
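Feedjira, for instance, makes fetching and parsing a feed a couple of lines. A hedged sketch using the Feedjira::Feed API of this era (the feed URL here is just an example):

require "feedjira"

# Fetch and parse in one step; feedjira normalizes various feed
# formats and idiosyncrasies into common accessors.
feed = Feedjira::Feed.fetch_and_parse("http://weblog.rubyonrails.org/feed/atom.xml")

feed.entries.each do |entry|
  puts "#{entry.published}: #{entry.title} (#{entry.url})"
end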

But beyond the code (which was fun to write), I’m hoping the Rubyland aggregator can be a valuable resource for rubyists, and can help (re-)strengthen the ruby online community, which is in a bit of a weird state these days.


flexbox is so nice there’s really no need for ‘grid’ frameworks anymore

That’s really all I got to say. I guess I should start tweeting or something.


Resolving relative URLs to base in ruby

You ever have to resolve relative URLs in ruby?  I did.

It’s not clear if the stdlib URI can do this — I think not, universally.

But the awesome addressable gem can — in a tested, RFC 3986-compliant way, no less! It wasn’t immediately obvious to me what the correct addressable API was, but it’s simply join, also aliased as +.

require "addressable/uri"

base = Addressable::URI.parse("http://example.com")
base + "foo.html"
# => #<Addressable::URI:0x3ff9964aabe4 URI:http://example.com/foo.html>

base = Addressable::URI.parse("http://example.com/path/to/file.html")
base + "relative_file.xml"
# => #<Addressable::URI:0x3ff99648bc80 URI:http://example.com/path/to/relative_file.xml>

base = Addressable::URI.parse("https://example.com/path")
base + "//newhost/somewhere.jpg"
# => #<Addressable::URI:0x3ff9960c9ebc URI:https://newhost/somewhere.jpg>

base = Addressable::URI.parse("http://example.com/path/subpath/file.html")
base + "../up-one-level.html"
# => #<Addressable::URI:0x3fe13ec5e928 URI:http://example.com/path/up-one-level.html>

It looks like the route_to and route_from methods can be used to go the other way: to turn a URL into a relative URL, relative to some base. But I haven’t played with them.
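Going by the gem’s documented semantics (a hedged sketch — as I said, I haven’t actually exercised these), usage would look something like:

require "addressable/uri"

base   = Addressable::URI.parse("http://example.com/path/to/file.html")
target = Addressable::URI.parse("http://example.com/path/to/other.html")

# route_to: relative form of the argument, with the receiver as the base
base.route_to(target)    # => #<Addressable::URI ... URI:other.html>

# route_from: relative form of the receiver, with the argument as the base
target.route_from(base)  # => #<Addressable::URI ... URI:other.html>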

Posted in ruby | Tagged | Leave a comment

Rails5 (and earlier?) ActiveRecord::Enum supports strings in db

I’ve often wanted to use ActiveRecord::Enum for the features it provides, like query methods and value restrictions — but mapping to strings in the database, rather than integers. Whatever hypothetical performance or storage advantage integers have, I’d rather trade it for more easily human-readable database values, and for avoiding the Enum ‘gotcha’ where db values depend on the order the enumerated values are listed in your model, as in the basic Enum usage.
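Concretely, what I want is to hand Enum an explicit hash mapping to string values, stored in a plain :string column. A quick sketch with a hypothetical model:

class Ticket < ActiveRecord::Base
  # `status` is a :string column. The explicit hash maps each enum
  # name to the exact string stored in the db, so values don't depend
  # on declaration order (unlike the implicit integer mapping).
  enum status: { open: "open", in_progress: "in_progress", closed: "closed" }
end

ticket = Ticket.create!(status: "open")
ticket.open?        # => true
ticket.closed!      # persists the string "closed"
Ticket.in_progress  # scope: WHERE status = 'in_progress'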

So I took a look at the source to see how hard it would be to add — and it looked like it wouldn’t be hard at all; it looked kind of like it should work already! So okay, let’s add some tests and see if there are any problems. Find the test file in master — and hey, look at that, it already sets up a test model with string values in master, and tests them.

The tests were added in this commit from (rails committer?) sgrif, which is in Rails 5.0.0.beta2 and on. The commit message suggests it’s fixing a bug, not adding a new feature, and indeed it only adds 7 new lines to the enum.rb source.

It says it ‘fixes’ a (not directly merged) Pull Request #23190, which itself says it fixes Issue #23187, with all of this code reverting other commits and each other. The commits all have pretty terse comments, but seem to be about db-defined default values in Enums, rather than String vs Integer values. But if they were only meant to solve that problem, why did @sgrif add tests for string values here?

So maybe Enum has supported String mappings for quite a while? But maybe buggily and without tests? Or was it a feature added in Rails 5, but buggy until 5.0.0.beta2? I actually tried it out in Rails 4.2.7, and it seemed to work in SQLite3. Including with db default values. But maybe with edge case bugs in at least some db adapters, that are fixed in Rails 5?

It’s not in the docs at all.

In my opinion, committing new code without updating the docs to match is just as much a sin as committing new code without tests. I’m not sure if everyone (or Rails core) shares my opinion. But it’s not like this was an out-of-the-way doc reference that someone wouldn’t have noticed; it’s the module header docs themselves that are missing any reference to non-integer values! But maybe the committer considered this a bugfix rather than a new feature, and so didn’t think to consider doc updates. Hard to say. The mysteries of Rails!

I commented on github asking, ’cause I’m curious.

In any event, it seems safe to use Enum with String values in Rails 5 — there are test cases and everything. And it will likely even work in Rails 4.2.7, although there are no test cases in Rails 4.2.x, and there may be edge-case bugs I didn’t run into. The docs should really be updated. If you’re looking to have your name in the commit logs for Rails, maybe you want to update the docs? Or maybe I’ll get around to it.

update Sep 7

So, interestingly, this also seems to work just fine with a database column that’s been defined as a Postgres Enumerated Type. The Rails PostgreSQL Guide says: “Currently there is no special support for enumerated types. They are mapped as normal text columns.”

However, in fact, since ActiveRecord maps Postgres enumerated types to string/text, and since ActiveRecord::Enum supports string/text values as it seems to, you can use ActiveRecord::Enum as a cover for the Postgres enum on the Rails side, getting restricted values (on the Rails side), query methods, automatic scopes, etc. Which is pretty nice. I tested this on Rails 4.2.7 and 5.0 and was unable to find any problems.
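A hedged sketch of the combination (hypothetical table, column, and type names; Rails 5 migration syntax — drop the version suffix on 4.2):

class AddStatusToTickets < ActiveRecord::Migration[5.0]
  def up
    execute "CREATE TYPE ticket_status AS ENUM ('open', 'closed');"
    # The custom type name passes through to the column definition.
    add_column :tickets, :status, :ticket_status, default: "open"
  end

  def down
    remove_column :tickets, :status
    execute "DROP TYPE ticket_status;"
  end
end

class Ticket < ActiveRecord::Base
  # ActiveRecord reads the pg enum column as a string, so a
  # string-valued Enum layers right on top: the db and the app
  # each enforce the restricted value list.
  enum status: { open: "open", closed: "closed" }
end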

However, the bad news is that the Rails team has said, in response to a doc PR: “This is left out of the documentation because while it happens to work, this isn’t something that we want to explicitly support, and reserve the right to break in the future.”

So that is what it is. This is a weird situation, because it doesn’t make sense to write a third-party plugin for Rails to add this feature — it wouldn’t include any code at all, because the feature already works just fine. (I suppose I could copy and paste the Enum implementation into the plugin and give it different names? That seems silly though.) It’s there, it works fine — but may go away in the future, so use at your own risk.

I gotta say, I’m mystified by Rails’ decision-making approach these days. They throw in a variety of officially supported features that nobody but dhh understands any use cases for (thread_mattr_accessor, anyone?), but a feature that lots of people have asked for, that is actually already present and working, with test cases — they say they don’t want to explicitly support. (And it’s not like something being doc’d and ‘supported’ usually stops Rails from removing it in the next version anyway!) Go figure. I assume if dhh ever ran into the cases we have where he wanted this, it would magically become ‘explicitly supported.’

Hmm, maybe I will just copy the enum.rb implementation into my own gem, I dunno.



Mythical Man-Months et al

I’ve never actually read Fred Brooks’ Mythical Man-Month, but I have picked up many of its ideas by cultural osmosis. I think I’m not alone; it’s a book that’s very popular by reputation, but perhaps not actually very influential in terms of its ideas actually being internalized by project managers and architects.

Or as Brooks himself said:

Some people have called the book the “bible of software engineering.” I would agree with that in one respect: that is, everybody quotes it, some people read it, and a few people go by it.

Ha. I should really get around to reading it; I routinely run into things that remind me of the ideas I understand from it, ideas I’ve just sort of absorbed (perhaps inaccurately).

In the meantime, here’s another good quote from Brooks to stew upon:

The ratio of function to conceptual complexity is the ultimate test of system design.

Quite profound, really. Software packages that are terribly frustrating to work with can, I think, almost always be described in those terms: the ratio of function to conceptual complexity is far, far too low. That is nearly(?) the definition of a frustrating-to-work-with software package.


bittorrent for sharing enormous research datasets

academictorrents.com says:

We’ve designed a distributed system for sharing enormous datasets – for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.

There are data sets listed from researchers at several respected universities, including the University of Michigan and Stanford.
