Ruby 2.2 finally introduces a #unicode_normalize method on strings. It defaults to :nfc, but you can also normalize to the other Unicode normalization forms: :nfd, :nfkc, and :nfkd.
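As a quick illustration, here is a minimal sketch of what the different forms do, assuming Ruby 2.2 or later and UTF-8 strings. The variable names are just for the example.

```ruby
# "é" written two ways: precomposed (U+00E9), and "e" plus a combining acute (U+0301).
composed   = "\u00E9"
decomposed = "e\u0301"

composed == decomposed                     # false: same glyph, different code points

nfc = decomposed.unicode_normalize         # no argument means :nfc
nfd = composed.unicode_normalize(:nfd)

nfc == composed                            # true: composed into a single code point
nfd == decomposed                          # true: split into base letter + combining mark

# The compatibility forms (:nfkc, :nfkd) additionally fold characters such as
# the "fi" ligature (U+FB01) into their plain equivalents.
ligature = "\uFB01"
ligature.unicode_normalize(:nfkc) == "fi"  # true
ligature.unicode_normalize(:nfc) == "fi"   # false: NFC leaves the ligature alone
```

The canonical forms (:nfc, :nfd) only change how the same character is encoded, while the compatibility forms can actually change which characters are in the string, so :nfkc and :nfkd are better suited to things like search keys than to text you intend to display.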
Unicode normalization is something you often have to do when dealing with unicode, whether you knew it or not. Prior to ruby 2.2, you had to install a third-party gem to do this, adding another gem dependency. Of the gems available, some monkey-patched String in ways I wouldn’t have preferred, some worked only on MRI and not jruby, some had unpleasant performance characteristics, etc. Here are some benchmarks I ran a while ago on the gems available for unicode normalization and their performance, although since I did those benchmarks new options have appeared and performance characteristics have changed. But now we don’t need to deal with any of that: just use the stdlib.
One thing I can’t explain is that the only ruby stdlib documentation I can find on this suggests the method should be called just `normalize`. But nope, it’s actually `unicode_normalize`. Okay. Can anyone explain what’s going on here?
`unicode_normalized?` (not just `normalized?`) is also available, also taking a normalization form argument.
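A small sketch of the predicate in use, again assuming Ruby 2.2 or later. `needs_normalizing?` is a hypothetical helper made up for this example, not part of the stdlib.

```ruby
decomposed = "e\u0301"   # "e" followed by a combining acute accent

decomposed.unicode_normalized?         # false: the default form is :nfc
decomposed.unicode_normalized?(:nfd)   # true: already fully decomposed

# Hypothetical helper: report whether a string still needs normalizing
# to the given form (defaults to :nfc, like the stdlib methods).
def needs_normalizing?(str, form = :nfc)
  !str.unicode_normalized?(form)
end

needs_normalizing?(decomposed)                     # true
needs_normalizing?(decomposed.unicode_normalize)   # false
```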
The next major release of Rails, Rails 5, is planned to require ruby 2.2, and I think a lot of other open source will follow that lead. I’m considering switching some of my projects over to require ruby 2.2 as well, to take advantage of new stdlib like this, although I’d probably wait until JRuby 9k comes out, which is planned to support the 2.2 stdlib and other changes. Hopefully soon.
In the meantime, I might write some code that uses #unicode_normalize when it’s present, and otherwise monkey-patches in a #unicode_normalize method implemented with some other gem, although that still requires making the other gem a dependency. I’ll admit I have some projects that really should be unicode normalizing in some places, but where I could barely get away without it, and I skipped it because I didn’t want to deal with the dependency. Or I could require MRI 2.2 or the latest jruby, and just monkey-patch in a simple pure-java #unicode_normalize if running on JRuby and not `String.instance_methods.include? :unicode_normalize`.
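That last option might look something like the following. This is a rough sketch, not tested on an actual pre-9k JRuby: it assumes JRuby's Java integration exposes `java.text.Normalizer` as `Java::JavaText::Normalizer`, and the form mapping and error handling are my own guesses at what a minimal version needs. On any ruby that already has the stdlib method, the whole block is a no-op.

```ruby
# Only patch when the stdlib method is missing (i.e. a pre-2.2 stdlib).
# String.method_defined? is a shorter way of asking
# String.instance_methods.include?(:unicode_normalize).
unless String.method_defined?(:unicode_normalize)
  if RUBY_PLATFORM == "java"
    require "java"

    class String
      # Assumed fallback: delegate to the JDK's java.text.Normalizer.
      def unicode_normalize(form = :nfc)
        forms = Java::JavaText::Normalizer::Form
        java_form =
          case form
          when :nfc  then forms::NFC
          when :nfd  then forms::NFD
          when :nfkc then forms::NFKC
          when :nfkd then forms::NFKD
          else raise ArgumentError, "unknown normalization form: #{form.inspect}"
          end
        # JRuby converts the returned java.lang.String back into a Ruby String.
        Java::JavaText::Normalizer.normalize(self, java_form)
      end
    end
  else
    # On MRI older than 2.2 there is no pure-java escape hatch.
    raise NotImplementedError, "String#unicode_normalize needs ruby 2.2+ or JRuby"
  end
end
```

Because the guard checks for the method rather than the ruby version, the patch should simply stop applying once JRuby ships the 2.2 stdlib, and the calling code doesn't need to change.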