The importance of Unicode support in Node.js for dev team environments (A tale of woe)

We've recently been using the angular-gettext module to provide translation functionality on Angular sites. It works quite nicely, giving a simple <translate> directive which can be used in templates to mark strings for translation.

It comes with a companion Grunt package called grunt-angular-gettext, which takes care of scanning your template and JS files for translatable strings and compiling these into a gettext-compatible template.pot file.
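For context, an extraction setup looks roughly like the sketch below. The nggettext_extract task name comes from the grunt-angular-gettext package; the file paths, the "extract" alias and the configureGettext function name are assumptions for illustration, not our actual configuration.

```javascript
// Gruntfile.js (sketch): the paths below are hypothetical.
function configureGettext(grunt) {
  grunt.initConfig({
    nggettext_extract: {
      pot: {
        files: {
          // output .pot file : source files to scan for translatable strings
          'po/template.pot': ['app/views/**/*.html', 'app/scripts/**/*.js']
        }
      }
    }
  });

  // Load the plugin that provides the nggettext_extract task.
  grunt.loadNpmTasks('grunt-angular-gettext');

  // Hypothetical alias so the extraction can be run as "grunt extract".
  grunt.registerTask('extract', ['nggettext_extract']);
}

module.exports = configureGettext;
```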

This all works quite nicely, except that we recently discovered the template.pot file differs depending on which developer's machine or environment generated it. We have seen differences on the same machine running different versions of Node.js (v0.10 to v0.12), on machines with different operating systems (Node.js 0.12 on Mac OS X and on Debian Linux), and most recently on machines running the same OS and Node.js version (Mac OS X, Node v0.12). The Linux version was installed from the prebuilt x64 binary on the Node website, and the Mac versions were installed via Homebrew with brew install node.

Since this file is committed to version control, this introduces a lot of annoying diffs in pull requests and the resulting merge conflicts. It also makes reviewing any actual translation changes impossible, because they are buried among a large number of unrelated diffs. You could argue that since this file is auto-generated, it shouldn't be versioned, but having it in the repository helps to make sure that the translation ".po" files are also up to date.

On further investigation, I traced the source of the issue to a line in the lib/extract.js file of the angular-gettext-tools package. This line sorts the array of translation keys found by the extract process using the JavaScript String.prototype.localeCompare() function. It appears that localeCompare behaves differently in different environments, which changes the sort order of the keys.
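To make the failure mode concrete, here is a minimal sketch. It is not the actual code from lib/extract.js; the keys and the compareCodeUnits helper are made up for illustration. It contrasts a localeCompare sort, whose result depends on the runtime, with a plain code-unit comparison, which should give the same order everywhere.

```javascript
// Hypothetical translation keys; not taken from the real project.
var keys = ['Zebra', 'apple', 'Éclair', 'banana', 'Apple'];

// Locale-aware ordering: the result depends on how the runtime
// implements localeCompare, so it may differ between environments.
var localeSorted = keys.slice().sort(function (a, b) {
  return a.localeCompare(b);
});

// Locale-independent ordering: compares UTF-16 code units using the
// built-in string operators, which behave the same in every environment.
function compareCodeUnits(a, b) {
  if (a < b) { return -1; }
  if (a > b) { return 1; }
  return 0;
}

var stableSorted = keys.slice().sort(compareCodeUnits);

console.log('localeCompare:', localeSorted);
console.log('code units:   ', stableSorted);
```

The code-unit order is not what a human would call alphabetical (all uppercase ASCII letters sort before lowercase ones, and accented letters come last), but for a generated file the only thing that matters is that the order is reproducible.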

According to the info I can find on how localeCompare works, it seems that it is entirely 'implementation dependent'. Basically this means a browser has a choice about how it implements it - however one would assume that node would be consistent in its implementation since it uses the same V8 JS engine everywhere.

Some initial research showed this is a problem others are running into. There's some useful reading in the following links if you have the time to pick through:

The last link implies there was an update in Node 0.12 which changed this behaviour, explaining why we see differences between Node versions. There is very little documentation on how "localeCompare" works, but as far as I can tell, Node.js delegates some of this functionality to the underlying OS, which is why the same version on different operating systems can produce different results. Changing the Unix locale settings doesn't seem to make a difference, and reading those GitHub and Google Groups discussions, it looks like V8 ignores the locale settings. So V8 defers some localisation functionality to the underlying system, but we can't influence the system's choice of locale.
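One way to test whether the locale settings have any effect is to print them next to what the runtime itself reports. The sketch below is only a diagnostic aid: describeLocaleSetup is a hypothetical helper name, and it assumes Intl.Collator is available whenever the Intl global exists.

```javascript
// Report the locale-related environment variables and, where the Intl
// API exists, the locale the runtime resolves for collation.
function describeLocaleSetup(env) {
  var report = {
    LANG: env.LANG || null,
    LC_ALL: env.LC_ALL || null,
    LC_COLLATE: env.LC_COLLATE || null,
    hasIntl: typeof Intl === 'object',
    resolvedCollatorLocale: null
  };

  // Only query the collator when the runtime was built with Intl support.
  if (report.hasIntl && typeof Intl.Collator === 'function') {
    report.resolvedCollatorLocale = new Intl.Collator().resolvedOptions().locale;
  }

  return report;
}

console.log(describeLocaleSetup(process.env));
```

Running it twice with different settings, for example once normally and once with LANG set to another locale, shows whether the resolved locale follows the environment on a given machine.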

The additional parameters to localeCompare (see MDN docs) provide some hint as to what may be happening here. The way the comparison is done can be changed with these extra parameters, and MDN provides a quick function to check for compatibility:


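// From MDN: passing an invalid locale tag ('i') throws a RangeError only
// when the runtime supports the locales argument.
function localeCompareSupportsLocales() {
  try {
    'a'.localeCompare('b', 'i');
  } catch (e) {
    return e.name === 'RangeError';
  }
  // No exception: the extra argument was ignored, so locales are unsupported.
  return false;
}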

Running this in the Node console shows that the prebuilt Linux binary has this support and the brewed Mac OS X version does not. Reinstalling the Mac version using "brew install node --with-icu4c" (ICU stands for International Components for Unicode) rectifies this, and it then sorts in the same way as the Linux version. Even though the brew default is to build without Unicode support, building with it seems like the best solution, since that is what the official Linux binaries ship with. I've not yet determined whether adding this support makes Node react to system locales. If it does, it could again introduce complications when one development machine is configured with a different locale setting. In the end, probably the best solution is to use a consistent dev environment when working in a team, such as a Vagrant box or Docker container.
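Until everyone is on the same environment, a small guard at the start of the build can at least flag a Node install without ICU before it regenerates template.pot. This is a sketch under assumptions: checkIntlSupport is a hypothetical helper, and it relies on process.versions.icu being present when Node is built with ICU, which I believe is the case but have not verified on every version.

```javascript
// Decide whether this runtime looks like it was built with ICU.
// "versions" is expected to be process.versions; "hasIntlGlobal" says
// whether the Intl global object exists.
function checkIntlSupport(versions, hasIntlGlobal) {
  var icuVersion = versions.icu || null;
  return {
    icuVersion: icuVersion,
    hasIntlGlobal: hasIntlGlobal,
    ok: Boolean(icuVersion) && hasIntlGlobal
  };
}

var result = checkIntlSupport(process.versions, typeof Intl === 'object');

if (!result.ok) {
  // Warn rather than fail, so the check can be adopted gradually.
  console.warn('This Node build appears to lack ICU support; ' +
    'template.pot may be sorted differently than on other machines.');
}
```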

If anyone has had similar issues or has other suggestions for solutions (environment variables, system config, etc.), I'd love to hear about them in the comments below.