Frank DENIS random thoughts.

Gettext and accents in po templates

.po templates (.pot files) are supposed to contain English messages. Yup, you are supposed to write applications in English and then translate them into various languages. A nice side effect is that English uses only ASCII characters, so translators run into no charset issues when reading these texts.

But sometimes, you already have an application in another language, and localization is something you want to do afterwards.

So you can end up with non-ASCII characters in the msgid strings of the initial po template. msgfmt and msgconv may warn you that English would be a better choice, but it seems to work anyway.
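To see how many msgid strings are affected before going further, something like the following could help. It is only a rough sketch: find_non_ascii_msgids is a made-up helper name, and it only looks at the line that starts with msgid, so continuation lines of multi-line msgids are not checked.

```shell
# List msgid lines that contain bytes outside ASCII.
# find_non_ascii_msgids is a hypothetical helper name, not part of gettext.
find_non_ascii_msgids() {
    # LC_ALL=C makes grep work byte by byte, so any byte >= 0x80
    # falls outside [:print:] and [:space:] and gets reported.
    LC_ALL=C grep -n '^msgid.*[^[:print:][:space:]]' "$1"
}
```

Called as find_non_ascii_msgids messages.pot (the file name is just an example), it prints each offending line with its line number.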

Unfortunately, on some Linux distributions, gettext will fail to find the right message if there’s an accent or any other non-ASCII character in the text you are looking up. It works perfectly on OpenBSD, on OSX and on Gentoo, but it fails miserably on Debian-stable. It looks like on Debian-stable, the gettext tools and the glibc functions disagree on the way hashes are computed.

The fix is to override the glibc functions with the gettext functions. Before starting PHP, Apache or whatever, preload the preloadable_libintl.so library:


env LD_PRELOAD=/usr/lib/preloadable_libintl.so php ....
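If the same startup script has to run on machines where the library may be missing or installed elsewhere, a small wrapper can make the preload conditional. This is a minimal sketch, not a tested recipe: preload_libintl and the LIBINTL_PRELOAD variable are names invented here, and the default path is simply the one used above.

```shell
# Run a command with preloadable_libintl.so preloaded when the library exists.
# preload_libintl and LIBINTL_PRELOAD are hypothetical names for this sketch.
preload_libintl() {
    lib="${LIBINTL_PRELOAD:-/usr/lib/preloadable_libintl.so}"
    if [ -r "$lib" ]; then
        # Library found: start the command with it preloaded.
        env LD_PRELOAD="$lib" "$@"
    else
        # Library missing: warn on stderr and run the command unchanged.
        echo "warning: $lib not found, running without preload" >&2
        "$@"
    fi
}
```

Usage would look like preload_libintl php script.php, where script.php is a placeholder.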

Another way is to create the catalog without hashes:


msgfmt --no-hash ...

But lookups get very slow if you have a large catalog. Preloading libintl is probably the best way, as it brings back the same behavior as most other environments.
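For a small catalog where the --no-hash route is good enough, the full invocation could be wrapped like this. It is a sketch under assumptions: compile_nohash and the MSGFMT override are names made up for the example, and the output file is assumed to sit next to the .po file with a .mo extension.

```shell
# Compile a .po catalog into a .mo file that has no hash table.
# compile_nohash and the MSGFMT override are hypothetical names for this sketch.
compile_nohash() {
    po="$1"
    mo="${po%.po}.mo"    # fr.po -> fr.mo
    # --no-hash and -o are regular msgfmt options.
    "${MSGFMT:-msgfmt}" --no-hash -o "$mo" "$po"
}
```

For instance, compile_nohash fr.po would write fr.mo in the same directory.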

If you are using gettext() with PHP, I’d strongly recommend avoiding eAccelerator. It seems to mix locales badly: sometimes you get the message you asked for, sometimes you get no translation. If you are experiencing odd things with PHP and gettext(), disable your accelerator, just to check whether it could be the culprit. On the other hand, APC and XCache don’t seem to have any issue with gettext.
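To check which accelerator is loaded, the module list printed by php -m can be filtered. The helper below is a sketch with an invented name (list_accelerators), and it assumes the modules show up under the names eAccelerator, apc and XCache. Keep in mind that the command-line php may read a different php.ini than the one used by the web server, so the result is only a hint.

```shell
# Print known accelerator modules found in a PHP module list read from stdin.
# list_accelerators is a hypothetical name; module names are assumptions.
list_accelerators() {
    # Match whole lines, ignoring case; fall back to a message if none match.
    grep -i -E '^(eaccelerator|apc|xcache)$' || echo "no known accelerator loaded"
}
```

It is meant to be used as php -m | list_accelerators.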