Django i18n tricks

Sun 19 September 2010

In Belgium, we have three national languages: French, Dutch and German. Consequently, when you create a web application for Belgian people, you HAVE to think about internationalization, that is, preparing your application for translation, and localization, that is, actually translating all the text and adapting date formats and the like for a given language. By the way, for those who are not in the know, as I was one year ago, internationalization is often abbreviated to i18n, because there are 18 letters between the i and the n, and localization is abbreviated to l10n for the same reason.

So for ZoFa, which is a Django project, I endured this lengthy process. There are not so many resources on the subject, so I thought I could share a few tricks.

My first piece of advice would be: if you can avoid i18n in your project, do so. It adds a large amount of work and will make you less agile in the following steps, since every change on your site will need to be translated and checked in every available language (two available languages = twice as many typos). That said, if you have to do it, take it into account from the beginning, since going through every one of your project files to mark strings for translation is not really the definition of fun. Finally, delay l10n until the very end, since every change in your templates will force you to redo some previous translations.

My second piece of advice would be to read the Django i18n doc thoroughly; as usual, it is very well written. You will learn that Django is bundled with a set of tools that, once you have marked strings for translation in your code and templates, generate the translation files. Behind the scenes, Django uses the battle-tested GNU gettext tools. You will get translation files full of snippets like this:

#: templates/shifts/home.html:91
msgid "Still no events ..."
msgstr ""

All you still have to do, basically, is to fill in the blanks after msgstr in the file with the translation of your choice. By the way, this is really a big improvement over the tools that I have been using in Java, where you have to write the whole translation files by hand. Notice also that each string is annotated with the place it comes from, which lets you understand the context of the text, and the context can completely change the needed translation.
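To see how many blanks are left to fill in, the entries whose msgstr is still empty can be listed with a few lines of Python. This is only a minimal sketch: it assumes single-line msgid/msgstr pairs like the snippet above (no multi-line strings, no plural forms), and the function name untranslated_entries and the sample content are made up for the illustration.

```python
import re

# Matches single-line entries such as: msgid "Still no events ..."
ENTRY_RE = re.compile(r'^(msgid|msgstr) "(.*)"\s*$')

SAMPLE_PO = '''\
#: templates/shifts/home.html:91
msgid "Still no events ..."
msgstr ""

#: templates/shifts/home.html:12
msgid "Login:"
msgstr "Connexion :"
'''


def untranslated_entries(po_text):
    """Return (location, msgid) pairs whose msgstr is still empty.

    Simplified on purpose: single-line entries only, no plural forms.
    """
    results = []
    location, msgid = None, None
    for line in po_text.splitlines():
        if line.startswith("#:"):
            # Reference comment: where the string comes from.
            location = line[2:].strip()
            continue
        match = ENTRY_RE.match(line)
        if not match:
            continue
        keyword, value = match.groups()
        if keyword == "msgid":
            msgid = value
        else:
            # An empty msgid is the file header, so it is skipped.
            if msgid and not value:
                results.append((location, msgid))
            location, msgid = None, None
    return results


for location, msgid in untranslated_entries(SAMPLE_PO):
    print(f"{location}: {msgid}")
```

For real files, a dedicated library or the gettext command line tools are a safer choice than this hand-rolled parsing.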

Those files are named django.po. My third piece of advice would be to avoid editing those files with your usual text editor. It is possible, but it is far too easy to introduce syntax errors that will render the file useless, especially because the error messages from the gettext tools are not super user-friendly. There are specialized tools for editing .po files, like poedit, and they really help to streamline the process (they make it possible to hand the translation over to non-techies). That said, one big pitfall is that a translated string has to start and end with the same line breaks as the original string. It is not very natural to take this constraint into account while translating, especially for line breaks located at the end of the string.
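The constraint on line breaks can be checked before the gettext tools complain. The sketch below compares the leading and trailing newlines of an original string and its translation; newline_mismatch is a hypothetical helper name, not something provided by gettext or Django, and the French strings are only examples.

```python
def newline_mismatch(msgid, msgstr):
    """Describe how the line breaks at the edges of msgstr differ from msgid.

    Returns a list of problems; an empty list means the pair looks fine.
    An empty msgstr (not translated yet) is never reported.
    """
    problems = []
    if not msgstr:
        return problems
    if msgid.startswith("\n") != msgstr.startswith("\n"):
        problems.append("leading newline differs")
    if msgid.endswith("\n") != msgstr.endswith("\n"):
        problems.append("trailing newline differs")
    return problems


# The translator forgot the final line break in the first pair.
print(newline_mismatch("Still no events ...\n", "Toujours pas d'événements ..."))
print(newline_mismatch("Still no events ...\n", "Toujours pas d'événements ...\n"))
```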

One nagging question for me while translating is at which level I should split my text for translation: the word, the sentence or the paragraph? The word level is only meaningful for small texts like 'Signup:', 'Login:' and so forth (quick note: include the colons in your strings, since the spacing required before and after a colon is language-dependent). Usually, the right level is the sentence, but I often put a whole paragraph into one translation key, for example in my FAQ, because I do not want the same word to be translated differently in two parts of the paragraph just because it was split in the translation file. Another complication is that you do not want too many HTML tags to appear in your translations. Ideally, you would never have HTML tags in your translation file, since the translation process could introduce bugs into your code, but for text like the following, splitting the text to avoid enclosing the tags would not make sense:

View Saved Schedules and <strong>Publish</strong> the schedules of your services
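For strings like this one that have to carry tags, a quick check that the translation contains the same tags as the original catches the most common breakage. A minimal sketch, assuming the tags must be copied verbatim (attributes included); same_tags and the French sentences are illustrative only.

```python
import re

# Opening or closing HTML tags, e.g. <strong> or </strong>
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def same_tags(msgid, msgstr):
    """Return True when both strings contain the same HTML tags.

    Tags are compared as sorted lists, so a translation may reorder them
    but may not drop, add or alter one.
    """
    return sorted(TAG_RE.findall(msgid)) == sorted(TAG_RE.findall(msgstr))


original = "View Saved Schedules and <strong>Publish</strong> the schedules of your services"
good = "Consultez les horaires enregistrés et <strong>publiez</strong> les horaires de vos services"
# The closing tag lost its final '>' during translation.
bad = "Consultez les horaires enregistrés et <strong>publiez</strong les horaires de vos services"

print(same_tags(original, good))
print(same_tags(original, bad))
```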

I found that looking at existing Django translation files, which you can easily find on Transifex, is an interesting way to see how other people cope with these problems. Here is, for example, a link to the translation file of Bitbucket.

One obvious problem with translation is that the same text in two different languages does not have the same length (there is an interesting question about this on stackexchange). This can break the layout of your pages in no time, so my next piece of advice is to review every page after translation.

Furthermore, since it does not make much sense to have a screenshot in English on a page written in French, it is useful to provide one image file per language. You can use the i18n machinery to localize the paths to static files as well, by marking the paths as strings to be translated.
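The idea of marking a path as a translatable string can be sketched with Python's standard gettext module, the same machinery Django relies on. In a real project the marker would come from django.utils.translation and the catalog from the django.po files; the DictTranslations class, the catalog content and the file names below are assumptions made so that the example runs on its own.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Tiny in-memory catalog standing in for a compiled translation file."""

    def __init__(self, catalog):
        super().__init__()
        self.catalog = catalog

    def gettext(self, message):
        # Fall back to the original string when no translation exists.
        return self.catalog.get(message, message)


french = DictTranslations({
    "img/screenshots/schedule_en.png": "img/screenshots/schedule_fr.png",
})
english = gettext.NullTranslations()


def screenshot_path(translations):
    _ = translations.gettext
    # The path is marked like any other string, so each language
    # can point to its own image file.
    return _("img/screenshots/schedule_en.png")


print(screenshot_path(english))
print(screenshot_path(french))
```

The translator then simply "translates" the path in the translation file, and a language without a dedicated screenshot falls back to the original image.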

More specifically to Django, let me add a quick note about inserting a URL into translated text using the url template tag. I long thought it was not possible, but I just got the answer to my question on the #django IRC channel (which is, by the way, a very good resource when you need information; just do not forget to contribute back by answering a few questions if you can). Back to the url question: you can do it this way:

{% url name_of_the_view param1,param2 as the_url %}{% blocktrans %}{{ the_url }}{% endblocktrans %}

Which, by the way, showed me a way to create a local variable in the template using the as keyword, a feature I do not remember reading about in the doc, and for which I am unable to find any documentation right now. Too bad, it is an interesting feature.

Finally, be aware that there are lots of i18n problems that I did not talk about in this blog post, mainly because I have not yet encountered them or solved them for my own project:

  • i18n and search engine optimization: it would be better if every translated version of your site were visible to search engines. The classical solution seems to be the insertion of a language component in your paths: /en/news/ or /fr/news/ for example. This kind of mechanism is not baked into Django, but there are some pluggable apps that could help you, like django-localeurl
  • l10n of content stored in your database (e.g. the slugs...): this problem seems complicated, but there are also some pluggable apps that could help, like django-multilingual or django-transmeta
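The language component in the paths mentioned in the first point boils down to splitting a prefix off the requested path. Here is a minimal sketch in plain Python; it is not how django-localeurl is implemented, and split_language_prefix, SUPPORTED_LANGUAGES and DEFAULT_LANGUAGE are names invented for the example.

```python
SUPPORTED_LANGUAGES = ("en", "fr", "nl", "de")
DEFAULT_LANGUAGE = "en"


def split_language_prefix(path):
    """Split '/fr/news/' into ('fr', '/news/').

    Paths without a known language prefix get the default language
    and are returned unchanged.
    """
    parts = path.lstrip("/").split("/", 1)
    if parts[0] in SUPPORTED_LANGUAGES:
        rest = "/" + (parts[1] if len(parts) > 1 else "")
        return parts[0], rest
    return DEFAULT_LANGUAGE, path


print(split_language_prefix("/fr/news/"))
print(split_language_prefix("/news/"))
```

In a Django project, this kind of logic would typically live in a middleware that activates the language before the URL is resolved.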

Thanks to Benoît Bryon for sending me his slides from Djangocong on the subject.

P.S.: One day, I would like to create a tool that parses all the files in a Django project to detect the strings that have NOT been marked for translation (using beautifulsoup for the HTML parsing?). If somebody wants to pick up this idea before me, do not hesitate...
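To give an idea of what such a tool could look like, here is a rough sketch for templates only. It uses the html.parser module from the standard library instead of beautifulsoup so that it runs without any dependency; it ignores Python files and HTML attributes such as title or alt, and the names TextCollector and unmarked_strings are invented for the example.

```python
import re
from html.parser import HTMLParser

# Translated blocks first, then every remaining template tag,
# variable or comment (including {% trans "..." %}).
BLOCKTRANS_RE = re.compile(
    r"\{%\s*blocktrans\b.*?%\}.*?\{%\s*endblocktrans\s*%\}", re.DOTALL
)
TEMPLATE_SYNTAX_RE = re.compile(r"\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\}", re.DOTALL)


class TextCollector(HTMLParser):
    """Collect visible text nodes, ignoring <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.texts.append(text)


def unmarked_strings(template_source):
    """Return the text nodes left once all template syntax is removed."""
    source = BLOCKTRANS_RE.sub("", template_source)
    source = TEMPLATE_SYNTAX_RE.sub("", source)
    collector = TextCollector()
    collector.feed(source)
    collector.close()
    # Keep only texts containing at least one letter.
    return [t for t in collector.texts if any(c.isalpha() for c in t)]


SAMPLE = """
<h1>{% trans "Welcome" %}</h1>
<p>Still no events ...</p>
<p>{% blocktrans %}Hello {{ name }}{% endblocktrans %}</p>
<li>{{ event.title }}</li>
"""

print(unmarked_strings(SAMPLE))
```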
