Multilingual techniques in content management systems

Default language

Eionet sites must fall back to English if the request has no Accept-Language header, or if the header lists only languages that the website does not support. This is better than RFC 2616's recommendation to send the 406 (Not Acceptable) error code.
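
The fallback rule can be sketched as follows. This is only an illustration: the function name pick_language() and the list of supported languages are assumptions, not part of any Eionet code, and the parsing handles only the common "tag;q=value" form of the header.

```php
<?php
// Hypothetical helper: choose a site language from an Accept-Language value.
// $supported is an assumed list of language codes the website offers.
function pick_language(?string $header, array $supported, string $default = 'en'): string
{
    if ($header === null || trim($header) === '') {
        return $default;                      // no Accept-Language line at all
    }
    $candidates = [];
    foreach (explode(',', $header) as $position => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue;
        }
        $q = 1.0;                             // quality defaults to 1 when omitted
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncasecmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q <= 0) {
            continue;                         // q=0 means "not acceptable"
        }
        $candidates[] = [$q, $position, $tag];
    }
    // Highest quality first; ties keep the order used in the header.
    usort($candidates, function ($a, $b) {
        return $b[0] <=> $a[0] ?: $a[1] <=> $b[1];
    });
    foreach ($candidates as [$q, $position, $tag]) {
        $primary = explode('-', $tag)[0];     // "de-at" -> "de"
        if (in_array($tag, $supported, true)) {
            return $tag;
        }
        if (in_array($primary, $supported, true)) {
            return $primary;
        }
    }
    return $default;                          // only unsupported languages listed
}
```

With an assumed site offering English, German and Danish, pick_language('fr, es;q=0.5', ['en', 'de', 'da']) returns 'en' instead of producing an error page, and pick_language(null, ['en', 'de', 'da']) does the same for a request without the header.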

Source language

The source language of Eionet sites is English. There must always be a source language. This means that if the content manager changes a text in the source language, all translations of that text must be flagged as obsolete. This does not apply in the other direction: a change to a translation does not make the source text obsolete.

Consider a website that has already been translated into five languages. Now the webmaster has discovered a usability issue: when visitors see "Sites", they don't realize that the word also covers seas. He therefore updates the English texts from "Sites" to "Sites and seas" wherever there is space for the longer text.

Now the system can either 1) require him to painstakingly keep notes on which texts have changed and then tell the translators which ones need reviewing, 2) let the translators review all 6000 texts for differences from the English version, or 3) keep track itself of which translations are obsolete with regard to the English text.
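
One possible way to implement the third option is to give every source text a revision number and to record, for each translation, the revision it was made from. The sketch below assumes this design; the class name Catalog, its methods and the in-memory arrays are hypothetical stand-ins for real database tables, and the translated strings are placeholders.

```php
<?php
// Hypothetical in-memory catalog; a real site would keep this in database tables.
class Catalog
{
    private $source = [];        // id => ['text' => string, 'rev' => int]
    private $translations = [];  // id => [lang => ['text' => string, 'rev' => int]]

    // Set or change the English source text. The revision is bumped only
    // when the text really changes.
    public function setSource(string $id, string $text): void
    {
        if (!isset($this->source[$id])) {
            $this->source[$id] = ['text' => $text, 'rev' => 1];
        } elseif ($this->source[$id]['text'] !== $text) {
            $this->source[$id]['text'] = $text;
            $this->source[$id]['rev']++;
        }
    }

    // Store a translation and remember which source revision it was made from.
    public function setTranslation(string $id, string $lang, string $text): void
    {
        if (!isset($this->source[$id])) {
            throw new InvalidArgumentException("No source text for '$id'");
        }
        $this->translations[$id][$lang] = [
            'text' => $text,
            'rev'  => $this->source[$id]['rev'],
        ];
    }

    // Return [id, lang] pairs whose translation predates the current source text.
    public function obsolete(): array
    {
        $out = [];
        foreach ($this->translations as $id => $byLang) {
            foreach ($byLang as $lang => $t) {
                if ($t['rev'] < $this->source[$id]['rev']) {
                    $out[] = [$id, $lang];
                }
            }
        }
        return $out;
    }
}

$catalog = new Catalog();
$catalog->setSource('nav_sites', 'Sites');
$catalog->setTranslation('nav_sites', 'de', '(German text for "Sites")');
$catalog->setTranslation('nav_sites', 'da', '(Danish text for "Sites")');
$catalog->setSource('nav_sites', 'Sites and seas');  // the webmaster's edit
$needsReview = $catalog->obsolete();                  // both translations are listed
```

The translators then only receive the entries returned by obsolete(). Saving a new translation records the current revision, so the entry drops off the list without any manual bookkeeping.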

Coding calls to the translation

A web application is composed of many small texts, and multilingual support can in essence be implemented in two ways, shown here with their side effects:

The first is to use the text in the source language as the key in the translation database:

Source text (key)    Language   Translation
Welcome to Eionet    de         Willkommen bei Eionet
Welcome to Eionet    da         Velkommen til Eionet

In the source code you would see PHP code like this: <h1><?php echo translate('Welcome to Eionet', $lang); ?></h1>

Side effects:

  1. Can require some pretty long lookup keys. This is not a problem as you can hash (MD5 or something else) the key before making the database query.
  2. If the system can't contact the database or is unable to do the translation it can fall back and show the source text.
  3. There is better separation between the developer and the translator. The only thing the developer has to do is to put the same function call around the texts needing translation. In GNU gettext this function call is shortened to '_("text")'.
  4. If you want to change the text you must do it in the source code, and not in the database. This is not necessarily a bad thing: Uncontrolled editing of the source texts can compromise the system's integrity.
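
Side effects 1 and 2 can be illustrated with a minimal sketch. The array below stands in for the translation table, and the third parameter of translate() is an assumption made here so the example is self-contained; a real implementation would query the database instead.

```php
<?php
// Stand-in for the translation table: md5(source text) => [language => translation].
$catalog = [
    md5('Welcome to Eionet') => [
        'de' => 'Willkommen bei Eionet',
        'da' => 'Velkommen til Eionet',
    ],
];

// Hypothetical translate(): returns the translation, or the source text when
// the text, the language or the whole catalog is unavailable.
function translate(string $text, string $lang, ?array $catalog): string
{
    if ($catalog === null) {
        return $text;       // e.g. the database could not be reached
    }
    $key = md5($text);      // always 32 characters, however long the text is
    return $catalog[$key][$lang] ?? $text;
}

echo translate('Welcome to Eionet', 'da', $catalog), "\n";  // Velkommen til Eionet
echo translate('Welcome to Eionet', 'fr', $catalog), "\n";  // Welcome to Eionet
echo translate('Welcome to Eionet', 'de', null), "\n";      // Welcome to Eionet
```

Because the source text itself is passed to the function, the fallback needs no extra lookup: whatever goes wrong, the page can still show the English text.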

The second is to put all text strings in a database keyed on an identifier and a language code, such as:

Identifier     Language   Text
headline_01    en         Welcome to Eionet
headline_01    de         Willkommen bei Eionet
headline_01    da         Velkommen til Eionet

In the source code you would see PHP code like this: <h1><?php echo lookup('headline_01', $lang); ?></h1>

Side effects:

  1. You can edit all texts. This is nice if you use your translation table as a content management system. Again: Uncontrolled editing of the source texts can compromise the system's integrity.
  2. The technique is more flexible: two texts that are the same word in the source language, but need two different words in one of the translations, can be coded as two different identifiers.
  3. The application doesn't have a natural source language anymore. You have to declare one in the database. And this is where the programmer usually fails. What should happen if the English text is changed from "Welcome to Eionet" to "Welcome to EWindows"? If you don't declare a source language, in this case English, you don't know that the German and Danish texts are obsolete and must be retranslated.
  4. With only short identifiers in the source code where there would usually be plain text, the developer's burden is increased. He will have to imagine the user interface when he programs, and each text phrase he needs must be entered into the translation database.
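
A minimal sketch of the second technique, with the source language declared explicitly as side effect 3 demands. The constant SOURCE_LANG, the array standing in for the database table and the third parameter of lookup() are assumptions made for this illustration.

```php
<?php
// Assumption: the source language is declared once, here as a constant.
const SOURCE_LANG = 'en';

// Stand-in for the table keyed on (identifier, language code).
$texts = [
    'headline_01' => [
        'en' => 'Welcome to Eionet',
        'de' => 'Willkommen bei Eionet',
        'da' => 'Velkommen til Eionet',
    ],
];

// Hypothetical lookup(): try the requested language, then the declared source
// language, and finally return the bare identifier so that a gap stays visible.
function lookup(string $id, string $lang, array $texts): string
{
    return $texts[$id][$lang] ?? $texts[$id][SOURCE_LANG] ?? $id;
}

echo lookup('headline_01', 'da', $texts), "\n";  // Velkommen til Eionet
echo lookup('headline_01', 'fr', $texts), "\n";  // Welcome to Eionet
echo lookup('headline_99', 'de', $texts), "\n";  // headline_99
```

Unlike the first technique, the fallback to English only works because the code has been told which language is the source. The same declaration is what lets the system decide which translations become obsolete when the English text changes.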

One can also argue that the first technique doesn't require the developer to register his text key in a database for the application to work. This is an inverted argument. Normally a script will be written to harvest all translatable strings from the source files and register them in the message catalog.
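
Such a harvesting script could look roughly like the sketch below. It assumes that the translation function is called translate() and that its first argument is a literal string; the function name harvest_strings() is made up for this example. GNU gettext ships a far more robust tool, xgettext, for the same job.

```php
<?php
// Hypothetical harvester: collect the first argument of every translate('...')
// or translate("...") call whose argument is a literal string.
function harvest_strings(string $source): array
{
    // Group 1 is the opening quote, group 2 the text up to the matching quote.
    // Backslash-escaped characters inside the literal are skipped over.
    $pattern = '/\btranslate\(\s*([\'"])((?:\\\\.|(?!\1).)*)\1/s';
    preg_match_all($pattern, $source, $matches);

    // Rough unescaping: turns \' into ' but does not interpret \n and the like.
    $found = array_map('stripslashes', $matches[2]);

    // Each distinct text is registered once, in order of first appearance.
    return array_values(array_unique($found));
}

$page = '<h1><?php translate(\'Welcome to Eionet\',$lang)?></h1>' . "\n"
      . '<p><?php translate("Sites and seas", $lang)?></p>' . "\n"
      . '<h2><?php translate(\'Welcome to Eionet\',$lang)?></h2>';

print_r(harvest_strings($page));  // the two distinct texts, each listed once
```

The resulting list is what would be registered in the message catalog, so the developer never has to enter the keys by hand.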

Conclusion: It depends on the circumstances, but given the better separation between developer and translator in the first technique, that technique is the Eionet best practice.