Universal Language Tool for PHP




What is ULT

After looking for a multi-language solution, we at DataVoyage realized that we needed to write our own. Before you dismiss this as just another multi-language support library for PHP, take a look at the specific uses this library offers.

Universal Language Tool for PHP (ULT) is a library developed to introduce a new concept in multi-language application development for the web. It supports an unlimited number of languages on a single site, in the literal sense: it covers not just languages in the widely recognized sense of the term, but language variations as well. You are provided with tools to use language macros in your documents, which are replaced with the exact text according to language dictionaries. You are also provided with a transliteration tool, which allows direct text replacement in a document with no need for predefined macros. The replacement is done according to transliteration rules specific to each language.

If you have ever developed a site targeted at an audience which uses the same language but with some variations (for example English and American English, variations of Spanish, or one language written in two scripts, like Serbian Latin and Serbian Cyrillic), you have met this problem. The usual multi-language solutions force you to treat all these variations as different languages, which makes not just development but also administration and site maintenance quite complicated. ULT treats all variations as the same language, but it introduces difference rules. This means you only have to update the site in one language and, if needed, ULT will alter the original document to create its variation according to predefined language variation rules, including transliteration.

Recognizing the problem

What goal did we want to accomplish? Most multi-language libraries are basically simple. You have definitions of language macros for the wanted languages, and the loaded page knows which one to use to alter page contents. Libraries differ mostly in approach, but the essence is the same.

We have been developing web sites for a long time, and since our clients are mostly in Serbia we have faced the multi-language problem on many occasions. Of course, we solved it in the manner just described, but we always had problems with that approach.

The Serbian language presents a specific problem: it may be written in two scripts, Cyrillic and Latin. We usually treated these two scripts as two languages and got usable but not satisfactory results. Although the two scripts do act as different languages, they actually are not. The only difference is that different scripts are used; everything else is the same (with some minor issues). It is against common sense to treat them as different languages, creating separate dictionaries, bitmap sets or whatever else on the site is language dependent, just for minor differences. That would be a nightmare, not just to create but also to maintain.

Although the Serbian language is probably unique in using two scripts, other languages do have their variations; take a look at the variations of English or Spanish.

After some thought we came to this conclusion: a language is complex and usually has some variations, but those variations differ in minor ways, usually following some definite rules and exceptions. Thus, we should not deal with language variations as languages themselves but as what they are: VARIATIONS of the original language. This conclusion is the basic idea behind ULT.

We recognized two ways in which language variations may differ: by dictionary (some words, phrases or spellings may be different) and by script (this may be literal, as in Serbian where each letter has its Cyrillic and Latin representation, or simply a matter of using different code pages).

The solution

After trying several approaches we came to the final solution: define languages as usual, with one improvement. We allow a language to be defined as a child of another language. That gives us a mechanism to provide basic language definitions and variations of them.

When language information is needed, the script merges the basic language with the variation differences and thus produces the complete language variation as a new language. As a result, you actually have to define and maintain just the basic languages for your site and define variations through rules of differences. If you change something in the definition of a basic language, the change is reflected in all its variations. This means that if you have to define a dictionary, you only have to do it for the basic language. A variation only has definitions for difference rules.

Let's look at an example for the English language:

There are a few differences in spelling between English and American English. For instance, the English word colour is spelled color in American English. ULT allows you to have both variations on your site, but there is no need to define whole dictionaries for both. You will have the English language fully defined, and for American English you will state that it is a child of English with a spelling difference for the word colour. And that is all. Whenever your page is loaded in American English, the dictionary of the English language is used, but with the variation differences applied. So the resulting document will have all occurrences of the word colour replaced with the word color.

Realization

We have created the ULT PHP class (stored in ult.php) which deals with the problem. It does most of the job automatically. Your job is to initialize the object, set some parameters and, of course, define some languages.

Here is a simple example of its usage.

SOURCE CODE

<?php

//this code deals with passing parameters


foreach ($_POST as $Var => $Value) {
  $$Var = $Value;
}

foreach ($_GET as $Var => $Value) {
  $$Var = $Value;
}
 
 // include Universal Language Tool library in page code
include_once('ult.php');


// language id parameter is stored in $lang
// we have to initialize it if parameter value not passed
if (empty($lang)) $lang = 'en';


// create language object, this will also load all language
// definitions stored in lang/ directory
$dvl = new ULT;


// now set source language (language that is used to design page)
// this is not necessary if you set [is_source_lng] property of one of the language definitions
$dvl->set_source_language ('en');

// set display language (language that is actually used on page)
$dvl->set_display_language ($lang);

// start output buffering. This must be done to allow multilanguage processing
$dvl->block_start();
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>ULT Test</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>
<?php
// this will display list of all languages whose definitions are loaded
foreach ($dvl->lang_defs as $m_lang) {
  $m_lang_name = $m_lang['name'];
  $m_lang_id = $m_lang['id'];
  echo "<a href=\"index.php?lang=$m_lang_id\">$m_lang_name</a> ";
}

// definition of the language selected for display
$m_lang = $dvl->lang_defs[$lang];
?>
</p>

<p>##LANGUAGE_TEST##</p>

<p>##Language##: <?php echo "$m_lang[name] ($lang)" ?></p>
<p>##Family##: <img src="img/lngpic=%en=.gif" width="49" height="12"></p>
<p>##OK## ##CANCEL## ##ABORT## ##CONFIRM## ##COLOUR##</p>

</body>
</html>
<?php $dvl->block_end() ?>


EXECUTION RESULT

english english(us) српски srpski(ASCII) srpski(lat)

LANGUAGE TEST

Language: srpski(lat) (en)

Family:

OK Cancel Abort Confirm Colour

 

There is also an example for those who are interested in transliteration within the same language family. The example is for a Serbian Cyrillic document.

Language definitions

Besides ult.php, you need proper language definitions and language dictionaries in the language directory, unless you have set the library to load them from elsewhere. We provide some basic definitions to get you started.

LANGUAGE DEFINITION: ldef_en.php

<?php
$lng_def['id'] = 'en';
$lng_def['name'] = 'english';
$lng_def['codepage'] = 'utf-8';
?>

A language definition is made by setting an array named $lng_def.

Property $lng_def['id'] contains the language ID. It is recommended to use standard language IDs. That is convenient because you can read the default language from the visitor's browser and set it as the display language, but above all we should all use standards where they exist, to keep things simple and compatible.

Property $lng_def['name'] contains the name of the language. This name may be used to present the language name in a document.

Property $lng_def['codepage'] contains the standard name of the code page used for the language. You may use it to set the correct code page for the document. It may also be used for code page conversions if you use different code pages for child and parent languages.

Property $lng_def['parent'] is used when you define a child language. This property contains the ID of the parent language.

Property $lng_def['is_source_lng'] contains 1 if the language should be used as the source language for the site.

Here is an example:

LANGUAGE DEFINITION: ldef_en-us.php

<?php
$lng_def['id'] = 'en-us';
$lng_def['name'] = 'english(us)';
$lng_def['codepage'] = 'utf-8';
$lng_def['parent'] = 'en';
?>

Property $lng_def['parent_conversion_table'] is used for setting language transliteration, which is explained later.

Language definition files follow a naming convention: the prefix 'ldef_' followed by the language ID and the extension '.php'. The prefix may be customized by altering the value of the constant named LANG_DEF_FILE.
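
Because standard IDs match what browsers send, the display language can be chosen from the Accept-Language request header. The following is a minimal sketch, not part of ULT: pick_language() is a hypothetical helper, the list of defined IDs is assumed to come from something like array_keys($dvl->lang_defs), and for brevity the quality values (q=) are ignored and the header order is trusted.

```php
<?php
// Hypothetical helper, not part of ULT: return the first language from the
// Accept-Language header that has a loaded definition.
function pick_language($accept_header, array $defined_ids, $fallback)
{
    foreach (explode(',', strtolower($accept_header)) as $item) {
        // Drop the optional ";q=0.8" weight and surrounding spaces.
        $parts = explode(';', $item);
        $id = trim($parts[0]);
        if ($id === '') {
            continue;
        }
        if (in_array($id, $defined_ids, true)) {
            return $id;                        // exact match, e.g. "en-us"
        }
        // Otherwise try the primary tag, e.g. "en" for "en-gb".
        $primary = explode('-', $id);
        if (in_array($primary[0], $defined_ids, true)) {
            return $primary[0];
        }
    }
    return $fallback;
}

$defined = array('en', 'en-us', 'sr', 'sr-lat');
$lang = pick_language('en-GB,en;q=0.9,sr;q=0.8', $defined, 'en');
```

In a real page the header would be read from $_SERVER['HTTP_ACCEPT_LANGUAGE'], and an explicit lang parameter chosen by the visitor should still take priority.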

Language dictionaries

Language definitions introduce available languages to the library. However, to use those languages you also need language dictionaries. Although definitions and dictionaries might be kept in the same file, there is a good reason to separate them: it allows you to load dictionaries only for the language used in a single document, which preserves resources. Using several dictionaries for the same language is also recommended. There is no need to load a whole large dictionary for a simple page. It is better to divide the dictionary into several parts and load only those which are really needed in a document. You may use the function add_dictionary() for this purpose.

Here is an example of a dictionary for the English language.

LANGUAGE DICTIONARY: ldct_en.php

<?php
$lng['OK']='OK';
$lng['CANCEL']='Cancel';
$lng['COLOUR']='Colour';
$lng['family']='family';
$lng['LANGUAGE_TEST']='LANGUAGE TEST';
$lng['Language']='Language';
$lng['ABORT']='Abort';
$lng['CONFIRM']='Confirm';
$lng['Universal Language Tool for PHP']='Universal Language Tool for PHP';
$lng['ULT']='ULT';
$lng['SOURCE_CODE']='SOURCE CODE';
$lng['EXECUTION_RESULT']='EXECUTION RESULT';
$lng['LANGUAGE_DEFINITION']='LANGUAGE DEFINITION';
$lng['LANGUAGE_DICTIONARY']='LANGUAGE DICTIONARY';
?>

And here is an example of a dictionary for a variation of English, American English.

LANGUAGE DICTIONARY: ldct_en-us.php

<?php
$lng['COLOUR']='Color';
?>

Simple, isn't it? The variation contains just the difference definitions in its dictionary. The library will load the parent language first (which is, by definition, English) and then apply the dictionary for American English, which alters the definition of the COLOUR language macro.
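
The effect of loading the parent first and then applying the child can be pictured with plain arrays. This is only an illustration of the idea, assuming the two dictionaries have already been read into $parent and $child (both names are made up here); it is not ULT's internal code.

```php
<?php
// Parent dictionary (as in ldct_en.php) and child differences (as in ldct_en-us.php).
$parent = array('OK' => 'OK', 'CANCEL' => 'Cancel', 'COLOUR' => 'Colour');
$child  = array('COLOUR' => 'Color');

// array_replace() keeps every key of the first array and overwrites the
// values of keys that also exist in the second one, so the child's entries
// win and everything else is inherited from the parent.
$en_us = array_replace($parent, $child);
```

After this, $en_us holds 'Color' for COLOUR while OK and CANCEL keep the parent's values.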

As you can see, a dictionary contains the definition of the $lng array. This is used to build an in-memory dictionary for the specified language. You must put language macros in the document to use the dictionary. A macro is placed by putting the macro name between double hash symbols, like ##LOGIN##. Such a macro will be replaced with the value defined in the dictionary (in this document this one is not replaced because we deliberately left it undefined in the dictionary). If you change the language, the dictionary changes, and the value of the macro presented in the output document changes accordingly.

This is the method we described earlier as "by dictionary" differences. It will do for most purposes and is especially suitable for menu options or other easily recognized elements on a page. It is the preferred method because it is faster and uses fewer resources.
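
To make the macro mechanism concrete, here is a minimal sketch of such a replacement step. The function replace_macros() is a hypothetical stand-in, not ULT's own implementation; it assumes macro names contain no hash characters and, as described above, leaves macros without a dictionary entry untouched.

```php
<?php
// Hypothetical stand-in for the macro step: replace ##NAME## with the
// dictionary value and keep unknown macros exactly as written.
function replace_macros($text, array $dictionary)
{
    return preg_replace_callback(
        '/##([^#\r\n]+)##/',
        function ($m) use ($dictionary) {
            // $m[0] is the whole macro, $m[1] the name between the hashes.
            return array_key_exists($m[1], $dictionary) ? $dictionary[$m[1]] : $m[0];
        },
        $text
    );
}

$en = array('OK' => 'OK', 'COLOUR' => 'Colour');
$html = replace_macros('<p>##OK## ##COLOUR## ##LOGIN##</p>', $en);
// $html is now '<p>OK Colour ##LOGIN##</p>'
```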

Transliteration

As we said before, the other method of defining differences is "by script", which we call transliteration. Basically, it means the ability to replace a character from one script with a character from another. Our example is transliteration from Cyrillic to Latin script and vice versa.

However, transliteration is not limited to single characters; it allows transliteration of whole groups of characters too. This gives much more power to the library. You may transliterate whole words or expressions. It becomes another form of dictionary, but with a significant difference: dictionary definitions work only on predefined language macros, while transliteration works on free text.

Transliteration may be set in both the language definition and the dictionary. It is recommended to place transliteration definitions in the language definition file if those definitions are needed all the time, for instance character conversions.

Here is an example of a definition for Serbian Latin. It is a child of the Serbian language (which uses the Cyrillic alphabet), and transliteration defines the difference: it replaces Cyrillic characters with Latin ones. Because of the number of characters, this definition is long. Since this transliteration is a must for this language to work, it is placed in the language definition under the property $lng_def['parent_conversion_table'], which we mentioned before. When transliteration like this is used, there is no need for dictionaries. The Serbian language is the same regardless of script, so the dictionaries loaded for the parent Serbian language will do the whole job. This variant just replaces Cyrillic letters with Latin ones.

If you are familiar with the Serbian language, pay attention to this: the definition for Serbian Latin also solves the problem of digraphs. Some Cyrillic letters do not have single-letter Latin counterparts and are replaced with pairs of Latin letters. This tool easily solves that problem and also another one, the conversion of a capital Cyrillic character to a Latin digraph. It successfully resolves the capitalisation issue in such cases.

LANGUAGE DEFINITION: ldef_sr-lat.php

<?php
$lng_def['id'] = 'sr-lat';
$lng_def['name'] = 'srpski(lat)';
$lng_def['parent'] = 'sr';
$lng_def['codepage'] = 'utf-8';
$lng_def['parent_conversion_table'] = array (
"А" => "A",
"Б" => "B",
"В" => "V",
"Г" => "G",
"Д" => "D",
"Ђ" => "Đ",
"Е" => "E",
"Ж" => "Ž",
"З" => "Z",
"И" => "I",
"Ј" => "J",
"К" => "K",
"Л" => "L",
"Љ" => "LJ",
"М" => "M",
"Н" => "N",
"Њ" => "NJ",
"О" => "O",
"П" => "P",
"Р" => "R",
"С" => "S",
"Ш" => "Š",
"Т" => "T",
"Ћ" => "Ć",
"У" => "U",
"Ф" => "F",
"Х" => "H",
"Ц" => "C",
"Ч" => "Č",
"Џ" => "DŽ",
"а" => "a",
"б" => "b",
"в" => "v",
"г" => "g",
"д" => "d",
"ђ" => "đ",
"е" => "e",
"ж" => "ž",
"з" => "z",
"и" => "i",
"ј" => "j",
"к" => "k",
"л" => "l",
"љ" => "lj",
"м" => "m",
"н" => "n",
"њ" => "nj",
"о" => "o",
"п" => "p",
"р" => "r",
"с" => "s",
"ш" => "š",
"т" => "t",
"ћ" => "ć",
"у" => "u",
"ф" => "f",
"х" => "h",
"ц" => "c",
"ч" => "č",
"џ" => "dž",
"Ња" => "Nja",
"Ње" => "Nje",
"Њи" => "Nji",
"Њо" => "Njo",
"Њу" => "Nju",
"Ља" => "Lja",
"Ље" => "Lje",
"Љи" => "Lji",
"Љо" => "Ljo",
"Љу" => "Lju",
"Џа" => "Dža",
"Џе" => "Dže",
"Џи" => "Dži",
"Џо" => "Džo",
"Џу" => "Džu",
);
?>
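
How a table like this resolves the capitalisation of digraphs can be tried with PHP's built-in strtr(). Whether ULT's own transcribe() works this way is an assumption; the sketch below only shows that a longest-match replacement gives the described behaviour. It uses a short excerpt of the table, and the file must be saved as UTF-8.

```php
<?php
// A few entries in the same form as parent_conversion_table above.
$table = array(
    'Њ' => 'NJ', 'њ' => 'nj', 'Ње' => 'Nje',
    'Е' => 'E', 'Г' => 'G', 'О' => 'O', 'Ш' => 'Š',
    'е' => 'e', 'г' => 'g', 'о' => 'o', 'ш' => 'š',
);

// strtr() with an array tries the longest matching key first, so "Ње"
// wins over "Њ" regardless of the order of the entries.
$title_case = strtr('Његош', $table);   // "Njegoš"
$upper_case = strtr('ЊЕГОШ', $table);   // "NJEGOŠ"
```

A capital Њ followed by a lowercase vowel becomes "Nj", while in an all-capitals word the single-letter rule applies and gives "NJ".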

However, dictionaries may still be of use. The rules of the Serbian language state that foreign names or expressions should be transliterated into Cyrillic the way they are pronounced. Therefore Microsoft becomes Мајкрософт, Linux becomes Линукс, and Windows becomes Виндоуз. When transliterating back to Latin characters, one may want to restore the original foreign spelling. For this we may again use transliteration, but now set within the dictionary file. Remember, you decide when to load dictionary files, and can manage resource usage that way.

LANGUAGE DICTIONARY: ldct_sr-lat.php

<?php
$lngt = array (
"Мајкрософт" => "Microsoft",
"Виндоуз" => "Windows",
"имејл" => "e-mail",
"Линукс" => "Linux");
?>

Transliteration in a dictionary file is set by defining the $lngt array. Our example shows a transliteration definition only, but it may coexist with a dictionary definition in the same file.

At first glance it seems that transliteration may replace the dictionary, and that is true, but transliteration is, by its nature, a slower method. It is recommended to use dictionaries whenever possible. If you can solve a case with a dictionary, do it. Use transliteration as a last resort.
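
The interplay between character rules and word rules such as those in $lngt can be sketched as follows. The way the two tables are combined here (a single strtr() pass over their union) is an assumption made for illustration, not a description of ULT's internals.

```php
<?php
// Character rules (excerpt) and a word rule in the $lngt form shown above.
$chars = array(
    'Л' => 'L', 'и' => 'i', 'н' => 'n', 'у' => 'u',
    'к' => 'k', 'с' => 's', 'ј' => 'j', 'е' => 'e',
);
$lngt = array('Линукс' => 'Linux');

// Character rules alone give the phonetic spelling.
$phonetic = strtr('Линукс је', $chars);          // "Linuks je"

// One pass over both tables: the longest key is tried first, so the whole
// word is restored before single letters are considered.
$restored = strtr('Линукс је', $chars + $lngt);  // "Linux je"
```

Applying the two tables in separate passes, characters first, would not work: the Cyrillic word would already be gone when the word rule is tried.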

Language families

Several languages may have the same parent language. They are all, in fact, variations of the parent language. Languages that share the same parent, together with the parent itself, form a language family. The family also has an ID, which is the same as the ID of the parent language.

Language families are important. By definition, only the parent language in a family has to be fully defined. Variations contain only difference rules, which means that when we prepare information to be included in a document, we do not have to prepare it in all variations of the language. It is enough to prepare it in the parent language. Variation rules will alter that information if necessary.

This may be very handy when data is gathered from external documents or databases. Instead of preparing information in advance for each language variation, you may prepare it only for the parent language. The whole family of languages may use it and apply variation rules to create the final output.

Note that on some occasions families will not be of use: when the gathered information closely depends on variations. How you deal with that is up to you. With careful planning you may avoid or reduce such situations.
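
The family rule can be written down in a few lines. The helpers family_of() and same_family() below are hypothetical and only mirror the description above; they assume a single level of inheritance, where a parent language has no parent of its own.

```php
<?php
// Hypothetical helper: the family ID is the parent's ID, or the language's
// own ID when it has no parent.
function family_of(array $lang_defs, $id)
{
    return isset($lang_defs[$id]['parent']) ? $lang_defs[$id]['parent'] : $id;
}

// Hypothetical helper: two languages are related when their family IDs match.
function same_family(array $lang_defs, $a, $b)
{
    return family_of($lang_defs, $a) === family_of($lang_defs, $b);
}

// Definitions reduced to the fields that matter here.
$defs = array(
    'en'     => array('id' => 'en'),
    'en-us'  => array('id' => 'en-us', 'parent' => 'en'),
    'sr'     => array('id' => 'sr'),
    'sr-lat' => array('id' => 'sr-lat', 'parent' => 'sr'),
);
```

With these definitions, data stored once under the family ID 'sr' can serve both sr and sr-lat.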

How to prepare documents

First you must decide what the source language for your site is. The source language is treated as the default one, the one shown to a visitor who does not choose a language. The best candidate is the language you expect to be used most on your site. When the source language is displayed, little or no processing is necessary, which preserves resources. Transliteration plays a significant role in your decision: choose the language which is easier to translate or transliterate into other languages.

Once you have decided which language is the source, you should create the whole site using that language. You should of course take multi-language features into account. Use language macros whenever possible.

Take care with images and other external documents. There are two important macros you may use for naming them.

=<id>=

This is the file language ID macro. Replace <id> with the ID of a language, so filenames take the form name=en=.jpg, name=en-us=.jpg, name=sr=.jpg, name=sr-lat=.jpg etc. At design time, however, you should use the naming scheme which corresponds to the source language. If your source language is English, use name=en=.jpg while designing the page. This allows the library to change the image name and load the one appropriate for the display language. You may use this macro in the names of any external documents loaded in the page, and even in links to other documents. Literally, if you use name=en=.jpg while designing a page whose source language is en (English), and that page is then requested in Italian, name=it=.jpg will actually be loaded.

=%<id>=

This is the file language family ID macro. It acts in a similar way to =<id>=, except that it works with the family IDs of the source and display languages. This is useful when you have to create static documents for languages or query a database for language-dependent information.

As we said before, since language variations know how to alter text in the parent language, there is no need to prepare text for each language, only for parent languages. Use this macro to state that you want to load information that suits the whole family, not just the display language. Filenames should take the form name=%en=.jpg, name=%sr=.jpg etc. If the display language is en you will get name=%en=.jpg; if the display language is en-us you will also get name=%en=.jpg, because en is the family language. Accordingly, if the display language is sr, sr-lat or sr-asc, you will get name=%sr=.jpg. Obviously this is useful if you want to load the same item for every language within a family, but want it to change when the page is displayed in a language from another family.
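
Both file-name macros amount to swapping the source IDs for the display IDs. The function localize_names() below is a hypothetical sketch of that substitution, with the language and family IDs passed in explicitly; it is not ULT's own code.

```php
<?php
// Hypothetical sketch of the two file-name macros.
function localize_names($text, $source, $display, $source_family, $display_family)
{
    // strtr() replaces all keys in one pass and never rescans its output,
    // so a replaced name cannot be rewritten a second time.
    return strtr($text, array(
        '=%' . $source_family . '=' => '=%' . $display_family . '=',
        '=' . $source . '='         => '=' . $display . '=',
    ));
}

// Page designed in en, displayed in sr-lat (family sr).
$page = '<img src="img/flag=en=.gif"> <img src="img/lngpic=%en=.gif">';
$out  = localize_names($page, 'en', 'sr-lat', 'en', 'sr');
// $out is '<img src="img/flag=sr-lat=.gif"> <img src="img/lngpic=%sr=.gif">'
```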

Properties and methods

The library offers a range of functions to process languages. It also exposes the class property $lang_defs (a global variable before version 0.4.1), an array containing the definitions of all languages plus the loaded dictionaries and transliteration rules for the source language and at least the display language. Remember that, if you need to, you may fully load several languages and use them all on a single page. You may read this property but do not alter its contents.

Class properties

$source_language

This contains the ID of the language used at design time for text and for naming documents. While designing the site you use this language for the contents. Take care to use macros to alter the contents for other languages, including macros for external documents.

$display_language

This is the language that should be displayed to the current user. The library knows how to handle contents prepared in the source language and display them in the current language. You may also change the display language within a page to use several languages if needed.

$lang_defs

Contains information about supported language definitions and dictionaries. Definitions are stored as a multidimensional array. The top array is indexed by language ID, and each record contains an array with the definition of a specific language. A language definition is itself a multidimensional array.

Structure:

$lang_defs
  [language id] => Array
    [id] => language id
    [name] => language name, used for description
    [codepage] => codepage that should be set for HTML document to present text using this language
    [is_source_lng] => 1, if this language is source language for site
    [parent] => language id of parent language
    [dictionary] => Array
      [dictionary item] => value
      [dictionary item] => value
      [dictionary item] => value
      [dictionary item] => value
    [parent_conversion_table] => Array
      [source string] => replacement string
      [source string] => replacement string
      [source string] => replacement string
      [source string] => replacement string

  [language id] => Array
    . . .

This allows using multiple languages on the same page. To preserve memory, it is not recommended to store complete dictionaries for multiple languages. Store only the limited number of items which need to be presented in several languages on the same page, like language names.

function ULT($p_lang_dir)
Constructor. Call this to create the object. The optional parameter is the path to the directory where language definition files are stored. If not specified, it defaults to the lang/ directory.

function process($tpl_source)
Language processing:
- replace all language macros with items according to the dictionary
- replace all family IDs in the document (targeted at URLs of external documents and links)
- replace all language IDs in the document (targeted at URLs of external documents and links)
- transcribe characters and phrases from one language to another (or from one script to another)

function set_display_language ($p_lang)
Sets the display language to $p_lang. If the language ID is invalid, the display language is set to the source language.

function set_default_language ($p_lang)
Sets the default language to $p_lang. If the language ID is invalid, the first language in array $lang_defs is used.

function set_language ($p_lang, $p_lang_dir)
Sets a language definition (adds a sub-array to $lang_defs) from a language definition file. Use this only if you want to load a language definition manually.

function load_language_defs($p_lang_dir = '')
Loads definitions from all language definition files in the target directory. If the directory is not specified, the current directory is used.

function language_defined ($p_lang)
Returns true if language definition is loaded.

function display_language_has_parent()
Returns true if display language has parent.

function transcribe ($p_text, $p_trans_table)
Transcribes text from one script to another using a transcription table. This is for internal use. If you need some specific transcription, provide the text and a transcription table in the form used in language definitions.

function transcribe_dictionary ($p_lang, $p_dictionary)
Converts a dictionary array from the parent language to the current one. Used to transcribe from a different code page or script. If it is possible to create a dictionary by transcribing the parent's dictionary, do that instead of defining the whole dictionary again. It makes dictionary maintenance easier. This is for internal use.

function transcribe_parent_language_dictionary ($p_lang)
Converts the dictionary of the loaded parent language to the current one. Used to transcribe from a different code page or script.

function transcribe_document ($p_lang)
If the display language has a parent language and the source document is written in the parent language, the document contents are transcribed from the parent language to the display language. For internal use.

function filename_from_template ($p_dict_filename, $p_lang)
Creates a full file name based on a template and a language. If the language has a parent, the filename is created based on the parent language. Always use this to load files that correspond to the needed language.

function load_dictionary ($p_dict_filename, $p_lang, $p_add = false)
Loads a dictionary table from the specified file into the specified language definition. If $p_add is true, the definition is appended to already existing definitions.

function set_dictionary ($p_dict_filename, $p_lang, $p_add = false)
Loads a dictionary table from the specified file into the specified language definition. If $p_add is true, the definition is appended to already existing definitions. If the language has a parent, the parent's file is loaded first and the child's file after it. This helps dictionary maintenance. Most of the items may be stored in the parent's dictionary and only the differences in the child's dictionary. For example, with 'en' as parent and 'en-us' as child, most of the items are the same but some are not, like the 'color'/'colour' pair. This allows you to have the item 'colour' in the parent dictionary of 'en' among all the others, and just the item 'color' for the child 'en-us'.

function add_dictionary ($p_dict_filename, $p_lang)
Loads a dictionary table from the specified file into the specified language definition. It appends the new definition to the existing one.

function sync_child_dictionary ($p_lang)
Synchronizes the child language dictionary with the parent's. This replaces the child dictionary with the parent's (transcribing it in the process).

function is_family ($p_lang, $p_family)
Checks if a language belongs to the family of languages (is the parent or one of the languages with the same parent).

function get_family ($p_lang)
Return family which language belongs to.

function block_start()
Begin output buffering. All output after this will be buffered so language replacements may be made.

function block_end()
End output buffering, do necessary language processing and display altered document.

How to use the library

The example above shows a simple way to use this library. It will do the job if you have a simple project and just need to add multi-language support.

However, ULT is above all a library, not a final solution. It just implements a mechanism to help you develop multi-language support for your site. Therefore it does not contain high-level methods. We assume that you already have a templating and parameter system, and we do not want to interfere with it. You should be able to include this library in your project without altering the way you do things.

You may store application parameters in some other way. Use it instead of the simple method presented in the example. You may use sessions to store language information. Feel free to do so. You may have a templating system. Use it. You may have a database as the back end for your site. Use it. You may even alter the way languages and dictionaries are defined. Feel free to do it if you have a better way.

Note this: the methods block_start() and block_end() are provided just to give minimum functionality. You are expected to have your own way of handling output buffering, for templating purposes for instance. If you already have output buffering implemented, do not use these two methods. Use the process() method to execute language processing instead.
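
A sketch of how process() might be combined with your own output buffering follows. To keep it self-contained, StubProcessor stands in for the ULT object; it is assumed here, not confirmed by the method list above, that process() takes the page source and returns the processed text. render_page() is a hypothetical helper.

```php
<?php
// Stand-in with the same method name as ULT's process(); the real method
// is assumed to accept the page source and return the processed text.
class StubProcessor
{
    public function process($tpl_source)
    {
        return str_replace('##OK##', 'OK', $tpl_source);
    }
}

// Capture the page with your own output buffer, then run language
// processing once on the captured text.
function render_page($processor, $page)
{
    ob_start();
    $page();                        // the template echoes its markup
    $html = ob_get_clean();
    return $processor->process($html);
}

$html = render_page(new StubProcessor(), function () {
    echo '<p>##OK##</p>';
});
```

In a real project the stub would be replaced by the ULT object, and the returned text would be echoed or handed on to the templating system.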

What's new

02. Feb. 2004 - First public version released. Version 0.3.9.

27. Feb. 2004 - Documentation updated. Version 0.3.10.

30. Mar. 2004 - Version 0.4.1 released. Global variable $lang_defs is removed and class property $lang_defs is introduced.

Download

Download the latest version of Universal Language Tool for PHP.

Credits

If you use this tool on your site, give us credit. Also let us know that you use it, how it helps you and whether you need some improvements. We know that a number of sites use this tool, and we would like to have some feedback from them.

Implementations

Zotlan Csala wrote a short and simple explanation of how to use ULT in WordPress to allow transliteration from Serbian Cyrillic to Serbian Latin. The instructions are in Serbian, but the simple code examples should be understandable regardless of the language barrier.

Contact and support

Universal Language Tool for PHP is free software. You may use it and redistribute it. You may change the code to suit your needs, but you may not distribute changed code. You use this software at your own risk. We cannot be held responsible for any result of using this software.

We do not provide email support. However, you may visit support forum or home site.

Author: Predrag Supurović, ©2004 Copyright by DataVoyage, http://www.datavoyage.com