# The wxWidgets language database
## Regeneration of the wxWidgets language database related files
Run the `genlang.py` script from the top level wxWidgets directory to
update `include/wx/language.h` (wxLanguage enum), `interface/wx/language.h`
(its documentation), `src/common/languageinfo.cpp` (conversion tables)
and the actual tabular data in `include/wx/private/lang_*.h` with the data
from `langtabl.txt`, `synonymtabl.txt`, `scripttabl.txt`, `likelytabl.txt`,
`matchingtabl.txt`, and `regiongrouptabl.txt`.
The files with the raw tabular data - `langtabl.txt`, `synonymtabl.txt`,
`scripttabl.txt`, `likelytabl.txt`, `matchingtabl.txt`, and
`regiongrouptabl.txt` - are derived from public data of the Unicode
organization. The scripts to generate the files are provided.
## Description of the raw data files
`langtabl.txt` contains a tabular list of language entries. Each entry
contains
- a symbolic language identifier used in enum wxLanguage,
- a wxWidgets version when the entry was first introduced (a hyphen if not known)
- a BCP 47-like locale identifier,
- a Unix locale identifier,
- a Unix locale identifier including a region id (if the default Unix
locale identifier does not include a region identifier) (mainly for
compatibility with wxWidgets version below 3.1.6),
- numeric Windows language identifier (1),
- numeric Windows sublanguage identifier (1),
- language and region description in English
- language and region description in native language.
`scripttabl.txt` contains a list of 4-letter script codes and their
aliases (English) based on the ISO 15924 standard (2), restricted to
entries for which aliases are defined. This list is used in wxWidgets
to convert between script code used in BCP 47-like identifiers and
script modifiers used in Unix locale names. The data in (2) can be used
to update scripttabl.txt if necessary.
`synonymtabl.txt` contains a list of aliases for symbolic language identifiers.
This list is used to generate specific entries in wxLanguage enumeration.
The following 3 files are used in the algorithm determining the best translation
language based on the list of preferred UI languages and the list of available
translations.
`likelytabl.txt` contains for most locales the likely subtags for script and
region.
`matchingtabl.txt` contains a map relating languages to possible replacement
languages giving the relative distance between 2 languages.
`regiongrouptabl.txt` contains for some languages region groups. This allows
to determine whether certain regional language variants are closer to each
other or not.
**Note**: None of the files `langtabl.txt`, `synonymtabl.txt`, `scripttabl.txt`,
`likelytabl.txt`, `matchingtabl.txt`, and``regiongrouptabl.txt` should be
edited manually. Instead these files should be regenerated under Windows 11 or
above.
## Regeneration process
Windows provides an extensive list of locales. This list is used to regenerate
the files `langtabl.txt`, `synonymtabl.txt`, and `scripttabl.txt`. The
subdirectory `util` contains the C source of a small utility application
`showlocales.c` that queries Windows for a list of known locales.
**Note**: It is recommended to run `showlocales` on a Windows 11 Pro desktop
computer, because the desktop versions of Windows get updates of locale data
more frequently than server versions of Windows. Unfortunately, there is no
easy method to determine when the locale data were last updated. The date of
last modification of the file `Windows/System32/locale.nls` can be an
indication when the last update occurred.
2 additional tools are required to perform the regeneration process:
1) SQLite3 shell
Precompiled binaries can be downloaded from https://www.sqlite.org/download.html.
The download link is under the heading "Precompiled Binaries for Windows" and
looks like "sqlite-tools-win32-x86-3xxyyzz.zip" (where xx, yy, zz denote the
current SQLite version). Alternatively, the SQLite shell can be compiled from
sources - the archive "sqlite-amalgamation-3xxyyzz.zip" contains the required
source files.
The scripts expect that the executable `sqlite3.exe` can be found on the path.
2) Lua shell
Precompiled binaries are available at https://luabinaries.sourceforge.net/.
Download lua-x.y.z_Win32_bin.zip or lua-x.y.z_Win64_bin.zip (where x, y, z
denotes the lua version) from the download page.
Adjust the script file `misc/languages/data/setupenv.ps1`, so that the
environment variable `$env:luashell` contains the name of the executable,
and add the location of the executables to the path.
All provided scripts are PowerShell scripts. It is recommended to use
PowerShell 7 or higher. The regeneration process consists of the following steps:
1) Regenerate the list of known Windows locales (optional)
This step is usually only required when a new major Windows version is published.
The utility `showlocales` should be invoked from a command prompt as follows:
showlocales > win-locale-table-win.txt
The resulting file `win-locale-table-win.txt` has to be placed into subdirectory
`data/windows`. Alternatively, the script `getwindowsdata.ps1` can be used.
2) Update the Unicode data files (optional)
This step is only required when the Unicode data were actually updated.
To perform this step execute the script file `getunicodefiles.ps1`, located
in the `data` subdirectory.
3) Regenerate the tabular data files `langtabl.txt`, `synonymtabl.txt`,
`scripttabl.txt` etc.
To perform this step execute the script files `gensqlfiles.ps1` and
`makelangdb.ps1`, located in the `data` subdirectory.
The new versions will be placed in the `data` directory.
4) Check the resulting new tabular data.
The messages from step 3 issued by the script files and the resulting files
should be carefully checked. If no errors occurred, the script file
replacetables.bat can be executed.
5) Run the Python script `genlang.py` from the top level wxWidgets directory.
6) Commit the changes.
Notes:
1) Do not perform the regeneration process for older wxWidgets versions.
The scripts expect the data table files in a new format that was first introduced
in version 3.3.0.
2) If you need to add locales not present in the list of known Windows locales, then
they should be added at the end of the script `win_genlocaletable.lua`.
Footnotes
(1) used on Windows only, deprecated by Microsoft
(2) http://www.unicode.org/iso15924/iso15924-codes.html