glossa
Locale Fallback Chain
The core functionality of the glossa crate:
- Generates a fallback chain (an ordered array of locales) based on the similarity between the current locale and all available locales.
- In principle, locales with higher similarity are given higher priority.
Q: Why is fallback necessary?
A: When localized text for the current locale is missing, falling back to a more familiar language (e.g., another variant of the current language) ensures a better user experience.
A person may be proficient in several languages (or in different variants of the same language).
Assume the current locale is pt-PT (Português, Portugal), and the available locales are pt-PT, pt (Português, Brasil), es-419 (Español, Latinoamérica), and en.
In this case, the i18n library should retrieve localized text in the order [pt-PT, pt, en], not [pt-PT, en].
Ignoring language similarity and directly falling back to en not only reduces localization (L10n) coverage but may also increase cognitive load for users.
Example: zh-Hans-HK
Assume the current locale is zh-Hans-HK, and the available locales are zh-Hant-MO, zh-SG, ru, zh-Hant, fr, zh, ar, zh-HK, en-001, lzh.
After calling try_init_chain(), the generated locale chain is: ["zh", "zh-SG", "zh-HK", "zh-Hant-MO", "zh-Hant"].
When the log level is debug or trace, you can see [... DEBUG glossa::fallback] ...<(id, score)>:
Higher scores indicate higher priority.
- Exact match: full score (50 points).
- Partial matches:
  - Same language: +20 points.
    - Since the current language is zh (Chinese), and no other languages are included in the built-in rules, only zh variants appear in the chain.
    - Theoretically, lzh (Classical Chinese) shares some similarity with modern Chinese, but it is not included in the built-in fallback rules for zh-Hans-HK.
  - Same script: +15 points.
    - The current script is Hans (Simplified). Hans scores higher than Hant.
    - zh-HK is essentially zh-Hant-HK.
    - Since Hans scores higher than Hant, and zh-Hans resources exist, zh-HK does not have the highest score.
  - Matches built-in fallback rules:
    - Full match: +9 points.
    - Partial match (language + script): +6 points.
  - Same region: +4 points.
    - Comparing zh-Hant (zh-Hant-TW), zh-Hant-MO, and zh-HK (zh-Hant-HK): zh-HK shares the same region (HK) as the current locale, earning +4 points. zh-Hant and zh-Hant-MO do not share the HK region, so no bonus.
  - Proximity bonus:
    - Same sub-region (e.g., East Asia): +2 points.
    - Same continent (e.g., Asia): +1 point.
    - Comparing zh (zh-Hans-CN) and zh-SG (zh-Hans-SG):
      - HK (Hong Kong SAR, China) and CN (Mainland China) are both in East Asia (+2).
      - SG (Singapore) is in Southeast Asia, sharing the same continent (Asia) with HK (+1).
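The rules above can be read as a simple additive score. The following is a minimal sketch of that reading, not glossa's actual implementation: `Locale`, `RuleMatch`, `loc`, and `score` are hypothetical names, the region groupings are passed in by hand, and treating the sub-region and continent bonuses as mutually exclusive is an assumption.

```rust
/// How a candidate relates to the built-in fallback rules of the current locale.
/// Hypothetical type for this sketch; not part of glossa's API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RuleMatch {
    None,
    /// Language and script match a rule: +6.
    Partial,
    /// The whole locale matches a rule: +9.
    Full,
}

/// A fully expanded locale (e.g. zh-Hans-HK) plus its region grouping.
#[derive(Clone, Copy, Debug)]
pub struct Locale<'a> {
    pub language: &'a str,
    pub script: &'a str,
    pub region: &'a str,
    pub sub_region: &'a str,
    pub continent: &'a str,
}

/// Small constructor to keep call sites short.
pub const fn loc<'a>(
    language: &'a str,
    script: &'a str,
    region: &'a str,
    sub_region: &'a str,
    continent: &'a str,
) -> Locale<'a> {
    Locale { language, script, region, sub_region, continent }
}

/// Scores `candidate` against `current` using the point values listed above.
pub fn score(current: &Locale<'_>, candidate: &Locale<'_>, rule: RuleMatch) -> u8 {
    let exact = current.language == candidate.language
        && current.script == candidate.script
        && current.region == candidate.region;
    if exact {
        return 50; // exact match: full score
    }

    let mut total = 0;
    if current.language == candidate.language {
        total += 20;
    }
    if current.script == candidate.script {
        total += 15;
    }
    total += match rule {
        RuleMatch::Full => 9,
        RuleMatch::Partial => 6,
        RuleMatch::None => 0,
    };
    if current.region == candidate.region {
        total += 4;
    }
    // Assumption: the continent bonus only applies when the sub-region differs.
    if current.sub_region == candidate.sub_region {
        total += 2;
    } else if current.continent == candidate.continent {
        total += 1;
    }
    total
}
```

Under these assumptions, and guessing that for gsw-LI the locale de counts as a full rule match while other de variants count as partial matches, the sketch reproduces the scores listed in the gsw-LI example: 37 for gsw, 27 for de-LI, 26 for de, 23 for de-AT, and 22 for de-IT.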
Example: en-AU
Assume the current locale is en-AU, with extensive localization resources for various regions (including sparsely populated islands).
From a linguistic similarity perspective, en-NZ (New Zealand English) is closer to en-AU (Australian English) than en-GB (British English).
However, the chain generated by glossa is not guaranteed to be 100% accurate in this respect.
// <(id, score)>:
Example: gsw-LI
gsw is Swiss German (Schwiizertüütsch), while de is Standard German (Deutsch).
use ;
let chain = try_init_chain_from_slice?;
// <(id, score)>:
// [ ("gsw-LI", 50), ("gsw", 37), ("gsw-FR", 37), ("de-LI", 27), ("de", 26),
// ("de-AT", 23), ("de-BE", 23), ("de-CH", 23), ("de-LU", 23), ("de-IT", 22) ]
let v = conv_to_str_chain;
assert_eq!;
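Turning `(id, score)` pairs like the ones above into a chain amounts to ordering them by score. The following is a minimal sketch of that step only; `build_chain` is a hypothetical helper, not a glossa function, and since the tie-breaking rule is not stated here, the sketch simply keeps the input order for equal scores.

```rust
/// Orders scored candidates into a fallback chain, highest score first.
/// `sort_by` is a stable sort, so candidates with equal scores keep
/// their relative input order (an assumption made for this sketch).
pub fn build_chain<'a>(mut scored: Vec<(&'a str, u8)>) -> Vec<&'a str> {
    scored.sort_by(|a, b| b.1.cmp(&a.1));
    scored.into_iter().map(|(id, _)| id).collect()
}
```

For example, passing `[("de", 26), ("gsw", 37), ("gsw-LI", 50), ("gsw-FR", 37)]` yields gsw-LI first, then gsw and gsw-FR in their original relative order, then de.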
Practical Usage
Implement corresponding logic based on the localization resource (L10n Map) types generated by glossa-codegen.
Code Generation
use ;
let generator = default
.with_resources
.with_visibility;
The Generator supports outputting various types.
If you invoke generator.output_match_fn_all_in_one_by_language_and_key(MapType::Regular)?, the generated code will resemble:
pub const
Invoking generator.output_locales_fn(MapType::Regular, true)? generates:
// super: use glossa_shared::lang_id;
pub const
LocaleContext
Next, implement logic to look up localized text based on the types generated by codegen.
As shown above, codegen produces a match_fn.
Given the function definition: const fn map(language: &[u8], key: &[u8]) -> &'static str, the lookup logic is:
let lookup = ;
If the generated function uses map(language, map_name, key), adjust the lookup accordingly:
let lookup = ;
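To illustrate how a match function and a locale chain fit together, here is a minimal sketch. The `map` function below is a hand-written stand-in with the same parameter and return types as the generated one (the generated version is a `const fn`), its entries are made up, and treating an empty string as "not found" is an assumption of this sketch rather than documented glossa behavior. `lookup` is a hypothetical helper.

```rust
/// Stand-in for a generated match function (hypothetical entries).
/// Assumption: a missing entry is signalled by an empty string.
pub fn map(language: &[u8], key: &[u8]) -> &'static str {
    match (language, key) {
        (b"en", b"hello") => "Hello",
        (b"en", b"bye") => "Goodbye",
        (b"pt", b"hello") => "Olá",
        _ => "",
    }
}

/// Walks the chain in priority order and returns the first non-empty text.
pub fn lookup(chain: &[&str], key: &str) -> Option<&'static str> {
    chain
        .iter()
        .map(|lang| map(lang.as_bytes(), key.as_bytes()))
        .find(|text| !text.is_empty())
}
```

With the chain `["pt-PT", "pt", "en"]`, the key `hello` resolves through pt, while `bye` falls through to en because no Portuguese entry exists in this stand-in.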
For binary serialized data (e.g., bincode), deserialize it into a HashMap or BTreeMap, then use .get() to look up entries.
let map = decode_file_to_maps?;
let lookup = ;
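The map-based lookup can be sketched in the same way. The shape of the decoded data is an assumption here: the sketch uses a plain nested `HashMap` keyed by locale and then by message key, which may differ from the types glossa actually produces. `L10nMaps` and `lookup_in_maps` are hypothetical names.

```rust
use std::collections::HashMap;

/// Assumed layout for this sketch: locale id -> (message key -> text).
pub type L10nMaps = HashMap<String, HashMap<String, String>>;

/// Returns the text from the first locale in the chain that contains `key`.
pub fn lookup_in_maps<'m>(maps: &'m L10nMaps, chain: &[&str], key: &str) -> Option<&'m str> {
    chain
        .iter()
        // `?` skips locales that have no map at all.
        .find_map(|lang| maps.get(*lang)?.get(key))
        .map(String::as_str)
}
```

The same function body works with `BTreeMap` in place of `HashMap`, since both offer a `.get()` method with the same shape.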
Trait Example
use ;
pub
Bilingual Example
Scenario 1:
In resource-constrained environments, Chinese characters may fail to display properly. In such cases, we can switch the localization language to zh-pinyin (Chinese romanization).
However, because Mandarin Chinese has many homophones, text written only in Pinyin (without Chinese characters) can be ambiguous in certain contexts.
This is precisely where the bilingual functionality shines brightly ✨!
The "bilingual functionality" must be manually implemented.
// en-GB, zh-pinyin
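Since the bilingual behavior has to be written by hand, one possible shape is shown below. This is only a sketch under stated assumptions: `bilingual` is a hypothetical helper, the "primary (secondary)" output format is an arbitrary choice, and the two inputs are expected to come from whatever lookup logic you implemented for each locale.

```rust
/// Combines the texts found for two locales into one display string.
/// - Both found and different: "primary (secondary)".
/// - Only one found, or both identical: that text alone.
/// - Neither found: None.
pub fn bilingual(primary: Option<&str>, secondary: Option<&str>) -> Option<String> {
    match (primary, secondary) {
        (Some(p), Some(s)) if p != s => Some(format!("{p} ({s})")),
        (Some(t), _) | (None, Some(t)) => Some(t.to_owned()),
        (None, None) => None,
    }
}
```

For instance, a Pinyin text paired with its English counterpart lets the second language disambiguate the first without requiring Chinese characters to render.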