# glossa
[crates.io](https://crates.io/crates/glossa)
[docs.rs](https://docs.rs/glossa)
[License](./License)
<details>
<summary>
<a href="Readme-zh.md">
<img alt="Language/语言" src="./svg/language.svg"/>
</a>
</summary>
- en: English
- [zh: 中文](Readme-zh.md)
- [zh-Hant: 繁體中文](Readme-zh-Hant.md)
</details>
<details open>
<summary>
<img alt="Table of Contents" src="./svg/toc/toc.svg" />
</summary>
- [Locale Fallback Chain](#locale-fallback-chain)
  - [Example: zh-Hans-HK](#example-zh-hans-hk)
  - [Example: en-AU](#example-en-au)
  - [Example: gsw-LI](#example-gsw-li)
- [Practical Usage](#practical-usage)
  - [Code Generation](#code-generation)
  - [LocaleContext](#localecontext)
  - [Trait Example](#trait-example)
  - [Bilingual Example](#bilingual-example)
</details>
## Locale Fallback Chain
The core functionality of the glossa crate:
- It generates an array (the fallback chain) based on the **similarity** between the current locale and all available locales.
- In principle, locales with higher similarity are prioritized.
Q: Why is fallback necessary?
A:
When localized text for the current locale is missing, falling back to a more familiar language (e.g., another variant of the current language) ensures a better user experience.
> A person may master multiple languages (or different variants of the same language).
Assume the current locale is `pt-PT` (Português, Portugal), and the available locales are `pt-PT`, `pt` (Português, Brasil), `es-419` (Español, Latinoamérica), and `en`.
In this case, the i18n library should retrieve localized text in the order `[pt-PT, pt, en]`, not `[pt-PT, en]`.
Ignoring language similarity and directly falling back to `en` not only reduces localization (L10n) coverage but may also increase cognitive load for users.
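The lookup order described above can be sketched with the standard library alone. This is a minimal illustration and not glossa's implementation: the `Texts` alias, the `find_text` helper, and the sample translations are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical storage: locale -> key -> text.
type Texts = HashMap<&'static str, HashMap<&'static str, &'static str>>;

/// Hypothetical helper: returns the first text found for `key`, trying the
/// locales of `chain` in priority order.
fn find_text(chain: &[&str], texts: &Texts, key: &str) -> Option<&'static str> {
  chain
    .iter()
    .find_map(|locale| texts.get(*locale)?.get(key).copied())
}

/// Sample data (made up for this sketch): `pt-PT` is only partially translated.
fn sample_texts() -> Texts {
  HashMap::from([
    ("pt-PT", HashMap::from([("file", "Ficheiro")])),
    ("pt", HashMap::from([("file", "Arquivo"), ("screen", "Tela")])),
    ("en", HashMap::from([("file", "File"), ("screen", "Screen"), ("ok", "OK")])),
  ])
}
```

With the chain `["pt-PT", "pt", "en"]`, the key `screen` resolves to the `pt` text; with `["pt-PT", "en"]`, the same key would fall straight through to English.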
### Example: zh-Hans-HK
Assume the current locale is `zh-Hans-HK`, and the available locales are `zh-Hant-MO`, `zh-SG`, `ru`, `zh-Hant`, `fr`, `zh`, `ar`, `zh-HK`, `en-001`, `lzh`.
After calling `try_init_chain()`, the generated locale chain is: `["zh", "zh-SG", "zh-HK", "zh-Hant-MO", "zh-Hant"]`.
When the log level is `debug` or `trace`, you can see `[... DEBUG glossa::fallback] ...<(id, score)>`:
```rust
[
("zh", 37), // zh-Hans-CN
("zh-SG", 36), // zh-Hans-SG
("zh-HK", 35), // zh-Hant-HK
("zh-Hant-MO", 31),
("zh-Hant", 28) // zh-Hant-TW
]
```
> Higher scores indicate higher priority.
- Exact match: full score (50 points).
- Partial matches:
  - Same language: +20 points.
    - Since the current language is `zh` (Chinese), and no other languages are included in the built-in rules, only `zh` variants appear in the chain.
    - Theoretically, `lzh` (Classical Chinese) shares some similarity with modern Chinese, but it is not included in the built-in fallback rules for `zh-Hans-HK`.
  - Same script: +15 points.
    - The current script is `Hans` (Simplified). `Hans` scores higher than `Hant`.
    - `zh-HK` is essentially `zh-Hant-HK`.
      - Since `Hans` scores higher than `Hant`, and `zh-Hans` resources exist, `zh-HK` does not have the highest score.
  - Matches built-in fallback rules:
    - Full match: +9 points.
    - Partial match (language + script): +6 points.
  - Same region: +4 points.
    - Comparing `zh-Hant` (zh-Hant-TW), `zh-Hant-MO`, and `zh-HK` (zh-Hant-HK):
      - `zh-HK` shares the same region (HK) as the current locale, earning +4 points.
      - `zh-Hant` and `zh-Hant-MO` do not share the HK region, so no bonus.
  - Proximity bonus:
    - Same sub-region (e.g., East Asia): +2 points.
    - Same continent (e.g., Asia): +1 point.
    - Comparing `zh` (zh-Hans-CN) and `zh-SG` (zh-Hans-SG):
      - HK (Hong Kong SAR, China) and CN (Mainland China) are both in East Asia (+2).
      - SG (Singapore) is in Southeast Asia, sharing the same continent (Asia) with HK (+1).
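To make the arithmetic concrete, the sketch below adds up the bonuses for the five candidates in this example. It is not glossa's implementation: the `Features` struct, the enums, and the per-candidate values are hypothetical, chosen as one decomposition that reproduces the logged scores. In particular, which built-in rule bonus each `Hant` candidate receives is inferred from the totals.

```rust
/// How well a candidate matches the built-in fallback rules.
#[derive(Clone, Copy)]
enum RuleMatch { Unmatched, Partial, Full }

/// Geographic closeness between the candidate's region and the current one.
#[derive(Clone, Copy)]
enum Proximity { Distant, SameContinent, SameSubRegion }

/// Hypothetical feature set for one candidate locale.
#[derive(Clone, Copy)]
struct Features {
  same_language: bool,
  same_script: bool,
  rule: RuleMatch,
  same_region: bool,
  proximity: Proximity,
}

/// Adds up the partial-match bonuses listed above.
fn partial_score(f: Features) -> u8 {
  let mut score = 0;
  if f.same_language { score += 20; }
  if f.same_script { score += 15; }
  score += match f.rule {
    RuleMatch::Full => 9,
    RuleMatch::Partial => 6,
    RuleMatch::Unmatched => 0,
  };
  if f.same_region { score += 4; }
  score += match f.proximity {
    Proximity::SameSubRegion => 2,
    Proximity::SameContinent => 1,
    Proximity::Distant => 0,
  };
  score
}

/// Candidates for the current locale `zh-Hans-HK` (assumed feature values).
fn zh_hans_hk_candidates() -> [(&'static str, Features); 5] {
  use Proximity::*;
  use RuleMatch::*;
  // Every candidate here shares the language `zh`.
  let f = |same_script, rule, same_region, proximity| Features {
    same_language: true, same_script, rule, same_region, proximity,
  };
  [
    ("zh", f(true, Unmatched, false, SameSubRegion)),    // 20 + 15 + 2
    ("zh-SG", f(true, Unmatched, false, SameContinent)), // 20 + 15 + 1
    ("zh-HK", f(false, Full, true, SameSubRegion)),      // 20 + 9 + 4 + 2
    ("zh-Hant-MO", f(false, Full, false, SameSubRegion)), // 20 + 9 + 2
    ("zh-Hant", f(false, Partial, false, SameSubRegion)), // 20 + 6 + 2
  ]
}
```

Under these assumptions the totals are 37, 36, 35, 31, and 28, matching the debug output shown earlier.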
### Example: en-AU
Assume the current locale is `en-AU`, with extensive localization resources for various regions (including sparsely populated islands).
From a linguistic similarity perspective, `en-NZ` (New Zealand English) is closer to `en-AU` (Australian English) than `en-GB` (British English).
However, the chain generated by glossa is not guaranteed to be 100% accurate: in the output below, `en-GB` (44) ranks above `en-NZ` (43).
```rust
// <(id, score)>:
[
("en-AU", 50), ("en-GB", 44), ("en-CC", 43), ("en-CX", 43), ("en-NF", 43),
("en-NZ", 43), ("en-UM", 42), ("en-CK", 42), ("en-DG", 42), ("en-FJ", 42),
("en-FM", 42), ("en-KI", 42), ("en-NR", 42), ("en-NU", 42), ("en-PG", 42),
("en-PN", 42), ("en-PW", 42), ("en-SB", 42), ("en-TK", 42), ("en-TO", 42),
("en-TV", 42), ("en-VU", 42), ("en-WS", 42), ("en-AS", 42), ("en-GU", 42),
("en-MH", 42), ("en-MP", 42), ("en-US", 22), ...
]
```
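Turning scored candidates into a chain amounts to ordering them by score. The sketch below shows one way to do that with the standard library; the `into_chain` helper is hypothetical, and how glossa itself orders candidates with equal scores is not specified here.

```rust
/// Orders `(id, score)` pairs from highest to lowest score.
/// `sort_by` is stable, so candidates with equal scores keep their input order.
fn into_chain<'a>(mut scored: Vec<(&'a str, u8)>) -> Vec<&'a str> {
  scored.sort_by(|a, b| b.1.cmp(&a.1));
  scored.into_iter().map(|(id, _)| id).collect()
}
```

For example, `into_chain(vec![("en-US", 22), ("en-NZ", 43), ("en-AU", 50), ("en-GB", 44)])` places `en-AU` first and `en-US` last.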
### Example: gsw-LI
> `gsw` is Swiss German (Schwiizertüütsch), while `de` is Standard German (Deutsch).
```rust
use glossa::{
  error::GlossaError, fallback::conv_to_str_chain,
  try_init_chain_from_slice,
};
let chain = try_init_chain_from_slice(
  // current:
  "gsw-LI",
  // all_locales:
  &[
    "en", "es", "pt", "zh", "gsw", "gsw-FR", "gsw-LI", "de", "de-AT", "de-BE", "de-CH", "de-IT",
    "de-LI", "de-LU",
  ],
)?;
// <(id, score)>:
// [ ("gsw-LI", 50), ("gsw", 37), ("gsw-FR", 37), ("de-LI", 27), ("de", 26),
//   ("de-AT", 23), ("de-BE", 23), ("de-CH", 23), ("de-LU", 23), ("de-IT", 22) ]
let v = conv_to_str_chain(&chain);
assert_eq!(
  v.as_ref(),
  [
    "gsw-LI", "gsw", "gsw-FR", "de-LI", "de", "de-AT", "de-BE", "de-CH",
    "de-LU", "de-IT",
  ]
);
```
## Practical Usage
> Implement corresponding logic based on the **localization resource (L10n Map)** types generated by `glossa-codegen`.
### Code Generation
```rust
use glossa_codegen::{Generator, L10nResources, Visibility, generator::MapType};
let generator = Generator::default()
  .with_resources(L10nResources::new("locales").with_include_map_names(["yes-no"]))
  .with_visibility(Visibility::Pub);
```
The `Generator` supports outputting various types.
If you invoke `generator.output_match_fn_all_in_one_without_map_name(MapType::Regular)?`, the generated code will resemble:
```rust
pub const fn map(language: &[u8], key: &[u8]) -> &'static str {
  match (language, key) {
    (b"cs", b"cancel") => r#####"Zrušit"#####,
    (b"cs", b"no") => r#####"Ne"#####,
    (b"cs", b"yes") => r#####"Ano"#####,
    (b"de", b"cancel") => r#####"Abbrechen"#####,
    (b"de", b"no") => r#####"Nein"#####,
    (b"de", b"yes") => r#####"Ja"#####,
    (b"en", b"cancel") => r#####"Cancel"#####,
    (b"en", b"no") => r#####"No"#####,
    (b"en", b"ok") => r#####"OK"#####,
    (b"en", b"yes") => r#####"Yes"#####,
    (b"es", b"cancel") => r#####"Cancelar"#####,
    (b"es", b"ok") => r#####"Aceptar"#####,
    (b"es", b"yes") => r#####"Sí"#####,
    (b"fr", b"cancel") => r#####"Annuler"#####,
    (b"fr", b"no") => r#####"Non"#####,
    (b"fr", b"yes") => r#####"Oui"#####,
    (b"ja", b"cancel") => r#####"取消"#####,
    (b"ja", b"no") => r#####"いいえ"#####,
    (b"ja", b"ok") => r#####"了解"#####,
    (b"ja", b"yes") => r#####"はい"#####,
    (b"ko", b"cancel") => r#####"취소"#####,
    (b"ko", b"no") => r#####"아니오"#####,
    (b"ko", b"ok") => r#####"확인"#####,
    (b"ko", b"yes") => r#####"예"#####,
    (b"ru", b"no") => r#####"Нет"#####,
    (b"ru", b"yes") => r#####"Да"#####,
    (b"zh-Hant", b"cancel") => r#####"取消"#####,
    (b"zh-Hant", b"no") => r#####"否"#####,
    (b"zh-Hant", b"ok") => r#####"確定"#####,
    (b"zh-Hant", b"yes") => r#####"是"#####,
    (b"zh-Latn-CN", b"cancel") => r#####"QuXiao"#####,
    (b"zh-Latn-CN", b"no") => r#####"Fou"#####,
    (b"zh-Latn-CN", b"ok") => r#####"QueDing"#####,
    (b"zh-Latn-CN", b"yes") => r#####"Shi"#####,
    _ => "",
  }
}
```
Invoking `generator.output_locales_fn(MapType::Regular, true)?` generates:
```rust
// super: use glossa_shared::lang_id;
pub const fn all_locales() -> [super::lang_id::LangID; 10] {
  #[allow(unused_imports)]
  use super::lang_id::RawID;
  use super::lang_id::consts::*;
  [
    lang_id_cs(),
    lang_id_de(),
    lang_id_en(),
    lang_id_es(),
    lang_id_fr(),
    lang_id_ja(),
    lang_id_ko(),
    lang_id_ru(),
    lang_id_zh_hant(),
    lang_id_zh_pinyin(),
  ]
}
```
### LocaleContext
Next, implement the logic to look up localized text based on the types generated by codegen.
As shown above, codegen produces a `match_fn`.
Given the function definition: `const fn map(language: &[u8], key: &[u8]) -> &'static str`, the lookup logic is:
```rust
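let lookup = |(language, key)| match map(language, key) {
  "" => None, // an empty string means the key is missing
  s => Some(s),
};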
```
If the generated function uses `map(language, map_name, key)`, adjust the lookup accordingly:
```rust
let lookup = |(language, map_name, key)| match map(language, map_name, key) {
  "" => None,
  s => Some(s),
};
```
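A self-contained sketch of the three-argument form follows. The `map` function here is a hand-written stand-in with made-up entries (real output comes from `glossa-codegen`), and the `lookup` helper with its plain `&[&str]` chain is an assumption for illustration only.

```rust
/// Stand-in for a generated function that also takes a map name.
const fn map(language: &[u8], map_name: &[u8], key: &[u8]) -> &'static str {
  match (language, map_name, key) {
    (b"de", b"yes-no", b"yes") => "Ja",
    (b"en", b"yes-no", b"yes") => "Yes",
    (b"en", b"yes-no", b"ok") => "OK",
    _ => "",
  }
}

/// Hypothetical helper: tries each language of `chain` in order and treats
/// the empty string as "not found".
fn lookup(chain: &[&str], map_name: &str, key: &str) -> Option<&'static str> {
  chain.iter().find_map(|language| {
    match map(language.as_bytes(), map_name.as_bytes(), key.as_bytes()) {
      "" => None,
      s => Some(s),
    }
  })
}
```

Converting the empty-string sentinel into `Option` at the lookup boundary lets `find_map` skip languages that lack the key and continue down the chain.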
For binary serialized data (e.g., bincode), deserialize it into a `HashMap` or `BTreeMap`, then look up entries with `.get()`.
```rust
let map = glossa_shared::decode::file::decode_file_to_maps(path)?;
// `language` and `tuple_key` are supplied by the caller.
let lookup = |language, tuple_key| {
  map
    .get(language)?
    .get(&tuple_key)
};
```
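The nested `.get()` calls can be illustrated with standard-library maps. The shape below (language, then a `(map_name, key)` tuple) is an assumption made for this sketch; the actual type returned by `decode_file_to_maps` is not shown here and may differ. `Maps`, `lookup`, and `sample_maps` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Assumed shape of the decoded data: language -> (map_name, key) -> text.
type Maps = BTreeMap<String, BTreeMap<(String, String), String>>;

/// Hypothetical helper: returns `None` if either the language or the
/// `(map_name, key)` pair is absent.
fn lookup<'m>(maps: &'m Maps, language: &str, map_name: &str, key: &str) -> Option<&'m str> {
  let tuple_key = (map_name.to_owned(), key.to_owned());
  maps
    .get(language)?
    .get(&tuple_key)
    .map(String::as_str)
}

/// Made-up sample data standing in for a decoded file.
fn sample_maps() -> Maps {
  let mut de = BTreeMap::new();
  de.insert(("yes-no".to_owned(), "yes".to_owned()), "Ja".to_owned());
  let mut maps = Maps::new();
  maps.insert("de".to_owned(), de);
  maps
}
```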
### Trait Example
```rust
use glossa::{LocaleContext, traits::ChainProvider};
// `map` and `all_locales` are the functions generated in the previous section.
trait GetL10nText: ChainProvider {
  fn try_get_by_key<'t>(&self, key: &[u8]) -> Option<&'t str> {
    // An empty string means the key is missing for that language.
    let lookup = |(language, key)| match map(language, key) {
      "" => None,
      s => Some(s),
    };
    // Walk the fallback chain and return the first text found.
    self
      .provide_chain()?
      .iter()
      .map(|id| (id.as_bytes(), key))
      .find_map(lookup)
  }
}
impl GetL10nText for LocaleContext {}
#[test]
pub(crate) fn print_l10n_text() {
  let new_ctx = || LocaleContext::default().with_all_locales(all_locales());
  // #[cfg(any(target_os = "macos", target_os = "linux"))]
  let display = |ctx: &LocaleContext, key: &str| {
    let text = ctx
      .try_get_by_key(key.as_bytes())
      .unwrap_or_else(|| panic!("{}", glossa::Error::new_text_not_found(key)));
    println!("{key}: {text}")
  };
  {
    // set_env_lang("gsw_CH.UTF-8");
    //
    let ctx = new_ctx()
      .with_current_locale(Some(glossa_shared::lang_id::consts::lang_id_gsw()));
    // [("de", 26)]
    for key in ["yes", "no", "ok", "cancel"] {
      display(&ctx, key)
    }
  }
  // Output:
  // yes: Ja
  // no: Nein
  // ok: OK
  // cancel: Abbrechen
  {
    // `set_env_lang` is a helper that is not shown here.
    set_env_lang("zh_MO.UTF-8");
    // new_ctx(); // current_locale => get_static_locale()
    let ctx = new_ctx().with_current_locale(None);
    log::debug!("\n---\n--- current locale => zh-MO");
    // [("zh-Hant", 43), ("zh-Latn-CN", 22)]
    for key in ["yes", "no", "ok", "cancel", "confirm"] {
      display(&ctx, key)
    }
  }
  // Output:
  // yes: 是
  // no: 否
  // ok: 確定
  // cancel: 取消
  // confirm: Confirm
}
```
### Bilingual Example
**Scenario 1**:
In resource-constrained environments, Chinese characters may fail to display properly.
In such cases, we can switch the localization language to **zh-pinyin** (Chinese romanization).
However, Mandarin Chinese has many **homophones**, so text written only in Pinyin (without Chinese characters) can be ambiguous in some contexts.
This is where the **bilingual functionality** shines ✨: showing a second language next to the Pinyin helps resolve the ambiguity.
> The "bilingual functionality" must be **manually implemented**.
---
```rust
#[ignore]
#[test]
// en-GB, zh-pinyin
fn test_bilingual() {
  use glossa_shared::lang_id::consts::{lang_id_en_gb, lang_id_zh_pinyin};
  let new_ctx = |id| {
    LocaleContext::default()
      .with_current_locale(Some(id))
      .with_all_locales(all_locales())
  };
  let zh_pinyin_ctx = new_ctx(lang_id_zh_pinyin());
  let en_gb_ctx = new_ctx(lang_id_en_gb());
  // Returns the first text found for `key` along the context's fallback chain.
  fn get_text<'a>(ctx: &LocaleContext, key: &str) -> Option<&'a str> {
    let key_bytes = key.as_bytes();
    let lookup = |language| match map(language, key_bytes) {
      "" => None,
      x => Some(x),
    };
    ctx
      .get_or_try_init_chain()?
      .iter()
      .map(|id| id.as_bytes())
      .find_map(lookup)
  }
  // Looks up the "cancel" text, panicking if no locale in the chain provides it.
  let get_cancel_text = |ctx: &LocaleContext| {
    get_text(ctx, "cancel").expect("text for `cancel` not found")
  };
  let zh_pinyin_text = get_cancel_text(&zh_pinyin_ctx);
  let en_gb_text = get_cancel_text(&en_gb_ctx);
  // Show the text once if both languages agree, otherwise show both.
  let text = match zh_pinyin_text == en_gb_text {
    true => zh_pinyin_text.into(),
    _ => glossa_shared::fmt_compact!("{en_gb_text}; {zh_pinyin_text}"),
  };
  assert_eq!(text, "Cancel; QuXiao")
}
```
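The combining step at the end of the test can be isolated into a small function. This is a standard-library sketch, not part of glossa: the `bilingual` helper is hypothetical and returns a `String` instead of the compact string type used above.

```rust
/// Joins two translations of the same message; when both are identical,
/// the text is shown only once.
fn bilingual(primary: &str, secondary: &str) -> String {
  if primary == secondary {
    primary.to_owned()
  } else {
    format!("{primary}; {secondary}")
  }
}
```

For instance, `bilingual("Cancel", "QuXiao")` yields `"Cancel; QuXiao"`, while `bilingual("OK", "OK")` yields just `"OK"`.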