glossa-codegen
glossa-codegen is used to generate Rust code (with localized texts) and bincode files.
Note: Although glossa-codegen requires std, both glossa and glossa-shared support no-std environments.
- glossa-codegen is used to generate production code.
- glossa is used to build fallback chains.
- glossa-shared provides data types required by production code.
You only need to include glossa-codegen in #[test] tests or build.rs, not in production code.
Core Concepts
Language ID and Map Name
Assume a locales directory with the following structure:
locales
├── ar
│ └── error.yaml
├── en
│ ├── error.yaml
│ └── yes-no.toml
├── es
│ └── yes-no.toml
├── fr
│ └── yes-no.toml
├── ru
│ └── error.yaml
└── zh
├── error.yaml
└── yes-no.toml
Here, "ar", "en", "es", "fr", "ru", "zh" are Language IDs.
"error" and "yes-no" are Map Names.
Different file stems (e.g., a.toml, b.json) correspond to different map names.
What about identical stems (e.g., a.toml and a.json)?
Q: If multiple files with the same stem exist (e.g., error.yaml, error.yml, error.toml, etc.), which one becomes the actual "error" map?
A: If all files are valid and non-empty K-V pairs, it depends on luck! Otherwise, the first valid file with the same stem becomes the map.
Note: a.toml (stem: a) and a.dsl.toml (stem: a.dsl) are not considered the same. However, en/a.toml and en/subdir/a.json are considered the same stem.
Q: Why does it depend on luck?
A: Because during the initialization of localization resources, we utilize Rayon for parallel multi-threaded deserialization (multiple files are read and parsed across threads).
The execution order is non-deterministic.
Thread scheduling and file processing completion timing depend on runtime conditions, making the final initialization result probabilistically variable.
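The stem rule described above can be checked with the standard library alone. The sketch below is only an illustration, not glossa-codegen's implementation: map_name is a hypothetical helper, and it assumes the map name is simply the file stem (the file name without its last extension), with parent directories ignored.

```rust
use std::path::Path;

/// Hypothetical helper: derives a map name from a resource path by taking
/// the file stem. Only the last extension is stripped, and directories
/// such as `en/subdir/` play no part in the result.
fn map_name(path: &str) -> Option<String> {
    Path::new(path)
        .file_stem()
        .and_then(|stem| stem.to_str())
        .map(str::to_owned)
}
```

Under this assumption, en/a.toml and en/subdir/a.json both map to "a", while en/a.dsl.toml maps to "a.dsl".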
L10n Data
| L10n Type | Description |
|---|---|
| Raw Text Files | Unprocessed source files (e.g., en/hello.toml) |
| Generated Rust Code | Hardcoded into the program via const fn |
| Bincode | Binary files for efficient deserialization |
Raw files can be seen as source code, while other formats are compiled from them.
Raw L10n Text Syntax
Standard K-V Pairs
The most basic type.
TOML example:
world = "世界"
"🐱" = "喵 ฅ(°ω°ฅ)"
JSON5 example:
{
// JSON5 supports comments
"world": "世界",
"🐱": "喵 ฅ(°ω°ฅ)", // Trailing commas allowed
}
glossa-DSL
DSL: Domain-Specific Language
Learn glossa-dsl's 5 syntax rules in 5 minutes:
1. Basic key = "value"
- TOML: name = "Tom"
- JSON: {"name": "Tom"}
2. References
TOML:
name = "Tom"
hello = "Hello { name }"
- hello references {name} (whitespace inside braces is ignored).
- Result: "Hello Tom".
JSON5:
{
"hello": "Hello {🐱}",
"🐱": "ฅ(°ω°ฅ)",
}
①. hello references {🐱}.
②. Result: "Hello ฅ(°ω°ฅ)".
3. External Arguments
TOML:
good-morning = "Good morning, { $🐱 }"
= "{good-morning}, { $name }!"
{ $🐱 } and { $name } require external arguments.
Rust:
let ctx = ;
let text = res.get_with_context?;
assert_eq!;
Difference between { 🐱 } and { $🐱 }
- { 🐱 }: Internal reference.
- { $🐱 }: Requires an external argument.
Internal reference:
"🐱" = "ฅ(°ω°ฅ)"
= "{ 🐱 }"
Requires an external argument:
= "{ $🐱 }"
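To make the distinction concrete, here is a minimal sketch of how the two placeholder kinds could be resolved. It is not the glossa-dsl parser: resolve is a hypothetical helper, it ignores selectors and escape syntax, and it does not guard against reference cycles.

```rust
use std::collections::HashMap;

/// Hypothetical resolver (not glossa-dsl's API).
/// `{ key }` is looked up in `map` and resolved recursively;
/// `{ $arg }` is looked up in the externally supplied `args`.
/// Returns `None` if a key, an argument, or a closing brace is missing.
fn resolve(
    key: &str,
    map: &HashMap<&str, &str>,
    args: &HashMap<&str, &str>,
) -> Option<String> {
    let template = map.get(key)?;
    let mut out = String::new();
    let mut rest = *template;

    while let Some(start) = rest.find('{') {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let end = start + rest[start..].find('}')?;
        // Whitespace inside the braces is ignored.
        let name = rest[start + 1..end].trim();
        match name.strip_prefix('$') {
            // External argument.
            Some(arg) => out.push_str(args.get(arg.trim())?),
            // Internal reference to another entry in the same map.
            None => out.push_str(&resolve(name, map, args)?),
        }
        rest = &rest[end + 1..];
    }
    out.push_str(rest);
    Some(out)
}
```

With a map containing name = "Tom" and hello = "Hello { name }", resolving hello yields "Hello Tom" without any arguments, whereas an entry that contains { $name } resolves only when name is present in args.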
4. Selectors (Conditional Logic)
zh/unread.toml:
"阿拉伯数字转汉字" = """
$num ->
[0] 〇
[1] 一
[2] 二
[3] 三
[10] 十
*[其他] {$num}
"""
"未读msg" = "未读消息"
"显示未读消息数量" = """
$num ->
[0] 没有{ 未读msg }
[2] 您有两条{ 未读msg }
*[其他] 您有{ 阿拉伯数字转汉字 }条{ 未读msg }
"""
= "{显示未读消息数量}。"
rust:
let get_text = ;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
en/unread.toml:
num-to-en = """
$num ->
[0] zero
[1] one
[2] two
[3] three
*[other] {$num}
"""
unread_msg = "unread message"
unread-count = """
$num ->
[0] No {unread_msg}s.
[1] You have { num-to-en } {unread_msg}.
*[other] You have { num-to-en } {unread_msg}s.
"""
= "{unread-count}"
rust:
let get_text = ;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
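A selector behaves much like a Rust match with a wildcard arm: the starred branch is the fallback. The sketch below hand-translates the English selectors from en/unread.toml into plain Rust to show the intended results. The function names are hypothetical, and this is not what glossa generates.

```rust
/// Mirrors the `num-to-en` selector: known numbers become words,
/// everything else falls through to the `*[other]` branch.
fn num_to_en(num: u32) -> String {
    match num {
        0 => "zero".to_owned(),
        1 => "one".to_owned(),
        2 => "two".to_owned(),
        3 => "three".to_owned(),
        other => other.to_string(),
    }
}

/// Mirrors the `unread-count` selector, including its references to
/// `unread_msg` and `num-to-en`.
fn unread_count(num: u32) -> String {
    let unread_msg = "unread message";
    match num {
        0 => format!("No {unread_msg}s."),
        1 => format!("You have {} {unread_msg}.", num_to_en(num)),
        _ => format!("You have {} {unread_msg}s.", num_to_en(num)),
    }
}
```

For example, unread_count(2) gives "You have two unread messages." and unread_count(100) gives "You have 100 unread messages.".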
5. Escape Syntax
In the above context, we learned that { a } represents an internal reference, while { $a } depends on the externally passed argument a.
Q: How can we preserve the raw {a } format and prevent its automatic parsing?
A: Use escape syntax with nested braces:
• To preserve {a }, wrap it in two layers of braces: {{ {a } }}
• To preserve {{a }, wrap it in three layers of braces: {{{ {{a } }}}
- "{{ a }}" => "a"
- "{{{a}}}" => "a"
- "{{{{ a }}}}" => "a"
- "{{ {a} }}" => "{a}"
- "{{a}" => ❌ nom Error, code: take_until
- "{{{ {{a}} }}}" => "{{a}}"
- "{{{ {{ a }} }}}" => "{{ a }}"
- "{{{ {{a} }}}" => "{{a}"
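One way to read the examples above: count the opening braces, look for the first run of the same number of closing braces, and keep the trimmed text in between. The sketch below implements exactly that reading for a string consisting of a single escaped block. It is a hypothetical helper that reproduces the listed cases, not the actual nom-based parser.

```rust
/// Hypothetical helper: unescapes a string that starts with `n >= 2`
/// opening braces. Returns `None` when the matching run of `n` closing
/// braces cannot be found (the error case in the list above).
fn unescape(s: &str) -> Option<&str> {
    // Number of leading `{` characters.
    let n = s.bytes().take_while(|b| *b == b'{').count();
    if n < 2 {
        return None;
    }
    let closing = "}".repeat(n);
    let body = &s[n..];
    // First occurrence of `n` consecutive closing braces.
    let end = body.find(closing.as_str())?;
    Some(body[..end].trim())
}
```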
MapType
- Regular: Standard K-V pairs.
- Highlight: K-V pairs with syntax highlighting.
- RegularAndHighlight: Combines Regular and Highlight.
- DSL: Outputs the AST of glossa-DSL (not raw DSL).
AST: Abstract Syntax Tree
L10nResources (Localization Resources)
;
- dir: Path to the directory containing localization resources, e.g., "./locales".
- dsl_suffix:
  - Suffix for glossa-DSL files (default: ".dsl").
  - When set to ".dsl":
    - "a.dsl.toml" is recognized as a glossa-DSL file.
    - "b.dsl.json" is also recognized as a glossa-DSL file.
    - "a.toml" is treated as a regular file.
- include_languages:
  - Whitelist mode. If non-empty, only language IDs in the list will be initialized.
  - Example: All language IDs are ["de", "en", "es", "pt", "ru", "zh"].
    - .with_include_languages(["en", "zh"]) ⇒ Only resources for "en" and "zh" are initialized.
- include_map_names:
  - If non-empty, only map names in the list will be initialized.
  - Example: Files include "en/a.toml", "en/b.json", "zh/a.json", "zh/b.ron".
    - All map names are ["a", "b"].
    - .with_include_map_names(["a"]) ⇒ Only "en/a.toml" and "zh/a.json" are initialized.
- exclude_languages:
  - Blacklist mode. Language IDs in the list will not be initialized.
  - Example: Language IDs are ["de", "en", "es", "pt", "ru", "zh"].
    - .with_exclude_languages(["en", "es", "ru"]) ⇒ Initializes ["de", "pt", "zh"].
    - .with_include_languages(["en", "es"]).with_exclude_languages(["en"]) ⇒ Initializes ["es"].
- exclude_map_names:
  - Map names in the list will not be initialized.
  - Example: Files include "en/a.toml", "en/b.json", "zh/a.json", "zh/b.ron", "zh/c.toml".
    - .with_exclude_map_names(["a"]) ⇒ Initializes "en/b.json", "zh/b.ron", "zh/c.toml".
    - .with_include_map_names(["b", "c"]).with_exclude_map_names(["b"]) ⇒ Initializes "zh/c.toml".
    - .with_include_languages(["en"]).with_exclude_map_names(["a"]) ⇒ Initializes "en/b.json".
- lazy_data:
  - Data initialized lazily at runtime.
  - Accessed via .get_or_init_data(), equivalent to a cache.
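The include and exclude rules above combine as "whitelist first, then blacklist". The following sketch restates that logic with hypothetical helper functions. It is an illustration of the documented behavior, not the crate's internal code.

```rust
/// Hypothetical helper: an ID is kept when the whitelist is empty or
/// contains it, and the blacklist does not contain it.
fn is_selected(id: &str, include: &[&str], exclude: &[&str]) -> bool {
    let whitelisted = include.is_empty() || include.contains(&id);
    whitelisted && !exclude.contains(&id)
}

/// Applies `is_selected` to every known ID, preserving the input order.
fn select<'a>(all: &[&'a str], include: &[&str], exclude: &[&str]) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|id| is_selected(id, include, exclude))
        .collect()
}
```

The same function works for both language IDs and map names, since the two filter pairs follow the same rules.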
| Method | Description |
|---|---|
| .get_dir() | Retrieves the directory path. |
| .with_dir("/path/to/new_dir".into()) | Sets the L10n directory path. |
| .get_dsl_suffix() | Retrieves the DSL suffix. |
| .with_dsl_suffix(".new_suffix".into()) | Sets the DSL suffix. |
| .with_include_languages([]) | Configures the language whitelist. |
| .with_include_map_names([]) | Configures the map name whitelist. |
| .with_exclude_languages([]) | Configures the language blacklist. |
| .with_exclude_map_names([]) | Configures the map name blacklist. |
| .get_or_init_data() | Retrieves &HashMap<KString, Vec<L10nMapEntry>>, initializing data if needed. |
| .with_lazy_data(OnceLock::new()) | Resets lazy_data by replacing it with a new uninitialized OnceLock. |
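The lazy_data pattern can be illustrated with std::sync::OnceLock on its own. The Resources struct below is a hypothetical stand-in with a simplified payload. It only shows why replacing the OnceLock acts as a cache reset, and it makes no claim about how L10nResources is implemented.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a resource struct with a lazily built cache.
struct Resources {
    dir: String,
    lazy_data: OnceLock<Vec<String>>,
}

impl Resources {
    fn new(dir: &str) -> Self {
        Self { dir: dir.to_owned(), lazy_data: OnceLock::new() }
    }

    /// Builds the data on first access; later calls return the cached value.
    fn get_or_init_data(&self) -> &Vec<String> {
        self.lazy_data
            .get_or_init(|| vec![format!("loaded from {}", self.dir)])
    }

    fn with_dir(mut self, dir: &str) -> Self {
        self.dir = dir.to_owned();
        self
    }

    /// Replacing the cell with a fresh one discards the cached data.
    fn with_lazy_data(mut self, lazy_data: OnceLock<Vec<String>>) -> Self {
        self.lazy_data = lazy_data;
        self
    }
}
```

In this sketch, changing dir after the first access leaves the cached data untouched until lazy_data is replaced with a fresh OnceLock.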
Q: How to construct a new L10nResources struct?
A:
use glossa_codegen::L10nResources;
let _res = L10nResources::new("locales");
// Equivalent to: L10nResources::default().with_dir("locales".into())
The "locales" path can be replaced with other directories, e.g., "../../l10n/".
Generator
- resources: Localization resources.
- visibility:
  - Visibility of the generated Rust code (default: PubCrate).
  - glossa_codegen::Visibility { Private, PubCrate, Pub, PubSuper }
  - .with_visibility(Visibility::Pub) ⇒ Generates pub const fn xxx.
  - .with_visibility(Visibility::PubCrate) ⇒ Generates pub(crate) const fn xxx.
- outdir:
  - Directory for outputting Rust code and bincode files.
- bincode_suffix:
  - Suffix for bincode files (default: ".bincode").
- mod_prefix:
  - Module prefix for generated Rust code (default: "l10n_").
- highlight:
  - Syntax highlighting configuration (slightly complex, discussed in the advanced usage section).
- lazy_maps:
  - Lazily initialized maps.
  - Related methods:
    - .get_or_init_maps() // Regular
    - .get_or_init_highlight_maps() // Highlight
    - .get_or_init_merged_maps() // RegularAndHighlight
    - .get_or_init_dsl_maps() // Template
Constructing a Generator
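// Import paths and argument values are assumed from the surrounding text.
use glossa_codegen::{Generator, L10nResources};
let resources = L10nResources::new("locales");
let generator = Generator::default()
    .with_resources(resources)
    .with_outdir("tmp");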
Output Methods
- const fn with internal match expressions
  - Calling .output_match_fn(MapType::Regular) generates a Rust const fn.
- PHF map functions
  - Calling .output_phf(MapType::Regular) generates a Rust PHF map function.
- Bincode
  - Calling .output_bincode(MapType::Regular) generates binary bincode files.
MapType::DSL can only output to bincode, while other MapTypes support all output formats.
You can treat DSL as a Regular Map (e.g., by modifying L10nResources's dsl_suffix), but this offers no performance benefit. Parsing the AST of DSL is faster than parsing raw DSL.
- When DSL is treated as Regular, the generated code contains raw K-V pairs. At runtime, these must first be parsed into AST.
- Directly outputting MapType::DSL as bincode serializes the DSL's AST instead of raw K-V pairs.
Code Generation: Const Functions with match Expressions
Key methods:
- .output_match_fn()
  - Generates separate Rust files per language.
  - Output path: {outdir}/{mod_prefix}{snake_case_language}.rs
  - Example:
    - en → tmp/l10n_en.rs
    - en-GB → tmp/l10n_en_gb.rs
- .output_match_fn_all_in_one()
  - Aggregates all languages into a single const fn.
- .output_match_fn_all_in_one_by_language()
  - Aggregates all languages into a single const fn.
  - Use only if both map_name and key are unique to avoid conflicts.
- .output_match_fn_all_in_one_by_language_and_key()
  - Aggregates all languages into a single const fn.
  - Use only if map_name is unique to avoid key conflicts.
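The per-language output path follows directly from the pattern {outdir}/{mod_prefix}{snake_case_language}.rs. As a rough sketch, module_file below is a hypothetical helper that assumes snake_case conversion of a language ID amounts to lowercasing it and replacing hyphens with underscores, which matches the two examples given.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds the path of the generated Rust file for one
/// language. Assumes "snake case" means lowercase with `-` replaced by `_`.
fn module_file(outdir: &str, mod_prefix: &str, language: &str) -> PathBuf {
    let snake = language.to_ascii_lowercase().replace('-', "_");
    Path::new(outdir).join(format!("{mod_prefix}{snake}.rs"))
}
```

With the default prefix "l10n_" and outdir "tmp", en-GB maps to tmp/l10n_en_gb.rs.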
output_match_fn()
Given:
- l10n/en-GB/error.toml: text-not-found = "No localised text found"
- l10n/de/error.yml: text-not-found: Kein lokalisierter Text gefunden
Code:
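// Import paths are assumed; adjust them to the crate's actual module layout.
use glossa_codegen::{generator::MapType, Generator, L10nResources};
let resources = L10nResources::new("l10n");
Generator::default()
    .with_resources(resources)
    .with_outdir("tmp")
    .output_match_fn(MapType::Regular)?;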
Output (tmp/l10n_en_gb.rs):
pub const
Output (tmp/l10n_de.rs):
pub const
output_match_fn_all_in_one()
Q: What do we get if we use output_match_fn_all_in_one()?
A: We will receive a String containing the function data.
All localization resources for every language are consolidated into a single function.
let function_data = generator.output_match_fn_all_in_one(MapType::Regular)?;
Output (function_data):
pub const
output_match_fn_all_in_one_by_language_and_key()
TLDR:
- If map_name
  - is unique, using output_match_fn_all_in_one_by_language_and_key() can improve performance.
  - is not unique, use .output_match_fn_all_in_one() instead.
When map_name is unique, we can omit it for performance optimization.
match (lang, key) { /* arms */ }
match (lang, map_name, key) { /* arms */ }
Comparing these two match expressions:
- The first matches two items (lang and key).
- The second matches three items (lang, map_name, and key).
Theoretically, the first is faster because each arm compares fewer items.
output_match_fn_all_in_one_by_language_and_key() generates code similar to the first approach.
If you aren’t concerned with nanosecond-level optimizations, you can safely skip this section.
When map_name is unique (e.g., yes-no):
- en/yes-no { yes: "Yes", no: "No" }
- de/yes-no { yes: "Ja", no: "Nein" }
Calling .output_match_fn_all_in_one_by_language_and_key(Regular)?
Output:
pub const
When map_name is not unique (for example, after adding a new entry such as en/yes-no2 { yes: "YES", no: "NO", ok: "OK" }), different map_names may contain identical keys (e.g., "yes" and "no"), causing key conflicts. In such cases, map_name can no longer be omitted.
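The conflict is easy to see in a simplified analogue of the two function shapes. The sketch below uses plain fn items and &str arguments for readability. The function names, signatures, and empty-string fallback are assumptions, not the code that glossa-codegen emits.

```rust
/// Two-item lookup: only valid while every (lang, key) pair is unique,
/// i.e. while there is a single map name.
fn by_language_and_key(lang: &str, key: &str) -> &'static str {
    match (lang, key) {
        ("en", "yes") => "Yes",
        ("en", "no") => "No",
        ("de", "yes") => "Ja",
        ("de", "no") => "Nein",
        _ => "",
    }
}

/// Three-item lookup: the map name disambiguates identical keys, so
/// "yes" in yes-no and "yes" in yes-no2 can coexist.
fn by_language_map_and_key(lang: &str, map_name: &str, key: &str) -> &'static str {
    match (lang, map_name, key) {
        ("en", "yes-no", "yes") => "Yes",
        ("en", "yes-no", "no") => "No",
        ("en", "yes-no2", "yes") => "YES",
        ("en", "yes-no2", "no") => "NO",
        ("en", "yes-no2", "ok") => "OK",
        ("de", "yes-no", "yes") => "Ja",
        ("de", "yes-no", "no") => "Nein",
        _ => "",
    }
}
```

Once yes-no2 exists, the two-item form has no way to tell "Yes" from "YES" for the pair ("en", "yes"), which is why the three-item form is needed.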
Code Generation: PHF Maps
- .output_phf(): Generates Perfect Hash Function (PHF) maps per language.
- .output_phf_all_in_one(): Aggregates all localization resources into a single string containing serialized PHF map data.
output_phf()
use ;
pub
es_generator.output_phf?;
tmp/l10n_es.rs
pub const
Q: Wait, where do PhfL10nOrderedMap and PhfTupleKey come from?
A: These types are defined in the
output_phf_all_in_one()
let data = new
.with_include_languages
.with_include_map_names;
let function_data = default.with_resources.output_phf_all_in_one?;
function_data:
pub const
Bincode
- output_bincode(): Serializes data into a separate bincode file for each language.
  - => {outdir}/{language}{bincode_suffix}
    - en => tmp/en{bincode_suffix} => tmp/en.bincode
    - en-GB => tmp/en-GB{bincode_suffix} => tmp/en-GB.bincode
- output_bincode_all_in_one(): Aggregates all language data into a single bincode file.
  - => {outdir}/all{bincode_suffix}
    - => tmp/all{bincode_suffix} => tmp/all.bincode
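The two naming patterns above can be written out as a small path builder. bincode_file is a hypothetical helper for illustration only. It takes Some(language) for the per-language file and None for the all-in-one file.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: `{outdir}/{language}{bincode_suffix}` for a single
/// language, or `{outdir}/all{bincode_suffix}` when no language is given.
/// Unlike generated Rust module names, the language ID is used verbatim.
fn bincode_file(outdir: &str, language: Option<&str>, bincode_suffix: &str) -> PathBuf {
    let stem = language.unwrap_or("all");
    Path::new(outdir).join(format!("{stem}{bincode_suffix}"))
}
```

A custom suffix simply replaces the default: with the suffix "_dsl.bincode", the file for en becomes tmp/en_dsl.bincode.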
output_bincode()
../../locales/en/unread.dsl.toml:
num-to-en = """
$num ->
[0] zero
[1] one
[2] two
[3] three
*[other] {$num}
"""
unread = "unread message"
unread-count = """
$num ->
[0] No {unread}s.
[1] You have { num-to-en } {unread}.
*[other] You have { num-to-en } {unread}s.
"""
= "{unread-count}"
rust:
use ;
use decode_single_file_to_dsl_map;
use Path;
// -------------------
// Encode
let resources = cratenew;
// Output to tmp/{language}_dsl.bincode
default
.with_resources
.with_outdir
.with_bincode_suffix
.output_bincode?;
// ------------------
// Decode
let file = new.join;
let dsl_maps = decode_single_file_to_dsl_map?;
let unread_resolver = dsl_maps
.get
.expect;
let get_text = ;
let one = get_text?;
assert_eq!;
let zero = get_text?;
assert_eq!;
Ok
Advanced Usage
Syntax Highlighting
TLDR: Pre-render syntax-highlighted texts into constants for performance.
glossa-codegen supports rendering localized texts into syntax-highlighted content and converting them into Rust code and bincode.
Q: Why pre-render syntax highlighting?
A: For performance optimization.
- Directly outputting pre-rendered &'static str constants is orders of magnitude faster than rendering syntax highlighting at runtime using regex.
Q: Where are pre-rendered syntax-highlighted strings useful?
A: Ideal for CLI applications.
- Use pre-rendered highlighted strings for help messages, ensuring high performance and readability.

Data Structures
pub type HighlightCfgMap<'h> = ;
Basic Usage:
generator
.with_highlight
.output_bincode;
Note: The above code will not run until HighlightCfgMap is properly configured. Replace HighlightCfgMap::default() with valid data to make it work.
Key Concepts:
HighlightCfgMap applies different syntax highlighting configurations to multiple maps.
Example Path Structure:
en/
├── help-markdown.toml // Base map: help-markdown
└── a-zsh.toml // Base map: a-zsh
Example Configuration (Pseudocode)
<
// help-markdown_monokai
,
// help-markdown_ayu
,
// a-zsh_custom2
>
Key Rules
- DerivedMapKey
  - base_name: References an existing regular map (e.g., "help-markdown").
  - suffix: Appended to base_name to create a new derived map (e.g., "help-markdown_monokai").
  - Avoid naming conflicts: Ensure format!("{base_name}{suffix}") does not clash with existing map names.
- SyntaxHighlightConfig
  - syntax_name: The language syntax (e.g., "md" for Markdown).
    - If unsupported, load a custom SyntaxSet via HighlightResource.
  - true_color:
    - Enable for terminals supporting 24-bit color (e.g., modern terminals).
    - Disable for terminals limited to 256-color (e.g., macOS 15.3 Terminal.app (v2.14)).
- HighlightResource
  - For details, see the hlight documentation.
Example
let highlight_generator = default
.with_resources
.with_outdir
.with_highlight
.with_bincode_suffix;
highlight_generator.output_bincode_all_in_one