golia-pinyin: self-developed Mandarin Pinyin input method engine.
The engine surface is complete (segmenter, fuzzy, FST dict, encode, session)
and ships with a 919k-entry corpus-derived dict (Unihan + jieba + Leipzig +
SUBTLEX) plus L0 user-learning ranking (3-pick auto-pin). The published crate
version stays at 0.1.0 per the publish strategy in lab8-ime ROADMAP
item 35; internal milestone names (v0.2-data, v0.3-l0) refer to data and
feature readiness, not to crate versions. See workspace ROADMAP.
Sibling library: wubi — same
architectural pattern (PHF static tables, FST main dict, zero-alloc hot
path).
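The "3-pick auto-pin" behaviour of the L0 layer can be pictured roughly as below. This is a minimal sketch, not the crate's implementation: `L0Sketch`, `record_pick`, `rank`, and the threshold value of 3 are assumptions made for illustration. The crate does re-export a `PROMOTE_THRESHOLD`, but its value and the exact promotion rule are not stated on this page.

```rust
use std::collections::HashMap;

/// Assumed value, inferred from the phrase "3-pick auto-pin".
const PROMOTE_THRESHOLD_SKETCH: u32 = 3;

/// Hypothetical user-learning layer: counts how often the user
/// committed a given word for a given pinyin string.
#[derive(Default)]
pub struct L0Sketch {
    picks: HashMap<(String, String), u32>,
}

impl L0Sketch {
    /// Record that the user committed `word` for `pinyin`.
    pub fn record_pick(&mut self, pinyin: &str, word: &str) {
        *self
            .picks
            .entry((pinyin.to_string(), word.to_string()))
            .or_insert(0) += 1;
    }

    /// Reorder the base (dictionary) candidates so that words picked at
    /// least `PROMOTE_THRESHOLD_SKETCH` times come first. The relative
    /// order inside each group is the base order.
    pub fn rank(&self, pinyin: &str, base: &[&str]) -> Vec<String> {
        let mut pinned = Vec::new();
        let mut rest = Vec::new();
        for &word in base {
            let count = self
                .picks
                .get(&(pinyin.to_string(), word.to_string()))
                .copied()
                .unwrap_or(0);
            if count >= PROMOTE_THRESHOLD_SKETCH {
                pinned.push(word.to_string());
            } else {
                rest.push(word.to_string());
            }
        }
        pinned.extend(rest);
        pinned
    }
}
```

The base dictionary order is never mutated in this sketch; the learning layer only reorders at query time, which matches the description of L0 as a layer on top of an immutable dict.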
§Quickstart
use golia_pinyin::{PinyinEngine, Session};
let engine = PinyinEngine::new();
let mut session = Session::new(&engine);
for c in "zhongguo".chars() {
    session.input_char(c);
}
let cands = session.candidates();
assert_eq!(cands.first().map(String::as_str), Some("中国"));
§Module map
- syllable: inventory of 403 valid Mandarin syllables (PHF set)
- fuzzy: toggleable fuzzy-pair expansion (z↔zh etc.)
- segmenter: DP segmentation of continuous pinyin strings
- dict: FST-backed pinyin → words lookup with L0 user-learning
- encode: char → readings reverse lookup
- engine: immutable PinyinEngine (dict + fuzzy)
- session: mutable Session holding the user's input buffer
- ranking: L0 snapshot type for host-side persistence
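To illustrate what DP segmentation of a continuous pinyin string involves, here is a minimal standalone sketch. It does not use the crate's `segment` function, whose signature and cost model are not shown on this page. The name `segment_sketch`, the seven-entry syllable list, and the "fewest syllables wins" objective are all assumptions.

```rust
/// Tiny illustrative subset; the real inventory has 403 entries.
const SYLLABLES: &[&str] = &["zhong", "guo", "xi", "an", "xian", "ni", "hao"];

fn is_syllable(s: &str) -> bool {
    SYLLABLES.contains(&s)
}

/// Split `input` into the fewest valid syllables (assumed objective).
/// Returns `None` if the whole string cannot be covered.
pub fn segment_sketch(input: &str) -> Option<Vec<&str>> {
    // Byte slicing below is only safe for ASCII input.
    if !input.is_ascii() {
        return None;
    }
    let n = input.len();
    // best[i] = Some((syllable_count, start_of_last_syllable)) for input[..i].
    let mut best: Vec<Option<(usize, usize)>> = vec![None; n + 1];
    best[0] = Some((0, 0));
    for end in 1..=n {
        // Mandarin syllables are at most 6 letters long (e.g. "zhuang").
        for start in end.saturating_sub(6)..end {
            if let Some((count, _)) = best[start] {
                if is_syllable(&input[start..end]) {
                    let candidate = count + 1;
                    if best[end].map_or(true, |(c, _)| candidate < c) {
                        best[end] = Some((candidate, start));
                    }
                }
            }
        }
    }
    // Follow the back pointers from the end of the string.
    let mut out = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start) = best[end]?;
        out.push(&input[start..end]);
        end = start;
    }
    out.reverse();
    Some(out)
}
```

Under this objective "xian" stays one syllable rather than splitting into "xi" + "an". A production segmenter would likely keep several alternatives or weight them differently, since both readings are legitimate.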
Re-exports§
pub use dict::PinyinDict;
pub use encode::char_to_pinyin;
pub use encode::covered_char_count;
pub use engine::PinyinEngine;
pub use fuzzy::FuzzyConfig;
pub use ranking::L0Snapshot;
pub use ranking::PROMOTE_THRESHOLD;
pub use segmenter::Segmentation;
pub use segmenter::segment;
pub use session::Session;
pub use syllable::VALID_SYLLABLES;
pub use syllable::count as syllable_count;
pub use syllable::is_valid as is_valid_syllable;
Modules§
- dict
- FST-backed pinyin dictionary with a two-tier ranking model.
- encode
- Reverse lookup: char → Vec<pinyin>.
- engine
- PinyinEngine: immutable assembly of dict + fuzzy config.
- fuzzy
- Fuzzy syllable expansion: z↔zh, c↔ch, s↔sh, n↔l, f↔h, r↔l, in↔ing, en↔eng, an↔ang. Toggleable per pair so users can match their own dialect or typing habits. Expansion happens at lookup time; the dictionary stays canonical (no bloat).
- ranking
- L0 ranking — user-learning layer on top of the immutable dict.
- segmenter
- Pinyin syllable segmentation via dynamic programming.
- session
- Per-input mutable state — accumulates the user’s typing buffer and exposes candidates / commit semantics.
- syllable
- Canonical Mandarin syllable inventory.
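The lookup-time fuzzy expansion described for the fuzzy module can be sketched as follows. This is an illustration under stated assumptions, not the crate's `FuzzyConfig` API: `split_syllable`, `swap_suffix`, and `expand_sketch` are hypothetical names, and enabled pairs are passed as plain slices instead of a config type.

```rust
/// Split a syllable into (initial, final); the initial may be empty.
fn split_syllable(syllable: &str) -> (&str, &str) {
    for two in ["zh", "ch", "sh"] {
        if syllable.starts_with(two) {
            return syllable.split_at(2);
        }
    }
    match syllable.chars().next() {
        Some(c) if c.is_ascii_alphabetic() && !"aeiou".contains(c) => syllable.split_at(1),
        _ => ("", syllable),
    }
}

/// If `fin` ends in one side of the pair, return it with the other side.
fn swap_suffix(fin: &str, a: &str, b: &str) -> Option<String> {
    if let Some(stem) = fin.strip_suffix(a) {
        Some(format!("{}{}", stem, b))
    } else if let Some(stem) = fin.strip_suffix(b) {
        Some(format!("{}{}", stem, a))
    } else {
        None
    }
}

/// Expand one canonical syllable into all spellings reachable through
/// the enabled pairs. The original spelling always comes first.
pub fn expand_sketch(
    syllable: &str,
    initial_pairs: &[(&str, &str)],
    final_pairs: &[(&str, &str)],
) -> Vec<String> {
    let (ini, fin) = split_syllable(syllable);

    let mut initials = vec![ini];
    for &(a, b) in initial_pairs {
        if ini == a {
            initials.push(b);
        } else if ini == b {
            initials.push(a);
        }
    }

    let mut finals = vec![fin.to_string()];
    for &(a, b) in final_pairs {
        if let Some(swapped) = swap_suffix(fin, a, b) {
            finals.push(swapped);
        }
    }

    let mut out = Vec::new();
    for i in &initials {
        for f in &finals {
            // A real engine would also drop spellings that are not
            // valid syllables before querying the dictionary.
            out.push(format!("{}{}", i, f));
        }
    }
    out
}
```

Because the variants are generated per query, the dictionary only needs to store canonical spellings, which is the "no bloat" property noted for the fuzzy module.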