Chinese → Pīnyīn conversion with jieba word segmentation.
§Architecture
| Layer | Crate / module | Role |
|---|---|---|
| Segmentation | jieba-rs | word boundaries + context |
| Lookup | src/pinyin_dict.rs | Chinese characters → pinyin_numbers |
| Conversion | pinyin_dict::numbers_to_marks | pinyin_numbers → pinyin_marks |
| Protocol | wasm-minimal-protocol | Typst WASM ABI |
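The conversion layer can be illustrated with a small sketch that turns numbered syllables into tone-marked ones. This is not the crate's actual `pinyin_dict::numbers_to_marks`; the helper names, the handling of `u:` and `v` as stand-ins for `ü`, and the lowercase-only mark table are all assumptions made for illustration. It applies the usual placement rule: `a` or `e` takes the mark, otherwise the `o` in `ou`, otherwise the last vowel.

```rust
// Tone-marked forms for each base vowel, tones 1 to 4 (lowercase only).
const MARKS: [(char, [char; 4]); 6] = [
    ('a', ['ā', 'á', 'ǎ', 'à']),
    ('e', ['ē', 'é', 'ě', 'è']),
    ('i', ['ī', 'í', 'ǐ', 'ì']),
    ('o', ['ō', 'ó', 'ǒ', 'ò']),
    ('u', ['ū', 'ú', 'ǔ', 'ù']),
    ('ü', ['ǖ', 'ǘ', 'ǚ', 'ǜ']),
];

// Look up the marked form of `c`; characters outside the table pass through.
fn mark(c: char, tone: usize) -> char {
    MARKS.iter().find(|(base, _)| *base == c).map_or(c, |(_, m)| m[tone - 1])
}

// Hypothetical helper: convert one syllable such as "zhong1".
fn syllable_to_marks(syllable: &str) -> String {
    // A trailing digit 1-5 is the tone; anything else is returned unchanged.
    let tone = match syllable.chars().last().and_then(|c| c.to_digit(10)) {
        Some(t @ 1..=5) => t as usize,
        _ => return syllable.to_string(),
    };
    // Assumption: "u:" (CC-CEDICT style) and "v" both stand for ü.
    let body = syllable[..syllable.len() - 1].replace("u:", "ü").replace('v', "ü");
    if tone == 5 {
        return body; // neutral tone carries no mark
    }
    let chars: Vec<char> = body.chars().collect();
    // Placement: 'a' or 'e' first, then the 'o' of "ou", then the last vowel.
    let pos = chars
        .iter()
        .position(|&c| c == 'a' || c == 'e')
        .or_else(|| body.find("ou").map(|b| body[..b].chars().count()))
        .or_else(|| chars.iter().rposition(|&c| "aeiouü".contains(c)));
    let pos = match pos {
        Some(p) => p,
        None => return body, // no vowel found, nothing to mark
    };
    chars
        .iter()
        .enumerate()
        .map(|(i, &c)| if i == pos { mark(c, tone) } else { c })
        .collect()
}

// Hypothetical helper: convert a whitespace-separated run of syllables.
fn numbers_to_marks_sketch(input: &str) -> String {
    input
        .split_whitespace()
        .map(syllable_to_marks)
        .collect::<Vec<_>>()
        .join(" ")
}
```

Capitalized syllables (as CC-CEDICT uses for proper names) keep their digit stripped but receive no mark in this sketch, since the table covers lowercase vowels only.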
§Build inputs
| File | Purpose |
|---|---|
| dict/dict.txt.big | jieba extended segmentation dict |
| dict/cedict_ts.u8 | CC-CEDICT source for pinyin lookup |
See dict/README.md for download instructions.
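For orientation, a CC-CEDICT line has the shape `Traditional Simplified [pin1 yin1] /gloss/.../`, and lines starting with `#` are comments. The sketch below parses one such line into its headwords and numbered syllables. How this crate's build step actually reads `cedict_ts.u8` is not described here, so the `CedictEntry` type and `parse_cedict_line` function are hypothetical names used only for illustration.

```rust
// One parsed CC-CEDICT entry; field names are illustrative, not the crate's.
#[derive(Debug, PartialEq)]
struct CedictEntry<'a> {
    traditional: &'a str,
    simplified: &'a str,
    pinyin_numbers: Vec<&'a str>,
}

// Parse a single line, returning None for comments, blanks, or malformed input.
fn parse_cedict_line(line: &str) -> Option<CedictEntry<'_>> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // Headwords come before " [", the reading sits inside the brackets.
    let (head, rest) = line.split_once(" [")?;
    let (pinyin, _glosses) = rest.split_once(']')?;
    let (traditional, simplified) = head.split_once(' ')?;
    Some(CedictEntry {
        traditional,
        simplified,
        pinyin_numbers: pinyin.split_whitespace().collect(),
    })
}
```

The glosses are discarded because only the reading matters for a characters → pinyin_numbers lookup.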
Structs§
- Segment
  One segment per jieba word boundary, with pīnyīn syllables. `pinyin` is `None` (JSON `null`) for non-Chinese tokens.
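A rough picture of how such a segment could map onto the JSON shape `{"word":"…","pinyin":["…"]|null}` is sketched below. The field types (`String`, `Option<Vec<String>>`) and the hand-written serializer are assumptions; the real struct may well derive its JSON form through a serialization crate instead.

```rust
// Assumed layout: field names follow the JSON keys, types are a guess.
struct Segment {
    word: String,
    pinyin: Option<Vec<String>>,
}

// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

// Hypothetical helper: render segments as a JSON array, None becoming null.
fn segments_to_json(segments: &[Segment]) -> String {
    let items: Vec<String> = segments
        .iter()
        .map(|s| {
            let pinyin = match &s.pinyin {
                Some(syllables) => {
                    let parts: Vec<String> =
                        syllables.iter().map(|p| json_string(p)).collect();
                    format!("[{}]", parts.join(","))
                }
                None => "null".to_string(),
            };
            format!("{{\"word\":{},\"pinyin\":{}}}", json_string(&s.word), pinyin)
        })
        .collect();
    format!("[{}]", items.join(","))
}
```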
Functions§
- __wasm_minimal_protocol_internal_function_pinyin_flat
- __wasm_minimal_protocol_internal_function_pinyin_segmented
- pinyin_flat
  Returns flat space-separated pīnyīn as UTF-8 bytes.
- pinyin_segmented
  Returns JSON array `[{"word":"…","pinyin":["…"]|null},…]` as UTF-8 bytes.
- to_pinyin_flat
  Space-separated pīnyīn string. Non-Chinese tokens are omitted entirely. `style`: `"numbers"` for tone numbers, anything else for tone marks.
- to_pinyin_segmented
  One `Segment` per jieba word boundary.
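The two behaviours described for the flat output, dropping non-Chinese tokens and treating any style other than `"numbers"` as tone marks, can be sketched as follows. The `Style` enum, `parse_style`, and `flatten` are hypothetical names, segments are modelled as plain `(word, pinyin)` tuples for brevity, and joining every syllable with a single space is an assumption about the exact spacing.

```rust
// Hypothetical style switch: only the exact string "numbers" selects numbers.
#[derive(Debug, PartialEq)]
enum Style {
    Numbers,
    Marks,
}

fn parse_style(style: &str) -> Style {
    match style {
        "numbers" => Style::Numbers,
        _ => Style::Marks, // anything else falls back to tone marks
    }
}

// Join the syllables of all segments that have pinyin; tokens whose pinyin
// is None (non-Chinese) contribute nothing to the output.
fn flatten(segments: &[(&str, Option<Vec<&str>>)]) -> String {
    segments
        .iter()
        .filter_map(|(_, pinyin)| pinyin.as_ref())
        .flat_map(|syllables| syllables.iter().copied())
        .collect::<Vec<&str>>()
        .join(" ")
}
```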