Crate rust_mando

Chinese → Pīnyīn conversion with jieba word segmentation.

§Architecture

| Layer | Crate / module | Role |
|---|---|---|
| Segmentation | jieba-rs | word boundaries + context |
| Lookup | src/pinyin_dict.rs | Chinese characters → pinyin_numbers |
| Conversion | pinyin_dict::numbers_to_marks | pinyin_numbers → pinyin_marks |
| Protocol | wasm-minimal-protocol | Typst WASM ABI |
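
To make the Conversion layer concrete, here is a minimal sketch of turning numbered pinyin into tone-marked pinyin. It is not the crate's `pinyin_dict::numbers_to_marks`; the names `mark`, `syllable_to_marks`, and `numbers_to_marks_sketch` are hypothetical, and the sketch assumes lowercase-able input, a trailing tone digit 1-5 (5 or none meaning neutral), and `u:` or `v` as ASCII spellings of ü.

```rust
/// Hypothetical helper: returns `vowel` carrying the mark for tone 1-4.
fn mark(vowel: char, tone: usize) -> char {
    let row = match vowel {
        'a' => "āáǎà",
        'e' => "ēéěè",
        'i' => "īíǐì",
        'o' => "ōóǒò",
        'u' => "ūúǔù",
        'ü' => "ǖǘǚǜ",
        _ => return vowel,
    };
    row.chars().nth(tone - 1).unwrap_or(vowel)
}

/// Hypothetical helper: converts one numbered syllable, e.g. "hao3".
fn syllable_to_marks(syllable: &str) -> String {
    // Assumption: "u:" and "v" are ASCII stand-ins for ü.
    let s = syllable.to_lowercase().replace("u:", "ü").replace('v', "ü");

    // Split off a trailing tone digit; no digit is treated as neutral tone.
    let (body, tone) = match s.chars().last() {
        Some(c @ '1'..='5') => (&s[..s.len() - 1], c as usize - '0' as usize),
        _ => (s.as_str(), 5),
    };
    if tone == 5 {
        return body.to_string();
    }

    let chars: Vec<char> = body.chars().collect();
    // Placement rule: 'a' or 'e' takes the mark; otherwise the 'o' of "ou";
    // otherwise the last vowel in the syllable.
    let target = chars
        .iter()
        .position(|&c| c == 'a' || c == 'e')
        .or_else(|| body.find("ou").map(|b| body[..b].chars().count()))
        .or_else(|| chars.iter().rposition(|&c| "aeiouü".contains(c)));

    chars
        .iter()
        .enumerate()
        .map(|(i, &c)| if Some(i) == target { mark(c, tone) } else { c })
        .collect()
}

/// Hypothetical wrapper: converts a space-separated string of syllables.
fn numbers_to_marks_sketch(numbered: &str) -> String {
    numbered
        .split_whitespace()
        .map(syllable_to_marks)
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, `numbers_to_marks_sketch("ni3 hao3")` yields `nǐ hǎo`. The crate's own conversion may treat capitalization, erhua, or malformed input differently.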

§Build inputs

| File | Purpose |
|---|---|
| dict/dict.txt.big | jieba extended segmentation dict |
| dict/cedict_ts.u8 | CC-CEDICT source for pinyin lookup |

See dict/README.md for download instructions.

Structs§

Segment
One segment per jieba word boundary, with pīnyīn syllables. pinyin is None (JSON null) for non-Chinese tokens.
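
The shape described above can be sketched with a stand-in type. `SegmentSketch`, `json_string`, and `segments_to_json` are hypothetical names, the field types are assumed from the description rather than taken from the crate, and the JSON is written by hand only to keep the example free of dependencies.

```rust
/// Hypothetical mirror of the documented shape; the real `Segment` may differ.
struct SegmentSketch {
    word: String,
    /// `None` for non-Chinese tokens; rendered as JSON `null`.
    pinyin: Option<Vec<String>>,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders segments as a JSON array of {"word": ..., "pinyin": [...] | null}.
fn segments_to_json(segments: &[SegmentSketch]) -> String {
    let items: Vec<String> = segments
        .iter()
        .map(|seg| {
            let pinyin = match &seg.pinyin {
                Some(syllables) => {
                    let inner: Vec<String> =
                        syllables.iter().map(|s| json_string(s)).collect();
                    format!("[{}]", inner.join(","))
                }
                None => "null".to_string(),
            };
            format!(
                "{{\"word\":{},\"pinyin\":{}}}",
                json_string(&seg.word),
                pinyin
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

A Chinese word carries its syllables as an array, while a punctuation or Latin token keeps its `word` and gets `"pinyin":null`.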

Functions§

__wasm_minimal_protocol_internal_function_pinyin_flat
__wasm_minimal_protocol_internal_function_pinyin_segmented
pinyin_flat
Returns flat space-separated pīnyīn as UTF-8 bytes.
pinyin_segmented
Returns JSON array [{"word":"…","pinyin":["…"]|null},…] as UTF-8 bytes.
to_pinyin_flat
Returns a space-separated pīnyīn string. Non-Chinese tokens are omitted entirely. The style argument selects the output form: "numbers" yields tone numbers; any other value yields tone marks.
to_pinyin_segmented
One Segment per jieba word boundary.
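
The difference between the flat and segmented outputs can be illustrated with a small sketch. It assumes the flat string is equivalent to joining the syllables of every segment that has pinyin, which the descriptions above suggest but do not state. `flatten_sketch` is a hypothetical name, and the input is hand-written rather than real crate output.

```rust
/// Hypothetical helper: joins the syllables of all segments that have pinyin,
/// dropping non-Chinese tokens (those whose pinyin is `None`) entirely.
fn flatten_sketch(segments: &[(&str, Option<Vec<&str>>)]) -> String {
    segments
        .iter()
        .filter_map(|(_, pinyin)| pinyin.as_ref())
        .flatten()
        .copied()
        .collect::<Vec<&str>>()
        .join(" ")
}
```

For the hand-written input `[("我", Some(vec!["wǒ"])), ("用", Some(vec!["yòng"])), ("Rust", None)]`, the result is `wǒ yòng`: the Latin token leaves no trace in the flat form, whereas the segmented form would keep it with a null pinyin.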