Chinese pinyin and CC-CEDICT adapters for moine.
The current adapter indexes simplified and traditional written Chinese forms
with Mandarin pinyin readings from CC-CEDICT. The default public artifact
view is no-tone pinyin; tone3 is an explicit tone-aware artifact view.
Cantonese, Jyutping, and non-Mandarin readings are outside this crate’s
current scope.
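As a rough illustration of the two artifact views, the sketch below derives a no-tone key and a tone3 key from a whitespace-separated, CC-CEDICT-style pinyin field. The helper names `no_tone_key` and `tone3_key` are hypothetical and are not part of this crate. The crate's own normalization may handle more cases (for example `u:` spellings), which this sketch does not attempt to reproduce.

```rust
/// Hypothetical helper (not a moine_zh API): lowercases each syllable,
/// drops trailing tone digits, and joins the syllables without separators.
fn no_tone_key(field: &str) -> String {
    field
        .split_whitespace()
        .map(|syllable| {
            syllable
                .to_lowercase()
                .trim_end_matches(|c: char| c.is_ascii_digit())
                .to_string()
        })
        .collect()
}

/// Hypothetical helper (not a moine_zh API): lowercases each syllable and
/// keeps its trailing tone digit, then joins the syllables without separators.
fn tone3_key(field: &str) -> String {
    field
        .split_whitespace()
        .map(|syllable| syllable.to_lowercase())
        .collect()
}
```

Under these assumptions, the field `wei1 shi4 ji4` maps to `weishiji` in the no-tone view and to `wei1shi4ji4` in the tone-aware view.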
Dictionary artifacts are external input. Prefer the try_* lookup and expansion
APIs at trust boundaries: they report indexed-payload decode errors as
ZhArtifactPayloadError, whereas the backward-compatible convenience APIs
collapse such errors into empty lookup results.
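The difference between the two API styles can be sketched with stand-in types. Everything below (`DemoIndex`, `DemoPayloadError`, `try_lookup`, `lookup`, and the newline-separated storage format) is hypothetical and only mirrors the pattern described above; it is not the crate's actual implementation or signature set.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical stand-in for a payload decode error such as ZhArtifactPayloadError.
#[derive(Debug, PartialEq)]
struct DemoPayloadError(String);

impl fmt::Display for DemoPayloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "payload error: {}", self.0)
    }
}

impl std::error::Error for DemoPayloadError {}

/// Hypothetical index whose values are undecoded payload bytes
/// (assumed here to be newline-separated UTF-8 readings).
struct DemoIndex {
    raw: HashMap<String, Vec<u8>>,
}

impl DemoIndex {
    /// Fallible lookup: a missing surface yields Ok(empty),
    /// while a corrupt payload yields Err.
    fn try_lookup(&self, surface: &str) -> Result<Vec<String>, DemoPayloadError> {
        let Some(bytes) = self.raw.get(surface) else {
            return Ok(Vec::new());
        };
        let text = std::str::from_utf8(bytes)
            .map_err(|e| DemoPayloadError(format!("invalid UTF-8 for {surface}: {e}")))?;
        Ok(text
            .split('\n')
            .filter(|reading| !reading.is_empty())
            .map(str::to_string)
            .collect())
    }

    /// Convenience lookup: a decode error becomes an empty result, so the
    /// caller cannot tell "corrupt payload" apart from "no readings".
    fn lookup(&self, surface: &str) -> Vec<String> {
        self.try_lookup(surface).unwrap_or_default()
    }
}
```

With an untrusted artifact, only the fallible variant lets the caller distinguish a corrupt entry from a surface that simply has no readings.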
use moine_zh::{
    compare_with_zh_index, PinyinReadingOptions, ZhReadingIndex, ZhReadingIndexPayload,
    ZhReadingIndexPayloadEntry,
};

let payload = ZhReadingIndexPayload {
    schema_version: 1,
    payload_type: "moine.zh.reading-index.surface-readings".to_string(),
    pinyin_view: "no-tone".to_string(),
    entries: vec![ZhReadingIndexPayloadEntry {
        surface: "威士忌".to_string(),
        readings: vec!["weishiji".to_string()],
    }],
};
let index = ZhReadingIndex::from_artifact_payload(payload).unwrap();
assert_eq!(
    compare_with_zh_index("weishiji", "威士忌", &index, PinyinReadingOptions::default())
        .unwrap()
        .lattice,
    0,
);

Structs
- CedictIndexOptions - Options used while building a CC-CEDICT reading index.
- CedictReadingIndex - CC-CEDICT-derived surface-to-pinyin reading index.
- ChineseDistance - Distances computed for one Chinese comparison.
- PinyinReadingExpansion - Reading-path expansion result plus pruning statistics.
- PinyinReadingOptions - Controls Chinese dictionary reading-path expansion.
- PinyinReadingPath - One complete segmentation and joined pinyin reading for an input string.
- PinyinReadingSegment - One Chinese surface segment and its selected pinyin reading.
- PinyinReadingStats - Counters describing Chinese reading-path expansion.
- ZhArtifactBuild - Build-time settings recorded in Chinese artifact metadata.
- ZhArtifactLicense - License metadata for a Chinese dictionary artifact.
- ZhArtifactLicenseReference - One license reference stored in Chinese artifact metadata.
- ZhArtifactMetadata - Metadata stored in a Chinese dictionary bundle.
- ZhArtifactMetadataOptions - Inputs used to build Chinese artifact metadata from an index.
- ZhArtifactPayload - Payload metadata stored in a Chinese dictionary bundle.
- ZhArtifactQueryDefaults - Default reading expansion options recorded in Chinese artifact metadata.
- ZhArtifactSource - Source dictionary metadata for a Chinese artifact.
- ZhIndexedArtifactPayloadHeader - Header for indexed FST Chinese payloads.
- ZhReadingIndexPayload - Normalized Chinese reading-index payload.
- ZhReadingIndexPayloadEntry - One surface form and its normalized pinyin readings.
Enums
- CedictError - Errors returned while parsing CC-CEDICT source text.
- CnLatticeError - Errors returned while building Chinese pinyin lattices.
- PinyinView - Pinyin representation used by a Chinese reading index.
- ZhArtifactPayloadError - Errors returned while loading or validating Chinese artifact payloads.
Constants
- ARTIFACT_PAYLOAD_CHECKSUM_ALGORITHM - Current canonical checksum algorithm for normalized Chinese payload content.
- ARTIFACT_PAYLOAD_FILE_DIGEST_ALGORITHM - File digest algorithm used to verify payload bytes before loading.
Functions
- artifact_file_digest_path - Computes the SHA-256 file digest string for a Chinese artifact payload file.
- artifact_file_digest_reader - Computes the SHA-256 file digest string from a reader.
- cedict_or_direct_lattice - Builds a pinyin lattice from direct input, CC-CEDICT readings, or both.
- compare_with_cedict_index - Compares two strings using direct pinyin handling and a CC-CEDICT index.
- compare_with_zh_index - Compares two strings using direct pinyin handling and a Chinese index.
- normalize_pinyin - Normalizes a whitespace-separated CC-CEDICT pinyin field.
- normalized_similarity_with_zh_index - Computes the best normalized similarity across Chinese pinyin readings.
- pinyin_lattice_from_reading_paths - Builds a pinyin lattice from expanded reading paths.
- zh_or_direct_lattice - Builds a pinyin lattice from direct input, dictionary readings, or both.
- zh_or_direct_pinyin_paths - Returns pinyin paths from direct input, dictionary readings, or both.
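
The comparison functions above pick the best result across all pinyin readings of a surface form. A minimal sketch of that idea follows, assuming a plain Levenshtein distance over a flat list of candidate readings. The crate's lattice-based distance is likely computed differently, and `levenshtein` and `best_distance` are hypothetical helpers, not crate APIs.

```rust
/// Plain Levenshtein edit distance over Unicode scalar values,
/// computed row by row.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + if ca == cb { 0 } else { 1 };
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            let best = substitution.min(deletion).min(insertion);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Hypothetical helper: the smallest distance between `query` and any
/// candidate reading, or None when there are no readings at all.
fn best_distance(query: &str, readings: &[&str]) -> Option<usize> {
    readings
        .iter()
        .map(|reading| levenshtein(query, reading))
        .min()
}
```

In this sketch, a query that exactly matches one of the readings yields a distance of 0, which is analogous to the `lattice` value of 0 in the crate example at the top of this page.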
Type Aliases
- ZhReadingIndex - Public alias for the Chinese reading index type.