Crate moine_zh

Chinese pinyin and CC-CEDICT adapters for moine.

The current adapter indexes simplified and traditional written Chinese forms with Mandarin pinyin readings from CC-CEDICT. The default public artifact view is no-tone pinyin; tone3 is an explicit tone-aware artifact view. Cantonese, Jyutping, and non-Mandarin readings are outside this crate’s current scope.
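The difference between the two views can be illustrated with a minimal sketch. The helpers `no_tone_view` and `tone3_view` below are hypothetical and not part of moine_zh; they assume a CC-CEDICT-style field of whitespace-separated syllables with trailing tone digits, and they assume that tone3 means the tone digit is kept after each syllable. The crate's own normalization may differ in details such as the handling of `u:`.

```rust
/// Hypothetical helper (not a moine_zh API): drop trailing tone digits from
/// each syllable, lowercase it, and join the syllables without separators.
fn no_tone_view(field: &str) -> String {
    field
        .split_whitespace()
        .map(|syllable| {
            syllable
                .trim_end_matches(|c: char| c.is_ascii_digit())
                .to_lowercase()
        })
        .collect()
}

/// Hypothetical helper (not a moine_zh API): keep the tone digit on each
/// syllable, lowercase it, and join the syllables without separators.
/// Assumption: this is what a tone3-style view looks like.
fn tone3_view(field: &str) -> String {
    field
        .split_whitespace()
        .map(|syllable| syllable.to_lowercase())
        .collect()
}
```

Under these assumptions, the field `wei1 shi4 ji4` yields `weishiji` in the no-tone view, which matches the reading used in the example on this page.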

Treat dictionary artifacts as external input. At trust boundaries, prefer the try_* lookup and expansion APIs: they report indexed-payload decode errors as ZhArtifactPayloadError, whereas the backward-compatible convenience APIs collapse those errors into empty lookup results.
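The distinction between a fallible lookup and a convenience lookup can be sketched with a toy index. Everything in this block (`ToyIndex`, `ToyDecodeError`, `try_readings`, `readings`, `toy_index`) is hypothetical and only mirrors the pattern described above; it does not show the signatures of moine_zh's try_* APIs or the shape of ZhArtifactPayloadError.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a payload decode error.
#[derive(Debug, PartialEq)]
struct ToyDecodeError {
    surface: String,
}

/// Hypothetical index that stores readings as raw, newline-separated bytes.
struct ToyIndex {
    raw: HashMap<String, Vec<u8>>,
}

impl ToyIndex {
    /// Fallible lookup: a corrupt entry is reported as an error.
    fn try_readings(&self, surface: &str) -> Result<Vec<String>, ToyDecodeError> {
        let bytes = match self.raw.get(surface) {
            Some(bytes) => bytes,
            // A missing surface is a genuinely empty result, not an error.
            None => return Ok(Vec::new()),
        };
        let text = std::str::from_utf8(bytes).map_err(|_| ToyDecodeError {
            surface: surface.to_string(),
        })?;
        Ok(text.lines().map(str::to_string).collect())
    }

    /// Convenience lookup: decode errors are collapsed into an empty result,
    /// so the caller cannot tell "corrupt" from "not found".
    fn readings(&self, surface: &str) -> Vec<String> {
        self.try_readings(surface).unwrap_or_default()
    }
}

/// Builds a toy index with one valid entry and one corrupt entry.
fn toy_index() -> ToyIndex {
    let mut raw = HashMap::new();
    raw.insert("威士忌".to_string(), b"weishiji".to_vec());
    raw.insert("坏".to_string(), vec![0xff, 0xfe]); // invalid UTF-8
    ToyIndex { raw }
}
```

With this toy index, `readings("坏")` and `readings("missing")` both return an empty vector, while `try_readings("坏")` returns an error. That ambiguity is the reason to prefer the fallible form when the artifact comes from an untrusted source.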

use moine_zh::{
    compare_with_zh_index, PinyinReadingOptions, ZhReadingIndex, ZhReadingIndexPayload,
    ZhReadingIndexPayloadEntry,
};

// Build a minimal in-memory payload: one surface form with one no-tone reading.
let payload = ZhReadingIndexPayload {
    schema_version: 1,
    payload_type: "moine.zh.reading-index.surface-readings".to_string(),
    pinyin_view: "no-tone".to_string(),
    entries: vec![ZhReadingIndexPayloadEntry {
        surface: "威士忌".to_string(),
        readings: vec!["weishiji".to_string()],
    }],
};
// Loading can fail; unwrap() keeps the example short.
let index = ZhReadingIndex::from_artifact_payload(payload).unwrap();

// The romanized query equals the indexed reading of the surface form,
// so the lattice distance is 0.
assert_eq!(
    compare_with_zh_index("weishiji", "威士忌", &index, PinyinReadingOptions::default())
        .unwrap()
        .lattice,
    0,
);

Structs

CedictIndexOptions
Options used while building a CC-CEDICT reading index.
CedictReadingIndex
CC-CEDICT-derived surface-to-pinyin reading index.
ChineseDistance
Distances computed for one Chinese comparison.
PinyinReadingExpansion
Reading-path expansion result plus pruning statistics.
PinyinReadingOptions
Controls Chinese dictionary reading-path expansion.
PinyinReadingPath
One complete segmentation and joined pinyin reading for an input string.
PinyinReadingSegment
One Chinese surface segment and its selected pinyin reading.
PinyinReadingStats
Counters describing Chinese reading-path expansion.
ZhArtifactBuild
Build-time settings recorded in Chinese artifact metadata.
ZhArtifactLicense
License metadata for a Chinese dictionary artifact.
ZhArtifactLicenseReference
One license reference stored in Chinese artifact metadata.
ZhArtifactMetadata
Metadata stored in a Chinese dictionary bundle.
ZhArtifactMetadataOptions
Inputs used to build Chinese artifact metadata from an index.
ZhArtifactPayload
Payload metadata stored in a Chinese dictionary bundle.
ZhArtifactQueryDefaults
Default reading expansion options recorded in Chinese artifact metadata.
ZhArtifactSource
Source dictionary metadata for a Chinese artifact.
ZhIndexedArtifactPayloadHeader
Header for indexed FST Chinese payloads.
ZhReadingIndexPayload
Normalized Chinese reading-index payload.
ZhReadingIndexPayloadEntry
One surface form and its normalized pinyin readings.

Enums

CedictError
Errors returned while parsing CC-CEDICT source text.
CnLatticeError
Errors returned while building Chinese pinyin lattices.
PinyinView
Pinyin representation used by a Chinese reading index.
ZhArtifactPayloadError
Errors returned while loading or validating Chinese artifact payloads.

Constants

ARTIFACT_PAYLOAD_CHECKSUM_ALGORITHM
Current canonical checksum algorithm for normalized Chinese payload content.
ARTIFACT_PAYLOAD_FILE_DIGEST_ALGORITHM
File digest algorithm used to verify payload bytes before loading.

Functions

artifact_file_digest_path
Computes the SHA-256 file digest string for a Chinese artifact payload file.
artifact_file_digest_reader
Computes the SHA-256 file digest string from a reader.
cedict_or_direct_lattice
Builds a pinyin lattice from direct input, CC-CEDICT readings, or both.
compare_with_cedict_index
Compares two strings using direct pinyin handling and a CC-CEDICT index.
compare_with_zh_index
Compares two strings using direct pinyin handling and a Chinese index.
normalize_pinyin
Normalizes a whitespace-separated CC-CEDICT pinyin field.
normalized_similarity_with_zh_index
Computes the best normalized similarity across Chinese pinyin readings.
pinyin_lattice_from_reading_paths
Builds a pinyin lattice from expanded reading paths.
zh_or_direct_lattice
Builds a pinyin lattice from direct input, dictionary readings, or both.
zh_or_direct_pinyin_paths
Returns pinyin paths from direct input, dictionary readings, or both.
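The idea of taking the best score across several candidate readings, as in the description of normalized_similarity_with_zh_index, can be sketched as follows. This page does not state which metric the crate uses, so the sketch assumes plain Levenshtein distance normalized by the longer string's length. The functions `levenshtein`, `normalized_similarity`, and `best_similarity` are hypothetical and are not moine_zh APIs.

```rust
/// Classic two-row Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // Minimum of substitution, deletion, and insertion.
            let next = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
            cur.push(next);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Assumed normalization: 1 - distance / length of the longer string.
fn normalized_similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}

/// Best similarity of `query` against any candidate reading,
/// or None when there are no readings at all.
fn best_similarity(query: &str, readings: &[&str]) -> Option<f64> {
    readings
        .iter()
        .map(|reading| normalized_similarity(query, reading))
        .fold(None, |best, score| match best {
            Some(b) if b >= score => Some(b),
            _ => Some(score),
        })
}
```

Returning `Option<f64>` keeps "no readings available" distinct from "readings exist but none are similar", which a bare `0.0` would conflate.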

Type Aliases

ZhReadingIndex
Public alias for the Chinese reading index type.