Japanese kana, romaji, override, and UniDic adapters for moine.
This crate converts Japanese surface text into romaji lattices through
direct kana/ASCII handling, manual override dictionaries, or UniDic-derived
reading artifacts. The language-independent edit-distance algorithms remain
in moine-core.
Dictionary artifacts are external input. At trust boundaries, prefer the
try_* lookup and expansion APIs: they report indexed-payload decode errors as
UnidicArtifactPayloadError, whereas the backward-compatible convenience APIs
collapse those errors into empty lookup results.
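The difference between a fallible lookup and a convenience lookup can be pictured with a toy index. This is a minimal sketch, not the crate's implementation: SketchIndex, PayloadDecodeError, decode, and the '|'-separated payload format are all hypothetical names and assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a payload decode error (not the crate's type).
#[derive(Debug, PartialEq)]
struct PayloadDecodeError(String);

/// Hypothetical index that stores each entry as an undecoded payload string.
struct SketchIndex {
    raw: HashMap<String, String>,
}

/// Assumed toy format: readings separated by '|'; an empty field is corrupt.
fn decode(payload: &str) -> Result<Vec<String>, PayloadDecodeError> {
    let readings: Vec<String> = payload.split('|').map(str::to_owned).collect();
    if readings.iter().any(|r| r.is_empty()) {
        return Err(PayloadDecodeError(format!("empty reading in {payload:?}")));
    }
    Ok(readings)
}

impl SketchIndex {
    /// Fallible lookup: a missing surface is Ok(empty), a corrupt payload is Err.
    fn try_lookup(&self, surface: &str) -> Result<Vec<String>, PayloadDecodeError> {
        match self.raw.get(surface) {
            None => Ok(Vec::new()),
            Some(payload) => decode(payload),
        }
    }

    /// Convenience lookup: decode failures are collapsed into an empty result,
    /// so the caller cannot tell "no entry" from "corrupt entry".
    fn lookup(&self, surface: &str) -> Vec<String> {
        self.try_lookup(surface).unwrap_or_default()
    }
}
```

With this shape, code that loads an artifact from an untrusted source can call the try_ variant and reject the artifact on Err, while the convenience variant would silently behave as if the surface had no readings.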
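use moine_ja::romaji_lattice;
use moine_core::{distance, Lattice};
// Build a romaji lattice from hiragana input.
let left = romaji_lattice("もいにゃ").unwrap();
// Build a single-path lattice from an explicit romaji string.
let right = Lattice::from_paths(["moinya"]);
// The two lattices share a spelling, so the distance is 0.
assert_eq!(distance(&left, &right), 0);
Structs§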
- DictionaryReadingExpansion - Reading-path expansion result plus pruning statistics.
- DictionaryReadingOptions - Controls dictionary reading-path expansion.
- DictionaryReadingPath - One complete segmentation and joined reading for an input string.
- DictionaryReadingSegment - One surface segment and its selected UniDic reading.
- DictionaryReadingStats - Counters describing dictionary reading-path expansion.
- JapaneseDistance - Distances computed for one Japanese comparison.
- OverrideDictionary - In-memory surface-to-reading override dictionary.
- RomajiVariantTable - Kana-to-romaji variant table used by the Japanese adapter.
- UnidicArtifactBuild - Build settings and counts recorded in UniDic artifact metadata.
- UnidicArtifactLicense - License metadata for a UniDic-derived artifact.
- UnidicArtifactLicenseReference - One license or notice file referenced by artifact metadata.
- UnidicArtifactMetadata - Metadata stored in a UniDic dictionary bundle.
- UnidicArtifactMetadataOptions - Inputs used to generate artifact metadata for an index.
- UnidicArtifactPayload - Payload file metadata for a UniDic dictionary bundle.
- UnidicArtifactQueryDefaults - Default reading-path query settings stored in an artifact.
- UnidicArtifactSource - Source dictionary metadata for a UniDic artifact.
- UnidicBinaryArtifactPayloadHeader - Header for legacy binary UniDic payloads.
- UnidicIndexOptions - Options used while building a UniDic reading index.
- UnidicReadingIndex - UniDic-derived surface-to-reading index.
- UnidicReadingIndexPayload - Portable YAML representation of a UniDic reading index.
- UnidicReadingIndexPayloadEntry - One surface entry in a UniDic reading-index payload.
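To make the idea of a kana-to-romaji variant table concrete, the following is a minimal sketch of how per-kana spelling variants multiply into explicit romaji paths. It is not the crate's RomajiVariantTable: sketch_variants and sketch_romaji_paths are hypothetical names, the table covers only a handful of kana, and expanding every combination eagerly is an assumption made for clarity (a compact lattice would avoid materializing each path).

```rust
/// Hypothetical toy table: romaji spellings for a few hiragana.
/// Hepburn comes first, then the Kunrei-style variant where one exists.
/// Unknown characters yield an empty slice.
fn sketch_variants(kana: char) -> &'static [&'static str] {
    match kana {
        'し' => &["shi", "si"],
        'ち' => &["chi", "ti"],
        'つ' => &["tsu", "tu"],
        'ふ' => &["fu", "hu"],
        'か' => &["ka"],
        'も' => &["mo"],
        _ => &[],
    }
}

/// Expands a kana string into every romaji spelling the toy table allows.
/// Returns None if any character is missing from the table.
fn sketch_romaji_paths(kana: &str) -> Option<Vec<String>> {
    // Start with one empty path and extend it one kana at a time.
    let mut paths = vec![String::new()];
    for ch in kana.chars() {
        let variants = sketch_variants(ch);
        if variants.is_empty() {
            return None;
        }
        let mut next = Vec::with_capacity(paths.len() * variants.len());
        for prefix in &paths {
            for variant in variants {
                next.push(format!("{prefix}{variant}"));
            }
        }
        paths = next;
    }
    Some(paths)
}
```

Because the number of paths is the product of the variant counts, an input with many ambiguous kana grows quickly, which is one reason a lattice representation is attractive.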
Enums§
- JaLatticeError - Errors returned while building Japanese romaji lattices.
- OverrideLoadError - Errors returned while loading an override dictionary.
- UnidicArtifactPayloadError - Errors returned while reading or validating UniDic artifact payloads.
- UnidicCsvError - Errors returned while reading UniDic CSV resources.
- UnidicReadingField - UniDic CSV field used as the source reading.
Constants§
- ARTIFACT_PAYLOAD_CHECKSUM_ALGORITHM - Current canonical checksum algorithm for normalized UniDic payload content.
- ARTIFACT_PAYLOAD_FILE_DIGEST_ALGORITHM - File digest algorithm used to verify payload bytes before loading.
- LEGACY_ARTIFACT_PAYLOAD_CHECKSUM_ALGORITHM - Legacy canonical checksum algorithm accepted for older UniDic artifacts.
Functions§
- artifact_file_digest_path - Computes the SHA-256 file digest string for a UniDic artifact payload file.
- artifact_file_digest_reader - Computes the SHA-256 file digest string from a reader.
- compare_with_overrides - Compares two strings using direct kana/romaji handling plus overrides.
- compare_with_unidic_index - Compares two strings using direct handling and a UniDic reading index.
- is_kana - Returns whether ch is hiragana, katakana, or the long-vowel mark.
- normalize_kana - Normalizes katakana in input to hiragana.
- normalize_kana_char - Normalizes one katakana character to hiragana when possible.
- normalized_similarity_with_unidic_index - Computes the best normalized similarity across UniDic-backed readings.
- romaji_lattice - Builds a compact romaji lattice from kana or ASCII romaji input.
- romaji_lattice_from_reading_paths - Builds a compact romaji lattice from dictionary reading paths.
- romaji_paths - Expands kana or ASCII romaji input into explicit romaji paths.
- romaji_paths_from_reading_paths - Expands dictionary reading paths into explicit romaji strings.
- unidic_or_direct_lattice - Builds a romaji lattice from direct input, dictionary readings, or both.
- unidic_or_direct_romaji_paths - Returns romaji paths from direct input, dictionary readings, or both.
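The kana helpers in the list above describe a katakana-to-hiragana normalization and a kana check. One common way to implement that kind of normalization relies on the Unicode layout, where katakana ァ (U+30A1) through ヶ (U+30F6) sit exactly 0x60 code points above hiragana ぁ (U+3041) through ゖ (U+3096). The sketch below uses that offset. It is not the crate's code: the sketch_ function names are hypothetical, and the exact character ranges the crate accepts are an assumption.

```rust
/// Hypothetical sketch: maps one katakana character to hiragana when a
/// counterpart exists, and returns every other character unchanged.
fn sketch_normalize_kana_char(ch: char) -> char {
    match ch {
        // ァ..=ヶ are offset by 0x60 from ぁ..=ゖ.
        '\u{30A1}'..='\u{30F6}' => char::from_u32(ch as u32 - 0x60).unwrap_or(ch),
        // The long-vowel mark ー and everything else pass through as-is.
        _ => ch,
    }
}

/// Hypothetical sketch: normalizes every katakana character in a string.
fn sketch_normalize_kana(input: &str) -> String {
    input.chars().map(sketch_normalize_kana_char).collect()
}

/// Hypothetical sketch: true for hiragana, katakana, or the long-vowel mark.
/// The ranges chosen here are an assumption, not the crate's definition.
fn sketch_is_kana(ch: char) -> bool {
    matches!(
        ch,
        '\u{3041}'..='\u{3096}'       // hiragana ぁ..=ゖ
            | '\u{30A1}'..='\u{30FA}' // katakana ァ..=ヺ
            | '\u{30FC}'              // long-vowel mark ー
    )
}
```

Normalizing to hiragana first means a later kana-to-romaji step only needs one table, since "モイニャ" and "もいにゃ" become the same string.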