gukhanmun-core
Format-neutral core of the Gukhanmun hanja-to-hangul pipeline. The crate is
no_std + alloc, so it is suitable for embedded and WASM targets. Format
adapters, dictionary backends, and runtime I/O live in separate crates.
What this crate provides
HanjaDictionary is the trait that all dictionary backends implement. The
engine works over any value that satisfies it. ChainDictionary composes two
backends so that a user-supplied dictionary is consulted before a larger
pre-built one.
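The fallback order described above can be sketched as follows. This is a minimal illustration, not the crate's API: the Lookup trait, MapDict, Chain, and the dict helper are hypothetical stand-ins, and the real HanjaDictionary trait and ChainDictionary type may have different methods and signatures. The sketch uses std for brevity; BTreeMap is also available from alloc.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the crate's `HanjaDictionary` trait.
/// The real trait's methods and signatures may differ.
trait Lookup {
    fn lookup(&self, hanja: &str) -> Option<&str>;
}

/// Hypothetical in-memory backend keyed by hanja word.
struct MapDict(BTreeMap<String, String>);

impl Lookup for MapDict {
    fn lookup(&self, hanja: &str) -> Option<&str> {
        self.0.get(hanja).map(String::as_str)
    }
}

/// Hypothetical two-backend chain: `first` is consulted before `second`.
struct Chain<A, B> {
    first: A,
    second: B,
}

impl<A: Lookup, B: Lookup> Lookup for Chain<A, B> {
    fn lookup(&self, hanja: &str) -> Option<&str> {
        // Only fall through to the second backend when the first has no entry.
        self.first
            .lookup(hanja)
            .or_else(|| self.second.lookup(hanja))
    }
}

/// Helper that builds a `MapDict` from (hanja, hangul) pairs.
fn dict(entries: &[(&str, &str)]) -> MapDict {
    MapDict(
        entries
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    )
}
```

Because the chain itself implements the same trait, it can be passed anywhere a single backend is accepted, and chains can be nested to layer more than two dictionaries.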
The lattice segmenter enumerates every dictionary match within a hanja span
and picks the best split using dynamic programming. For example, 行事場所
segments as 行事 + 場所 rather than 行事場 + 所.
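A dynamic-programming split of this kind can be sketched as below. The scoring is an assumption made for illustration: each dictionary word costs 1 and each character not covered by a dictionary word costs 10, so the cheapest path prefers full dictionary coverage. The crate's actual scoring, and the segment function and cost constants shown here, are hypothetical.

```rust
use std::collections::BTreeSet;

/// Split `span` into dictionary words, falling back to single characters.
/// Hypothetical scoring: a dictionary word costs 1, an unmatched character 10.
fn segment(span: &str, words: &BTreeSet<&str>) -> Vec<String> {
    const WORD_COST: u32 = 1; // assumed, not taken from the crate
    const UNKNOWN_COST: u32 = 10; // assumed, not taken from the crate

    let chars: Vec<char> = span.chars().collect();
    let n = chars.len();

    // best[i] = (cost of the cheapest split of chars[..i], start of its last piece)
    let mut best: Vec<Option<(u32, usize)>> = vec![None; n + 1];
    best[0] = Some((0, 0));

    for end in 1..=n {
        for start in 0..end {
            let Some((prev_cost, _)) = best[start] else { continue };
            let piece: String = chars[start..end].iter().collect();
            let cost = if words.contains(piece.as_str()) {
                WORD_COST
            } else if end - start == 1 {
                UNKNOWN_COST
            } else {
                continue; // multi-character pieces must be dictionary words
            };
            let total = prev_cost + cost;
            if best[end].map_or(true, |(c, _)| total < c) {
                best[end] = Some((total, start));
            }
        }
    }

    // Follow the back-pointers from the end of the span to recover the pieces.
    let mut pieces = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start) = best[end].expect("single characters make every prefix reachable");
        pieces.push(chars[start..end].iter().collect::<String>());
        end = start;
    }
    pieces.reverse();
    pieces
}
```

With a dictionary containing 行事, 場所, and 行事場, the split 行事 + 場所 costs 2 under these weights, while 行事場 + 所 costs 11 because 所 is left uncovered.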
Fallback phonetization kicks in for characters not found in any dictionary. It
uses a table derived from the Unicode Unihan database (kHangul field) and
applies the initial sound law (頭音法則) when the active preset requires it.
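The initial sound law step can be sketched on precomposed hangul syllables as below. This is a simplified illustration with hypothetical function names: it handles only the word-initial rules of the South Korean orthography (ㄴ or ㄹ before ㅣ and the y-vowels becomes ㅇ; ㄹ before ㅏ, ㅐ, ㅗ, ㅚ, ㅜ, ㅡ becomes ㄴ) and ignores word-internal exceptions and preset selection, which the crate may treat differently.

```rust
/// Apply the word-initial sound law to the first syllable of `word`.
/// Hypothetical helper; not the crate's API.
fn apply_initial_sound_law(word: &str) -> String {
    let mut chars = word.chars();
    let Some(first) = chars.next() else {
        return String::new();
    };
    let mut out = String::with_capacity(word.len());
    out.push(adjust_initial(first));
    out.extend(chars); // remaining syllables are left untouched
    out
}

/// Rewrite the leading consonant of one precomposed hangul syllable.
fn adjust_initial(syllable: char) -> char {
    const BASE: u32 = 0xAC00; // first precomposed syllable, 가
    const LAST: u32 = 0xD7A3; // last precomposed syllable, 힣
    const NIEUN: u32 = 2; // leading-consonant index of ㄴ
    const RIEUL: u32 = 5; // leading-consonant index of ㄹ
    const IEUNG: u32 = 11; // leading-consonant index of ㅇ

    let code = syllable as u32;
    if !(BASE..=LAST).contains(&code) {
        return syllable;
    }

    // Unicode composes a syllable as BASE + (lead * 21 + vowel) * 28 + tail.
    let offset = code - BASE;
    let lead = offset / (21 * 28);
    let vowel = (offset % (21 * 28)) / 28;
    let tail = offset % 28;

    // Vowel indices: ㅑ=2, ㅕ=6, ㅖ=7, ㅛ=12, ㅠ=17, ㅣ=20.
    let iotated = matches!(vowel, 2 | 6 | 7 | 12 | 17 | 20);
    // Vowel indices: ㅏ=0, ㅐ=1, ㅗ=8, ㅚ=11, ㅜ=13, ㅡ=18.
    let plain = matches!(vowel, 0 | 1 | 8 | 11 | 13 | 18);

    let new_lead = match lead {
        NIEUN | RIEUL if iotated => IEUNG,
        RIEUL if plain => NIEUN,
        _ => lead,
    };
    char::from_u32(BASE + (new_lead * 21 + vowel) * 28 + tail).unwrap_or(syllable)
}
```

Under these rules the raw readings 녀자 (女子), 로인 (老人), and 력사 (歷史) become 여자, 노인, and 역사, while a word such as 한자 is returned unchanged.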
The rendering step turns engine output into text. RenderMode controls the
shape: hangul-only, hangul(hanja) parentheses, hanja(hangul) parentheses, ruby
markup, or the original mixed-script form with selective glosses.
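The output shapes listed above can be illustrated with a small sketch. The Mode enum and render function are hypothetical and do not reflect the actual variants of RenderMode; the ruby form assumes HTML ruby markup, and the mixed-script mode with selective glosses is omitted because it depends on a glossing policy not described here.

```rust
/// Hypothetical stand-in for `RenderMode`; the crate's variant names may differ.
enum Mode {
    HangulOnly,
    HangulWithHanja,
    HanjaWithHangul,
    Ruby,
}

/// Render one converted word given its hanja source and hangul reading.
fn render(hanja: &str, hangul: &str, mode: Mode) -> String {
    match mode {
        Mode::HangulOnly => hangul.to_string(),
        Mode::HangulWithHanja => format!("{hangul}({hanja})"),
        Mode::HanjaWithHangul => format!("{hanja}({hangul})"),
        // Assumes HTML ruby markup; the crate may emit a different form.
        Mode::Ruby => format!("<ruby>{hanja}<rt>{hangul}</rt></ruby>"),
    }
}
```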
Installation
[dependencies]
gukhanmun-core = "0.1"
Usage
use ;
let mut dict = new;
dict.insert;
dict.insert;
let output = convert_plain_text;
assert_eq!;
no_std note
The crate declares #![no_std] and uses extern crate alloc. Callers on
std targets get alloc for free; embedded targets must provide a global
allocator via #[global_allocator].
The dictionary trait itself is synchronous and allocation-free on the hot path.
License
GPL-3.0-only. See LICENSE at the repository root.