Gukhanmun umbrella library.
This crate is the high-level entry point for the gukhanmun hanja-to-hangul
conversion pipeline. It re-exports public items from the workspace’s
sibling crates under feature flags and provides a Builder /
Converter facade that wires the reader, engine, middlewares, renderer,
and writer stages together.
See the workspace’s DESIGN.md for the architectural overview.
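The stage wiring described above can be pictured with a toy, self-contained sketch. Nothing below is gukhanmun's actual API: `Token`, `read`, `convert`, `render`, `write`, and `run_pipeline` are hypothetical stand-ins, the "reader" simply splits on whitespace, and the middleware stage is left out for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical token type: untouched text, or a word the engine converted.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Text(String),
    Converted { hanja: String, hangul: String },
}

/// Reader stage (toy): split the input on whitespace into text tokens.
fn read(input: &str) -> Vec<Token> {
    input
        .split_whitespace()
        .map(|w| Token::Text(w.to_string()))
        .collect()
}

/// Engine stage (toy): replace whole-word dictionary hits with a conversion.
fn convert(tokens: Vec<Token>, dict: &BTreeMap<&str, &str>) -> Vec<Token> {
    tokens
        .into_iter()
        .map(|t| match t {
            Token::Text(w) => match dict.get(w.as_str()) {
                Some(h) => Token::Converted { hanja: w, hangul: (*h).to_string() },
                None => Token::Text(w),
            },
            other => other,
        })
        .collect()
}

/// Renderer stage (toy): flatten tokens, optionally keeping the original hanja.
fn render(tokens: Vec<Token>, with_original: bool) -> Vec<String> {
    tokens
        .into_iter()
        .map(|t| match t {
            Token::Text(w) => w,
            Token::Converted { hanja, hangul } => {
                if with_original {
                    format!("{hangul}({hanja})")
                } else {
                    hangul
                }
            }
        })
        .collect()
}

/// Writer stage (toy): join the rendered pieces back into one string.
fn write(parts: Vec<String>) -> String {
    parts.join(" ")
}

/// Wires the toy stages together in pipeline order.
fn run_pipeline(input: &str, dict: &BTreeMap<&str, &str>, with_original: bool) -> String {
    write(render(convert(read(input), dict), with_original))
}
```

With a one-entry dictionary mapping `學校` to `학교`, calling `run_pipeline("學校 가다", &dict, false)` walks the input through each stage in order. The real facade presumably streams tokens rather than collecting vectors; the vectors here only keep the sketch short.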
§Quick start (default preset, bundled dictionary)
use gukhanmun::Builder;
let converter = Builder::new().build()?;
assert_eq!(converter.convert_text_to_string("學校")?, "학교");
§Custom user dictionary, no bundled data
use gukhanmun::{Builder, MapDictionary};
let mut user = MapDictionary::new();
user.insert("外字", "외자");
let converter = Builder::new()
    .no_bundled_stdict()
    .push_dictionary(user)
    .build()?;
assert_eq!(converter.convert_text_to_string("外字")?, "외자");
§North Korean (ko-kp) preset
use gukhanmun::{Builder, Preset};
let converter = Builder::with_preset(Preset::KoKp).build()?;
// Without the initial sound law, `來日` falls back to `래일`.
assert_eq!(converter.convert_text_to_string("來日")?, "래일");
§HTML fragment conversion (feature = "html")
use gukhanmun::Builder;
let converter = Builder::new().build()?;
let output = converter.convert_html_fragment_to_string("<p>學校</p>")?;
assert!(output.contains("학교"));
Modules§
- cdb: CDB dictionary backend (re-export of gukhanmun_cdb).
- fst: FST dictionary backend (re-export of gukhanmun_fst).
- html: HTML adapter (re-export of gukhanmun_html).
- markdown: Markdown adapter (re-export of gukhanmun_markdown).
- stdict: Bundled Standard Korean Language Dictionary (re-export of gukhanmun_stdict).
Structs§
- Annotation: Metadata for a dictionary-backed hanja conversion.
- Builder: Fluent builder that assembles a Converter from a Preset plus overrides.
- ChainDictionary: A dictionary composition that preserves caller-supplied priority order.
- ConversionOptions: Consolidated option bag carried through the umbrella facade.
- Converter: Immutable conversion runtime produced by Builder::build.
- DictionaryRecord: A complete dictionary entry exposed for batch policy analysis.
- Engine: Stateful hanja conversion engine for chunked token streams.
- EngineOptions: Engine-level options that affect hanja conversion before rendering.
- FirstOccurrenceFilter: Streaming first-occurrence middleware.
- HomophoneMarker: Streaming homophone marker middleware.
- MapDictionary: A small in-memory dictionary backed by an ordered map.
- Match: A dictionary match that starts at the queried cursor position.
- MatchMark: Dictionary-provided rendering constraints for a match.
- PlainScopeData: Scope data used by the plain-text adapter.
- RecoverableInputError: A recoverable reader error plus the original source region.
- RenderOptions: Rendering options that combine a RenderMode with per-mode sub-options.
- Renderer: Stateful renderer for chunked OutputToken streams.
- Scope: A structural scope in the format-neutral token stream.
- UnihanCharDict: Per-character Unihan fallback readings exposed as a dictionary.
- UserDirectives: User rules that adjust annotation presentation policy.
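To illustrate what "preserves caller-supplied priority order" can mean for a dictionary composition, here is a minimal sketch. It assumes that earlier layers win over later ones, which the listing above does not state explicitly. `Chain`, `push`, `lookup`, and `layer` are hypothetical names, not the crate's ChainDictionary API, and the `(later layer)` reading is a placeholder value.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a chain of dictionaries, highest priority first.
struct Chain {
    layers: Vec<BTreeMap<String, String>>,
}

impl Chain {
    fn new() -> Self {
        Chain { layers: Vec::new() }
    }

    /// Appends a layer with lower priority than every layer pushed before it.
    fn push(mut self, layer: BTreeMap<String, String>) -> Self {
        self.layers.push(layer);
        self
    }

    /// Returns the reading from the first layer that knows the word.
    fn lookup(&self, word: &str) -> Option<&str> {
        self.layers
            .iter()
            .find_map(|layer| layer.get(word).map(String::as_str))
    }
}

/// Small helper that builds one layer from literal pairs.
fn layer(entries: &[(&str, &str)]) -> BTreeMap<String, String> {
    entries
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

In this sketch a user layer pushed first shadows any later layer that defines the same word, while words the user layer lacks still resolve through the later layers.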
Enums§
- ContextWindow: The context boundary used by stateful annotation middlewares.
- DirectiveAction: Action applied when a user directive predicate matches an annotation.
- Error: Aggregated error type returned by the umbrella gukhanmun crate.
- InputToken: A token emitted by a reader before hanja conversion has run.
- NumeralStrategy: Strategy for rendering hanja numerals encountered in fallback text.
- OriginalGloss: Form for the gloss attached to annotations in RenderMode::Original.
- OutputToken: A token emitted by the engine after hanja conversion.
- Preset: Conversion preset that selects the orthographic conventions of a Korean variety.
- Recovery: Stream-level error recovery policy.
- RenderMode: The concrete rendering mode for annotated hanja words.
- RenderedToken: A token emitted by a renderer after all annotations have been expanded.
- RubyBase: Selects which side of a <ruby> element is the base text.
- SegmentationStrategy: Strategy used to segment hanja-containing spans.
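The ko-kp example earlier notes that `來日` stays `래일` when the initial sound law is not applied. As a rough illustration of the kind of orthographic difference a Preset selects, the sketch below applies a simplified form of the South Korean initial sound law to the first syllable of a reading, using standard Unicode Hangul syllable arithmetic. This is an assumption-laden toy, not the crate's implementation: `apply_initial_sound_law` and `south_korean_reading` are hypothetical names, and real orthography has exceptions (bound nouns, loanwords, compounds) that are ignored here.

```rust
const HANGUL_BASE: u32 = 0xAC00; // first precomposed syllable, 가
const HANGUL_LAST: u32 = 0xD7A3; // last precomposed syllable, 힣
const NIEUN: u32 = 2; // initial consonant index of ㄴ
const RIEUL: u32 = 5; // initial consonant index of ㄹ
const IEUNG: u32 = 11; // initial consonant index of ㅇ

/// Applies a simplified initial sound law to one word-initial syllable.
fn apply_initial_sound_law(first: char) -> char {
    let code = first as u32;
    if !(HANGUL_BASE..=HANGUL_LAST).contains(&code) {
        return first;
    }
    // A syllable is BASE + (initial * 21 + medial) * 28 + final.
    let offset = code - HANGUL_BASE;
    let initial = offset / (21 * 28);
    let medial = (offset / 28) % 21;
    let final_consonant = offset % 28;

    let new_initial = match initial {
        // ㄹ before ㅑ ㅕ ㅖ ㅛ ㅠ ㅣ becomes ㅇ (량 → 양, 력 → 역).
        RIEUL if matches!(medial, 2 | 6 | 7 | 12 | 17 | 20) => IEUNG,
        // ㄹ before any other vowel becomes ㄴ (래 → 내, 로 → 노).
        RIEUL => NIEUN,
        // ㄴ before ㅕ ㅛ ㅠ ㅣ becomes ㅇ (녀 → 여).
        NIEUN if matches!(medial, 6 | 12 | 17 | 20) => IEUNG,
        _ => initial,
    };
    let rebuilt = HANGUL_BASE + (new_initial * 21 + medial) * 28 + final_consonant;
    char::from_u32(rebuilt).unwrap_or(first)
}

/// Rewrites only the first syllable of a reading; the rest is copied as-is.
fn south_korean_reading(reading: &str) -> String {
    let mut chars = reading.chars();
    match chars.next() {
        Some(first) => {
            let mut out = String::with_capacity(reading.len());
            out.push(apply_initial_sound_law(first));
            out.push_str(chars.as_str());
            out
        }
        None => String::new(),
    }
}
```

Under these assumptions, a reading such as `래일` would be rewritten to `내일`, whereas a ko-kp style conversion would leave it untouched.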
Traits§
- HanjaDictionary: A hanja dictionary queried by the conversion engine.
- ScopeData: Adapter-owned data attached to an intermediate-representation scope.
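The page does not show the HanjaDictionary trait's methods, so the following is only a guess at the general shape of "a dictionary queried by the engine" that yields a match starting at a cursor position. `ToyDictionary`, `ToyMatch`, `ToyMap`, and `longest_match` are all hypothetical, and the longest-match policy is an assumption made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical match: bytes consumed from the cursor, plus the reading.
#[derive(Debug, PartialEq)]
struct ToyMatch {
    len: usize,
    reading: String,
}

/// Hypothetical trait; the real `HanjaDictionary` may look quite different.
trait ToyDictionary {
    /// Finds the longest entry that starts exactly at byte offset `cursor`.
    fn longest_match(&self, text: &str, cursor: usize) -> Option<ToyMatch>;
}

/// Toy backend: a linear scan over an ordered map.
struct ToyMap(BTreeMap<String, String>);

impl ToyDictionary for ToyMap {
    fn longest_match(&self, text: &str, cursor: usize) -> Option<ToyMatch> {
        // `get` returns None for out-of-range or non-boundary cursors.
        let rest = text.get(cursor..)?;
        self.0
            .iter()
            .filter(|(word, _)| rest.starts_with(word.as_str()))
            .max_by_key(|(word, _)| word.len())
            .map(|(word, reading)| ToyMatch {
                len: word.len(),
                reading: reading.clone(),
            })
    }
}
```

A linear scan is fine for a handful of entries; backends such as the cdb and fst modules listed above exist precisely because large dictionaries need faster prefix lookups.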
Functions§
- apply_user_directives: Applies literal user directives to annotation policy flags.
- apply_user_directives_iter: Lazily applies literal user directives to an output token stream.
- convert_plain_text: Converts plain text through reader, engine, renderer, and writer stages.
- convert_plain_text_with_options: Converts plain text with explicit hanja conversion engine options.
- filter_first_occurrences: Clears repeat gloss requirements after the first occurrence of each hanja.
- is_hanja: Returns whether ch is in a known CJK ideograph range.
- mark_homophones: Sets homophone on dictionary annotations sharing a reading.
- process_fallible_tokens: Processes fallible input tokens with default engine options.
- process_fallible_tokens_with_options: Processes fallible input tokens with explicit engine options.
- process_tokens: Processes input tokens with the default hanja conversion engine options.
- process_tokens_iter: Processes input tokens through the default engine options and returns an iterator over the collected output.
- process_tokens_iter_with_options: Processes input tokens through explicit engine options and returns an iterator over the collected output.
- process_tokens_with_options: Processes input tokens with explicit hanja conversion engine options.
- read_plain_text: Reads a plain-text string into the core input-token stream.
- recover_input_token: Resolves one fallible reader item according to a Recovery policy.
- recover_input_tokens: Resolves a fallible reader token stream into recovered input tokens.
- render_tokens: Renders engine output tokens into annotation-free tokens.
- render_tokens_iter: Renders engine output tokens into annotation-free tokens as an iterator.
- write_plain_text: Writes rendered plain-text tokens back to a string.
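The first-occurrence behaviour described for filter_first_occurrences ("clears repeat gloss requirements after the first occurrence of each hanja") can be sketched in a few lines. The `Word` struct, its `needs_gloss` flag, and `keep_first_gloss_only` are hypothetical; the real function operates on the crate's own token and annotation types, whose fields are not shown on this page.

```rust
use std::collections::HashSet;

/// Hypothetical annotated word; `needs_gloss` stands in for the real policy flag.
#[derive(Debug, Clone, PartialEq)]
struct Word {
    hanja: String,
    needs_gloss: bool,
}

/// Keeps the gloss requirement on the first occurrence of each hanja word
/// and clears it on every later repeat.
fn keep_first_gloss_only(words: &mut [Word]) {
    let mut seen: HashSet<String> = HashSet::new();
    for word in words.iter_mut() {
        // `insert` returns false when the value was already in the set.
        if !seen.insert(word.hanja.clone()) {
            word.needs_gloss = false;
        }
    }
}
```

This toy version remembers every word for the whole input. The ContextWindow enum listed above suggests the real middleware can reset that memory at some boundary, which this sketch does not model.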
Type Aliases§
- Result: Convenience std::result::Result alias that defaults the error parameter to the umbrella Error.
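For readers unfamiliar with the pattern, a `Result` alias with a defaulted error parameter generally looks like the sketch below. `ToyError`, `first_char`, and `parse_len` are hypothetical and exist only to show that the alias accepts either one or two type arguments; the crate's real Error enum has its own variants.

```rust
use std::fmt;

/// Hypothetical error type standing in for the umbrella `Error`.
#[derive(Debug)]
enum ToyError {
    Empty,
}

impl fmt::Display for ToyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyError::Empty => write!(f, "input was empty"),
        }
    }
}

impl std::error::Error for ToyError {}

/// Alias whose error parameter defaults to `ToyError`.
type Result<T, E = ToyError> = std::result::Result<T, E>;

/// Uses the default error type: `Result<char>` means `Result<char, ToyError>`.
fn first_char(input: &str) -> Result<char> {
    input.chars().next().ok_or(ToyError::Empty)
}

/// The error parameter can still be overridden when needed.
fn parse_len(input: &str) -> Result<usize, std::num::ParseIntError> {
    input.parse()
}
```

The default keeps everyday signatures short while leaving room for functions that surface a different error type.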