Crate gukhanmun

Gukhanmun umbrella library.

This crate is the high-level entry point for the gukhanmun hanja-to-hangul conversion pipeline. It re-exports public items from the workspace’s sibling crates under feature flags and provides a Builder / Converter facade that wires the reader, engine, middlewares, renderer, and writer stages together.

See the workspace’s DESIGN.md for the architectural overview.
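
As a rough picture of how staged conversion can compose, the sketch below chains a reader, an engine, and a writer as plain functions. It is not gukhanmun's implementation: the `Token` enum, `is_basic_hanja`, and the functions `read`, `convert`, `write`, and `convert_text` are hypothetical stand-ins, middlewares and rendering are omitted, and the dictionary is an ordinary map that matches whole hanja runs only.

```rust
use std::collections::BTreeMap;

/// Hypothetical token type; the real crate exposes richer input and output token enums.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    /// A run of hanja characters awaiting conversion.
    Hanja(String),
    /// Any other text, passed through unchanged.
    Text(String),
}

/// Assumed hanja test covering only the basic CJK Unified Ideographs block.
fn is_basic_hanja(ch: char) -> bool {
    ('\u{4E00}'..='\u{9FFF}').contains(&ch)
}

/// Reader stage: split the input into hanja runs and other text.
fn read(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut buf = String::new();
    let mut buf_is_hanja = false;
    for ch in input.chars() {
        let h = is_basic_hanja(ch);
        // Flush the buffer whenever the character class changes.
        if !buf.is_empty() && h != buf_is_hanja {
            let run = std::mem::take(&mut buf);
            tokens.push(if buf_is_hanja { Token::Hanja(run) } else { Token::Text(run) });
        }
        buf_is_hanja = h;
        buf.push(ch);
    }
    if !buf.is_empty() {
        tokens.push(if buf_is_hanja { Token::Hanja(buf) } else { Token::Text(buf) });
    }
    tokens
}

/// Engine stage: replace hanja runs found in the dictionary; keep unknown runs as-is.
fn convert(tokens: Vec<Token>, dict: &BTreeMap<&str, &str>) -> Vec<Token> {
    tokens
        .into_iter()
        .map(|t| match t {
            Token::Hanja(w) => match dict.get(w.as_str()) {
                Some(reading) => Token::Text(reading.to_string()),
                None => Token::Hanja(w),
            },
            other => other,
        })
        .collect()
}

/// Writer stage: concatenate the tokens back into a string.
fn write(tokens: &[Token]) -> String {
    tokens
        .iter()
        .map(|t| match t {
            Token::Hanja(s) | Token::Text(s) => s.as_str(),
        })
        .collect()
}

/// Runs the three toy stages in order.
fn convert_text(input: &str, dict: &BTreeMap<&str, &str>) -> String {
    write(&convert(read(input), dict))
}
```

Keeping each stage a separate function over a shared token type is what lets a facade swap readers or writers (plain text, HTML, Markdown) without touching the conversion step.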

§Quick start (default preset, bundled dictionary)

use gukhanmun::Builder;

let converter = Builder::new().build()?;
assert_eq!(converter.convert_text_to_string("學校")?, "학교");

§Custom user dictionary, no bundled data

use gukhanmun::{Builder, MapDictionary};

let mut user = MapDictionary::new();
user.insert("外字", "외자");
let converter = Builder::new()
    .no_bundled_stdict()
    .push_dictionary(user)
    .build()?;
assert_eq!(converter.convert_text_to_string("外字")?, "외자");
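
When several dictionaries are layered, lookup order decides which reading wins. The following is a minimal sketch of priority-ordered lookup, not the crate's `ChainDictionary`: `ToyDict`, `ToyChain`, and `lookup` are hypothetical names, the sketch assumes that earlier layers take priority, and it matches whole words rather than prefixes at a cursor.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory dictionary: hanja word -> hangul reading.
#[derive(Default)]
struct ToyDict {
    entries: BTreeMap<String, String>,
}

impl ToyDict {
    fn insert(&mut self, hanja: &str, reading: &str) {
        self.entries.insert(hanja.to_string(), reading.to_string());
    }
}

/// Hypothetical chain: layers pushed earlier are consulted first (an assumption).
#[derive(Default)]
struct ToyChain {
    layers: Vec<ToyDict>,
}

impl ToyChain {
    fn push(&mut self, dict: ToyDict) {
        self.layers.push(dict);
    }

    /// Returns the reading from the first layer that knows the word.
    fn lookup(&self, hanja: &str) -> Option<&str> {
        self.layers
            .iter()
            .find_map(|d| d.entries.get(hanja).map(String::as_str))
    }
}
```

With a user layer pushed before a base layer, an entry in the user layer shadows the same key in the base layer, while keys known only to the base layer still resolve.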

§North Korean (ko-kp) preset

use gukhanmun::{Builder, Preset};

let converter = Builder::with_preset(Preset::KoKp).build()?;
// The ko-kp preset does not apply the initial sound law, so `來日` reads `래일` (not `내일`).
assert_eq!(converter.convert_text_to_string("來日")?, "래일");
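
For readers unfamiliar with the initial sound law, the sketch below shows roughly what it does to a word-initial syllable, using standard Unicode Hangul syllable arithmetic. It is a simplified illustration, not the crate's logic: `soften_initial` and `apply_initial_sound_law` are hypothetical names, and the rule ignores the orthographic exceptions that a real implementation would need.

```rust
/// First code point of the precomposed Hangul syllable block.
const SYLLABLE_BASE: u32 = 0xAC00;
/// Number of medial vowels and final-consonant slots per Unicode's layout.
const MEDIAL_COUNT: u32 = 21;
const FINAL_COUNT: u32 = 28;

// Initial-consonant indices in Unicode order.
const NIEUN: u32 = 2; // ㄴ
const RIEUL: u32 = 5; // ㄹ
const IEUNG: u32 = 11; // ㅇ (silent initial)

/// Medial vowels ㅑ ㅕ ㅖ ㅛ ㅠ ㅣ, before which ㄴ/ㄹ become ㅇ in this simplified rule.
fn is_i_or_y_vowel(medial: u32) -> bool {
    matches!(medial, 2 | 6 | 7 | 12 | 17 | 20)
}

/// Applies the simplified initial sound law to a single syllable.
fn soften_initial(ch: char) -> char {
    let code = ch as u32;
    if !(0xAC00..=0xD7A3).contains(&code) {
        return ch; // not a precomposed Hangul syllable
    }
    // Decompose the syllable into initial, medial, and final indices.
    let offset = code - SYLLABLE_BASE;
    let initial = offset / (MEDIAL_COUNT * FINAL_COUNT);
    let medial = (offset / FINAL_COUNT) % MEDIAL_COUNT;
    let last = offset % FINAL_COUNT;
    let new_initial = match initial {
        RIEUL if is_i_or_y_vowel(medial) => IEUNG,
        RIEUL => NIEUN,
        NIEUN if is_i_or_y_vowel(medial) => IEUNG,
        other => other,
    };
    // Recompose with the adjusted initial consonant.
    let new_code = SYLLABLE_BASE + (new_initial * MEDIAL_COUNT + medial) * FINAL_COUNT + last;
    char::from_u32(new_code).unwrap_or(ch)
}

/// Rewrites only the word-initial syllable; later syllables are untouched.
fn apply_initial_sound_law(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => std::iter::once(soften_initial(first)).chain(chars).collect(),
        None => String::new(),
    }
}
```

Under this rule `래일` becomes `내일`, which is the South Korean spelling; a preset that skips the law leaves the reading as `래일`.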

§HTML fragment conversion (feature = "html")

use gukhanmun::Builder;

let converter = Builder::new().build()?;
let output = converter.convert_html_fragment_to_string("<p>學校</p>")?;
assert!(output.contains("학교"));

Modules§

cdb
CDB dictionary backend (re-export of gukhanmun_cdb).
fst
FST dictionary backend (re-export of gukhanmun_fst).
html
HTML adapter (re-export of gukhanmun_html).
markdown
Markdown adapter (re-export of gukhanmun_markdown).
stdict
Bundled Standard Korean Language Dictionary (re-export of gukhanmun_stdict).

Structs§

Annotation
Metadata for a dictionary-backed hanja conversion.
Builder
Fluent builder that assembles a Converter from a Preset plus overrides.
ChainDictionary
A dictionary composition that preserves caller-supplied priority order.
ConversionOptions
Consolidated option bag carried through the umbrella facade.
Converter
Immutable conversion runtime produced by Builder::build.
DictionaryRecord
A complete dictionary entry exposed for batch policy analysis.
Engine
Stateful hanja conversion engine for chunked token streams.
EngineOptions
Engine-level options that affect hanja conversion before rendering.
FirstOccurrenceFilter
Streaming first-occurrence middleware.
HomophoneMarker
Streaming homophone marker middleware.
MapDictionary
A small in-memory dictionary backed by an ordered map.
Match
A dictionary match that starts at the queried cursor position.
MatchMark
Dictionary-provided rendering constraints for a match.
PlainScopeData
Scope data used by the plain-text adapter.
RecoverableInputError
A recoverable reader error plus the original source region.
RenderOptions
Rendering options that combine a RenderMode with per-mode sub-options.
Renderer
Stateful renderer for chunked OutputToken streams.
Scope
A structural scope in the format-neutral token stream.
UnihanCharDict
Per-character Unihan fallback readings exposed as a dictionary.
UserDirectives
User rules that adjust annotation presentation policy.
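
Several of the structs above are streaming middlewares. As one illustration, a first-occurrence filter could behave like the sketch below, which keeps a gloss only the first time a hanja word appears. `ToyAnnotation` and `mark_first_occurrences` are hypothetical; the crate's `FirstOccurrenceFilter` works on its own token types and context windows, which this sketch does not model.

```rust
use std::collections::HashSet;

/// Hypothetical annotated word: the hanja, its reading, and whether to show a gloss.
#[derive(Debug, Clone, PartialEq)]
struct ToyAnnotation {
    hanja: String,
    reading: String,
    show_gloss: bool,
}

/// Keeps `show_gloss` on the first occurrence of each hanja word and clears it on repeats.
fn mark_first_occurrences(words: Vec<ToyAnnotation>) -> Vec<ToyAnnotation> {
    let mut seen = HashSet::new();
    words
        .into_iter()
        .map(|mut w| {
            // `insert` returns false when the word was already in the set.
            if !seen.insert(w.hanja.clone()) {
                w.show_gloss = false;
            }
            w
        })
        .collect()
}
```

A real middleware would also reset the `seen` set at whatever boundary the context window defines, for example at each paragraph or section.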

Enums§

ContextWindow
The context boundary used by stateful annotation middlewares.
DirectiveAction
Action applied when a user directive predicate matches an annotation.
Error
Aggregated error type returned by the umbrella gukhanmun crate.
InputToken
A token emitted by a reader before hanja conversion has run.
NumeralStrategy
Strategy for rendering hanja numerals encountered in fallback text.
OriginalGloss
Form for the gloss attached to annotations in RenderMode::Original.
OutputToken
A token emitted by the engine after hanja conversion.
Preset
Conversion preset that selects the orthographic conventions of a Korean variety.
Recovery
Stream-level error recovery policy.
RenderMode
The concrete rendering mode for annotated hanja words.
RenderedToken
A token emitted by a renderer after all annotations have been expanded.
RubyBase
Selects which side of a <ruby> element is the base text.
SegmentationStrategy
Strategy used to segment hanja-containing spans.
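
To make the idea of a ruby base concrete, the sketch below emits standard HTML `<ruby>`/`<rt>` markup with either the hanja or the hangul as the base text. The enum `ToyRubyBase`, its variant names, and `render_ruby` are assumptions made for illustration and are not taken from the crate; the sketch also skips HTML escaping.

```rust
/// Hypothetical counterpart of a ruby-base setting; variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyRubyBase {
    /// Hanja is the base text; the hangul reading goes in `<rt>`.
    Hanja,
    /// Hangul is the base text; the original hanja goes in `<rt>`.
    Hangul,
}

/// Renders one annotated word as a `<ruby>` element (inputs are assumed pre-escaped).
fn render_ruby(hanja: &str, reading: &str, base: ToyRubyBase) -> String {
    let (body, annotation) = match base {
        ToyRubyBase::Hanja => (hanja, reading),
        ToyRubyBase::Hangul => (reading, hanja),
    };
    format!("<ruby>{}<rt>{}</rt></ruby>", body, annotation)
}
```

Choosing hangul as the base keeps running text readable for audiences who do not read hanja, while hanja as the base preserves the original text and adds the reading as an aid.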

Traits§

HanjaDictionary
A hanja dictionary queried by the conversion engine.
ScopeData
Adapter-owned data attached to an intermediate-representation scope.

Functions§

apply_user_directives
Applies literal user directives to annotation policy flags.
apply_user_directives_iter
Lazily applies literal user directives to an output token stream.
convert_plain_text
Converts plain text through reader, engine, renderer, and writer stages.
convert_plain_text_with_options
Converts plain text with explicit hanja conversion engine options.
filter_first_occurrences
Clears repeat gloss requirements after the first occurrence of each hanja.
is_hanja
Returns whether ch is in a known CJK ideograph range.
mark_homophones
Sets homophone on dictionary annotations sharing a reading.
process_fallible_tokens
Processes fallible input tokens with default engine options.
process_fallible_tokens_with_options
Processes fallible input tokens with explicit engine options.
process_tokens
Processes input tokens with the default hanja conversion engine options.
process_tokens_iter
Processes input tokens with the default engine options and returns an iterator over the collected output.
process_tokens_iter_with_options
Processes input tokens with explicit engine options and returns an iterator over the collected output.
process_tokens_with_options
Processes input tokens with explicit hanja conversion engine options.
read_plain_text
Reads a plain-text string into the core input-token stream.
recover_input_token
Resolves one fallible reader item according to a Recovery policy.
recover_input_tokens
Resolves a fallible reader token stream into recovered input tokens.
render_tokens
Renders engine output tokens into annotation-free tokens.
render_tokens_iter
Renders engine output tokens into annotation-free tokens as an iterator.
write_plain_text
Writes rendered plain-text tokens back to a string.
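
A range check such as the one `is_hanja` is described as performing might look like the sketch below. The set of Unicode blocks is an assumption (the crate may recognize more or fewer ranges), and `looks_like_hanja` and `count_hanja` are hypothetical names.

```rust
/// Assumed set of ideograph ranges; the crate's actual list is not documented here.
const IDEOGRAPH_RANGES: [(u32, u32); 4] = [
    (0x3400, 0x4DBF),   // CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),   // CJK Unified Ideographs
    (0xF900, 0xFAFF),   // CJK Compatibility Ideographs
    (0x20000, 0x2A6DF), // CJK Unified Ideographs Extension B
];

/// Returns true when `ch` falls inside one of the assumed ideograph ranges.
fn looks_like_hanja(ch: char) -> bool {
    let code = ch as u32;
    IDEOGRAPH_RANGES
        .iter()
        .any(|&(lo, hi)| (lo..=hi).contains(&code))
}

/// Counts the characters in `text` that pass the range check.
fn count_hanja(text: &str) -> usize {
    text.chars().filter(|&c| looks_like_hanja(c)).count()
}
```

Hangul syllables such as `학` sit in a separate block (U+AC00 to U+D7A3), so mixed Korean text can be scanned for hanja with a check of this kind before deciding whether conversion is needed.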

Type Aliases§

Result
Convenience std::result::Result alias that defaults the error parameter to the umbrella Error.