Crate gukhanmun_core


Core types and algorithms for Gukhanmun.

This crate holds the format-neutral intermediate representation, the conversion engine, dictionary traits, lattice segmentation, and fallback hanja reading logic. Format adapters, command-line I/O, and language bindings live in separate crates.

Structs

Annotation
Metadata for a dictionary-backed hanja conversion.
ChainDictionary
A dictionary composition that preserves caller-supplied priority order.
DictionaryRecord
A complete dictionary entry exposed for batch policy analysis.
Engine
Stateful hanja conversion engine for chunked token streams.
EngineOptions
Engine-level options that affect hanja conversion before rendering.
FirstOccurrenceFilter
Streaming first-occurrence middleware.
HomophoneMarker
Streaming homophone marker middleware.
MapDictionary
A small in-memory dictionary backed by an ordered map.
Match
A dictionary match that starts at the queried cursor position.
MatchMark
Dictionary-provided rendering constraints for a match.
PlainScopeData
Scope data used by the plain-text adapter.
RecoverableInputError
A recoverable reader error plus the original source region.
RenderOptions
Rendering options that combine a RenderMode with per-mode sub-options.
Renderer
Stateful renderer for chunked OutputToken streams.
Scope
A structural scope in the format-neutral token stream.
UnihanCharDict
Per-character Unihan fallback readings exposed as a dictionary.
UserDirectives
User rules that adjust annotation presentation policy.
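
The first-occurrence idea behind FirstOccurrenceFilter and filter_first_occurrences can be illustrated with a minimal standalone sketch. It does not use the crate's types: `Word`, its `needs_gloss` field, and `clear_repeat_glosses` are hypothetical names invented for illustration, and the real middleware presumably also honours a ContextWindow boundary, which this sketch ignores.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an annotated word; not the crate's `Annotation`.
#[derive(Debug, Clone, PartialEq)]
struct Word {
    hanja: String,
    reading: String,
    /// Whether a renderer should attach a gloss to this occurrence.
    needs_gloss: bool,
}

/// Keeps the gloss request on the first occurrence of each hanja form
/// and clears it on every later occurrence of the same form.
fn clear_repeat_glosses(words: &mut [Word]) {
    let mut seen: HashSet<String> = HashSet::new();
    for word in words.iter_mut() {
        // `insert` returns false when the form was already in the set.
        if !seen.insert(word.hanja.clone()) {
            word.needs_gloss = false;
        }
    }
}
```

Calling `clear_repeat_glosses` on a slice containing 漢字, 辭典, 漢字 leaves the gloss flag set on the first two entries and clears it on the third. A streaming variant would keep the `HashSet` as struct state across chunks instead of rebuilding it on each call.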

Enums

ContextWindow
The context boundary used by stateful annotation middlewares.
DirectiveAction
Action applied when a user directive predicate matches an annotation.
Error
Error returned by fallible core pipeline entry points.
InputToken
A token emitted by a reader before hanja conversion has run.
NumeralStrategy
Strategy for rendering hanja numerals encountered in fallback text.
OriginalGloss
Form for the gloss attached to annotations in RenderMode::Original.
OutputToken
A token emitted by the engine after hanja conversion.
Recovery
Stream-level error recovery policy.
RenderMode
The concrete rendering mode for annotated hanja words.
RenderedToken
A token emitted by a renderer after all annotations have been expanded.
RubyBase
Selects which side of a <ruby> element is the base text.
SegmentationStrategy
Strategy used to segment hanja-containing spans.

Traits

HanjaDictionary
A hanja dictionary queried by the conversion engine.
ScopeData
Adapter-owned data attached to an intermediate-representation scope.
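
To make the dictionary concepts above more concrete, here is a rough sketch of a cursor-anchored lookup trait and a priority-ordered chain. None of it is the crate's API: the `Lookup` trait, `MapDict`, `Chain`, and `map_dict` are hypothetical names, and the first-hit-wins rule is only one possible reading of "preserves caller-supplied priority order". The real ChainDictionary may combine matches differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical lookup trait; the real `HanjaDictionary` may look different.
trait Lookup {
    /// Returns the longest entry that is a prefix of `text`, as (hanja, reading).
    fn longest_prefix(&self, text: &str) -> Option<(String, String)>;
}

/// Hypothetical in-memory dictionary backed by an ordered map.
struct MapDict {
    entries: BTreeMap<String, String>,
}

/// Builds a `MapDict` from (hanja, reading) pairs.
fn map_dict(pairs: &[(&str, &str)]) -> MapDict {
    let entries = pairs
        .iter()
        .map(|(hanja, reading)| (hanja.to_string(), reading.to_string()))
        .collect();
    MapDict { entries }
}

impl Lookup for MapDict {
    fn longest_prefix(&self, text: &str) -> Option<(String, String)> {
        // Linear scan for clarity; a trie would be the usual choice at scale.
        self.entries
            .iter()
            .filter(|(hanja, _)| text.starts_with(hanja.as_str()))
            .max_by_key(|(hanja, _)| hanja.chars().count())
            .map(|(hanja, reading)| (hanja.clone(), reading.clone()))
    }
}

/// Hypothetical composition: earlier dictionaries take priority.
struct Chain {
    dictionaries: Vec<Box<dyn Lookup>>,
}

impl Lookup for Chain {
    fn longest_prefix(&self, text: &str) -> Option<(String, String)> {
        // Assumption: the first dictionary with any match wins,
        // even if a later dictionary holds a longer match.
        self.dictionaries
            .iter()
            .find_map(|dict| dict.longest_prefix(text))
    }
}
```

With a user dictionary containing 大韓 placed ahead of a base dictionary containing 大韓民國, a lookup at the start of "大韓民國" returns the shorter user entry under this rule. That trade-off (priority over length) is a design choice of the sketch, not a documented behaviour of the crate.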

Functions

apply_user_directives
Applies literal user directives to annotation policy flags.
apply_user_directives_iter
Lazily applies literal user directives to an output token stream.
convert_plain_text
Converts plain text through reader, engine, renderer, and writer stages.
convert_plain_text_with_options
Converts plain text with explicit hanja conversion engine options.
filter_first_occurrences
Clears repeat gloss requirements after the first occurrence of each hanja.
is_hanja
Returns whether ch is in a known CJK ideograph range.
mark_homophones
Sets homophone on dictionary annotations sharing a reading.
process_fallible_tokens
Processes fallible input tokens with default engine options.
process_fallible_tokens_with_options
Processes fallible input tokens with explicit engine options.
process_tokens
Processes input tokens with the default hanja conversion engine options.
process_tokens_iter
Processes input tokens with the default engine options and returns an iterator over the collected output.
process_tokens_iter_with_options
Processes input tokens with explicit engine options and returns an iterator over the collected output.
process_tokens_with_options
Processes input tokens with explicit hanja conversion engine options.
read_plain_text
Reads a plain-text string into the core input-token stream.
recover_input_token
Resolves one fallible reader item according to a Recovery policy.
recover_input_tokens
Resolves a fallible reader token stream into recovered input tokens.
render_tokens
Renders engine output tokens into annotation-free tokens.
render_tokens_iter
Renders engine output tokens into annotation-free tokens as an iterator.
write_plain_text
Writes rendered plain-text tokens back to a string.
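
As an illustration of the kind of check is_hanja describes, the following standalone sketch tests a character against a few well-known Unicode ideograph blocks and uses that to split text into hanja and non-hanja runs. The names `is_cjk_ideograph` and `split_runs` are hypothetical, and the set of blocks is an assumption; the crate's own function may recognise more or fewer ranges.

```rust
/// Hypothetical range check; the crate's `is_hanja` may cover different blocks.
fn is_cjk_ideograph(ch: char) -> bool {
    matches!(
        u32::from(ch),
        0x3400..=0x4DBF           // CJK Unified Ideographs Extension A
            | 0x4E00..=0x9FFF     // CJK Unified Ideographs
            | 0xF900..=0xFAFF     // CJK Compatibility Ideographs
            | 0x20000..=0x2A6DF   // CJK Unified Ideographs Extension B
    )
}

/// Splits `text` into maximal runs, each tagged with whether it is ideographic.
fn split_runs(text: &str) -> Vec<(bool, String)> {
    let mut runs: Vec<(bool, String)> = Vec::new();
    for ch in text.chars() {
        let kind = is_cjk_ideograph(ch);
        // Extend the current run when the character kind is unchanged.
        if let Some((last_kind, run)) = runs.last_mut() {
            if *last_kind == kind {
                run.push(ch);
                continue;
            }
        }
        // Otherwise start a new run.
        runs.push((kind, ch.to_string()));
    }
    runs
}
```

Splitting "漢字는 hanja" this way yields one ideographic run ("漢字") followed by one non-ideographic run ("는 hanja"). Hangul syllables fall outside all four blocks, so they are never classified as hanja.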