Core types and algorithms for Gukhanmun.
This crate is the home for the format-neutral intermediate representation, conversion engine, dictionary traits, lattice segmentation, and fallback hanja reading logic. Format adapters, command-line I/O, and language bindings live in separate crates.
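The lattice segmentation mentioned above can be pictured as a shortest-path search over character positions. What follows is a minimal sketch under stated assumptions, not this crate's implementation. The `segment` function, the unit cost per edge, and the `HashSet` word list are all hypothetical, and the real engine's scoring and `SegmentationStrategy` options may differ. Each dictionary word that starts at a position adds an edge. A single-character edge always exists as the fallback, so every input has at least one segmentation.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: segments `text` into the fewest pieces, preferring
/// dictionary words. Every edge costs 1, and any single character is a
/// fallback edge, so a segmentation always exists.
fn segment<'a>(text: &'a str, words: &HashSet<&str>) -> Vec<&'a str> {
    // Byte offsets of every char boundary, including the end of the string.
    let bounds: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();
    let n = bounds.len() - 1; // number of characters
    // best[j] = (cost, predecessor boundary) for reaching boundary j.
    let mut best: Vec<Option<(usize, usize)>> = vec![None; n + 1];
    best[0] = Some((0, 0));
    for i in 0..n {
        let Some((cost, _)) = best[i] else { continue };
        for j in i + 1..=n {
            let piece = &text[bounds[i]..bounds[j]];
            // Either a dictionary edge or the single-character fallback edge.
            if j == i + 1 || words.contains(piece) {
                let candidate = cost + 1;
                if best[j].map_or(true, |(c, _)| candidate < c) {
                    best[j] = Some((candidate, i));
                }
            }
        }
    }
    // Walk predecessors back from the end to recover the pieces.
    let mut pieces = Vec::new();
    let mut j = n;
    while j > 0 {
        let (_, i) = best[j].expect("every boundary is reachable");
        pieces.push(&text[bounds[i]..bounds[j]]);
        j = i;
    }
    pieces.reverse();
    pieces
}
```

With a word list containing 大韓民國, 大韓, and 民國, the input 大韓民國萬歲 segments as 大韓民國 / 萬 / 歲, because one long match costs less than the two shorter ones.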
Structs
- Annotation - Metadata for a dictionary-backed hanja conversion.
- ChainDictionary - A dictionary composition that preserves caller-supplied priority order.
- DictionaryRecord - A complete dictionary entry exposed for batch policy analysis.
- Engine - Stateful hanja conversion engine for chunked token streams.
- EngineOptions - Engine-level options that affect hanja conversion before rendering.
- FirstOccurrenceFilter - Streaming first-occurrence middleware.
- HomophoneMarker - Streaming homophone marker middleware.
- MapDictionary - A small in-memory dictionary backed by an ordered map.
- Match - A dictionary match that starts at the queried cursor position.
- MatchMark - Dictionary-provided rendering constraints for a match.
- PlainScopeData - Scope data used by the plain-text adapter.
- RecoverableInputError - A recoverable reader error plus the original source region.
- RenderOptions - Rendering options that combine a `RenderMode` with per-mode sub-options.
- Renderer - Stateful renderer for chunked `OutputToken` streams.
- Scope - A structural scope in the format-neutral token stream.
- UnihanCharDict - Per-character Unihan fallback readings exposed as a dictionary.
- UserDirectives - User rules that adjust annotation presentation policy.
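A streaming first-occurrence middleware such as `FirstOccurrenceFilter` only needs to remember which words it has already seen. The following is a minimal sketch that assumes a simplified annotation carrying a hypothetical `show_gloss` flag. The `Ann` and `FirstOccurrence` types and the `reset` hook, which stands in for a `ContextWindow` boundary, are illustrative names and not the crate's API.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for an annotated hanja word.
#[derive(Debug, Clone)]
struct Ann {
    hanja: String,
    show_gloss: bool, // whether the renderer should print a gloss
}

/// Hypothetical streaming first-occurrence filter: remembers which hanja
/// words have already passed by and clears the gloss flag on later repeats.
#[derive(Default)]
struct FirstOccurrence {
    seen: HashSet<String>,
}

impl FirstOccurrence {
    /// Processes one annotation as it streams past, in document order.
    fn push(&mut self, mut ann: Ann) -> Ann {
        // `insert` returns false when the word was already present.
        if !self.seen.insert(ann.hanja.clone()) {
            ann.show_gloss = false;
        }
        ann
    }

    /// Forgets all state, e.g. at a (hypothetical) context-window boundary.
    fn reset(&mut self) {
        self.seen.clear();
    }
}
```

Because the filter keeps only a set of seen words, it can run over chunked streams without buffering the whole document.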
Enums
- ContextWindow - The context boundary used by stateful annotation middlewares.
- DirectiveAction - Action applied when a user directive predicate matches an annotation.
- Error - Error returned by fallible core pipeline entry points.
- InputToken - A token emitted by a reader before hanja conversion has run.
- NumeralStrategy - Strategy for rendering hanja numerals encountered in fallback text.
- OriginalGloss - Form for the gloss attached to annotations in `RenderMode::Original`.
- OutputToken - A token emitted by the engine after hanja conversion.
- Recovery - Stream-level error recovery policy.
- RenderMode - The concrete rendering mode for annotated hanja words.
- RenderedToken - A token emitted by a renderer after all annotations have been expanded.
- RubyBase - Selects which side of a `<ruby>` element is the base text.
- SegmentationStrategy - Strategy used to segment hanja-containing spans.
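For `RubyBase`, the choice swaps which string becomes the base text and which becomes the `<rt>` annotation. The sketch below uses a hypothetical `Base` enum and `ruby` function and assumes both inputs are already HTML-escaped; the crate's actual ruby markup is not documented here and may differ.

```rust
/// Hypothetical mirror of the `RubyBase` choice: which text is the base.
#[derive(Clone, Copy)]
enum Base {
    Hanja,
    Hangul,
}

/// Renders one annotated word as HTML ruby markup.
/// Assumes `hanja` and `hangul` are already HTML-escaped.
fn ruby(hanja: &str, hangul: &str, base: Base) -> String {
    // Pick the base text and the annotation text according to `base`.
    let (base_text, annotation) = match base {
        Base::Hanja => (hanja, hangul),
        Base::Hangul => (hangul, hanja),
    };
    format!("<ruby>{base_text}<rt>{annotation}</rt></ruby>")
}
```

With `Base::Hanja`, 漢字 is displayed with 한자 above it; `Base::Hangul` reverses the two.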
Traits
- HanjaDictionary - A hanja dictionary queried by the conversion engine.
- ScopeData - Adapter-owned data attached to an intermediate-representation scope.
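`HanjaDictionary` is queried by the engine, and each returned `Match` starts at the queried cursor position. The sketch below illustrates that contract together with an ordered-map dictionary and a priority-preserving chain in the spirit of `MapDictionary` and `ChainDictionary`. The `Dict` trait and the `Hit`, `MapDict`, and `Chain` types are hypothetical, as is the rule that an earlier dictionary wins when two dictionaries return a match of the same length.

```rust
use std::collections::BTreeMap;

/// Hypothetical match: the matched length in chars and its reading.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    len: usize,
    reading: String,
}

/// Hypothetical dictionary contract: every returned hit starts at `cursor`.
trait Dict {
    fn matches_at(&self, chars: &[char], cursor: usize) -> Vec<Hit>;
}

/// Ordered-map dictionary keyed by the hanja word.
struct MapDict(BTreeMap<String, String>);

impl Dict for MapDict {
    fn matches_at(&self, chars: &[char], cursor: usize) -> Vec<Hit> {
        let mut hits = Vec::new();
        let mut key = String::new();
        // Grow the key one char at a time and look up each prefix.
        for (offset, &c) in chars[cursor..].iter().enumerate() {
            key.push(c);
            if let Some(reading) = self.0.get(&key) {
                hits.push(Hit { len: offset + 1, reading: reading.clone() });
            }
        }
        hits
    }
}

/// Asks dictionaries in caller-supplied order and keeps, for each match
/// length, the hit from the earliest dictionary.
struct Chain(Vec<Box<dyn Dict>>);

impl Dict for Chain {
    fn matches_at(&self, chars: &[char], cursor: usize) -> Vec<Hit> {
        let mut by_len: BTreeMap<usize, Hit> = BTreeMap::new();
        for dict in &self.0 {
            for hit in dict.matches_at(chars, cursor) {
                // `or_insert` leaves an existing, higher-priority hit alone.
                by_len.entry(hit.len).or_insert(hit);
            }
        }
        by_len.into_values().collect()
    }
}
```

Chaining a user dictionary that reads 樂 as 요 ahead of a base dictionary that reads it as 락 makes the user reading win at that position, while 音樂 is still found in the base dictionary.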
Functions
- apply_user_directives - Applies literal user directives to annotation policy flags.
- apply_user_directives_iter - Lazily applies literal user directives to an output token stream.
- convert_plain_text - Converts plain text through the reader, engine, renderer, and writer stages.
- convert_plain_text_with_options - Converts plain text with explicit hanja conversion engine options.
- filter_first_occurrences - Clears repeat gloss requirements after the first occurrence of each hanja.
- is_hanja - Returns whether `ch` is in a known CJK ideograph range.
- mark_homophones - Sets `homophone` on dictionary annotations that share a reading.
- process_fallible_tokens - Processes fallible input tokens with the default engine options.
- process_fallible_tokens_with_options - Processes fallible input tokens with explicit engine options.
- process_tokens - Processes input tokens with the default hanja conversion engine options.
- process_tokens_iter - Processes input tokens with the default engine options and returns an iterator over the collected output.
- process_tokens_iter_with_options - Processes input tokens with explicit engine options and returns an iterator over the collected output.
- process_tokens_with_options - Processes input tokens with explicit hanja conversion engine options.
- read_plain_text - Reads a plain-text string into the core input-token stream.
- recover_input_token - Resolves one fallible reader item according to a `Recovery` policy.
- recover_input_tokens - Resolves a fallible reader token stream into recovered input tokens.
- render_tokens - Renders engine output tokens into annotation-free tokens.
- render_tokens_iter - Renders engine output tokens into annotation-free tokens as an iterator.
- write_plain_text - Writes rendered plain-text tokens back to a string.
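`is_hanja` is documented only as checking whether a character falls in a known CJK ideograph range. A minimal approximation over four well-known Unicode blocks follows. Which blocks the crate actually covers is not stated here, so treat the list as an assumption; later extensions (C and beyond) are deliberately omitted.

```rust
/// Hypothetical approximation of `is_hanja` over common CJK ideograph blocks.
fn is_hanja(ch: char) -> bool {
    matches!(
        ch,
        '\u{3400}'..='\u{4DBF}'      // CJK Unified Ideographs Extension A
            | '\u{4E00}'..='\u{9FFF}'   // CJK Unified Ideographs
            | '\u{F900}'..='\u{FAFF}'   // CJK Compatibility Ideographs
            | '\u{20000}'..='\u{2A6DF}' // CJK Unified Ideographs Extension B
    )
}
```

A check like this lets a reader or engine skip Hangul and Latin text cheaply before consulting any dictionary.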