Struct RedundantParenCollapser

pub struct RedundantParenCollapser<S>
where
    S: ScopeData,
{ /* private fields */ }

Streaming middleware that collapses an explicit parenthetical reading annotation into the converted hanja word it duplicates.

Mixed-script input sometimes spells a word together with a parenthetical gloss, either hanja-first (庫間(곳간)) or hangul-first (곳간(庫間)). Left alone, the converter would render the hanja and keep the parenthetical, producing a redundant 곳간(곳간). An author who wrote such a gloss meant “annotate this word fully”, so this middleware detects the two patterns, removes the now-redundant parenthetical text, and sets both Annotation::require_hanja and Annotation::require_hangul on the surviving annotation. Setting both flags reproduces the author’s intent in every render mode: RenderMode::HangulOnly honours require_hanja (곳간(庫間)) while RenderMode::Original honours require_hangul (庫間(곳간)).
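The following is a minimal sketch of why setting both flags covers every render mode. `Annotation`, `RenderMode`, `render`, and `collapse` below are simplified hypothetical stand-ins, not this crate's definitions. The sketch assumes only the behaviour stated above: HangulOnly honours require_hanja, and Original honours require_hangul.

```rust
/// Hypothetical render modes mirroring the two named above.
#[derive(Clone, Copy)]
enum RenderMode {
    HangulOnly,
    Original,
}

/// Hypothetical, simplified annotation: a hanja word, its reading, and the two flags.
struct Annotation<'a> {
    hanja: &'a str,
    hangul: &'a str,
    require_hanja: bool,
    require_hangul: bool,
}

/// HangulOnly honours require_hanja; Original honours require_hangul.
fn render(a: &Annotation<'_>, mode: RenderMode) -> String {
    match mode {
        RenderMode::HangulOnly if a.require_hanja => format!("{}({})", a.hangul, a.hanja),
        RenderMode::HangulOnly => a.hangul.to_string(),
        RenderMode::Original if a.require_hangul => format!("{}({})", a.hanja, a.hangul),
        RenderMode::Original => a.hanja.to_string(),
    }
}

/// What collapsing means for the surviving annotation: the gloss text is
/// dropped elsewhere, and both flags are set so every mode shows both scripts.
fn collapse(a: &mut Annotation<'_>) {
    a.require_hanja = true;
    a.require_hangul = true;
}
```

Without the collapse, HangulOnly output for 庫間(곳간) is the reading followed by the leftover gloss, 곳간(곳간). After `collapse`, the same annotation renders as 곳간(庫間) or 庫間(곳간), depending on the mode.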

A parenthetical may also pin an alternative reading. 數字 is normally read 숫자, but in the sense “a few characters” it reads 수자; writing 數字(수자) fixes the reading for that occurrence. Such a reading annotation is distinguished from a definition gloss like 庫間(물건을 간직하여 두는 곳) by a two-tier test against the candidate hangul R:

1. Exact match: R equals the annotation’s reading. Collapse and keep the reading.
2. Valid alternative reading: R has exactly one hangul syllable per hanja character, and every syllable is a recorded Unihan reading of its character (or the initial-sound-law variant of one). Collapse and override the reading with R.

Anything else (definition glosses, foreign transliterations such as 蔣介石(장제스), or a syllable-count mismatch) is left untouched.
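The two-tier test can be sketched as a standalone classifier. This is an illustration under stated assumptions, not the crate's implementation. `readings` is a tiny hypothetical table standing in for Unihan Korean readings (kHangul). `classify` and `Verdict` are invented names. The initial-sound-law helper covers only the common ㄹ→ㄴ, ㄹ→ㅇ and ㄴ→ㅇ substitutions, which it computes from the Unicode Hangul syllable layout (base U+AC00, 21 vowels × 28 finals per initial consonant).

```rust
/// Outcome of the two-tier test (hypothetical names).
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Candidate equals the engine's reading: collapse and keep it.
    Exact,
    /// One valid reading per character: collapse and override.
    Alternative,
    /// Definition gloss, transliteration, or count mismatch: leave alone.
    Reject,
}

const SYLLABLE_BASE: u32 = 0xAC00; // 가
const SYLLABLE_LAST: u32 = 0xD7A3; // 힣
const NIEUN: u32 = 2; // initial ㄴ
const RIEUL: u32 = 5; // initial ㄹ
const IEUNG: u32 = 11; // initial ㅇ

/// Hypothetical stand-in for Unihan kHangul data; unknown characters have no readings.
fn readings(hanja: char) -> &'static [char] {
    match hanja {
        '數' => &['수', '삭', '촉'],
        '字' => &['자'],
        '率' => &['솔', '률'],
        '蔣' => &['장'],
        '介' => &['개'],
        '石' => &['석'],
        _ => &[],
    }
}

fn is_hangul_syllable(c: char) -> bool {
    (SYLLABLE_BASE..=SYLLABLE_LAST).contains(&(c as u32))
}

/// Word-initial variant of a syllable under the initial-sound law, if it has one.
fn initial_sound_law(s: char) -> Option<char> {
    if !is_hangul_syllable(s) {
        return None;
    }
    let index = s as u32 - SYLLABLE_BASE;
    let (lead, vowel, tail) = (index / 588, (index % 588) / 28, index % 28);
    // ㅑ ㅕ ㅖ ㅛ ㅠ ㅣ turn ㄹ/ㄴ into ㅇ; other vowels turn ㄹ into ㄴ.
    let i_like = matches!(vowel, 2 | 6 | 7 | 12 | 17 | 20);
    let new_lead = match lead {
        RIEUL if i_like => IEUNG,
        RIEUL => NIEUN,
        NIEUN if i_like => IEUNG,
        _ => return None,
    };
    char::from_u32(SYLLABLE_BASE + new_lead * 588 + vowel * 28 + tail)
}

fn classify(hanja: &str, reading: &str, candidate: &str) -> Verdict {
    // Tier 1: exact match with the engine's reading.
    if candidate == reading {
        return Verdict::Exact;
    }
    // Tier 2: one syllable per hanja, each a recorded reading or its variant.
    let chars: Vec<char> = hanja.chars().collect();
    let syllables: Vec<char> = candidate.chars().collect();
    if chars.len() != syllables.len() {
        return Verdict::Reject;
    }
    let valid = chars.iter().zip(&syllables).all(|(&h, &s)| {
        is_hangul_syllable(s)
            && readings(h).iter().any(|&r| r == s || initial_sound_law(r) == Some(s))
    });
    if valid { Verdict::Alternative } else { Verdict::Reject }
}
```

With this table, 數字(수자) classifies as an alternative reading, and 庫間(물건을 간직하여 두는 곳) is rejected on syllable count. 蔣介石(장제스) is rejected because 제 is not a listed reading of 介.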

The middleware runs immediately after the engine, before HomophoneMarker and FirstOccurrenceFilter, so later stages observe the corrected reading and flags. It coalesces adjacent OutputToken::Text tokens, because the streaming engine flushes non-hanja text at safe points and (곳간) can therefore arrive split across two tokens, (곳간 and ). It buffers only a bounded amount: a held annotation, the trailing matchable suffix of the preceding text, and the following parenthetical until it can be classified. This keeps the streaming result identical to a one-shot conversion while staying responsive on long hanja-free runs. OutputToken::Open, OutputToken::Close, and OutputToken::Verbatim flush the buffer and pass through, so a match never crosses a scope boundary. When enabled is false the middleware is an exact pass-through.
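The coalescing and scope-boundary flushing can be sketched with a simplified token type. The `TextCoalescer` below is hypothetical, as are its `Token` enum and `MAX_PENDING_CHARS` bound. It holds back only an unclosed parenthetical at the end of the coalesced text and releases everything before it. It does not classify or collapse anything, and the real middleware's buffering rules are more involved.

```rust
/// Hypothetical, simplified token stream (not the crate's OutputToken).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Text(String),
    Open,
    Close,
    Verbatim(String),
}

/// Assumed cap on held text, so hanja-free runs are not buffered indefinitely.
const MAX_PENDING_CHARS: usize = 64;

struct TextCoalescer {
    enabled: bool,
    pending: String,
}

impl TextCoalescer {
    fn new(enabled: bool) -> Self {
        Self { enabled, pending: String::new() }
    }

    fn push_token(&mut self, token: Token) -> Vec<Token> {
        if !self.enabled {
            return vec![token]; // exact pass-through
        }
        match token {
            // Merge split text, then release whatever cannot be part of a gloss.
            Token::Text(s) => {
                self.pending.push_str(&s);
                self.release_safe_prefix()
            }
            // Scope boundaries and verbatim text flush the buffer first.
            other => {
                let mut out = self.flush();
                out.push(other);
                out
            }
        }
    }

    fn finish(mut self) -> Vec<Token> {
        self.flush()
    }

    fn flush(&mut self) -> Vec<Token> {
        if self.pending.is_empty() {
            Vec::new()
        } else {
            vec![Token::Text(std::mem::take(&mut self.pending))]
        }
    }

    /// Emits text up to a trailing, still-unclosed '(' and keeps the rest.
    fn release_safe_prefix(&mut self) -> Vec<Token> {
        let hold_from = match self.pending.rfind('(') {
            Some(i)
                if !self.pending[i..].contains(')')
                    && self.pending[i..].chars().count() <= MAX_PENDING_CHARS =>
            {
                i
            }
            _ => self.pending.len(),
        };
        if hold_from == 0 {
            return Vec::new(); // everything may still be a parenthetical
        }
        let rest = self.pending.split_off(hold_from);
        let ready = std::mem::replace(&mut self.pending, rest);
        vec![Token::Text(ready)]
    }
}
```

Feeding `(곳간` and then `)` yields a single `(곳간)` text token. An unclosed `(…` that is still held when an `Open` token arrives is flushed ahead of that token, so nothing is matched across the boundary.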

§Limitation

The collapser runs after the engine and never re-derives readings. When a hanja-first gloss is immediately followed (with no space) by a character subject to the initial-sound law (頭音法則), that character keeps the reading the engine chose while the parenthetical still acted as a word boundary. For example, 學(학)率 collapses to 학(學)율 rather than 학률: the engine read 率 as word-initial 율 because ) separated it from 學, and removing the gloss cannot recover the non-word-initial 률. This case is rare in practice, because an intended compound is normally written 學率(학률). To control the reading, insert a space (學(학) 率) or gloss the whole compound.

Implementations§

impl<S> RedundantParenCollapser<S> where S: ScopeData,

pub fn new(enabled: bool) -> RedundantParenCollapser<S>

pub fn push_token(&mut self, token: OutputToken<S>) -> Vec<OutputToken<S>>

pub fn finish(self) -> Vec<OutputToken<S>>
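`push_token` returns whatever tokens are ready, and `finish` consumes the stage to flush what it still holds. That shape suggests the usual driver loop when chaining stages, for example this middleware ahead of HomophoneMarker. The sketch below assumes a hypothetical `Stage` trait that mirrors that shape; the trait and the toy stages `HoldLast` and `Upper` are not part of the crate. The detail worth copying is the order at end of input: flush the upstream stage into the downstream one before finishing the downstream one, otherwise tokens still held upstream never reach it.

```rust
/// Hypothetical trait mirroring the push_token/finish shape documented above.
trait Stage<T> {
    fn push_token(&mut self, token: T) -> Vec<T>;
    fn finish(self) -> Vec<T>;
}

/// Feeds every token through `first`, and everything it emits through `second`.
fn run_pipeline<T, A, B>(tokens: impl IntoIterator<Item = T>, mut first: A, mut second: B) -> Vec<T>
where
    A: Stage<T>,
    B: Stage<T>,
{
    let mut out = Vec::new();
    for token in tokens {
        for t in first.push_token(token) {
            out.extend(second.push_token(t));
        }
    }
    // Flush the upstream stage into the downstream one before finishing it.
    for t in first.finish() {
        out.extend(second.push_token(t));
    }
    out.extend(second.finish());
    out
}

/// Toy stage that holds one token back, like a buffering middleware.
struct HoldLast {
    held: Option<String>,
}

impl Stage<String> for HoldLast {
    fn push_token(&mut self, token: String) -> Vec<String> {
        self.held.replace(token).into_iter().collect()
    }
    fn finish(self) -> Vec<String> {
        self.held.into_iter().collect()
    }
}

/// Toy stateless stage.
struct Upper;

impl Stage<String> for Upper {
    fn push_token(&mut self, token: String) -> Vec<String> {
        vec![token.to_uppercase()]
    }
    fn finish(self) -> Vec<String> {
        Vec::new()
    }
}
```

Because `finish` takes `self` by value, a stage cannot be pushed to after it has been flushed, and the compiler enforces that ordering.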

Auto Trait Implementations§

impl<S> Freeze for RedundantParenCollapser<S>

impl<S> RefUnwindSafe for RedundantParenCollapser<S>

impl<S> Send for RedundantParenCollapser<S>

impl<S> Sync for RedundantParenCollapser<S>

impl<S> Unpin for RedundantParenCollapser<S>

impl<S> UnsafeUnpin for RedundantParenCollapser<S>

impl<S> UnwindSafe for RedundantParenCollapser<S>

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self> where S: Into<Dispatch>,

fn with_current_subscriber(self) -> WithDispatch<Self>