Struct RedundantParenCollapser 

pub struct RedundantParenCollapser<S>
where S: ScopeData,
{ /* private fields */ }

Streaming middleware that collapses an explicit parenthetical reading annotation into the converted hanja word it duplicates.

Mixed-script input sometimes spells a word together with a parenthetical gloss, either hanja-first (庫間(곳간)) or hangul-first (곳간(庫間)). Left alone, the converter would render the hanja and keep the parenthetical, producing a redundant 곳간(곳간). An author who wrote such a gloss meant “annotate this word fully”, so this middleware detects the two patterns, removes the now-redundant parenthetical text, and sets both Annotation::require_hanja and Annotation::require_hangul on the surviving annotation. Setting both flags reproduces the author’s intent in every render mode: RenderMode::HangulOnly honours require_hanja (곳간(庫間)) while RenderMode::Original honours require_hangul (庫間(곳간)).
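As a rough illustration of why both flags are set, the sketch below models the two render modes with local stand-in types. The `Annotation` and `RenderMode` definitions here, along with the `render` and `collapsed` helpers, are assumptions made for this example only; the crate's real types are expected to differ in fields and variants.

```rust
// Local stand-ins for illustration; not the crate's actual definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RenderMode {
    HangulOnly,
    Original,
}

#[derive(Debug, Clone)]
struct Annotation {
    hanja: String,
    reading: String,
    require_hanja: bool,
    require_hangul: bool,
}

// Hypothetical renderer: each mode adds a parenthetical only when the
// flag it honours is set on the annotation.
fn render(a: &Annotation, mode: RenderMode) -> String {
    match mode {
        RenderMode::HangulOnly if a.require_hanja => format!("{}({})", a.reading, a.hanja),
        RenderMode::HangulOnly => a.reading.clone(),
        RenderMode::Original if a.require_hangul => format!("{}({})", a.hanja, a.reading),
        RenderMode::Original => a.hanja.clone(),
    }
}

// What a collapsed gloss is assumed to look like: both flags set.
fn collapsed(hanja: &str, reading: &str) -> Annotation {
    Annotation {
        hanja: hanja.to_string(),
        reading: reading.to_string(),
        require_hanja: true,
        require_hangul: true,
    }
}
```

With both flags set, `render(&collapsed("庫間", "곳간"), RenderMode::HangulOnly)` yields 곳간(庫間) and the same annotation under `RenderMode::Original` yields 庫間(곳간), matching the two forms described above.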

A parenthetical may also pin an alternative reading. 數字 is normally read 숫자, but in the sense “a few characters” it reads 수자; writing 數字(수자) fixes the reading for that occurrence. Such a reading annotation is told apart from a definition gloss like 庫間(물건을 간직하여 두는 곳) with a two-tier test against the candidate hangul R:

  1. Exact match: R equals the annotation’s reading. Collapse and keep the reading.
  2. Valid alternative reading: R has exactly one hangul syllable per hanja character and every syllable is a recorded Unihan reading of its character (or the initial-sound-law variant of one). Collapse and override the reading with R.

Anything else (definition glosses, foreign transliterations such as 蔣介石(장제스), or a syllable-count mismatch) is left untouched.
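The two-tier test can be sketched roughly as follows. This is a simplified illustration, not the crate's implementation: the `Gloss` enum, `classify`, and `sample_readings` are hypothetical names, the readings table is a tiny hand-written sample rather than real Unihan data, and the initial-sound-law variants mentioned in tier 2 are deliberately left out.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Gloss {
    ExactMatch,         // tier 1: collapse, keep the reading
    AlternativeReading, // tier 2: collapse, override the reading with R
    NotAReading,        // leave the parenthetical untouched
}

// Precomposed hangul syllables occupy U+AC00..=U+D7A3.
fn is_hangul_syllable(c: char) -> bool {
    ('\u{AC00}'..='\u{D7A3}').contains(&c)
}

fn classify(
    hanja: &str,
    engine_reading: &str,
    candidate: &str,
    readings: &HashMap<char, Vec<char>>,
) -> Gloss {
    // Tier 1: R equals the annotation's reading.
    if candidate == engine_reading {
        return Gloss::ExactMatch;
    }
    // Tier 2: one hangul syllable per hanja character...
    let hanja_chars: Vec<char> = hanja.chars().collect();
    let syllables: Vec<char> = candidate.chars().collect();
    if syllables.len() != hanja_chars.len()
        || !syllables.iter().all(|&c| is_hangul_syllable(c))
    {
        return Gloss::NotAReading;
    }
    // ...and every syllable is a recorded reading of its character.
    let all_recorded = hanja_chars.iter().zip(&syllables).all(|(h, s)| {
        readings.get(h).map_or(false, |known| known.contains(s))
    });
    if all_recorded {
        Gloss::AlternativeReading
    } else {
        Gloss::NotAReading
    }
}

// Illustrative sample table only; a real table would come from Unihan.
fn sample_readings() -> HashMap<char, Vec<char>> {
    HashMap::from([
        ('數', vec!['수', '삭', '촉']),
        ('字', vec!['자']),
        ('庫', vec!['고']),
        ('間', vec!['간']),
        ('蔣', vec!['장']),
        ('介', vec!['개']),
        ('石', vec!['석']),
    ])
}
```

Under these assumptions 數字(수자) is classified as an alternative reading, while 蔣介石(장제스) and the definition gloss on 庫間 are rejected.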

The middleware runs immediately after the engine, before HomophoneMarker and FirstOccurrenceFilter, so later stages observe the corrected reading and flags. It coalesces adjacent OutputToken::Text tokens (the streaming engine flushes non-hanja text at safe points, so (곳간) can arrive split as (곳간 then )) and buffers only a bounded amount: a held annotation, the trailing matchable suffix of the preceding text, and the following parenthetical until it can be classified. This keeps the streaming result identical to a one-shot conversion while staying responsive on long hanja-free runs. OutputToken::Open, OutputToken::Close, and OutputToken::Verbatim flush the buffer and pass through, so a match never crosses a scope boundary. When enabled is false the middleware is an exact pass-through.
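The coalescing and boundary-flush behaviour can be pictured with a small stand-in. The `Token` enum and `TextCoalescer` below are hypothetical and much simpler than the real middleware: the sketch holds all pending text until a boundary arrives, whereas the collapser described above keeps only a bounded suffix and releases the rest early.

```rust
// Stand-in token type; the real OutputToken<S> has more variants and data.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Text(String),
    Open,
    Close,
}

#[derive(Default)]
struct TextCoalescer {
    pending: Option<String>,
}

impl TextCoalescer {
    // Adjacent Text tokens are merged; any other token flushes the
    // merged text first and then passes through unchanged.
    fn push_token(&mut self, token: Token) -> Vec<Token> {
        match token {
            Token::Text(s) => {
                self.pending.get_or_insert_with(String::new).push_str(&s);
                Vec::new()
            }
            boundary => {
                let mut out: Vec<Token> =
                    self.pending.take().map(Token::Text).into_iter().collect();
                out.push(boundary);
                out
            }
        }
    }

    // Emits whatever text is still buffered at end of stream.
    fn finish(self) -> Vec<Token> {
        self.pending.map(Token::Text).into_iter().collect()
    }
}
```

Pushing `Text("(곳간")` and then `Text(")")` emits nothing; a following `Close` releases the merged `Text("(곳간)")` ahead of the `Close` token, so the parenthetical is seen whole and never joined with text on the far side of the boundary.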

Limitation

The collapser runs after the engine and never re-derives readings, so a hanja-first gloss immediately followed (with no space) by an initial-sound-law (頭音法則) character keeps the reading the engine chose with the parenthetical acting as a word boundary. For example 學(학)率 collapses to 학(學)율 rather than 학률: the engine read 率 as word-initial 율 because ) separated it from 學, and removing the gloss cannot recover the non-word-initial 률. This is narrow in practice; an intended compound is normally written 學率(학률). Insert a space (學(학) 率) or gloss the whole compound to control the reading.

Implementations

impl<S> RedundantParenCollapser<S>
where S: ScopeData,

pub fn new(enabled: bool) -> RedundantParenCollapser<S>

Creates a collapser. When enabled is false every token passes through unchanged.

pub fn push_token(&mut self, token: OutputToken<S>) -> Vec<OutputToken<S>>

Pushes one output token and returns tokens ready for downstream stages.

pub fn finish(self) -> Vec<OutputToken<S>>

Flushes buffered tokens and returns them.
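A caller would typically drive these methods in a push-then-finish loop. The sketch below shows that pattern against a hypothetical `Stage` trait and a `PassThrough` stand-in that imitates a collapser created with enabled set to false; neither name exists in the crate, where `push_token` and `finish` are inherent methods on `RedundantParenCollapser<S>`.

```rust
// Hypothetical trait mirroring the push_token/finish shape documented above.
trait Stage<T> {
    fn push_token(&mut self, token: T) -> Vec<T>;
    fn finish(self) -> Vec<T>;
}

// Stand-in that behaves like a disabled collapser: nothing is buffered.
struct PassThrough;

impl<T> Stage<T> for PassThrough {
    fn push_token(&mut self, token: T) -> Vec<T> {
        vec![token]
    }

    fn finish(self) -> Vec<T> {
        Vec::new()
    }
}

// Feeds every token through the stage, then drains what is still buffered.
fn run_stage<T, S: Stage<T>>(mut stage: S, input: impl IntoIterator<Item = T>) -> Vec<T> {
    let mut out = Vec::new();
    for token in input {
        // Each push may return zero, one, or several ready tokens.
        out.extend(stage.push_token(token));
    }
    // finish consumes the stage, so it cannot be pushed to afterwards.
    out.extend(stage.finish());
    out
}
```

Because `finish` takes `self` by value, forgetting to call it silently drops any tokens still held in the buffer; collecting both return values into the same output vector avoids that.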
