Enum DocumentScript 

pub enum DocumentScript {
    Latin,
    CJK,
    RTL,
    Complex,
    Mixed,
}
Document script profile for optimization.

OPTIMIZATION (Issue #1 fix): Detect the document's primary script once, then skip the script detection functions that cannot apply, which speeds up boundary detection.

When a document contains only Latin text, RTL and CJK detection are skipped entirely. When a document is CJK-dominant, RTL detection is skipped. This reduces function call overhead from millions of calls per batch to thousands.

Variants

Latin

Latin-only document (ASCII + extended Latin). Fast path: only check space, TJ offset, and geometric gap.

CJK

CJK-dominant document (Chinese, Japanese, Korean). Skip RTL detection, use optimized CJK path.

RTL

Right-to-left dominant (Arabic, Hebrew). Skip CJK detection, use optimized RTL path.

Complex

Complex scripts (Devanagari, Thai, Khmer, etc.). Use specialized complex script detection.

Mixed

Mixed scripts or unknown. Check all detection functions (slowest path).
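The variant descriptions amount to a mapping from profile to the script checks worth running. The sketch below illustrates that mapping under stated assumptions: the enum is a local mirror so the block compiles on its own, and `ScriptChecks` and `checks_for` are hypothetical names, not part of the documented API. The documentation does not say whether the CJK, RTL, and Complex paths skip every other script check, so those entries are guesses.

```rust
/// Local mirror of the documented enum so this sketch compiles on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DocumentScript {
    Latin,
    CJK,
    RTL,
    Complex,
    Mixed,
}

/// Hypothetical record of which script-specific checks a caller would run.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ScriptChecks {
    pub cjk: bool,
    pub rtl: bool,
    pub complex: bool,
}

/// Hypothetical helper: choose the checks to run for a given profile.
pub fn checks_for(profile: DocumentScript) -> ScriptChecks {
    match profile {
        // Latin fast path: no script-specific checks.
        DocumentScript::Latin => ScriptChecks { cjk: false, rtl: false, complex: false },
        // CJK-dominant: RTL is skipped (documented); skipping complex is assumed.
        DocumentScript::CJK => ScriptChecks { cjk: true, rtl: false, complex: false },
        // RTL-dominant: CJK is skipped (documented); skipping complex is assumed.
        DocumentScript::RTL => ScriptChecks { cjk: false, rtl: true, complex: false },
        // Complex scripts: skipping CJK and RTL here is an assumption.
        DocumentScript::Complex => ScriptChecks { cjk: false, rtl: false, complex: true },
        // Mixed or unknown: run everything (slowest path).
        DocumentScript::Mixed => ScriptChecks { cjk: true, rtl: true, complex: true },
    }
}
```

A caller would compute the profile once per extraction, call `checks_for` once, and consult the resulting flags inside the per-character loop instead of invoking every detection function each time.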

Implementations

impl DocumentScript

pub fn detect_from_characters(characters: &[CharacterInfo]) -> Self

Detects the document script profile by sampling the first 1000 characters.

This optimization reduces boundary detection overhead by skipping unnecessary script detection for documents with known script profiles.

PERFORMANCE: O(min(n, 1000)) sampling, executed once per extraction
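As an illustration of bounded sampling, here is a minimal sketch of how such a detector might work. It is not the crate's implementation: the `CharacterInfo` stand-in with a single `ch` field, the Unicode ranges, the majority threshold, and the name `detect_sketch` are all assumptions made for the example.

```rust
/// Local mirror of the documented enum so this sketch compiles on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DocumentScript {
    Latin,
    CJK,
    RTL,
    Complex,
    Mixed,
}

/// Hypothetical stand-in for the crate's `CharacterInfo`; only `ch` is assumed.
pub struct CharacterInfo {
    pub ch: char,
}

/// Sample cap taken from the documentation above.
const SAMPLE_SIZE: usize = 1000;

// Illustrative, deliberately incomplete Unicode ranges.
fn is_cjk(c: char) -> bool {
    // Hiragana and Katakana, CJK Unified Ideographs, Hangul Syllables
    matches!(c as u32, 0x3040..=0x30FF | 0x4E00..=0x9FFF | 0xAC00..=0xD7AF)
}

fn is_rtl(c: char) -> bool {
    // Hebrew, Arabic
    matches!(c as u32, 0x0590..=0x05FF | 0x0600..=0x06FF)
}

fn is_complex(c: char) -> bool {
    // Devanagari, Thai, Khmer
    matches!(c as u32, 0x0900..=0x097F | 0x0E00..=0x0E7F | 0x1780..=0x17FF)
}

/// Classify a document from at most `SAMPLE_SIZE` leading characters.
pub fn detect_sketch(characters: &[CharacterInfo]) -> DocumentScript {
    let (mut cjk, mut rtl, mut complex, mut total) = (0usize, 0usize, 0usize, 0usize);
    // `take` bounds the work at O(min(n, SAMPLE_SIZE)).
    for info in characters.iter().take(SAMPLE_SIZE) {
        total += 1;
        if is_cjk(info.ch) {
            cjk += 1;
        } else if is_rtl(info.ch) {
            rtl += 1;
        } else if is_complex(info.ch) {
            complex += 1;
        }
    }
    if total == 0 {
        // Nothing to sample: treat as unknown.
        return DocumentScript::Mixed;
    }
    if cjk + rtl + complex == 0 {
        // Assumption: a sample with no CJK, RTL, or complex characters is Latin.
        DocumentScript::Latin
    } else if cjk * 2 > total {
        // Assumed rule: a script dominates when it exceeds half the sample.
        DocumentScript::CJK
    } else if rtl * 2 > total {
        DocumentScript::RTL
    } else if complex * 2 > total {
        DocumentScript::Complex
    } else {
        DocumentScript::Mixed
    }
}
```

Because only a prefix is inspected, a document whose first 1000 characters are Latin is classified as Latin even if later pages switch script. That is the trade-off of any prefix-sampling scheme.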

Trait Implementations

impl Clone for DocumentScript

fn clone(&self) -> DocumentScript

Returns a duplicate of the value.

1.0.0

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for DocumentScript

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for DocumentScript

fn eq(&self, other: &DocumentScript) -> bool

Tests for self and other values to be equal, and is used by ==.

1.0.0

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Copy for DocumentScript

impl Eq for DocumentScript

impl StructuralPartialEq for DocumentScript
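
Taken together, the traits listed above mean a profile can be passed by value without moving it (`Copy`), compared with `==` (`PartialEq`, `Eq`), and printed with `{:?}` (`Debug`). The short sketch below shows this, using a local mirror of the enum. `skips_rtl` and `describe` are hypothetical helpers written for the example, not part of the documented API.

```rust
/// Local mirror of the documented enum, with the same derived traits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DocumentScript {
    Latin,
    CJK,
    RTL,
    Complex,
    Mixed,
}

/// Hypothetical helper: profiles for which the documentation says RTL detection is skipped.
pub fn skips_rtl(profile: DocumentScript) -> bool {
    // `PartialEq` makes `==` available on the enum.
    profile == DocumentScript::Latin || profile == DocumentScript::CJK
}

/// Hypothetical helper: a short log line for a profile.
pub fn describe(profile: DocumentScript) -> String {
    // `Copy` lets `profile` be used again after being passed by value.
    let skip = skips_rtl(profile);
    // `Debug` formats a unit variant as its name.
    format!("{:?} (skip RTL: {})", profile, skip)
}
```

For example, `describe(DocumentScript::CJK)` yields `CJK (skip RTL: true)`.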

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Ungil for T
where T: Send,