Enum DocumentScript

pub enum DocumentScript {
    Latin,
    CJK,
    RTL,
    Complex,
    Mixed,
}

Document script profile for optimization.

OPTIMIZATION (Issue #1 fix): Detect the document's primary script once, then skip the script detection functions that cannot apply, which speeds up boundary detection.

When a document contains only Latin text, RTL and CJK detection are skipped entirely. When a document is CJK-dominant, RTL detection is skipped. This reduces function call overhead from millions per batch to thousands.
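The "detect once, then skip" idea can be sketched as follows. This is a minimal illustration, not the crate's implementation: `BoundaryContext`, `needs_rtl_handling`, and the stand-in `is_rtl` detector are hypothetical names, the enum is a local mirror of the documented one, and the handling of `Complex` is an assumption because the description does not cover it.

```rust
use std::cell::Cell;

/// Local stand-in that mirrors the documented enum; it is not imported from the crate.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DocumentScript {
    Latin,
    CJK,
    RTL,
    Complex,
    Mixed,
}

/// Hypothetical per-extraction context: the profile is decided once and reused
/// for every boundary check.
pub struct BoundaryContext {
    profile: DocumentScript,
    rtl_calls: Cell<usize>,
}

impl BoundaryContext {
    pub fn new(profile: DocumentScript) -> Self {
        Self { profile, rtl_calls: Cell::new(0) }
    }

    /// Stand-in RTL detector (Hebrew and Arabic blocks only) that counts its invocations.
    fn is_rtl(&self, c: char) -> bool {
        self.rtl_calls.set(self.rtl_calls.get() + 1);
        matches!(c, '\u{0590}'..='\u{06FF}')
    }

    /// Consults the cached profile before paying for the detector call.
    pub fn needs_rtl_handling(&self, c: char) -> bool {
        match self.profile {
            // Documented: Latin-only and CJK-dominant documents skip RTL detection.
            DocumentScript::Latin | DocumentScript::CJK => false,
            // Assumption: the description does not say what Complex does with RTL checks.
            DocumentScript::Complex => false,
            DocumentScript::RTL | DocumentScript::Mixed => self.is_rtl(c),
        }
    }

    /// Number of times the stand-in detector actually ran.
    pub fn rtl_calls(&self) -> usize {
        self.rtl_calls.get()
    }
}
```

With a `Latin` profile the detector never runs, whatever the input length; with `Mixed` it runs once per character, which is the overhead the profile is meant to avoid.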

Variants§

Latin

Latin-only document (ASCII + extended Latin). Fast path: only check space, TJ offset, and geometric gap.

CJK

CJK-dominant document (Chinese, Japanese, Korean). Skips RTL detection and uses the optimized CJK path.

RTL

Right-to-left dominant document (Arabic, Hebrew). Skips CJK detection and uses the optimized RTL path.

Complex

Complex scripts (Devanagari, Thai, Khmer, etc.). Uses specialized complex script detection.

Mixed

Mixed scripts or unknown. Checks all detection functions (slowest path).
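One way to read the variant descriptions is as a table of which detectors each profile enables. The sketch below encodes that reading; `DetectorPlan` and `plan_for` are hypothetical names, the enum is a local mirror of the documented one, and every flag marked "assumed" is a guess where the descriptions are silent.

```rust
/// Local stand-in that mirrors the documented enum; it is not imported from the crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DocumentScript {
    Latin,
    CJK,
    RTL,
    Complex,
    Mixed,
}

/// Hypothetical flag set describing which script detectors a boundary pass runs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DetectorPlan {
    pub rtl: bool,
    pub cjk: bool,
    pub complex: bool,
}

/// Maps a profile to the detectors it keeps enabled.
pub fn plan_for(profile: DocumentScript) -> DetectorPlan {
    match profile {
        // Documented: skip RTL and CJK. Skipping complex detection is assumed
        // from "only check space, TJ offset, and geometric gap".
        DocumentScript::Latin => DetectorPlan { rtl: false, cjk: false, complex: false },
        // Documented: skip RTL. Skipping complex detection is assumed.
        DocumentScript::CJK => DetectorPlan { rtl: false, cjk: true, complex: false },
        // Documented: skip CJK. Skipping complex detection is assumed.
        DocumentScript::RTL => DetectorPlan { rtl: true, cjk: false, complex: false },
        // Documented: use complex script detection. Skipping RTL and CJK is assumed.
        DocumentScript::Complex => DetectorPlan { rtl: false, cjk: false, complex: true },
        // Documented: check all detection functions.
        DocumentScript::Mixed => DetectorPlan { rtl: true, cjk: true, complex: true },
    }
}
```

Because the `match` is exhaustive, adding a sixth variant to the mirror enum would fail to compile until `plan_for` handles it.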

Implementations§

impl DocumentScript

pub fn detect_from_characters(characters: &[CharacterInfo]) -> Self

Detects the document script profile by sampling the first 1000 characters.

This optimization reduces boundary detection overhead by skipping unnecessary script detection for documents with known script profiles.

PERFORMANCE: O(min(n, 1000)) sampling, executed once per extraction.
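A bounded sampling pass of this kind might look like the sketch below. It is an illustration under stated assumptions, not the crate's code: it takes plain `char` values because the fields of `CharacterInfo` are not shown here, `detect_from_chars` and `classify` are hypothetical names, the Unicode ranges are a rough subset, and the 80% dominance threshold is invented for the example. Only the 1000-character limit comes from the documentation.

```rust
/// Local stand-in that mirrors the documented enum; it is not imported from the crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DocumentScript {
    Latin,
    CJK,
    RTL,
    Complex,
    Mixed,
}

/// Documented sample size: only the first 1000 characters are inspected.
const SAMPLE_LIMIT: usize = 1000;

/// Rough script classification by code point. Returns `None` for characters that
/// carry no script signal (whitespace, digits, punctuation) and `Some(Mixed)`
/// for letters outside the ranges listed here.
fn classify(c: char) -> Option<DocumentScript> {
    if !c.is_alphabetic() {
        return None;
    }
    Some(match c as u32 {
        0x0041..=0x024F => DocumentScript::Latin, // Basic Latin through Latin Extended-B
        0x0590..=0x06FF => DocumentScript::RTL,   // Hebrew and Arabic blocks
        // Devanagari, Thai, Khmer
        0x0900..=0x097F | 0x0E00..=0x0E7F | 0x1780..=0x17FF => DocumentScript::Complex,
        // Hiragana and Katakana, CJK Unified Ideographs, Hangul Syllables
        0x3040..=0x30FF | 0x4E00..=0x9FFF | 0xAC00..=0xD7AF => DocumentScript::CJK,
        _ => DocumentScript::Mixed,
    })
}

/// O(min(n, 1000)) pass over the input, intended to run once per extraction.
pub fn detect_from_chars(chars: &[char]) -> DocumentScript {
    let mut counts = [0usize; 4]; // Latin, CJK, RTL, Complex
    let mut total = 0usize;
    for &c in chars.iter().take(SAMPLE_LIMIT) {
        match classify(c) {
            Some(DocumentScript::Latin) => counts[0] += 1,
            Some(DocumentScript::CJK) => counts[1] += 1,
            Some(DocumentScript::RTL) => counts[2] += 1,
            Some(DocumentScript::Complex) => counts[3] += 1,
            Some(DocumentScript::Mixed) => {} // unknown letter: counts toward the total only
            None => continue,
        }
        total += 1;
    }
    if total == 0 {
        return DocumentScript::Mixed; // nothing to go on: treat as unknown
    }
    if counts[0] == total {
        return DocumentScript::Latin; // Latin-only
    }
    // Hypothetical dominance rule: at least 80% of the sampled letters.
    let dominant = |n: usize| n * 5 >= total * 4;
    if dominant(counts[1]) {
        DocumentScript::CJK
    } else if dominant(counts[2]) {
        DocumentScript::RTL
    } else if dominant(counts[3]) {
        DocumentScript::Complex
    } else {
        DocumentScript::Mixed
    }
}
```

Sampling a fixed prefix keeps the cost constant for large documents, at the price of misclassifying a document whose script changes after the first 1000 characters.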

Trait Implementations§

impl Clone for DocumentScript

fn clone(&self) -> DocumentScript

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for DocumentScript

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for DocumentScript

fn eq(&self, other: &DocumentScript) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Copy for DocumentScript

impl Eq for DocumentScript

impl StructuralPartialEq for DocumentScript
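In practice these implementations mean a profile can be copied freely, compared with `==`, and printed with `{:?}`. The sketch below shows this on a local mirror of the enum with the same derives; `describe` is a hypothetical helper, not part of the documented API.

```rust
/// Local stand-in that mirrors the documented enum and its derived traits.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DocumentScript {
    Latin,
    CJK,
    RTL,
    Complex,
    Mixed,
}

/// Hypothetical helper exercising Copy, PartialEq, and Debug together.
pub fn describe(profile: DocumentScript) -> String {
    let copy = profile; // Copy: `profile` stays usable after this assignment
    let same = copy == profile; // PartialEq / Eq
    format!("{:?} (same: {})", profile, same) // Debug prints the variant name
}
```

For example, `describe(DocumentScript::CJK)` returns `"CJK (same: true)"`.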

Auto Trait Implementations§

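impl Freeze for DocumentScript

impl RefUnwindSafe for DocumentScript

impl Send for DocumentScript

impl Sync for DocumentScript

impl Unpin for DocumentScript

impl UnwindSafe for DocumentScript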

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Ungil for T
where T: Send,