pub struct WordExtractor;
Extracts words from a sequence of characters based on spatial proximity.
Implementations
impl WordExtractor
pub fn extract(chars: &[Char], options: &WordOptions) -> Vec<Word>
Extract words from the given characters using the specified options.
Characters are grouped into words based on spatial proximity:
- Characters within x_tolerance horizontally and y_tolerance vertically are grouped together.
- For CJK characters, the character width (or height, for vertical text) is used as the tolerance instead of the fixed x_tolerance/y_tolerance.
- By default, whitespace characters split words. Set keep_blank_chars to include them.
- By default, characters are sorted spatially. Set use_text_flow to preserve PDF content stream order.
- text_direction controls sorting and gap logic for vertical text.
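The grouping rules above can be illustrated with a small, self-contained sketch. It does not call the crate: SketchChar, SketchOptions, SketchWord, extract_words and their fields (text, x0, x1, top, bottom) are hypothetical stand-ins, the exact gap test is an assumption, and using the preceding character's width as the CJK tolerance is one plausible reading of the rule, not a confirmed detail. text_direction and vertical text are not modeled.

```rust
// Hypothetical, simplified stand-ins for the crate's Char, WordOptions and Word.
// Field names and grouping rules here are illustrative assumptions.
#[derive(Clone, Debug)]
struct SketchChar {
    text: char,
    x0: f64,
    x1: f64,
    top: f64,
    bottom: f64,
}

struct SketchOptions {
    x_tolerance: f64,
    y_tolerance: f64,
    keep_blank_chars: bool,
    use_text_flow: bool,
}

#[derive(Debug, PartialEq)]
struct SketchWord {
    text: String,
    x0: f64,
    x1: f64,
    top: f64,
    bottom: f64,
}

// Rough CJK check over a few common Unicode blocks (not exhaustive).
fn is_cjk(c: char) -> bool {
    matches!(c as u32, 0x3040..=0x30FF | 0x3400..=0x4DBF | 0x4E00..=0x9FFF | 0xAC00..=0xD7AF)
}

// Spatial order: cluster characters into lines by `top`, then sort each line by `x0`.
fn spatial_order(chars: &[SketchChar], y_tol: f64) -> Vec<SketchChar> {
    let mut by_top = chars.to_vec();
    by_top.sort_by(|a, b| a.top.total_cmp(&b.top));
    let mut lines: Vec<Vec<SketchChar>> = Vec::new();
    for c in by_top {
        let same_line = lines.last().map_or(false, |l| (c.top - l[0].top).abs() <= y_tol);
        if same_line {
            lines.last_mut().unwrap().push(c);
        } else {
            lines.push(vec![c]);
        }
    }
    for line in &mut lines {
        line.sort_by(|a, b| a.x0.total_cmp(&b.x0));
    }
    lines.into_iter().flatten().collect()
}

// Close the current word, if any, recording its text and bounding box.
fn flush(current: &mut Vec<SketchChar>, words: &mut Vec<SketchWord>) {
    if current.is_empty() {
        return;
    }
    words.push(SketchWord {
        text: current.iter().map(|c| c.text).collect(),
        x0: current.iter().map(|c| c.x0).fold(f64::INFINITY, f64::min),
        x1: current.iter().map(|c| c.x1).fold(f64::NEG_INFINITY, f64::max),
        top: current.iter().map(|c| c.top).fold(f64::INFINITY, f64::min),
        bottom: current.iter().map(|c| c.bottom).fold(f64::NEG_INFINITY, f64::max),
    });
    current.clear();
}

fn extract_words(chars: &[SketchChar], opts: &SketchOptions) -> Vec<SketchWord> {
    // use_text_flow keeps the input (content stream) order; otherwise sort spatially.
    let ordered = if opts.use_text_flow {
        chars.to_vec()
    } else {
        spatial_order(chars, opts.y_tolerance)
    };
    let mut words = Vec::new();
    let mut current: Vec<SketchChar> = Vec::new();
    for c in ordered {
        // Whitespace ends the current word unless keep_blank_chars is set.
        if c.text.is_whitespace() && !opts.keep_blank_chars {
            flush(&mut current, &mut words);
            continue;
        }
        if let Some(prev) = current.last() {
            // Assumption: for CJK, the previous character's width replaces x_tolerance.
            let x_tol = if is_cjk(prev.text) { prev.x1 - prev.x0 } else { opts.x_tolerance };
            let too_far = c.x0 > prev.x1 + x_tol || c.x1 < prev.x0 - x_tol;
            let other_line = (c.top - prev.top).abs() > opts.y_tolerance;
            if too_far || other_line {
                flush(&mut current, &mut words);
            }
        }
        current.push(c);
    }
    flush(&mut current, &mut words);
    words
}
```

With x_tolerance and y_tolerance of 3.0, the characters a, b, a space, c, d placed side by side on one line come out as two words, "ab" and "cd"; setting keep_blank_chars yields the single word "ab cd". Clustering characters into lines before sorting by x0, rather than sorting on (top, x0) directly, keeps characters whose top values differ slightly in left-to-right order on the same line.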
Auto Trait Implementations
impl Freeze for WordExtractor
impl RefUnwindSafe for WordExtractor
impl Send for WordExtractor
impl Sync for WordExtractor
impl Unpin for WordExtractor
impl UnsafeUnpin for WordExtractor
impl UnwindSafe for WordExtractor
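Because WordExtractor is a unit struct with no fields, each of these auto traits holds automatically, so a single value can be shared freely across threads. A common way for downstream code to lock such guarantees in is a compile-time assertion. The sketch below applies it to a hypothetical stand-in, StandInExtractor, rather than importing the real type; Freeze and UnsafeUnpin are left out because they are unstable and cannot be named as bounds on stable Rust.

```rust
use std::panic::{RefUnwindSafe, UnwindSafe};

// Hypothetical stand-in for WordExtractor: a unit struct with no fields.
pub struct StandInExtractor;

// Compiles only if T implements every listed auto trait.
const fn assert_auto_traits<T: Send + Sync + Unpin + UnwindSafe + RefUnwindSafe>() {}

// Checked at compile time: a missing impl is a build error, not a runtime panic.
const _: () = assert_auto_traits::<StandInExtractor>();
```

Pointing the same const assertion at the real WordExtractor would turn an accidental loss of Send or Sync (for example, after a non-thread-safe field is added) into a compile error.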
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.