pub struct TextChunker {
pub chunk_size: usize,
pub overlap: usize,
pub by_sentence: bool,
}
Sliding-window text chunker.
§Example
use scirs2_text::segmentation::TextChunker;
let chunker = TextChunker::new(10, 2);
let chunks = chunker.chunk("Rust is fast. Rust is safe. Rust is fun.");
assert!(!chunks.is_empty());
Fields§
chunk_size: usize
Number of tokens (words) per chunk.
overlap: usize
Number of tokens of overlap between consecutive chunks.
by_sentence: bool
If true, try to respect sentence boundaries.
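To make the interaction between chunk_size and overlap concrete, here is a minimal standalone sketch of a sliding window over whitespace-separated tokens. The function name `sliding_window_chunks` is hypothetical and not part of scirs2_text, and the handling of `overlap >= chunk_size` is an assumption; the crate's own implementation may tokenize and clamp differently.

```rust
/// Hypothetical helper (not part of scirs2_text): splits `text` on whitespace
/// and emits windows of `chunk_size` tokens, where consecutive windows share
/// `overlap` tokens.
fn sliding_window_chunks(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    if tokens.is_empty() || chunk_size == 0 {
        return Vec::new();
    }
    // Assumption: advance by at least one token so the loop always terminates,
    // even if `overlap >= chunk_size`.
    let step = chunk_size.saturating_sub(overlap).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            break; // the last window reached the end of the input
        }
        start += step;
    }
    chunks
}
```

With chunk_size = 4 and overlap = 2, the window advances two tokens at a time, so "a b c d e f g" yields "a b c d", "c d e f" and "e f g".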
Implementations§
impl TextChunker
pub fn with_sentence_boundaries(self) -> Self
Enable sentence-boundary-respecting mode.
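One plausible reading of "sentence-boundary-respecting" is that whole sentences are grouped greedily until the token budget is reached. The sketch below illustrates that idea only; `split_sentences` and `sentence_chunks` are hypothetical names, the punctuation-based sentence splitter is a simplification, and overlap is ignored. How scirs2_text actually detects sentences and applies overlap in this mode is not stated in this page.

```rust
/// Hypothetical helper: splits on '.', '!' and '?', keeping the terminator
/// attached to its sentence. Real sentence segmentation is more involved.
fn split_sentences(text: &str) -> Vec<&str> {
    text.split_inclusive(|c: char| matches!(c, '.' | '!' | '?'))
        .map(|s| s.trim())
        .filter(|s| !s.is_empty())
        .collect()
}

/// Hypothetical helper: packs whole sentences into chunks of at most
/// `chunk_size` whitespace-separated tokens. A single sentence longer than
/// the budget becomes its own oversized chunk rather than being split.
fn sentence_chunks(text: &str, chunk_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut current_tokens = 0;
    for sentence in split_sentences(text) {
        let n = sentence.split_whitespace().count();
        // Close the current chunk if this sentence would exceed the budget.
        if !current.is_empty() && current_tokens + n > chunk_size {
            chunks.push(current.join(" "));
            current.clear();
            current_tokens = 0;
        }
        current.push(sentence);
        current_tokens += n;
    }
    if !current.is_empty() {
        chunks.push(current.join(" "));
    }
    chunks
}
```

With a budget of six tokens, the three three-word sentences from the example above are grouped as two sentences in the first chunk and one in the second, and no sentence is cut in the middle.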
pub fn chunk_with_metadata(&self, text: &str) -> Vec<TextChunk>
Chunk text and return TextChunk structs with metadata.
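The fields of TextChunk are not listed here, so the following is only a sketch of the kind of metadata a chunker could attach to each window. `SketchChunk` and `chunk_with_spans` are hypothetical stand-ins, and the choice of a chunk index plus a half-open token range is an assumption, not the crate's actual layout.

```rust
/// Hypothetical stand-in for illustration; the real `TextChunk` may carry
/// different fields.
#[derive(Debug, PartialEq)]
struct SketchChunk {
    text: String,
    index: usize,       // position of the chunk in the output
    start_token: usize, // first token covered (inclusive)
    end_token: usize,   // one past the last token covered (exclusive)
}

/// Hypothetical helper: sliding-window chunking over whitespace tokens that
/// also records which token range each chunk covers.
fn chunk_with_spans(text: &str, chunk_size: usize, overlap: usize) -> Vec<SketchChunk> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    if tokens.is_empty() || chunk_size == 0 {
        return Vec::new();
    }
    // Assumption: always advance by at least one token.
    let step = chunk_size.saturating_sub(overlap).max(1);
    let mut out = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        out.push(SketchChunk {
            text: tokens[start..end].join(" "),
            index: out.len(),
            start_token: start,
            end_token: end,
        });
        if end == tokens.len() {
            break;
        }
        start += step;
    }
    out
}
```

Recording token offsets like this lets a caller map a chunk back to its position in the source text, for example to see that adjacent chunks overlap by `end_token` of one minus `start_token` of the next.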
Trait Implementations§
Auto Trait Implementations§
impl Freeze for TextChunker
impl RefUnwindSafe for TextChunker
impl Send for TextChunker
impl Sync for TextChunker
impl Unpin for TextChunker
impl UnsafeUnpin for TextChunker
impl UnwindSafe for TextChunker
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.