pub struct StreamingTextProcessor<T: Tokenizer> { /* private fields */ }
Streaming text processor for handling arbitrarily large files
Implementations
impl<T: Tokenizer> StreamingTextProcessor<T>
pub fn with_buffer_size(self, size: usize) -> Self
Set custom buffer size
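The effect of a read-buffer size can be illustrated with the standard library alone. The sketch below assumes the buffer size is a byte capacity similar to that of `std::io::BufReader`; the page does not state the unit or how the processor uses the value, and `count_lines_with_capacity` is a hypothetical helper, not part of this crate.

```rust
use std::io::{BufRead, BufReader, Read};

/// Counts the lines in `source`, reading through a buffer of `buffer_size` bytes.
/// Hypothetical helper for illustration only; not part of the documented API.
fn count_lines_with_capacity<R: Read>(source: R, buffer_size: usize) -> std::io::Result<usize> {
    // The capacity only changes how many bytes are fetched per underlying read;
    // lines longer than the buffer are still returned whole.
    let reader = BufReader::with_capacity(buffer_size, source);
    let mut count = 0;
    for line in reader.lines() {
        line?; // propagate I/O and invalid-UTF-8 errors
        count += 1;
    }
    Ok(count)
}
```

Under these assumptions a larger buffer trades memory for fewer underlying reads, while the result stays the same for any capacity.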
pub fn process_lines<P, F, R>(&self, path: P, processor: F) -> Result<Vec<R>>
Process a file line by line
pub fn process_reader_lines<R: BufRead, F, U>(
    &self,
    reader: R,
    processor: F,
) -> Result<Vec<U>>
Process lines from any reader
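The general pattern of mapping a closure over lines from any `BufRead` source might look like the following standalone sketch. It uses only the standard library. The closure shape (`FnMut(&str, usize) -> Option<U>`, taking the line and its 0-based index) is an assumption, since the bounds on `F` are not shown on this page, and `map_lines` is a hypothetical name.

```rust
use std::io::{self, BufRead};

/// Applies `processor` to each line and its 0-based index, collecting the
/// `Some` results. Hypothetical stand-in; the closure signature is assumed
/// and this is not the crate's implementation.
fn map_lines<R, F, U>(reader: R, mut processor: F) -> io::Result<Vec<U>>
where
    R: BufRead,
    F: FnMut(&str, usize) -> Option<U>,
{
    let mut results = Vec::new();
    // `lines()` yields one line at a time, so only the current line and the
    // collected results are held in memory.
    for (index, line) in reader.lines().enumerate() {
        let line = line?;
        if let Some(value) = processor(&line, index) {
            results.push(value);
        }
    }
    Ok(results)
}
```

Because `&[u8]` implements `BufRead`, a call such as `map_lines("a b\n".as_bytes(), |line, i| Some((i, line.len())))` works without touching the filesystem, which is convenient in tests.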
pub fn build_vocabulary_streaming<P: AsRef<Path>>(
    &self,
    path: P,
    min_count: usize,
) -> Result<Vocabulary>
Build vocabulary from a streaming corpus
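The idea of building a frequency-filtered vocabulary in a single streaming pass can be sketched as follows. This is a simplified illustration under stated assumptions: it splits on whitespace instead of using a `Tokenizer`, returns a plain `HashMap` instead of a `Vocabulary`, and treats `min_count` as an inclusive threshold. None of these details are confirmed by this page, and `count_tokens_streaming` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Counts whitespace-separated tokens line by line and keeps those that occur
/// at least `min_count` times. Hypothetical sketch; the real method uses the
/// processor's tokenizer and returns a `Vocabulary`.
fn count_tokens_streaming<R: BufRead>(
    reader: R,
    min_count: usize,
) -> io::Result<HashMap<String, usize>> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for line in reader.lines() {
        let line = line?;
        for token in line.split_whitespace() {
            *counts.entry(token.to_string()).or_insert(0) += 1;
        }
    }
    // Assumed semantics: tokens seen fewer than `min_count` times are dropped.
    counts.retain(|_, count| *count >= min_count);
    Ok(counts)
}
```

Memory use in this sketch grows with the number of distinct tokens, not with the size of the corpus, which is what makes the approach suitable for large files.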
impl StreamingTextProcessor<WordTokenizer>
pub fn with_default_tokenizer() -> Self
Create a streaming processor with the default word tokenizer
Auto Trait Implementations
impl<T> Freeze for StreamingTextProcessor<T>
where
    T: Freeze,
impl<T> RefUnwindSafe for StreamingTextProcessor<T>
where
    T: RefUnwindSafe,
impl<T> Send for StreamingTextProcessor<T>
where
    T: Send,
impl<T> Sync for StreamingTextProcessor<T>
where
    T: Sync,
impl<T> Unpin for StreamingTextProcessor<T>
where
    T: Unpin,
impl<T> UnwindSafe for StreamingTextProcessor<T>
where
    T: UnwindSafe,
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.