Crate scirs2_text


Text processing module for SciRS2

This crate provides text preprocessing (cleaning and normalization), tokenization, vectorization, word embeddings, text distance and similarity measures, vocabulary management, and other NLP-related operations.
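As a rough illustration of what word tokenization followed by count vectorization produces, the standalone sketch below lowercases text, splits it into word tokens, and counts how often each token occurs. It does not call this crate: `simple_word_tokenize` and `count_tokens` are hypothetical names, and the splitting rule (any non-alphanumeric character is a separator) is an assumption that may differ from what `WordTokenizer` or `CountVectorizer` actually do.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified word tokenizer (not this crate's API):
/// splits on any non-alphanumeric character, drops empty pieces,
/// and lowercases each token.
pub fn simple_word_tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical bag-of-words counter: maps each token to its frequency.
/// A BTreeMap keeps keys sorted, so iteration order is deterministic.
pub fn count_tokens(tokens: &[String]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for t in tokens {
        *counts.entry(t.clone()).or_insert(0) += 1;
    }
    counts
}
```

For the input "The cat sat on the mat." this yields six tokens and a count of 2 for "the". A real count vectorizer would additionally fix a vocabulary and emit one numeric vector per document.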

Re-exports

pub use distance::cosine_similarity;
pub use distance::jaccard_similarity;
pub use distance::levenshtein_distance;
pub use embeddings::Word2Vec;
pub use embeddings::Word2VecAlgorithm;
pub use embeddings::Word2VecConfig;
pub use error::Result;
pub use error::TextError;
pub use preprocess::BasicNormalizer;
pub use preprocess::BasicTextCleaner;
pub use preprocess::TextCleaner;
pub use preprocess::TextNormalizer;
pub use tokenize::CharacterTokenizer;
pub use tokenize::SentenceTokenizer;
pub use tokenize::Tokenizer;
pub use tokenize::WordTokenizer;
pub use vectorize::CountVectorizer;
pub use vectorize::TfidfVectorizer;
pub use vocabulary::Vocabulary;
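The three re-exported distance functions (`levenshtein_distance`, `jaccard_similarity`, `cosine_similarity`) are not documented with signatures on this page, so the following is only a standalone sketch of the quantities they are conventionally named after, not the crate's implementation. The function names `levenshtein`, `jaccard`, and `cosine` are hypothetical, the Jaccard variant compares whitespace-separated word sets (an assumption; the crate may tokenize differently), and the zero-vector and empty-set conventions are choices made here.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: edit distance (minimum number of single-character
/// insertions, deletions, or substitutions turning `a` into `b`), computed
/// with the Wagner-Fischer dynamic program keeping only one previous row.
/// Works on chars (Unicode scalar values), not bytes.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of a and first j chars of b
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1]; // cur[0] = i (delete i chars)
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1) // deletion
                .min(cur[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Hypothetical sketch: Jaccard similarity |A ∩ B| / |A ∪ B| over the sets
/// of whitespace-separated words. Returns 0.0 when both inputs are empty
/// (a convention chosen for this sketch).
pub fn jaccard(a: &str, b: &str) -> f64 {
    let sa: HashSet<&str> = a.split_whitespace().collect();
    let sb: HashSet<&str> = b.split_whitespace().collect();
    let union = sa.union(&sb).count();
    if union == 0 {
        return 0.0;
    }
    sa.intersection(&sb).count() as f64 / union as f64
}

/// Hypothetical sketch: cosine similarity x·y / (|x| |y|) of two equal-length
/// vectors. Returns 0.0 if either vector has zero norm (a chosen convention).
pub fn cosine(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    let dot: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let nx = x.iter().map(|v| v * v).sum::<f64>().sqrt();
    let ny = y.iter().map(|v| v * v).sum::<f64>().sqrt();
    if nx == 0.0 || ny == 0.0 {
        0.0
    } else {
        dot / (nx * ny)
    }
}
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion), and the word sets of "the cat sat" and "the cat ran" share 2 of 4 distinct words, giving a Jaccard similarity of 0.5.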

Modules

distance
Text distance and similarity measures
embeddings
Word embedding implementations
error
Error types for the text processing module
preprocess
Text preprocessing utilities
tokenize
Text tokenization utilities
utils
Utility functions for text processing
vectorize
Text vectorization utilities
vocabulary
Vocabulary management for text processing
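To show how the `vocabulary` and `vectorize` modules typically fit together, here is a standalone sketch that assigns each distinct token an index and then builds dense TF-IDF rows. It is not this crate's `Vocabulary` or `TfidfVectorizer`: `SimpleVocab` and `tfidf` are hypothetical names, tokenization is plain whitespace splitting, and the weighting uses raw counts as TF with the unsmoothed IDF ln(N / df). The crate may use a different (for example smoothed or normalized) variant.

```rust
use std::collections::HashMap;

/// Hypothetical minimal vocabulary (not this crate's `Vocabulary`):
/// assigns each previously unseen token the next free index.
#[derive(Default)]
pub struct SimpleVocab {
    index: HashMap<String, usize>,
}

impl SimpleVocab {
    /// Returns the token's index, inserting it first if it is new.
    pub fn add(&mut self, token: &str) -> usize {
        let next = self.index.len();
        *self.index.entry(token.to_string()).or_insert(next)
    }
    pub fn get(&self, token: &str) -> Option<usize> {
        self.index.get(token).copied()
    }
    pub fn len(&self) -> usize {
        self.index.len()
    }
}

/// Hypothetical TF-IDF sketch: raw term counts times ln(N / df), where N is
/// the number of documents and df the number of documents containing the term.
pub fn tfidf(docs: &[&str]) -> (SimpleVocab, Vec<Vec<f64>>) {
    let mut vocab = SimpleVocab::default();
    // Convert each document into a list of token indices.
    let mut tokenized: Vec<Vec<usize>> = Vec::new();
    for d in docs {
        tokenized.push(d.split_whitespace().map(|t| vocab.add(t)).collect());
    }
    let n_docs = docs.len() as f64;
    let v = vocab.len();
    // Document frequency: count each term at most once per document.
    let mut df = vec![0usize; v];
    for ids in &tokenized {
        let mut seen = vec![false; v];
        for &id in ids {
            if !seen[id] {
                seen[id] = true;
                df[id] += 1;
            }
        }
    }
    let rows = tokenized
        .iter()
        .map(|ids| {
            let mut row = vec![0.0; v];
            for &id in ids {
                row[id] += 1.0; // raw term frequency
            }
            for (id, w) in row.iter_mut().enumerate() {
                if *w > 0.0 {
                    *w *= (n_docs / df[id] as f64).ln(); // scale by IDF
                }
            }
            row
        })
        .collect();
    (vocab, rows)
}
```

With this unsmoothed IDF, a term that appears in every document gets weight 0, which is one reason practical vectorizers often add smoothing.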