Module text

Text preprocessing utilities for sklears

This module provides text preprocessing capabilities including:

  • Text tokenization and normalization
  • TF-IDF vectorization
  • N-gram feature generation
  • Text similarity features
  • Sentence embeddings
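
As a rough illustration of the first and third capabilities, tokenization and word-level n-gram generation can be sketched in plain Rust using only the standard library. This is not the sklears API: `tokenize` and `word_ngrams` are hypothetical helper names, and the lowercasing and split-on-non-alphanumeric rules are assumptions chosen for the example rather than the behavior of `TextTokenizer` or `NgramGenerator`.

```rust
/// Lowercase the input and split it into alphanumeric word tokens.
/// Hypothetical helper for illustration; not part of the sklears API.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty()) // drop empty pieces left by adjacent separators
        .map(|t| t.to_lowercase())
        .collect()
}

/// Build word-level n-grams by sliding a window of `n` tokens over the input.
/// Returns an empty vector when `n` is zero or exceeds the number of tokens.
fn word_ngrams(tokens: &[String], n: usize) -> Vec<String> {
    if n == 0 {
        // `windows(0)` would panic, so handle it explicitly.
        return Vec::new();
    }
    tokens.windows(n).map(|w| w.join(" ")).collect()
}
```

Calling `tokenize("The quick, brown fox!")` and passing the result to `word_ngrams` with `n = 2` yields the bigrams of the normalized tokens, joined by single spaces.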

Structs

BagOfWordsConfig
Configuration for bag-of-words embeddings
BagOfWordsEmbedding
Simple bag-of-words sentence embeddings
NgramGenerator
N-gram generator for creating n-gram features from text
NgramGeneratorConfig
Configuration for N-gram generator
TextSimilarity
Text similarity calculator
TextSimilarityConfig
Configuration for text similarity calculator
TextTokenizer
Text tokenizer for preprocessing text data
TextTokenizerConfig
Configuration for text tokenizer
TfIdfVectorizer
TF-IDF vectorizer for converting text to numerical features
TfIdfVectorizerConfig
Configuration for TF-IDF vectorizer
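
For orientation, the weighting that a TF-IDF vectorizer computes can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the `TfIdfVectorizer` implementation: `tf_idf` is a hypothetical function, term frequency is taken as count divided by document length, and inverse document frequency uses the unsmoothed textbook form ln(N / df). The formula variant sklears actually uses is not stated on this page.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Compute TF-IDF weights for a corpus of pre-tokenized documents.
/// Assumed formulas: tf = count / document length, idf = ln(N / df).
/// Hypothetical helper for illustration; not part of the sklears API.
fn tf_idf(docs: &[Vec<&str>]) -> Vec<BTreeMap<String, f64>> {
    let n_docs = docs.len() as f64;

    // Document frequency: the number of documents containing each term.
    let mut df: BTreeMap<&str, usize> = BTreeMap::new();
    for doc in docs {
        let unique: BTreeSet<&str> = doc.iter().copied().collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    docs.iter()
        .map(|doc| {
            // Raw term counts within this document.
            let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
            for &term in doc {
                *counts.entry(term).or_insert(0) += 1;
            }
            let len = doc.len() as f64;
            counts
                .into_iter()
                .map(|(term, count)| {
                    let tf = count as f64 / len;
                    let idf = (n_docs / df[term] as f64).ln();
                    (term.to_string(), tf * idf)
                })
                .collect()
        })
        .collect()
}
```

With this unsmoothed form, a term that appears in every document receives a weight of zero, which is one reason practical vectorizers often add smoothing to the IDF term.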

Enums

NgramType
N-gram type
NormalizationStrategy
Text normalization strategy
SimilarityMetric
Similarity metrics for text comparison
TokenizationStrategy
Text tokenization strategy
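
The variants of `SimilarityMetric` are not listed on this page. As an illustration of what text similarity metrics typically compute, the sketch below implements two common ones, Jaccard similarity over token sets and cosine similarity over feature vectors, using only the standard library. The function names `jaccard` and `cosine` are hypothetical, and returning 0.0 for empty or zero-norm inputs is an assumption of this example.

```rust
use std::collections::HashSet;

/// Jaccard similarity: |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
/// Returns 0.0 when both inputs are empty (an assumption of this sketch).
fn jaccard(a: &[&str], b: &[&str]) -> f64 {
    let sa: HashSet<&str> = a.iter().copied().collect();
    let sb: HashSet<&str> = b.iter().copied().collect();
    let union = sa.union(&sb).count();
    if union == 0 {
        return 0.0;
    }
    sa.intersection(&sb).count() as f64 / union as f64
}

/// Cosine similarity: dot(a, b) / (|a| * |b|) for equal-length vectors.
/// Returns 0.0 when either vector has zero norm (an assumption of this sketch).
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}
```

Jaccard works directly on token lists, while cosine expects numeric vectors such as TF-IDF weights or bag-of-words counts laid out over a shared vocabulary.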