Text preprocessing utilities for sklears
This module provides text preprocessing capabilities including:
- Text tokenization and normalization
- TF-IDF vectorization
- N-gram feature generation
- Text similarity features
- Sentence embeddings
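To make the tokenization and TF-IDF items above concrete, here is a minimal sketch in plain Rust using only the standard library. It does not use the sklears API: the function names `tokenize` and `tf_idf` are hypothetical, and the weighting shown (term frequency divided by document length, unsmoothed `ln(N / df)` for the inverse document frequency) is an assumption. The crate's own vectorizer may normalize or smooth differently.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: lowercase the text and split on any
/// non-alphanumeric character, dropping empty pieces.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Hypothetical helper: one term -> weight map per document, where
/// weight = (count / document length) * ln(N / document frequency).
fn tf_idf(docs: &[&str]) -> Vec<BTreeMap<String, f64>> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n_docs = tokenized.len() as f64;

    // Document frequency: in how many documents each term appears.
    let mut df: BTreeMap<String, usize> = BTreeMap::new();
    for tokens in &tokenized {
        let unique: BTreeSet<&String> = tokens.iter().collect();
        for term in unique {
            *df.entry(term.clone()).or_insert(0) += 1;
        }
    }

    tokenized
        .iter()
        .map(|tokens| {
            // Raw term counts for this document.
            let mut counts: BTreeMap<String, usize> = BTreeMap::new();
            for t in tokens {
                *counts.entry(t.clone()).or_insert(0) += 1;
            }
            let len = tokens.len() as f64;
            counts
                .into_iter()
                .map(|(term, c)| {
                    let idf = (n_docs / df[&term] as f64).ln();
                    (term, (c as f64 / len) * idf)
                })
                .collect()
        })
        .collect()
}
```

Calling `tf_idf(&["the cat sat", "the dog sat down"])` returns one map per document. With this unsmoothed formula, a term that occurs in every document gets weight zero, which is one reason practical vectorizers often add smoothing.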
Structs§
- BagOfWordsConfig - Configuration for bag-of-words embeddings
- BagOfWordsEmbedding - Simple bag-of-words sentence embeddings
- NgramGenerator - N-gram generator for creating n-gram features from text
- NgramGeneratorConfig - Configuration for N-gram generator
- TextSimilarity - Text similarity calculator
- TextSimilarityConfig - Configuration for text similarity calculator
- TextTokenizer - Text tokenizer for preprocessing text data
- TextTokenizerConfig - Configuration for text tokenizer
- TfIdfVectorizer - TF-IDF vectorizer for converting text to numerical features
- TfIdfVectorizerConfig - Configuration for TF-IDF vectorizer
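Each struct in this list is paired with a configuration type. The sketch below illustrates that pairing for word n-grams in plain Rust. The types `SketchNgramConfig` and `SketchNgramGenerator`, their fields, and the `generate` method are hypothetical stand-ins chosen for illustration; the real field and method names in sklears are not shown on this page and may differ.

```rust
/// Hypothetical configuration type (not the sklears API):
/// the inclusive range of n-gram sizes to produce.
#[derive(Debug, Clone)]
struct SketchNgramConfig {
    min_n: usize,
    max_n: usize,
}

impl Default for SketchNgramConfig {
    fn default() -> Self {
        // Assumed default: unigrams and bigrams.
        Self { min_n: 1, max_n: 2 }
    }
}

/// Hypothetical generator that owns its configuration.
struct SketchNgramGenerator {
    config: SketchNgramConfig,
}

impl SketchNgramGenerator {
    fn new(config: SketchNgramConfig) -> Self {
        Self { config }
    }

    /// Word n-grams for every n in min_n..=max_n, each joined with a space.
    fn generate(&self, text: &str) -> Vec<String> {
        let words: Vec<&str> = text.split_whitespace().collect();
        let mut out = Vec::new();
        // `windows(0)` panics, so clamp the lower bound to 1.
        for n in self.config.min_n.max(1)..=self.config.max_n {
            for window in words.windows(n) {
                out.push(window.join(" "));
            }
        }
        out
    }
}
```

With the assumed default configuration, `SketchNgramGenerator::new(SketchNgramConfig::default()).generate("a b c")` yields the three unigrams followed by the two bigrams. Keeping the options in a separate config struct lets callers override one field and take defaults for the rest.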
Enums§
- NgramType - N-gram type
- NormalizationStrategy - Text normalization strategy
- SimilarityMetric - Similarity metrics for text comparison
- TokenizationStrategy - Text tokenization strategy
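The enums above select a behavior at runtime. As an illustration of how a similarity-metric enum might drive a calculation, the following sketch uses a hypothetical `SketchMetric` enum with two assumed variants, cosine and Jaccard, over whitespace-separated tokens. The actual variants of `SimilarityMetric` are not listed on this page, so nothing here should be read as the crate's definition.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical metric enum; the variants are assumptions.
#[derive(Debug, Clone, Copy)]
enum SketchMetric {
    Cosine,
    Jaccard,
}

/// Count occurrences of each token.
fn counts<'a>(tokens: &[&'a str]) -> HashMap<&'a str, f64> {
    let mut m = HashMap::new();
    for t in tokens {
        *m.entry(*t).or_insert(0.0) += 1.0;
    }
    m
}

/// Similarity of two texts in [0, 1]; returns 0.0 when either text is empty.
fn similarity(a: &str, b: &str, metric: SketchMetric) -> f64 {
    let ta: Vec<&str> = a.split_whitespace().collect();
    let tb: Vec<&str> = b.split_whitespace().collect();
    match metric {
        // Jaccard: |intersection| / |union| over the sets of distinct tokens.
        SketchMetric::Jaccard => {
            let sa: HashSet<&str> = ta.iter().copied().collect();
            let sb: HashSet<&str> = tb.iter().copied().collect();
            let union = sa.union(&sb).count();
            if union == 0 {
                return 0.0;
            }
            sa.intersection(&sb).count() as f64 / union as f64
        }
        // Cosine: dot product of token-count vectors over the product of norms.
        SketchMetric::Cosine => {
            let ca = counts(&ta);
            let cb = counts(&tb);
            let dot: f64 = ca
                .iter()
                .map(|(t, x)| x * cb.get(t).copied().unwrap_or(0.0))
                .sum();
            let na = ca.values().map(|x| x * x).sum::<f64>().sqrt();
            let nb = cb.values().map(|x| x * x).sum::<f64>().sqrt();
            if na == 0.0 || nb == 0.0 {
                0.0
            } else {
                dot / (na * nb)
            }
        }
    }
}
```

For "the cat sat" and "the cat ran", the two texts share two of four distinct tokens, so the Jaccard arm gives 0.5, while the cosine arm gives 2/3 because each count vector has norm √3 and the dot product is 2.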