
Module text


Structs

TextProcessor
Configurable text preprocessing pipeline.

Functions

build_vocab
Builds a vocabulary map and reverse lookup from tokenized sentences.
build_vocab_with_freq
Builds a vocabulary map, reverse lookup, and per-ID word frequencies from tokenized sentences.
load_text_data
Tokenizes text using the default TextProcessor settings.
load_text_data_advanced
Tokenizes text using a custom TextProcessor.
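This index lists the functions without their signatures, so the following is only a minimal, self-contained sketch of the idea behind tokenizing text and building a vocabulary with frequencies. The names `tokenize_lines` and `sketch_vocab_with_freq` are hypothetical and are not part of this module. The tokenization rule (lowercase, split on whitespace, one sentence per line) and the ID assignment (order of first appearance) are assumptions, and the crate's actual behavior may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a tokenizer: lowercases each line and splits it
/// on whitespace, producing one token list per non-empty line.
/// (Assumption: the real `load_text_data` may apply different rules.)
fn tokenize_lines(text: &str) -> Vec<Vec<String>> {
    text.lines()
        .map(|line| {
            line.split_whitespace()
                .map(|word| word.to_lowercase())
                .collect::<Vec<String>>()
        })
        .filter(|tokens| !tokens.is_empty())
        .collect()
}

/// Hypothetical sketch of a vocabulary builder. Returns:
///   - a word -> ID map,
///   - a reverse lookup (ID -> word, indexed by ID),
///   - per-ID occurrence counts.
/// (Assumption: IDs are assigned in order of first appearance.)
fn sketch_vocab_with_freq(
    sentences: &[Vec<String>],
) -> (HashMap<String, usize>, Vec<String>, Vec<usize>) {
    let mut word_to_id: HashMap<String, usize> = HashMap::new();
    let mut id_to_word: Vec<String> = Vec::new();
    let mut freq: Vec<usize> = Vec::new();

    for sentence in sentences {
        for word in sentence {
            // Copy the ID out so the map is free to be mutated below.
            let existing = word_to_id.get(word).copied();
            match existing {
                // Known word: bump its count.
                Some(id) => freq[id] += 1,
                // New word: the next free ID is the current vocabulary size.
                None => {
                    let id = id_to_word.len();
                    word_to_id.insert(word.clone(), id);
                    id_to_word.push(word.clone());
                    freq.push(1);
                }
            }
        }
    }

    (word_to_id, id_to_word, freq)
}
```

Keeping the reverse lookup and the frequencies in vectors indexed by ID makes both lookups constant-time and keeps the three structures aligned by construction.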