Tokenizer API for text processing
Structs
- LanguageAwareTokenizer - Language-aware tokenizer that can be configured per-field
- LowercaseTokenizer - Lowercase tokenizer - splits on whitespace and lowercases
- MultiLanguageStemmer - Multi-language stemmer that can select language dynamically
- SimpleTokenizer - Simple whitespace tokenizer
- StemmerTokenizer - Stemming tokenizer - splits on whitespace, lowercases, and applies stemming
- StopWordTokenizer - Stop word filter tokenizer - wraps another tokenizer and filters out stop words
- Token - A token produced by tokenization
- TokenizerRegistry - Registry for named tokenizers
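
Several of the structs above are described as layers: split on whitespace, optionally lowercase, then filter stop words by wrapping another tokenizer. The following is a minimal, self-contained sketch of that composition. It does not use this module's actual types: the trait `SketchTokenizer`, its `tokenize(&self, &str) -> Vec<String>` signature, and the structs `Whitespace`, `Lowercase`, and `StopWords` are all hypothetical names chosen for illustration, and the real API may differ.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a tokenizer trait; the real signature may differ.
pub trait SketchTokenizer {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

/// Splits on whitespace only, leaving case untouched.
pub struct Whitespace;

impl SketchTokenizer for Whitespace {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(str::to_string).collect()
    }
}

/// Splits on whitespace and lowercases each token.
pub struct Lowercase;

impl SketchTokenizer for Lowercase {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(|t| t.to_lowercase()).collect()
    }
}

/// Wraps another tokenizer and drops tokens found in the stop-word set.
pub struct StopWords<T: SketchTokenizer> {
    inner: T,
    stop_words: HashSet<String>,
}

impl<T: SketchTokenizer> StopWords<T> {
    pub fn new(inner: T, words: &[&str]) -> Self {
        Self {
            inner,
            stop_words: words.iter().map(|w| w.to_string()).collect(),
        }
    }
}

impl<T: SketchTokenizer> SketchTokenizer for StopWords<T> {
    fn tokenize(&self, text: &str) -> Vec<String> {
        // Delegate to the wrapped tokenizer, then filter its output.
        self.inner
            .tokenize(text)
            .into_iter()
            .filter(|t| !self.stop_words.contains(t))
            .collect()
    }
}
```

In this sketch the stop-word comparison is exact, so wrapping a lowercasing tokenizer (rather than a plain whitespace one) is what makes "The" match the stop word "the".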
Enums
- Language - Supported stemmer languages
Traits
- Tokenizer - Trait for tokenizers
- TokenizerClone
Functions
- parse_language - Parse a language string into a Language enum
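
As an illustration of what parsing a language string into an enum can look like, here is a small sketch. Everything in it is an assumption: the enum `SketchLanguage`, its three variants, the function name `parse_language_sketch`, the accepted spellings, and the `Option` return type are hypothetical, and the module's real `Language` enum and `parse_language` signature may differ.

```rust
/// Hypothetical subset of languages; the real enum may list others.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchLanguage {
    English,
    French,
    German,
}

/// Maps a language name or two-letter code to a variant,
/// ignoring ASCII case and surrounding whitespace.
pub fn parse_language_sketch(s: &str) -> Option<SketchLanguage> {
    match s.trim().to_ascii_lowercase().as_str() {
        "en" | "english" => Some(SketchLanguage::English),
        "fr" | "french" => Some(SketchLanguage::French),
        "de" | "german" => Some(SketchLanguage::German),
        // Unknown input is reported as None rather than guessed.
        _ => None,
    }
}
```

Returning `Option` (or a `Result` with a descriptive error) lets callers decide how to handle an unsupported language instead of silently falling back to a default.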
Type Aliases
- BoxedTokenizer - Boxed tokenizer for dynamic dispatch
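
A boxed trait object cannot derive `Clone` directly, so a helper trait is a common way to make one cloneable. The names TokenizerClone and BoxedTokenizer suggest this pattern, but that is an assumption, not something this page confirms. The sketch below shows the general technique with hypothetical names (`DynTokenizer`, `CloneBox`, `BoxedSketch`, `Whitespace`) and does not reproduce the module's actual definitions.

```rust
/// Hypothetical tokenizer trait. `CloneBox` as a supertrait keeps the trait
/// object-safe while still letting trait objects be cloned.
pub trait DynTokenizer: CloneBox {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

/// Helper trait that clones a value into a fresh box.
pub trait CloneBox {
    fn clone_box(&self) -> Box<dyn DynTokenizer>;
}

// Blanket impl: any concrete tokenizer that is `Clone` gets `clone_box` for free.
impl<T> CloneBox for T
where
    T: DynTokenizer + Clone + 'static,
{
    fn clone_box(&self) -> Box<dyn DynTokenizer> {
        Box::new(self.clone())
    }
}

/// Boxed tokenizer for dynamic dispatch (hypothetical alias).
pub type BoxedSketch = Box<dyn DynTokenizer>;

impl Clone for BoxedSketch {
    fn clone(&self) -> Self {
        // Dispatches through the vtable to the concrete type's `clone_box`.
        self.clone_box()
    }
}

/// A concrete tokenizer used to exercise the pattern.
#[derive(Clone)]
pub struct Whitespace;

impl DynTokenizer for Whitespace {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(str::to_string).collect()
    }
}
```

With this in place, a `BoxedSketch` can be stored in a map of named tokenizers and cloned on lookup, which is one plausible way a registry of tokenizers could hand out independent instances.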