Tokenizer implementation for morphological analysis.
This module provides a builder pattern for creating tokenizers and the tokenizer itself.
Examples
# Create a tokenizer with custom configuration
import lindera

tokenizer = (lindera.TokenizerBuilder()
    .set_mode("normal")
    .append_token_filter("japanese_stop_tags", {"tags": ["助詞"]})
    .build())

# Tokenize text
tokens = tokenizer.tokenize("すもももももももものうち")

Structs
- PyTokenizer: Tokenizer for performing morphological analysis.
- PyTokenizerBuilder: Builder for creating a Tokenizer with custom configuration.
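The builder pattern used above (chain configuration calls, then `build()` to produce an immutable tokenizer) can be sketched with a self-contained toy. Everything here is illustrative: `ToyTokenizerBuilder`, whitespace splitting, and the stop-word filter are assumptions for demonstration, not the lindera API.

```python
class ToyTokenizer:
    """Immutable product of the builder; performs a stand-in 'analysis'."""

    def __init__(self, mode, token_filters):
        self.mode = mode
        self.token_filters = token_filters

    def tokenize(self, text):
        # Stand-in for morphological analysis: split on whitespace,
        # then run each configured token filter in order.
        tokens = text.split()
        for f in self.token_filters:
            tokens = f(tokens)
        return tokens


class ToyTokenizerBuilder:
    """Accumulates configuration, then builds a ToyTokenizer."""

    def __init__(self):
        self._mode = "normal"
        self._filters = []

    def set_mode(self, mode):
        self._mode = mode
        return self  # returning self enables method chaining

    def append_token_filter(self, token_filter):
        self._filters.append(token_filter)
        return self

    def build(self):
        # Copy the filter list so later builder mutations cannot
        # affect an already-built tokenizer.
        return ToyTokenizer(self._mode, list(self._filters))


tokenizer = (ToyTokenizerBuilder()
    .set_mode("normal")
    .append_token_filter(lambda ts: [t for t in ts if t != "the"])
    .build())

print(tokenizer.tokenize("the sum of the parts"))  # ['sum', 'of', 'parts']
```

The real builder follows the same shape: each `set_*`/`append_*` call returns the builder so calls can be chained, and `build()` hands back the configured tokenizer.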