Module tokenizer


Tokenizer implementation for morphological analysis.

This module provides a builder pattern for creating tokenizers and the tokenizer itself.

§Examples

# Create a tokenizer with custom configuration
import lindera

tokenizer = (lindera.TokenizerBuilder()
    .set_mode("normal")
    .append_token_filter("japanese_stop_tags", {"tags": ["助詞"]})
    .build())

# Tokenize text
tokens = tokenizer.tokenize("すもももももももものうち")
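For illustration, a minimal, self-contained sketch of consuming the tokenizer's output. The `Token` class and its `text` attribute below are stand-ins, since the exact attribute names on the tokens returned by `tokenize` are an assumption here; the segmentation shown is the classic analysis of the sample sentence (すもも / も / もも / も / もも / の / うち).

```python
from dataclasses import dataclass
from typing import List

# Stand-in for the tokens returned by tokenize(); the real tokens may
# expose different attribute names (this is an assumption, not the API).
@dataclass
class Token:
    text: str

def join_surface_forms(tokens: List[Token]) -> str:
    """Join token surface forms with a separator for display."""
    return " / ".join(t.text for t in tokens)

# The classic segmentation of "すもももももももものうち"
tokens = [Token(t) for t in ["すもも", "も", "もも", "も", "もも", "の", "うち"]]
print(join_surface_forms(tokens))
# → すもも / も / もも / も / もも / の / うち
```

A helper like this is handy for inspecting how a given mode or token filter changes the segmentation.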

Structs§

PyTokenizer
Tokenizer for performing morphological analysis.
PyTokenizerBuilder
Builder for creating a Tokenizer with custom configuration.

Type Aliases§

PyDictRef