Tokenization modes and penalty configurations.
This module defines the different tokenization modes available and their penalty configurations for controlling segmentation behavior.
§Modes
- Normal: Standard tokenization based on dictionary cost
- Decompose: Decomposes compound words with penalty-based control (see the sketch below)
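A minimal sketch of the difference between the two modes, assuming a default dictionary is available and that the tokenizer exposes a `tokenize` method returning tokens with a `text` attribute (not shown in this page's examples):

```python
import lindera

# A compound noun that normal mode keeps whole but decompose mode may split.
text = "関西国際空港"  # "Kansai International Airport"

for mode in ("normal", "decompose"):
    tokenizer = lindera.TokenizerBuilder().set_mode(mode).build()
    tokens = tokenizer.tokenize(text)
    print(mode, [token.text for token in tokens])

# Plausible output (the exact splits depend on the dictionary in use):
#   normal    ['関西国際空港']
#   decompose ['関西', '国際', '空港']
```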
§Examples
```python
import lindera

# Normal mode
tokenizer = lindera.TokenizerBuilder().set_mode("normal").build()

# Decompose mode
tokenizer = lindera.TokenizerBuilder().set_mode("decompose").build()

# Custom penalty configuration (these values match lindera's defaults):
# runs of kanji longer than the threshold receive the extra cost,
# which encourages decompose mode to split long compounds.
penalty = lindera.Penalty(
    kanji_penalty_length_threshold=2,
    kanji_penalty_length_penalty=3000,
)
```

Structs§
- PyPenalty
- Penalty configuration for decompose mode.
Enums§
- PyMode
- Tokenization mode.