Module mode


Tokenization modes and penalty configurations.

This module defines the available tokenization modes and their penalty configurations, which control how aggressively compound words are segmented.

§Modes

  • Normal: Standard tokenization based on dictionary cost
  • Decompose: Decomposes compound words with penalty-based control
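The penalty-based control in decompose mode can be illustrated with a small, self-contained sketch. This is not the lindera implementation; it only mimics the idea behind the `kanji_penalty_length_threshold` and `kanji_penalty_length_penalty` parameters shown below: tokens whose kanji length exceeds a threshold receive an extra cost, which biases the lattice search toward splitting long compounds into shorter tokens.

```python
# Illustrative sketch only: shows how a length-based penalty biases
# segmentation toward decomposing long compound tokens. The parameter
# names mirror lindera's penalty configuration, but this function is
# a hypothetical stand-in, not the library's actual cost model.

def kanji_length_penalty(token: str,
                         kanji_penalty_length_threshold: int,
                         kanji_penalty_length_penalty: int) -> int:
    """Return an extra cost when the token's kanji length exceeds the threshold."""
    # Count characters in the CJK Unified Ideographs block.
    kanji_len = sum(1 for ch in token if '\u4e00' <= ch <= '\u9fff')
    if kanji_len > kanji_penalty_length_threshold:
        return kanji_penalty_length_penalty
    return 0

# With threshold=2 and penalty=3000, a six-kanji compound is penalized,
# so a lattice search would prefer splitting it into shorter tokens,
# while a two-kanji token incurs no extra cost.
print(kanji_length_penalty("関西国際空港", 2, 3000))  # long compound: penalized
print(kanji_length_penalty("空港", 2, 3000))          # short token: no penalty
```

In the real tokenizer this extra cost is folded into the path cost during Viterbi search, so decomposition only wins when the penalized compound path becomes more expensive than the sum of its parts.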

§Examples

# Normal mode
import lindera

tokenizer = lindera.TokenizerBuilder().set_mode("normal").build()

# Decompose mode
tokenizer = lindera.TokenizerBuilder().set_mode("decompose").build()

# Custom penalty configuration
penalty = lindera.Penalty(
    kanji_penalty_length_threshold=2,
    kanji_penalty_length_penalty=3000
)

Structs§

PyPenalty
Penalty configuration for decompose mode.

Enums§

PyMode
Tokenization mode.