Module mode

Tokenization modes and penalty configurations.

This module defines the available tokenization modes and the penalty configurations that control segmentation behavior.

§Modes

  • Normal: Standard tokenization that selects the lowest-cost path based on dictionary cost
  • Decompose: Splits compound words into their components, with penalties controlling how aggressively long tokens are broken up

§Examples

import lindera

# Normal mode: lowest-cost segmentation
tokenizer = lindera.TokenizerBuilder().set_mode("normal").build()

# Decompose mode: split compound words
tokenizer = lindera.TokenizerBuilder().set_mode("decompose").build()

# Custom penalty configuration for decompose mode
penalty = lindera.Penalty(
    kanji_penalty_length_threshold=2,
    kanji_penalty_length_penalty=3000
)

Structs§

PyPenalty
Penalty configuration for decompose mode.

Enums§

PyMode
Tokenization mode.

Functions§

register