Crate lindera


§Lindera Python Bindings

Python bindings for Lindera, a morphological analysis library for CJK text.

Lindera provides high-performance tokenization and morphological analysis for:

  • Japanese (IPADIC, IPADIC NEologd, UniDic)
  • Korean (ko-dic)
  • Chinese (CC-CEDICT)

§Features

  • Dictionary management: Build, load, and use custom dictionaries
  • Tokenization: Multiple tokenization modes (normal, decompose)
  • Filters: Character and token filtering pipeline
  • Training: Train custom morphological models (requires the train feature)
  • User dictionaries: Support for custom user dictionaries

§Examples

import lindera

# Create a tokenizer
tokenizer = lindera.TokenizerBuilder().build()

# Tokenize text
tokens = tokenizer.tokenize("関西国際空港")
for token in tokens:
    print(token["text"], token["detail"])
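
As a rough sketch of consuming that output, the helper below collects the surface forms from token records shaped like those in the example above, that is, mappings with a "text" key and a "detail" key. The name surface_forms and the sample records are hypothetical, introduced here so the snippet runs without a dictionary installed; the token type actually returned by tokenize may differ between releases.

```python
from typing import Any, Iterable, List, Mapping


def surface_forms(tokens: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the "text" field of each token record, preserving order."""
    return [token["text"] for token in tokens]


# Hypothetical stand-in records that only mimic the shape used in the
# example above; real "detail" values depend on the dictionary in use.
sample_tokens = [
    {"text": "関西", "detail": ["名詞"]},
    {"text": "国際", "detail": ["名詞"]},
    {"text": "空港", "detail": ["名詞"]},
]

print(surface_forms(sample_tokens))
```

Under the same assumption about token shape, the result of tokenizer.tokenize(...) could be passed to surface_forms in place of sample_tokens.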

Modules§

dictionary
Dictionary management for morphological analysis.
error
Error types for Lindera operations.
metadata
Dictionary metadata configuration.
mode
Tokenization modes and penalty configurations.
schema
Dictionary schema definitions.
tokenizer
Tokenizer implementation for morphological analysis.
trainer
Training functionality for custom morphological models.
util
Utility functions for Python-Rust data conversion.

Functions§

version
Returns the version of the lindera-python package.