Crate lindera_ruby

§Lindera Ruby Bindings

Ruby bindings for Lindera, a morphological analysis library for CJK text.

Lindera provides high-performance tokenization and morphological analysis for:

Japanese (IPADIC, IPADIC NEologd, UniDic)
Korean (ko-dic)
Chinese (CC-CEDICT, Jieba)

§Features

Dictionary management: Build, load, and use custom dictionaries
Tokenization: Multiple tokenization modes (normal, decompose)
Filters: Character and token filtering pipeline
Training: Train custom morphological models (requires the train feature to be enabled)
User dictionaries: Support for custom user dictionaries
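The filtering pipeline listed above can be pictured as three stages: character filters rewrite the raw text, a segmenter splits it into tokens, and token filters post-process the result. The following is a conceptual sketch in plain Ruby, not the Lindera API. Every name in it (`normalize`, `segment`, `drop_stop_words`, `run_pipeline`) is hypothetical, and a whitespace split stands in for real dictionary-based morphological analysis.

```ruby
# Conceptual sketch only: none of these names belong to the Lindera API.

# Character filter: normalize the raw text before segmentation
# (NFKC folds full-width Latin letters to their ASCII forms).
normalize = ->(text) { text.unicode_normalize(:nfkc) }

# Stand-in segmenter (hypothetical): a real morphological analyzer
# consults a dictionary instead of splitting on whitespace.
segment = ->(text) { text.split }

# Token filter: drop tokens that appear in a stop-word list.
stop_words = %w[the a of]
drop_stop_words = ->(tokens) { tokens.reject { |t| stop_words.include?(t.downcase) } }

# Apply character filters in order, segment, then apply token filters in order.
def run_pipeline(text, char_filters, segmenter, token_filters)
  filtered_text = char_filters.reduce(text) { |acc, f| f.call(acc) }
  tokens = segmenter.call(filtered_text)
  token_filters.reduce(tokens) { |acc, f| f.call(acc) }
end

result = run_pipeline("Ｔｈｅ ｓｐｅｅｄ of ｌｉｇｈｔ", [normalize], segment, [drop_stop_words])
p result # => ["speed", "light"]
```

Keeping each stage as an independent callable mirrors the split between the character_filter and token_filter modules: stages can be added or reordered without touching the segmenter.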

§Examples

require "lindera"

# Create a tokenizer
builder = Lindera::TokenizerBuilder.new
tokenizer = builder.build

# Tokenize text
tokens = tokenizer.tokenize("関西国際空港")
tokens.each { |token| puts "#{token.surface}: #{token.details}" }
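When the tokens need to go somewhere other than standard output, the same two readers can feed a small formatting helper. This is a minimal sketch that assumes only the `surface` and `details` readers shown above, and further assumes that `details` is an Array of Strings, which this page does not confirm. `format_tokens` and the stand-in token struct are hypothetical names, and the sample detail values are made up for illustration rather than real analyzer output.

```ruby
# Stand-in for the objects returned by tokenizer.tokenize (hypothetical);
# only the surface and details readers from the example are assumed.
fake_token = Struct.new(:surface, :details)

# Builds one line per token: surface, a tab, then comma-joined details.
# Assumption: details is an Array of Strings. Array() also tolerates
# nil or a single String, so the helper does not raise in those cases.
def format_tokens(tokens)
  tokens.map { |t| "#{t.surface}\t#{Array(t.details).join(',')}" }
end

# Illustrative values only, not real output from a Lindera dictionary.
sample = [
  fake_token.new("関西国際空港", ["名詞", "固有名詞"]),
  fake_token.new("駅", ["名詞"])
]

lines = format_tokens(sample)
puts lines
```

With a real tokenizer, `format_tokens(tokenizer.tokenize(text))` should work the same way as long as the returned objects respond to `surface` and `details`.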

Modules§

character_filter: Character filters for preprocessing text.
dictionary: Dictionary management for morphological analysis.
error: Error types for Lindera operations.
metadata: Dictionary metadata configuration.
mode: Tokenization modes and penalty configurations.
schema: Dictionary schema definitions.
segmenter: Segmenter implementation for morphological analysis.
token: Token representation for morphological analysis results.
token_filter: Token filters for post-processing tokens.
tokenizer: Tokenizer implementation for morphological analysis.
trainer: Training functionality for custom morphological models.
util: Utility functions for Ruby-Rust data conversion.

Functions§

Init_lindera_ruby ⚠ (unsafe function)
