Crate lindera_ruby

§Lindera Ruby Bindings

Ruby bindings for Lindera, a morphological analysis library for CJK text.

Lindera provides high-performance tokenization and morphological analysis for:

  • Japanese (IPADIC, IPADIC NEologd, UniDic)
  • Korean (ko-dic)
  • Chinese (CC-CEDICT, Jieba)

§Features

  • Dictionary management: Build, load, and use custom dictionaries
  • Tokenization: Multiple tokenization modes (normal, decompose)
  • Filters: Character and token filtering pipeline
  • Training: Train custom morphological models (requires the train feature)
  • User dictionaries: Support for custom user dictionaries
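
To make the filtering pipeline in the list above concrete, here is a conceptual sketch in plain Ruby. It does not call Lindera: every name in it (Token, CHARACTER_FILTERS, SEGMENTER, TOKEN_FILTERS, analyze) is hypothetical, and the whitespace splitter stands in for dictionary-based segmentation. It only illustrates the usual ordering, in which character filters rewrite the input string, the segmenter produces tokens, and token filters post-process the token list.

```ruby
# Conceptual sketch only: none of these names come from the Lindera API.
Token = Struct.new(:surface, :details)

# Character filters: String -> String, applied before segmentation.
CHARACTER_FILTERS = [
  ->(text) { text.unicode_normalize(:nfkc) }, # fold full-width forms
  ->(text) { text.strip }                     # trim surrounding whitespace
].freeze

# Stand-in segmenter: splits on whitespace instead of consulting a dictionary.
SEGMENTER = ->(text) { text.split.map { |word| Token.new(word, ["UNK"]) } }

# Token filters: Array<Token> -> Array<Token>, applied after segmentation.
STOP_WORDS = %w[the a of].freeze
TOKEN_FILTERS = [
  ->(tokens) { tokens.map { |t| Token.new(t.surface.downcase, t.details) } },
  ->(tokens) { tokens.reject { |t| STOP_WORDS.include?(t.surface) } }
].freeze

# Run the three stages in order and return the surviving tokens.
def analyze(text)
  filtered = CHARACTER_FILTERS.reduce(text) { |acc, f| f.call(acc) }
  tokens = SEGMENTER.call(filtered)
  TOKEN_FILTERS.reduce(tokens) { |acc, f| f.call(acc) }
end
```

Keeping each stage as a list of small callables means filters can be added or reordered without touching the segmenter. The character_filter and token_filter modules listed further down cover the corresponding roles in the actual bindings.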

§Examples

require "lindera"

# Create a tokenizer
builder = Lindera::TokenizerBuilder.new
tokenizer = builder.build

# Tokenize text
tokens = tokenizer.tokenize("関西国際空港")
tokens.each { |token| puts "#{token.surface}: #{token.details}" }
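
The result of tokenize can be handled like any other Ruby collection. The sketch below formats tokens as one line each, relying only on the surface and details readers used in the example above. It assumes details is an Array of strings (or a single String), which this page does not confirm. FakeToken and format_tokens are hypothetical helpers, and the sample data is hand-written so the snippet runs without the lindera gem; it is not real tokenizer output.

```ruby
# Stand-in for the token objects returned by tokenizer.tokenize; only the
# surface and details readers shown in the example above are assumed.
FakeToken = Struct.new(:surface, :details)

# Format each token as: surface, a tab, then comma-joined details.
# Array() lets this work whether details is an Array or a single String.
def format_tokens(tokens)
  tokens.map { |token| "#{token.surface}\t#{Array(token.details).join(',')}" }
end

# Hand-written sample data, not actual Lindera output.
sample = [
  FakeToken.new("関西国際空港", %w[名詞 固有名詞]),
  FakeToken.new("限定", %w[名詞 サ変接続])
]
puts format_tokens(sample)
```

With the gem installed, the tokens array from the example above could be passed to format_tokens in place of sample, provided the assumption about details holds.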

Modules§

  • character_filter: Character filters for preprocessing text.
  • dictionary: Dictionary management for morphological analysis.
  • error: Error types for Lindera operations.
  • metadata: Dictionary metadata configuration.
  • mode: Tokenization modes and penalty configurations.
  • schema: Dictionary schema definitions.
  • segmenter: Segmenter implementation for morphological analysis.
  • token: Token representation for morphological analysis results.
  • token_filter: Token filters for post-processing tokens.
  • tokenizer: Tokenizer implementation for morphological analysis.
  • trainer: Training functionality for custom morphological models.
  • util: Utility functions for Ruby-Rust data conversion.

Functions§

Init_lindera_ruby