lindera-unidic-builder 0.3.0

A Japanese morphological dictionary builder for UniDic.

Lindera UniDic Builder

License: MIT

Join the chat at https://gitter.im/bayard-search/lindera

UniDic builder for Lindera. This project is a fork of fulmicoton's kuromoji-rs.

Install

% cargo install lindera-unidic-builder

Build

The following tools are required to build:

  • Rust >= 1.39.0
  • make >= 3.81

Then build with make:

% make lindera-unidic

Dictionary version

This project supports UniDic 2.1.2. See the UniDic project for details about the dictionary itself.

Building a dictionary

Build a dictionary with the lindera-unidic command:

% ./bin/lindera-unidic ./unidic-mecab-2.1.2_src ./lindera-unidic-2.1.2
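If you want to drive the build from Rust code (for example a small tool or build script), the command above can be assembled with `std::process::Command`. This is a minimal sketch: `builder_command` is a hypothetical helper, and the argument roles (UniDic source directory first, output directory second) are inferred from the example paths rather than from documented flags.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run) the same invocation as
/// `./bin/lindera-unidic <unidic-src-dir> <output-dir>` shown above.
fn builder_command(bin: &Path, src_dir: &Path, out_dir: &Path) -> Command {
    let mut cmd = Command::new(bin);
    // Assumed from the example: first the unpacked UniDic MeCab source
    // directory, then the directory the Lindera dictionary is written to.
    cmd.arg(src_dir).arg(out_dir);
    cmd
}

fn main() {
    let cmd = builder_command(
        Path::new("./bin/lindera-unidic"),
        Path::new("./unidic-mecab-2.1.2_src"),
        Path::new("./lindera-unidic-2.1.2"),
    );
    // To actually build the dictionary, call `cmd.status()` and check
    // `success()` on the result; here we only print the assembled command.
    println!("{:?}", cmd);
}
```

Keeping construction separate from execution makes it easy to inspect or log the exact command before running a potentially long dictionary build.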

Tokenizing text using produced dictionary

You can tokenize text with the produced dictionary using the lindera command:

% echo "羽田空港限定トートバッグ" | lindera -d ./lindera-unidic-2.1.2
羽田    名詞,固有名詞,人名,姓,*,*,羽田,ハタ,ハタ
空港    名詞,普通名詞,一般,*,*,*,空港,クーコー,クーコー
限定    名詞,普通名詞,サ変可能,*,*,*,限定,ゲンテー,ゲンテー
トート  名詞,普通名詞,一般,*,*,*,トート,トート,トート
バッグ  名詞,普通名詞,一般,*,*,*,バッグ,バッグ,バッグ
EOS
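Each output line above holds a surface form followed by nine comma-separated feature fields, and the sentence ends with `EOS`. The sketch below parses that text format in plain Rust, with no lindera crate API involved. Assumptions: the surface and features are separated by whitespace (a tab or spaces), the surface contains no whitespace, and the meaning of individual fields (e.g. the first being the top-level part of speech) is taken from the sample output, not from a documented schema. It sticks to string methods available on Rust 1.39.

```rust
/// One token line of the `lindera` command output: the surface form plus
/// its comma-separated feature fields.
#[derive(Debug, PartialEq)]
struct Token<'a> {
    surface: &'a str,
    features: Vec<&'a str>,
}

/// Parses output in the format shown above, stopping at the `EOS` line.
fn parse_output(output: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    for line in output.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if line == "EOS" {
            break; // end of the sentence
        }
        // Assumption: the surface ends at the first whitespace character.
        if let Some(idx) = line.find(char::is_whitespace) {
            let (surface, rest) = line.split_at(idx);
            tokens.push(Token {
                surface,
                features: rest.trim_start().split(',').collect(),
            });
        }
    }
    tokens
}

fn main() {
    let output = "羽田\t名詞,固有名詞,人名,姓,*,*,羽田,ハタ,ハタ\n\
                  空港\t名詞,普通名詞,一般,*,*,*,空港,クーコー,クーコー\n\
                  EOS\n";
    for t in parse_output(output) {
        // Assumption: the first feature field is the top-level part of speech.
        println!("{} -> {}", t.surface, t.features[0]);
    }
}
```

Running `main` prints `羽田 -> 名詞` and `空港 -> 名詞`, one per line.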

For more details about the lindera command, refer to the following URL:

API reference

The API reference is available at the following URL:

  • Lindera UniDic Builder

Project links

Lindera consists of several projects, listed below: