# Lindera IPADIC Builder
[License: MIT](https://opensource.org/licenses/MIT) [Join the chat on Gitter](https://gitter.im/lindera-morphology/lindera?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
IPADIC dictionary builder for [Lindera](https://github.com/lindera-morphology/lindera). This project is forked from fulmicoton's [kuromoji-rs](https://github.com/fulmicoton/kuromoji-rs).
## Install
```
% cargo install lindera-ipadic-builder
```
## Build
The following are required to build:
- Rust >= 1.39.0
- make >= 3.81
```text
% make lindera-ipadic
```
## Dictionary version
This repository contains [mecab-ipadic-2.7.0-20070801](http://jaist.dl.sourceforge.net/project/mecab/mecab-ipadic/2.7.0-20070801/).
## Building a dictionary
Build a dictionary with the `lindera-ipadic` command, passing the source dictionary directory and the output directory:
```
% ./bin/lindera-ipadic ./mecab-ipadic-2.7.0-20070801 ./lindera-ipadic-2.7.0-20070801
```
## Dictionary format
Refer to the [manual](https://ja.osdn.net/projects/ipadic/docs/ipadic-2.7.0-manual-en.pdf/en/1/ipadic-2.7.0-manual-en.pdf.pdf) for details on the IPADIC dictionary format and part-of-speech tags. The details of each entry consist of the following comma-separated fields:
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 品詞 | part-of-speech | |
| 1 | 品詞細分類1 | sub POS 1 | |
| 2 | 品詞細分類2 | sub POS 2 | |
| 3 | 品詞細分類3 | sub POS 3 | |
| 4 | 活用型 | conjugation type | |
| 5 | 活用形 | conjugation form | |
| 6 | 原形 | base form | |
| 7 | 読み | reading | |
| 8 | 発音 | pronunciation | |
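To illustrate how the nine fields line up with their indices, here is a minimal Rust sketch that splits a details string, such as the one in the tokenizer output, into named fields. The `IpadicDetails` struct and `parse_details` function are hypothetical names for this example and are not part of the lindera-ipadic-builder API. `*` placeholders (fields that do not apply) are kept verbatim.

```rust
/// Hypothetical container for the nine IPADIC detail fields (indices 0-8).
#[derive(Debug, PartialEq)]
pub struct IpadicDetails {
    pub pos: String,              // 0: 品詞 (part-of-speech)
    pub sub_pos: [String; 3],     // 1-3: 品詞細分類1-3 (sub POS 1-3)
    pub conjugation_type: String, // 4: 活用型 (conjugation type)
    pub conjugation_form: String, // 5: 活用形 (conjugation form)
    pub base_form: String,        // 6: 原形 (base form)
    pub reading: String,          // 7: 読み (reading)
    pub pronunciation: String,    // 8: 発音 (pronunciation)
}

/// Splits a comma-separated details string into named fields.
/// Returns None unless exactly nine fields are present.
pub fn parse_details(details: &str) -> Option<IpadicDetails> {
    let f: Vec<&str> = details.split(',').collect();
    if f.len() != 9 {
        return None;
    }
    // Copy field `i` into an owned String.
    let s = |i: usize| f[i].to_string();
    Some(IpadicDetails {
        pos: s(0),
        sub_pos: [s(1), s(2), s(3)],
        conjugation_type: s(4),
        conjugation_form: s(5),
        base_form: s(6),
        reading: s(7),
        pronunciation: s(8),
    })
}

fn main() {
    let d = parse_details("名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ").unwrap();
    println!("{} / {} / {}", d.pos, d.base_form, d.reading);
}
```

Rejecting strings that do not have exactly nine fields keeps a malformed entry from being silently misaligned with the table.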
## Tokenizing text using produced dictionary
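You can tokenize text using the produced dictionary with the `lindera` command. Each output line contains a surface form followed by its comma-separated details; words not found in the dictionary are tagged `UNK`, and `EOS` marks the end of the sentence: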
```
限定 名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ
トートバッグ UNK,*,*,*,*,*,*,*,*
EOS
```
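If you consume this output from a Rust program, a minimal sketch of reading it might look like the following. It assumes the format shown above: one token per line, the surface form separated from the details by a single tab or space (the exact separator is an assumption), and a terminating `EOS` line. The `Token` struct and `parse_output` function are hypothetical and not part of Lindera's API.

```rust
/// Hypothetical token record built from one line of `lindera` output.
#[derive(Debug, PartialEq)]
pub struct Token {
    pub surface: String,
    pub details: Vec<String>,
}

impl Token {
    /// A token is treated as unknown when its first detail field is `UNK`.
    pub fn is_unknown(&self) -> bool {
        self.details.first().map_or(false, |d| d.as_str() == "UNK")
    }
}

/// Parses tokenizer output up to the first `EOS` line.
/// Lines without a tab or space separator are skipped.
pub fn parse_output(output: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    for line in output.lines() {
        let line = line.trim_end();
        if line == "EOS" {
            break;
        }
        // Assumed separator: the first tab or space after the surface form.
        if let Some(i) = line.find(|c: char| c == '\t' || c == ' ') {
            let (surface, details) = (&line[..i], &line[i + 1..]);
            tokens.push(Token {
                surface: surface.to_string(),
                details: details.split(',').map(|s| s.to_string()).collect(),
            });
        }
    }
    tokens
}

fn main() {
    let output = "限定\t名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ\n\
                  トートバッグ\tUNK,*,*,*,*,*,*,*,*\n\
                  EOS\n";
    for t in parse_output(output) {
        println!("{} unknown={}", t.surface, t.is_unknown());
    }
}
```

Stopping at `EOS` treats each call as one sentence; for multi-sentence input you would loop and start a new token list after each `EOS`.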
For more details about the `lindera` command, see:
- [Lindera CLI](https://github.com/lindera-morphology/lindera-cli)
## API reference
The API reference is available at:
- <a href="https://docs.rs/lindera-ipadic-builder" target="_blank">lindera-ipadic-builder</a>