Lindera Dictionary

A morphological analysis dictionary library for Lindera.
This package contains the dictionary structures and the Viterbi algorithm.
Dictionary format
IPADIC
This repository uses mecab-ipadic.
IPADIC dictionary format
Refer to the manual for details on the IPADIC dictionary format and part-of-speech tags.
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞 | Major POS classification | |
| 5 | 品詞細分類1 | Middle POS classification | |
| 6 | 品詞細分類2 | Small POS classification | |
| 7 | 品詞細分類3 | Fine POS classification | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 原形 | Base form | |
| 11 | 読み | Reading | |
| 12 | 発音 | Pronunciation | |
IPADIC user dictionary format (CSV)
IPADIC user dictionary simple version
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 品詞 | Major POS classification | |
| 2 | 読み | Reading | |
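As a minimal sketch of the three-column layout above, the following Python reads simple-format rows into dictionaries. The field names `surface`, `pos`, and `reading`, the helper `parse_simple_rows`, and the sample entry (including its custom POS label) are illustrative assumptions, not part of Lindera's API.

```python
import csv
import io

# Column order of the simple user dictionary format (indexes 0-2 above).
SIMPLE_FIELDS = ("surface", "pos", "reading")


def parse_simple_rows(text):
    """Parse simple-format CSV text into a list of dicts keyed by SIMPLE_FIELDS."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        if len(row) != len(SIMPLE_FIELDS):
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        entries.append(dict(zip(SIMPLE_FIELDS, row)))
    return entries


# Illustrative entry; the POS label is a made-up custom tag.
sample = "東京スカイツリー,カスタム名詞,トウキョウスカイツリー\n"
entries = parse_simple_rows(sample)
print(entries[0]["reading"])
```

Rejecting rows that do not have exactly three columns keeps simple entries from being confused with detailed ones.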
IPADIC user dictionary detailed version
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞 | POS | |
| 5 | 品詞細分類1 | POS subcategory 1 | |
| 6 | 品詞細分類2 | POS subcategory 2 | |
| 7 | 品詞細分類3 | POS subcategory 3 | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 原形 | Base form | |
| 11 | 読み | Reading | |
| 12 | 発音 | Pronunciation | |
| 13 | - | - | Columns from index 13 onward can be freely extended. |
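The detailed layout can be sketched as a split between the 13 defined columns and any free-form columns that follow. The Python names in `FIELD_NAMES`, the helper `split_detailed_row`, and the context IDs and cost in the sample row are hypothetical placeholders chosen for illustration. Converting the IDs and cost to integers is also an assumption of this sketch.

```python
# Names for indexes 0-12 of the detailed format; the Python names are illustrative.
FIELD_NAMES = (
    "surface", "left_context_id", "right_context_id", "cost",
    "pos", "pos_sub1", "pos_sub2", "pos_sub3",
    "conjugation_type", "conjugation_form",
    "base_form", "reading", "pronunciation",
)


def split_detailed_row(columns):
    """Return (named_fields, extra_columns) for one detailed-format row."""
    if len(columns) < len(FIELD_NAMES):
        raise ValueError(f"expected at least 13 columns, got {len(columns)}")
    named = dict(zip(FIELD_NAMES, columns))
    # Assumption: context IDs and cost are integers.
    for key in ("left_context_id", "right_context_id", "cost"):
        named[key] = int(named[key])
    extras = list(columns[len(FIELD_NAMES):])  # index 13 onward
    return named, extras


# Hypothetical row: the IDs and cost are placeholder values.
row = (
    "東京スカイツリー,1288,1288,-1000,名詞,固有名詞,一般,*,*,*,"
    "東京スカイツリー,トウキョウスカイツリー,トウキョウスカイツリー,extra note"
).split(",")
named, extras = split_detailed_row(row)
print(named["cost"], extras)
```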
IPADIC NEologd
This repository uses mecab-ipadic-neologd.
IPADIC NEologd dictionary format
Refer to the manual for details on the IPADIC dictionary format and part-of-speech tags.
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞 | Major POS classification | |
| 5 | 品詞細分類1 | Middle POS classification | |
| 6 | 品詞細分類2 | Small POS classification | |
| 7 | 品詞細分類3 | Fine POS classification | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 原形 | Base form | |
| 11 | 読み | Reading | |
| 12 | 発音 | Pronunciation | |
IPADIC NEologd user dictionary format (CSV)
IPADIC NEologd user dictionary simple version
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 品詞 | Major POS classification | |
| 2 | 読み | Reading | |
IPADIC NEologd user dictionary detailed version
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞 | POS | |
| 5 | 品詞細分類1 | POS subcategory 1 | |
| 6 | 品詞細分類2 | POS subcategory 2 | |
| 7 | 品詞細分類3 | POS subcategory 3 | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 原形 | Base form | |
| 11 | 読み | Reading | |
| 12 | 発音 | Pronunciation | |
| 13 | - | - | Columns from index 13 onward can be freely extended. |
UniDic
This repository uses unidic-mecab.
UniDic dictionary format
Refer to the manual for details on the unidic-mecab dictionary format and part-of-speech tags.
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞大分類 | Major POS classification | |
| 5 | 品詞中分類 | Middle POS classification | |
| 6 | 品詞小分類 | Small POS classification | |
| 7 | 品詞細分類 | Fine POS classification | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 語彙素読み | Lexeme reading | |
| 11 | 語彙素(語彙素表記 + 語彙素細分類) | Lexeme | |
| 12 | 書字形出現形 | Orthography appearance type | |
| 13 | 発音形出現形 | Pronunciation appearance type | |
| 14 | 書字形基本形 | Orthography basic type | |
| 15 | 発音形基本形 | Pronunciation basic type | |
| 16 | 語種 | Word type | |
| 17 | 語頭変化型 | Prefix of a word form | |
| 18 | 語頭変化形 | Prefix of a word type | |
| 19 | 語末変化型 | Suffix of a word form | |
| 20 | 語末変化形 | Suffix of a word type | |
UniDic user dictionary format (CSV)
UniDic user dictionary simple version
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 品詞大分類 | Major POS classification | |
| 2 | 語彙素読み | Lexeme reading | |
UniDic user dictionary detailed version
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞大分類 | Major POS classification | |
| 5 | 品詞中分類 | Middle POS classification | |
| 6 | 品詞小分類 | Small POS classification | |
| 7 | 品詞細分類 | Fine POS classification | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 語彙素読み | Lexeme reading | |
| 11 | 語彙素(語彙素表記 + 語彙素細分類) | Lexeme | |
| 12 | 書字形出現形 | Orthography appearance type | |
| 13 | 発音形出現形 | Pronunciation appearance type | |
| 14 | 書字形基本形 | Orthography basic type | |
| 15 | 発音形基本形 | Pronunciation basic type | |
| 16 | 語種 | Word type | |
| 17 | 語頭変化型 | Prefix of a word form | |
| 18 | 語頭変化形 | Prefix of a word type | |
| 19 | 語末変化型 | Suffix of a word form | |
| 20 | 語末変化形 | Suffix of a word type | |
| 21 | - | - | Columns from index 21 onward can be freely extended. |
ko-dic
This repository uses mecab-ko-dic.
ko-dic dictionary format
Information about the dictionary format and part-of-speech tags used by mecab-ko-dic is documented in this Google Spreadsheet, linked to from mecab-ko-dic's repository readme.
Note that ko-dic has one fewer feature column than NAIST JDIC and carries an altogether different set of information (for example, it does not provide the "original form" of the word).
The tags are a slight modification of those specified by 세종 (Sejong), the Korean national corpus project. The mappings from Sejong to mecab-ko-dic's tag names are given in tab 태그 v2.0 on the above-linked spreadsheet.
The dictionary format is specified fully (in Korean) in tab 사전 형식 v2.0 of the spreadsheet. Any blank values default to *.
| Index | Name (Korean) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 표면 | Surface | |
| 1 | 왼쪽 문맥 ID | Left context ID | |
| 2 | 오른쪽 문맥 ID | Right context ID | |
| 3 | 비용 | Cost | |
| 4 | 품사 태그 | part-of-speech tag | See 태그 v2.0 tab on spreadsheet |
| 5 | 의미 부류 | meaning | (too few examples for me to be sure) |
| 6 | 종성 유무 | presence or absence of a final consonant | T for true; F for false; else * |
| 7 | 읽기 | reading | usually matches surface, but may differ for foreign words e.g. Chinese character words |
| 8 | 타입 | type | One of: Inflect (활용); Compound (복합명사); or Preanalysis (기분석) |
| 9 | 첫번째 품사 | first part-of-speech | e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return VV |
| 10 | 마지막 품사 | last part-of-speech | e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return EP |
| 11 | 표현 | expression | 활용, 복합명사, 기분석이 어떻게 구성되는지 알려주는 필드 (a field describing how an inflection, compound noun, or pre-analysis is composed) |
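The first and last part-of-speech columns follow from the compound tag in column 4, as the notes for indexes 9 and 10 describe. A minimal sketch of that derivation follows. The helper name `first_and_last_pos` is hypothetical, and the sketch covers only `+`-joined compound tags, not what the dictionary stores for single tags.

```python
def first_and_last_pos(pos_tag):
    """Split a '+'-joined part-of-speech tag and return its first and last parts."""
    parts = pos_tag.split("+")
    return parts[0], parts[-1]


first, last = first_and_last_pos("VV+EM+VX+EP")
print(first, last)  # VV EP
```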
ko-dic user dictionary format (CSV)
ko-dic user dictionary simple version
| Index | Name (Korean) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 표면 | Surface | |
| 1 | 품사 태그 | part-of-speech tag | See 태그 v2.0 tab on spreadsheet |
| 2 | 읽기 | reading | usually matches surface, but may differ for foreign words e.g. Chinese character words |
ko-dic user dictionary detailed version
| Index | Name (Korean) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 표면 | Surface | |
| 1 | 왼쪽 문맥 ID | Left context ID | |
| 2 | 오른쪽 문맥 ID | Right context ID | |
| 3 | 비용 | Cost | |
| 4 | 품사 태그 | part-of-speech tag | See 태그 v2.0 tab on spreadsheet |
| 5 | 의미 부류 | meaning | (too few examples for me to be sure) |
| 6 | 종성 유무 | presence or absence of a final consonant | T for true; F for false; else * |
| 7 | 읽기 | reading | usually matches surface, but may differ for foreign words e.g. Chinese character words |
| 8 | 타입 | type | One of: Inflect (활용); Compound (복합명사); or Preanalysis (기분석) |
| 9 | 첫번째 품사 | first part-of-speech | e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return VV |
| 10 | 마지막 품사 | last part-of-speech | e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return EP |
| 11 | 표현 | expression | 활용, 복합명사, 기분석이 어떻게 구성되는지 알려주는 필드 (a field describing how an inflection, compound noun, or pre-analysis is composed) |
| 12 | - | - | Columns from index 12 onward can be freely extended. |
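The ko-dic dictionary format states that blank values default to `*`. Assuming the same convention is wanted when preparing detailed user dictionary rows, the sketch below fills empty columns before a row is written. The helper `fill_blanks` and the context IDs and cost in the sample row are hypothetical.

```python
def fill_blanks(columns, placeholder="*"):
    """Replace empty or whitespace-only columns with the placeholder."""
    return [value if value.strip() else placeholder for value in columns]


# Hypothetical 12-column row; the context IDs and cost are placeholder values.
row = ["대학교", "1780", "3534", "2000", "NNG", "", "F", "대학교", "", "", "", ""]
filled = fill_blanks(row)
print(",".join(filled))
```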
CC-CEDICT
This repository uses CC-CEDICT-MeCab.
CC-CEDICT dictionary format
Refer to the manual for details on the CC-CEDICT-MeCab dictionary format and part-of-speech tags.
| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | Major POS classification | |
| 5 | 词类1 | Middle POS classification | |
| 6 | 词类2 | Small POS classification | |
| 7 | 词类3 | Fine POS classification | |
| 8 | 拼音 | Pinyin | |
| 9 | 繁体字 | Traditional | |
| 10 | 简体字 | Simplified | |
| 11 | 定义 | Definition | |
CC-CEDICT user dictionary format (CSV)
CC-CEDICT user dictionary simple version
| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 词类 | Major POS classification | |
| 2 | 拼音 | Pinyin | |
CC-CEDICT user dictionary detailed version
| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | POS | |
| 5 | 词类1 | POS subcategory 1 | |
| 6 | 词类2 | POS subcategory 2 | |
| 7 | 词类3 | POS subcategory 3 | |
| 8 | 拼音 | Pinyin | |
| 9 | 繁体字 | Traditional | |
| 10 | 简体字 | Simplified | |
| 11 | 定义 | Definition | |
| 12 | - | - | Columns from index 12 onward can be freely extended. |
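Definitions may contain commas, which would shift every later column if a row were joined naively. The sketch below uses Python's standard `csv` module to quote such fields and checks that the row reads back with 12 columns. It assumes the consuming CSV reader honours standard double-quote quoting. The sample values, including the zero context IDs and cost, are placeholders.

```python
import csv
import io


def to_csv_line(columns):
    """Serialise one row, quoting any field that contains a comma or quote."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(columns)
    return buffer.getvalue()


def from_csv_line(line):
    """Parse a single CSV line back into its columns."""
    return next(csv.reader(io.StringIO(line)))


# Hypothetical detailed row; the definition deliberately contains a comma.
row = ["你好", "0", "0", "0", "*", "*", "*", "*",
       "ni3 hao3", "你好", "你好", "hello, hi"]
line = to_csv_line(row)
parsed = from_csv_line(line)
print(len(parsed))
```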
Jieba
This repository uses mecab-jieba.
Jieba dictionary format
| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | Part-of-speech | |
| 5 | 拼音 | Pinyin | |
| 6 | 繁体字 | Traditional | |
| 7 | 简体字 | Simplified | |
| 8 | 定义 | Definition | |
Jieba user dictionary format (CSV)
Jieba user dictionary simple version
| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 词类 | Part-of-speech | |
| 2 | 拼音 | Pinyin | |
Jieba user dictionary detailed version
| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | Part-of-speech | |
| 5 | 拼音 | Pinyin | |
| 6 | 繁体字 | Traditional | |
| 7 | 简体字 | Simplified | |
| 8 | 定义 | Definition | |
| 9 | - | - | Columns from index 9 onward can be freely extended. |
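Across all the dictionaries above, a simple user dictionary row has three columns, while a detailed row has at least as many columns as its table defines. The following sketch tells the two apart by column count. The dictionary labels, the helper `classify_row`, and the rule itself are assumptions for illustration, so Lindera's own loader may decide differently.

```python
# Number of defined columns in each detailed user dictionary table above.
# The keys are illustrative labels, not identifiers taken from Lindera.
DETAILED_COLUMNS = {
    "ipadic": 13,
    "ipadic-neologd": 13,
    "unidic": 21,
    "ko-dic": 12,
    "cc-cedict": 12,
    "jieba": 9,
}
SIMPLE_COLUMNS = 3


def classify_row(dictionary, columns):
    """Return 'simple' or 'detailed' based on the number of columns."""
    count = len(columns)
    if count == SIMPLE_COLUMNS:
        return "simple"
    if count >= DETAILED_COLUMNS[dictionary]:
        return "detailed"
    raise ValueError(f"{dictionary}: unexpected column count {count}")


print(classify_row("jieba", ["surface", "pos", "pinyin"]))
print(classify_row("unidic", ["*"] * 21))
```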
API reference
The API reference is available at the following URL: