Rust bindings for the Kiwi C API.
This crate provides a high-level API that is convenient for day-to-day use, while still exposing lower-level handles for advanced scenarios.
§Quick Start
use kiwi_rs::Kiwi;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let kiwi = Kiwi::init()?;
    let tokens = kiwi.tokenize("아버지가방에들어가신다.")?;
    for token in tokens {
        println!("{}/{}", token.form, token.tag);
    }
    Ok(())
}

§Initialization Paths
kiwi-rs supports two common initialization modes:
- Automatic bootstrap via Kiwi::init
  - Uses local paths first.
  - If unavailable, downloads a matching Kiwi library/model pair into cache.
- Explicit setup via Kiwi::from_config
  - For controlled deployments with fixed library/model paths.

use kiwi_rs::{Kiwi, KiwiConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = KiwiConfig::default()
        .with_library_path("/path/to/libkiwi.dylib")
        .with_model_path("/path/to/models/cong/base");
    let kiwi = Kiwi::from_config(config)?;
    let _tokens = kiwi.tokenize("형태소 분석 예시")?;
    Ok(())
}

§Offset And Unit Rules
- For UTF-8 APIs, offsets are character indices (based on str.chars()), not byte indices.
- UTF-16 APIs accept &[u16], but returned text in this crate is converted back to Rust UTF-8 String.
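
Because offsets are character indices, they cannot be used directly to slice a Rust &str, which is indexed by bytes. The following is a minimal sketch, using only the standard library, of how a caller might convert character offsets to byte offsets and prepare UTF-16 input. The helper names char_to_byte_index, slice_by_chars, and to_utf16 are hypothetical and are not part of kiwi-rs; the half-open (start, end) range is also an assumption of this example.

```rust
/// Converts a character index (as counted by `str::chars()`) into a byte
/// index usable for slicing. An index equal to the character count maps to
/// `text.len()`; anything past that returns `None`.
fn char_to_byte_index(text: &str, char_index: usize) -> Option<usize> {
    text.char_indices()
        .map(|(byte_index, _)| byte_index)
        .chain(std::iter::once(text.len()))
        .nth(char_index)
}

/// Slices `text` by a half-open range of character offsets.
/// Returns `None` if either offset is out of range or `start > end`.
fn slice_by_chars(text: &str, start: usize, end: usize) -> Option<&str> {
    let begin = char_to_byte_index(text, start)?;
    let finish = char_to_byte_index(text, end)?;
    text.get(begin..finish)
}

/// Encodes text as UTF-16 code units, the `&[u16]` form that the
/// UTF-16 APIs accept.
fn to_utf16(text: &str) -> Vec<u16> {
    text.encode_utf16().collect()
}
```

For Korean text the difference matters: each precomposed Hangul syllable is one character but three bytes in UTF-8, so treating a character offset as a byte offset would either panic on a non-boundary slice or select the wrong text.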
§Environment Variables
- KIWI_LIBRARY_PATH: explicit dynamic library path.
- KIWI_MODEL_PATH: explicit model directory path.
- KIWI_RS_VERSION: version used by Kiwi::init bootstrap (latest by default).
- KIWI_RS_CACHE_DIR: cache root used by Kiwi::init bootstrap.
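
To inspect which of these variables would take effect in a given deployment, an application can read them itself before calling Kiwi::init. The sketch below is not the crate's bootstrap logic. The EnvSettings struct and both functions are hypothetical names introduced for illustration, and treating an empty value as unset is an assumption of this example. Only the variable names and the latest default come from the list above.

```rust
use std::path::PathBuf;

/// Settings gathered from the environment variables listed above.
/// Hypothetical type for this sketch; not part of kiwi-rs.
#[derive(Debug, PartialEq)]
struct EnvSettings {
    library_path: Option<PathBuf>,
    model_path: Option<PathBuf>,
    version: String,
    cache_dir: Option<PathBuf>,
}

/// Reads the four variables through `lookup`, so the logic can be
/// exercised without mutating the real process environment.
fn read_env_settings(lookup: impl Fn(&str) -> Option<String>) -> EnvSettings {
    // Assumption of this sketch: an empty value counts as unset.
    let get = |key: &str| lookup(key).filter(|value| !value.is_empty());
    EnvSettings {
        library_path: get("KIWI_LIBRARY_PATH").map(PathBuf::from),
        model_path: get("KIWI_MODEL_PATH").map(PathBuf::from),
        // The documented default version is `latest`.
        version: get("KIWI_RS_VERSION").unwrap_or_else(|| "latest".to_string()),
        cache_dir: get("KIWI_RS_CACHE_DIR").map(PathBuf::from),
    }
}

/// Convenience wrapper that reads from the real process environment.
fn read_env_settings_from_process() -> EnvSettings {
    read_env_settings(|key| std::env::var(key).ok())
}
```

Passing the lookup as a closure keeps the function easy to test and avoids calling std::env::set_var, which is not safe to use in multi-threaded programs.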
Structs§
- AnalysisCandidate: One analysis candidate, including probability and token list.
- AnalyzeOptions: Options for analyze* and tokenize* APIs.
- BuilderConfig: Builder-time configuration for constructing a crate::Kiwi instance.
- ExtractedWord: Candidate extracted word from extract_words* builder APIs.
- GlobalConfig: Global runtime parameters for Kiwi inference behavior.
- Kiwi: High-level Kiwi analyzer instance.
- KiwiBuilder: Builder used to configure dictionaries/rules and then construct Kiwi.
- KiwiConfig: Top-level configuration used by crate::Kiwi::from_config.
- KiwiLibrary: Handle to a loaded Kiwi dynamic library plus resolved function table.
- KiwiTypo: Typo model/preset handle used when building Kiwi.
- MorphemeInfo: Morpheme metadata from dictionary lookup APIs.
- MorphemeSense: Morpheme information with resolved string fields.
- MorphemeSet: Morpheme id set used as a blocklist in analysis/tokenization APIs.
- PreAnalyzedToken: Pre-analyzed token element passed to crate::KiwiBuilder::add_pre_analyzed_word.
- PreparedJoinMorphs: Reusable join input for high-throughput join calls.
- PreparedJoiner: Reusable joiner handle bound to a specific morph sequence.
- Pretokenized: Container for user-supplied token spans used during analysis overrides.
- Sentence: Sentence split result used by split_into_sents*_with_options.
- SentenceBoundary: Begin/end boundary for a sentence in character offsets.
- SimilarityPair: (id, score) pair returned by similarity and prediction APIs.
- SwTokenizer: Subword tokenizer model handle opened from Kiwi-compatible tokenizer files.
- Token: A single morpheme token produced by Kiwi analysis.
- TokenInfo: Low-level token metadata returned by Kiwi C API.
- UserWord: A user dictionary entry consumed by crate::KiwiBuilder::add_user_words.
Enums§
- KiwiError: Error type returned by kiwi-rs public APIs.
Constants§
- KIWI_BUILD_DEFAULT: Default build option mask.
- KIWI_BUILD_DEFAULT_WITH_CONG: Default build options with CoNg model.
- KIWI_BUILD_INTEGRATE_ALLOMORPH: Build option: integrate allomorph variants.
- KIWI_BUILD_LOAD_DEFAULT_DICT: Build option: load bundled default dictionary.
- KIWI_BUILD_LOAD_MULTI_DICT: Build option: load multi-word dictionary.
- KIWI_BUILD_LOAD_TYPO_DICT: Build option: load typo dictionary.
- KIWI_BUILD_MODEL_TYPE_CONG: Build option: CoNg model type.
- KIWI_BUILD_MODEL_TYPE_CONG_GLOBAL: Build option: global CoNg model type.
- KIWI_BUILD_MODEL_TYPE_DEFAULT: Build option: default model type.
- KIWI_BUILD_MODEL_TYPE_KNLM: Build option: KNLM model type.
- KIWI_BUILD_MODEL_TYPE_LARGEST: Build option: largest model type.
- KIWI_BUILD_MODEL_TYPE_SBG: Build option: SBG model type.
- KIWI_DIALECT_ALL: Dialect mask containing all supported dialect flags.
- KIWI_DIALECT_ARCHAIC: Dialect flag: archaic expressions.
- KIWI_DIALECT_CHUNGCHEONG: Dialect flag: Chungcheong.
- KIWI_DIALECT_GANGWON: Dialect flag: Gangwon.
- KIWI_DIALECT_GYEONGGI: Dialect flag: Gyeonggi.
- KIWI_DIALECT_GYEONGSANG: Dialect flag: Gyeongsang.
- KIWI_DIALECT_HAMGYEONG: Dialect flag: Hamgyeong.
- KIWI_DIALECT_HWANGHAE: Dialect flag: Hwanghae.
- KIWI_DIALECT_JEJU: Dialect flag: Jeju.
- KIWI_DIALECT_JEOLLA: Dialect flag: Jeolla.
- KIWI_DIALECT_PYEONGAN: Dialect flag: Pyeongan.
- KIWI_DIALECT_STANDARD: Dialect mask: standard language only.
- KIWI_MATCH_ALL: Common default match options.
- KIWI_MATCH_ALL_WITH_NORMALIZING: KIWI_MATCH_ALL with coda normalization.
- KIWI_MATCH_COMPATIBLE_JAMO: Match option: emit compatible jamo.
- KIWI_MATCH_EMAIL: Match option: email detection.
- KIWI_MATCH_HASHTAG: Match option: hashtag detection.
- KIWI_MATCH_JOIN_ADJ_SUFFIX: Match option: join adjective suffixes.
- KIWI_MATCH_JOIN_ADV_SUFFIX: Match option: join adverb suffixes.
- KIWI_MATCH_JOIN_AFFIX: Match option convenience mask for all affix-join flags.
- KIWI_MATCH_JOIN_NOUN_PREFIX: Match option: join noun prefixes.
- KIWI_MATCH_JOIN_NOUN_SUFFIX: Match option: join noun suffixes.
- KIWI_MATCH_JOIN_VERB_SUFFIX: Match option: join verb suffixes.
- KIWI_MATCH_JOIN_V_SUFFIX: Match option convenience mask for verb/adjective suffix joins.
- KIWI_MATCH_MENTION: Match option: mention detection.
- KIWI_MATCH_MERGE_SAISIOT: Match option: merge saisiot.
- KIWI_MATCH_NORMALIZE_CODA: Match option: normalize coda.
- KIWI_MATCH_SERIAL: Match option: serial number detection.
- KIWI_MATCH_SPLIT_COMPLEX: Match option: split complex morphemes.
- KIWI_MATCH_SPLIT_SAISIOT: Match option: split saisiot.
- KIWI_MATCH_URL: Match option: URL detection.
- KIWI_MATCH_Z_CODA: Match option: z-coda handling.
- KIWI_NUM_THREADS: Option key for setting worker threads through set_option.
- KIWI_TYPO_BASIC_TYPO_SET: Typo preset: basic typo set.
- KIWI_TYPO_BASIC_TYPO_SET_WITH_CONTINUAL: Typo preset: basic + continual typo sets.
- KIWI_TYPO_BASIC_TYPO_SET_WITH_CONTINUAL_AND_LENGTHENING: Typo preset: basic + continual + lengthening typo sets.
- KIWI_TYPO_CONTINUAL_TYPO_SET: Typo preset: continual typo set.
- KIWI_TYPO_LENGTHENING_TYPO_SET: Typo preset: lengthening typo set.
- KIWI_TYPO_WITHOUT_TYPO: Typo preset: disable typo correction.
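
Several of the constants above are described as flags and masks, which suggests they are meant to be combined and tested with bitwise operators. The sketch below illustrates that general pattern only. The DEMO_ constants, their numeric values, and the u32 type are placeholders invented for this example; the real constants exported by kiwi-rs may use different values and a different integer type, so consult each constant's own page before relying on them.

```rust
// Placeholder flag values for illustration only; these are NOT the
// values of the corresponding KIWI_MATCH_* constants.
const DEMO_MATCH_URL: u32 = 1 << 0;
const DEMO_MATCH_EMAIL: u32 = 1 << 1;
const DEMO_MATCH_HASHTAG: u32 = 1 << 2;
const DEMO_MATCH_MENTION: u32 = 1 << 3;

/// A convenience mask built by OR-ing individual flags, in the same
/// spirit as the documented convenience masks.
const DEMO_MATCH_WEB: u32 = DEMO_MATCH_URL | DEMO_MATCH_EMAIL;

/// Returns true if every bit of `flag` is set in `mask`.
fn has_flag(mask: u32, flag: u32) -> bool {
    (mask & flag) == flag
}

/// Returns `mask` with the bits of `flag` switched on.
fn with_flag(mask: u32, flag: u32) -> u32 {
    mask | flag
}

/// Returns `mask` with the bits of `flag` switched off.
fn without_flag(mask: u32, flag: u32) -> u32 {
    mask & !flag
}
```

Starting from a convenience mask and removing one flag with without_flag is usually less error-prone than listing every remaining flag by hand.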