
Crate kiwi_rs


Rust bindings for the Kiwi C API (Kiwi is a Korean morphological analyzer).

This crate provides a high-level API that is convenient for day-to-day use, while still exposing lower-level handles for advanced scenarios.

§Quick Start

use kiwi_rs::Kiwi;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let kiwi = Kiwi::init()?;
    let tokens = kiwi.tokenize("아버지가방에들어가신다.")?;
    for token in tokens {
        println!("{}/{}", token.form, token.tag);
    }
    Ok(())
}

§Initialization Paths

kiwi-rs supports two common initialization modes:

  1. Automatic bootstrap via Kiwi::init
     • Uses local paths first.
     • If none are available, downloads a matching Kiwi library/model pair into the cache.
  2. Explicit setup via Kiwi::from_config
     • For controlled deployments with fixed library/model paths.
use kiwi_rs::{Kiwi, KiwiConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = KiwiConfig::default()
        .with_library_path("/path/to/libkiwi.dylib")
        .with_model_path("/path/to/models/cong/base");
    let kiwi = Kiwi::from_config(config)?;
    let _tokens = kiwi.tokenize("형태소 분석 예시")?;
    Ok(())
}

§Offset And Unit Rules

  • For UTF-8 APIs, offsets are character indices (based on str.chars()), not byte indices.
  • UTF-16 APIs accept &[u16] input, but any text they return is converted back to a Rust UTF-8 String.
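Because offsets count Unicode scalar values (what str.chars() yields) rather than bytes, using them directly as &str slice bounds can panic on Korean text, where each Hangul syllable occupies three bytes in UTF-8. The sketch below converts character offsets to byte offsets before slicing. It uses only the standard library; the helper names are hypothetical, and it assumes an end-exclusive [begin, end) range, which this page does not confirm for SentenceBoundary or token positions.

```rust
/// Converts a character offset (counted like str::chars()) into a byte offset.
/// An offset equal to the character count maps to text.len(); larger ones give None.
fn char_to_byte_offset(text: &str, char_offset: usize) -> Option<usize> {
    text.char_indices()
        .map(|(byte_idx, _)| byte_idx)
        .chain(std::iter::once(text.len()))
        .nth(char_offset)
}

/// Slices text by a [begin, end) range given in character offsets.
/// Treating `end` as exclusive is an assumption of this sketch.
fn slice_by_chars(text: &str, begin: usize, end: usize) -> Option<&str> {
    let start = char_to_byte_offset(text, begin)?;
    let stop = char_to_byte_offset(text, end)?;
    // str::get returns None instead of panicking if start > stop.
    text.get(start..stop)
}

fn main() {
    let text = "아버지가방에들어가신다.";
    // 11 Hangul syllables (3 bytes each in UTF-8) plus one ASCII period.
    println!("chars = {}, bytes = {}", text.chars().count(), text.len());
    // Character offsets 0..3 cover the first three syllables, i.e. bytes 0..9.
    println!("{:?}", slice_by_chars(text, 0, 3));
}
```

Converting through char_indices keeps every computed bound on a character boundary, so the final str::get call can only fail for out-of-range or inverted offsets.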

§Environment Variables

  • KIWI_LIBRARY_PATH: explicit dynamic library path.
  • KIWI_MODEL_PATH: explicit model directory path.
  • KIWI_RS_VERSION: Kiwi version used by the Kiwi::init bootstrap (latest by default).
  • KIWI_RS_CACHE_DIR: cache root directory used by the Kiwi::init bootstrap.
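The environment variables above feed the Kiwi::init bootstrap. The following standard-library sketch shows one plausible resolution order (explicit variable, then local candidates, then a cached download) only to illustrate how the variables interact. The order, the candidate path, the per-version cache layout, and the temp-directory fallback are all assumptions made for this example, not kiwi-rs's actual logic.

```rust
use std::path::{Path, PathBuf};

/// Where a Kiwi artifact (library or model) would come from in this sketch.
#[derive(Debug, PartialEq)]
enum Source {
    /// Taken from an explicit variable such as KIWI_LIBRARY_PATH.
    Env(PathBuf),
    /// Found at one of the local candidate paths.
    Local(PathBuf),
    /// Nothing found; a bootstrap would download into this cache directory.
    Download { cache_dir: PathBuf },
}

/// Hypothetical resolution order: a non-empty env value wins (it is not
/// checked for existence here), then the first existing local candidate,
/// then a per-version directory under the cache root.
fn resolve(env_value: Option<&str>, local_candidates: &[&Path], cache_root: &Path, version: &str) -> Source {
    if let Some(v) = env_value.filter(|v| !v.is_empty()) {
        return Source::Env(PathBuf::from(v));
    }
    for candidate in local_candidates {
        if candidate.exists() {
            return Source::Local(candidate.to_path_buf());
        }
    }
    Source::Download { cache_dir: cache_root.join(version) }
}

fn main() {
    let env_lib = std::env::var("KIWI_LIBRARY_PATH").ok();
    let version = std::env::var("KIWI_RS_VERSION").unwrap_or_else(|_| "latest".to_string());
    // Hypothetical fallback cache root when KIWI_RS_CACHE_DIR is unset.
    let cache_root = std::env::var("KIWI_RS_CACHE_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| std::env::temp_dir().join("kiwi-rs"));
    // Hypothetical local candidate path.
    let candidates = [Path::new("/usr/local/lib/libkiwi.so")];
    let source = resolve(env_lib.as_deref(), &candidates, &cache_root, &version);
    println!("{source:?}");
}
```

Keeping the decision in a pure function that receives the variable values as arguments, rather than reading the environment inside it, makes the precedence easy to test without mutating process-wide state.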

Structs§

AnalysisCandidate
One analysis candidate, including probability and token list.
AnalyzeOptions
Options for analyze* and tokenize* APIs.
BuilderConfig
Builder-time configuration for constructing a crate::Kiwi instance.
ExtractedWord
A candidate word extracted by the extract_words* builder APIs.
GlobalConfig
Global runtime parameters for Kiwi inference behavior.
Kiwi
High-level Kiwi analyzer instance.
KiwiBuilder
Builder used to configure dictionaries/rules and then construct Kiwi.
KiwiConfig
Top-level configuration used by crate::Kiwi::from_config.
KiwiLibrary
Handle to a loaded Kiwi dynamic library plus resolved function table.
KiwiTypo
Typo model/preset handle used when building Kiwi.
MorphemeInfo
Morpheme metadata from dictionary lookup APIs.
MorphemeSense
Morpheme information with resolved string fields.
MorphemeSet
Morpheme id set used as a blocklist in analysis/tokenization APIs.
PreAnalyzedToken
Pre-analyzed token element passed to crate::KiwiBuilder::add_pre_analyzed_word.
PreparedJoinMorphs
Reusable join input for high-throughput join calls.
PreparedJoiner
Reusable joiner handle bound to a specific morph sequence.
Pretokenized
Container for user-supplied token spans used during analysis overrides.
Sentence
Sentence split result used by split_into_sents*_with_options.
SentenceBoundary
Begin/end boundary for a sentence in character offsets.
SimilarityPair
(id, score) pair returned by similarity and prediction APIs.
SwTokenizer
Subword tokenizer model handle opened from Kiwi-compatible tokenizer files.
Token
A single morpheme token produced by Kiwi analysis.
TokenInfo
Low-level token metadata returned by Kiwi C API.
UserWord
A user dictionary entry consumed by crate::KiwiBuilder::add_user_words.

Enums§

KiwiError
Error type returned by kiwi-rs public APIs.

Constants§

KIWI_BUILD_DEFAULT
Default build option mask.
KIWI_BUILD_DEFAULT_WITH_CONG
Default build options with CoNg model.
KIWI_BUILD_INTEGRATE_ALLOMORPH
Build option: integrate allomorph variants.
KIWI_BUILD_LOAD_DEFAULT_DICT
Build option: load bundled default dictionary.
KIWI_BUILD_LOAD_MULTI_DICT
Build option: load multi-word dictionary.
KIWI_BUILD_LOAD_TYPO_DICT
Build option: load typo dictionary.
KIWI_BUILD_MODEL_TYPE_CONG
Build option: CoNg model type.
KIWI_BUILD_MODEL_TYPE_CONG_GLOBAL
Build option: global CoNg model type.
KIWI_BUILD_MODEL_TYPE_DEFAULT
Build option: default model type.
KIWI_BUILD_MODEL_TYPE_KNLM
Build option: KNLM model type.
KIWI_BUILD_MODEL_TYPE_LARGEST
Build option: largest model type.
KIWI_BUILD_MODEL_TYPE_SBG
Build option: SBG model type.
KIWI_DIALECT_ALL
Dialect mask containing all supported dialect flags.
KIWI_DIALECT_ARCHAIC
Dialect flag: archaic expressions.
KIWI_DIALECT_CHUNGCHEONG
Dialect flag: Chungcheong.
KIWI_DIALECT_GANGWON
Dialect flag: Gangwon.
KIWI_DIALECT_GYEONGGI
Dialect flag: Gyeonggi.
KIWI_DIALECT_GYEONGSANG
Dialect flag: Gyeongsang.
KIWI_DIALECT_HAMGYEONG
Dialect flag: Hamgyeong.
KIWI_DIALECT_HWANGHAE
Dialect flag: Hwanghae.
KIWI_DIALECT_JEJU
Dialect flag: Jeju.
KIWI_DIALECT_JEOLLA
Dialect flag: Jeolla.
KIWI_DIALECT_PYEONGAN
Dialect flag: Pyeongan.
KIWI_DIALECT_STANDARD
Dialect mask: standard language only.
KIWI_MATCH_ALL
Common default match options.
KIWI_MATCH_ALL_WITH_NORMALIZING
KIWI_MATCH_ALL with coda normalization.
KIWI_MATCH_COMPATIBLE_JAMO
Match option: emit compatible jamo.
KIWI_MATCH_EMAIL
Match option: email detection.
KIWI_MATCH_HASHTAG
Match option: hashtag detection.
KIWI_MATCH_JOIN_ADJ_SUFFIX
Match option: join adjective suffixes.
KIWI_MATCH_JOIN_ADV_SUFFIX
Match option: join adverb suffixes.
KIWI_MATCH_JOIN_AFFIX
Match option: convenience mask combining all affix-join flags.
KIWI_MATCH_JOIN_NOUN_PREFIX
Match option: join noun prefixes.
KIWI_MATCH_JOIN_NOUN_SUFFIX
Match option: join noun suffixes.
KIWI_MATCH_JOIN_VERB_SUFFIX
Match option: join verb suffixes.
KIWI_MATCH_JOIN_V_SUFFIX
Match option: convenience mask for verb and adjective suffix joins.
KIWI_MATCH_MENTION
Match option: mention detection.
KIWI_MATCH_MERGE_SAISIOT
Match option: merge saisiot.
KIWI_MATCH_NORMALIZE_CODA
Match option: normalize coda.
KIWI_MATCH_SERIAL
Match option: serial number detection.
KIWI_MATCH_SPLIT_COMPLEX
Match option: split complex morphemes.
KIWI_MATCH_SPLIT_SAISIOT
Match option: split saisiot.
KIWI_MATCH_URL
Match option: URL detection.
KIWI_MATCH_Z_CODA
Match option: z-coda handling.
KIWI_NUM_THREADS
Option key for setting worker threads through set_option.
KIWI_TYPO_BASIC_TYPO_SET
Typo preset: basic typo set.
KIWI_TYPO_BASIC_TYPO_SET_WITH_CONTINUAL
Typo preset: basic + continual typo sets.
KIWI_TYPO_BASIC_TYPO_SET_WITH_CONTINUAL_AND_LENGTHENING
Typo preset: basic + continual + lengthening typo sets.
KIWI_TYPO_CONTINUAL_TYPO_SET
Typo preset: continual typo set.
KIWI_TYPO_LENGTHENING_TYPO_SET
Typo preset: lengthening typo set.
KIWI_TYPO_WITHOUT_TYPO
Typo preset: disable typo correction.
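The build, dialect, and match constants above are flags meant to be combined into a single mask, and several of them (KIWI_MATCH_JOIN_V_SUFFIX, KIWI_MATCH_JOIN_AFFIX, KIWI_DIALECT_ALL) are themselves unions of other flags. The sketch below shows the usual bit operations (union with |, clearing with & !, and subset testing) using locally defined placeholder values. The real kiwi_rs constants have their own values and integer type, and the model-type constants may not behave as independent bits, so use the crate's constants rather than these numbers.

```rust
// Placeholder values for illustration only: these are NOT the real
// kiwi_rs constants, whose values and integer type may differ.
const LOAD_DEFAULT_DICT: u32 = 1 << 0;
const INTEGRATE_ALLOMORPH: u32 = 1 << 1;
const LOAD_TYPO_DICT: u32 = 1 << 2;

const JOIN_NOUN_PREFIX: u32 = 1 << 0;
const JOIN_NOUN_SUFFIX: u32 = 1 << 1;
const JOIN_VERB_SUFFIX: u32 = 1 << 2;
const JOIN_ADJ_SUFFIX: u32 = 1 << 3;
const JOIN_ADV_SUFFIX: u32 = 1 << 4;

// Composite masks are unions of individual flags, mirroring how the
// verb/adjective and all-affix convenience masks are described above.
const JOIN_V_SUFFIX: u32 = JOIN_VERB_SUFFIX | JOIN_ADJ_SUFFIX;
const JOIN_AFFIX: u32 = JOIN_NOUN_PREFIX | JOIN_NOUN_SUFFIX | JOIN_V_SUFFIX | JOIN_ADV_SUFFIX;

/// Returns true if every bit in `flags` is also set in `mask`.
fn has_all(mask: u32, flags: u32) -> bool {
    (mask & flags) == flags
}

/// Sets the bits in `add`, then clears the bits in `remove`.
fn adjust(mask: u32, add: u32, remove: u32) -> u32 {
    (mask | add) & !remove
}

fn main() {
    // Start from a default-like set and drop typo-dictionary loading.
    let build = adjust(LOAD_DEFAULT_DICT | INTEGRATE_ALLOMORPH | LOAD_TYPO_DICT, 0, LOAD_TYPO_DICT);
    assert!(has_all(build, LOAD_DEFAULT_DICT | INTEGRATE_ALLOMORPH));
    assert!(!has_all(build, LOAD_TYPO_DICT));

    // A composite mask implies each of its members.
    assert!(has_all(JOIN_AFFIX, JOIN_V_SUFFIX));
    println!("build mask = {build:#b}, affix mask = {JOIN_AFFIX:#b}");
}
```

Testing with (mask & flags) == flags rather than mask & flags != 0 matters for composite masks: the latter would report JOIN_AFFIX as enabled when only one of its member flags is set.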

Type Aliases§

Analysis
Alias kept for readability in user code.
Result
Convenience alias used throughout the crate.
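The Result alias presumably follows the common Rust pattern of fixing the error type to the crate's own error (here KiwiError), so public functions can return Result<T>. The sketch below reproduces that pattern with a stand-in KiwiError and hypothetical helper functions; the real alias's exact definition and KiwiError's variants are not shown on this page. Because the stand-in implements std::error::Error, ? converts it into Box<dyn std::error::Error>, which is the conversion the Quick Start's main signature relies on.

```rust
use std::fmt;

/// Stand-in for the crate's error enum; the real variants are not shown here.
#[derive(Debug)]
enum KiwiError {
    InvalidInput(String),
}

impl fmt::Display for KiwiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KiwiError::InvalidInput(s) => write!(f, "invalid input: {s:?}"),
        }
    }
}

// Implementing Error lets `?` convert KiwiError into Box<dyn Error>.
impl std::error::Error for KiwiError {}

/// Crate-local alias: callers write Result<T> instead of
/// std::result::Result<T, KiwiError>.
type Result<T> = std::result::Result<T, KiwiError>;

/// Hypothetical helper that fails with KiwiError on non-numeric input.
fn parse_count(s: &str) -> Result<usize> {
    s.trim().parse().map_err(|_| KiwiError::InvalidInput(s.to_string()))
}

/// Errors from either call propagate unchanged through `?`.
fn total(a: &str, b: &str) -> Result<usize> {
    Ok(parse_count(a)? + parse_count(b)?)
}

fn main() -> std::result::Result<(), Box<dyn std::error::Error>> {
    println!("{}", total("2", "3")?);
    Ok(())
}
```

A module-level alias named Result shadows the prelude's two-parameter Result, which is why main spells out std::result::Result here.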