kiwi-rs
한국어 README | kiwipiepy parity (EN) | kiwipiepy parity (KO)
Rust bindings for Kiwi via the official C API (include/kiwi/capi.h).
Current support status
As of February 16, 2026:
- C API symbol loading: complete (
101/101symbols incapi.hare loaded) - Core high-level usage: implemented (
init/new/from_config,analyze/tokenize/split/join,MorphemeSet,Pretokenized, typo APIs,SwTokenizer, CoNg APIs) - kiwipiepy full surface parity: partial (Python/C++-specific layers still missing)
Installation
[]
= "0.1"
Runtime setup options
Option 1: automatic bootstrap in code
Kiwi::init() tries local paths first, then downloads a matching release pair (library + model) into cache.
use Kiwi;
Environment variables used by bootstrap:
KIWI_RS_VERSION(default:latest, e.g.v0.22.2)KIWI_RS_CACHE_DIR(default: OS cache directory)
External commands required by bootstrap:
- Common:
curl,tar - Windows zip extraction:
powershell(Expand-Archive)
Option 2: helper installer scripts
Linux/macOS:
Windows (PowerShell):
cd kiwi-rs
powershell -NoProfile -ExecutionPolicy Bypass -File .\scripts\install_kiwi.ps1
Installer options:
KIWI_VERSION/-Version(default:latest)KIWI_PREFIX/-Prefix(default:$HOME/.local/kiwion Unix,%LOCALAPPDATA%\\kiwion Windows)KIWI_MODEL_VARIANT/-ModelVariant(default:base)
Manual path configuration
Env-based (Kiwi::new)
KIWI_LIBRARY_PATH: dynamic library pathKIWI_MODEL_PATH: model directory path
Config-based (Kiwi::from_config)
use ;
API overview
Core
- Initialization:
Kiwi::init,Kiwi::new,Kiwi::from_config,Kiwi::init_direct - Analyze/tokenize:
analyze*,tokenize*,analyze_many*,tokenize_many* - Sentence split:
split_into_sents*,split_into_sents_with_options* - Join/spacing:
join*,space*,glue*
Advanced
- Builder: user words, alias words, pre-analyzed words, dictionary loading, regex rules, extract APIs
- Constraints:
MorphemeSet,Pretokenized - Typo:
KiwiTypo, default typo sets, cost controls - Subword:
SwTokenizer - CoNg: similarity/context/prediction/context-id conversion
UTF-16 and optional API checks
Kiwi::supports_utf16_apiKiwi::supports_analyze_mwKiwiLibrary::supports_builder_init_stream
Examples
What each example is for:
| Example | What you learn | Key APIs | Notes |
|---|---|---|---|
basic |
End-to-end quick start (init + tokenize) | Kiwi::init, Kiwi::tokenize |
Demonstrates cache bootstrap behavior when assets are missing. |
analyze_options |
How candidate analysis options change output | AnalyzeOptions, Kiwi::analyze_with_options |
Shows top_n, match_options, and candidate probabilities. |
builder_custom_words |
Building a custom analyzer with user lexicon/rules | KiwiLibrary::builder, add_user_words, add_re_rule |
Uses builder-time customization APIs. |
typo_build |
Enabling typo-aware analysis | default_typo_set, build_with_typo_and_default_options |
Prints typo-related token metadata. |
blocklist_and_pretokenized |
Blocking specific morphemes and forcing token spans | new_morphset, new_pretokenized, tokenize_with_blocklist_and_pretokenized |
Useful for domain constraints and deterministic spans. |
split_sentences |
Sentence segmentation with per-sentence token/sub-sentence structures | split_into_sents_with_options |
Shows the Sentence return surface (text/start/end/tokens/subs). |
utf16_api |
UTF-16 analysis/tokenization/sentence split path | supports_utf16_api, analyze_utf16*, tokenize_utf16*, split_into_sents_utf16* |
Includes runtime feature check for UTF-16 support. |
native_batch |
Native callback-based batch analysis route | analyze_many_via_native, analyze_many_utf16_via_native |
Useful for higher-throughput multi-line processing. |
sw_tokenizer |
Subword tokenizer encode/decode flow | open_sw_tokenizer, encode_with_offsets, decode |
Requires tokenizer.json path argument. |
morpheme_semantics |
Morpheme ID lookup and CoNg semantic utilities | find_morphemes, morpheme, most_similar_morphemes, to_context_id |
Shows semantic APIs that operate on morpheme/context IDs. |
kiwipiepy parity
Detailed matrix:
- English:
docs/kiwipiepy_parity.md - Korean:
docs/kiwipiepy_parity.ko.md
In short, kiwi-rs already covers most C API-backed workflows, while Python/C++-specific layers (template/dataset/ngram utilities) remain out of scope for a pure C API binding.
Common errors
-
failed to load library- Library path is invalid or inaccessible. Set
KIWI_LIBRARY_PATHexplicitly or useKiwi::init().
- Library path is invalid or inaccessible. Set
-
Cannot open extract.mdl for WordDetector- Model path is wrong. Point
KIWI_MODEL_PATH(or config model path) to the directory containing model files.
- Model path is wrong. Point
-
reading type 'Ds' failed(iostream-style errors)- Library/model version mismatch. Use matching assets from the same Kiwi release tag.
Local quality checks
License
Same as Kiwi: LGPL v3.