§opencc-jieba-rs
opencc-jieba-rs is a high-performance Rust library for Chinese text conversion,
segmentation, and keyword extraction. It integrates Jieba for word segmentation
and a multi-stage OpenCC-style dictionary system for converting between different Chinese variants.
§Features
- Simplified ↔ Traditional Chinese conversion (including Taiwan, Hong Kong, Japanese variants)
- Multi-pass dictionary-based phrase replacement
- Fast and accurate word segmentation using Jieba
- Keyword extraction using TF-IDF or TextRank
- Optional punctuation conversion (e.g., 「」 ↔ “”)
§Example
```rust
use opencc_jieba_rs::OpenCC;

let opencc = OpenCC::new();
let s = opencc.s2t("“春眠不觉晓,处处闻啼鸟。”", true);
println!("{}", s); // -> "「春眠不覺曉,處處聞啼鳥。」"
```
§Use Cases
- Text normalization for NLP and search engines
- Cross-regional Chinese content adaptation
- Automatic subtitle or document localization
§Crate Status
- 🚀 Fast and parallelized
- 🧪 Battle-tested on multi-million character corpora
- 📦 Ready for crates.io and docs.rs publication
§Conversion Overview (OpenCC + Jieba)
opencc_jieba_rs::OpenCC provides a set of high-level helpers that mirror
common OpenCC configurations, built on top of:
- OpenCC dictionaries (character / phrase mappings)
- Jieba segmentation for phrase-level matching
- Optional punctuation conversion
All methods take &self and &str input and return a newly allocated
String.
§Quick Start
```rust
let opencc = opencc_jieba_rs::OpenCC::new();
let s = "这里进行着“汉字转换”测试。";
let t = opencc.s2t(s, false); // Simplified → Traditional (phrase-level)
let tw = opencc.t2tw(&t);     // Traditional → Taiwan Traditional
```
§Phrase-Level vs Character-Level
There are two main categories of conversion:
- Phrase-level conversions use Jieba segmentation and multiple dictionaries to correctly handle idioms, multi-character words, and regional preferences.
- Character-level conversions use only character-variant dictionaries (no segmentation), ideal for high-speed normalization where phrase context is unimportant.
§Core Simplified ↔ Traditional
| Direction | Method | Level | Notes |
|---|---|---|---|
| S → T | OpenCC::s2t | Phrase | Standard Simplified → Traditional. |
| T → S | OpenCC::t2s | Phrase | Standard Traditional → Simplified. |
| S → T | st | Character | Fast char-only S→T (no segmentation). |
| T → S | ts | Character | Fast char-only T→S (no segmentation). |
§s2t / t2s
- Use phrase dictionaries + Jieba segmentation.
- Preserve idioms and phrase-level semantics where possible.
- Recommended for user-facing text conversion.
§st / ts
- Use only the st_characters / ts_characters dictionaries.
- Do not segment or match phrases.
- Ideal for:
  - bulk normalization
  - preprocessing before heavier conversions
§Taiwan Traditional (Tw)
| Direction | Method | Description |
|---|---|---|
| T → Tw | OpenCC::t2tw | Standard Traditional → Taiwan variants. |
| T → Tw (phr.) | OpenCC::t2twp | T→Tw with an extra phrase refinement round. |
| Tw → T | OpenCC::tw2t | Taiwan variants → Standard Traditional. |
| Tw → T (phr.) | OpenCC::tw2tp | Tw→T with additional reverse phrase normalization. |
- t2tw uses tw_variants for Taiwan-specific character/word forms.
- t2twp performs two rounds: phrases first (tw_phrases), then variants (tw_variants).
- tw2t and tw2tp are reverse directions, using *_rev dictionaries to normalize back to standard Traditional.
§Hong Kong Traditional (HK)
| Direction | Method | Description |
|---|---|---|
| T → HK | OpenCC::t2hk | Standard Traditional → Hong Kong Traditional. |
| HK → T | OpenCC::hk2t | Hong Kong Traditional → Standard Traditional. |
- t2hk applies hk_variants (HK-specific variants and preferences).
- hk2t uses hk_variants_rev_phrases + hk_variants_rev to normalize back to standard Traditional.
§Japanese Kanji (Shinjitai / Kyūjitai)
| Direction | Method | Description |
|---|---|---|
| T → JP | OpenCC::t2jp | Traditional → Japanese Shinjitai-like variants (Kanji). |
| JP → T | OpenCC::jp2t | Japanese Shinjitai → Traditional (Kyūjitai-style) mapping. |
- t2jp uses jp_variants to map Traditional forms to standard Japanese Shinjitai (e.g. 體 → 体, 圖 → 図 where applicable).
- jp2t combines jps_phrases, jps_characters, and jp_variants_rev to reverse these mappings back to Traditional Chinese.
§Punctuation and Symbols
Most high-level methods enable punctuation conversion by default, using OpenCC’s punctuation dictionaries to normalize:
- Chinese-style quotes / brackets
- Full-width / half-width punctuation
Lower-level helpers inside this crate may expose more granular control if you need to:
- disable punctuation conversion
- run custom dictionary pipelines
- integrate with your own segmentation logic
§When to Use What?
- Use s2t / t2s for general-purpose Simplified/Traditional conversion.
- Use t2tw / t2twp / tw2t / tw2tp when targeting Taiwan content or normalizing it.
- Use t2hk / hk2t for Hong Kong–specific localized text.
- Use t2jp / jp2t for interoperability with Japanese Kanji forms, when only character-shape conversion is desired (not full translation).
- Use st / ts when you need fast, character-only normalization with minimal overhead.
For segmentation-only or keyword extraction APIs, see:
- OpenCC::jieba_cut — Jieba segmentation (accurate mode)
- OpenCC::jieba_cut_for_search — Jieba segmentation optimized for search indexing
- OpenCC::jieba_cut_all — Jieba full segmentation mode
- OpenCC::keyword_extract_textrank — keyword extraction using TextRank
- OpenCC::keyword_extract_tfidf — keyword extraction using TF-IDF
These utilities can be used independently of Chinese variant conversion,
or combined with OpenCC::convert results for downstream NLP tasks such
as indexing, text analysis, and keyword extraction.
Modules§
Structs§
- Keyword — Keyword with weight.
- OpenCC — The main struct for performing Chinese text conversion and segmentation.
Enums§
- KeywordMethod — Keyword extraction algorithm.
- OpenccConfig — OpenCC conversion configuration (strongly-typed).
Constants§
- POS_KEYWORDS — Recommended part-of-speech (POS) tags for keyword extraction.
Functions§
- find_max_utf8_length — Returns the maximum valid UTF-8 byte length for a string slice, ensuring no partial characters.
- is_delimiter — Tests whether a character is a delimiter.