§opencc-jieba-rs
opencc-jieba-rs is a high-performance Rust library for Chinese text conversion,
segmentation, and keyword extraction. It integrates Jieba for word segmentation
and a multi-stage OpenCC-style dictionary system for converting between different Chinese variants.
§Features
- Simplified ↔ Traditional Chinese conversion (including Taiwan, Hong Kong, Japanese variants)
- Multi-pass dictionary-based phrase replacement
- Fast and accurate word segmentation using Jieba
- Keyword extraction using TF-IDF or TextRank
- Optional punctuation conversion (e.g., 「」 ↔ “”)
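To make "multi-pass dictionary-based phrase replacement" concrete, here is a minimal, self-contained sketch of the general idea. It is not the crate's implementation: `Stage`, `apply`, and `convert` are hypothetical names, the dictionaries are tiny hand-written samples, and a greedy longest-match scan stands in for the Jieba segmentation that the library uses.

```rust
use std::collections::HashMap;

/// Hypothetical conversion stage: a phrase dictionary plus the length
/// (in characters) of its longest key.
struct Stage {
    dict: HashMap<String, String>,
    max_len: usize,
}

impl Stage {
    fn new(pairs: &[(&str, &str)]) -> Self {
        let dict: HashMap<String, String> = pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        let max_len = dict.keys().map(|k| k.chars().count()).max().unwrap_or(0);
        Stage { dict, max_len }
    }

    /// One pass: at each position, try the longest candidate phrase first
    /// and fall back to copying a single character when nothing matches.
    fn apply(&self, input: &str) -> String {
        let chars: Vec<char> = input.chars().collect();
        let mut out = String::with_capacity(input.len());
        let mut i = 0;
        while i < chars.len() {
            let longest = self.max_len.min(chars.len() - i);
            let mut matched = false;
            for len in (1..=longest).rev() {
                let candidate: String = chars[i..i + len].iter().collect();
                if let Some(replacement) = self.dict.get(&candidate) {
                    out.push_str(replacement);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if !matched {
                out.push(chars[i]);
                i += 1;
            }
        }
        out
    }
}

/// Runs the stages in order, feeding each stage's output into the next.
fn convert(stages: &[Stage], input: &str) -> String {
    stages
        .iter()
        .fold(input.to_string(), |text, stage| stage.apply(&text))
}
```

With a first stage mapping `软`→`軟`, `开`→`開`, `发`→`發`, `头发`→`頭髮` and a second stage mapping `軟件`→`軟體`, `convert` turns `软件开发` into `軟體開發`. The first pass changes the script and the second applies a regional phrase. Trying the longest phrase first is what lets `头发` become `頭髮` rather than falling through to the single-character rule for `发`.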
§Example
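use opencc_jieba_rs::OpenCC;
// Create a converter instance.
let opencc = OpenCC::new();
// Simplified -> Traditional; the second argument (`true`) also converts punctuation.
let s = opencc.s2t("“春眠不觉晓,处处闻啼鸟。”", true);
println!("{}", s); // -> "「春眠不覺曉,處處聞啼鳥。」"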
§Use Cases
- Text normalization for NLP and search engines
- Cross-regional Chinese content adaptation
- Automatic subtitle or document localization
§Crate Status
- 🚀 Fast and parallelized
- 🧪 Battle-tested on multi-million character corpora
- 📦 Ready for crates.io and docs.rs publication
Modules§
Structs§
- OpenCC
- The main struct for performing Chinese text conversion and segmentation.
Functions§
- find_max_utf8_length
- Returns the maximum valid UTF-8 byte length for a string slice, ensuring no partial characters.
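The idea of a "maximum valid UTF-8 byte length" can be illustrated with a short sketch. It assumes the caller supplies a byte budget and wants the largest prefix length that does not cut a multi-byte character in half. The function name `max_utf8_prefix_len` and its `max_bytes` parameter are hypothetical and are not taken from the crate's signature.

```rust
/// Hypothetical helper: returns the largest byte length `n <= max_bytes`
/// such that `&s[..n]` ends on a character boundary.
fn max_utf8_prefix_len(s: &str, max_bytes: usize) -> usize {
    // The whole string fits within the budget.
    if max_bytes >= s.len() {
        return s.len();
    }
    // Step back until the cut no longer falls inside a multi-byte character.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    end
}
```

For example, each character in `春眠` occupies three bytes, so a budget of 4 bytes yields 3, and `&"春眠"[..3]` is the valid slice `春`. Slicing at byte 4 directly would panic, which is the failure this kind of helper guards against when text is processed in fixed-size chunks.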