Fast Chinese text conversion with OpenCC dictionaries and forward maximum matching segmentation.
opencc-fmmseg converts between Simplified Chinese, Traditional Chinese,
Taiwan, Hong Kong, and Japanese kanji variants using bundled OpenCC-style
dictionaries. The default constructor loads a compressed dictionary embedded
in the crate, so normal use does not require runtime dictionary files.
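Forward maximum matching works like this. At each position, try the longest dictionary key that could start there, bounded by the dictionary's maximum key length. If that fails, try shorter lengths, and if nothing matches, copy a single character unchanged. The sketch below uses only the standard library. The function name `fmm_convert` and the toy `HashMap` dictionary are hypothetical and are not the crate's internal API, which uses its own dictionary structures.

```rust
use std::collections::HashMap;

/// Converts `input` by forward maximum matching against `dict`.
/// `max_len` is the length, in chars, of the longest key in `dict`.
fn fmm_convert(input: &str, dict: &HashMap<&str, &str>, max_len: usize) -> String {
    // Work on chars so multi-byte CJK characters are never split.
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    let mut i = 0;
    while i < chars.len() {
        let longest = max_len.min(chars.len() - i);
        let mut matched = false;
        // Try the longest candidate first, then shrink one char at a time.
        for len in (1..=longest).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            if let Some(&replacement) = dict.get(candidate.as_str()) {
                out.push_str(replacement);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            // No dictionary entry starts here: copy the character unchanged.
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // Toy dictionary: one phrase entry plus single-character fallbacks.
    let dict: HashMap<&str, &str> = HashMap::from([
        ("头发", "頭髮"),
        ("头", "頭"),
        ("发", "發"),
    ]);
    // The phrase entry wins over the per-character mapping for 发.
    assert_eq!(fmm_convert("头发", &dict, 2), "頭髮");
    // Outside the phrase, 发 falls back to its single-character mapping.
    assert_eq!(fmm_convert("发头", &dict, 2), "發頭");
    println!("{}", fmm_convert("理头发", &dict, 2));
}
```

The example shows why phrase-level matching matters. A purely character-by-character mapping would turn 头发 ("hair") into 頭發 instead of 頭髮.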
§Quick Start
```rust
use opencc_fmmseg::{OpenCC, OpenccConfig};

let converter = OpenCC::new();
let traditional = converter.convert_with_config(
    "汉字转换测试",
    OpenccConfig::S2t,
    false,
);
assert_eq!(traditional, "漢字轉換測試");
```
§Choosing an API
- OpenCC is the main converter type.
- OpenccConfig is the recommended Rust configuration API.
- OpenCC::convert accepts OpenCC-style strings such as "s2t" and is useful for CLI and config-file compatibility.
- DictionaryMaxlength and CustomDictSpec are for advanced users who need custom dictionaries or externally generated dictionary artifacts.
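Supporting both a typed enum and OpenCC-style strings usually comes down to a `FromStr` implementation. It maps config names onto enum variants and rejects unknown names. The sketch below uses a hypothetical, trimmed-down `Config` enum that covers a few names from the supported configurations. It is not the crate's OpenccConfig, whose parsing API is not documented here, and accepting names case-insensitively is a choice made only for this sketch.

```rust
use std::str::FromStr;

/// Hypothetical subset of conversion configs, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Config {
    S2t,
    T2s,
    S2tw,
    S2twp,
}

impl FromStr for Config {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept names case-insensitively, as a CLI or config file might supply them.
        match s.to_ascii_lowercase().as_str() {
            "s2t" => Ok(Config::S2t),
            "t2s" => Ok(Config::T2s),
            "s2tw" => Ok(Config::S2tw),
            "s2twp" => Ok(Config::S2twp),
            other => Err(format!("unsupported config: {other}")),
        }
    }
}

fn main() {
    // A string from argv or a config file is validated once, up front...
    let cfg: Config = "S2TWP".parse().expect("valid config name");
    assert_eq!(cfg, Config::S2twp);
    // ...so a typo surfaces as an error instead of being silently ignored.
    assert!("s2x".parse::<Config>().is_err());
}
```

Typed variants catch misspelled configuration names at compile time. That is why an enum suits Rust callers, while strings remain convenient at program boundaries.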
§Supported Configurations
| Config | Method | Meaning |
|---|---|---|
| s2t | OpenCC::s2t | Simplified to Traditional |
| t2s | OpenCC::t2s | Traditional to Simplified |
| s2tw / s2twp | OpenCC::s2tw / OpenCC::s2twp | Simplified to Taiwan Traditional |
| tw2s / tw2sp | OpenCC::tw2s / OpenCC::tw2sp | Taiwan Traditional to Simplified |
| s2hk / t2hk | OpenCC::s2hk / OpenCC::t2hk | Simplified / Traditional to Hong Kong Traditional |
| hk2s / hk2t | OpenCC::hk2s / OpenCC::hk2t | Hong Kong Traditional to Simplified / Traditional |
| t2tw / t2twp | OpenCC::t2tw / OpenCC::t2twp | Traditional to Taiwan Traditional |
| tw2t / tw2tp | OpenCC::tw2t / OpenCC::tw2tp | Taiwan Traditional to Traditional |
| t2jp / jp2t | OpenCC::t2jp / OpenCC::jp2t | Traditional to Japanese kanji / Japanese kanji to Traditional |
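Some configurations are applied as several dictionary rounds rather than one. In upstream OpenCC's s2twp configuration, for instance, text is first converted from Simplified to Traditional, and Taiwan phrase and variant dictionaries are then applied to the result. The sketch below shows only that chaining idea. The round contents are toy data, `apply_round` and `convert_rounds` are hypothetical names, and plain `str::replace` stands in for the per-round segmentation the crate actually performs.

```rust
/// Applies one hypothetical round: (from, to) pairs, in order.
/// `str::replace` is a stand-in for real segmentation-based matching.
fn apply_round(text: &str, pairs: &[(&str, &str)]) -> String {
    pairs
        .iter()
        .fold(text.to_string(), |acc, &(from, to)| acc.replace(from, to))
}

/// Feeds the output of each round into the next one.
fn convert_rounds(input: &str, rounds: &[&[(&str, &str)]]) -> String {
    rounds
        .iter()
        .fold(input.to_string(), |text, pairs| apply_round(&text, pairs))
}

fn main() {
    // Round 1 (toy data): Simplified -> Traditional characters.
    let s2t: &[(&str, &str)] = &[("软", "軟")];
    // Round 2 (toy data): Taiwan vocabulary, keyed on Traditional text.
    let tw_phrases: &[(&str, &str)] = &[("軟件", "軟體")];

    // One round alone yields generic Traditional text...
    assert_eq!(convert_rounds("软件", &[s2t]), "軟件");
    // ...and the second round rewrites it into Taiwan usage.
    assert_eq!(convert_rounds("软件", &[s2t, tw_phrases]), "軟體");
}
```

The second round's key exists only in Traditional form, so it can match only after the first round has run. This ordering dependence is why rounds are kept as separate steps.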
§Custom Dictionaries
```rust
use opencc_fmmseg::{
    CustomDictMode, CustomDictSpec, DictSlot, DictionaryError, DictionaryMaxlength, OpenCC,
};

fn main() -> Result<(), DictionaryError> {
    // Load the base dictionary, then append one custom
    // Simplified-to-Traditional phrase pair to the STPhrases slot.
    let dictionary = DictionaryMaxlength::from_zstd()?
        .with_custom_dicts(&[CustomDictSpec {
            slot: DictSlot::STPhrases,
            pairs: vec![("帕兰蒂尔".to_string(), "柏蘭蒂爾".to_string())],
            mode: CustomDictMode::Append,
        }])?;

    let converter = OpenCC::from_dictionary(dictionary);
    assert_eq!(converter.convert("帕兰蒂尔", "s2t", false), "柏蘭蒂爾");
    Ok(())
}
```
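A custom phrase longer than every existing key helps only if the matcher also tries that longer length, so adding entries must keep the length bound in sync. The sketch below models a dictionary slot as a map plus its longest key length in chars. `Slot` and `append` are hypothetical names. Overwriting a repeated key via `HashMap::insert` is an assumption made for the sketch, not documented behavior of CustomDictMode::Append.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary slot: phrase pairs plus the longest key length (in chars).
struct Slot {
    pairs: HashMap<String, String>,
    max_len: usize,
}

impl Slot {
    fn new() -> Self {
        Slot { pairs: HashMap::new(), max_len: 0 }
    }

    /// Adds custom pairs. Assumption: a repeated key overwrites the old value.
    fn append(&mut self, custom: &[(&str, &str)]) {
        for &(from, to) in custom {
            // Keep max_len in sync so a matcher will try this key's length.
            self.max_len = self.max_len.max(from.chars().count());
            self.pairs.insert(from.to_string(), to.to_string());
        }
    }
}

fn main() {
    let mut slot = Slot::new();
    slot.append(&[("兰", "蘭")]);
    assert_eq!(slot.max_len, 1);

    // A four-character custom phrase raises the bound to 4 chars.
    slot.append(&[("帕兰蒂尔", "柏蘭蒂爾")]);
    assert_eq!(slot.max_len, 4);
    assert_eq!(slot.pairs.get("帕兰蒂尔").map(String::as_str), Some("柏蘭蒂爾"));
}
```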
§Error Reporting
Most high-level conversion methods return a plain String rather than a Result,
for compatibility with the C and scripting-language bindings. Non-fatal setup
or configuration errors are recorded instead and can be retrieved with
OpenCC::get_last_error. Dictionary construction APIs return a Result whose
error type is DictionaryError.
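With the String-plus-last-error convention, callers always get text back and query for a diagnostic separately. The sketch below shows that pattern with a hypothetical `Converter` type that keeps its error slot in a `RefCell`. Its `get_last_error` accessor is named after the crate's method, but the real signature may differ. Passing the input through unchanged on an unsupported config and clearing the error on success are also assumptions of this sketch.

```rust
use std::cell::RefCell;

/// Hypothetical converter illustrating the "String result + last error" pattern.
struct Converter {
    last_error: RefCell<Option<String>>,
}

impl Converter {
    fn new() -> Self {
        Converter { last_error: RefCell::new(None) }
    }

    /// Always returns a String; failures are recorded rather than returned.
    fn convert(&self, input: &str, config: &str) -> String {
        match config {
            "s2t" => {
                // Assumption: a successful call clears any previous error.
                *self.last_error.borrow_mut() = None;
                // Toy stand-in for real dictionary conversion.
                input.replace('汉', "漢")
            }
            other => {
                // Assumption: pass the input through unchanged and record why.
                *self.last_error.borrow_mut() = Some(format!("unsupported config: {other}"));
                input.to_string()
            }
        }
    }

    fn get_last_error(&self) -> Option<String> {
        self.last_error.borrow().clone()
    }
}

fn main() {
    let c = Converter::new();
    assert_eq!(c.convert("汉字", "s2t"), "漢字");
    assert_eq!(c.get_last_error(), None);

    // An unknown config still yields a String; the error is read separately.
    assert_eq!(c.convert("汉字", "x2y"), "汉字");
    assert_eq!(c.get_last_error().as_deref(), Some("unsupported config: x2y"));
}
```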
§Re-exports
- pub use crate::dictionary_lib::CustomDictFileSpec;
- pub use crate::dictionary_lib::CustomDictMode;
- pub use crate::dictionary_lib::CustomDictSpec;
- pub use crate::dictionary_lib::DictSlot;
- pub use crate::dictionary_lib::DictionaryError;
- pub use crate::dictionary_lib::DictionaryMaxlength;
§Modules
- dictionary_lib - Dictionary utilities for managing multiple OpenCC lexicons. Internal dictionary-processing utilities for opencc-fmmseg.
§Macros
- debug_note - Print a developer note to stderr in debug builds; no-op in release.
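A macro with the behavior described for debug_note can be built on `cfg!(debug_assertions)`, which evaluates to true in default debug builds and false in release builds. The sketch below uses the hypothetical name `debug_note_sketch!` and is not the crate's actual macro definition.

```rust
/// Prints a formatted note to stderr in debug builds; does nothing in release.
macro_rules! debug_note_sketch {
    ($($arg:tt)*) => {
        if cfg!(debug_assertions) {
            eprintln!($($arg)*);
        }
    };
}

fn main() {
    let max_len = 4;
    // In release builds this becomes a constant-false branch the optimizer removes.
    debug_note_sketch!("dictionary max_len = {}", max_len);
}
```

Using `cfg!` rather than a `#[cfg]` attribute keeps the arguments type-checked in both profiles. It also avoids unused-variable warnings in release builds.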
§Structs
- DelimiterSet - Compact, hot-path-friendly delimiter set optimized for per-character membership tests.
- DictRefs - Holds up to three conversion rounds. Each round carries its own dictionaries, max_len, and prebuilt StarterUnion.
- OpenCC - Central interface for performing OpenCC-based conversion with segmentation.
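One way to get constant-time per-character membership tests is a bitmap indexed by code point. The sketch below covers the Basic Multilingual Plane (U+0000 to U+FFFF) with 1,024 `u64` words (8 KiB) and treats all other code points as non-delimiters. The type name `BmpDelimiterSet`, the BMP-only limit, and the delimiter list are assumptions for illustration, not the crate's DelimiterSet layout.

```rust
/// Hypothetical BMP-only delimiter bitmap: one bit per code point U+0000..=U+FFFF.
struct BmpDelimiterSet {
    bits: Box<[u64; 1024]>, // 65,536 bits = 1,024 words = 8 KiB
}

impl BmpDelimiterSet {
    fn new(delims: &str) -> Self {
        let mut bits = Box::new([0u64; 1024]);
        for c in delims.chars() {
            let cp = c as u32;
            if cp <= 0xFFFF {
                // Word index = cp / 64, bit index = cp % 64.
                bits[(cp >> 6) as usize] |= 1u64 << (cp & 63);
            }
        }
        BmpDelimiterSet { bits }
    }

    /// O(1) membership: one range check, one index, one shift and mask.
    #[inline]
    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        cp <= 0xFFFF && (self.bits[(cp >> 6) as usize] >> (cp & 63)) & 1 == 1
    }
}

fn main() {
    // A few ASCII and CJK delimiters (illustrative, not the crate's list).
    let set = BmpDelimiterSet::new(" ,.!?，。！？、\n");
    assert!(set.contains('，'));
    assert!(set.contains(' '));
    assert!(!set.contains('汉'));
    assert!(!set.contains('😀')); // outside the BMP: not tracked in this sketch
}
```

Compared with a `HashSet<char>`, the lookup needs no hashing. A fuller design would need a separate path for characters outside the BMP.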
§Enums
- OpenccConfig - OpenCC conversion configuration (strongly typed).
§Functions
- find_max_utf8_len_bytes - Finds a safe UTF-8 boundary within a raw byte slice, limited by a maximum byte count.
- find_max_utf8_length - Finds a valid UTF-8 boundary within the given string, limited by a maximum byte count.
- for_each_len_dec - Iterates viable phrase lengths in descending order using a starter bitmask, stopping early if the callback returns true.
- is_delimiter - Checks whether a character is treated as a segmentation delimiter.
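Two of the helpers above address common sub-problems: cutting text to at most N bytes without splitting a multi-byte UTF-8 sequence, and visiting only phrase lengths that can possibly match. The sketch below shows a string version built on `str::is_char_boundary` and a descending-length loop driven by a bitmask. A raw-byte version would instead step back over continuation bytes of the form `0b10xxxxxx`. The names `utf8_prefix_len` and `for_each_len_desc` are hypothetical, and so is the mask layout, in which bit n-1 is set when some key of n chars begins with the current character. None of this reflects the crate's actual signatures.

```rust
/// Returns the largest byte index <= `max_bytes` that falls on a char boundary in `s`.
fn utf8_prefix_len(s: &str, max_bytes: usize) -> usize {
    if max_bytes >= s.len() {
        return s.len();
    }
    let mut end = max_bytes;
    // Walk back over UTF-8 continuation bytes (at most 3 steps).
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    end
}

/// Calls `f(len)` for each length whose bit is set in `mask` (bit n-1 = length n),
/// from `cap` down to 1. Stops and returns true as soon as `f` returns true.
fn for_each_len_desc(mask: u64, cap: usize, mut f: impl FnMut(usize) -> bool) -> bool {
    for len in (1..=cap.min(64)).rev() {
        if mask & (1u64 << (len - 1)) != 0 && f(len) {
            return true;
        }
    }
    false
}

fn main() {
    // "汉字" is 6 bytes (3 per char); a 4-byte limit must back off to 3.
    let text = "汉字";
    let cut = utf8_prefix_len(text, 4);
    assert_eq!(cut, 3);
    assert_eq!(&text[..cut], "汉");

    // Hypothetical starter mask: keys of length 1, 2 and 4 begin with this character.
    let mask = 0b1011;
    let mut tried = Vec::new();
    let found = for_each_len_desc(mask, 4, |len| {
        tried.push(len);
        len == 2 // pretend only the 2-char candidate is in the dictionary
    });
    assert!(found);
    assert_eq!(tried, vec![4, 2]); // length 3 skipped by the mask; 1 never reached
}
```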