Crate opencc_fmmseg

Fast Chinese text conversion with OpenCC dictionaries and forward maximum matching segmentation.

opencc-fmmseg converts between Simplified Chinese, Traditional Chinese (including its Taiwan and Hong Kong variants), and Japanese kanji variants, using bundled OpenCC-style dictionaries. The default constructor loads a compressed dictionary embedded in the crate, so normal use requires no dictionary files at runtime.
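
To make "forward maximum matching" concrete, here is a minimal, self-contained sketch of the general technique. It is not the crate's implementation: the `fmm_convert` function, the plain `HashMap` dictionary, and the three sample entries are assumptions chosen for illustration only. At each position the longest dictionary key that matches is replaced, and characters with no match are copied through unchanged.

```rust
use std::collections::HashMap;

/// Illustrative forward maximum matching (hypothetical helper, not the crate API).
/// `max_len` is the longest dictionary key, measured in characters.
fn fmm_convert(text: &str, dict: &HashMap<&str, &str>, max_len: usize) -> String {
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < chars.len() {
        let longest = max_len.min(chars.len() - i);
        let mut matched = false;
        // Try candidate lengths from longest to shortest.
        for len in (1..=longest).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            if let Some(replacement) = dict.get(candidate.as_str()) {
                out.push_str(replacement);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            // No entry starts here: copy the character through unchanged.
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // Sample entries for illustration only.
    let mut dict = HashMap::new();
    dict.insert("头", "頭");
    dict.insert("发", "發");
    dict.insert("头发", "頭髮");

    // The two-character phrase wins over the single-character entries.
    assert_eq!(fmm_convert("头发", &dict, 2), "頭髮");
    // With only single characters considered, the phrase entry is never tried.
    assert_eq!(fmm_convert("头发", &dict, 1), "頭發");
}
```

The second assertion shows why phrase-level matching matters: a character-by-character lookup picks a different target character than the phrase entry does.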

§Quick Start

use opencc_fmmseg::{OpenCC, OpenccConfig};

let converter = OpenCC::new();

let traditional = converter.convert_with_config(
    "汉字转换测试",
    OpenccConfig::S2t, // Simplified to Traditional
    false,             // do not convert punctuation
);

assert_eq!(traditional, "漢字轉換測試");

§Choosing an API

  • OpenCC is the main converter type.
  • OpenccConfig is the recommended Rust configuration API.
  • OpenCC::convert accepts OpenCC-style strings such as "s2t" and is useful for CLI/config-file compatibility.
  • DictionaryMaxlength and CustomDictSpec are for advanced users who need custom dictionaries or externally generated dictionary artifacts.
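
The trade-off between string configs and a typed enum can be sketched in isolation. The `Config` enum below is a hypothetical stand-in, not the crate's `OpenccConfig`, and the case-insensitive, whitespace-trimming parse is an assumption about what a CLI front end might want rather than documented crate behavior.

```rust
use std::str::FromStr;

/// Hypothetical typed configuration (a stand-in, not `OpenccConfig`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Config {
    S2t,
    T2s,
    S2tw,
    Tw2s,
}

impl FromStr for Config {
    type Err = String;

    /// Parse an OpenCC-style name such as "s2t".
    /// Assumption: input is trimmed and matched case-insensitively.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "s2t" => Ok(Config::S2t),
            "t2s" => Ok(Config::T2s),
            "s2tw" => Ok(Config::S2tw),
            "tw2s" => Ok(Config::Tw2s),
            other => Err(format!("unknown config: {}", other)),
        }
    }
}

fn main() {
    // A string from a CLI flag or config file is validated once, at the edge.
    let parsed: Result<Config, String> = "s2t".parse();
    assert_eq!(parsed, Ok(Config::S2t));

    // A typo surfaces as an error here instead of at conversion time.
    assert!("s2x".parse::<Config>().is_err());
}
```

With a typed value, an invalid configuration cannot reach the conversion call at all, which is the usual reason to prefer the enum in Rust code and keep strings for external input.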

§Supported Configurations

Config          Method                          Meaning
s2t             OpenCC::s2t                     Simplified to Traditional
t2s             OpenCC::t2s                     Traditional to Simplified
s2tw / s2twp    OpenCC::s2tw / OpenCC::s2twp    Simplified to Taiwan Traditional
tw2s / tw2sp    OpenCC::tw2s / OpenCC::tw2sp    Taiwan Traditional to Simplified
s2hk / t2hk     OpenCC::s2hk / OpenCC::t2hk     To Hong Kong Traditional variants
hk2s / hk2t     OpenCC::hk2s / OpenCC::hk2t     Hong Kong variants to Simplified/Traditional
t2tw / t2twp    OpenCC::t2tw / OpenCC::t2twp    Traditional to Taiwan variants
tw2t / tw2tp    OpenCC::tw2t / OpenCC::tw2tp    Taiwan variants to Traditional
t2jp / jp2t     OpenCC::t2jp / OpenCC::jp2t     Traditional and Japanese kanji variants

§Custom Dictionaries

use opencc_fmmseg::{
    CustomDictMode, CustomDictSpec, DictSlot, DictionaryMaxlength, OpenCC,
};

let dictionary = DictionaryMaxlength::from_zstd()?
    .with_custom_dicts(&[CustomDictSpec {
        slot: DictSlot::STPhrases,
        pairs: vec![("帕兰蒂尔".to_string(), "柏蘭蒂爾".to_string())],
        mode: CustomDictMode::Append,
    }])?;

let converter = OpenCC::from_dictionary(dictionary);
assert_eq!(
    converter.convert("帕兰蒂尔", "s2t", false),
    "柏蘭蒂爾"
);

§Error Reporting

Most high-level conversion methods return a String for compatibility with the C and scripting-language bindings. Non-fatal setup or configuration errors are recorded and can be retrieved with OpenCC::get_last_error. Dictionary construction APIs return a Result whose error type is DictionaryError.
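
The "return a String, record the error on the side" pattern can be sketched as follows. This is a generic illustration, not the crate's code: the `Converter` type, its `last_error` method, the toy one-character conversion, and the choice to return the input unchanged on an unknown config are all assumptions.

```rust
use std::sync::Mutex;

/// Hypothetical converter (a stand-in, not the crate's `OpenCC`).
struct Converter {
    last_error: Mutex<Option<String>>,
}

impl Converter {
    fn new() -> Self {
        Converter { last_error: Mutex::new(None) }
    }

    /// Always returns a `String`, which keeps the signature easy to bind from C.
    /// Assumption: an unknown config returns the input unchanged and records an error.
    fn convert(&self, input: &str, config: &str) -> String {
        let mut slot = self.last_error.lock().unwrap();
        match config {
            "s2t" => {
                *slot = None; // a successful call clears any previous error
                input.replace('汉', "漢") // toy conversion for the sketch
            }
            other => {
                *slot = Some(format!("unsupported config: {}", other));
                input.to_string()
            }
        }
    }

    /// Returns the error recorded by the most recent call, if any.
    fn last_error(&self) -> Option<String> {
        self.last_error.lock().unwrap().clone()
    }
}

fn main() {
    let converter = Converter::new();

    assert_eq!(converter.convert("汉字", "s2t"), "漢字");
    assert_eq!(converter.last_error(), None);

    // The call still returns a string; the caller checks the side channel.
    assert_eq!(converter.convert("汉字", "bogus"), "汉字");
    assert!(converter.last_error().unwrap().contains("bogus"));
}
```

The cost of this design is that callers must remember to check the side channel, which is why APIs that can fail outright, such as dictionary construction, are better served by Result.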

Re-exports§

pub use crate::dictionary_lib::CustomDictFileSpec;
pub use crate::dictionary_lib::CustomDictMode;
pub use crate::dictionary_lib::CustomDictSpec;
pub use crate::dictionary_lib::DictSlot;
pub use crate::dictionary_lib::DictionaryError;
pub use crate::dictionary_lib::DictionaryMaxlength;

Modules§

dictionary_lib
Internal dictionary-processing utilities for managing the multiple OpenCC lexicons used by opencc-fmmseg.

Macros§

debug_note
Print a developer note to stderr in debug builds; no-op in release.

Structs§

DelimiterSet
Compact, hot-path-friendly delimiter set optimized for per-character membership tests.
DictRefs
Holds up to three conversion rounds. Each round carries its own dictionaries, max_len, and prebuilt StarterUnion.
OpenCC
Central interface for performing OpenCC-based conversion with segmentation.

Enums§

OpenccConfig
OpenCC conversion configuration (strongly-typed).

Functions§

find_max_utf8_len_bytes
Finds a safe UTF-8 boundary within a raw byte slice, limited by a maximum byte count.
find_max_utf8_length
Finds a valid UTF-8 boundary within the given string, limited by a maximum byte count.
for_each_len_dec
Iterates viable phrase lengths in descending order using a starter bitmask, stopping early if the callback returns true.
is_delimiter
Checks whether a character is treated as a segmentation delimiter.
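
Two of the helper ideas above, clamping a byte budget to a UTF-8 character boundary and visiting candidate lengths in descending order from a bitmask, can be sketched with the standard library alone. The functions `safe_prefix_len` and `for_each_len_desc` are hypothetical and do not reproduce the crate's signatures; the mask layout (bit n-1 set means length n is viable) is an assumption made for this sketch.

```rust
/// Largest byte length <= `max_bytes` that ends on a character boundary of `s`.
fn safe_prefix_len(s: &str, max_bytes: usize) -> usize {
    if max_bytes >= s.len() {
        return s.len();
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    end
}

/// Visit each length whose bit is set in `mask`, longest first, ignoring
/// lengths above `cap`. Stops early when `f` returns true and reports whether
/// it stopped. Assumed layout: bit (n - 1) set means length n is viable.
fn for_each_len_desc(mask: u64, cap: usize, mut f: impl FnMut(usize) -> bool) -> bool {
    // Keep only the bits for lengths 1..=cap.
    let mut m = if cap >= 64 { mask } else { mask & ((1u64 << cap) - 1) };
    while m != 0 {
        // Position of the highest set bit, converted to a 1-based length.
        let len = 64 - m.leading_zeros() as usize;
        if f(len) {
            return true;
        }
        m &= !(1u64 << (len - 1)); // clear that bit and continue downward
    }
    false
}

fn main() {
    let s = "你好世界"; // four characters, three bytes each
    let end = safe_prefix_len(s, 7);
    assert_eq!(end, 6);
    assert_eq!(&s[..end], "你好");

    // Lengths 1, 2 and 4 are viable; visit them longest first.
    let mut seen = Vec::new();
    let stopped = for_each_len_desc(0b1011, 8, |len| {
        seen.push(len);
        false
    });
    assert!(!stopped);
    assert_eq!(seen, vec![4, 2, 1]);

    // Stop as soon as length 2 is reached.
    seen.clear();
    let stopped = for_each_len_desc(0b1011, 8, |len| {
        seen.push(len);
        len == 2
    });
    assert!(stopped);
    assert_eq!(seen, vec![4, 2]);
}
```

Skipping lengths whose bit is clear avoids dictionary lookups that cannot succeed, which is the usual motivation for pairing a bitmask with longest-first matching.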