zhconv-rs converts Chinese text between Traditional, Simplified and regional variants. Its rulesets are sourced from zhConversion.php (maintained by MediaWiki and Chinese Wikipedia) and from OpenCC; they are merged, flattened and then precompiled into Aho-Corasick automata by daachorse for single-pass, linear-time conversion.
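To make the single-pass idea concrete, here is a minimal, std-only sketch of leftmost-longest replacement over a merged table. It is not the crate's implementation: the function name `convert` and the `HashMap` table are hypothetical stand-ins for the precompiled automaton, and this toy performs up to one lookup per candidate length at each position rather than running in strictly linear time.

```rust
use std::collections::HashMap;

/// Convert `text` in one left-to-right pass, always taking the longest
/// table key that matches at the current position (leftmost-longest).
/// Hypothetical helper for illustration; not part of zhconv.
fn convert(text: &str, table: &HashMap<&str, &str>) -> String {
    // Longest key length in chars; this bounds the lookahead window.
    let max_len = table.keys().map(|k| k.chars().count()).max().unwrap_or(0);
    // Byte offset of every char boundary, including the end of the text.
    let bounds: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();
    let n = bounds.len() - 1; // number of chars in `text`
    let mut out = String::with_capacity(text.len());
    let mut i = 0; // current position, counted in chars
    while i < n {
        let longest = max_len.min(n - i);
        let mut matched = false;
        // Try the longest candidate first, then shorter ones.
        for len in (1..=longest).rev() {
            let candidate = &text[bounds[i]..bounds[i + len]];
            if let Some(to) = table.get(candidate) {
                out.push_str(to);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            // No rule applies here: copy one char through unchanged.
            out.push_str(&text[bounds[i]..bounds[i + 1]]);
            i += 1;
        }
    }
    out
}
```

With a table containing the pairs ("阿拉伯联合酋长国", "阿拉伯聯合大公國"), ("国", "國") and ("长", "長"), the longer phrase rule wins over the single-character rules wherever both could apply, which is why merging all dictionaries into one automaton per target variant gives consistent results.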
The non-default feature opencc enables additional OpenCC dictionaries. Unlike in other
implementations, individual dictionaries cannot be enabled or disabled at runtime, since
they are merged and precompiled into a separate automaton for each target variant.
As with MediaWiki and OpenCC, accuracy is generally acceptable but limited.
The converter optionally supports additional conversion rules in MediaWiki syntax (refer to conversion groups
and manual conversion rules on Chinese
Wikipedia), external rules defined line by line, and custom conversions defined as (FROM, TO)
pairs. Prebuilding a converter with custom rules or dictionaries is not yet supported.
§Usage
The crate is on crates.io.
[dependencies]
zhconv = { version = "?", features = ["opencc"] } # enable additional OpenCC dictionaries
§Example
Basic conversion:
use zhconv::{zhconv, Variant};
assert_eq!(zhconv("天干物燥 小心火烛", "zh-Hant".parse().unwrap()), "天乾物燥 小心火燭");
assert_eq!(zhconv("鼠曲草", Variant::ZhHant), "鼠麴草");
assert_eq!(zhconv("阿拉伯联合酋长国", Variant::ZhHant), "阿拉伯聯合酋長國");
assert_eq!(zhconv("阿拉伯联合酋长国", Variant::ZhTW), "阿拉伯聯合大公國");
With MediaWiki conversion syntax:
use zhconv::{zhconv_mw, Variant};
assert_eq!(zhconv_mw("天-{干}-物燥 小心火烛", "zh-Hant".parse::<Variant>().unwrap()), "天干物燥 小心火燭");
assert_eq!(zhconv_mw("-{zh-tw:鼠麴草;zh-cn:香茅}-是菊科草本植物。", Variant::ZhCN), "香茅是菊科草本植物。");
assert_eq!(zhconv_mw("菊科草本植物包括-{zh-tw:鼠麴草;zh-cn:香茅;}-等。", Variant::ZhTW), "菊科草本植物包括鼠麴草等。");
Set global rules inline (note that such rules always apply globally regardless of their location, unlike in MediaWiki, where they affect only the text that follows):
use zhconv::{zhconv_mw, Variant};
assert_eq!(zhconv_mw("-{H|zh:馬;zh-cn:鹿;}-馬克思主義", Variant::ZhCN), "鹿克思主义"); // add
assert_eq!(zhconv_mw("&二極體\n-{-|zh-hans:二极管; zh-hant:二極體}-\n", Variant::ZhCN), "&二极体\n\n"); // remove
To load or add additional conversion rules such as CGroups or (FROM, TO) pairs,
see ZhConverterBuilder.
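For readers unfamiliar with the inline `-{variant:text;...}-` syntax used in the examples above, the following std-only sketch shows how such a rule can be resolved for one target variant. It is a deliberately simplified illustration, not the crate's parser: the names `pick` and `resolve_inline` are hypothetical, flags such as `H|` and `-|` and nested rules are not handled, the surrounding text is not converted, and falling back to the first listed variant is an assumption rather than MediaWiki's actual fallback chain.

```rust
/// Pick the text for `variant` from a rule body such as
/// "zh-tw:鼠麴草;zh-cn:香茅;". A body with no `variant:text` pair
/// (for example "干") is returned verbatim.
/// Assumption: if `variant` is not listed, fall back to the first pair.
fn pick<'a>(body: &'a str, variant: &str) -> &'a str {
    let mut first: Option<&'a str> = None;
    for item in body.split(';') {
        if let Some((tag, value)) = item.split_once(':') {
            let value = value.trim();
            if tag.trim().eq_ignore_ascii_case(variant) {
                return value;
            }
            if first.is_none() {
                first = Some(value);
            }
        }
    }
    first.unwrap_or(body)
}

/// Replace every `-{...}-` span in `text` with the text chosen for `variant`.
/// Text outside the spans is copied through unchanged.
fn resolve_inline(text: &str, variant: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("-{") {
        let after_open = &rest[start + 2..];
        match after_open.find("}-") {
            Some(end) => {
                out.push_str(&rest[..start]);
                out.push_str(pick(&after_open[..end], variant));
                rest = &after_open[end + 2..];
            }
            // Unterminated rule: keep the remainder as-is.
            None => break,
        }
    }
    out.push_str(rest);
    out
}
```

A real converter would additionally run the general conversion over the text outside the `-{...}-` spans, which this sketch leaves untouched.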
Other useful functions:
use zhconv::{is_hans, is_hans_confidence, infer_variant, infer_variant_confidence};
assert!(!is_hans("秋冬濁而春夏清,晞於朝而生於夕"));
assert!(is_hans_confidence("滴瀝明花苑,葳蕤泫竹叢") < 0.5);
println!("{}", infer_variant("錦字緘愁過薊水,寒衣將淚到遼城"));
println!("{:?}", infer_variant_confidence("zhconv-rs 中文简繁及地區詞轉換"));
Re-exports§
pub use self::converters::get_builtin_converter;
pub use self::tables::get_builtin_tables;
pub use self::variant::Variant;
Modules§
- converters
- Built-in converters built from tables.
- pagerules
- Struct to extract global rules from wikitext.
- rule
- Structs and functions for processing conversion rules, as defined in ConverterRule.php.
- tables
- Built-in conversion tables sourced from zhConversion.php (maintained by MediaWiki and Chinese Wikipedia) and OpenCC.
- variant
- Structs for handling variants and mapping of variants.
Structs§
- ZhConverter
- A ZhConverter, built by ZhConverterBuilder.
- ZhConverterBuilder
- A builder that helps build a ZhConverter.
Traits§
- TruncatedAround
- A helper trait that truncates a str around a specified index in constant time (O(1)), intended to be used with is_hans and similar functions.
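Truncating a `str` at an arbitrary byte index must not split a multi-byte character. The sketch below shows one way this can be done in constant time with the standard library alone; the function name `truncate_at_or_before` is hypothetical, and rounding down to the previous character boundary is an assumption about what "around" means, not a description of the trait's actual behaviour.

```rust
/// Return the longest prefix of `s` that ends at or before byte `index`
/// and on a UTF-8 character boundary.
/// A UTF-8 character is at most 4 bytes long, so the loop steps back
/// at most 3 times, which makes this O(1) regardless of the string length.
/// Hypothetical helper for illustration; not part of zhconv.
fn truncate_at_or_before(s: &str, index: usize) -> &str {
    if index >= s.len() {
        return s;
    }
    let mut end = index;
    // Byte 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

A bounded prefix obtained this way could then be passed to a detection function, so that only the beginning of a very long input is inspected.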
Functions§
- infer_variant
- Determine the Chinese variant of the input text.
- infer_variant_confidence
- Determine the Chinese variant of the input text with confidence.
- is_hans
- Determine whether the given text looks more like Simplified Chinese than Traditional Chinese.
- is_hans_confidence
- Determine whether the given text looks more like Simplified Chinese than Traditional Chinese.
- zhconv
- Helper function for general conversion using built-in converters.
- zhconv_mw
- Helper function for general conversion, activating wikitext support.