zhconv-rs converts Chinese text between Traditional, Simplified and regional variants. Its rulesets are sourced from MediaWiki/Wikipedia and OpenCC; they are merged, flattened and precompiled into Aho-Corasick automata by daachorse, so that conversion runs in a single pass in linear time.
As with MediaWiki and OpenCC, the accuracy is generally acceptable, but remains limited. The converter optionally supports MediaWiki conversion syntax (ref: 1, 2).
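To make the idea of a flattened table applied in one left-to-right pass more concrete, here is a minimal sketch. It is not how the crate is implemented: it swaps the daachorse Aho-Corasick automaton for a naive leftmost-longest lookup in a `HashMap`, and both the function name `convert_longest_match` and the table shape are hypothetical.

```rust
use std::collections::HashMap;

/// Toy converter: scans the text once, left to right, and at each position
/// replaces the longest key of `table` that matches there.
/// A naive stand-in for illustration only, not the daachorse automaton.
fn convert_longest_match(text: &str, table: &HashMap<&str, &str>) -> String {
    // The longest key (in chars) bounds how far ahead we need to look.
    let max_len = table.keys().map(|k| k.chars().count()).max().unwrap_or(0);
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < chars.len() {
        let mut matched = false;
        // Try the longest candidate first, then progressively shorter ones.
        for len in (1..=max_len.min(chars.len() - i)).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            if let Some(to) = table.get(candidate.as_str()) {
                out.push_str(to);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            // No rule starts here: copy the character through unchanged.
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}
```

With a hypothetical table that maps both 干 and the longer phrase 天干物燥, the phrase entry takes precedence wherever it matches, which is why longer rules can override single-character ones. A real automaton reaches the same leftmost-longest result without re-scanning candidates at each position.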
§Usage
[dependencies]
# Bundle converters prebuilt from conversion tables sourced from MediaWiki (GPLv2.0+).
zhconv = { version = ... } # by default, features = ["compress", "mediawiki"].
# Bundle converters prebuilt from conversion tables sourced from OpenCC instead (Apache2.0).
zhconv = { version = ..., default-features = false, features = ["compress", "opencc"]}
# Combine conversion tables for one or more specific target variant(s) arbitrarily.
zhconv = { version = ..., default-features = false, features = ["compress", "opencc-hant", "mediawiki-hant", "opencc-hans", "mediawiki-tw"]}
§Example
Convert simply:
use zhconv::{zhconv, Variant};
assert_eq!(zhconv("天干物燥 小心火烛", "zh-Hant".parse().unwrap()), "天乾物燥 小心火燭");
assert_eq!(zhconv("鼠曲草", Variant::ZhHant), "鼠麴草");
assert_eq!(zhconv("阿拉伯联合酋长国", Variant::ZhHant), "阿拉伯聯合酋長國");
assert_eq!(zhconv("阿拉伯联合酋长国", Variant::ZhTW), "阿拉伯聯合大公國");
Using MediaWiki conversion syntax:
use zhconv::{zhconv_mw, Variant};
assert_eq!(zhconv_mw("天-{干}-物燥 小心火烛", "zh-Hant".parse::<Variant>().unwrap()), "天干物燥 小心火燭");
assert_eq!(zhconv_mw("-{zh-tw:鼠麴草;zh-cn:香茅}-是菊科草本植物。", Variant::ZhCN), "香茅是菊科草本植物。");
assert_eq!(zhconv_mw("菊科草本植物包括-{zh-tw:鼠麴草;zh-cn:香茅;}-等。", Variant::ZhTW), "菊科草本植物包括鼠麴草等。");
Global rules that add or remove conversions are also supported. Note that such rules always apply globally regardless of their location, unlike in MediaWiki where they affect only the text that follows:
use zhconv::{zhconv_mw, Variant};
assert_eq!(zhconv_mw("-{H|zh:馬;zh-cn:鹿;}-馬克思主義", Variant::ZhCN), "鹿克思主义"); // add
assert_eq!(zhconv_mw("&二極體\n-{-|zh-hans:二极管; zh-hant:二極體}-\n", Variant::ZhCN), "&二极体\n\n"); // remove
To customize the converter & conversion with fine-grained control, see ZhConverterBuilder.
(De)Serialization of compiled converters is not supported yet.
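The inline rule syntax used in the examples above, `-{variant:text;variant:text}-`, can be illustrated with a small standalone sketch. This is not the crate's parser: `resolve_inline_rules` is a hypothetical helper that only picks the alternative for one variant tag, and it ignores rule flags such as `H|` and `-|` as well as the conversion of the surrounding text.

```rust
/// Resolve inline rules of the form `-{variant:text;variant:text}-` for one
/// target variant tag such as "zh-cn". Hypothetical helper, illustration only.
fn resolve_inline_rules(text: &str, target: &str) -> String {
    let mut out = String::new();
    let mut rest = text;
    while let Some(start) = rest.find("-{") {
        // Copy everything before the rule verbatim.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}-") {
            Some(end) => {
                let body = &after_open[..end];
                // Pick the alternative whose tag equals the target variant.
                let chosen = body
                    .split(';')
                    .filter_map(|item| item.split_once(':'))
                    .find(|(tag, _)| tag.trim() == target)
                    .map(|(_, value)| value.trim());
                // Without a matching tag, keep the body unchanged (as in `-{干}-`).
                out.push_str(chosen.unwrap_or(body));
                rest = &after_open[end + 2..];
            }
            None => {
                // Unclosed rule: emit the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Under these assumptions, calling the helper on `-{zh-tw:鼠麴草;zh-cn:香茅}-是菊科草本植物。` with the tag `zh-cn` keeps 香茅 and drops the markup, and a body without any tag, such as `-{干}-`, is passed through unchanged.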
Other useful functions:
use zhconv::{is_hans, is_hans_confidence, infer_variant, infer_variant_confidence};
assert!(is_hans("清乾隆嘉庆间刻本"));
assert!(!is_hans("秋冬濁而春夏清,晞於朝而生於夕"));
assert!(is_hans_confidence("滴瀝明花苑,葳蕤泫竹叢") < 0.5);
println!("{}", infer_variant("錦字緘愁過薊水,寒衣將淚到遼城"));
println!("{:?}", infer_variant_confidence("zhconv-rs 中文简繁及地區詞轉換"));
Re-exports§
pub use self::converters::get_builtin_converter;
pub use self::tables::get_builtin_tables;
pub use self::variant::Variant;
Modules§
- converters
- Built-in converters built from tables.
- pagerules
- Struct to extract global rules from wikitext.
- rule
- Structs and functions for processing conversion rules, as defined in ConverterRule.php.
- tables
- Built-in conversion tables sourced from zhConversion.php (maintained by MediaWiki and Chinese Wikipedia) and OpenCC.
- variant
- Structs for handling variants and mapping of variants.
Structs§
- ZhConverter
- A ZhConverter, built by ZhConverterBuilder.
- ZhConverterBuilder
- A builder that helps build a ZhConverter.
Constants§
Traits§
- TruncatedAround
- A helper trait that truncates a str around a specified index in constant time (O(1)), intended to be used with is_hans etc.
Functions§
- infer_variant
- Determine the Chinese variant of the input text.
- infer_variant_confidence
- Determine the Chinese variant of the input text with confidence.
- is_hans
- Determine whether the given text looks like Simplified Chinese over Traditional Chinese.
- is_hans_confidence
- Determine whether the given text looks like Simplified Chinese over Traditional Chinese.
- zhconv
- Helper function for general conversion using built-in converters.
- zhconv_mw
- Helper function for general conversion, activating MediaWiki conversion syntax support.