
Crate zhconv


zhconv-rs converts Chinese between Traditional, Simplified and regional variants, using rulesets sourced from MediaWiki/Wikipedia and OpenCC, which are merged, flattened and then precompiled into Aho-Corasick automata by daachorse for single-pass, linear-time conversions.
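The idea of a merged table applied in a single left-to-right pass can be sketched with the standard library alone. This is not how zhconv-rs is implemented: the crate uses daachorse automata, while the sketch below substitutes a greedy longest-match scan over a `HashMap`, which costs O(n × longest key) rather than linear time. The function `convert_longest_match` and the three-entry table are hypothetical and exist only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: scan left to right, replacing the longest table key
/// that starts at the current position; copy the character through otherwise.
fn convert_longest_match(text: &str, table: &HashMap<&str, &str>) -> String {
    // The longest key (in chars) bounds how far ahead we need to look.
    let max_len = table.keys().map(|k| k.chars().count()).max().unwrap_or(0);
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < chars.len() {
        let mut matched = false;
        let upper = max_len.min(chars.len() - i);
        // Try the longest candidate first so phrase rules beat character rules.
        for len in (1..=upper).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            if let Some(replacement) = table.get(candidate.as_str()) {
                out.push_str(replacement);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // A toy merged table: a phrase-level rule coexists with character-level rules.
    let table: HashMap<&str, &str> = HashMap::from([
        ("干", "幹"),
        ("天干物燥", "天乾物燥"),
        ("烛", "燭"),
    ]);
    let converted = convert_longest_match("天干物燥 小心火烛", &table);
    assert_eq!(converted, "天乾物燥 小心火燭");
    println!("{converted}");
}
```

Because the phrase rule is tried before the single-character rule, 干 inside 天干物燥 becomes 乾 rather than 幹, which is the kind of context sensitivity a flattened phrase table provides.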

As with MediaWiki and OpenCC, the accuracy is generally acceptable, but remains limited. The converter optionally supports MediaWiki conversion syntax (ref: 1, 2).

§Usage

[dependencies]
# Bundle converters prebuilt from conversion tables sourced from MediaWiki (GPLv2.0+).
zhconv = { version = ... } # by default, features = ["compress", "mediawiki"].
# Bundle converters prebuilt from conversion tables sourced from OpenCC instead (Apache2.0).
zhconv = { version = ..., default-features = false, features = ["compress", "opencc"]}
# Combine conversion tables for one or more specific target variant(s) arbitrarily.
zhconv = { version = ..., default-features = false, features = ["compress", "opencc-hant", "mediawiki-hant", "opencc-hans", "mediawiki-tw"]}

§Example

Basic conversion:

use zhconv::{zhconv, Variant};
assert_eq!(zhconv("天干物燥 小心火烛", "zh-Hant".parse().unwrap()), "天乾物燥 小心火燭");
assert_eq!(zhconv("鼠曲草", Variant::ZhHant), "鼠麴草");
assert_eq!(zhconv("阿拉伯联合酋长国", Variant::ZhHant), "阿拉伯聯合酋長國");
assert_eq!(zhconv("阿拉伯联合酋长国", Variant::ZhTW), "阿拉伯聯合大公國");

Using MediaWiki conversion syntax:

use zhconv::{zhconv_mw, Variant};
assert_eq!(zhconv_mw("天-{干}-物燥 小心火烛", "zh-Hant".parse::<Variant>().unwrap()), "天干物燥 小心火燭");
assert_eq!(zhconv_mw("-{zh-tw:鼠麴草;zh-cn:香茅}-是菊科草本植物。", Variant::ZhCN), "香茅是菊科草本植物。");
assert_eq!(zhconv_mw("菊科草本植物包括-{zh-tw:鼠麴草;zh-cn:香茅;}-等。", Variant::ZhTW), "菊科草本植物包括鼠麴草等。");
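How an inline rule such as `-{zh-tw:鼠麴草;zh-cn:香茅}-` resolves to one form per target variant can be illustrated with a small standard-library sketch. It is an assumption-laden toy, not the crate's parser: `pick_inline_variant` is a hypothetical helper that handles only `tag:form` pairs separated by `;`, and it models neither flags (such as `H|` or `-|`) nor any fallback between variants.

```rust
/// Hypothetical helper: given the body of an inline rule (the text between
/// `-{` and `}-`), return the form written for `target`, if any.
fn pick_inline_variant<'a>(rule_body: &'a str, target: &str) -> Option<&'a str> {
    rule_body
        .split(';')
        // Items without a `:` (for example the empty item after a trailing `;`) are skipped.
        .filter_map(|item| item.trim().split_once(':'))
        .find(|(tag, _)| tag.trim().eq_ignore_ascii_case(target))
        .map(|(_, form)| form.trim())
}

fn main() {
    let body = "zh-tw:鼠麴草;zh-cn:香茅;";
    assert_eq!(pick_inline_variant(body, "zh-cn"), Some("香茅"));
    assert_eq!(pick_inline_variant(body, "zh-tw"), Some("鼠麴草"));
    // No entry for this variant; variant fallback is not modeled in this sketch.
    assert_eq!(pick_inline_variant(body, "zh-hk"), None);
}
```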

Global rules are also supported. Note that they always apply to the whole text regardless of where they appear, unlike in MediaWiki, where they affect only the text that follows:

use zhconv::{zhconv_mw, Variant};
assert_eq!(zhconv_mw("-{H|zh:馬;zh-cn:鹿;}-馬克思主義", Variant::ZhCN), "鹿克思主义"); // add
assert_eq!(zhconv_mw("&二極體\n-{-|zh-hans:二极管; zh-hant:二極體}-\n", Variant::ZhCN), "&二极体\n\n"); // remove

For fine-grained control over the converter and the conversion, see ZhConverterBuilder. (De)serialization of compiled converters is not yet supported.

Other useful functions:

use zhconv::{is_hans, is_hans_confidence, infer_variant, infer_variant_confidence};
assert!(is_hans("清乾隆嘉庆间刻本"));
assert!(!is_hans("秋冬濁而春夏清,晞於朝而生於夕"));
assert!(is_hans_confidence("滴瀝明花苑,葳蕤泫竹叢") < 0.5);
println!("{}", infer_variant("錦字緘愁過薊水,寒衣將淚到遼城"));
println!("{:?}", infer_variant_confidence("zhconv-rs 中文简繁及地區詞轉換"));

Re-exports§

pub use self::converters::get_builtin_converter;
pub use self::tables::get_builtin_tables;
pub use self::variant::Variant;

Modules§

converters
Built-in converters built from tables.
pagerules
Struct to extract global rules from wikitext.
rule
Structs and functions for processing conversion rules, as defined in ConverterRule.php.
tables
Built-in conversion tables sourced from zhConversion.php (maintained by MediaWiki and Chinese Wikipedia) and OpenCC.
variant
Structs for handling variants and mapping of variants.

Structs§

ZhConverter
A ZhConverter, built by ZhConverterBuilder.
ZhConverterBuilder
A builder that helps build a ZhConverter.

Constants§

ENABLED_TARGET_VARIANTS

Traits§

TruncatedAround
A helper trait that truncates a str around a specified index in constant time (O(1)), intended for use with is_hans and similar functions.
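Why truncating around an index can be constant-time is shown in the sketch below. It is not the trait's actual signature or implementation: `truncate_around` is a hypothetical free function taking a byte offset and a byte radius, and it relies only on `str::is_char_boundary` from the standard library.

```rust
/// Hypothetical free function (not the crate's trait): return a slice of
/// roughly `2 * radius` bytes centered on byte offset `index`, widened to
/// the nearest UTF-8 character boundaries.
fn truncate_around(s: &str, index: usize, radius: usize) -> &str {
    let mut start = index.saturating_sub(radius).min(s.len());
    let mut end = index.saturating_add(radius).min(s.len());
    // A UTF-8 character is at most 4 bytes, so each loop runs at most 3 times,
    // independent of the length of `s`.
    while !s.is_char_boundary(start) {
        start -= 1;
    }
    while !s.is_char_boundary(end) {
        end += 1;
    }
    &s[start..end]
}

fn main() {
    let text = "简体中文测试"; // six characters, three bytes each
    assert_eq!(truncate_around(text, 7, 3), "体中文");
    assert_eq!(truncate_around(text, 0, 3), "简");
    // An out-of-range index is clamped to the end of the string.
    assert_eq!(truncate_around("abc", 100, 2), "");
}
```

A bounded sample like this could then be passed to a detection function instead of the full text, trading some accuracy for a fixed cost on very long inputs.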

Functions§

infer_variant
Determine the Chinese variant of the input text.
infer_variant_confidence
Determine the Chinese variant of the input text with confidence.
is_hans
Determine whether the given text looks like Simplified Chinese rather than Traditional Chinese.
is_hans_confidence
Estimate the confidence that the given text is Simplified Chinese rather than Traditional Chinese.
zhconv
Helper function for general conversion using built-in converters.
zhconv_mw
Helper function for general conversion, activating MediaWiki conversion syntax support.