Struct zhconv::ZhConverter

pub struct ZhConverter { /* private fields */ }

A ZhConverter, built by ZhConverterBuilder.

Implementations

impl ZhConverter

pub fn new( automaton: CharwiseDoubleArrayAhoCorasick<u32>, target_words: Vec<String> ) -> ZhConverter

Create a new converter from an automaton and a mapping.

It is provided for convenience and is not expected to be called directly; ZhConverterBuilder takes care of these details.

pub fn with_target_variant( automaton: CharwiseDoubleArrayAhoCorasick<u32>, target_words: Vec<String>, variant: Variant ) -> ZhConverter

Create a new converter from an automaton and a mapping, and specify the target variant to be used by convert_as_wikitext_basic, convert_as_wikitext_extended and related functions.

It is provided for convenience and is not expected to be called directly; ZhConverterBuilder takes care of these details.

pub fn from_pairs( pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)> ) -> ZhConverter

Create a new converter from a sequence of (from, to) pairs.

It uses ZhConverterBuilder internally.

pub fn from_pairs_with_target_variant( variant: Variant, pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)> ) -> ZhConverter

Create a new converter from a sequence of (from, to) pairs.

Like from_pairs, but it also takes a target variant to be used by convert_as_wikitext_basic, convert_as_wikitext_extended and related functions.

It uses ZhConverterBuilder internally.

pub fn convert(&self, text: &str) -> String

Convert the given text and return the result as a new String.

pub fn convert_to(&self, text: &str, output: &mut String)

Same as convert, except that it takes a &mut String as the destination instead of returning a String.
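
The relationship between the two methods can be sketched without the crate itself. The snippet below is only an illustration: convert_into and convert_owned are hypothetical stand-ins, the two-character rule table is invented for the example, and it assumes that the destination is appended to rather than cleared.

```rust
/// Hypothetical stand-in for `convert_to`: appends the converted text to `output`.
/// The two rules below are invented for illustration only.
pub fn convert_into(text: &str, output: &mut String) {
    for ch in text.chars() {
        let mapped = match ch {
            '国' => '國',
            '学' => '學',
            other => other,
        };
        output.push(mapped);
    }
}

/// Hypothetical stand-in for `convert`: allocates a buffer and delegates.
pub fn convert_owned(text: &str) -> String {
    let mut output = String::with_capacity(text.len());
    convert_into(text, &mut output);
    output
}
```

Under these assumptions, a caller that converts many inputs can reuse one String and avoid a fresh allocation per call.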

pub fn convert_with_secondary_converter( &self, text: &str, secondary_converter: &ZhConverter ) -> String

Convert a text, along with a secondary converter.

Conversion rules in the secondary converter shadow the existing ones in the original converter. For example, if the original converter contains a rule 香菜 -> 芫荽 and the secondary converter contains a rule 香菜 -> 鹽須, the latter takes effect and 香菜 is converted to 鹽須.

The implementation matches the text against the two converters alternately, resulting in degraded performance. It is better to build a new converter that combines both rulesets, especially when the secondary ruleset is non-trivial or the input text is large.

The worst-case time complexity of the implementation is O(n*m), where n is the length of the text and m is the maximum length of the source words in the conversion rulesets (i.e. brute force).
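
The shadowing and the brute-force cost can be illustrated with a stdlib-only sketch. It is not the crate's implementation: plain hash maps stand in for the two converters, convert_with_secondary is a hypothetical name, and the choice to let a longer match win over a shorter one regardless of which ruleset it comes from is an assumption.

```rust
use std::collections::HashMap;

/// Brute-force sketch: at each position try the longest candidate first,
/// consulting the secondary rules before the primary ones.
pub fn convert_with_secondary(
    primary: &HashMap<&str, &str>,
    secondary: &HashMap<&str, &str>,
    text: &str,
) -> String {
    // m: the longest source word (in chars) across both rulesets.
    let max_len = primary
        .keys()
        .chain(secondary.keys())
        .map(|k| k.chars().count())
        .max()
        .unwrap_or(0);
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < chars.len() {
        let upper = max_len.min(chars.len() - i);
        let mut consumed = 0;
        for len in (1..=upper).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            // A secondary rule shadows the primary rule for the same source word.
            let hit = secondary
                .get(candidate.as_str())
                .or_else(|| primary.get(candidate.as_str()));
            if let Some(target) = hit {
                out.push_str(target);
                consumed = len;
                break;
            }
        }
        if consumed == 0 {
            // No rule matched here: copy one character through unchanged.
            out.push(chars[i]);
            consumed = 1;
        }
        i += consumed;
    }
    out
}
```

Each of the n positions triggers at most m dictionary lookups, which is where the n*m factor comes from. Merging both rulesets into one converter up front removes the per-position double lookup.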

pub fn convert_to_with_secondary_converter( &self, text: &str, output: &mut String, secondary_converter: &ZhConverter )

Same as convert_with_secondary_converter, except that it takes a &mut String as the destination instead of returning a String.

pub fn convert_as_wikitext_basic(&self, text: &str) -> String

Convert the given text, parsing and applying the ad hoc MediaWiki conversion rules in it.

Basic MediaWiki conversion rules like -{FOOBAR}- or -{zh-hant:FOO;zh-hans:BAR}- are supported.

Unlike convert_to_as_wikitext_extended, rules with additional flags that set global rules, like -{H|zh-hant:FOO;zh-hans:BAR}-, are simply ignored. It also does not try to skip HTML code blocks like <code></code> and <script></script>.
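
To make the two basic rule forms concrete, here is a minimal stdlib-only sketch of resolving inline -{...}- blocks for one variant. It is not the crate's parser: apply_inline_rules and resolve_rule are hypothetical names, the surrounding text is passed through without dictionary conversion, flags and nesting are not handled, and falling back to the first listed value when the requested variant is absent is an assumption (MediaWiki itself uses a variant fallback chain).

```rust
/// Resolve the body of one `-{ ... }-` block for `variant`.
/// `FOOBAR` (no variant list) is kept verbatim; `zh-hant:FOO;zh-hans:BAR`
/// yields the text listed for the requested variant.
fn resolve_rule(body: &str, variant: &str) -> String {
    let mut first: Option<&str> = None;
    for part in body.split(';') {
        if let Some((name, value)) = part.split_once(':') {
            if name.trim() == variant {
                return value.trim().to_string();
            }
            if first.is_none() {
                first = Some(value.trim());
            }
        }
    }
    // Assumption: without an entry for `variant`, use the first listed value;
    // without any `name:value` item, keep the body as written.
    first.unwrap_or(body).to_string()
}

/// Replace every `-{ ... }-` block in `text` with its resolved content.
pub fn apply_inline_rules(text: &str, variant: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("-{") {
        let after_open = &rest[start + 2..];
        match after_open.find("}-") {
            Some(end) => {
                out.push_str(&rest[..start]);
                out.push_str(&resolve_rule(&after_open[..end], variant));
                rest = &after_open[end + 2..];
            }
            // Unterminated block: leave the remainder untouched.
            None => break,
        }
    }
    out.push_str(rest);
    out
}
```

With this sketch, -{FOOBAR}- protects its content from conversion, while a variant list selects one alternative.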

pub fn convert_as_wikitext_extended(&self, text: &str) -> String

Convert the given text, parsing and applying the ad hoc and global MediaWiki conversion rules in it.

Unlike convert_to_as_wikitext_basic, all flags documented in Help:高级字词转换语法 are supported. It also tries to skip HTML code blocks such as <code></code> and <script></script>.

Limitations

The internal implementation intentionally replicates the behavior of LanguageConverter.php in MediaWiki. However, it is not fully compliant with MediaWiki and provides NO PROTECTION against XSS attacks.

Compared to the plain convert, this is known to be MUCH SLOWER, an unavoidable consequence of the implementation decisions made by MediaWiki.

pub fn convert_to_as_wikitext_basic(&self, text: &str, output: &mut String)

Same as convert_as_wikitext_basic, except that it takes a &mut String as the destination instead of returning a String.

pub fn convert_to_as_wikitext_extended(&self, text: &str, output: &mut String)

Same as convert_as_wikitext_extended, except that it takes a &mut String as the destination instead of returning a String.

pub fn convert_as_wikitext( &self, text: &str, secondary_converter_builder: &mut Option<ZhConverterBuilder<'_>>, skip_html_code_blocks: bool, apply_global_rules: bool ) -> String

The general implementation of MediaWiki syntax-aware conversion.

Equivalent to convert_as_wikitext_basic if addtional_conv_lines is empty and both skip_html_code_blocks and apply_global_rules are set to false.

Otherwise, equivalent to convert_as_wikitext_extended.

addtional_conv_lines looks like:

zh-cn:天堂执法者; zh-hk:夏威夷探案; zh-tw:檀島警騎2.0;
zh-cn:史蒂芬·'史蒂夫'·麦格瑞特; zh-tw:史提夫·麥加雷; zh-hk:麥星帆;
zh-cn:丹尼尔·'丹尼/丹诺'·威廉姆斯; zh-tw:丹尼·威廉斯; zh-hk:韋丹尼;
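
Lines in this form can be turned into (from, to) pairs of the shape that from_pairs accepts. The sketch below is a stdlib-only illustration, not the crate's own parsing: pairs_for_variant is a hypothetical helper, and it assumes that every other variant's text on a line maps to the target variant's text and that no text contains a semicolon.

```rust
/// Turn conversion lines into (from, to) pairs for one target variant.
/// Assumption: every other variant's text on a line maps to the target's text.
pub fn pairs_for_variant(conv_lines: &str, target: &str) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for line in conv_lines.lines() {
        // Collect the `variant:text` items of this line; empty items are skipped.
        let items: Vec<(&str, &str)> = line
            .split(';')
            .filter_map(|item| item.split_once(':'))
            .map(|(name, text)| (name.trim(), text.trim()))
            .collect();
        // Lines that do not mention the target variant contribute nothing.
        if let Some(&(_, to)) = items.iter().find(|(name, _)| *name == target) {
            for &(name, from) in &items {
                if name != target && from != to {
                    pairs.push((from.to_string(), to.to_string()));
                }
            }
        }
    }
    pairs
}
```

For the first line above and the target zh-tw, this yields 天堂执法者 -> 檀島警騎2.0 and 夏威夷探案 -> 檀島警騎2.0.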

pub fn convert_to_as_wikitext( &self, text: &str, output: &mut String, secondary_converter_builder: &mut Option<ZhConverterBuilder<'_>>, skip_html_code_blocks: bool, apply_global_rules: bool )

Same as convert_as_wikitext, except that it takes a &mut String as the destination instead of returning a String.

pub fn count_matched(&self, text: &str) -> usize

Return the total length, in bytes, of the matched source words in the given text that would be substituted.
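
Because the count is in bytes rather than characters, a two-character word such as 香菜 contributes 6 in UTF-8. The following stdlib-only sketch shows the idea with a greedy longest-match scan; count_matched_bytes is a hypothetical stand-in that uses a plain set of source words instead of the crate's automaton, and the greedy matching order is an assumption.

```rust
use std::collections::HashSet;

/// Sum the UTF-8 byte lengths of source words found by a greedy
/// longest-match scan over `text`.
pub fn count_matched_bytes(sources: &HashSet<&str>, text: &str) -> usize {
    let mut total = 0;
    let mut rest = text;
    while let Some(ch) = rest.chars().next() {
        // Longest source word that is a prefix of the remaining text, if any.
        let best = sources
            .iter()
            .filter(|word| !word.is_empty() && rest.starts_with(**word))
            .map(|word| word.len())
            .max();
        match best {
            Some(len) => {
                total += len;
                rest = &rest[len..];
            }
            // No match here: skip one character (not one byte).
            None => rest = &rest[ch.len_utf8()..],
        }
    }
    total
}
```

Dividing such a count by text.len() gives a rough measure of how much of the input a ruleset would touch.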

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.