Struct zhconv::ZhConverter
pub struct ZhConverter { /* private fields */ }
A ZhConverter, built by ZhConverterBuilder.
Implementations
impl ZhConverter
pub fn new( automaton: CharwiseDoubleArrayAhoCorasick<u32>, target_words: Vec<String> ) -> ZhConverter
Create a new converter from an automaton and a mapping.
It is provided for convenience and is not expected to be called directly.
ZhConverterBuilder takes care of these details.
pub fn with_target_variant( automaton: CharwiseDoubleArrayAhoCorasick<u32>, target_words: Vec<String>, variant: Variant ) -> ZhConverter
Create a new converter from an automaton and a mapping, and specify the target
variant to be used by convert_as_wikitext_basic,
convert_as_wikitext_extended and related functions.
It is provided for convenience and is not expected to be called directly.
ZhConverterBuilder takes care of these details.
pub fn from_pairs( pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)> ) -> ZhConverter
Create a new converter from a sequence of (from, to) pairs.
It uses ZhConverterBuilder internally.
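To illustrate what a converter built from (from, to) pairs does, here is a minimal standard-library sketch. ToyConverter is a hypothetical stand-in, not zhconv's implementation: the real converter matches with an Aho-Corasick automaton, and the leftmost-longest matching used here is an assumption made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a converter built from (from, to) pairs.
/// NOT zhconv's implementation; it only mimics the from_pairs signature.
struct ToyConverter {
    rules: HashMap<String, String>,
    max_chars: usize, // length of the longest source word, in chars
}

impl ToyConverter {
    fn from_pairs(
        pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)>,
    ) -> Self {
        let rules: HashMap<String, String> = pairs
            .into_iter()
            .map(|(from, to)| (from.into(), to.into()))
            .collect();
        let max_chars = rules.keys().map(|k| k.chars().count()).max().unwrap_or(0);
        ToyConverter { rules, max_chars }
    }

    /// Scan left to right; at each position replace the longest matching
    /// source word (assumed policy), otherwise copy one char unchanged.
    fn convert(&self, text: &str) -> String {
        let chars: Vec<char> = text.chars().collect();
        let mut out = String::with_capacity(text.len());
        let mut i = 0;
        while i < chars.len() {
            let longest = self.max_chars.min(chars.len() - i);
            // Try the longest candidate first so longer source words win.
            let hit = (1..=longest).rev().find_map(|len| {
                let cand: String = chars[i..i + len].iter().collect();
                self.rules.get(&cand).map(|to| (len, to))
            });
            match hit {
                Some((len, to)) => {
                    out.push_str(to);
                    i += len;
                }
                None => {
                    out.push(chars[i]);
                    i += 1;
                }
            }
        }
        out
    }
}
```

With the pairs 香菜 -> 芫荽 and 菜 -> X, converting 香菜和菜 under these assumptions replaces the two-character word first and the single character afterwards.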
pub fn from_pairs_with_target_variant( variant: Variant, pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)> ) -> ZhConverter
Create a new converter from a sequence of (from, to) pairs.
Like from_pairs, but it also takes a target variant to be used by
convert_as_wikitext_basic, convert_as_wikitext_extended and related
functions.
It uses ZhConverterBuilder internally.
pub fn convert_to(&self, text: &str, output: &mut String)
Same as convert, except that it takes a &mut String as the destination instead of returning a String.
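The relationship between the returning form and the &mut String form can be sketched with free functions. This is an illustration only: the single rule 龙 -> 龍 is hypothetical, and the sketch assumes the destination is appended to rather than cleared.

```rust
/// Toy conversion with one hypothetical rule (龙 -> 龍).
/// Assumption: the result is appended to `output`, not written over it.
fn convert_to(text: &str, output: &mut String) {
    output.reserve(text.len());
    for ch in text.chars() {
        if ch == '龙' {
            output.push('龍');
        } else {
            output.push(ch);
        }
    }
}

/// The returning form is a thin wrapper around the buffer-taking form.
fn convert(text: &str) -> String {
    let mut output = String::with_capacity(text.len());
    convert_to(text, &mut output);
    output
}

/// Convert many lines into one buffer, avoiding a fresh String per line.
fn convert_lines(lines: &[&str]) -> String {
    let mut buf = String::new();
    for line in lines {
        convert_to(line, &mut buf);
        buf.push('\n');
    }
    buf
}
```

The buffer-taking form is mainly useful when many inputs are converted in a loop, since one allocation can be reused across calls.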
pub fn convert_with_secondary_converter( &self, text: &str, secondary_converter: &ZhConverter ) -> String
Convert a text along with a secondary converter.
Conversion rules in the secondary converter shadow the existing ones in the original
converter.
For example, if the original converter contains a rule 香菜 -> 芫荽 and the secondary
converter contains a rule 香菜 -> 鹽須, the latter takes effect and 香菜 is converted
to 鹽須.
The implementation matches the text against the two converters alternately, which degrades performance. It is better to build a new converter that combines both rulesets, especially when the secondary ruleset is non-trivial or the input text is large.
The worst-case time complexity of the implementation is O(n*m), where n is the length of
the text and m is the maximum length of the source words in the conversion rulesets (i.e.
brute force).
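The shadowing behavior and the O(n*m) bound can be illustrated with a brute-force sketch over two plain rule maps. It is not the crate's code: the HashMap representation, the helper names and the tie-break (the secondary rule wins unless the primary match is strictly longer) are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Longest rule in `rules` matching `chars` at `start`, trying at most
/// `max_len` chars. Returns (match length in chars, replacement).
fn longest_match<'a>(
    rules: &'a HashMap<String, String>,
    chars: &[char],
    start: usize,
    max_len: usize,
) -> Option<(usize, &'a str)> {
    let upper = max_len.min(chars.len() - start);
    (1..=upper).rev().find_map(|len| {
        let cand: String = chars[start..start + len].iter().collect();
        rules.get(&cand).map(|to| (len, to.as_str()))
    })
}

/// Brute force: up to m candidate lengths at each of n positions.
fn convert_with_secondary(
    text: &str,
    primary: &HashMap<String, String>,
    secondary: &HashMap<String, String>,
) -> String {
    let chars: Vec<char> = text.chars().collect();
    let max_len = primary
        .keys()
        .chain(secondary.keys())
        .map(|k| k.chars().count())
        .max()
        .unwrap_or(0);
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < chars.len() {
        let p = longest_match(primary, &chars, i, max_len);
        let s = longest_match(secondary, &chars, i, max_len);
        // Assumed tie-break: secondary wins unless primary is strictly longer.
        let chosen = match (p, s) {
            (Some(p), Some(s)) => Some(if p.0 > s.0 { p } else { s }),
            (p, s) => s.or(p),
        };
        match chosen {
            Some((len, to)) => {
                out.push_str(to);
                i += len;
            }
            None => {
                out.push(chars[i]);
                i += 1;
            }
        }
    }
    out
}
```

Merging both rule maps into one converter up front removes the double lookup at every position, which is the alternative the text above recommends.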
pub fn convert_to_with_secondary_converter( &self, text: &str, output: &mut String, secondary_converter: &ZhConverter )
Same as convert_with_secondary_converter, except
that it takes a &mut String as the destination instead of returning a String.
pub fn convert_as_wikitext_basic(&self, text: &str) -> String
Convert the given text, parsing and applying the ad hoc MediaWiki conversion rules in it.
Basic MediaWiki conversion rules like -{FOOBAR}- or -{zh-hant:FOO;zh-hans:BAR}- are
supported.
Unlike convert_to_as_wikitext_extended, rules
with additional flags like -{H|zh-hant:FOO;zh-hans:BAR}- that set global rules are simply
ignored. It also does not try to skip HTML code blocks like <code></code> and
<script></script>.
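As a rough illustration of the two basic rule forms, the sketch below resolves -{...}- spans for a chosen variant using only the standard library. It is a toy: text outside the rules is copied unchanged (a real converter would also convert it), flags are not handled, and falling back to the first listed value when the variant is absent is an assumption rather than MediaWiki's fallback logic. Both function names are hypothetical.

```rust
/// Resolve inline `-{...}-` rules for `variant` (e.g. "zh-hant").
fn apply_inline_rules(text: &str, variant: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(open) = rest.find("-{") {
        let after_open = &rest[open + 2..];
        let close = match after_open.find("}-") {
            Some(c) => c,
            None => break, // unterminated rule: the remainder is copied verbatim
        };
        out.push_str(&rest[..open]);
        out.push_str(resolve_rule(&after_open[..close], variant));
        rest = &after_open[close + 2..];
    }
    out.push_str(rest);
    out
}

/// Pick the text for `variant` from the inside of one rule.
fn resolve_rule<'a>(inner: &'a str, variant: &str) -> &'a str {
    if !inner.contains(':') {
        return inner; // `-{FOOBAR}-`: keep the text as written
    }
    let mut first = None;
    for item in inner.split(';') {
        if let Some((name, value)) = item.split_once(':') {
            if name.trim() == variant {
                return value.trim();
            }
            if first.is_none() {
                first = Some(value.trim());
            }
        }
    }
    // Assumed fallback: the first listed value, else the raw inner text.
    first.unwrap_or(inner)
}
```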
pub fn convert_as_wikitext_extended(&self, text: &str) -> String
Convert the given text, parsing and applying the ad hoc and global MediaWiki conversion rules in it.
Unlike convert_to_as_wikitext_basic, all flags
documented in Help:高级字词转换语法
are supported. It also tries to skip HTML code blocks such as <code></code> and
<script></script>.
Limitations
The internal implementation intentionally replicates the behavior of LanguageConverter.php in MediaWiki. However, it is not fully compliant with MediaWiki and provides NO PROTECTION against XSS attacks.
Compared to the plain convert, this is known to be MUCH SLOWER, an unavoidable consequence
of the implementation decisions made by MediaWiki.
pub fn convert_to_as_wikitext_basic(&self, text: &str, output: &mut String)
Same as convert_as_wikitext_basic, except that
it takes a &mut String as the destination
instead of returning a String.
pub fn convert_to_as_wikitext_extended(&self, text: &str, output: &mut String)
Same as convert_as_wikitext_extended, except
that it takes a &mut String as the destination instead of returning a String.
pub fn convert_as_wikitext( &self, text: &str, secondary_converter_builder: &mut Option<ZhConverterBuilder<'_>>, skip_html_code_blocks: bool, apply_global_rules: bool ) -> String
The general implementation of MediaWiki syntax-aware conversion.
Equivalent to convert_as_wikitext_basic if
addtional_conv_lines is empty and both skip_html_code_blocks and
apply_global_rules are set to false.
Otherwise, equivalent to convert_as_wikitext_extended.
addtional_conv_lines looks like:
zh-cn:天堂执法者; zh-hk:夏威夷探案; zh-tw:檀島警騎2.0;
zh-cn:史蒂芬·'史蒂夫'·麦格瑞特; zh-tw:史提夫·麥加雷; zh-hk:麥星帆;
zh-cn:丹尼尔·'丹尼/丹诺'·威廉姆斯; zh-tw:丹尼·威廉斯; zh-hk:韋丹尼;
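The shape of these lines can be made concrete with a small parser. This is a sketch of the line format only, using the standard library; parse_conv_line is a hypothetical helper and not part of the crate, and it assumes that neither variant names nor texts contain ';'.

```rust
/// Parse one rule line such as `zh-cn:天堂执法者; zh-hk:夏威夷探案;`
/// into (variant, text) pairs, in source order.
/// Hypothetical helper; assumes ';' never appears inside a text.
fn parse_conv_line(line: &str) -> Vec<(String, String)> {
    line.split(';')
        // Items without a ':' (such as the empty tail after the last ';')
        // are skipped.
        .filter_map(|item| item.split_once(':'))
        .map(|(variant, text)| (variant.trim().to_string(), text.trim().to_string()))
        .filter(|(variant, _)| !variant.is_empty())
        .collect()
}
```

Each line then describes one term, with one rendering per target variant.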
pub fn convert_to_as_wikitext( &self, text: &str, output: &mut String, secondary_converter_builder: &mut Option<ZhConverterBuilder<'_>>, skip_html_code_blocks: bool, apply_global_rules: bool )
Same as convert_as_wikitext, except
that it takes a &mut String as the destination instead of returning a String.
pub fn count_matched(&self, text: &str) -> usize
Count the sum of lengths of matched source words to be substituted in the given text, in bytes.
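Because the count is in bytes rather than characters, a CJK source word of two characters contributes 6 in UTF-8. The free function below sketches that accounting with the standard library. It takes the source words as a slice, which is an assumption for illustration (the method itself takes only the text), and it prefers the longest word at each position.

```rust
/// Sum of UTF-8 byte lengths of non-overlapping source-word matches,
/// scanning left to right and preferring the longest word at each position.
/// Toy version: the `sources` parameter is an assumption for illustration.
fn count_matched(text: &str, sources: &[&str]) -> usize {
    let mut total = 0;
    let mut rest = text;
    while let Some(first) = rest.chars().next() {
        // Byte length of the longest source word starting here, if any.
        let hit = sources
            .iter()
            .filter(|w| !w.is_empty() && rest.starts_with(**w))
            .map(|w| w.len())
            .max();
        let step = match hit {
            Some(len) => {
                total += len;
                len
            }
            // No match: skip exactly one char, staying on a char boundary.
            None => first.len_utf8(),
        };
        rest = &rest[step..];
    }
    total
}
```

Dividing such a count by the byte length of the text gives a rough measure of how much of the input a ruleset would rewrite.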