//! zhconv-rs converts Chinese between Traditional, Simplified and regional variants, using
//! rulesets sourced from [MediaWiki/Wikipedia](https://github.com/wikimedia/mediawiki/blob/master/includes/Languages/Data/ZhConversion.php)
//! and [OpenCC](https://github.com/BYVoid/OpenCC/tree/master/data),
//! which are merged, flattened and then precompiled into [Aho-Corasick](https://en.wikipedia.org/wiki/Aho–Corasick_algorithm)
//! automata by [daachorse](https://github.com/daac-tools/daachorse) for single-pass, linear-time
//! conversions.
//!
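/*!
The conversion idea described above can be illustrated with a minimal, std-only sketch. It
assumes leftmost-longest matching and uses a hypothetical function `convert_longest_match`
with a toy table. Neither is part of this crate, which relies on precompiled daachorse
automata rather than the naive lookup shown here.

```rust
use std::collections::HashMap;

/// Hypothetical helper: replace the longest table entry matching at each position,
/// scanning the text from left to right.
fn convert_longest_match(text: &str, table: &HashMap<&str, &str>) -> String {
    // The longest source word (in chars) bounds how far ahead we need to look.
    let max_len = table.keys().map(|k| k.chars().count()).max().unwrap_or(0);
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < chars.len() {
        let mut matched = false;
        // Try the longest candidate first, then progressively shorter ones.
        for len in (1..=max_len.min(chars.len() - i)).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            if let Some(replacement) = table.get(candidate.as_str()) {
                out.push_str(replacement);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            // No rule starts here: copy the character through unchanged.
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}
```

With a toy table that maps `后` to `後` and `皇后` to itself, the longer rule protects the
first `后` in `皇后在后面` while the second one is still converted. An automaton reaches the
same result without re-scanning candidates at every position.

*/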
//! As with MediaWiki and OpenCC, the accuracy is generally acceptable, but remains limited.
//! The converter optionally supports MediaWiki conversion syntax (ref: [1](https://zh.wikipedia.org/wiki/Module:CGroup),
//! [2](https://zh.wikipedia.org/wiki/Help:高级字词转换语法)).
//!
//! ## Usage
//! ```toml
//! [dependencies]
//! # Bundle converters prebuilt from conversion tables sourced from MediaWiki (GPLv2.0+).
//! zhconv = { version = ... } # by default, features = ["compress", "mediawiki"].
//! # Bundle converters prebuilt from conversion tables sourced from OpenCC instead (Apache2.0).
//! zhconv = { version = ..., default-features = false, features = ["compress", "opencc"]}
//! # Combine conversion tables for one or more specific target variant(s) arbitrarily.
//! zhconv = { version = ..., default-features = false, features = ["compress", "opencc-hant", "mediawiki-hant", "opencc-hans", "mediawiki-tw"]}
//! ```
//!
//! ### Example
//! Convert simply:
//! ```
//! # #[cfg(any(feature = "mediawiki", feature = "opencc"))]
//! # {
//! use zhconv::{zhconv, Variant};
//! assert_eq!(zhconv("天干物燥 小心火烛", "zh-Hant".parse().unwrap()), "天乾物燥 小心火燭");
//! assert_eq!(zhconv("鼠曲草", Variant::ZhHant), "鼠麴草");
//! assert_eq!(zhconv("阿拉伯联合酋长国", Variant::ZhHant), "阿拉伯聯合酋長國");
//! assert_eq!(zhconv("阿拉伯联合酋长国", Variant::ZhTW), "阿拉伯聯合大公國");
//! # }
//! ```
//!
//! Using MediaWiki conversion syntax:
//! ```
//! # #[cfg(any(feature = "mediawiki", feature = "opencc"))]
//! # {
//! use zhconv::{zhconv_mw, Variant};
//! assert_eq!(zhconv_mw("天-{干}-物燥 小心火烛", "zh-Hant".parse::<Variant>().unwrap()), "天干物燥 小心火燭");
//! assert_eq!(zhconv_mw("-{zh-tw:鼠麴草;zh-cn:香茅}-是菊科草本植物。", Variant::ZhCN), "香茅是菊科草本植物。");
//! assert_eq!(zhconv_mw("菊科草本植物包括-{zh-tw:鼠麴草;zh-cn:香茅;}-等。", Variant::ZhTW), "菊科草本植物包括鼠麴草等。");
//! # }
//! ```
//! And more (note that such global rules always apply globally regardless of their
//! location, unlike in MediaWiki where they affect only the text that follows):
//! ```
//! # #[cfg(any(feature = "mediawiki", feature = "opencc"))]
//! # {
//! use zhconv::{zhconv_mw, Variant};
//! assert_eq!(zhconv_mw("-{H|zh:馬;zh-cn:鹿;}-馬克思主義", Variant::ZhCN), "鹿克思主义"); // add
//! assert_eq!(zhconv_mw("&二極體\n-{-|zh-hans:二极管; zh-hant:二極體}-\n", Variant::ZhCN), "&二极体\n\n"); // remove
//! # }
//! ```
//!
//! To customize the converter & conversion with fine-grained control, see [`ZhConverterBuilder`].
//! (De)Serialization of compiled converters is not supported yet.
//!
//! Other useful functions:
//! ```
//! # #[cfg(any(feature = "mediawiki", feature = "opencc"))]
//! # {
//! use zhconv::{is_hans, is_hans_confidence, infer_variant, infer_variant_confidence};
//! assert!(is_hans("清乾隆嘉庆间刻本"));
//! assert!(!is_hans("秋冬濁而春夏清,晞於朝而生於夕"));
//! assert!(is_hans_confidence("滴瀝明花苑,葳蕤泫竹叢") < 0.5);
//! println!("{}", infer_variant("錦字緘愁過薊水,寒衣將淚到遼城"));
//! println!("{:?}", infer_variant_confidence("zhconv-rs 中文简繁及地區詞轉換"));
//! # }
//! ```
use for_wasm;
for_wasm!
pub use ;
pub use get_builtin_converter;
use *;
pub use get_builtin_tables;
pub use Variant;
pub const ENABLED_TARGET_VARIANTS: & = &;
/// Helper function for general conversion using built-in converters.
///
/// Built-in converters are pre-built, lazily loaded and cached for later use. For fine-grained
/// control and custom conversion rules, check [`ZhConverter`] and [`ZhConverterBuilder`].
/// Helper function for general conversion, activating MediaWiki conversion syntax support.
///
/// This function shares the same built-in converters as [`zhconv`](#method.zhconv), but
/// additionally supports conversion rules in MediaWiki syntax.
///
/// # Note
/// The implementation first scans the input text to extract possible global rules such as
/// `-{H|FOO BAR}-`.
/// If there are no global rules, the overall time complexity is `O(n + n)`.
/// Otherwise, the overall time complexity may degrade to `O(n + n * m)` in the worst case, where
/// `n` is the input text length and `m` is the maximum length of source words in the conversion
/// rulesets. If support for global rules is not needed, it is better to use
/// `get_builtin_converter(target).convert_as_wikitext_basic(text)` instead, which incurs no extra
/// overhead.
///
// /// Unlike the MediaWiki implementation, this crate uses an automaton, which makes it
// /// infeasible to mutate global rules during conversion. So the function always searches the
// /// text for global rules such as `-{H|FOO BAR}-` in the first pass. If such rules exist, it
// /// builds a new converter from scratch with built-in conversion tables, which **takes extra
// /// time**. Otherwise, it just picks a built-in converter. It then converts the text with the
// /// chosen converter while non-global rules are parsed and applied.
///
/// For fine-grained control and custom conversion rules, check [`ZhConverterBuilder`].
///
/// Although it is designed to replicate the behavior of the MediaWiki implementation, it is not
/// fully compliant.
/// Determine whether the given text looks like Simplified Chinese over Traditional Chinese.
///
/// Equivalent to `is_hans_confidence(text) > 0.5`.
/// Determine whether the given text looks like Simplified Chinese over Traditional Chinese.
///
/// The return value is a real number in the range `[0, 1]` (inclusive) that indicates
/// confidence level. A value close to 1 indicates high confidence. A value close to 0
/// indicates low confidence. `0.5` indicates undeterminable (half-half).
/// If there is not enough input, `NaN` is returned.
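/**
Because every comparison with `NaN` is false, a check such as `confidence > 0.5` silently
treats insufficient input as "not Simplified". A minimal sketch of telling the two cases
apart follows. The function `looks_simplified` is hypothetical (not part of this crate) and
`f32` is an assumed confidence type.

```rust
/// Hypothetical helper: `None` when the confidence is `NaN` (not enough input),
/// otherwise whether the confidence exceeds the `0.5` threshold.
fn looks_simplified(confidence: f32) -> Option<bool> {
    if confidence.is_nan() {
        None
    } else {
        Some(confidence > 0.5)
    }
}
```

A caller could pass the confidence value to such a helper and handle `None` explicitly
instead of falling through to the Traditional branch.
*/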
/// Determine the Chinese variant of the input text.
///
/// # Limitations
/// Since the built-in conversion tables do not have actual rules specific to `zh-SG` / `zh-MO` /
/// `zh-MY`, these variants are never returned.
///
/// The accuracy has not been assessed. Avoid relying on this for serious purposes.
/// Determine the Chinese variant of the input text with confidence.
///
/// # Returns
/// An array of `(variant, confidence_level)`, in descending order of `confidence_level`, where
/// `confidence_level` is in the range `[0, 1]` (inclusive). `NaN` is returned if there is not
/// enough input.
///
/// # Limitations
/// The returned `confidence_level` of script variants (`ZhHant` and `ZhHans`) is always greater
/// than that of region variants (`ZhTW`, `ZhCN` and `ZhHK`) with the current implementation.
///
/// The accuracy has not been assessed. Avoid relying on this for serious purposes.
// /// Note that, unlike [`is_hans_confidence`](is_hans_confidence), a `confidence_level` greater
// /// than `0.5` might not imply high enough likelihood.
/// A helper trait that truncates a str around a specified index in constant time (`O(1)`),
/// intended to be used with `is_hans` and similar functions.
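/**
A minimal sketch of constant-time truncation, assuming the cut is rounded up to the next
UTF-8 character boundary. The free function `truncate_at_boundary` is hypothetical, and the
actual trait may round differently.

```rust
/// Hypothetical helper: return a prefix of `s` that ends at the first char boundary
/// at or after `index`, or all of `s` if `index` is past the end.
fn truncate_at_boundary(s: &str, index: usize) -> &str {
    if index >= s.len() {
        return s;
    }
    let mut end = index;
    // A UTF-8 encoded char is at most 4 bytes long, so this loop runs at most 3 times.
    while !s.is_char_boundary(end) {
        end += 1;
    }
    &s[..end]
}
```

Slicing at a byte offset that is not a char boundary would panic, which is why the index is
adjusted before the slice is taken.
*/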