// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).
// Various collation-related algorithms and constants in this file are
// adapted from ICU4C and, therefore, are subject to the ICU license as
// described in LICENSE.
// https://github.com/unicode-org/icu4x/blob/main/documents/process/boilerplate.md#library-annotations
//! Comparing strings according to language-dependent conventions.
//!
//! This module is published as its own crate ([`icu_collator`](https://docs.rs/icu_collator/latest/icu_collator/))
//! and as part of the [`icu`](https://docs.rs/icu/latest/icu/) crate. See the latter for more details on the ICU4X project.
//!
//! `Collator` is the main structure of the component. It accepts a set of arguments
//! which allow it to collect necessary data from the data provider, and once
//! instantiated, can be used to compare strings.
//!
//! Refer to the ICU User Guide sections for Collation that give an
//! [introduction](https://unicode-org.github.io/icu/userguide/collation/) and explain
//! [basic concepts](https://unicode-org.github.io/icu/userguide/collation/concepts.html).
//!
//! # Examples
//!
//! As its most basic purpose, `Collator` offers locale-aware ordering:
//!
//! ```
//! use core::cmp::Ordering;
//! use icu::collator::{options::*, *};
//! use icu::locale::locale;
//!
//! let mut options = CollatorOptions::default();
//! options.strength = Some(Strength::Primary);
//! let collator_es =
//!     Collator::try_new(locale!("es-u-co-trad").into(), options).unwrap();
//!
//! // "pollo" > "polvo" in traditional Spanish
//! assert_eq!(collator_es.compare("pollo", "polvo"), Ordering::Greater);
//!
//! let mut options = CollatorOptions::default();
//! options.strength = Some(Strength::Primary);
//! let collator_en = Collator::try_new(locale!("en").into(), options).unwrap();
//!
//! // "pollo" < "polvo" according to English rules
//! assert_eq!(collator_en.compare("pollo", "polvo"), Ordering::Less);
//! ```
//!
//! ## Examples of `CollatorOptions`
//!
//! The [`CollatorOptions`] struct configures specific custom behavior for the `Collator`. See docs
//! for [`CollatorOptions`] for more details. Some basic descriptions and examples are below.
//!
//! ## Strength
//!
//! The collation strength indicates how many levels to compare. The primary
//! level considers base letters, i.e. 'a' and 'b' are unequal but 'E' and 'é'
//! are equal, with higher levels dealing with distinctions such as accents
//! and case.
//!
//! If a lower level isn't equal, the lower level is decisive.
//! If the comparison result is equal on one level,
//! but the collator's strength input value is higher than that,
//! then the collator comparison iteratively proceeds to the next higher level.
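//!
//! The level-by-level procedure described above can be sketched with a toy
//! comparison function. This is only an illustration under stated assumptions,
//! not how `Collator` is implemented: the weight table and the names
//! `toy_weights` and `toy_compare` are hypothetical, and real collation data
//! covers far more than a handful of letters.
//!
//! ```rust
//! use core::cmp::Ordering;
//!
//! // Hypothetical (primary, secondary, tertiary) weights for a few letters.
//! fn toy_weights(c: char) -> (u8, u8, u8) {
//!     match c {
//!         'a' => (1, 0, 0),
//!         'A' => (1, 0, 1),
//!         '\u{e0}' => (1, 1, 0), // à
//!         'b' => (2, 0, 0),
//!         'B' => (2, 0, 1),
//!         _ => (u8::MAX, 0, 0), // unknown characters sort last in this toy
//!     }
//! }
//!
//! // Compares level by level and stops after `max_level`
//! // (1 = primary only, 2 = up to secondary, 3 = up to tertiary).
//! fn toy_compare(left: &str, right: &str, max_level: usize) -> Ordering {
//!     for level in 0..max_level.min(3) {
//!         // Collect the weights of the current level for a whole string.
//!         let key = |s: &str| -> Vec<u8> {
//!             s.chars()
//!                 .map(|c| {
//!                     let w = toy_weights(c);
//!                     [w.0, w.1, w.2][level]
//!                 })
//!                 .collect()
//!         };
//!         let ordering = key(left).cmp(&key(right));
//!         if ordering != Ordering::Equal {
//!             return ordering; // the first level that differs is decisive
//!         }
//!     }
//!     Ordering::Equal
//! }
//! ```
//!
//! With this table, `toy_compare("a", "à", 1)` is `Equal` because only the
//! primary weights are consulted, while `toy_compare("a", "à", 2)` is `Less`.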
//!
//! Note that lowering the strength value given to the collator means that more user-perceptible
//! differences will compare as equal. This may make sense when sorting more complex structures
//! where the string to be compared is just one field, and ties between strings
//! that differ only in case, accent, or similar are resolved by comparing some
//! secondary field in the larger structure to be sorted.
//!
//! Therefore, if the sort is just a string sort without some other field for
//! resolving ties, lowering the strength means that factors that don't make
//! sense to the user (such as the order of items prior to sorting with a stable
//! sort algorithm or the internal details of a sorting algorithm that doesn't
//! provide the stability property) affect the relative order of strings that
//! do have user-perceptible differences particularly in accents or case.
//!
//! Lowering the strength is less of a performance optimization than it may seem
//! directly from the above description. As described above, in the case
//! of identical strings to be compared, the algorithm has to work through all
//! the levels, from primary up to the strength value given to the collator,
//! without an early exit. However, this collator implements an identical
//! prefix optimization, which examines the
//! code units of the strings to be compared to skip the identical prefix before
//! starting the actual collation algorithm. When the strings to be compared
//! are identical on the byte level, they are found to be equal without the
//! actual collation algorithm running at all! Therefore, the strength setting
//! only has an effect (whether order effect or performance effect) for
//! comparisons where the strings to be compared are not equal on the byte level
//! but are equal on the primary level/strength. The common cases are that
//! a comparison is decided on the primary level or the strings are byte
//! equal, which narrows the performance effect of lowering the strength
//! setting.
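//!
//! The idea behind the identical prefix optimization can be sketched as
//! follows. This is a simplified illustration, not the crate's actual code:
//! the function names are hypothetical, the fallback comparison is passed in
//! by the caller, and the sketch assumes it is safe to split the strings at
//! any `char` boundary. A real collator would presumably also have to avoid
//! splitting between characters that interact, such as a base letter and a
//! following combining mark.
//!
//! ```rust
//! use core::cmp::Ordering;
//!
//! // Byte length of the longest common prefix of `left` and `right`,
//! // backed off so that it ends on a `char` boundary.
//! fn toy_common_prefix_len(left: &str, right: &str) -> usize {
//!     let mut len = left
//!         .bytes()
//!         .zip(right.bytes())
//!         .take_while(|(l, r)| l == r)
//!         .count();
//!     // Never split a multi-byte UTF-8 sequence.
//!     while !left.is_char_boundary(len) {
//!         len -= 1;
//!     }
//!     len
//! }
//!
//! // Skips the shared prefix and only runs `compare_tails` (a stand-in for
//! // the expensive multi-level comparison) on what remains.
//! fn toy_compare_with_prefix_skip(
//!     left: &str,
//!     right: &str,
//!     compare_tails: impl Fn(&str, &str) -> Ordering,
//! ) -> Ordering {
//!     let skip = toy_common_prefix_len(left, right);
//!     if skip == left.len() && skip == right.len() {
//!         // Byte-identical strings: equal without any collation work.
//!         return Ordering::Equal;
//!     }
//!     compare_tails(&left[skip..], &right[skip..])
//! }
//! ```
//!
//! In this sketch, comparing `"pollo"` with `"polvo"` skips the shared `"pol"`
//! and hands only `"lo"` and `"vo"` to the fallback comparison.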
//!
//! ```
//! use core::cmp::Ordering;
//! use icu::collator::{options::*, *};
//!
//! // Primary Level
//!
//! let mut options_l1 = CollatorOptions::default();
//! options_l1.strength = Some(Strength::Primary);
//! let collator_l1 =
//!     Collator::try_new(Default::default(), options_l1).unwrap();
//!
//! assert_eq!(collator_l1.compare("a", "b"), Ordering::Less); // primary
//! assert_eq!(collator_l1.compare("as", "às"), Ordering::Equal); // secondary
//! assert_eq!(collator_l1.compare("às", "at"), Ordering::Less);
//! assert_eq!(collator_l1.compare("ao", "Ao"), Ordering::Equal); // tertiary
//! assert_eq!(collator_l1.compare("Ao", "aò"), Ordering::Equal);
//! assert_eq!(collator_l1.compare("A", "Ⓐ"), Ordering::Equal);
//!
//! // Secondary Level
//!
//! let mut options_l2 = CollatorOptions::default();
//! options_l2.strength = Some(Strength::Secondary);
//! let collator_l2 =
//!     Collator::try_new(Default::default(), options_l2).unwrap();
//!
//! assert_eq!(collator_l2.compare("a", "b"), Ordering::Less); // primary
//! assert_eq!(collator_l2.compare("as", "às"), Ordering::Less); // secondary
//! assert_eq!(collator_l2.compare("às", "at"), Ordering::Less);
//! assert_eq!(collator_l2.compare("ao", "Ao"), Ordering::Equal); // tertiary
//! assert_eq!(collator_l2.compare("Ao", "aò"), Ordering::Less);
//! assert_eq!(collator_l2.compare("A", "Ⓐ"), Ordering::Equal);
//!
//! // Tertiary Level
//!
//! let mut options_l3 = CollatorOptions::default();
//! options_l3.strength = Some(Strength::Tertiary);
//! let collator_l3 =
//!     Collator::try_new(Default::default(), options_l3).unwrap();
//!
//! assert_eq!(collator_l3.compare("a", "b"), Ordering::Less); // primary
//! assert_eq!(collator_l3.compare("as", "às"), Ordering::Less); // secondary
//! assert_eq!(collator_l3.compare("às", "at"), Ordering::Less);
//! assert_eq!(collator_l3.compare("ao", "Ao"), Ordering::Less); // tertiary
//! assert_eq!(collator_l3.compare("Ao", "aò"), Ordering::Less);
//! assert_eq!(collator_l3.compare("A", "Ⓐ"), Ordering::Less);
//! ```
//!
//! ## Alternate Handling
//!
//! Allows alternate handling for certain customized collation orderings, including the option to
//! ignore the special handling for the strings of such customizations. Specifically,
//! alternate handling is used to control the handling of the so-called **variable** characters in the
//! Unicode Collation Algorithm: whitespace, punctuation and symbols.
//!
//! Note that `AlternateHandling::ShiftTrimmed` and `AlternateHandling::Blanked` are
//! unimplemented. The default is `AlternateHandling::NonIgnorable`, except
//! for Thai, whose default is `AlternateHandling::Shifted`.
//!
//! ```
//! use core::cmp::Ordering;
//! use icu::collator::{*, options::*};
//!
//! // If alternate handling is set to `NonIgnorable`, then differences among
//! // these characters are of the same importance as differences among letters.
//!
//! let mut options_3n = CollatorOptions::default();
//! options_3n.strength = Some(Strength::Tertiary);
//! options_3n.alternate_handling = Some(AlternateHandling::NonIgnorable);
//! let collator_3n =
//!     Collator::try_new(Default::default(), options_3n).unwrap();
//!
//! assert_eq!(collator_3n.compare("di Silva", "Di Silva"), Ordering::Less);
//! assert_eq!(collator_3n.compare("Di Silva", "diSilva"), Ordering::Less);
//! assert_eq!(collator_3n.compare("diSilva", "U.S.A."), Ordering::Less);
//! assert_eq!(collator_3n.compare("U.S.A.", "USA"), Ordering::Less);
//!
//! // If alternate handling is set to `Shifted`, then these characters are of only minor
//! // importance. The Shifted value is often used in combination with Strength
//! // set to Quaternary.
//!
//! let mut options_3s = CollatorOptions::default();
//! options_3s.strength = Some(Strength::Tertiary);
//! options_3s.alternate_handling = Some(AlternateHandling::Shifted);
//! let collator_3s =
//!     Collator::try_new(Default::default(), options_3s).unwrap();
//!
//! assert_eq!(collator_3s.compare("di Silva", "diSilva"), Ordering::Equal);
//! assert_eq!(collator_3s.compare("diSilva", "Di Silva"), Ordering::Less);
//! assert_eq!(collator_3s.compare("Di Silva", "U.S.A."), Ordering::Less);
//! assert_eq!(collator_3s.compare("U.S.A.", "USA"), Ordering::Equal);
//!
//! let mut options_4s = CollatorOptions::default();
//! options_4s.strength = Some(Strength::Quaternary);
//! options_4s.alternate_handling = Some(AlternateHandling::Shifted);
//! let collator_4s =
//!     Collator::try_new(Default::default(), options_4s).unwrap();
//!
//! assert_eq!(collator_4s.compare("di Silva", "diSilva"), Ordering::Less);
//! assert_eq!(collator_4s.compare("diSilva", "Di Silva"), Ordering::Less);
//! assert_eq!(collator_4s.compare("Di Silva", "U.S.A."), Ordering::Less);
//! assert_eq!(collator_4s.compare("U.S.A.", "USA"), Ordering::Less);
//! ```
//!
//! ## Case Level
//!
//! Whether to distinguish case in sorting even at strengths below tertiary (primary or
//! secondary), without having to raise the strength to tertiary just to enable case differences.
//!
//! ```
//! use core::cmp::Ordering;
//! use icu::collator::{*, options::*};
//!
//! // Primary
//!
//! let mut options = CollatorOptions::default();
//! options.strength = Some(Strength::Primary);
//! options.case_level = Some(CaseLevel::Off);
//! let primary =
//!     Collator::try_new(Default::default(), options).unwrap();
//!
//! assert_eq!(primary.compare("ⓓⓔⓐⓛ", "DEAL"), Ordering::Equal);
//! assert_eq!(primary.compare("dejavu", "dejAvu"), Ordering::Equal);
//! assert_eq!(primary.compare("dejavu", "déjavu"), Ordering::Equal);
//!
//! // Primary with case level on
//!
//! options.strength = Some(Strength::Primary);
//! options.case_level = Some(CaseLevel::On);
//! let primary_and_case =
//!     Collator::try_new(Default::default(), options).unwrap();
//!
//! assert_eq!(primary_and_case.compare("ⓓⓔⓐⓛ", "DEAL"), Ordering::Less);
//! assert_eq!(primary_and_case.compare("dejavu", "dejAvu"), Ordering::Less);
//! assert_eq!(primary_and_case.compare("dejavu", "déjavu"), Ordering::Equal);
//!
//! // Secondary with case level on
//!
//! options.strength = Some(Strength::Secondary);
//! options.case_level = Some(CaseLevel::On);
//! let secondary_and_case =
//!     Collator::try_new(Default::default(), options).unwrap();
//!
//! assert_eq!(secondary_and_case.compare("ⓓⓔⓐⓛ", "DEAL"), Ordering::Less);
//! assert_eq!(secondary_and_case.compare("dejavu", "dejAvu"), Ordering::Less);
//! assert_eq!(secondary_and_case.compare("dejavu", "déjavu"), Ordering::Less); // secondary difference
//!
//! // Tertiary
//!
//! options.strength = Some(Strength::Tertiary);
//! options.case_level = Some(CaseLevel::Off);
//! let tertiary =
//!     Collator::try_new(Default::default(), options).unwrap();
//!
//! assert_eq!(tertiary.compare("ⓓⓔⓐⓛ", "DEAL"), Ordering::Less);
//! assert_eq!(tertiary.compare("dejavu", "dejAvu"), Ordering::Less);
//! assert_eq!(tertiary.compare("dejavu", "déjavu"), Ordering::Less);
//! ```
//!
//!
//! ## Backward second level
//!
//! Compare the second level in backward order. The default is `false` (off), except for Canadian
//! French.
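//!
//! To illustrate what comparing the second level backward means, here is a toy
//! sketch. It does not use `Collator`: the two-level weight table and the names
//! `toy_accent_weights` and `toy_compare_accents` are hypothetical and cover
//! only the letters needed for the example.
//!
//! ```rust
//! use core::cmp::Ordering;
//!
//! // Hypothetical (primary, secondary) weights; an accent only changes the
//! // secondary weight.
//! fn toy_accent_weights(c: char) -> (u8, u8) {
//!     match c {
//!         'c' => (1, 0),
//!         'e' => (2, 0),
//!         '\u{e9}' => (2, 1), // é
//!         'o' => (3, 0),
//!         '\u{f4}' => (3, 1), // ô
//!         't' => (4, 0),
//!         _ => (u8::MAX, 0), // unknown characters sort last in this toy
//!     }
//! }
//!
//! // Compares primary weights front to back, then secondary weights either
//! // front to back or, if `backward_second_level` is set, back to front.
//! fn toy_compare_accents(
//!     left: &str,
//!     right: &str,
//!     backward_second_level: bool,
//! ) -> Ordering {
//!     let primary = |s: &str| -> Vec<u8> {
//!         s.chars().map(|c| toy_accent_weights(c).0).collect()
//!     };
//!     let secondary = |s: &str| -> Vec<u8> {
//!         let mut weights: Vec<u8> =
//!             s.chars().map(|c| toy_accent_weights(c).1).collect();
//!         if backward_second_level {
//!             weights.reverse();
//!         }
//!         weights
//!     };
//!     primary(left)
//!         .cmp(&primary(right))
//!         .then_with(|| secondary(left).cmp(&secondary(right)))
//! }
//! ```
//!
//! In this sketch, "côte" sorts after "coté" when the second level is compared
//! forward, because the first accent difference is on the second letter. With
//! the backward comparison the last accent difference decides, so "côte" sorts
//! before "coté".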
//!
//! ## Examples of `CollatorPreferences`
//!
//! The [`CollatorPreferences`] struct configures specific custom behavior for the `Collator`, like
//! [`CollatorOptions`]. However, unlike `CollatorOptions`, this set of preferences can also be set
//! implicitly by the locale. See docs for [`CollatorPreferences`] for more details.
//! Some basic descriptions and examples are below.
//!
//! ## Case First
//!
//! Whether to swap the ordering of uppercase and lowercase.
//!
//! ```
//! use core::cmp::Ordering;
//! use icu::collator::preferences::*;
//! use icu::collator::{options::*, *};
//!
//! // No explicit case-first ordering (`CollationCaseFirst::False`).
//!
//! let mut prefs_no_case = CollatorPreferences::default();
//! prefs_no_case.case_first = Some(CollationCaseFirst::False);
//! let collator_no_case =
//!     Collator::try_new(prefs_no_case, Default::default()).unwrap();
//! assert_eq!(collator_no_case.compare("ab", "AB"), Ordering::Less);
//!
//! // Lowercase is less
//!
//! let mut prefs_lower_less = CollatorPreferences::default();
//! prefs_lower_less.case_first = Some(CollationCaseFirst::Lower);
//! let collator_lower_less =
//!     Collator::try_new(prefs_lower_less, Default::default()).unwrap();
//! assert_eq!(collator_lower_less.compare("ab", "AB"), Ordering::Less);
//!
//! // Uppercase is less
//!
//! let mut prefs_upper_greater = CollatorPreferences::default();
//! prefs_upper_greater.case_first = Some(CollationCaseFirst::Upper);
//! let collator_upper_greater =
//!     Collator::try_new(prefs_upper_greater, Default::default()).unwrap();
//! assert_eq!(collator_upper_greater.compare("AB", "ab"), Ordering::Less);
//! ```
//!
//! ## Numeric
//!
//! When set to `true` (on), any sequence of decimal
//! digits is sorted at a primary level according to the
//! numeric value.
//!
//! ```
//! use core::cmp::Ordering;
//! use icu::collator::preferences::*;
//! use icu::collator::{options::*, *};
//!
//! // Numerical sorting off
//!
//! let mut prefs_num_off = CollatorPreferences::default();
//! prefs_num_off.numeric_ordering = Some(CollationNumericOrdering::False);
//! let collator_num_off =
//!     Collator::try_new(prefs_num_off, Default::default()).unwrap();
//! assert_eq!(collator_num_off.compare("a10b", "a2b"), Ordering::Less);
//!
//! // Numerical sorting on
//!
//! let mut prefs_num_on = CollatorPreferences::default();
//! prefs_num_on.numeric_ordering = Some(CollationNumericOrdering::True);
//! let collator_num_on =
//!     Collator::try_new(prefs_num_on, Default::default()).unwrap();
//! assert_eq!(collator_num_on.compare("a10b", "a2b"), Ordering::Greater);
//! ```
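//!
//! Why the two settings above disagree can be seen in a toy sketch of numeric
//! ordering. It is not the crate's implementation: the names
//! `toy_take_number` and `toy_numeric_compare` are hypothetical, only ASCII
//! digits are recognized, and everything that is not a digit run is compared
//! by code point rather than by collation weights.
//!
//! ```rust
//! use core::cmp::Ordering;
//!
//! // Splits a leading run of ASCII digits off `s`, returning its numeric
//! // value and the rest. Returns `None` if `s` does not start with a digit
//! // or the value does not fit in a `u64`.
//! fn toy_take_number(s: &str) -> Option<(u64, &str)> {
//!     let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
//!     if end == 0 {
//!         return None;
//!     }
//!     let value: u64 = s[..end].parse().ok()?;
//!     Some((value, &s[end..]))
//! }
//!
//! // Compares digit runs by numeric value and everything else char by char.
//! fn toy_numeric_compare(mut left: &str, mut right: &str) -> Ordering {
//!     loop {
//!         match (toy_take_number(left), toy_take_number(right)) {
//!             (Some((l, l_rest)), Some((r, r_rest))) => {
//!                 if l != r {
//!                     return l.cmp(&r);
//!                 }
//!                 left = l_rest;
//!                 right = r_rest;
//!             }
//!             _ => {
//!                 let mut l_chars = left.chars();
//!                 let mut r_chars = right.chars();
//!                 match (l_chars.next(), r_chars.next()) {
//!                     (None, None) => return Ordering::Equal,
//!                     (None, Some(_)) => return Ordering::Less,
//!                     (Some(_), None) => return Ordering::Greater,
//!                     (Some(l), Some(r)) => {
//!                         if l != r {
//!                             return l.cmp(&r);
//!                         }
//!                         left = l_chars.as_str();
//!                         right = r_chars.as_str();
//!                     }
//!                 }
//!             }
//!         }
//!     }
//! }
//! ```
//!
//! Here `"a10b"` compares as greater than `"a2b"` because 10 is greater than 2,
//! whereas a plain character-by-character comparison stops at `'1' < '2'`.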
//!
//! [`CollatorOptions`]: options::CollatorOptions
extern crate alloc;
// NOTE: The Pernosco debugger has special knowledge
// of the `CharacterAndClass` struct inside the `elements`
// module. Please do not change the crate-module-qualified
// name of that struct without coordination.
pub use Collator;
pub use CollatorBorrowed;
pub use CollatorPreferences;
pub use CollationKeySink;
/// Locale preferences used by this crate