Module case


§Case folding

Case folding maps strings to a canonical form for string comparison. The result is typically lowercase, although characters in the Cherokee script fold to uppercase. Case folding isn’t context-, language-, or locale-sensitive, but you can specify whether to use the mappings for languages like Turkish.

Currently, only simple case folding is supported. Simple case folding maps each character to a single character, so it does not handle special cases where one letter corresponds to multiple characters. For example, Maße cannot match MASSE.
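
The limitation can be illustrated with the standard library alone. The sketch below is an assumption-laden stand-in, not this crate's implementation: one_to_one_fold_sketch() and demo_one_to_one_limit() are hypothetical names, and std's lowercase mapping is used in place of the real folding tables. It only shows that a one-char-in, one-char-out mapping preserves the character count, so a 4-character string can never equal a 5-character one.

```rust
/// Hypothetical helper (not part of ib_unicode): maps every char to exactly
/// one char via std's lowercase mapping. If std would expand a char into
/// several chars, the original char is kept unchanged.
fn one_to_one_fold_sketch(s: &str) -> String {
    s.chars()
        .map(|c| {
            let mut lower = c.to_lowercase();
            match (lower.next(), lower.next()) {
                (Some(l), None) => l, // exactly one lowercase char
                _ => c,               // multi-char expansion: keep as is
            }
        })
        .collect()
}

/// Hypothetical demo: shows why "Maße" and "MASSE" stay different.
fn demo_one_to_one_limit() {
    let a = one_to_one_fold_sketch("Maße");
    let b = one_to_one_fold_sketch("MASSE");
    // One char in, one char out: the number of chars is preserved.
    assert_eq!(a.chars().count(), 4);
    assert_eq!(b.chars().count(), 5);
    assert_ne!(a, b);
    // A one-to-many mapping, such as std's full uppercase, does expand ß.
    assert_eq!("Maße".to_uppercase(), "MASSE");
}
```

Matching such pairs requires a mapping that can expand ß into two characters, which is outside what simple case folding provides.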

The API is CharCaseExt::to_simple_fold_case() and StrCaseExt::to_simple_fold_case(), for example:

use ib_unicode::case::StrCaseExt;

assert_eq!("βίος".to_simple_fold_case(), "βίοσ");
assert_eq!("Βίοσ".to_simple_fold_case(), "βίοσ");
assert_eq!("ΒΊΟΣ".to_simple_fold_case(), "βίοσ");
  • Unicode version: 16.0.0.
  • Performance: The default implementation uses the same algorithm as the unicase crate, which is compact but a bit slow, especially on miss paths. You can enable the perf-case-fold feature to use a faster algorithm.

Simple case folding is also used by the regex crate.

§Mono lowercase

The “mono lowercase” mentioned in this module refers to the single-char lowercase mapping of a Unicode character. It differs from Unicode’s simple case folding in that it always results in lowercase characters and does not normalize different lowercase forms of a character to the same one (e.g. σ and ς are kept distinct).
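
To make the distinction concrete, here is a minimal std-only sketch. Both helpers are hypothetical approximations rather than this crate's code: mono_lower_sketch() takes the first char of std's char::to_lowercase(), and fold_sketch() additionally merges ς into σ, which is only one of the mappings that real simple case folding performs.

```rust
/// Hypothetical stand-in for a single-char lowercase mapping, built on std:
/// each char is replaced by the first char of its lowercase mapping.
fn mono_lower_sketch(s: &str) -> String {
    s.chars()
        .map(|c| c.to_lowercase().next().unwrap_or(c))
        .collect()
}

/// Hypothetical stand-in for folding: lowercase, then merge final sigma
/// (U+03C2) into sigma (U+03C3). Real simple case folding has more mappings
/// and keeps Cherokee letters uppercase; this sketch handles neither.
fn fold_sketch(s: &str) -> String {
    mono_lower_sketch(s)
        .chars()
        .map(|c| if c == '\u{3c2}' { '\u{3c3}' } else { c })
        .collect()
}

/// Hypothetical demo comparing the two mappings on a word with sigmas.
fn demo_sigma() {
    let upper = "\u{3a3}\u{39f}\u{3a6}\u{39f}\u{3a3}"; // ΣΟΦΟΣ
    let lower = "\u{3c3}\u{3bf}\u{3c6}\u{3bf}\u{3c2}"; // σοφος
    // Lowercasing keeps σ and ς apart, so the two spellings differ.
    assert_ne!(mono_lower_sketch(upper), mono_lower_sketch(lower));
    // Folding normalizes ς to σ, so the two spellings compare equal.
    assert_eq!(fold_sketch(upper), fold_sketch(lower));
}
```

In other words, a lowercase mapping is suited to producing lowercase text, while folding is suited to comparing strings.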

For example:

use ib_unicode::case::StrCaseExt;

assert_eq!("βίος".to_mono_lowercase(), "βίος");
assert_eq!("Βίοσ".to_mono_lowercase(), "βίοσ");
assert_eq!("ΒΊΟΣ".to_mono_lowercase(), "βίοσ");
  • Unicode version: 16.0.0.
  • Compared to char::to_lowercase()/str::to_lowercase() in std: the same, except that İ is mapped to i instead of i\u{307}.
    • Σ always maps to σ rather than conditionally to ς, unlike in str::to_lowercase(). This may change if the need arises.
    • to_mono_lowercase() is also much faster if the perf-case-map feature is enabled.
  • Compared to simple case folding: apart from normalization, the covered characters are essentially the same, except that simple case folding has no mapping for İ but does cover the following characters:
    • ΐ, ΐ
    • ΰ, ΰ
    • ſt, st
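
The two std behaviors mentioned in the comparison above can be checked directly. The sketch below uses only the standard library; per_char_lower_sketch() and demo_std_differences() are hypothetical names, and the per-char helper merely approximates a single-char mapping by taking the first char of char::to_lowercase(). It is not this crate's implementation.

```rust
/// Hypothetical per-char lowercase built on std: context-free, and always
/// exactly one output char per input char.
fn per_char_lower_sketch(s: &str) -> String {
    s.chars()
        .map(|c| c.to_lowercase().next().unwrap_or(c))
        .collect()
}

/// Hypothetical demo of where std's string lowercasing differs from a
/// single-char, context-free mapping.
fn demo_std_differences() {
    // İ (U+0130): std expands it to `i` plus a combining dot above.
    assert_eq!("\u{130}".to_lowercase(), "i\u{307}");
    // A single-char mapping keeps only the `i`.
    assert_eq!(per_char_lower_sketch("\u{130}"), "i");

    let word = "\u{3a3}\u{39f}\u{3a6}\u{39f}\u{3a3}"; // ΣΟΦΟΣ
    // str::to_lowercase() is context-sensitive: a word-final Σ becomes ς.
    assert_eq!(word.to_lowercase(), "\u{3c3}\u{3bf}\u{3c6}\u{3bf}\u{3c2}");
    // The per-char mapping has no context, so every Σ becomes σ.
    assert_eq!(per_char_lower_sketch(word), "\u{3c3}\u{3bf}\u{3c6}\u{3bf}\u{3c3}");
}
```

A one-to-one mapping never changes the number of characters in a string, which is a property str::to_lowercase() does not have.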

Traits§

CharCaseExt
StrCaseExt