#[non_exhaustive]
pub struct TransformOptions {
    pub unassigned_codepoint_handling: UnassignedCodepointHandling,
    pub ignore: bool,
    pub case_fold: bool,
    pub grapheme_boundary_markers: bool,
    pub compat: bool,
    pub composition: Option<CompositionOptions>,
    pub lump: bool,
    pub nlf_conversion: Option<NlfConversionMode>,
    pub strip_control_codes: bool,
    pub stable: bool,
}
Options for the map, decompose_buffer, and decompose_char functions.
Used to flexibly support multiple transformations through a single interface.
Some options are specific to composition/decomposition, and are stored in CompositionOptions.
Limitation
Certain options are only supported in the advanced interface, because they have the potential to produce invalid UTF-8.
This currently includes the grapheme_boundary_markers option, and unassigned_codepoint_handling set to UnassignedCodepointHandling::Allow.
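Fields (Non-exhaustive)
This struct is marked as non-exhaustive. Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
unassigned_codepoint_handling: UnassignedCodepointHandling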
Specify how to handle unassigned codepoints.
By default, this is set to UnassignedCodepointHandling::Forbid.
ignore: bool
Strip “default ignorable characters” such as SOFT-HYPHEN or ZERO-WIDTH-SPACE.
This is equivalent to the UTF8PROC_IGNORE option in the C library.
case_fold: bool
Apply Unicode case-folding, so that strings can be compared case-insensitively.
This is equivalent to the UTF8PROC_CASEFOLD option in the C library.
grapheme_boundary_markers: bool
Insert a marker value at the beginning of each sequence that represents a single grapheme cluster (see UAX#29).
This is only usable in the advanced interface, because it produces invalid UTF-8 or codepoints.
Using this option in the simple interface will panic.
The same functionality is also available through the crate::grapheme module.
This is equivalent to the UTF8PROC_CHARBOUND option in the C library.
compat: bool
Replace certain characters with their compatibility decomposition.
This is used to implement NFKD and NFKC Unicode normalization.
This is equivalent to the UTF8PROC_COMPAT option in the C library.
composition: Option<CompositionOptions>
If not None, enables composition/decomposition of characters.
Use CompositionOptions::compose and CompositionOptions::decompose for default compose/decompose options.
Equivalent to either UTF8PROC_COMPOSE or UTF8PROC_DECOMPOSE in the C library, depending on the CompositionDirection.
lump: bool
Lump certain characters together.
For example, HYPHEN U+2010 and MINUS U+2212 are converted to ASCII “-”.
Documented in lump.md in the utf8proc repository (link valid as of version v2.10.0).
If the nlf_conversion option is set, this includes a transformation of paragraph and line separators to ASCII line-feed (LF).
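The lumping rule described above can be sketched in plain Rust. This is only an illustration of the two characters named in the text plus the separator rule; lump_sketch and its nlf_conversion_set parameter are hypothetical names, not part of this crate, and the full mapping table is the one in lump.md.

```rust
/// Illustrative sketch only: covers the examples named in the prose,
/// not the full table from lump.md.
/// `nlf_conversion_set` stands in for "the nlf_conversion option is set".
fn lump_sketch(c: char, nlf_conversion_set: bool) -> char {
    match c {
        // HYPHEN (U+2010) and MINUS SIGN (U+2212) become ASCII '-'.
        '\u{2010}' | '\u{2212}' => '-',
        // LINE SEPARATOR and PARAGRAPH SEPARATOR become LF,
        // but only when NLF conversion is also requested.
        '\u{2028}' | '\u{2029}' if nlf_conversion_set => '\n',
        // Everything else passes through unchanged in this sketch.
        other => other,
    }
}

fn main() {
    let lumped: String = "a\u{2010}b\u{2212}c"
        .chars()
        .map(|c| lump_sketch(c, false))
        .collect();
    assert_eq!(lumped, "a-b-c");
}
```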
nlf_conversion: Option<NlfConversionMode>
Customize the conversion of NLF-sequences (LF, CRLF, CR, NEL).
If this is None, no conversions are applied.
Can be used to customize the strip_control_codes option.
strip_control_codes: bool
Strips and/or converts control characters.
NLF-sequences are transformed into spaces, unless the nlf_conversion option is specified.
HorizontalTab (HT) and FormFeed (FF) are treated as an NLF-sequence in this case.
All other control characters are simply removed.
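As a rough illustration of the rules above for the case where nlf_conversion is None, the following standalone sketch applies them with the standard library only. The function strip_control_codes_sketch is a hypothetical name and follows the prose as written; it is not the crate's implementation, which may treat individual characters differently.

```rust
/// Illustrative sketch of the documented behaviour with no NLF conversion:
/// NLF-sequences (LF, CRLF, CR, NEL), HT and FF become a single space,
/// and every other control character is dropped.
fn strip_control_codes_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // Treat CRLF as one sequence so it yields one space.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                out.push(' ');
            }
            // LF, NEL, HT and FF are each replaced by a space.
            '\n' | '\u{0085}' | '\t' | '\u{000C}' => out.push(' '),
            // Any remaining control character is removed.
            c if c.is_control() => {}
            c => out.push(c),
        }
    }
    out
}

fn main() {
    // CRLF -> one space, HT -> space, BEL (U+0007) -> removed.
    assert_eq!(strip_control_codes_sketch("a\r\nb\tc\u{0007}d"), "a b cd");
}
```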
stable: bool
Prohibit combining characters that would violate Unicode versioning stability.
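Because the struct is non-exhaustive, code outside the crate cannot build it with a struct literal. One common pattern is to start from a default value and assign the public fields that matter. The sketch below uses local stand-in types that mirror only a few of the documented fields; that the real TransformOptions implements Default is an assumption, and case_insensitive_options is a hypothetical helper.

```rust
// Hypothetical stand-ins for illustration; the real types live in the
// crate and have more fields than shown here.
#[allow(dead_code)]
#[derive(Clone, Debug, Default, PartialEq)]
enum UnassignedCodepointHandling {
    #[default]
    Forbid, // documented default
    Allow,
}

#[derive(Clone, Debug, Default)]
#[non_exhaustive] // only restricts code in *other* crates
struct TransformOptions {
    unassigned_codepoint_handling: UnassignedCodepointHandling,
    case_fold: bool,
    compat: bool,
    strip_control_codes: bool,
}

/// Start from defaults, then switch on individual options.
/// Assumes a `Default` implementation exists for the options type.
fn case_insensitive_options() -> TransformOptions {
    let mut opts = TransformOptions::default();
    opts.case_fold = true;
    opts.compat = true;
    opts
}

fn main() {
    let opts = case_insensitive_options();
    assert!(opts.case_fold && opts.compat);
    assert!(!opts.strip_control_codes);
    assert_eq!(
        opts.unassigned_codepoint_handling,
        UnassignedCodepointHandling::Forbid
    );
}
```

Assigning fields one by one keeps such code compiling if further options are added later, which is the point of the non-exhaustive marker.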
Trait Implementations
impl Clone for TransformOptions
fn clone(&self) -> TransformOptions
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.