#[non_exhaustive]
pub struct TransformOptions {
pub unassigned_codepoint_handling: UnassignedCodepointHandling,
pub ignore: bool,
pub case_fold: bool,
pub grapheme_boundary_markers: bool,
pub compat: bool,
pub composition: Option<CompositionOptions>,
pub lump: bool,
pub nlf_conversion: Option<NlfConversionMode>,
pub strip_control_codes: bool,
pub stable: bool,
}
Options for the map, decompose_buffer, and decompose_char functions.
Used to flexibly support multiple transformations through a single interface.
Some options are specific to composition/decomposition,
and are stored in CompositionOptions.
§Limitation
Certain options are only supported in the advanced interface, because they can produce invalid UTF-8.
This currently includes the grapheme_boundary_markers option,
and unassigned_codepoint_handling set to UnassignedCodepointHandling::Allow.
Fields (Non-exhaustive)§
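This struct is marked as non-exhaustive
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
unassigned_codepoint_handling: UnassignedCodepointHandling
Specify how to handle unassigned codepoints.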
By default, this is set to UnassignedCodepointHandling::Forbid.
ignore: bool
Strip “default ignorable characters” such as SOFT-HYPHEN or ZERO-WIDTH-SPACE.
This is equivalent to the UTF8PROC_IGNORE option in the C library.
case_fold: bool
Apply Unicode case-folding, which enables case-insensitive string comparison.
This is equivalent to the UTF8PROC_CASEFOLD option in the C library.
grapheme_boundary_markers: bool
Insert a marker value at the beginning of each sequence that represents a single grapheme cluster (see UAX #29).
This is only usable in the advanced interface,
because it produces invalid UTF-8 or invalid codepoints.
Using this option in the simple interface will panic.
The same functionality is also available through the crate::grapheme module.
This is equivalent to the UTF8PROC_CHARBOUND option in the C library.
compat: bool
Replace certain characters with their compatibility decomposition.
This is used to implement NFKD and NFKC Unicode normalization.
This is equivalent to the UTF8PROC_COMPAT option in the C library.
composition: Option<CompositionOptions>
If not None, enables composition/decomposition of characters.
Use CompositionOptions::compose and CompositionOptions::decompose
for default compose/decompose options.
Equivalent to either UTF8PROC_COMPOSE or UTF8PROC_DECOMPOSE in the C library,
depending on the CompositionDirection.
lump: bool
Lump certain characters together.
For example, HYPHEN U+2010 and MINUS U+2212 are converted to ASCII “-”.
Documented in lump.md in the utf8proc repository (link valid as of version v2.10.0).
If the nlf_conversion option is set,
this includes a transformation of paragraph and
line separators to ASCII line-feed (LF).
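The lumping rule described above can be sketched with a small stand-in function. This is not the crate's implementation: `lump_char` and `lump` are hypothetical helpers, the table covers only the characters named on this page, and the `convert_separators` flag stands in for "the nlf_conversion option is set".

```rust
/// Hypothetical helper: an illustrative subset of a lump mapping.
/// The full table lives in lump.md in the utf8proc repository.
fn lump_char(c: char, convert_separators: bool) -> char {
    match c {
        // HYPHEN U+2010 and MINUS U+2212 become ASCII '-'.
        '\u{2010}' | '\u{2212}' => '-',
        // LINE SEPARATOR and PARAGRAPH SEPARATOR become LF,
        // but only when NLF conversion is requested.
        '\u{2028}' | '\u{2029}' if convert_separators => '\n',
        other => other,
    }
}

/// Hypothetical helper: apply the mapping to every character of a string.
fn lump(input: &str, convert_separators: bool) -> String {
    input
        .chars()
        .map(|c| lump_char(c, convert_separators))
        .collect()
}

fn main() {
    assert_eq!(lump("a\u{2010}b \u{2212}1", false), "a-b -1");
    assert_eq!(lump("x\u{2028}y", true), "x\ny");
    // Without NLF conversion the separators are left untouched.
    assert_eq!(lump("x\u{2028}y", false), "x\u{2028}y");
}
```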
nlf_conversion: Option<NlfConversionMode>
Customize the conversion of NLF-sequences (LF, CRLF, CR, NEL).
If this is None, no conversions are applied.
Can be used to customize the strip_control_codes option.
strip_control_codes: bool
Strips and/or converts control characters.
NLF-sequences are transformed into spaces, unless the
nlf_conversion option is specified.
HorizontalTab (HT) and FormFeed (FF)
are treated as NLF-sequences in this case.
All other control characters are simply removed.
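As a rough illustration of the rule above when nlf_conversion is None, the following stand-in mirrors the documented behavior on plain Rust strings. The function `strip_control_codes` here is a hypothetical helper, not the crate's code, and it assumes that "control characters" means the characters for which `char::is_control` returns true.

```rust
/// Hypothetical helper sketching the documented rule (nlf_conversion = None).
fn strip_control_codes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // CRLF is a single NLF-sequence, so consume the LF as well.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                out.push(' ');
            }
            // LF and NEL are NLF-sequences; HT and FF are treated like them.
            '\n' | '\u{0085}' | '\t' | '\u{000C}' => out.push(' '),
            // Every other control character is dropped.
            c if c.is_control() => {}
            c => out.push(c),
        }
    }
    out
}

fn main() {
    assert_eq!(strip_control_codes("a\r\nb"), "a b");
    assert_eq!(strip_control_codes("a\tb\u{000C}c"), "a b c");
    assert_eq!(strip_control_codes("a\u{0007}b\u{001B}"), "ab");
}
```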
stable: bool
Prohibit combining characters that would violate Unicode versioning stability.
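Because the struct is non-exhaustive, code in other crates cannot build it with a struct literal and has to start from an existing value, then assign the public fields it cares about. The sketch below shows that pattern on a cut-down stand-in type. The stand-in, its `Default` impl, and the `case_insensitive_options` helper are assumptions for illustration; this page only confirms a `Clone` impl for the real type, and `#[non_exhaustive]` has no effect inside the defining crate.

```rust
// Stand-in for illustration only: a cut-down mirror of the documented struct.
// The `Default` derive is an assumption, not something this page confirms.
#[non_exhaustive]
#[derive(Clone, Debug, Default)]
pub struct TransformOptions {
    pub ignore: bool,
    pub case_fold: bool,
    pub compat: bool,
    pub lump: bool,
    pub strip_control_codes: bool,
    pub stable: bool,
}

/// Hypothetical helper: start from a base value and assign public fields.
/// This keeps compiling when new fields are added to the struct later.
pub fn case_insensitive_options() -> TransformOptions {
    let mut opts = TransformOptions::default();
    opts.case_fold = true;
    opts.ignore = true;
    opts
}

fn main() {
    let opts = case_insensitive_options();
    // Destructure with a `..` wildcard, as non-exhaustive structs require.
    let TransformOptions { case_fold, ignore, .. } = opts.clone();
    assert!(case_fold && ignore);
    assert!(!opts.lump);
    println!("{opts:?}");
}
```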
Trait Implementations§
impl Clone for TransformOptions
fn clone(&self) -> TransformOptions
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.