Crate utf8proc_sys

Unsafe native bindings to the utf8proc library.

WARNING: Right now this crate only supports static linking.

Structs§

__BindgenBitfieldUnit
utf8proc_bidi_class_t: Bidirectional character classes.
utf8proc_boundclass_t: Boundclass property. (TR29)
utf8proc_category_t: Unicode categories.
utf8proc_decomp_type_t: Decomposition type.
utf8proc_indic_conjunct_break_t: Indic_Conjunct_Break property. (TR44)
utf8proc_option_t: Option flags used by several functions in the library.
utf8proc_property_struct: Struct containing information about a codepoint.

Constants§

UTF8PROC_ERROR_INVALIDOPTS: Invalid options have been used.
UTF8PROC_ERROR_INVALIDUTF8: The given string is not a legal UTF-8 string.
UTF8PROC_ERROR_NOMEM: Memory could not be allocated.
UTF8PROC_ERROR_NOTASSIGNED: The UTF8PROC_REJECTNA flag was set and an unassigned codepoint was found.
UTF8PROC_ERROR_OVERFLOW: The given string is too long to be processed.
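
The error constants above lend themselves to a small Result wrapper on the Rust side. The following is a minimal sketch, not part of this crate: the numeric values are assumptions taken from the upstream utf8proc.h header, the convention that failures are reported as negative return values is likewise assumed, and `Utf8procError` and `check` are hypothetical names introduced for illustration.

```rust
// Assumed values, mirroring the upstream utf8proc.h header rather than this
// page; real code should use the constants exported by utf8proc_sys.
const ERROR_NOMEM: isize = -1;
const ERROR_OVERFLOW: isize = -2;
const ERROR_INVALIDUTF8: isize = -3;
const ERROR_NOTASSIGNED: isize = -4;
const ERROR_INVALIDOPTS: isize = -5;

/// Hypothetical safe-side error type, one variant per documented constant.
#[derive(Debug, PartialEq, Eq)]
enum Utf8procError {
    NoMem,
    Overflow,
    InvalidUtf8,
    NotAssigned,
    InvalidOpts,
    /// Any negative code this sketch does not know about.
    Unknown(isize),
}

/// Turn a signed return value into a Result: non-negative values are passed
/// through as a length, negative values are mapped to an error variant.
fn check(ret: isize) -> Result<usize, Utf8procError> {
    if ret >= 0 {
        return Ok(ret as usize);
    }
    Err(match ret {
        ERROR_NOMEM => Utf8procError::NoMem,
        ERROR_OVERFLOW => Utf8procError::Overflow,
        ERROR_INVALIDUTF8 => Utf8procError::InvalidUtf8,
        ERROR_NOTASSIGNED => Utf8procError::NotAssigned,
        ERROR_INVALIDOPTS => Utf8procError::InvalidOpts,
        other => Utf8procError::Unknown(other),
    })
}
```

A caller would pass the raw return value of a binding through `check` before touching any output pointer, so that a failed call never leads to reading an unset buffer.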

Statics§

utf8proc_utf8class^⚠: Array containing the byte lengths of a UTF-8 encoded codepoint based on the first byte.
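
To illustrate what a first-byte length table contains, here is a minimal sketch in safe Rust. It does not read the static above; `utf8_class` and `build_table` are hypothetical helpers, and the choice of 0 for continuation bytes and invalid lead bytes is an assumption about the upstream table.

```rust
/// Length of the UTF-8 sequence introduced by `first`, derived from its
/// leading bits. Continuation bytes (0x80..=0xBF) and bytes that cannot start
/// a sequence map to 0 in this sketch.
const fn utf8_class(first: u8) -> u8 {
    match first {
        0x00..=0x7F => 1, // 0xxxxxxx: ASCII
        0xC0..=0xDF => 2, // 110xxxxx
        0xE0..=0xEF => 3, // 1110xxxx
        0xF0..=0xF7 => 4, // 11110xxx
        _ => 0,           // continuation or invalid lead byte
    }
}

/// Build the full 256-entry lookup table, indexed by the first byte.
fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    for (byte, slot) in table.iter_mut().enumerate() {
        *slot = utf8_class(byte as u8);
    }
    table
}
```

Indexing the table with the first byte of a sequence gives the number of bytes to read, which is why a single array lookup is enough to step through well-formed UTF-8.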

Functions§

utf8proc_NFC^⚠: NFC normalization (UTF8PROC_COMPOSE).
utf8proc_NFD^⚠: NFD normalization (UTF8PROC_DECOMPOSE).
utf8proc_NFKC^⚠: NFKC normalization (UTF8PROC_COMPOSE and UTF8PROC_COMPAT).
utf8proc_NFKC_Casefold^⚠: NFKC_Casefold normalization (UTF8PROC_COMPOSE, UTF8PROC_COMPAT, UTF8PROC_CASEFOLD, and UTF8PROC_IGNORE).
utf8proc_NFKD^⚠: NFKD normalization (UTF8PROC_DECOMPOSE and UTF8PROC_COMPAT).
utf8proc_category^⚠: Return the Unicode category for the codepoint (one of the utf8proc_category_t constants).
utf8proc_category_string^⚠: Return the two-letter (nul-terminated) Unicode category string for the codepoint (e.g. "Lu" or "Co").
utf8proc_charwidth^⚠: Given a codepoint, return a character width analogous to wcwidth(codepoint), except that a width of 0 is returned for non-printable codepoints instead of -1 as in wcwidth.
utf8proc_charwidth_ambiguous^⚠: Given a codepoint, return whether it has East Asian width class A (Ambiguous).
utf8proc_codepoint_valid^⚠: Check if a codepoint is valid (regardless of whether it has been assigned a value by the current Unicode standard).
utf8proc_decompose^⚠: The same as utf8proc_decompose_char(), but acts on a whole UTF-8 string and orders the decomposed sequences correctly.
utf8proc_decompose_char^⚠: Decompose a codepoint into an array of codepoints.
utf8proc_decompose_custom^⚠: The same as utf8proc_decompose(), but also takes a custom_func mapping function that is called on each codepoint in str before any other transformations (along with a custom_data pointer that is passed through to custom_func). The custom_func argument is ignored if it is NULL. See also utf8proc_map_custom().
utf8proc_encode_char^⚠: Encodes the codepoint as a UTF-8 string in the byte array pointed to by dst. This array must be at least 4 bytes long.
utf8proc_errmsg^⚠: Returns an informative error string for the given utf8proc error code (e.g. the error codes returned by utf8proc_map()).
utf8proc_get_property^⚠: Look up the properties for a given codepoint.
utf8proc_grapheme_break^⚠: Same as utf8proc_grapheme_break_stateful(), except without support for the Unicode 9 additions to the algorithm. Supported for legacy reasons.
utf8proc_grapheme_break_stateful^⚠: Given a pair of consecutive codepoints, return whether a grapheme break is permitted between them (as defined by the extended grapheme clusters in UAX#29).
utf8proc_islower^⚠: Given a codepoint c, return 1 if the codepoint corresponds to a lower-case character and 0 otherwise.
utf8proc_isupper^⚠: Given a codepoint c, return 1 if the codepoint corresponds to an upper-case character and 0 otherwise.
utf8proc_iterate^⚠: Reads a single codepoint from the UTF-8 sequence being pointed to by str. The maximum number of bytes read is strlen, unless strlen is negative (in which case up to 4 bytes are read).
utf8proc_map^⚠: Maps the given UTF-8 string pointed to by str to a new UTF-8 string, allocated dynamically by malloc and returned via dstptr.
utf8proc_map_custom^⚠: Like utf8proc_map(), but also takes a custom_func mapping function that is called on each codepoint in str before any other transformations (along with a custom_data pointer that is passed through to custom_func). The custom_func argument is ignored if it is NULL.
utf8proc_normalize_utf32^⚠: Normalizes the sequence of length codepoints pointed to by buffer in-place (i.e., the result is also stored in buffer).
utf8proc_reencode^⚠: Reencodes the sequence of length codepoints pointed to by buffer as UTF-8 data in place (i.e., the result is also stored in buffer). Can optionally normalize the UTF-32 sequence prior to UTF-8 conversion.
utf8proc_tolower^⚠: Given a codepoint c, return the codepoint of the corresponding lower-case character, if any; otherwise (if there is no lower-case variant, or if c is not a valid codepoint) return c.
utf8proc_totitle^⚠: Given a codepoint c, return the codepoint of the corresponding title-case character, if any; otherwise (if there is no title-case variant, or if c is not a valid codepoint) return c.
utf8proc_toupper^⚠: Given a codepoint c, return the codepoint of the corresponding upper-case character, if any; otherwise (if there is no upper-case variant, or if c is not a valid codepoint) return c.
utf8proc_unicode_version^⚠: Returns the utf8proc supported Unicode version as a string MAJOR.MINOR.PATCH.
utf8proc_version^⚠: Returns the utf8proc API version as a string MAJOR.MINOR.PATCH (http://semver.org format), possibly with a “-dev” suffix for development versions.
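
The contracts of utf8proc_encode_char and utf8proc_iterate (a destination of at least 4 bytes, and at most 4 bytes read per codepoint) can be sketched with the standard library alone. This is an illustration in safe Rust, not a call into the bindings: `encode_char` and `iterate` are hypothetical helpers, they reject surrogate values because Rust's `char` does, and they report failure with 0 or `None` rather than with the library's error codes.

```rust
/// Encode `cp` as UTF-8 into `dst` and return the number of bytes written.
/// Returns 0 when `cp` is not a Unicode scalar value. A 4-byte buffer is
/// always sufficient because no codepoint needs more than 4 bytes in UTF-8.
fn encode_char(cp: u32, dst: &mut [u8; 4]) -> usize {
    match char::from_u32(cp) {
        Some(c) => c.encode_utf8(dst).len(),
        None => 0,
    }
}

/// Decode the first codepoint of `bytes`, returning the codepoint and the
/// number of bytes it occupies, or `None` if the input is empty or does not
/// begin with a well-formed sequence.
fn iterate(bytes: &[u8]) -> Option<(u32, usize)> {
    // Only the first 4 bytes can belong to the first codepoint.
    let prefix = &bytes[..bytes.len().min(4)];
    let valid = match std::str::from_utf8(prefix) {
        Ok(s) => s,
        // Keep the well-formed part before the first error, if any.
        Err(e) => std::str::from_utf8(&prefix[..e.valid_up_to()]).ok()?,
    };
    let c = valid.chars().next()?;
    Some((c as u32, c.len_utf8()))
}
```

Encoding U+20AC with `encode_char` and passing the written bytes to `iterate` returns the same codepoint with a length of 3, which is the round trip the two C functions are designed to support.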

Type Aliases§

utf8proc_bool
utf8proc_custom_func: Function pointer type passed to utf8proc_map_custom() and utf8proc_decompose_custom(), which is used to specify a user-defined mapping of codepoints to be applied in conjunction with other mappings.
utf8proc_int8_t
utf8proc_int16_t
utf8proc_int32_t
utf8proc_property_t: Struct containing information about a codepoint.
utf8proc_propval_t: Holds the value of a property.
utf8proc_size_t
utf8proc_ssize_t
utf8proc_uint8_t
utf8proc_uint16_t
utf8proc_uint32_t
