Unsafe native bindings to the utf8proc library.
WARNING: Right now this crate only supports static linking.
Structs
- `__BindgenBitfieldUnit`
- `utf8proc_bidi_class_t`: Bidirectional character classes.
- `utf8proc_boundclass_t`: Boundclass property. (TR29)
- `utf8proc_category_t`: Unicode categories.
- `utf8proc_decomp_type_t`: Decomposition type.
- `utf8proc_indic_conjunct_break_t`: Indic_Conjunct_Break property. (TR44)
- `utf8proc_option_t`: Option flags used by several functions in the library.
- `utf8proc_property_struct`: Struct containing information about a codepoint.
Constants
- `UTF8PROC_ERROR_INVALIDOPTS`: Invalid options have been used.
- `UTF8PROC_ERROR_INVALIDUTF8`: The given string is not a legal UTF-8 string.
- `UTF8PROC_ERROR_NOMEM`: Memory could not be allocated.
- `UTF8PROC_ERROR_NOTASSIGNED`: The `UTF8PROC_REJECTNA` flag was set and an unassigned codepoint was found.
- `UTF8PROC_ERROR_OVERFLOW`: The given string is too long to be processed.
Statics
- `utf8proc_utf8class` ⚠: Array containing the byte lengths of a UTF-8 encoded codepoint based on the first byte.
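The idea of a first-byte length lookup can be sketched in safe Rust without touching the static. This is a minimal sketch and not the library's table: the function name `utf8_len_from_first_byte` is hypothetical, and returning 0 for bytes that cannot start a sequence is an assumption of this sketch.

```rust
/// Hypothetical helper (not part of this crate): number of bytes in the
/// UTF-8 sequence introduced by `first`, derived from its leading bits.
/// Assumption: 0 marks a byte that cannot start a sequence.
fn utf8_len_from_first_byte(first: u8) -> u8 {
    match first {
        0x00..=0x7F => 1, // 0xxxxxxx: ASCII
        0xC0..=0xDF => 2, // 110xxxxx: lead byte of a 2-byte sequence
        0xE0..=0xEF => 3, // 1110xxxx: lead byte of a 3-byte sequence
        0xF0..=0xF7 => 4, // 11110xxx: lead byte of a 4-byte sequence
        _ => 0,           // continuation byte (10xxxxxx) or 0xF8..=0xFF
    }
}
```

The function only classifies the lead byte. It does not validate the sequence, so overlong lead bytes such as 0xC0 still report a length of 2.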
Functions
- `utf8proc_NFC` ⚠: NFC normalization (`UTF8PROC_COMPOSE`).
- `utf8proc_NFD` ⚠: Unicode normalization: NFD normalization (`UTF8PROC_DECOMPOSE`).
- `utf8proc_NFKC` ⚠: NFKC normalization (`UTF8PROC_COMPOSE` and `UTF8PROC_COMPAT`).
- `utf8proc_NFKC_Casefold` ⚠: NFKC_Casefold normalization (`UTF8PROC_COMPOSE`, `UTF8PROC_COMPAT`, `UTF8PROC_CASEFOLD`, and `UTF8PROC_IGNORE`).
- `utf8proc_NFKD` ⚠: NFKD normalization (`UTF8PROC_DECOMPOSE` and `UTF8PROC_COMPAT`).
- `utf8proc_category` ⚠: Return the Unicode category for the codepoint (one of the `utf8proc_category_t` constants).
- `utf8proc_category_string` ⚠: Return the two-letter (nul-terminated) Unicode category string for the codepoint (e.g. `"Lu"` or `"Co"`).
- `utf8proc_charwidth` ⚠: Given a codepoint, return a character width analogous to `wcwidth(codepoint)`, except that a width of 0 is returned for non-printable codepoints instead of -1 as in `wcwidth`.
- `utf8proc_charwidth_ambiguous` ⚠: Given a codepoint, return whether it has East Asian width class A (Ambiguous).
- `utf8proc_codepoint_valid` ⚠: Check if a codepoint is valid (regardless of whether it has been assigned a value by the current Unicode standard).
- `utf8proc_decompose` ⚠: The same as `utf8proc_decompose_char()`, but acts on a whole UTF-8 string and orders the decomposed sequences correctly.
- `utf8proc_decompose_char` ⚠: Decompose a codepoint into an array of codepoints.
- `utf8proc_decompose_custom` ⚠: The same as `utf8proc_decompose()`, but also takes a `custom_func` mapping function that is called on each codepoint in `str` before any other transformations (along with a `custom_data` pointer that is passed through to `custom_func`). The `custom_func` argument is ignored if it is `NULL`. See also `utf8proc_map_custom()`.
- `utf8proc_encode_char` ⚠: Encodes the codepoint as a UTF-8 string in the byte array pointed to by `dst`. This array must be at least 4 bytes long.
- `utf8proc_errmsg` ⚠: Returns an informative error string for the given utf8proc error code (e.g. the error codes returned by `utf8proc_map()`).
- `utf8proc_get_property` ⚠: Look up the properties for a given codepoint.
- `utf8proc_grapheme_break` ⚠: Same as `utf8proc_grapheme_break_stateful()`, except without support for the Unicode 9 additions to the algorithm. Supported for legacy reasons.
- `utf8proc_grapheme_break_stateful` ⚠: Given a pair of consecutive codepoints, return whether a grapheme break is permitted between them (as defined by the extended grapheme clusters in UAX#29).
- `utf8proc_islower` ⚠: Given a codepoint `c`, return `1` if the codepoint corresponds to a lower-case character and `0` otherwise.
- `utf8proc_isupper` ⚠: Given a codepoint `c`, return `1` if the codepoint corresponds to an upper-case character and `0` otherwise.
- `utf8proc_iterate` ⚠: Reads a single codepoint from the UTF-8 sequence being pointed to by `str`. The maximum number of bytes read is `strlen`, unless `strlen` is negative (in which case up to 4 bytes are read).
- `utf8proc_map` ⚠: Maps the given UTF-8 string pointed to by `str` to a new UTF-8 string, allocated dynamically by `malloc` and returned via `dstptr`.
- `utf8proc_map_custom` ⚠: Like `utf8proc_map()`, but also takes a `custom_func` mapping function that is called on each codepoint in `str` before any other transformations (along with a `custom_data` pointer that is passed through to `custom_func`). The `custom_func` argument is ignored if it is `NULL`.
- `utf8proc_normalize_utf32` ⚠: Normalizes the sequence of `length` codepoints pointed to by `buffer` in-place (i.e., the result is also stored in `buffer`).
- `utf8proc_reencode` ⚠: Reencodes the sequence of `length` codepoints pointed to by `buffer` as UTF-8 data in-place (i.e., the result is also stored in `buffer`). Can optionally normalize the UTF-32 sequence prior to UTF-8 conversion.
- `utf8proc_tolower` ⚠: Given a codepoint `c`, return the codepoint of the corresponding lower-case character, if any; otherwise (if there is no lower-case variant, or if `c` is not a valid codepoint) return `c`.
- `utf8proc_totitle` ⚠: Given a codepoint `c`, return the codepoint of the corresponding title-case character, if any; otherwise (if there is no title-case variant, or if `c` is not a valid codepoint) return `c`.
- `utf8proc_toupper` ⚠: Given a codepoint `c`, return the codepoint of the corresponding upper-case character, if any; otherwise (if there is no upper-case variant, or if `c` is not a valid codepoint) return `c`.
- `utf8proc_unicode_version` ⚠: Returns the utf8proc supported Unicode version as a string MAJOR.MINOR.PATCH.
- `utf8proc_version` ⚠: Returns the utf8proc API version as a string MAJOR.MINOR.PATCH (http://semver.org format), possibly with a “-dev” suffix for development versions.
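Two of the descriptions above, codepoint validity and encoding into a buffer of at least 4 bytes, can be illustrated with the Rust standard library alone. This is a hedged sketch that makes no FFI call: the names `is_valid_codepoint` and `encode_codepoint` are hypothetical, the signed 32-bit codepoint type is an assumption, and returning 0 for an invalid codepoint is a choice made for this sketch, not documented behaviour of the bindings.

```rust
/// Hypothetical helper: true if `c` is a Unicode scalar value, whether or
/// not the current Unicode standard assigns it a character.
fn is_valid_codepoint(c: i32) -> bool {
    if c < 0 {
        return false;
    }
    // `char::from_u32` rejects surrogates (0xD800..=0xDFFF) and values
    // above 0x10FFFF.
    char::from_u32(c as u32).is_some()
}

/// Hypothetical helper: writes the UTF-8 encoding of `c` into `dst` and
/// returns the number of bytes written. Four bytes always suffice.
/// Assumption of this sketch: an invalid codepoint writes nothing and
/// returns 0.
fn encode_codepoint(c: i32, dst: &mut [u8; 4]) -> usize {
    if c < 0 {
        return 0;
    }
    match char::from_u32(c as u32) {
        Some(ch) => ch.encode_utf8(dst).len(),
        None => 0,
    }
}
```

For instance, encoding U+20AC (the euro sign) with `encode_codepoint` fills the first three bytes of the buffer, which is why a 4-byte array is always large enough for a single codepoint.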
Type Aliases
- `utf8proc_bool`
- `utf8proc_custom_func`: Function pointer type passed to `utf8proc_map_custom()` and `utf8proc_decompose_custom()`, which is used to specify a user-defined mapping of codepoints to be applied in conjunction with other mappings.
- `utf8proc_int8_t`
- `utf8proc_int16_t`
- `utf8proc_int32_t`
- `utf8proc_property_t`: Struct containing information about a codepoint.
- `utf8proc_propval_t`: Holds the value of a property.
- `utf8proc_size_t`
- `utf8proc_ssize_t`
- `utf8proc_uint8_t`
- `utf8proc_uint16_t`
- `utf8proc_uint32_t`