pub struct Gc(/* private fields */);
A slice of a single Unicode grapheme cluster (GC) (akin to str).
A grapheme cluster is a single visual “unit” in Unicode text, and is composed of at least one Unicode code point, possibly more.
This type is a wrapper around str that enforces the additional invariant that it always contains exactly one grapheme cluster. This makes some operations (such as extracting the base code point) simpler.
Why Grapheme Clusters?
The simplest example is the distinction between “é” (“Latin Small Letter E with Acute”) and “é” (“Latin Small Letter E”, “Combining Acute Accent”): the first is one code point, the second is two.
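The difference can be observed with the standard library alone. This illustrative snippet (not part of the Gc API) writes both spellings with explicit escapes, so that neither can be silently normalized, and compares their code-point and byte counts:

```rust
fn main() {
    // "é" as one precomposed code point: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
    let precomposed = "\u{e9}";
    // "é" as two code points: U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    let decomposed = "e\u{301}";

    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(decomposed.chars().count(), 2);

    // The UTF-8 encodings differ as well: 2 bytes versus 1 + 2 bytes.
    assert_eq!(precomposed.len(), 2);
    assert_eq!(decomposed.len(), 3);

    // String equality in Rust is byte-wise, so the two spellings compare unequal.
    assert_ne!(precomposed, decomposed);
    println!("both render as one visible character");
}
```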
In Rust, the char type is a single code point. As a result, treating it as a “character” is incorrect for the same reason that using u8 is: it excludes many legitimate characters. It can also cause issues whereby naive algorithms may corrupt text by considering components of a grapheme cluster separately. For example, truncating a string to “10 characters” using chars can lead to logical characters being broken apart, potentially changing their meaning.
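The truncation hazard can be reproduced in a few lines. The helper truncate_chars below is hypothetical and deliberately naive: it keeps the first n code points, which is exactly the approach the paragraph above warns against.

```rust
/// Naively keeps the first `n` code points of `s` (hypothetical helper).
/// This can cut a grapheme cluster in half.
fn truncate_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // Byte offset of the (n+1)-th code point: everything before it is kept.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than n + 1 code points: nothing to cut.
        None => s,
    }
}

fn main() {
    // "café" spelled with a combining acute accent: 5 code points, 4 visible characters.
    let word = "cafe\u{301}";
    let cut = truncate_chars(word, 4);
    // The accent is dropped, silently turning "café" into "cafe".
    assert_eq!(cut, "cafe");
    println!("{word} -> {cut}");
}
```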
One inconvenience of working with grapheme clusters in Rust is that no type represents them more accurately than a plain &str. However, operations that make sense on an individual character (such as asking whether it is in the ASCII range, or whether it is numeric) don't make sense on a full string. In addition, a &str can be empty or contain more than one grapheme cluster.
Hence, this type guarantees that it always represents exactly one Unicode grapheme cluster.
Implementations
impl Gc
pub fn from_str(s: &str) -> Option<&Gc>
Create a new Gc from the given string slice.
The slice must contain exactly one grapheme cluster. In the event that the input is empty, or contains more than one grapheme cluster, this function will return None.
See: split_from.
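To illustrate this contract only, the sketch below approximates the check with a deliberately simplified rule, which is an assumption and not the full Unicode segmentation algorithm (UAX #29): a cluster is any code point followed by zero or more marks from the Combining Diacritical Marks block, U+0300 to U+036F. The names is_combining_mark and from_str_sketch are hypothetical, and the sketch returns a &str rather than a &Gc.

```rust
/// Hypothetical, simplified mark test: only the Combining Diacritical Marks
/// block (U+0300..=U+036F). Real segmentation covers far more cases.
fn is_combining_mark(c: char) -> bool {
    ('\u{300}'..='\u{36f}').contains(&c)
}

/// Mirrors the documented behaviour of `Gc::from_str` under the simplified
/// rule: `None` for empty input or for more than one cluster.
fn from_str_sketch(s: &str) -> Option<&str> {
    let mut chars = s.chars();
    chars.next()?; // empty input -> None
    // Every code point after the first must extend the same cluster.
    if chars.all(is_combining_mark) {
        Some(s)
    } else {
        None // a non-mark code point starts a second cluster
    }
}

fn main() {
    assert_eq!(from_str_sketch("e\u{301}"), Some("e\u{301}"));
    assert_eq!(from_str_sketch(""), None);
    assert_eq!(from_str_sketch("ab"), None);
}
```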
pub unsafe fn from_str_unchecked(s: &str) -> &Gc
Create a new Gc from the given string slice.
This function does not check that the provided slice is a single, valid grapheme cluster. The caller must guarantee this; passing anything else violates the type's invariant.
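The documentation does not show Gc's private fields, so the following is only a sketch of one common way such an unchecked conversion can be written: an unsized newtype over str marked #[repr(transparent)], converted with a pointer cast (the same pattern std uses for Path over OsStr). The layout and the as_str helper are assumptions for illustration.

```rust
/// Hypothetical layout: an unsized newtype over `str`. `#[repr(transparent)]`
/// guarantees `Gc` has the same layout as `str`, which makes the cast sound.
#[repr(transparent)]
pub struct Gc(str);

impl Gc {
    /// # Safety
    /// `s` must contain exactly one grapheme cluster; this is not checked.
    pub unsafe fn from_str_unchecked(s: &str) -> &Gc {
        // SAFETY: `Gc` is `repr(transparent)` over `str`, so `*const str`
        // and `*const Gc` share the same layout and length metadata.
        unsafe { &*(s as *const str as *const Gc) }
    }

    /// Hypothetical accessor returning the underlying string slice.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

fn main() {
    // SAFETY: "e\u{301}" is a single grapheme cluster ('e' + combining acute).
    let gc = unsafe { Gc::from_str_unchecked("e\u{301}") };
    assert_eq!(gc.as_str(), "e\u{301}");
}
```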
pub fn split_from(s: &str) -> Option<(&Gc, &str)>
Try to split a single grapheme cluster from the start of s.
Returns None if the given string is empty.
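A std-only sketch of the same contract, again using the simplified rule (an assumption, not UAX #29) that a cluster is one code point followed by any marks in U+0300 to U+036F. split_from_sketch is a hypothetical name and returns a (&str, &str) pair instead of (&Gc, &str):

```rust
/// Hypothetical, simplified mark test (Combining Diacritical Marks block only).
fn is_combining_mark(c: char) -> bool {
    ('\u{300}'..='\u{36f}').contains(&c)
}

/// Splits the first (simplified) cluster off the front of `s`.
/// Returns `None` only when `s` is empty.
fn split_from_sketch(s: &str) -> Option<(&str, &str)> {
    let mut iter = s.char_indices();
    iter.next()?; // empty input -> None; otherwise skip the first code point
    // The cluster ends at the first following code point that is not a mark.
    let end = iter
        .find(|&(_, c)| !is_combining_mark(c))
        .map_or(s.len(), |(i, _)| i);
    Some(s.split_at(end))
}

fn main() {
    assert_eq!(split_from_sketch("e\u{301}x"), Some(("e\u{301}", "x")));
    assert_eq!(split_from_sketch("a"), Some(("a", "")));
    assert_eq!(split_from_sketch(""), None);
}
```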
pub fn has_marks(&self) -> bool
Does this grapheme cluster have additional marks applied to it?
This is true if the cluster consists of more than a single code point.
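Following the definition above, the check reduces to asking whether a second code point exists. This hypothetical free function works on a plain &str that is assumed to already hold exactly one cluster; it is not the crate's method:

```rust
/// Hypothetical equivalent of `has_marks` for a `&str` holding one cluster:
/// true when the cluster has more than one code point.
fn has_marks(cluster: &str) -> bool {
    // Skip the base code point and check whether anything follows it.
    cluster.chars().nth(1).is_some()
}

fn main() {
    assert!(!has_marks("e"));
    assert!(has_marks("e\u{301}"));
    // The precomposed "é" is a single code point, so by this definition it has no marks.
    assert!(!has_marks("\u{e9}"));
}
```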
pub fn base_char(&self) -> char
Returns the “base” code point.
That is, this returns the first code point in the cluster.
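On a plain &str assumed to hold one cluster, the same result is the first item of chars(). The function below is a hypothetical stand-in; the expect can never fire under the type's non-empty invariant:

```rust
/// Hypothetical equivalent of `base_char` for a `&str` holding one cluster.
fn base_char(cluster: &str) -> char {
    cluster.chars().next().expect("a grapheme cluster is never empty")
}

fn main() {
    assert_eq!(base_char("e\u{301}"), 'e');
    // A precomposed letter is its own base code point.
    assert_eq!(base_char("\u{e9}"), '\u{e9}');
}
```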
pub fn base(&self) -> &Gc
Returns the “base” code point as a grapheme cluster.
This is equivalent to converting this GC into a string slice, then slicing off the bytes that make up the first code point.
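The slicing described above can be written directly with char::len_utf8, which gives the number of bytes the first code point occupies. This hypothetical helper returns a &str rather than a &Gc:

```rust
/// Hypothetical equivalent of `base` for a `&str` holding one cluster:
/// slices off exactly the bytes of the first code point.
fn base(cluster: &str) -> &str {
    let first = cluster.chars().next().expect("a grapheme cluster is never empty");
    &cluster[..first.len_utf8()]
}

fn main() {
    assert_eq!(base("e\u{301}"), "e");
    // U+00E9 takes two bytes in UTF-8, so both are kept.
    assert_eq!(base("\u{e9}\u{301}"), "\u{e9}");
}
```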
pub fn mark_str(&self) -> &str
Returns the combining marks as a string slice.
The result of this method may be empty, or of arbitrary length.
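Assuming "the combining marks" means everything after the base code point (the complement of base), a hypothetical std-only equivalent is the remaining byte range:

```rust
/// Hypothetical equivalent of `mark_str` for a `&str` holding one cluster:
/// everything after the bytes of the base code point.
fn mark_str(cluster: &str) -> &str {
    let first = cluster.chars().next().expect("a grapheme cluster is never empty");
    &cluster[first.len_utf8()..]
}

fn main() {
    // 'e' + COMBINING ACUTE ACCENT + COMBINING DOT BELOW: two marks.
    assert_eq!(mark_str("e\u{301}\u{323}"), "\u{301}\u{323}");
    // No marks at all: the result is empty.
    assert_eq!(mark_str("e"), "");
}
```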
pub fn char_indices(&self) -> CharIndices<'_>
An iterator over the code points of this grapheme cluster, and their associated byte offsets.
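Assuming this iterator yields the same (byte offset, code point) pairs as std's str::char_indices on the underlying text, the offsets for a decomposed "é" look like this:

```rust
fn main() {
    let cluster = "e\u{301}";
    let pairs: Vec<(usize, char)> = cluster.char_indices().collect();
    // 'e' starts at byte 0; U+0301 starts at byte 1 and occupies two bytes.
    assert_eq!(pairs, vec![(0, 'e'), (1, '\u{301}')]);
    println!("{pairs:?}");
}
```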
pub fn to_lowercase(&self) -> ToLowercase<'_>
Returns an iterator over the code points in the lower case equivalent of this grapheme cluster.
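Case mapping can change the number of code points, which is one reason the result is an iterator of code points. Assuming the crate's mapping agrees with std's str::to_lowercase for the same text, the standard library shows the effect:

```rust
fn main() {
    // Combining marks have no case and pass through unchanged.
    assert_eq!("E\u{301}".to_lowercase(), "e\u{301}");
    // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE lowercases to two code points.
    let lower = "\u{130}".to_lowercase();
    assert_eq!(lower, "i\u{307}");
    assert_eq!(lower.chars().count(), 2);
}
```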
pub fn to_uppercase(&self) -> ToUppercase<'_>
Returns an iterator over the code points in the upper case equivalent of this grapheme cluster.
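Upper-casing can even turn one grapheme cluster into several, so the result could not in general be another single-cluster Gc. Assuming the crate's mapping agrees with std's str::to_uppercase, for example:

```rust
fn main() {
    // Marks pass through unchanged; only the base letter changes case.
    assert_eq!("e\u{301}".to_uppercase(), "E\u{301}");
    // U+00DF LATIN SMALL LETTER SHARP S upper-cases to "SS": two clusters.
    let upper = "\u{df}".to_uppercase();
    assert_eq!(upper, "SS");
    assert_eq!(upper.chars().count(), 2);
}
```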