Enum Utf32Str 

pub enum Utf32Str<'a> {
    Ascii(&'a [u8]),
    Unicode(&'a [char]),
}

A UTF-32 encoded (char array) string that is used as an input to (fuzzy) matching.

This is mostly intended as an internal string type, but some methods are exposed for convenience. We make the following API guarantees for Utf32Str(ing)s produced from a string using one of its From<T> constructors for string types T or from the Utf32Str::new method.

  1. The Ascii variant contains a byte buffer which is guaranteed to be a valid string slice.
  2. It is guaranteed that the string slice internal to the Ascii variant is identical to the original string.
  3. The length of a Utf32Str(ing) is exactly the number of graphemes in the original string.

Since Utf32Str(ing) variants may be constructed directly, you must not rely on these guarantees when handling Utf32Str(ing)s of unknown origin.

§Caveats

Despite the name, this type is quite far from being a true string type. Here are some examples demonstrating this.

§String conversions are not round-trip

In the presence of a multi-codepoint grapheme (e.g. "u\u{0308}" which is u + COMBINING_DIAERESIS), the trailing codepoints are truncated.

assert_eq!(Utf32String::from("u\u{0308}").to_string(), "u");

§Indexing is done by grapheme

Indexing into a string is done by grapheme rather than by codepoint.

assert!(Utf32String::from("au\u{0308}").len() == 2);

§A Unicode variant may be produced by all-ASCII characters.

Since the Windows-style newline \r\n is ASCII-only but considered to be a single grapheme, strings containing \r\n will still result in a Unicode variant.

let s = Utf32String::from("\r\n");
assert!(!s.slice(..).is_ascii());
assert!(s.len() == 1);
assert!(s.slice(..).get(0) == '\n');
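
The three caveats above can be reproduced with a small standard-library sketch. This is not the crate's implementation: `presegment` is a hypothetical helper, and it assumes a heavily simplified notion of a grapheme that only recognizes `\r\n` and marks from the Combining Diacritical Marks block (U+0300 to U+036F). Real grapheme segmentation follows a much larger set of Unicode rules.

```rust
/// Hypothetical helper: keeps one codepoint per (simplified) grapheme.
/// Assumption: only `\r\n` and U+0300..=U+036F combining marks are handled.
fn presegment(s: &str) -> Vec<char> {
    let mut out = Vec::new();
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' && chars.peek() == Some(&'\n') {
            // CRLF counts as a single unit and is stored as '\n'.
            chars.next();
            out.push('\n');
        } else if ('\u{0300}'..='\u{036F}').contains(&c) && !out.is_empty() {
            // A combining mark that follows a base character is dropped,
            // which is why the conversion cannot round-trip.
            continue;
        } else {
            out.push(c);
        }
    }
    out
}
```

Under these assumptions, `presegment("u\u{0308}")` yields only `['u']`, so converting the result back into a `String` cannot restore the diaeresis.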

§Design rationale

Usually Rust’s UTF-8 encoded strings are great. However, since fuzzy matching operates on codepoints (ideally, it should operate on graphemes but that’s too much hassle to deal with), we want to quickly iterate over codepoints (up to 5 times) during matching.

Doing codepoint segmentation on the fly not only blows through the cache (lookup tables and I-cache) but also has nontrivial runtime compared to the matching itself. Furthermore, there are many extra optimizations available for ASCII-only text, but checking for ASCII on every match has too much overhead.

Of course, this comes at extra memory cost, as we usually still need the UTF-8 encoded variant for rendering. In the (dominant) case of ASCII-only text we don’t require a copy. Furthermore, fuzzy matching is usually applied on the fly while the user is typing, so the same item is potentially matched many times (making the up-front cost more worth it). That means that it’s basically always worth it to pre-segment the string.

For use cases that match (a lot of) strings only once, it’s possible to keep a char buffer around that is filled with the presegmented chars.
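
A minimal sketch of this trade-off, using only the standard library. `Segmented`, `segment`, and `count_units` are hypothetical names that mirror the two-variant layout. As a simplifying assumption, the Unicode path here stores every codepoint and skips grapheme handling, which the real type does not.

```rust
/// Hypothetical mirror of the two-variant layout, for illustration only.
#[derive(Debug, PartialEq)]
enum Segmented<'a> {
    Ascii(&'a [u8]),
    Unicode(&'a [char]),
}

/// Borrows the bytes of `s` when it is plain ASCII (no copy); otherwise
/// refills `buf` with the codepoints of `s` and borrows from `buf`.
fn segment<'a>(s: &'a str, buf: &'a mut Vec<char>) -> Segmented<'a> {
    if s.is_ascii() && !s.contains("\r\n") {
        Segmented::Ascii(s.as_bytes())
    } else {
        buf.clear();
        buf.extend(s.chars());
        Segmented::Unicode(buf.as_slice())
    }
}

/// Segments each item exactly once while reusing a single char buffer,
/// so non-ASCII items do not trigger a fresh allocation every time.
fn count_units(items: &[&str]) -> Vec<usize> {
    let mut buf = Vec::new();
    let mut lens = Vec::with_capacity(items.len());
    for item in items {
        let len = match segment(item, &mut buf) {
            Segmented::Ascii(bytes) => bytes.len(),
            Segmented::Unicode(chars) => chars.len(),
        };
        lens.push(len);
    }
    lens
}
```

The ASCII branch never touches `buf`, which corresponds to the no-copy case described above.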

Another advantage of this approach is that the matcher will naturally produce grapheme indices (instead of UTF-8 offsets) anyway. With a codepoint-based representation like this, the indices can be used directly.
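
To illustrate the point about indices, here is a small sketch with two hypothetical helpers, `highlight` and `byte_offset`. It assumes the matcher reports `u32` positions into the pre-segmented char buffer and that every grapheme in the input is a single codepoint.

```rust
/// Wraps every matched character in brackets. `indices` are positions into
/// `chars`, so each one can be compared against the loop index directly.
fn highlight(chars: &[char], indices: &[u32]) -> String {
    let mut out = String::new();
    for (i, &c) in chars.iter().enumerate() {
        if indices.contains(&(i as u32)) {
            out.push('[');
            out.push(c);
            out.push(']');
        } else {
            out.push(c);
        }
    }
    out
}

/// Converts a codepoint index into a UTF-8 byte offset of `s`. This extra
/// linear scan is only needed when working on the `&str` itself.
fn byte_offset(s: &str, char_index: usize) -> Option<usize> {
    s.char_indices().nth(char_index).map(|(offset, _)| offset)
}
```

With the char buffer, a reported index is usable as-is; with the UTF-8 string, each index first has to be translated by something like `byte_offset`.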

Variants§

Ascii(&'a [u8])

A string represented as ASCII encoded bytes.

Correctness invariant: must only contain valid ASCII (<= 127)
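
Because the variant can be constructed directly, callers that build it from raw bytes may want to check the invariant first. A minimal sketch, assuming a hypothetical helper named `checked_ascii` that is not part of the crate:

```rust
/// Hypothetical helper: returns the bytes unchanged if every byte is
/// ASCII (<= 127), and `None` otherwise.
fn checked_ascii(bytes: &[u8]) -> Option<&[u8]> {
    if bytes.is_ascii() {
        Some(bytes)
    } else {
        None
    }
}
```

Bytes that pass this check are also valid UTF-8, so `std::str::from_utf8` succeeds on them.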

Unicode(&'a [char])

A string represented as an array of Unicode codepoints (basically UTF-32).

Implementations§

impl<'a> Utf32Str<'a>

pub fn new(str: &'a str, buf: &'a mut Vec<char>) -> Utf32Str<'a>

Convenience method to construct a Utf32Str from a normal UTF-8 str. As the lifetimes in the signature show, the returned value borrows from both str and buf.

pub fn len(self) -> usize

Returns the number of characters in this string.

pub fn is_empty(self) -> bool

Returns whether this string is empty.

pub fn slice(self, range: impl RangeBounds<usize>) -> Utf32Str<'a>

Creates a slice containing the characters in the specified character range.

pub fn slice_u32(self, range: impl RangeBounds<u32>) -> Utf32Str<'a>

Same as slice, but accepts a u32 range for convenience, since those are the indices returned by the matcher.
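
One way such a convenience wrapper could be built is by widening the `u32` bounds to `usize` before slicing. The sketch below shows that idea with a hypothetical helper, `to_usize_range`; it is an assumption about a possible approach, not the crate's actual code.

```rust
use std::ops::{Bound, Range, RangeBounds};

/// Hypothetical helper: turns any `u32` range into a half-open `usize`
/// range, using `len` as the end when the range is unbounded above.
fn to_usize_range(range: impl RangeBounds<u32>, len: usize) -> Range<usize> {
    let start = match range.start_bound() {
        Bound::Included(&s) => s as usize,
        Bound::Excluded(&s) => s as usize + 1,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e as usize + 1,
        Bound::Excluded(&e) => e as usize,
        Bound::Unbounded => len,
    };
    start..end
}
```

The resulting range can index a `&[char]` or `&[u8]` directly, for example `&chars[to_usize_range(1u32..3, chars.len())]`.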

pub fn is_ascii(self) -> bool

Returns whether this string only contains graphemes which are single ASCII chars.

This is almost equivalent to the string being ASCII, with the additional requirement that the string cannot contain a Windows-style newline \r\n, which is treated as a single grapheme.
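
The condition described here can be written down for a plain `&str` as a short sketch. `is_ascii_units` is a hypothetical name, and the check is an assumption derived from the description above rather than the crate's own code.

```rust
/// Hypothetical check: true when `s` is ASCII and contains no `\r\n` pair,
/// so every grapheme is exactly one ASCII character.
fn is_ascii_units(s: &str) -> bool {
    s.is_ascii() && !s.contains("\r\n")
}
```

A lone `\r` or `\n` passes the check; only the two-byte `\r\n` sequence fails it.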

pub fn get(self, n: u32) -> char

Returns the nth character in this string, zero-indexed.

pub fn chars(self) -> Chars<'a>

Returns an iterator over the characters in this string.

Trait Implementations§

impl<'a> Clone for Utf32Str<'a>

fn clone(&self) -> Utf32Str<'a>

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Stable since Rust 1.0.0.

impl Debug for Utf32Str<'_>

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl Display for Utf32Str<'_>

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl<'a> Hash for Utf32Str<'a>

fn hash<__H>(&self, state: &mut __H)
where __H: Hasher,

Feeds this value into the given Hasher.

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher. Stable since Rust 1.3.0.

impl<'a> Ord for Utf32Str<'a>

fn cmp(&self, other: &Utf32Str<'a>) -> Ordering

This method returns an Ordering between self and other.

fn max(self, other: Self) -> Self
where Self: Sized,

Compares and returns the maximum of two values. Stable since Rust 1.21.0.

fn min(self, other: Self) -> Self
where Self: Sized,

Compares and returns the minimum of two values. Stable since Rust 1.21.0.

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized,

Restrict a value to a certain interval. Stable since Rust 1.50.0.

impl<'a> PartialEq for Utf32Str<'a>

fn eq(&self, other: &Utf32Str<'a>) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason. Stable since Rust 1.0.0.

impl<'a> PartialOrd for Utf32Str<'a>

fn partial_cmp(&self, other: &Utf32Str<'a>) -> Option<Ordering>

This method returns an ordering between self and other values if one exists.

fn lt(&self, other: &Rhs) -> bool

Tests less than (for self and other) and is used by the < operator. Stable since Rust 1.0.0.

fn le(&self, other: &Rhs) -> bool

Tests less than or equal to (for self and other) and is used by the <= operator. Stable since Rust 1.0.0.

fn gt(&self, other: &Rhs) -> bool

Tests greater than (for self and other) and is used by the > operator. Stable since Rust 1.0.0.

fn ge(&self, other: &Rhs) -> bool

Tests greater than or equal to (for self and other) and is used by the >= operator. Stable since Rust 1.0.0.

impl<'a> Copy for Utf32Str<'a>

impl<'a> Eq for Utf32Str<'a>

impl<'a> StructuralPartialEq for Utf32Str<'a>

Auto Trait Implementations§

impl<'a> Freeze for Utf32Str<'a>

impl<'a> RefUnwindSafe for Utf32Str<'a>

impl<'a> Send for Utf32Str<'a>

impl<'a> Sync for Utf32Str<'a>

impl<'a> Unpin for Utf32Str<'a>

impl<'a> UnwindSafe for Utf32Str<'a>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.