Struct SuffixTable

pub struct SuffixTable<'s, 't> { /* private fields */ }

A suffix table is a sequence of lexicographically sorted suffixes.

The lifetimes 's and 't (respectively) refer to the text and suffix indices when borrowed.

This is distinct from a suffix array in that it contains only suffix indices. It has no “enhanced” information like the inverse suffix table or longest-common-prefix lengths (LCP array). This representation limits what you can do (and how fast), but it uses very little memory (4 bytes per character in the text).
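To make "only suffix indices" concrete, here is a minimal sketch of such a table and of a substring search over it. It is not this crate's implementation: `naive_suffix_table` and `naive_positions` are hypothetical helper names, the table is built by plain sorting rather than in linear time, and offsets are byte indices (the sketch assumes ASCII text).

```rust
/// Builds a suffix table by sorting suffix start offsets (byte indices).
/// Hypothetical helper: a naive comparison sort, NOT the crate's algorithm.
fn naive_suffix_table(text: &str) -> Vec<u32> {
    let bytes = text.as_bytes();
    let mut table: Vec<u32> = (0..bytes.len() as u32).collect();
    // Order the offsets by the suffix that starts at each offset.
    table.sort_by(|&a, &b| bytes[a as usize..].cmp(&bytes[b as usize..]));
    table
}

/// Returns the run of table entries whose suffixes start with `query`.
/// Hypothetical helper: two binary searches over the sorted suffixes.
fn naive_positions<'a>(text: &str, table: &'a [u32], query: &str) -> &'a [u32] {
    let (bytes, q) = (text.as_bytes(), query.as_bytes());
    // Index of the first suffix that is not lexicographically below the query.
    let start = table.partition_point(|&i| &bytes[i as usize..] < q);
    // From there on, suffixes that start with the query form a contiguous run.
    let len = table[start..].partition_point(|&i| bytes[i as usize..].starts_with(q));
    &table[start..start + len]
}
```

For the text `"banana"`, `naive_suffix_table` yields `[5, 3, 1, 0, 4, 2]`, and searching that table for `"ana"` returns `[3, 1]`. The matches come back in suffix order, not text order.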

§Construction

Suffix array construction is done in O(n) time and in O(kn) space, where k is the number of unique characters in the text. (More details below.) The specific algorithm implemented is from (Nong et al., 2009), but I actually used the description found in (Shrestha et al., 2014), because it is much more accessible to someone who is not used to reading algorithms papers.

The main thrust of the algorithm is that of “reduce and conquer.” Namely, it reduces the problem of finding lexicographically sorted suffixes to a smaller subproblem, and solves it recursively. The subproblem is to find the suffix array of a smaller string, where that string is composed by naming contiguous regions of the original text. If there are any duplicate names, then the algorithm proceeds recursively. If there are no duplicate names (base case), then the suffix array of the subproblem is already computed. In essence, this “inductively sorts” suffixes of the original text with several linear scans over the text. Because of the number of linear scans, the performance of construction is heavily tied to cache performance (and this is why u32 is used to represent the suffix index instead of a u64).
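As a rough illustration of one of those linear scans: in SA-IS as described by Nong et al., the "contiguous regions" are delimited by classifying every suffix as S-type (smaller than the suffix after it) or L-type (larger), and then picking out the leftmost S-type positions. The sketch below shows only that classification step. The names `SuffixType`, `suffix_types`, and `lms_positions` are assumptions for illustration, and the crate's internal representation may well differ.

```rust
/// S-type: the suffix is smaller than the one starting one position later.
/// L-type: the suffix is larger than the one starting one position later.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SuffixType {
    S,
    L,
}

/// Classifies every suffix with a single right-to-left scan.
/// Assumption: no explicit sentinel, so the last suffix counts as L-type
/// (it is larger than the empty suffix that follows it).
fn suffix_types(text: &[u8]) -> Vec<SuffixType> {
    let n = text.len();
    let mut types = vec![SuffixType::L; n];
    for i in (0..n.saturating_sub(1)).rev() {
        types[i] = if text[i] < text[i + 1] {
            SuffixType::S
        } else if text[i] > text[i + 1] {
            SuffixType::L
        } else {
            // Equal bytes: the type is inherited from the next suffix.
            types[i + 1]
        };
    }
    types
}

/// Positions of leftmost S-type suffixes: an S-type directly after an L-type.
/// These positions delimit the regions that get named in the reduction step.
fn lms_positions(types: &[SuffixType]) -> Vec<usize> {
    (1..types.len())
        .filter(|&i| types[i] == SuffixType::S && types[i - 1] == SuffixType::L)
        .collect()
}
```

For `b"banana"` the types come out as L, S, L, S, L, L, so the leftmost S-type positions are 1 and 3.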

The space usage is roughly 6 bytes per character. (The optimal bound is 5 bytes per character, although that may be for a small constant alphabet.) 4 bytes come from the suffix array itself. The extra 2 bytes come from storing the suffix type of each character (1 byte) and information about bin boundaries, where the number of bins is equal to the number of unique characters in the text. This doesn’t formally imply another byte of overhead, but in practice, the alphabet can get quite large when solving the subproblems mentioned above (even if the alphabet of the original text is very small).
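The per-character figures above translate into a quick back-of-the-envelope estimate. The helpers below are hypothetical and only restate the numbers from this section (4 bytes per character for the finished table, roughly 6 during construction); they do not measure anything.

```rust
/// Bytes held by a finished table of `n` suffix indices stored as `u32`.
fn table_bytes(n: usize) -> usize {
    n * std::mem::size_of::<u32>()
}

/// Rough peak during construction, using the "roughly 6 bytes per
/// character" figure quoted above (an estimate, not a measurement).
fn approx_construction_bytes(n: usize) -> usize {
    n * 6
}

/// What the finished table would cost if indices were `u64` instead.
fn table_bytes_u64(n: usize) -> usize {
    n * std::mem::size_of::<u64>()
}
```

For a text of one million characters, that is about 4 MB for the table, about 6 MB while building it, and 8 MB for the table alone if the indices were `u64`.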

Implementations§

impl<'s, 't> SuffixTable<'s, 't>

pub fn new<S>(text: S) -> SuffixTable<'s, 't> where S: Into<Cow<'s, str>>,

§Panics

pub fn from_parts<S, T>(text: S, table: T) -> SuffixTable<'s, 't> where S: Into<Cow<'s, str>>, T: Into<Cow<'t, [u32]>>,

pub fn into_parts(self) -> (Cow<'s, str>, Cow<'t, [u32]>)

pub fn lcp_lens(&self) -> Vec<u32>

pub fn table(&self) -> &[u32]

pub fn text(&self) -> &str

pub fn len(&self) -> usize

pub fn is_empty(&self) -> bool

pub fn suffix(&self, i: usize) -> &str

pub fn suffix_bytes(&self, i: usize) -> &[u8]

pub fn contains(&self, query: &str) -> bool

§Example

pub fn positions(&self, query: &str) -> &[u32]

§Example

pub fn any_position(&self, query: &str) -> Option<u32>

§Example

Trait Implementations§

impl<'s, 't> Clone for SuffixTable<'s, 't>

fn clone(&self) -> SuffixTable<'s, 't>

fn clone_from(&mut self, source: &Self)

impl<'s, 't> Debug for SuffixTable<'s, 't>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl<'s, 't> PartialEq for SuffixTable<'s, 't>

fn eq(&self, other: &SuffixTable<'s, 't>) -> bool

fn ne(&self, other: &Rhs) -> bool

impl<'s, 't> Eq for SuffixTable<'s, 't>

impl<'s, 't> StructuralPartialEq for SuffixTable<'s, 't>

Auto Trait Implementations§

impl<'s, 't> Freeze for SuffixTable<'s, 't>

impl<'s, 't> RefUnwindSafe for SuffixTable<'s, 't>

impl<'s, 't> Send for SuffixTable<'s, 't>

impl<'s, 't> Sync for SuffixTable<'s, 't>

impl<'s, 't> Unpin for SuffixTable<'s, 't>

impl<'s, 't> UnwindSafe for SuffixTable<'s, 't>

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
