# Struct suffix::SuffixTable [−] [src]

`pub struct SuffixTable<'s, 't> { /* fields omitted */ }`

A suffix table is a sequence of lexicographically sorted suffixes.

The lifetimes `'s` and `'t` (respectively) refer to the text and suffix indices when borrowed.

This is distinct from a suffix array in that it only contains suffix indices. It has no "enhanced" information like the inverse suffix table or least-common-prefix lengths (LCP array). This representation limits what you can do (and how fast), but it uses very little memory (4 bytes per character in the text).

# Construction

Suffix array construction is done in `O(n)` time and in `O(kn)` space, where `k` is the number of unique characters in the text. (More details below.) The specific algorithm implemented is from (Nong et al., 2009), but I actually used the description found in (Shrestha et al., 2014), because it is much more accessible to someone who is not used to reading algorithms papers.

The main thrust of the algorithm is that of "reduce and conquer." Namely, it reduces the problem of finding lexicographically sorted suffixes to a smaller subproblem, and solves it recursively. The subproblem is to find the suffix array of a smaller string, where that string is composed by naming contiguous regions of the original text. If there are any duplicate names, then the algorithm procedes recursively. If there are no duplicate names (base case), then the suffix array of the subproblem is already computed. In essence, this "inductively sorts" suffixes of the original text with several linear scans over the text. Because of the number of linear scans, the performance of construction is heavily tied to cache performance (and this is why `u32` is used to represent the suffix index instead of a `u64`).

The space usage is roughly `6` bytes per character. (The optimal bound is `5` bytes per character, although that may be for a small constant alphabet.) `4` bytes comes from the suffix array itself. The extra `2` bytes comes from storing the suffix type of each character (`1` byte) and information about bin boundaries, where the number of bins is equal to the number of unique characters in the text. This doesn't formally imply another byte of overhead, but in practice, the alphabet can get quite large when solving the subproblems mentioned above (even if the alphabet of the original text is very small).

## Methods

### `impl<'s, 't> SuffixTable<'s, 't>`[src]

#### `fn new<S>(text: S) -> SuffixTable<'s, 't> where    S: Into<Cow<'s, str>>, `

Creates a new suffix table for `text` in `O(n)` time and `O(kn)` space, where `k` is the size of the alphabet in the text.

The table stores either `S` or a `&S` and a lexicographically sorted list of suffixes. Each suffix is represented by a 32 bit integer and is a byte index into `text`.

# Panics

Panics if the `text` contains more than `2^32 - 1` bytes. This restriction is mostly artificial; there's no fundamental reason why suffix array construction algorithm can't use a `u64`. Nevertheless, `u32` was chosen for performance reasons. The performance of the construction algorithm is highly dependent on cache performance, which is degraded with a bigger number type. `u32` strikes a nice balance; it gets good performance while allowing most reasonably sized documents (~4GB).

#### `fn from_parts<S, T>(text: S, table: T) -> SuffixTable<'s, 't> where    S: Into<Cow<'s, str>>,    T: Into<Cow<'t, [u32]>>, `

Creates a new suffix table from an existing list of lexicographically sorted suffix indices.

Note that the invariant that `table` must be a suffix table of `text` is not checked! If it isn't, this will cause other operations on a suffix table to fail in weird ways.

This fails if the number of characters in `text` does not equal the number of suffixes in `table`.

#### `fn into_parts(self) -> (Cow<'s, str>, Cow<'t, [u32]>)`

Extract the parts of a suffix table.

This is useful to avoid copying when the suffix table is part of an intermediate computation.

#### `fn lcp_lens(&self) -> Vec<u32>`

Computes the LCP array.

#### `fn table(&self) -> &[u32]`

Return the suffix table.

Return the text.

#### `fn len(&self) -> usize`

Returns the number of suffixes in the table.

Alternatively, this is the number of bytes in the text.

#### `fn is_empty(&self) -> bool`

Returns `true` iff `self.len() == 0`.

#### `fn suffix(&self, i: usize) -> &str`

Returns the suffix at index `i`.

#### `fn suffix_bytes(&self, i: usize) -> &[u8]`

Returns the suffix bytes starting at index `i`.

#### `fn contains(&self, query: &str) -> bool`

Returns true if and only if `query` is in text.

This runs in `O(mlogn)` time, where `m == query.len()` and `n == self.len()`. (As far as this author knows, this is the best known bound for a plain suffix table.)

You should prefer this over `positions` when you only need to test existence (because it is faster).

# Example

Build a suffix array of some text and test existence of a substring:

```use suffix::SuffixTable;

let sa = SuffixTable::new("The quick brown fox.");
assert!(sa.contains("quick"));```

#### `fn positions(&self, query: &str) -> &[u32]`

Returns an unordered list of positions where `query` starts in `text`.

This runs in `O(mlogn)` time, where `m == query.len()` and `n == self.len()`. (As far as this author knows, this is the best known bound for a plain suffix table.)

Positions are byte indices into `text`.

If you just need to test existence, then use `contains` since it is faster.

# Example

Build a suffix array of some text and find all occurrences of a substring:

```use suffix::SuffixTable;

let sa = SuffixTable::new("The quick brown fox was very quick.");
assert_eq!(sa.positions("quick"), &[4, 29]);```

## Trait Implementations

### `impl<'s, 't> Clone for SuffixTable<'s, 't>`[src]

#### `fn clone(&self) -> SuffixTable<'s, 't>`

Returns a copy of the value. Read more

#### `fn clone_from(&mut self, source: &Self)`1.0.0

Performs copy-assignment from `source`. Read more

### `impl<'s, 't> PartialEq for SuffixTable<'s, 't>`[src]

#### `fn eq(&self, __arg_0: &SuffixTable<'s, 't>) -> bool`

This method tests for `self` and `other` values to be equal, and is used by `==`. Read more

#### `fn ne(&self, __arg_0: &SuffixTable<'s, 't>) -> bool`

This method tests for `!=`.

### `impl<'s, 't> Debug for SuffixTable<'s, 't>`[src]

#### `fn fmt(&self, f: &mut Formatter) -> Result`

Formats the value using the given formatter.