# Suffix Collections

[](LICENSE)
[](https://crates.io/crates/suff_collections)
[](https://docs.rs/suff_collections)
Fast realization of suffix array and suffix tree for substring search, longest common prefix array (lcp array).
## Unicode
The current implementation builds suffix structures using bytes and does not decode the string before or during construction in Unicode. But if Unicode string is [normalized](https://unicode.org/reports/tr15) before construction and search, then structures support Unicode (because all byte sequences are decoded unambiguously). Also search and lcp returns indexes as in byte array but not in Unicode decoded string.
## Example
* **SuffixTree**
```rust
use suff_collections::{array::*, tree::*, lcp::*};
let word: &str = "Some word";
let find: &str = "word";
let st: SuffixTree = SuffixTree::new(word);
let res: Option<usize> = st.find(find);
let lcp: LCP<usize> = st.lcp::<usize>();
let sa = SuffixArray::<usize>::from(st);
let mut ost: OnlineSuffixTree = OnlineSuffixTree::new();
ost.add(word);
let res: Option<usize> = ost.find(find);
let st: SuffixTree = ost.finish();
let graph = SuffixTree::new("mississippi").to_graphviz(&mut buff);
println!("{}", &graph);
```
SuffixTree for the word "mississippi"

* **SuffixArray**
```rust
use suff_collections::{array::*, tree::*, lcp::*};
let word: &str = "Some word\0";
let find: &str = "word";
let sa = SuffixArray::<usize>::new(word);
let lcp: LCP<usize> = sa.lcp();
let res: Option<usize> = sa.find(find);
let res_all: &[usize] = sa.find_all(find);
let res: Option<usize> = sa.find_big(&sa.lcp(), find);
let res_all: &[usize] = sa.find_all_big(&sa.lcp(), find);
let st = SuffixTree::from(sa);
let word = "mississippi";
let sa = SuffixArray::<u32>::new(word);
let lcp = sa.lcp();
println!("LCP index suffixe's");
sa.iter().zip(lcp.iter()).for_each(|(&sa, &lcp)| {
let suff = std::str::from_utf8(&word.as_bytes()[sa as usize..]).unwrap();
println!("{} {} {}", lcp, sa, suff);
}
);
```
SuffixArray and lcp for the word "mississippi"
```
LCP index suffixe's
0 11
0 10 i
1 7 ippi
1 4 issippi
4 1 ississippi
0 0 mississippi
0 9 pi
1 8 ppi
0 6 sippi
2 3 sissippi
1 5 ssippi
3 2 ssissippi
```
All construction and search work for O(n). For the suffix tree implementation the [Ukkonen algorithm][2] is taken and for the suffix array implementation the [SA-IS algorithm][1] is taken.
[1]: https://www.researchgate.net/profile/Daricks_Wai_Hong_Chan/publication/221577802_Linear_Suffix_Array_Construction_by_Almost_Pure_Induced-Sorting/links/00b495318a21ba484f000000/Linear-Suffix-Array-Construction-by-Almost-Pure-Induced-Sorting.pdf?origin=publication_detail
[2]: https://web.stanford.edu/~mjkay/gusfield.pdf