Struct bio::data_structures::fmindex::FMDIndex
pub struct FMDIndex<DBWT: Borrow<BWT>, DLess: Borrow<Less>, DOcc: Borrow<Occ>> { /* private fields */ }
The FMD-Index for linear-time search of supermaximal exact matches on the forward and reverse strands of DNA texts (Li, 2012).
Implementations
impl<DBWT: Borrow<BWT>, DLess: Borrow<Less>, DOcc: Borrow<Occ>> FMDIndex<DBWT, DLess, DOcc>
pub fn smems( &self, pattern: &[u8], i: usize, l: usize ) -> Vec<(BiInterval, usize, usize)>
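Find supermaximal exact matches (of length >= l) of the given pattern that overlap position i in the pattern. Each returned tuple holds the bi-interval of the match, its start position in the pattern, and its length. Complexity: O(m) for a pattern of length m.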
Example
use bio::alphabets::dna;
use bio::data_structures::bwt::{bwt, less, Occ};
use bio::data_structures::fmindex::{FMDIndex, FMIndex};
use bio::data_structures::suffix_array::suffix_array;
let text = b"ATTC$GAAT$";
let alphabet = dna::n_alphabet();
let sa = suffix_array(text);
let bwt = bwt(text, &sa);
let less = less(&bwt, &alphabet);
let occ = Occ::new(&bwt, 3, &alphabet);
let fm = FMIndex::new(&bwt, &less, &occ);
let fmdindex = FMDIndex::from(fm);
let pattern = b"ATT";
let intervals = fmdindex.smems(pattern, 2, 0);
let forward_positions = intervals[0].0.forward().occ(&sa);
let revcomp_positions = intervals[0].0.revcomp().occ(&sa);
let pattern_position = intervals[0].1;
let smem_len = intervals[0].2;
assert_eq!(forward_positions, [0]);
assert_eq!(revcomp_positions, [6]);
assert_eq!(pattern_position, 0);
assert_eq!(smem_len, 3);
pub fn all_smems( &self, pattern: &[u8], l: usize ) -> Vec<(BiInterval, usize, usize)>
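Find all supermaximal exact matches (of length >= l) of the given pattern. Each returned tuple holds the bi-interval of the match, its start position in the pattern, and its length. Complexity: O(m^2) for a pattern of length m.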
Example
use bio::alphabets::dna;
use bio::data_structures::bwt::{bwt, less, Occ};
use bio::data_structures::fmindex::{FMDIndex, FMIndex};
use bio::data_structures::suffix_array::suffix_array;
let text = b"ATTCGGGG$CCCCGAAT$";
let alphabet = dna::n_alphabet();
let sa = suffix_array(text);
let bwt = bwt(text, &sa);
let less = less(&bwt, &alphabet);
let occ = Occ::new(&bwt, 3, &alphabet);
let fm = FMIndex::new(&bwt, &less, &occ);
let fmdindex = FMDIndex::from(fm);
let pattern = b"ATTGGGG";
let intervals = fmdindex.all_smems(pattern, 0);
assert_eq!(intervals.len(), 2);
let solutions = vec![[0, 14, 0, 3], [4, 9, 3, 4]];
for (i, interval) in intervals.iter().enumerate() {
    let forward_positions = interval.0.forward().occ(&sa);
    let revcomp_positions = interval.0.revcomp().occ(&sa);
    let pattern_position = interval.1;
    let smem_len = interval.2;
    assert_eq!(
        [
            forward_positions[0],
            revcomp_positions[0],
            pattern_position,
            smem_len
        ],
        solutions[i]
    );
}
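To make the (pattern position, length) part of the result concrete, the following is a brute-force sketch of what a supermaximal exact match is. It is not how FMDIndex works internally and it does not use the crate at all; the function name naive_smems and the parameter min_len are hypothetical. It assumes the text already contains both strands (T$R$), so a plain substring test covers forward and reverse-complement hits.

```rust
/// Brute-force illustration of supermaximal exact matches (SMEMs).
/// Hypothetical helper, not part of the bio crate.
/// Returns (start position in pattern, match length) pairs.
fn naive_smems(text: &[u8], pattern: &[u8], min_len: usize) -> Vec<(usize, usize)> {
    // Plain substring test; `needle` is never empty where this is called.
    let occurs = |needle: &[u8]| text.windows(needle.len()).any(|w| w == needle);

    let mut result = Vec::new();
    // Exclusive end of the furthest-reaching match found so far.
    let mut furthest_end = 0;

    for start in 0..pattern.len() {
        // Longest prefix of pattern[start..] that occurs somewhere in the text.
        let mut len = 0;
        while start + len < pattern.len() && occurs(&pattern[start..start + len + 1]) {
            len += 1;
        }
        let end = start + len;

        // The match is supermaximal only if it reaches beyond every match
        // that starts earlier; otherwise it is contained in one of them.
        if len > 0 && end > furthest_end {
            if len >= min_len {
                result.push((start, len));
            }
            furthest_end = end;
        }
    }
    result
}
```

Called with the text b"ATTCGGGG$CCCCGAAT$", the pattern b"ATTGGGG" and min_len 0, this sketch reports the same two (position, length) pairs as the example above: (0, 3) and (3, 4).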
pub fn init_interval_with(&self, a: u8) -> BiInterval
Initialize interval with given start character.
pub fn init_interval(&self) -> BiInterval
Initialize interval for empty pattern. The interval points at the whole suffix array.
pub fn backward_ext(&self, interval: &BiInterval, a: u8) -> BiInterval
Backward extension of given interval with given character.
pub fn forward_ext(&self, interval: &BiInterval, a: u8) -> BiInterval
pub unsafe fn from_fmindex_unchecked( fmindex: FMIndex<DBWT, DLess, DOcc> ) -> FMDIndex<DBWT, DLess, DOcc>
Construct a new instance of the FMD index (see Heng Li (2012) Bioinformatics) without checking whether the text is over the DNA alphabet with N.
This expects a BWT that was created from a text over the DNA alphabet with N (alphabets::dna::n_alphabet()) consisting of the concatenation with its reverse complement, separated by the sentinel symbol $. I.e., let T be the original text and R be its reverse complement. Then, the expected text is T$R$. Further, multiple concatenated texts are allowed, e.g. T1$R1$T2$R2$T3$R3$.
It is unsafe to construct an FMD index from an FM index that is not built on the DNA alphabet.
Trait Implementations
impl<DBWT: Clone + Borrow<BWT>, DLess: Clone + Borrow<Less>, DOcc: Clone + Borrow<Occ>> Clone for FMDIndex<DBWT, DLess, DOcc>
impl<DBWT: Debug + Borrow<BWT>, DLess: Debug + Borrow<Less>, DOcc: Debug + Borrow<Occ>> Debug for FMDIndex<DBWT, DLess, DOcc>
impl<DBWT: Default + Borrow<BWT>, DLess: Default + Borrow<Less>, DOcc: Default + Borrow<Occ>> Default for FMDIndex<DBWT, DLess, DOcc>
impl<'de, DBWT, DLess, DOcc> Deserialize<'de> for FMDIndex<DBWT, DLess, DOcc> where DBWT: Deserialize<'de> + Borrow<BWT>, DLess: Deserialize<'de> + Borrow<Less>, DOcc: Deserialize<'de> + Borrow<Occ>,
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error> where __D: Deserializer<'de>,
impl<DBWT: Borrow<BWT>, DLess: Borrow<Less>, DOcc: Borrow<Occ>> FMIndexable for FMDIndex<DBWT, DLess, DOcc>
fn backward_search<'b, P: Iterator<Item = &'b u8> + DoubleEndedIterator>( &self, pattern: P ) -> BackwardSearchResult
Perform backward search, yielding a BackwardSearchResult enum that contains the suffix array interval denoting exact occurrences of the given pattern of length m in the text if it exists, or the suffix array interval denoting the exact occurrences of a maximal matching suffix of the given pattern if it does not exist. If none of the pattern can be matched, the BackwardSearchResult is Absent.
Complexity: O(m).
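The three outcomes described for backward_search (full match, maximal matching suffix, nothing matched) can be illustrated with a naive, index-free sketch. The enum NaiveSearch and the function naive_backward_search are hypothetical and only mirror the description loosely: apart from Absent, the variant names are assumptions, and the sketch reports match lengths rather than suffix array intervals.

```rust
/// Hypothetical result type, loosely modelled on the description above.
#[derive(Debug, PartialEq)]
enum NaiveSearch {
    /// The whole pattern occurs in the text.
    Complete,
    /// Only a suffix of the pattern occurs; holds the length of the longest such suffix.
    Partial(usize),
    /// Not even the last character of the pattern occurs.
    Absent,
}

/// Grow the matched suffix one character to the left at a time,
/// in the same direction in which backward search consumes the pattern.
/// Note: an empty pattern is reported as Absent in this sketch.
fn naive_backward_search(text: &[u8], pattern: &[u8]) -> NaiveSearch {
    let occurs = |needle: &[u8]| text.windows(needle.len()).any(|w| w == needle);

    let m = pattern.len();
    let mut matched = 0;
    while matched < m && occurs(&pattern[m - matched - 1..]) {
        matched += 1;
    }

    if matched == 0 {
        NaiveSearch::Absent
    } else if matched == m {
        NaiveSearch::Complete
    } else {
        NaiveSearch::Partial(matched)
    }
}
```

Each step of this sketch rescans the text, so it is far slower than the O(m) index-based search; it is only meant to show which suffix the result refers to.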
impl<DBWT: Borrow<BWT>, DLess: Borrow<Less>, DOcc: Borrow<Occ>> From<FMIndex<DBWT, DLess, DOcc>> for FMDIndex<DBWT, DLess, DOcc>
fn from(fmindex: FMIndex<DBWT, DLess, DOcc>) -> FMDIndex<DBWT, DLess, DOcc>
Construct a new instance of the FMD index (see Heng Li (2012) Bioinformatics).
This expects a BWT that was created from a text over the DNA alphabet with N (alphabets::dna::n_alphabet()) consisting of the concatenation with its reverse complement, separated by the sentinel symbol $. I.e., let T be the original text and R be its reverse complement. Then, the expected text is T$R$. Further, multiple concatenated texts are allowed, e.g. T1$R1$T2$R2$T3$R3$.
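The expected text layout can be sketched with the standard library alone. The helpers complement and fmd_text below are hypothetical names, not part of the crate, and the sketch assumes upper-case input over A, C, G, T and N; the crate's own reverse-complement helper in alphabets::dna could presumably be used in place of the hand-written one.

```rust
/// Complement of a single upper-case base (hypothetical helper).
/// N and any other symbol are returned unchanged.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        other => other,
    }
}

/// Build T1$R1$T2$R2$... from the given sequences (hypothetical helper),
/// where Ri is the reverse complement of Ti.
fn fmd_text(seqs: &[&[u8]]) -> Vec<u8> {
    let mut text = Vec::new();
    for seq in seqs {
        // Forward strand, terminated by the sentinel.
        text.extend_from_slice(seq);
        text.push(b'$');
        // Reverse complement: walk the sequence backwards and complement each base.
        text.extend(seq.iter().rev().map(|&b| complement(b)));
        text.push(b'$');
    }
    text
}
```

For the single sequence ATTC this yields ATTC$GAAT$, the text used in the smems example; the resulting bytes would then go through suffix_array, bwt, less and Occ::new as shown there.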
impl<DBWT: Hash + Borrow<BWT>, DLess: Hash + Borrow<Less>, DOcc: Hash + Borrow<Occ>> Hash for FMDIndex<DBWT, DLess, DOcc>
impl<DBWT: Ord + Borrow<BWT>, DLess: Ord + Borrow<Less>, DOcc: Ord + Borrow<Occ>> Ord for FMDIndex<DBWT, DLess, DOcc>
fn max(self, other: Self) -> Self where Self: Sized,
impl<DBWT: PartialEq + Borrow<BWT>, DLess: PartialEq + Borrow<Less>, DOcc: PartialEq + Borrow<Occ>> PartialEq<FMDIndex<DBWT, DLess, DOcc>> for FMDIndex<DBWT, DLess, DOcc>
impl<DBWT: PartialOrd + Borrow<BWT>, DLess: PartialOrd + Borrow<Less>, DOcc: PartialOrd + Borrow<Occ>> PartialOrd<FMDIndex<DBWT, DLess, DOcc>> for FMDIndex<DBWT, DLess, DOcc>
fn le(&self, other: &Rhs) -> bool
Tests less than or equal to (for self and other) and is used by the <= operator.
impl<DBWT, DLess, DOcc> Serialize for FMDIndex<DBWT, DLess, DOcc> where DBWT: Serialize + Borrow<BWT>, DLess: Serialize + Borrow<Less>, DOcc: Serialize + Borrow<Occ>,
impl<DBWT: Copy + Borrow<BWT>, DLess: Copy + Borrow<Less>, DOcc: Copy + Borrow<Occ>> Copy for FMDIndex<DBWT, DLess, DOcc>
impl<DBWT: Eq + Borrow<BWT>, DLess: Eq + Borrow<Less>, DOcc: Eq + Borrow<Occ>> Eq for FMDIndex<DBWT, DLess, DOcc>
impl<DBWT: Borrow<BWT>, DLess: Borrow<Less>, DOcc: Borrow<Occ>> StructuralEq for FMDIndex<DBWT, DLess, DOcc>
impl<DBWT: Borrow<BWT>, DLess: Borrow<Less>, DOcc: Borrow<Occ>> StructuralPartialEq for FMDIndex<DBWT, DLess, DOcc>
Auto Trait Implementations
impl<DBWT, DLess, DOcc> RefUnwindSafe for FMDIndex<DBWT, DLess, DOcc> where DBWT: RefUnwindSafe, DLess: RefUnwindSafe, DOcc: RefUnwindSafe,
impl<DBWT, DLess, DOcc> Send for FMDIndex<DBWT, DLess, DOcc> where DBWT: Send, DLess: Send, DOcc: Send,
impl<DBWT, DLess, DOcc> Sync for FMDIndex<DBWT, DLess, DOcc> where DBWT: Sync, DLess: Sync, DOcc: Sync,
impl<DBWT, DLess, DOcc> Unpin for FMDIndex<DBWT, DLess, DOcc> where DBWT: Unpin, DLess: Unpin, DOcc: Unpin,
impl<DBWT, DLess, DOcc> UnwindSafe for FMDIndex<DBWT, DLess, DOcc> where DBWT: UnwindSafe, DLess: UnwindSafe, DOcc: UnwindSafe,
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<Q, K> Equivalent<K> for Q where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<SS, SP> SupersetOf<SS> for SP where SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.