Trait Seq

pub trait Seq<'s>:
    Copy
    + Eq
    + Ord {
    type SeqVec: SeqVec;

    const BASES_PER_BYTE: usize;
    const BITS_PER_CHAR: usize;

    // Required methods
    fn len(&self) -> usize;
    fn get(&self, _index: usize) -> u8;
    fn get_ascii(&self, _index: usize) -> u8;
    fn to_word(&self) -> usize;
    fn to_vec(&self) -> Self::SeqVec;
    fn slice(&self, range: Range<usize>) -> Self;
    fn iter_bp(self) -> impl ExactSizeIterator<Item = u8> + Clone;
    fn par_iter_bp(
        self,
        context: usize,
    ) -> (impl ExactSizeIterator<Item = u32x8> + Clone, Self);
    fn par_iter_bp_delayed(
        self,
        context: usize,
        delay: usize,
    ) -> (impl ExactSizeIterator<Item = (u32x8, u32x8)> + Clone, Self);
    fn par_iter_bp_delayed_2(
        self,
        context: usize,
        delay1: usize,
        delay2: usize,
    ) -> (impl ExactSizeIterator<Item = (u32x8, u32x8, u32x8)> + Clone, Self);
    fn cmp_lcp(&self, other: &Self) -> (Ordering, usize);

    // Provided method
    fn bits_per_char(&self) -> usize { ... }
}
A non-owned slice of characters.

The represented character values are expected to be in [0, 2^b), but they can be encoded in various ways. E.g.:

  • A &[u8] of ASCII characters, returning 8-bit values.
  • An AsciiSeq of DNA characters ACGT, interpreted as 2-bit values.
  • A PackedSeq of packed DNA characters (4 per byte), returning 2-bit values.

Each character is assumed to fit in 8 bits. Some functions take or return this ‘unpacked’ (ASCII) character.
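
The difference between the unpacked and packed encodings can be sketched as follows. This is a minimal illustration, not the crate's implementation: the helper names pack_char, pack, and get_packed are hypothetical, and both the bit trick mapping A=0, C=1, T=2, G=3 and the choice to store the first base in the lowest bits are assumptions.

```rust
/// Map an ASCII DNA base to a 2-bit value.
/// Assumption: the bit trick `(c >> 1) & 3`, giving A=0, C=1, T=2, G=3;
/// the crate's actual mapping may differ.
fn pack_char(c: u8) -> u8 {
    (c >> 1) & 3
}

/// Pack ASCII DNA into bytes, 4 bases per byte,
/// with the first base in the lowest bits (an assumed order).
fn pack(ascii: &[u8]) -> Vec<u8> {
    ascii
        .chunks(4)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u8, |byte, (i, &c)| byte | (pack_char(c) << (2 * i)))
        })
        .collect()
}

/// Read back the 2-bit value of the base at `index` from packed storage.
fn get_packed(packed: &[u8], index: usize) -> u8 {
    (packed[index / 4] >> (2 * (index % 4))) & 3
}
```

Under these assumptions, four bases occupy one byte (BASES_PER_BYTE = 4, BITS_PER_CHAR = 2), whereas a plain &[u8] uses one byte per character (BASES_PER_BYTE = 1, BITS_PER_CHAR = 8).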

Required Associated Constants§

const BASES_PER_BYTE: usize

Number of encoded characters per byte of memory of the Seq.

const BITS_PER_CHAR: usize

Number of bits b used to represent each character returned by iter_bp and its variants.

Required Associated Types§

type SeqVec: SeqVec

The corresponding owned sequence type.

Required Methods§

fn len(&self) -> usize

The length of the sequence in characters.

fn get(&self, _index: usize) -> u8

Get the character at the given index.

fn get_ascii(&self, _index: usize) -> u8

Get the ASCII character at the given index, without mapping to b-bit values.

fn to_word(&self) -> usize

Convert a short sequence (k-mer) to a packed representation as usize.
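
As a rough sketch of what such a packed word could look like for 2-bit characters: the function name kmer_to_word is hypothetical, and placing the first character in the lowest bits is an assumption rather than the documented layout.

```rust
/// Pack a short sequence of 2-bit values into one word,
/// with the first character in the lowest bits (an assumed order).
fn kmer_to_word(chars: &[u8]) -> usize {
    // Each character takes 2 bits, so the k-mer must fit in one word.
    assert!(
        2 * chars.len() <= usize::BITS as usize,
        "k-mer too long for one word"
    );
    chars
        .iter()
        .enumerate()
        .fold(0usize, |word, (i, &c)| word | ((c as usize & 3) << (2 * i)))
}
```

With this layout, a 64-bit usize holds k-mers of up to 32 two-bit characters.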

fn to_vec(&self) -> Self::SeqVec

Convert to an owned version.

fn slice(&self, range: Range<usize>) -> Self

Get a sub-slice of the sequence. range indicates character indices.

fn iter_bp(self) -> impl ExactSizeIterator<Item = u8> + Clone

Iterate over the b-bit characters of the sequence.

fn par_iter_bp( self, context: usize, ) -> (impl ExactSizeIterator<Item = u32x8> + Clone, Self)

Iterate over 8 chunks of b-bit characters of the sequence in parallel.

This splits the input into 8 chunks and streams over them in parallel. The second element of the returned tuple is the tail: a sequence of the same type (Self) holding the remaining characters. The context can be e.g. the k-mer size being iterated. When context>1, consecutive chunks overlap by context-1 bases.

Expected to be implemented using SIMD instructions.
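
The chunk layout described above can be modelled with plain index ranges. This is a simplified scalar sketch, not the SIMD implementation: the function chunk_ranges is hypothetical, and the real code may round chunk sizes differently (for example to whole bytes), so the exact boundaries are an assumption.

```rust
use std::ops::Range;

/// Character ranges covered by each of the 8 lanes, plus the tail range.
/// Simplified model; actual chunk boundaries in the crate may differ.
fn chunk_ranges(len: usize, context: usize) -> ([Range<usize>; 8], Range<usize>) {
    assert!(context >= 1, "context must be at least 1");
    // Number of windows of `context` characters in the whole sequence.
    let windows = (len + 1).saturating_sub(context);
    // Each lane handles this many window start positions.
    let per_lane = windows / 8;
    let lanes: [Range<usize>; 8] = std::array::from_fn(|i| {
        let start = i * per_lane;
        // Extend by context-1 so every window starting in this lane is complete.
        let end = if per_lane == 0 {
            start
        } else {
            start + per_lane + context - 1
        };
        start..end
    });
    // Window starts not assigned to any lane are left for the tail.
    let tail = (8 * per_lane)..len;
    (lanes, tail)
}
```

For len = 100 and context = 5 this model gives lane 0 the range 0..16 and lane 1 the range 12..28, so the two overlap by context-1 = 4 characters.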

fn par_iter_bp_delayed( self, context: usize, delay: usize, ) -> (impl ExactSizeIterator<Item = (u32x8, u32x8)> + Clone, Self)

Iterate over 8 chunks of the sequence in parallel, returning two characters offset by delay positions.

Returned pairs are (add, remove), and the first delay ‘remove’ characters are always 0.

For example, when the sequence starts as ABCDEF..., and delay=2, the first returned tuples in the first lane are: (b'A', 0), (b'B', 0), (b'C', b'A'), (b'D', b'B').

When context>1, consecutive chunks overlap by context-1 bases: the first context-1 ‘added’ characters of the second chunk overlap with the last context-1 ‘added’ characters of the first chunk.
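
The (add, remove) pairs of a single lane can be sketched in scalar code. The helpers delayed and window_sums are hypothetical; the second one only illustrates one plausible use of the pairs, a running sum over the last delay characters, and is not taken from the crate.

```rust
/// Yield (add, remove) pairs for one lane: `remove` trails `add` by `delay`
/// positions and is 0 while no such earlier character exists.
fn delayed(chars: &[u8], delay: usize) -> impl Iterator<Item = (u8, u8)> + '_ {
    chars.iter().enumerate().map(move |(i, &add)| {
        let remove = if i >= delay { chars[i - delay] } else { 0 };
        (add, remove)
    })
}

/// Example use: a running sum over the last `delay` characters.
fn window_sums(chars: &[u8], delay: usize) -> Vec<u32> {
    let mut sum = 0u32;
    delayed(chars, delay)
        .map(|(add, remove)| {
            // The character entering the window is added,
            // the one leaving it is subtracted.
            sum += add as u32;
            sum -= remove as u32;
            sum
        })
        .collect()
}
```

Because the first delay 'remove' values are 0, the same update rule works while the window is still filling up.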

fn par_iter_bp_delayed_2( self, context: usize, delay1: usize, delay2: usize, ) -> (impl ExactSizeIterator<Item = (u32x8, u32x8, u32x8)> + Clone, Self)

Iterate over 8 chunks of the sequence in parallel, returning three characters: the char added, the one delay1 positions before, and the one delay2 positions before.

Requires delay1 <= delay2.

Returned triples are (add, d1, d2). The first delay1 values of d1 and the first delay2 values of d2 are always 0.

For example, when the sequence starts as ABCDEF..., and delay1=2 and delay2=3, the first returned tuples in the first lane are: (b'A', 0, 0), (b'B', 0, 0), (b'C', b'A', 0), (b'D', b'B', b'A').

When context>1, consecutive chunks overlap by context-1 bases: the first context-1 ‘added’ characters of the second chunk overlap with the last context-1 ‘added’ characters of the first chunk.
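
A scalar model of one lane reproduces the triples from the example. The function delayed_2 below is a hypothetical stand-in operating on a plain byte slice, not the SIMD iterator itself.

```rust
/// Collect (add, d1, d2) triples for one lane, where d1 and d2 trail `add`
/// by `delay1` and `delay2` positions and are 0 before that many characters
/// have been seen.
fn delayed_2(chars: &[u8], delay1: usize, delay2: usize) -> Vec<(u8, u8, u8)> {
    assert!(delay1 <= delay2, "requires delay1 <= delay2");
    // Character `delay` positions before index `i`, or 0 if there is none.
    let back = |i: usize, delay: usize| if i >= delay { chars[i - delay] } else { 0 };
    (0..chars.len())
        .map(|i| (chars[i], back(i, delay1), back(i, delay2)))
        .collect()
}
```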

fn cmp_lcp(&self, other: &Self) -> (Ordering, usize)

Compare the two sequences and return the ordering together with the length of their longest common prefix (LCP).
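
A minimal sketch of these semantics on plain byte slices, assuming lexicographic comparison of the raw character values; packed implementations may compare their encoded values instead, and this free function is only an illustration.

```rust
use std::cmp::Ordering;

/// Compare two byte sequences lexicographically and also report the
/// length of their longest common prefix.
fn cmp_lcp(a: &[u8], b: &[u8]) -> (Ordering, usize) {
    // Length of the longest common prefix.
    let lcp = a.iter().zip(b).take_while(|(x, y)| x == y).count();
    // Past the common prefix, the first differing character decides;
    // if one sequence ran out, the shorter one sorts first.
    let ord = match (a.get(lcp), b.get(lcp)) {
        (Some(x), Some(y)) => x.cmp(y),
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    };
    (ord, lcp)
}
```

Returning both values from one pass avoids scanning the common prefix twice when a caller needs the ordering and the LCP, for example while sorting suffixes.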

Provided Methods§

fn bits_per_char(&self) -> usize

Convenience function that returns b=Self::BITS_PER_CHAR.

Dyn Compatibility§

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.
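
In practice this means the trait is used through generics rather than trait objects. The sketch below uses a hypothetical miniature trait, MiniSeq, that shares two of the properties that rule out dyn: a Copy supertrait (which implies Sized) and an associated constant.

```rust
/// Hypothetical miniature of the trait: `Copy` implies `Sized`, and there
/// is an associated constant, so `dyn MiniSeq` is not allowed.
trait MiniSeq: Copy {
    const BITS_PER_CHAR: usize;
    fn len(&self) -> usize;
}

impl MiniSeq for &[u8] {
    const BITS_PER_CHAR: usize = 8;
    fn len(&self) -> usize {
        // Call the inherent slice method explicitly to avoid recursion.
        <[u8]>::len(self)
    }
}

/// Static dispatch through a generic parameter instead of `&dyn MiniSeq`.
fn total_bits<S: MiniSeq>(seq: S) -> usize {
    seq.len() * S::BITS_PER_CHAR
}
```

Calling total_bits with a byte slice is resolved at compile time, which also lets the compiler inline the per-character accessors.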

Implementations on Foreign Types§

impl<'s> Seq<'s> for &[u8]

Returns the characters as 8-bit ASCII values, without mapping (BITS_PER_CHAR = 8). For DNA, prefer first packing into a PackedSeqVec for storage.

fn to_vec(&self) -> Vec<u8>

Convert to an owned version.

fn iter_bp(self) -> impl ExactSizeIterator<Item = u8> + Clone

Iterate over the ASCII characters.

fn par_iter_bp( self, context: usize, ) -> (impl ExactSizeIterator<Item = S> + Clone, Self)

Iterate over the ASCII characters in parallel.

const BASES_PER_BYTE: usize = 1usize

const BITS_PER_CHAR: usize = 8usize

type SeqVec = Vec<u8>

fn len(&self) -> usize

fn get(&self, index: usize) -> u8

fn get_ascii(&self, index: usize) -> u8

fn to_word(&self) -> usize

fn slice(&self, range: Range<usize>) -> Self

fn par_iter_bp_delayed( self, context: usize, delay: usize, ) -> (impl ExactSizeIterator<Item = (S, S)> + Clone, Self)

fn par_iter_bp_delayed_2( self, context: usize, delay1: usize, delay2: usize, ) -> (impl ExactSizeIterator<Item = (S, S, S)> + Clone, Self)

fn cmp_lcp(&self, other: &Self) -> (Ordering, usize)

Implementors§

impl<'s> Seq<'s> for AsciiSeq<'s>

Maps ASCII to [0, 4) on the fly. Prefer first packing into a PackedSeqVec for storage.

impl<'s> Seq<'s> for PackedSeq<'s>