pub struct RankTransform {
    pub ranks: SymbolRanks,
}

Tools based on transforming the alphabet symbols to their lexicographical ranks.

Lexicographical rank is computed using u8 representations, i.e. ASCII codes, of the input characters. For example, assuming that the alphabet consists of the symbols A, C, G, and T, this will yield ranks 0, 1, 2, 3 for them, respectively.
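The ranking rule described above can be illustrated without the crate. The following is a minimal std-only sketch, not the library's implementation; the helper name `symbol_ranks` is hypothetical and only shows how sorting distinct bytes by their `u8` value yields the ranks.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of the crate): pair each distinct
/// byte with its rank in ascending byte order.
fn symbol_ranks(symbols: &[u8]) -> Vec<(u8, u8)> {
    // BTreeSet removes duplicates and iterates in ascending u8 order.
    let sorted: BTreeSet<u8> = symbols.iter().copied().collect();
    sorted
        .into_iter()
        .enumerate()
        // At most 256 distinct bytes exist, so the rank fits into u8.
        .map(|(rank, symbol)| (symbol, rank as u8))
        .collect()
}
```

Under these assumptions, `symbol_ranks(b"TGCA")` pairs A, C, G, and T with 0, 1, 2, and 3 regardless of the order in which the symbols were supplied.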

RankTransform can be used to perform bit encoding for texts over a given alphabet via bio::data_structures::bitenc.

Fields

ranks: SymbolRanks

Implementations

Construct a new RankTransform.

Complexity: O(n), where n is the number of symbols in the alphabet.

Example
use bio::alphabets;

let dna_alphabet = alphabets::Alphabet::new(b"acgtACGT");
let dna_ranks = alphabets::RankTransform::new(&dna_alphabet);

Get the rank of symbol a.

This method panics for characters not contained in the alphabet.

Complexity: O(1)

Example
use bio::alphabets;

let dna_alphabet = alphabets::Alphabet::new(b"acgtACGT");
let dna_ranks = alphabets::RankTransform::new(&dna_alphabet);
assert_eq!(dna_ranks.get(65), 0); // "A"
assert_eq!(dna_ranks.get(116), 7); // "t"
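
Because the lookup panics on symbols outside the alphabet, callers handling untrusted input may prefer to check membership first. The sketch below is a std-only stand-in that assumes a 256-entry table, one slot per byte value; `RankTable` and `try_get` are hypothetical names and not part of the crate. It returns `None` instead of panicking.

```rust
/// Hypothetical std-only stand-in for a rank lookup (not the crate's type).
struct RankTable {
    // One slot per possible byte value; None marks bytes outside the alphabet.
    table: [Option<u8>; 256],
}

impl RankTable {
    fn new(symbols: &[u8]) -> Self {
        // Mark which bytes belong to the alphabet.
        let mut present = [false; 256];
        for &s in symbols {
            present[s as usize] = true;
        }
        // Walk the bytes in ascending order and hand out consecutive ranks.
        let mut table = [None; 256];
        let mut next_rank = 0usize;
        for (byte, &is_present) in present.iter().enumerate() {
            if is_present {
                table[byte] = Some(next_rank as u8);
                next_rank += 1;
            }
        }
        RankTable { table }
    }

    /// Constant-time lookup that yields None for unknown symbols.
    fn try_get(&self, symbol: u8) -> Option<u8> {
        self.table[symbol as usize]
    }
}
```

With a table built from `b"acgtACGT"`, `try_get(65)` gives `Some(0)` and `try_get(116)` gives `Some(7)`, matching the ranks in the example above, while `try_get(b'N')` gives `None`.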

Transform a given text into a vector of rank values.

Complexity: O(n), where n is the length of the text.

Example
use bio::alphabets;

let dna_alphabet = alphabets::Alphabet::new(b"ACGTacgt");
let dna_ranks = alphabets::RankTransform::new(&dna_alphabet);
let text = b"aAcCgGtT";
assert_eq!(dna_ranks.transform(text), vec![4, 0, 5, 1, 6, 2, 7, 3]);

Iterate over the q-grams (substrings of length q) of the given text. Each q-gram is encoded as a usize by packing the symbol ranks into ⌈log2(|A|)⌉ bits each, where |A| is the alphabet size.

If q is larger than usize::BITS / ⌈log2(|A|)⌉, this method fails with an assertion.

Complexity: O(n), where n is the length of the text.

Example
use bio::alphabets;

let dna_alphabet = alphabets::Alphabet::new(b"ACGTacgt");
let dna_ranks = alphabets::RankTransform::new(&dna_alphabet);

let q_grams: Vec<usize> = dna_ranks.qgrams(2, b"ACGT").collect();
assert_eq!(q_grams, vec![1, 10, 19]);
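
The values 1, 10, and 19 in the example above follow from packing 3-bit ranks (the alphabet has 8 symbols) into a sliding window. The sketch below reproduces that packing with std only. It is an illustration under stated assumptions rather than the crate's code: `encode_qgrams` is a hypothetical helper, the input is assumed to be already rank-transformed, and every rank is assumed to fit into `bits` bits.

```rust
/// Hypothetical helper (not part of the crate): encode every q-gram of
/// `ranks` as a usize, with the most recent symbol in the lowest bits.
/// Assumes each rank fits into `bits` bits.
fn encode_qgrams(ranks: &[u8], q: u32, bits: u32) -> Vec<usize> {
    assert!((1..=8).contains(&bits), "ranks are u8, so 1 to 8 bits each");
    assert!(q >= 1 && q <= usize::BITS, "q out of range");
    assert!(bits * q <= usize::BITS, "q-gram does not fit into usize");
    // Mask that keeps only the lowest bits * q bits of the window.
    let mask = if bits * q == usize::BITS {
        usize::MAX
    } else {
        (1usize << (bits * q)) - 1
    };
    let mut out = Vec::new();
    let mut current = 0usize;
    for (i, &rank) in ranks.iter().enumerate() {
        // Shift the new rank in and mask out the symbol that left the window.
        current = ((current << bits) | rank as usize) & mask;
        // Emit only once the window holds q symbols.
        if i + 1 >= q as usize {
            out.push(current);
        }
    }
    out
}
```

Calling `encode_qgrams(&[0, 1, 2, 3], 2, 3)` on the ranks of `ACGT` yields `[1, 10, 19]`: AC is 0·8 + 1, CG is 1·8 + 2, and GT is 2·8 + 3. A text shorter than q produces no q-grams.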

Restore alphabet from transform.

Complexity: O(n), where n is the number of symbols in the alphabet.

Example
use bio::alphabets;

let dna_alphabet = alphabets::Alphabet::new(b"acgtACGT");
let dna_ranks = alphabets::RankTransform::new(&dna_alphabet);
assert_eq!(dna_ranks.alphabet().symbols, dna_alphabet.symbols);

Compute the number of bits required to encode the largest rank value.

For example, the alphabet b"ACGT" with 4 symbols has the maximal rank 3, which can be encoded in 2 bits.

This value can be used to create a data_structures::bitenc::BitEnc bit encoding tailored to the given alphabet.

Complexity: O(n), where n is the number of symbols in the alphabet.

Example
use bio::alphabets;

let dna_alphabet = alphabets::Alphabet::new(b"ACGT");
let dna_ranks = alphabets::RankTransform::new(&dna_alphabet);
assert_eq!(dna_ranks.get_width(), 2);
let dna_n_alphabet = alphabets::Alphabet::new(b"ACGTN");
let dna_n_ranks = alphabets::RankTransform::new(&dna_n_alphabet);
assert_eq!(dna_n_ranks.get_width(), 3);
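
The width rule shown above (4 symbols need 2 bits, 5 symbols need 3) can be expressed with integer arithmetic alone. This is a minimal sketch with a hypothetical helper name, `bits_for_alphabet`; it is not claimed to be how the crate computes the value, only an equivalent way to count the bits of the largest rank.

```rust
/// Hypothetical helper (not part of the crate): number of bits needed
/// to represent the largest rank, i.e. alphabet_size - 1.
fn bits_for_alphabet(alphabet_size: usize) -> u32 {
    assert!(alphabet_size > 0, "alphabet must not be empty");
    let max_rank = alphabet_size - 1;
    // Total bit width minus the unused leading zero bits.
    usize::BITS - max_rank.leading_zeros()
}
```

Under this definition a single-symbol alphabet needs 0 bits, since its only rank is 0.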

Trait Implementations

Deserialize this value from the given Serde deserializer.

Serialize this value into the given Serde serializer.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.

Immutably borrows from an owned value.

Mutably borrows from an owned value.

Performs the conversion.

Performs the conversion.

Should always be Self

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.

Checks if self is actually part of its subset T (and can be converted to it).

Use with care! Same as self.to_subset but without any property checks. Always succeeds.

The inclusion map: converts self to the equivalent element of its superset.

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.