Crate parasailors

Background

parasailors is a set of Rust bindings to the parasail pairwise sequence alignment library, which is written in C. parasail uses vectorized/SIMD versions of the Smith-Waterman and Needleman-Wunsch algorithms for pairwise sequence alignment. parasail also includes a vectorized semi-global alignment algorithm which provides a global alignment for a query sequence and a local alignment for a reference sequence (useful with NGS reads that need to be mapped against a chromosome, for example).

WARNING: The bindings are currently in an immature state, and it's not recommended to use them for any published results or production systems without some independent verification of both the underlying algorithm implementations and this bindings library.

In the interest of ease of use, this crate provides a much simpler interface than the original library. The original C library provides dozens, possibly hundreds, of alignment functions. Although they implement only three algorithms, they vary by which SIMD ISA is used, the integer width of the underlying calculations, whether alignment statistics are calculated, whether rows or columns of the dynamic programming matrix are returned, and so on. However, the library also provides automatic SIMD feature detection (to dispatch functions dynamically based on the CPU architecture) and an overflow-detecting method for picking the correct integer width for calculations. parasailors currently calls these dispatching functions.
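The overflow-detecting width selection mentioned above can be pictured with a toy sketch. This is not parasail's code: the names score_i8, score_i16, score_i32 and score_auto are hypothetical, and the "score" is just a count of matching positions. Only the retry pattern is illustrated: try the narrowest integer first and fall back to a wider one when the arithmetic overflows.

```rust
/// Toy score in 8-bit arithmetic: one point per matching position.
/// Returns None as soon as the running score no longer fits in an i8.
fn score_i8(a: &[u8], b: &[u8]) -> Option<i8> {
    let mut score: i8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        if x == y {
            score = score.checked_add(1)?;
        }
    }
    Some(score)
}

/// Same toy score in 16-bit arithmetic.
fn score_i16(a: &[u8], b: &[u8]) -> Option<i16> {
    let mut score: i16 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        if x == y {
            score = score.checked_add(1)?;
        }
    }
    Some(score)
}

/// Widest fallback; assumed wide enough for this illustration.
fn score_i32(a: &[u8], b: &[u8]) -> i32 {
    a.iter().zip(b.iter()).filter(|(x, y)| x == y).count() as i32
}

/// Dispatcher: use the narrowest width that does not overflow.
fn score_auto(a: &[u8], b: &[u8]) -> i32 {
    if let Some(s) = score_i8(a, b) {
        return i32::from(s);
    }
    if let Some(s) = score_i16(a, b) {
        return i32::from(s);
    }
    score_i32(a, b)
}
```

For two identical 200-byte sequences, the 8-bit pass overflows and score_auto returns the result of the 16-bit pass. Narrower integers matter in a vectorized library because more of them fit in one SIMD register.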

Usage

Nearly all parasail functions create a "profile" for your alignment query as a first step. This is wasteful when the same query is aligned against multiple references, so parasail also offers a family of functions that take a pointer to an existing profile instead of a raw query sequence. Most parasailors functions take an explicitly created profile to encourage efficient reuse.

First, an exact matching example:

let query_sequence = b"AAAAAAAAAA";
let reference =      b"AAAAAAAAAACCCCCCCCCCGGGGGGGGGGTTTTTTTTTTTNNNNNNNNN";

We'll use an identity substitution matrix for scoring:

let identity_matrix = Matrix::new(MatrixType::Identity);

And construct a profile for querying the references:

let profile = Profile::new(query_sequence, &identity_matrix);

And now we can perform several different alignments with the same profile:

assert_eq!(10, local_alignment_score(&profile, reference, 1, 1));
assert_eq!(10, semi_global_alignment_score(&profile, reference, 1, 1));
assert_eq!(-30, global_alignment_score(&profile, reference, 1, 1));
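To see why reusing a profile saves work, it can help to think of a query profile as a table of substitution scores computed once per query. The sketch below is a conceptual stand-in only: ToyProfile, identity and best_ungapped are hypothetical names, the match 1 / mismatch 0 scoring is an assumption, and parasail's real profiles use a SIMD-friendly layout that is not reproduced here.

```rust
/// Illustrative stand-in for a query profile: for every possible reference
/// byte, the substitution score against each query position, computed once.
struct ToyProfile {
    rows: Vec<Vec<i32>>, // rows[reference_byte][query_position]
}

impl ToyProfile {
    fn new(query: &[u8], score: fn(u8, u8) -> i32) -> Self {
        let rows: Vec<Vec<i32>> = (0..=255u8)
            .map(|r| query.iter().map(|&q| score(q, r)).collect())
            .collect();
        ToyProfile { rows }
    }

    /// Best ungapped score of the whole query over any window of `reference`.
    /// Each lookup reads the precomputed table instead of re-scoring the pair.
    fn best_ungapped(&self, reference: &[u8]) -> i32 {
        let qlen = self.rows[0].len();
        if reference.len() < qlen {
            return 0;
        }
        (0..=reference.len() - qlen)
            .map(|start| {
                reference[start..start + qlen]
                    .iter()
                    .enumerate()
                    .map(|(i, &r)| self.rows[r as usize][i])
                    .sum::<i32>()
            })
            .max()
            .unwrap_or(0)
    }
}

/// Assumed identity scoring: 1 for a match, 0 for a mismatch.
fn identity(a: u8, b: u8) -> i32 {
    if a == b { 1 } else { 0 }
}
```

A single ToyProfile built from the ten-A query can then be scored against any number of references without rebuilding the table, which is the reuse pattern the real Profile type is meant to encourage.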

And a non-matching alignment:

let reference = b"CCCCCCCCCCGGGGGGGGGGTTTTTTTTTTTNNNNNNNNN";
assert_eq!(0, local_alignment_score(&profile, reference, 1, 1));
assert_eq!(0, semi_global_alignment_score(&profile, reference, 1, 1));
assert_eq!(-30, global_alignment_score(&profile, reference, 1, 1));
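The two global scores of -30 can be checked by hand. The helper below is a small sketch under two assumptions that are consistent with the scores shown but are not stated in this page: a gap of n characters costs open + (n - 1) * extend, and the identity matrix scores a match as 1 and a mismatch as 0. The name gap_cost is hypothetical.

```rust
/// Cost of a single gap of `len` characters, assuming the first gap
/// character costs `open` and every further character costs `extend`.
fn gap_cost(open: i32, extend: i32, len: i32) -> i32 {
    if len == 0 {
        0
    } else {
        open + (len - 1) * extend
    }
}
```

In the matching case the 10-base query lines up with the 10 leading A's of the 50-base reference and the remaining 40 reference bases fall in a gap, so the score is 10 - gap_cost(1, 1, 40) = -30. In the non-matching case the 10 query bases are mismatches worth 0 against a 40-base reference, leaving a 30-base gap, so the score is 0 - gap_cost(1, 1, 30) = -30. Local and semi-global scores cannot be pushed negative this way, which is why they report 0 for the non-matching reference.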

Some more examples with differing query/reference relationships:

// a normal NGS read length
let query = b"AAAAAAAAAACCCCCCCCCCGGGGGGGGGGTTTTTTTTTTTNNNNNNNNN";
let profile = Profile::new(query, &identity_matrix);

// these should be exact matches, with score of 50
let reference = b"AAAAAAAAAACCCCCCCCCCGGGGGGGGGGTTTTTTTTTTTNNNNNNNNN";
assert_eq!(50, local_alignment_score(&profile, reference, 1, 1));
assert_eq!(50, semi_global_alignment_score(&profile, reference, 1, 1));
assert_eq!(50, global_alignment_score(&profile, reference, 1, 1));

// these should be inexact matches with 2 edits, with score of 48
let reference = b"AAAAAAAAAACCCCCCCCCCGGGGGGGGGGTTTTTCCTTTTTTNNNNNNNNN";
assert_eq!(48, local_alignment_score(&profile, reference, 1, 1));
assert_eq!(48, semi_global_alignment_score(&profile, reference, 1, 1));
assert_eq!(48, global_alignment_score(&profile, reference, 1, 1));
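Since the warning above recommends independent verification, a plain scalar implementation can serve as a cross-check for the global scores. The following is a minimal sketch of Needleman-Wunsch with affine gaps (Gotoh's recurrences), not the vectorized code that parasail runs. It assumes identity scoring (match 1, mismatch 0) and a gap cost of open + (n - 1) * extend; global_score is a hypothetical name.

```rust
const NEG_INF: i32 = i32::MIN / 2; // "impossible" score that is safe to subtract from

/// Scalar global alignment score with affine gaps, O(query * reference) time.
fn global_score(query: &[u8], reference: &[u8], open: i32, extend: i32) -> i32 {
    let n = reference.len();
    // h[j]: best score of the query prefix seen so far against reference[..j].
    // Row 0: an empty query prefix against j reference bases is one long gap.
    let mut h: Vec<i32> = (0..=n as i32)
        .map(|j| if j == 0 { 0 } else { -(open + (j - 1) * extend) })
        .collect();
    // f[j]: best score ending with a query base aligned against a gap.
    let mut f = vec![NEG_INF; n + 1];

    for (i, &q) in query.iter().enumerate() {
        let i = i as i32 + 1;
        let mut diag = h[0]; // score of the previous row, previous column
        h[0] = -(open + (i - 1) * extend); // i query bases against nothing
        let mut e = NEG_INF; // best score ending with a reference base against a gap
        for j in 1..=n {
            // Open a new gap from the previous row, or extend an existing one.
            f[j] = (h[j] - open).max(f[j] - extend);
            // Open a new gap from the previous column, or extend an existing one.
            e = (h[j - 1] - open).max(e - extend);
            let s = if q == reference[j - 1] { 1 } else { 0 };
            let best = (diag + s).max(e).max(f[j]);
            diag = h[j]; // previous-row value becomes the next diagonal
            h[j] = best;
        }
    }
    h[n]
}
```

Under these assumptions, global_score reproduces the 50 and 48 shown above with open and extend both set to 1: the two inserted C bases form a single gap that costs 2, and the other 50 positions match.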

We can also perform a one-off alignment, which creates and destroys the profile automatically:

let query = b"AAAAAAAAAACCCCCCCCCCGGGGGGGGGGTTTTTTTTTTTNNNNNNNNN";
let reference = b"AAAAAAAAAACCCCCCCCCCGGGGGGGGGGTTTTTTTTTTTNNNNNNNNN";
assert_eq!(50, local_alignment_score_no_profile(reference, query, 1, 1, &identity_matrix));

Structs

AlignmentStats

Stores statistics from an alignment.

Matrix

A substitution matrix to use when aligning DNA or protein. Can be reused in many profiles.

Profile

A container for a parasail query profile. Can be reused to re-align the same sequence against multiple references.

Enums

MatrixType

Denotes the type of the substitution matrix. Use Identity for simple edit-distance calculations.

Functions

global_alignment_score

Provides a score for global pairwise alignment, using a vectorized version of Needleman-Wunsch.

local_alignment_score

Returns a score for local pairwise alignment using a vectorized version of Smith-Waterman.

local_alignment_score_no_profile

Returns a score for local pairwise alignment using a vectorized version of Smith-Waterman, without requiring a pre-built profile. The profile is created and destroyed internally.

local_alignment_stats

Provides statistics for local pairwise alignment using a vectorized algorithm.

semi_global_alignment_score

Provides a score for semi-global pairwise alignment using a vectorized algorithm.

semi_global_alignment_stats

Provides statistics for semi-global pairwise alignment using a vectorized algorithm.