bio 0.10.0

A bioinformatics library for Rust. This library provides implementations of many algorithms and data structures that are useful for bioinformatics, but also in other fields.

Crate bio [] [src]

Rust-bio, a bioinformatics library for Rust.

This library provides implementations of many algorithms and data structures that are useful for bioinformatics. All provided implementations are rigorously tested via continuous integration. For installation instructions and a general overview, visit https://rust-bio.github.io.

Currently, rust-bio provides

  • most major pattern matching algorithms,
  • a convenient alphabet implementation,
  • pairwise alignment,
  • suffix arrays,
  • BWT and FM-Index,
  • FMD-Index for finding supermaximal exact matches,
  • a q-gram index,
  • an orf research algorithm,
  • a rank/select data structure,
  • FASTQ and FASTA and BED readers and writers,
  • helper functions for combinatorics and dealing with log probabilities.

Example

use bio::alphabets;
use bio::data_structures::suffix_array::suffix_array;
use bio::data_structures::bwt::{bwt, less, Occ};
use bio::data_structures::fmindex::{FMIndex, FMIndexable};

let text = b"ACGGATGCTGGATCGGATCGCGCTAGCTA$";
let pattern = b"ACCG";

// Create an FM-Index for a given text.
let alphabet = alphabets::dna::iupac_alphabet();
let sa = suffix_array(text);
let bwt = bwt(text, &sa);
let less = less(&bwt, &alphabet);
let occ = Occ::new(&bwt, 3, &alphabet);
let fmindex = FMIndex::new(&bwt, &less, &occ);

let interval = fmindex.backward_search(pattern.iter());
let positions = interval.occ(&sa);

Multithreaded Example

use bio::alphabets;
use bio::data_structures::suffix_array::suffix_array;
use bio::data_structures::bwt::{bwt, less, Occ};
use bio::data_structures::fmindex::{FMIndex, FMIndexable};
use std::sync::Arc;
use std::thread;

let text = b"ACGGATGCTGGATCGGATCGCGCTAGCTA$";
let patterns = vec![b"ACCG", b"TGCT"];

// Create an FM-Index for a given text.
let alphabet = alphabets::dna::iupac_alphabet();
let sa = suffix_array(text);
let bwt = Arc::new(bwt(text, &sa));
let less = Arc::new(less(bwt.as_ref(), &alphabet));
let occ = Arc::new(Occ::new(bwt.as_ref(), 3, &alphabet));
let fmindex = Arc::new(FMIndex::new(bwt, less, occ));

// Spawn threads to perform backward searches for each interval
let interval_calculators = patterns.into_iter().map(|pattern| {
    let fmindex = fmindex.clone();
    thread::spawn(move ||
        fmindex.backward_search(pattern.iter())
    )
}).collect::<Vec<_>>();

// Loop through the results, extracting the positions array for each pattern
for interval_calculator in interval_calculators {
    let positions = interval_calculator.join().unwrap().occ(&sa);
}

Documentation and further examples for each module can be found in the module descriptions below.

Modules

alignment

Various alignment and distance computing algorithms.

alphabets

Implementation of alphabets and useful utilities.

data_structures

Various useful data structures.

io

Readers and writers for common bioinformatics file formats.

pattern_matching

This module contains various useful pattern matching algorithms. The implementations are based on the lecture notes "Algorithmen auf Sequenzen", Kopczynski, Marschall, Martin and Rahmann, 2008 - 2015.

scores
seq_analysis

Sequence analysis algorithms.

stats

Mathematical and statistical tools.

utils

Common utilities.