1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242
// Copyright 2014-2016 Johannes Köster. // Licensed under the MIT license (http://opensource.org/licenses/MIT) // This file may not be copied, modified, or distributed // except according to those terms. #![doc( html_logo_url = "https://raw.githubusercontent.com/rust-bio/rust-bio/master/img/bioferris.svg", html_favicon_url = "https://raw.githubusercontent.com/rust-bio/rust-bio/master/img/bioferris.svg" )] //! # Rust-bio, a bioinformatics library for Rust. //! This library provides implementations of many algorithms and data structures //! that are useful for bioinformatics. //! All provided implementations are rigorously tested via continuous //! integration. //! //! For **getting started** with using `rust-bio`, see [the `Getting started` section below](#getting-started). //! For navigating the documentation of the available modules, see [the `Modules` section below](#modules). //! If you want to contribute to `rust-bio`, see [the `Contribute` section in the repo](https://github.com/rust-bio/rust-bio#contribute). //! //! Currently, rust-bio provides //! //! * most major pattern matching algorithms, //! * a convenient alphabet implementation, //! * pairwise alignment, //! * suffix arrays, //! * the [Burrows-Wheeler-transform (BWT)]() //! * the [Full-text index in Minute space index (FM-index)](https://doi.org/10.1109/SFCS.2000.892127), //! * FMD-Index for finding supermaximal exact matches, //! * a q-gram index, //! * utilities to work with [PSSMs](https://en.wikipedia.org/wiki/Position_weight_matrix), //! * an open reading frame (ORF) search algorithm, //! * a rank/select data structure, //! * [serde](https://github.com/serde-rs/serde) support for all data structures when built with `nightly` feature, //! * readers and writers for FASTQ, FASTA and BED, //! * helper functions for combinatorics and dealing with log probabilities, //! * an implementation of the Hidden Markov Model and related algorithms. //! //! For reading and writing SAM/BAM/CRAM, VCF/BCF files or tabix indexed files, have a look at [rust-htslib](https://docs.rs/rust-htslib). //! //! # Getting started //! //! We explain how to use Rust-Bio step-by-step. //! Users who already have experience with Rust can skip right to [Step 3: Use Rust-Bio in your project](https://docs.rs/bio/#step-3-use-rust-bio-in-your-project). //! Users who already know `rust-bio` might want to jump right into the [modules docs](https://docs.rs/bio/#modules) //! //! ## Step 1: Setting up Rust //! //! Rust can be installed following the instruction for [rustup](https://rustup.rs/). //! //! //! ## Step 2: Setting up a new Rust project //! //! Since Rust-Bio is a library, you need to setup your own new Rust project to use Rust-Bio. //! With Rust, projects and their dependencies are managed with the builtin package manager [Cargo](https://doc.rust-lang.org/cargo/index.html). //! To create a new Rust project, issue //! //! ```bash //! cargo new hello_world --bin //! cd hello_world //! ``` //! in your terminal. The flag `--bin` tells Cargo to create an executable project instead of a library. //! In [this section](https://doc.rust-lang.org/nightly/book/hello-cargo.html#a-new-project) of the Rust docs, you find details about what Cargo just created for you. //! //! Your new project can be compiled with //! ```bash //! cargo build //! ``` //! If dependencies in your project are out of date, update with //! ```bash //! cargo update //! ``` //! Execute the compiled code with //! ```bash //! cargo run //! ``` //! If you are new to Rust, we suggest to proceed with [learning Rust](https://www.rust-lang.org/learn) via the Rust docs. //! //! ## Step 3: Use Rust-Bio in your project //! //! To use Rust-Bio in your Rust project, add the following to your `Cargo.toml` //! //! ```toml //! [dependencies] //! bio = "*" //! ``` //! //! and import the crate from your source code: //! //! ```rust //! extern crate bio; //! ``` //! //! ## Example: FM-index and FASTQ //! //! An example of using `rust-bio`: //! //! ```rust //! // Import some modules //! use bio::alphabets; //! use bio::data_structures::bwt::{bwt, less, Occ}; //! use bio::data_structures::fmindex::{FMIndex, FMIndexable}; //! use bio::data_structures::suffix_array::suffix_array; //! use bio::io::fastq; //! use bio::io::fastq::FastqRead; //! use std::io; //! //! // a given text //! let text = b"ACAGCTCGATCGGTA$"; //! let pattern = b"ATCG"; //! //! // Create an FM-Index for the given text. //! //! // instantiate an alphabet //! let alphabet = alphabets::dna::iupac_alphabet(); //! // calculate a suffix array //! let sa = suffix_array(text); //! // calculate the Burrows-Wheeler-transform //! let bwt = bwt(text, &sa); //! // calculate the vectors less and Occ (occurrences) //! let less = less(&bwt, &alphabet); //! let occ = Occ::new(&bwt, 3, &alphabet); //! // set up FMIndex //! let fmindex = FMIndex::new(&bwt, &less, &occ); //! // do a backwards search for the pattern //! let interval = fmindex.backward_search(pattern.iter()); //! let positions = interval.occ(&sa); //! //! // Iterate over a FASTQ file, use the alphabet to validate read //! // sequences and search for exact matches in the FM-Index. //! //! // create FASTQ reader //! let mut reader = fastq::Reader::new(io::stdin()); //! let mut record = fastq::Record::new(); //! reader.read(&mut record).expect("Failed to parse record"); //! while !record.is_empty() { //! let check = record.check(); //! if check.is_err() { //! panic!("I got a rubbish record!") //! } //! // obtain sequence //! let seq = record.seq(); //! // check, whether seq is in the expected alphabet //! if alphabet.is_word(seq) { //! let interval = fmindex.backward_search(seq.iter()); //! let positions = interval.occ(&positions); //! } //! reader.read(&mut record).expect("Failed to parse record"); //! } //! ``` //! //! Documentation and further examples for each module can be found in the module descriptions below. //! //! //! ## Example: Multithreaded //! //! ```rust //! use bio::alphabets; //! use bio::data_structures::bwt::{bwt, less, Occ}; //! use bio::data_structures::fmindex::{FMIndex, FMIndexable}; //! use bio::data_structures::suffix_array::suffix_array; //! use std::sync::Arc; //! use std::thread; //! //! let text = b"ACGGATGCTGGATCGGATCGCGCTAGCTA$"; //! let patterns = vec![b"ACCG", b"TGCT"]; //! //! // Create an FM-Index for a given text. //! let alphabet = alphabets::dna::iupac_alphabet(); //! let sa = suffix_array(text); //! let bwt = Arc::new(bwt(text, &sa)); //! let less = Arc::new(less(bwt.as_ref(), &alphabet)); //! let occ = Arc::new(Occ::new(bwt.as_ref(), 3, &alphabet)); //! let fmindex = Arc::new(FMIndex::new(bwt, less, occ)); //! //! // Spawn threads to perform backward searches for each interval //! let interval_calculators = patterns //! .into_iter() //! .map(|pattern| { //! let fmindex = fmindex.clone(); //! thread::spawn(move || fmindex.backward_search(pattern.iter())) //! }) //! .collect::<Vec<_>>(); //! //! // Loop through the results, extracting the positions array for each pattern //! for interval_calculator in interval_calculators { //! let positions = interval_calculator.join().unwrap().occ(&sa); //! } //! ``` //! //! Documentation and further examples for each module can be found in the module descriptions below. //! //! # Benchmarks //! //! Since Rust-Bio is based on a compiled language, similar performance to C/C++ based libraries can be expected. Indeed, we find the pattern matching algorithms of Rust-Bio to perform in the range of the C++ library Seqan: //! //! | Algorithm | Rust-Bio | Seqan | //! | --------- | -------: | ------: | //! | BNDM | 77ms | 80ms | //! | Horspool | 122ms | 125ms | //! | BOM | 103ms | 107ms | //! | Shift-And | 241ms | 545ms | //! //! We measured 10000 iterations of searching pattern `GCGCGTACACACCGCCCG` in the sequence of the hg38 MT chromosome. //! Initialization time of each algorithm for the given pattern was included in each iteration. Benchmarks were conducted with *Cargo bench* for Rust-Bio and *Python timeit* for Seqan on an Intel Core i5-3427U CPU. //! Benchmarking Seqan from *Python timeit* entails an overhead of 1.46ms for calling a C++ binary. This overhead was subtracted from above Seqan run times. //! Note that this benchmark only compares the two libraries to exemplify that Rust-Bio has comparable speed to C++ libraries: all used algorithms have their advantages for specific text and pattern structures and lengths (see [the pattern matching section in the documentation](https://docs.rs/bio/0.28.2/bio/pattern_matching/index.html))./! #[macro_use] extern crate approx; #[macro_use] extern crate custom_derive; #[macro_use] extern crate lazy_static; #[macro_use] extern crate newtype_derive; #[macro_use] extern crate serde_derive; #[macro_use] extern crate strum_macros; #[macro_use] extern crate getset; #[cfg(feature = "phylogeny")] #[macro_use] extern crate pest_derive; pub mod alignment; pub mod alphabets; pub mod data_structures; pub mod io; pub mod pattern_matching; pub mod scores; pub mod seq_analysis; pub mod stats; pub mod utils;