logo
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
// Copyright 2014-2016 Johannes Köster.
// Licensed under the MIT license (http://opensource.org/licenses/MIT)
// This file may not be copied, modified, or distributed
// except according to those terms.

#![doc(
    html_logo_url = "https://raw.githubusercontent.com/rust-bio/rust-bio/master/img/bioferris.svg",
    html_favicon_url = "https://raw.githubusercontent.com/rust-bio/rust-bio/master/img/bioferris.svg"
)]

//! # Rust-bio, a bioinformatics library for Rust.
//! This library provides implementations of many algorithms and data structures
//! that are useful for bioinformatics.
//! All provided implementations are rigorously tested via continuous
//! integration.
//!
//! For **getting started** with using `rust-bio`, see [the `Getting started` section below](#getting-started).
//! For navigating the documentation of the available modules, see [the `Modules` section below](#modules).
//! If you want to contribute to `rust-bio`, see [the `Contribute` section in the repo](https://github.com/rust-bio/rust-bio#contribute).
//!
//! Currently, rust-bio provides
//!
//! * most major pattern matching algorithms,
//! * a convenient alphabet implementation,
//! * pairwise alignment,
//! * suffix arrays,
//! * the [Burrows-Wheeler-transform (BWT)]()
//! * the [Full-text index in Minute space index (FM-index)](https://doi.org/10.1109/SFCS.2000.892127),
//! * FMD-Index for finding supermaximal exact matches,
//! * a q-gram index,
//! * utilities to work with [PSSMs](https://en.wikipedia.org/wiki/Position_weight_matrix),
//! * an open reading frame (ORF) search algorithm,
//! * a rank/select data structure,
//! * [serde](https://github.com/serde-rs/serde) support for all data structures when built with `nightly` feature,
//! * readers and writers for FASTQ, FASTA and BED,
//! * helper functions for combinatorics and dealing with log probabilities,
//! * an implementation of the Hidden Markov Model and related algorithms.
//!
//! For reading and writing SAM/BAM/CRAM, VCF/BCF files or tabix indexed files, have a look at [rust-htslib](https://docs.rs/rust-htslib).
//!
//! # Getting started
//!
//! We explain how to use Rust-Bio step-by-step.
//! Users who already have experience with Rust can skip right to [Step 3: Use Rust-Bio in your project](https://docs.rs/bio/#step-3-use-rust-bio-in-your-project).
//! Users who already know `rust-bio` might want to jump right into the [modules docs](https://docs.rs/bio/#modules)
//!
//! ## Step 1: Setting up Rust
//!
//! Rust can be installed following the instruction for [rustup](https://rustup.rs/).
//!
//!
//! ## Step 2: Setting up a new Rust project
//!
//! Since Rust-Bio is a library, you need to setup your own new Rust project to use Rust-Bio.
//! With Rust, projects and their dependencies are managed with the builtin package manager [Cargo](https://doc.rust-lang.org/cargo/index.html).
//! To create a new Rust project, issue
//!
//! ```bash
//! cargo new hello_world --bin
//! cd hello_world
//! ```
//! in your terminal. The flag `--bin` tells Cargo to create an executable project instead of a library.
//! In [this section](https://doc.rust-lang.org/nightly/book/hello-cargo.html#a-new-project) of the Rust docs, you find details about what Cargo just created for you.
//!
//! Your new project can be compiled with
//! ```bash
//! cargo build
//! ```
//! If dependencies in your project are out of date, update with
//! ```bash
//! cargo update
//! ```
//! Execute the compiled code with
//! ```bash
//! cargo run
//! ```
//! If you are new to Rust, we suggest to proceed with [learning Rust](https://www.rust-lang.org/learn) via the Rust docs.
//!
//! ## Step 3: Use Rust-Bio in your project
//!
//! To use Rust-Bio in your Rust project, add the following to your `Cargo.toml`
//!
//! ```toml
//! [dependencies]
//! bio = "*"
//! ```
//!
//! and import the crate from your source code:
//!
//! ```rust
//! extern crate bio;
//! ```
//!
//! ## Example: FM-index and FASTQ
//!
//! An example of using `rust-bio`:
//!
//! ```rust
//! // Import some modules
//! use bio::alphabets;
//! use bio::data_structures::bwt::{bwt, less, Occ};
//! use bio::data_structures::fmindex::{BackwardSearchResult, FMIndex, FMIndexable};
//! use bio::data_structures::suffix_array::suffix_array;
//! use bio::io::fastq;
//! use bio::io::fastq::FastqRead;
//! use std::io;
//!
//! // a given text
//! let text = b"ACAGCTCGATCGGTA$";
//! let pattern = b"ATCG";
//!
//! // Create an FM-Index for the given text.
//!
//! // instantiate an alphabet
//! let alphabet = alphabets::dna::iupac_alphabet();
//! // calculate a suffix array
//! let sa = suffix_array(text);
//! // calculate the Burrows-Wheeler-transform
//! let bwt = bwt(text, &sa);
//! // calculate the vectors less and Occ (occurrences)
//! let less = less(&bwt, &alphabet);
//! let occ = Occ::new(&bwt, 3, &alphabet);
//! // set up FMIndex
//! let fmindex = FMIndex::new(&bwt, &less, &occ);
//! // do a backwards search for the pattern
//! let interval = fmindex.backward_search(pattern.iter());
//! let mut partial_match_len = 0;
//! // get the locations where the pattern matched (completely in this case).
//! let positions = match interval {
//!     BackwardSearchResult::Complete(saint) => saint.occ(&sa),
//!     BackwardSearchResult::Partial(saint, l) => {
//!         partial_match_len = l;
//!         saint.occ(&sa)
//!     }
//!     BackwardSearchResult::Absent => Vec::new(),
//! };
//! // Iterate over a FASTQ file, use the alphabet to validate read
//! // sequences and search for exact matches in the FM-Index.
//!
//! // create FASTQ reader
//! let mut reader = fastq::Reader::new(io::stdin());
//! let mut record = fastq::Record::new();
//! let mut partial_match_len = 0;
//! reader.read(&mut record).expect("Failed to parse record");
//! while !record.is_empty() {
//!     let check = record.check();
//!     if check.is_err() {
//!         panic!("I got a rubbish record!")
//!     }
//!     // obtain sequence
//!     let seq = record.seq();
//!     // check, whether seq is in the expected alphabet
//!     if alphabet.is_word(seq) {
//!         let interval = fmindex.backward_search(seq.iter());
//!         // get the positions where seq matched completely
//!         // or where the maximal matching suffix of seq occurred.
//!         let positions = match interval {
//!             BackwardSearchResult::Complete(saint) => saint.occ(&sa),
//!             BackwardSearchResult::Partial(saint, l) => {
//!                 partial_match_len = l;
//!                 saint.occ(&sa)
//!             }
//!             BackwardSearchResult::Absent => Vec::new(),
//!         };
//!     }
//!     reader.read(&mut record).expect("Failed to parse record");
//! }
//! ```
//!
//! Documentation and further examples for each module can be found in the module descriptions below.
//!
//!
//! ## Example: Multithreaded
//!
//! ```rust
//! use bio::alphabets;
//! use bio::data_structures::bwt::{bwt, less, Occ};
//! use bio::data_structures::fmindex::{BackwardSearchResult,FMIndex, FMIndexable};
//! use bio::data_structures::suffix_array::suffix_array;
//! use std::sync::Arc;
//! use std::thread;
//!
//! let text = b"ACGGATGCTGGATCGGATCGCGCTAGCTA$";
//! let patterns = vec![b"ACCG", b"TGCT"];
//!
//! // Create an FM-Index for a given text.
//! let alphabet = alphabets::dna::iupac_alphabet();
//! let sa = suffix_array(text);
//! let bwt = Arc::new(bwt(text, &sa));
//! let less = Arc::new(less(bwt.as_ref(), &alphabet));
//! let occ = Arc::new(Occ::new(bwt.as_ref(), 3, &alphabet));
//! let fmindex = Arc::new(FMIndex::new(bwt, less, occ));
//!
//! // Spawn threads to perform backward searches for each interval
//! let interval_calculators = patterns
//!     .into_iter()
//!     .map(|pattern| {
//!         let fmindex = fmindex.clone();
//!         thread::spawn(move || fmindex.backward_search(pattern.iter()))
//!     })
//!     .collect::<Vec<_>>();
//!
//! // Loop through the results, extracting the positions array for each pattern
//! for interval_calculator in interval_calculators {
//!     let positions = match interval_calculator.join().unwrap() {
//!         BackwardSearchResult::Complete(saint) => saint.occ(&sa),
//!         _ => Vec::new()
//!     };
//! }
//! ```
//!
//! Documentation and further examples for each module can be found in the module descriptions below.
//!
//! # Benchmarks
//!
//! Since Rust-Bio is based on a compiled language, similar performance to C/C++ based libraries can be expected. Indeed, we find the pattern matching algorithms of Rust-Bio to perform in the range of the C++ library Seqan:
//!
//! | Algorithm | Rust-Bio | Seqan   |
//! | --------- | -------: | ------: |
//! | BNDM      | 77ms     | 80ms    |
//! | Horspool  | 122ms    | 125ms   |
//! | BOM       | 103ms    | 107ms   |
//! | Shift-And | 241ms    | 545ms   |
//!
//! We measured 10000 iterations of searching pattern `GCGCGTACACACCGCCCG` in the sequence of the hg38 MT chromosome.
//! Initialization time of each algorithm for the given pattern was included in each iteration. Benchmarks were conducted with *Cargo bench* for Rust-Bio and *Python timeit* for Seqan on an Intel Core i5-3427U CPU.
//! Benchmarking Seqan from *Python timeit* entails an overhead of 1.46ms for calling a C++ binary. This overhead was subtracted from above Seqan run times.
//! Note that this benchmark only compares the two libraries to exemplify that Rust-Bio has comparable speed to C++ libraries: all used algorithms have their advantages for specific text and pattern structures and lengths (see [the pattern matching section in the documentation](https://docs.rs/bio/0.28.2/bio/pattern_matching/index.html))./!

#[macro_use]
extern crate approx;

#[macro_use]
extern crate custom_derive;

#[macro_use]
extern crate lazy_static;

#[macro_use]
extern crate newtype_derive;

#[macro_use]
extern crate serde_derive;

#[macro_use]
extern crate strum_macros;

#[macro_use]
extern crate getset;

#[cfg(feature = "phylogeny")]
#[macro_use]
extern crate pest_derive;

pub mod alignment;
pub mod alphabets;
pub mod data_structures;
pub mod io;
pub mod pattern_matching;
pub mod scores;
pub mod seq_analysis;
pub mod stats;
pub mod utils;