Crate needletail
Needletail is a crate to quickly and easily parse sequences out of streams/files and manipulate and analyse that data.
A contrived example of how to use it:
extern crate needletail;
use needletail::{parse_sequence_path, Sequence};
use std::env;

fn main() {
    let filename = "tests/data/28S.fasta";
    // you could also read the filename from the command arguments like:
    // let filename: String = env::args().nth(1).unwrap();

    let mut n_bases = 0;
    let mut n_valid_kmers = 0;
    parse_sequence_path(
        filename,
        |_| {},
        |seq| {
            // seq.id is the name of the record
            // seq.seq is the base sequence
            // seq.qual is an optional quality score

            // keep track of the total number of bases
            n_bases += seq.seq.len();

            // normalize to make sure all the bases are consistently capitalized
            let norm_seq = seq.normalize(false);
            // we make a reverse complemented copy of the sequence first for
            // `canonical_kmers` to draw the complemented sequences from.
            let rc = norm_seq.reverse_complement();
            // now we keep track of the number of AAAAs (or TTTTs via
            // canonicalization) in the file; note we also get the position (i.0;
            // in the event there were `N`-containing kmers that were skipped)
            // and whether the sequence was complemented (i.2) in addition to
            // the canonical kmer (i.1)
            for (_, kmer, _) in norm_seq.canonical_kmers(4, &rc) {
                if kmer == b"AAAA" {
                    n_valid_kmers += 1;
                }
            }
        },
    )
    .expect("parsing failed");
    println!("There are {} bases in your file.", n_bases);
    println!("There are {} AAAAs in your file.", n_valid_kmers);
}
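To make the idea of canonicalization in the example above concrete, here is a minimal standard-library-only sketch. It is not needletail's implementation: the helper names `complement`, `reverse_complement` and `canonical_kmers` below are hypothetical free functions written for illustration, and the sketch assumes that "canonical" means the lexicographically smaller of a kmer and its reverse complement, with kmers containing non-ACGT bases skipped.

```rust
/// Complement a single uppercase nucleotide; any other byte (e.g. `N`)
/// is returned unchanged. (Hypothetical helper, not part of needletail.)
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        other => other,
    }
}

/// Reverse-complement an uppercase DNA sequence.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

/// For every window of length `k`, return (position, canonical kmer,
/// was_reverse_complemented). Windows containing anything other than
/// A, C, G or T are skipped, which is why the position is reported.
fn canonical_kmers(seq: &[u8], k: usize) -> Vec<(usize, Vec<u8>, bool)> {
    // `windows(0)` would panic, so treat k == 0 as "no kmers".
    if k == 0 {
        return Vec::new();
    }
    seq.windows(k)
        .enumerate()
        .filter(|(_, w)| w.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T')))
        .map(|(pos, w)| {
            let rc = reverse_complement(w);
            // Assumption: the canonical form is the lexicographically
            // smaller of the kmer and its reverse complement.
            if rc.as_slice() < w {
                (pos, rc, true)
            } else {
                (pos, w.to_vec(), false)
            }
        })
        .collect()
}
```

Under these assumptions both `AAAA` and `TTTT` map to the canonical kmer `AAAA`, which is why counting `AAAA` alone covers occurrences on either strand.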
Re-exports
pub use formats::parse_sequence_path;
pub use formats::parse_sequence_reader;
pub use sequence::Sequence;
pub use sequence_record::SequenceRecord;
Modules
bitkmer | Compact binary representations of nucleic acid kmers |
formats | Functions for reading sequence data from FASTA and FASTQ formats. |
kmer | Functions for splitting sequences into fixed-width moving windows (kmers) and utilities for dealing with these kmers. |
sequence | Generic functions for working with (primarily nucleic acid) sequences |
sequence_record | For working with sequences that have identifiers and optionally quality information. |
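As an illustration of what a compact binary representation of a kmer can look like, the sketch below packs each base into two bits of a `u64`. This is an assumption about the general technique only: the function names `encode_kmer` and `decode_kmer` and the mapping A=0, C=1, G=2, T=3 are hypothetical, and the actual layout and API of the bitkmer module may differ.

```rust
/// Pack an uppercase DNA kmer of at most 32 bases into a `u64`,
/// two bits per base (assumed mapping: A=0, C=1, G=2, T=3).
/// Returns `None` if the kmer is too long or contains another byte.
fn encode_kmer(kmer: &[u8]) -> Option<u64> {
    if kmer.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for &base in kmer {
        let bits: u64 = match base {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        // Shift earlier bases left and append the new one in the low bits.
        packed = (packed << 2) | bits;
    }
    Some(packed)
}

/// Unpack `k` bases (k <= 32) from a value produced by `encode_kmer`.
fn decode_kmer(packed: u64, k: usize) -> Vec<u8> {
    assert!(k <= 32, "a u64 holds at most 32 two-bit bases");
    (0..k)
        .map(|i| {
            // The first base sits in the highest occupied pair of bits.
            let shift = 2 * (k - 1 - i);
            b"ACGT"[((packed >> shift) & 0b11) as usize]
        })
        .collect()
}
```

With this mapping, `encode_kmer(b"ACGT")` yields `Some(27)` (binary `00 01 10 11`), and `decode_kmer(27, 4)` recovers `ACGT`. The length `k` has to be stored alongside the packed value, because leading `A`s are indistinguishable from unused zero bits.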
Structs
ParseError | The only error type that needletail returns |
Enums
ParseErrorType | The type of error that occurred during file parsing |