Crate fastx

Low-overhead readers for FASTA and FASTQ sequence files.

FastX provides efficient parsing of FASTA- and FASTQ-formatted files, which are commonly used in bioinformatics to store nucleotide or protein sequences.

§Features

  • Zero-copy parsing where possible
  • Support for gzip-compressed files (.gz)
  • Iterator-based API for easy processing
  • Manual read API for fine-grained control
  • Automatic format detection
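The gzip support and automatic format detection listed above can be illustrated with a small content-sniffing sketch. This is not the crate's implementation: InputKind and sniff are hypothetical names, and the sketch relies only on well-known signatures (gzip and BGZF data begin with the bytes 0x1f 0x8b, FASTA records with >, FASTQ records with @). The crate's own gzip example describes detection by the .gz extension; sniffing the content is shown here as an alternative technique.

```rust
/// What an input looks like, judged only from its first bytes.
#[derive(Debug, PartialEq)]
enum InputKind {
    Gzip,
    Fasta,
    Fastq,
    Unknown,
}

/// Guess the input kind from a prefix of the file (for example the bytes
/// returned by BufRead::fill_buf before anything has been consumed).
fn sniff(prefix: &[u8]) -> InputKind {
    // Every gzip member, including each BGZF block, starts with 0x1f 0x8b.
    if prefix.starts_with(&[0x1f, 0x8b]) {
        return InputKind::Gzip;
    }
    // Otherwise decide from the first non-whitespace byte of the text.
    match prefix.iter().copied().find(|b| !b.is_ascii_whitespace()) {
        Some(b'>') => InputKind::Fasta,
        Some(b'@') => InputKind::Fastq,
        _ => InputKind::Unknown,
    }
}

fn main() {
    println!("{:?}", sniff(b">seq1\nACGT\n"));
    println!("{:?}", sniff(b"@read1\nACGT\n+\nIIII\n"));
    println!("{:?}", sniff(&[0x1f, 0x8b, 0x08, 0x04]));
}
```

Sniffing the content rather than trusting the extension also recognizes compressed files that were renamed without a .gz suffix.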

§Format Overview

FASTA format: Used to represent nucleotide or peptide sequences. Each record starts with a header line beginning with >, followed by the sequence name and an optional description, and then one or more lines of sequence data.

>sequence_name description
AGCTTAGCTAGCTACGATCG
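A widely used convention splits the header line at the first whitespace: the text before it is the sequence id, the rest is a free-text description. The sketch below, with the hypothetical function parse_header, shows that split; whether FastX's id() follows exactly this rule is an assumption, not something stated here.

```rust
/// Split a FASTA header line into (id, optional description), assuming the
/// convention that the id runs from just after '>' to the first whitespace.
fn parse_header(line: &str) -> Option<(&str, Option<&str>)> {
    // Require the '>' marker and drop the trailing "\n" or "\r\n".
    let rest = line.strip_prefix('>')?.trim_end();
    match rest.split_once(char::is_whitespace) {
        Some((id, desc)) => {
            let desc = desc.trim_start();
            Some((id, if desc.is_empty() { None } else { Some(desc) }))
        }
        // No whitespace: the whole header is the id.
        None => Some((rest, None)),
    }
}

fn main() {
    let (id, desc) = parse_header(">sequence_name description\n").unwrap();
    println!("id = {}, description = {:?}", id, desc);
}
```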

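FASTQ format: Like FASTA, but with a quality score for each base. Each record has four lines: a header starting with @ followed by the sequence name, the sequence, a separator line starting with +, and a quality line containing exactly one character per base.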

@sequence_name
AGCTTAGCTAGCTACGATCG
+
!''*((((***+))%%%+++
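The four-line FASTQ layout can be checked mechanically. This minimal sketch uses hypothetical names (FastqRecord, parse_fastq, phred33): it parses one record, verifies the @ and + markers and that sequence and quality have equal length, and decodes the quality string assuming the Phred+33 encoding used by Sanger and modern Illumina data (score = ASCII code minus 33).

```rust
/// A FASTQ record as borrowed lines (illustrative type, not the crate's).
struct FastqRecord<'a> {
    id: &'a str,
    seq: &'a str,
    qual: &'a str,
}

/// Parse exactly one four-line FASTQ record and validate its structure.
fn parse_fastq<'a>(text: &'a str) -> Result<FastqRecord<'a>, String> {
    let mut lines = text.lines();
    let header = lines.next().ok_or("missing header line")?;
    let seq = lines.next().ok_or("missing sequence line")?;
    let sep = lines.next().ok_or("missing separator line")?;
    let qual = lines.next().ok_or("missing quality line")?;
    let id = header.strip_prefix('@').ok_or("header must start with '@'")?;
    if !sep.starts_with('+') {
        return Err("separator must start with '+'".to_string());
    }
    if seq.len() != qual.len() {
        return Err(format!("{} bases but {} quality symbols", seq.len(), qual.len()));
    }
    Ok(FastqRecord { id, seq, qual })
}

/// Decode Phred+33 symbols ('!' = 0, 'I' = 40) into numeric scores.
fn phred33(qual: &str) -> Vec<u8> {
    // Bytes below '!' are not valid Phred+33; this sketch clamps them to 0.
    qual.bytes().map(|b| b.saturating_sub(b'!')).collect()
}

fn main() {
    let rec = parse_fastq("@read1\nGATTACA\n+\n!''*((+\n").unwrap();
    println!("{}: {} bases, scores {:?}", rec.id, rec.seq.len(), phred33(rec.qual));
    // One quality symbol short: rejected.
    assert!(parse_fastq("@read1\nGATTACA\n+\n!''*((\n").is_err());
}
```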

§Examples

§Callback-based reading

use std::io::BufReader;
use std::fs::File;
use fastx::FastX::{fasta_for_each, FastXRead};

let reader = BufReader::new(File::open("sequences.fasta").unwrap());
// Call the closure once for each record in the file.
fasta_for_each(reader, |record| {
    println!("{} - length: {}", record.id(), record.seq_len());
}).unwrap();
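A callback API such as fasta_for_each can hand the closure slices that borrow the reader's internal buffers, which is one way to keep per-record allocation low. The following std-only sketch shows that pattern; it is not FastX's implementation, fasta_for_each_sketch is a hypothetical name, and for brevity it keeps only the id (header text up to the first whitespace) while concatenating wrapped sequence lines.

```rust
use std::io::{self, BufRead};

/// Stream FASTA records from reader, calling f(id, seq) once per record.
/// The id and sequence buffers are reused between records, so once they have
/// grown to fit the largest record no further allocation is needed.
fn fasta_for_each_sketch<R, F>(mut reader: R, mut f: F) -> io::Result<()>
where
    R: BufRead,
    F: FnMut(&str, &[u8]),
{
    let mut line = String::new();
    let mut id = String::new();
    let mut seq: Vec<u8> = Vec::new();
    let mut in_record = false;
    loop {
        line.clear();
        let n = reader.read_line(&mut line)?;
        let text = line.trim_end(); // drop "\n" or "\r\n"
        if n == 0 || text.starts_with('>') {
            // End of input or a new header: emit the record collected so far.
            if in_record {
                f(id.as_str(), seq.as_slice());
            }
            if n == 0 {
                return Ok(());
            }
            // Keep only the id: the header text up to the first whitespace.
            id.clear();
            id.push_str(text[1..].split_whitespace().next().unwrap_or(""));
            seq.clear();
            in_record = true;
        } else if in_record {
            // Sequence may be wrapped over several lines; concatenate them.
            seq.extend_from_slice(text.as_bytes());
        }
    }
}

fn main() -> io::Result<()> {
    let data = ">seq1 first record\nACGT\nAC\n>seq2\nGGGG\n";
    let mut kept = Vec::new();
    fasta_for_each_sketch(data.as_bytes(), |id, seq| {
        println!("{} - length: {}", id, seq.len());
        // The slices are only valid during this call; copy what must be kept.
        kept.push(id.to_string());
    })?;
    assert_eq!(kept, vec!["seq1", "seq2"]);
    Ok(())
}
```

The trade-off of this design is that a closure wanting to keep a record beyond the call must copy it, as the to_string() in main does.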

§Iterator-based reading (convenient)

use std::io::BufReader;
use std::fs::File;
use fastx::FastX::{fasta_iter, FastXRead};

let reader = BufReader::new(File::open("sequences.fasta").unwrap());
for result in fasta_iter(reader) {
    let record = result.unwrap();
    println!("{} - length: {}", record.id(), record.seq_len());
}
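Unlike the callback style, an iterator usually yields owned records so that each item can outlive the next call to next(), and wrapping every item in a Result lets I/O or format errors surface per record. Below is a minimal sketch of such an iterator for four-line FASTQ, built only on std::io::BufRead; Record, FastqIter and invalid are hypothetical names and do not describe the types returned by fasta_iter or fastq_iter.

```rust
use std::io::{self, BufRead};

/// An owned FASTQ record (illustrative type, not the crate's own).
#[derive(Debug, PartialEq)]
struct Record {
    id: String,
    seq: String,
    qual: String,
}

/// Build an InvalidData error carrying a message.
fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Iterator over four-line FASTQ records read from any BufRead.
struct FastqIter<R> {
    reader: R,
}

impl<R: BufRead> FastqIter<R> {
    /// Read one line without its line ending; Ok(None) at end of input.
    fn next_line(&mut self) -> io::Result<Option<String>> {
        let mut line = String::new();
        if self.reader.read_line(&mut line)? == 0 {
            return Ok(None);
        }
        let len = line.trim_end_matches(&['\r', '\n'][..]).len();
        line.truncate(len);
        Ok(Some(line))
    }

    /// Read one record; Ok(None) means the input ended cleanly.
    fn read_record(&mut self) -> io::Result<Option<Record>> {
        let header = match self.next_line()? {
            Some(h) => h,
            None => return Ok(None),
        };
        let id = header
            .strip_prefix('@')
            .ok_or_else(|| invalid("header must start with '@'"))?
            .to_string();
        let seq = self.next_line()?.ok_or_else(|| invalid("truncated record"))?;
        let sep = self.next_line()?.ok_or_else(|| invalid("truncated record"))?;
        if !sep.starts_with('+') {
            return Err(invalid("separator must start with '+'"));
        }
        let qual = self.next_line()?.ok_or_else(|| invalid("truncated record"))?;
        if qual.len() != seq.len() {
            return Err(invalid("sequence and quality lengths differ"));
        }
        Ok(Some(Record { id, seq, qual }))
    }
}

impl<R: BufRead> Iterator for FastqIter<R> {
    type Item = io::Result<Record>;

    fn next(&mut self) -> Option<Self::Item> {
        // io::Result<Option<Record>> becomes Option<io::Result<Record>>.
        self.read_record().transpose()
    }
}

fn main() -> io::Result<()> {
    let data = "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!!\n";
    let iter = FastqIter { reader: data.as_bytes() };
    for result in iter {
        let record = result?;
        println!("{} - length: {}", record.id, record.seq.len());
    }
    // Because items are Results, a whole input can be collected fallibly.
    let all: Vec<Record> = FastqIter { reader: data.as_bytes() }.collect::<io::Result<_>>()?;
    assert_eq!(all.len(), 2);
    Ok(())
}
```

This sketch assumes strictly four-line records; multi-line FASTQ would need a different parser.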

§Manual reading

use std::io::BufRead;
use fastx::FastX::{FastARecord, FastXRead, reader_from_path};
use std::path::Path;

let mut reader = reader_from_path(Path::new("sequences.fasta")).unwrap();
let mut record = FastARecord::default();

// read() returns 0 once the input is exhausted; each call refills the same record.
while record.read(&mut *reader).unwrap() > 0 {
    println!("{}", record);
}

§Working with gzip-compressed files

use fastx::FastX::{reader_from_path, fastq_iter, FastXRead};
use std::path::Path;

// Automatically detects .gz extension and decompresses
let reader = reader_from_path(Path::new("sequences.fastq.gz")).unwrap();
for result in fastq_iter(reader) {
    let record = result.unwrap();
    println!("{}", record.id());
}

§Random access with indexed files

use fastx::indexed::IndexedFastXReader;
use fastx::FastX::FastXRead;
use std::path::Path;

// Open an indexed FASTA file (requires its .fai and .gzi index files).
// The file must be BGZF-compressed (e.g. with bgzip); plain gzip cannot be seeked.
let mut reader = IndexedFastXReader::from_path(Path::new("data.fasta.gz")).unwrap();

// Fetch a specific sequence by ID
let record = reader.fetch("chr1").unwrap();
println!("{}: {} bp", record.id(), record.seq_len());

// Fetch a specific region
let region = reader.fetch_range("chr1", 1000, 2000).unwrap();
println!("Region: {} bp", region.len());
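Random access of this kind typically relies on the five .fai columns written by samtools faidx (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH) to turn a position into a byte offset without scanning the file. The sketch below uses hypothetical names (FaiEntry, parse_fai_line, byte_offset) and assumes 0-based positions; the coordinate convention of fetch_range is not stated here. For a bgzipped file the result is an offset into the decompressed stream, which the .gzi index then maps to a compressed block.

```rust
/// One line of a FASTA .fai index (five tab-separated columns).
#[derive(Debug)]
struct FaiEntry {
    name: String,
    length: u64,     // number of bases in the sequence
    offset: u64,     // byte offset of the sequence's first base
    line_bases: u64, // bases per full sequence line
    line_width: u64, // bytes per full line, including the line ending
}

/// Parse the first five columns; any extra columns (such as the FASTQ
/// quality offset) are ignored by this sketch.
fn parse_fai_line(line: &str) -> Option<FaiEntry> {
    let mut cols = line.split('\t');
    let name = cols.next()?.to_string();
    let mut num = || -> Option<u64> { cols.next()?.trim().parse().ok() };
    Some(FaiEntry {
        name,
        length: num()?,
        offset: num()?,
        line_bases: num()?,
        line_width: num()?,
    })
}

impl FaiEntry {
    /// Byte offset of the base at 0-based position pos, or None if out of range.
    fn byte_offset(&self, pos: u64) -> Option<u64> {
        if pos >= self.length || self.line_bases == 0 {
            return None;
        }
        // Skip whole lines (including their line endings), then step into the line.
        Some(self.offset + (pos / self.line_bases) * self.line_width + pos % self.line_bases)
    }
}

fn main() {
    let fasta = b">chr1\nACGTACGTAC\nGTACGTACGT\nACG\n";
    let entry = parse_fai_line("chr1\t23\t6\t10\t11").unwrap();
    for pos in [0u64, 12, 22] {
        let off = entry.byte_offset(pos).unwrap() as usize;
        println!("{}:{} -> byte {} ({})", entry.name, pos, off, fasta[off] as char);
    }
}
```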

§Modules

  • FastX
  • bgzf: Blocked GZip Format (BGZF) reader with seeking support.
  • fai: FASTA index (.fai) parser.
  • gzi: BGZF gzip index (.gzi) parser.
  • indexed: Indexed FASTA/FASTQ reader for random access by sequence ID.
  • remote: Remote file reader with HTTP range request support and block-level caching.
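The gzi and bgzf modules cover the second step of random access into a compressed file: finding which BGZF block holds a given decompressed offset. The sketch below assumes the .gzi layout written by htslib's bgzip -i (described in the code comments); parse_gzi and locate are hypothetical names, and the offsets in main are made-up example data.

```rust
use std::convert::TryInto;

/// Parse a .gzi index, assuming this layout: a little-endian u64 entry count,
/// then that many pairs of little-endian u64s (compressed offset, uncompressed
/// offset), one pair per BGZF block after the first. The first block starts
/// at (0, 0) and is not stored, so it is prepended here.
fn parse_gzi(bytes: &[u8]) -> Option<Vec<(u64, u64)>> {
    let read_u64 = |i: usize| -> Option<u64> {
        let chunk = bytes.get(i * 8..i * 8 + 8)?;
        Some(u64::from_le_bytes(chunk.try_into().ok()?))
    };
    let n = read_u64(0)? as usize;
    let mut entries = vec![(0u64, 0u64)];
    for k in 0..n {
        entries.push((read_u64(1 + 2 * k)?, read_u64(2 + 2 * k)?));
    }
    Some(entries)
}

/// Map an offset in the decompressed stream to (start of the compressed block
/// containing it, offset within that block's decompressed data).
fn locate(entries: &[(u64, u64)], uoffset: u64) -> (u64, u64) {
    // Entries are sorted by uncompressed offset; take the last one <= uoffset.
    // entries[0] is (0, 0), so partition_point is always at least 1.
    let idx = entries.partition_point(|&(_, u)| u <= uoffset) - 1;
    let (block_start, block_uoffset) = entries[idx];
    (block_start, uoffset - block_uoffset)
}

fn main() {
    // A small in-memory index: two stored entries, three blocks in total.
    let mut bytes = Vec::new();
    for v in [2u64, 1000, 65280, 2100, 130560] {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    let entries = parse_gzi(&bytes).unwrap();
    for uoffset in [100u64, 70000, 130560] {
        let (block, within) = locate(&entries, uoffset);
        println!("decompressed offset {} -> block at byte {}, +{}", uoffset, block, within);
    }
}
```

A reader would then seek to the block start in the compressed file, inflate that block, and skip the returned number of decompressed bytes. locate does not check for offsets past the end of the data, so a caller would need to bound-check against the sequence length first.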