prseq (Rust)
High-performance Rust library for FASTA and FASTQ sequence parsing.
Overview
prseq is a Rust library providing fast, memory-efficient parsers for FASTA and FASTQ sequence formats. It features:
- High Performance: Zero-copy parsing where possible with optimized buffered I/O
- Streaming Iterators: Process files larger than available RAM
- Automatic Compression: Built-in support for gzip and bzip2
- Flexible Input: Works with files, stdin, or any Read trait
- Format Support: Full FASTA and FASTQ with multi-line sequences
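As a rough illustration of how automatic compression handling can work in general, the sketch below classifies a stream by its leading magic bytes using only the standard library. The names Compression, sniff_magic and detect_compression are hypothetical and are not part of prseq's API; how prseq itself detects compression is not described in this README.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Compression formats recognised by this sketch (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Compression {
    Gzip,
    Bzip2,
    None,
}

/// Classify a byte prefix by its magic number.
/// gzip streams start with 0x1f 0x8b; bzip2 streams start with "BZh".
pub fn sniff_magic(prefix: &[u8]) -> Compression {
    if prefix.starts_with(&[0x1f, 0x8b]) {
        Compression::Gzip
    } else if prefix.starts_with(b"BZh") {
        Compression::Bzip2
    } else {
        Compression::None
    }
}

/// Peek at the start of a buffered reader without consuming any bytes,
/// so the same reader can afterwards be passed to a decoder or parser.
pub fn detect_compression<R: Read>(reader: &mut BufReader<R>) -> io::Result<Compression> {
    // fill_buf exposes the buffered bytes but does not advance the reader.
    let prefix = reader.fill_buf()?;
    Ok(sniff_magic(prefix))
}
```

Because the peek does not consume input, the caller can wrap the reader in a decompressor or parse it directly depending on the result. A single fill_buf call may in principle return fewer bytes than the magic number needs, so a production version would want to handle short reads.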
This library also powers the Python prseq package, which provides Python bindings and CLI tools.
Installation
Add to your Cargo.toml:
[dependencies]
prseq = "0.0.6"
Rust API Reference
FASTA Parsing
use ;
use std::fs::File;
// Read all records into memory
let records = read_fasta?;
for record in records
// Stream records (memory efficient)
let mut reader = from_file?;
for result in reader
// Read from stdin
let mut reader = from_stdin?;
for result in reader
// Performance tuning
let mut reader = from_file_with_capacity?;
// Works with any Read trait
let file = open?;
let mut reader = from_reader_with_capacity?;
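To make the ideas of "any Read" input and buffer-capacity tuning concrete, here is a minimal standard-library sketch. The function count_fasta_headers is hypothetical and is not a prseq function; it only shows how a capacity value can be passed to a buffered reader over an arbitrary Read source, and makes no claim about what prseq's own capacity parameter does internally.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Count FASTA header lines in any `Read` source using a buffer of the
/// requested size (hypothetical helper, not part of prseq).
pub fn count_fasta_headers<R: Read>(source: R, capacity: usize) -> io::Result<usize> {
    // A larger capacity means fewer underlying read calls on big inputs.
    let reader = BufReader::with_capacity(capacity, source);
    let mut count = 0;
    for line in reader.lines() {
        if line?.starts_with('>') {
            count += 1;
        }
    }
    Ok(count)
}
```

The same function accepts a File, io::stdin().lock(), a byte slice, or a decompressing reader, since all of them implement Read. The capacity affects only how much is read per call, not the result.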
FASTQ Parsing
use ;
use std::fs::File;
// Read all records into memory
let records = read_fastq?;
for record in records
// Stream records (memory efficient)
let mut reader = from_file?;
for result in reader
// Read from stdin
let mut reader = from_stdin?;
// Performance tuning for different read lengths
let mut reader = from_file_with_capacity?; // Short reads
let mut reader = from_file_with_capacity?; // Long reads
// Works with any Read trait (including compressed streams)
use GzDecoder;
let file = open?;
let decoder = new;
let mut reader = from_reader_with_capacity?;
Development
Building
Testing
Publishing
Format Support
FASTA Format
- Header lines starting with >
- Multi-line sequences (automatic concatenation)
- Empty lines ignored
- Compression: gzip (.gz), bzip2 (.bz2)
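The FASTA rules listed above can be sketched as a small streaming iterator built on the standard library. This is an illustration under stated assumptions, not prseq's implementation: the names Record and FastaSketch are hypothetical, and the choice to report sequence data before the first header as an error is an assumption of this sketch.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// One FASTA record: header text without the leading '>' and the
/// sequence with all of its lines joined (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Record {
    pub header: String,
    pub sequence: String,
}

/// Streaming FASTA reader over any `Read` (hypothetical, not prseq's type).
pub struct FastaSketch<R: Read> {
    lines: io::Lines<BufReader<R>>,
    // Header of the next record, seen while finishing the previous one.
    pending_header: Option<String>,
}

impl<R: Read> FastaSketch<R> {
    pub fn new(inner: R) -> Self {
        FastaSketch {
            lines: BufReader::new(inner).lines(),
            pending_header: None,
        }
    }
}

impl<R: Read> Iterator for FastaSketch<R> {
    type Item = io::Result<Record>;

    fn next(&mut self) -> Option<Self::Item> {
        // Use a carried-over header, or scan forward to the next '>' line.
        let header = loop {
            if let Some(h) = self.pending_header.take() {
                break h;
            }
            match self.lines.next()? {
                Err(e) => return Some(Err(e)),
                Ok(line) => {
                    let line = line.trim_end();
                    if let Some(h) = line.strip_prefix('>') {
                        break h.to_string();
                    } else if !line.is_empty() {
                        return Some(Err(io::Error::new(
                            io::ErrorKind::InvalidData,
                            "sequence data before first header",
                        )));
                    }
                }
            }
        };

        // Concatenate sequence lines until the next header or end of input.
        let mut sequence = String::new();
        while let Some(next) = self.lines.next() {
            let line = match next {
                Ok(l) => l,
                Err(e) => return Some(Err(e)),
            };
            let line = line.trim_end();
            if let Some(h) = line.strip_prefix('>') {
                self.pending_header = Some(h.to_string());
                break;
            }
            sequence.push_str(line); // empty lines contribute nothing
        }
        Some(Ok(Record { header, sequence }))
    }
}
```

Only one record is held in memory at a time, which is what allows inputs larger than available RAM to be processed record by record.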
FASTQ Format
- 4-line format: @header, sequence, +[optional_header], quality
- Multi-line sequences and quality scores
- Optional header validation on + line
- Automatic sequence/quality length validation
- Compression: gzip (.gz), bzip2 (.bz2)
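The validation rules above can be illustrated with a short parser for the strict four-line layout. This is a simplified sketch, not prseq's code: FastqRecord, FastqError and parse_record are hypothetical names, multi-line sequences and qualities are deliberately not handled, and the check_plus_header flag is an assumed stand-in for the optional header validation.

```rust
/// One FASTQ record (hypothetical type, not prseq's).
#[derive(Debug, PartialEq, Eq)]
pub struct FastqRecord {
    pub header: String,
    pub sequence: String,
    pub quality: String,
}

/// Ways a four-line record can be malformed in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum FastqError {
    Truncated,
    MissingAt,
    MissingPlus,
    HeaderMismatch,
    LengthMismatch { sequence: usize, quality: usize },
}

/// Read the next record from an iterator of lines.
/// Returns None at a clean end of input.
pub fn parse_record<'a, I>(
    lines: &mut I,
    check_plus_header: bool,
) -> Option<Result<FastqRecord, FastqError>>
where
    I: Iterator<Item = &'a str>,
{
    let header_line = lines.next()?;
    Some(parse_rest(header_line, lines, check_plus_header))
}

fn parse_rest<'a, I>(
    header_line: &str,
    lines: &mut I,
    check_plus_header: bool,
) -> Result<FastqRecord, FastqError>
where
    I: Iterator<Item = &'a str>,
{
    let header = header_line.strip_prefix('@').ok_or(FastqError::MissingAt)?;
    let sequence = lines.next().ok_or(FastqError::Truncated)?;
    let plus = lines.next().ok_or(FastqError::Truncated)?;
    let quality = lines.next().ok_or(FastqError::Truncated)?;

    // The separator must start with '+'; any text after it may repeat the header.
    let repeated = plus.strip_prefix('+').ok_or(FastqError::MissingPlus)?;
    if check_plus_header && !repeated.is_empty() && repeated != header {
        return Err(FastqError::HeaderMismatch);
    }
    // Every base needs exactly one quality character.
    if sequence.len() != quality.len() {
        return Err(FastqError::LengthMismatch {
            sequence: sequence.len(),
            quality: quality.len(),
        });
    }
    Ok(FastqRecord {
        header: header.to_string(),
        sequence: sequence.to_string(),
        quality: quality.to_string(),
    })
}
```

Supporting multi-line records is harder than it looks, because a quality line may itself begin with @ or +. A parser for that case typically reads quality characters until their count matches the sequence length instead of relying on line prefixes.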
Python Bindings
For Python users, see the Python prseq package which provides:
- Pythonic API with full type hints
- Command-line tools (fasta-info, fastq-stats, etc.)
- Easy installation via pip/uv
Links
- Main Project README - Project overview, features, and performance benchmarks
- Python Package README - Python API and CLI documentation
- Crates.io
- GitHub Repository
License
This project is licensed under the MIT License - see the LICENSE file for details.