Extract byte ranges from BAM files and convert to interleaved FASTQ format.
This library provides efficient extraction of specific byte ranges from BAM (Binary Alignment/Map) files and converts them to interleaved FASTQ format. This enables parallel processing of large BAM files by splitting them into chunks that can be processed independently.
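As a rough illustration of the chunking idea, the sketch below divides a file of known length into contiguous byte ranges. The helper name `chunk_ranges` is hypothetical and not part of this library. Treating each range as half-open (start inclusive, end exclusive) is an assumption; the ranges need not fall on block boundaries, since the library is described as aligning to them itself.

```rust
/// Hypothetical helper: split `file_len` bytes into `n_chunks` contiguous
/// (start, end) ranges that together cover the file exactly once.
fn chunk_ranges(file_len: u64, n_chunks: u64) -> Vec<(u64, u64)> {
    assert!(n_chunks > 0, "need at least one chunk");
    let base = file_len / n_chunks;
    let rem = file_len % n_chunks;
    let mut ranges = Vec::with_capacity(n_chunks as usize);
    let mut start = 0u64;
    for i in 0..n_chunks {
        // The first `rem` chunks take one extra byte so nothing is left over.
        let len = base + if i < rem { 1 } else { 0 };
        ranges.push((start, start + len));
        start += len;
    }
    ranges
}
```

Each resulting pair could then be handed to a separate worker as the start and end offsets of one extraction call.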
§Key Features
- Block-aligned extraction: Automatically aligns to BGZF block boundaries for valid data
- Paired-read aware: Ensures read pairs are kept together across chunk boundaries
- Interleaved FASTQ output: Compatible with samtools fastq interleaved format
- Barcode support: Preserves BC tags in FASTQ headers
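To make the output format concrete, here is a minimal sketch of an interleaved FASTQ pair with an optional barcode in the header. The function names are hypothetical. The `/1` and `/2` name suffixes and the tab-separated `BC:Z:` header field follow common samtools fastq conventions; that this library writes headers in exactly this layout is an assumption.

```rust
/// Format one four-line FASTQ record. When a barcode is given, it is
/// appended to the header as a SAM-style `BC:Z:` field (assumed layout).
fn fastq_record(name: &str, seq: &str, qual: &str, barcode: Option<&str>) -> String {
    let header = match barcode {
        Some(bc) => format!("@{name}\tBC:Z:{bc}"),
        None => format!("@{name}"),
    };
    format!("{header}\n{seq}\n+\n{qual}\n")
}

/// Interleaved output: read 1 is immediately followed by its mate.
/// Each read is passed as a (sequence, qualities) tuple.
fn interleaved_pair(
    name: &str,
    r1: (&str, &str),
    r2: (&str, &str),
    barcode: Option<&str>,
) -> String {
    let mut out = fastq_record(&format!("{name}/1"), r1.0, r1.1, barcode);
    out.push_str(&fastq_record(&format!("{name}/2"), r2.0, r2.1, barcode));
    out
}
```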
§Example
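use std::io::stdout;
use bamslice::process_blocks;
// `?` needs a fallible enclosing function; the boxed error type is an assumption.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut output = stdout();
    let read_count = process_blocks(
        "input.bam",
        0,         // start offset
        1_000_000, // end offset (1 MB chunk)
        &mut output,
    )?;
    println!("Extracted {read_count} reads");
    Ok(())
}
§BGZF Block Format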
BAM files use BGZF (Blocked GNU Zip Format) compression. Each block is independently compressed, allowing random access. This library scans for valid BGZF block headers using an 8-byte signature and validates blocks by decompressing and checking for valid BAM records.
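The header scan can be sketched as follows. This is an illustration based on the BGZF layout in the SAM/BAM specification, not this library's code: the function names are hypothetical, and since the exact 8 bytes the library matches are not stated here, the sketch checks the four fixed gzip bytes plus the `BC` extra subfield, assuming that subfield comes first. It also omits the decompression and BAM-record validation described above, so it can report false positives.

```rust
/// Fixed bytes at the start of every BGZF block: gzip magic (1f 8b),
/// CM = 8 (deflate), FLG = 4 (extra field present).
const GZIP_MAGIC: [u8; 4] = [0x1f, 0x8b, 0x08, 0x04];

/// Returns the total block size if `buf` starts with a plausible BGZF header.
/// Assumes the 'BC' subfield is the first extra subfield.
fn bgzf_block_size(buf: &[u8]) -> Option<usize> {
    if buf.len() < 18 || buf[..4] != GZIP_MAGIC {
        return None;
    }
    // Bytes 12..16: SI1 = 'B', SI2 = 'C', SLEN = 2 (little-endian u16).
    if buf[12..16] != [b'B', b'C', 2, 0] {
        return None;
    }
    // BSIZE (bytes 16..18) stores the total block size minus one.
    let bsize = u16::from_le_bytes([buf[16], buf[17]]) as usize;
    Some(bsize + 1)
}

/// Find the first offset at or after `from` where a candidate header starts.
fn find_block_start(data: &[u8], from: usize) -> Option<usize> {
    (from..data.len()).find(|&i| bgzf_block_size(&data[i..]).is_some())
}
```

Because compressed payload bytes can match this pattern by chance, a candidate offset is only a starting point; confirming it by decompressing the block is what makes the alignment reliable.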
§Implementation Notes
- Read pairs are kept together by reading one extra record past the end boundary if needed
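The boundary rule above can be sketched on a simplified record type. Everything here is hypothetical: `Rec` stands in for a decoded BAM record, `select_chunk` is not a library function, and the sketch assumes mates are adjacent in the file. Skipping a leading second mate at the start of a chunk is also an assumption, added so that neighbouring chunks do not emit the same record twice.

```rust
const PAIRED: u16 = 0x1; // SAM flag: template has multiple segments
const READ1: u16 = 0x40; // SAM flag: first segment in the template
const READ2: u16 = 0x80; // SAM flag: last segment in the template

/// Simplified stand-in for a BAM record (hypothetical type).
#[derive(Debug)]
struct Rec {
    offset: u64, // byte offset at which the record starts
    name: &'static str,
    flags: u16,
}

/// Select the records belonging to the chunk [start, end), keeping pairs whole.
fn select_chunk(records: &[Rec], start: u64, end: u64) -> Vec<&Rec> {
    let mut out: Vec<&Rec> = Vec::new();
    for (i, rec) in records.iter().filter(|r| r.offset >= start).enumerate() {
        if rec.offset < end {
            // Assumption: a chunk that begins on a second mate skips it,
            // because the previous chunk already took it as its extra record.
            if i == 0 && rec.flags & READ2 != 0 {
                continue;
            }
            out.push(rec);
        } else {
            // Past the end boundary: read one extra record, but keep it only
            // if it is the mate of the last record already selected.
            let completes_pair = out.last().map_or(false, |last| {
                last.flags & READ1 != 0 && rec.flags & READ2 != 0 && last.name == rec.name
            });
            if completes_pair {
                out.push(rec);
            }
            break;
        }
    }
    out
}
```

With three pairs stored at offsets 0 through 500 in steps of 100, the chunk [0, 250) would select the records at 0, 100, 200 and 300, and the chunk [250, 600) would select those at 400 and 500, so each record is emitted exactly once.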
Functions§
- process_blocks: Process a byte range from a BAM file and output interleaved FASTQ.