Crate bamslice

Extract byte ranges from BAM files and convert to interleaved FASTQ format.

This library provides efficient extraction of specific byte ranges from BAM (Binary Alignment/Map) files and converts them to interleaved FASTQ format. This enables parallel processing of large BAM files by splitting them into chunks that can be processed independently.
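
As a rough illustration of the chunking idea, the sketch below splits a file length into contiguous, non-overlapping byte ranges. The helper name `chunk_ranges` is hypothetical and not part of this crate; it only shows how start and end offsets for independent workers might be derived.

```rust
/// Hypothetical helper (not part of bamslice): split `file_len` bytes into
/// at most `chunks` contiguous, non-overlapping `(start, end)` ranges.
fn chunk_ranges(file_len: u64, chunks: u64) -> Vec<(u64, u64)> {
    assert!(chunks > 0, "need at least one chunk");
    // Ceiling division so the ranges cover the whole file.
    let size = (file_len + chunks - 1) / chunks;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < file_len {
        // The last range is clamped to the end of the file.
        let end = (start + size).min(file_len);
        ranges.push((start, end));
        start = end;
    }
    ranges
}
```

Each `(start, end)` pair could then be handed to a separate worker as the start and end offsets of one extraction call; the exact parameter types expected by `process_blocks` are not shown on this page.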

§Key Features

  • Block-aligned extraction: Automatically aligns the requested byte range to BGZF block boundaries, so decompression always starts on valid data
  • Paired-read aware: Ensures read pairs are kept together across chunk boundaries
  • Interleaved FASTQ output: Compatible with samtools fastq interleaved format
  • Barcode support: Preserves BC tags in FASTQ headers

§Example

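use std::io::stdout;
use bamslice::process_blocks;

// The boxed error type is an assumption; the error type returned by
// process_blocks is not shown on this page.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut output = stdout();
    let read_count = process_blocks(
        "input.bam",
        0,           // start offset in bytes
        1_000_000,   // end offset in bytes (1 MB chunk)
        &mut output,
    )?;
    // Report on stderr so the count does not mix with the FASTQ on stdout.
    eprintln!("Extracted {read_count} reads");
    Ok(())
}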

§BGZF Block Format

BAM files use BGZF (Blocked GNU Zip Format) compression. Each block is independently compressed, allowing random access. This library scans for valid BGZF block headers using an 8-byte signature and validates blocks by decompressing and checking for valid BAM records.
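
To make the header scan concrete, here is a minimal sketch of recognising a BGZF block header in a byte buffer. It is not this crate's implementation: the exact 8-byte signature the library uses is not specified here, so the sketch checks the four gzip magic bytes plus the `BC` extra subfield, and it assumes that subfield is the first one (offset 12), as in typical BAM files. The function names are hypothetical.

```rust
/// The standard 28-byte BGZF end-of-file marker block from the SAM/BAM spec.
const BGZF_EOF: [u8; 28] = [
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00, 0x42, 0x43,
    0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
];

/// Hypothetical helper: if `buf` starts with a plausible BGZF header,
/// return the total block length in bytes.
fn bgzf_block_len(buf: &[u8]) -> Option<usize> {
    if buf.len() < 18 {
        return None;
    }
    // gzip magic (ID1, ID2), CM = 8 (deflate), FLG = 4 (extra field present).
    if buf[0..4] != [0x1f, 0x8b, 0x08, 0x04] {
        return None;
    }
    // Assumption: the 'B','C' subfield with SLEN = 2 is the first extra subfield.
    if buf[12..16] != [b'B', b'C', 0x02, 0x00] {
        return None;
    }
    // BSIZE is stored little-endian as (total block size - 1).
    let bsize = u16::from_le_bytes([buf[16], buf[17]]) as usize;
    Some(bsize + 1)
}

/// Hypothetical helper: first offset at or after `from` that looks like a block header.
fn find_block_start(data: &[u8], from: usize) -> Option<usize> {
    (from..data.len()).find(|&i| bgzf_block_len(&data[i..]).is_some())
}
```

A header match alone can be a false positive, since the same bytes may occur inside compressed data. That is why a candidate still needs to be confirmed by decompressing it and checking for valid BAM records, as described above.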

§Implementation Notes

  • Read pairs are kept together by reading one extra record past the end boundary if needed
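
The pair-keeping rule can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: the `Rec` type, its `offset` field, and `select_records` are hypothetical, mates are assumed to be adjacent in the file, and the sketch additionally assumes that a chunk skips a leading second-of-pair record because the previous chunk already emitted it. The flag values (0x1, 0x40, 0x80) are the standard SAM flag bits.

```rust
const PAIRED: u16 = 0x1; // read is paired
const READ1: u16 = 0x40; // first read of the pair
const READ2: u16 = 0x80; // second read of the pair

/// Hypothetical, simplified record: name, SAM flags, and the byte
/// position where the record starts.
#[derive(Debug, Clone, PartialEq)]
struct Rec {
    name: String,
    flags: u16,
    offset: u64,
}

fn is_first_of_pair(r: &Rec) -> bool {
    r.flags & PAIRED != 0 && r.flags & READ1 != 0
}

fn is_second_of_pair(r: &Rec) -> bool {
    r.flags & PAIRED != 0 && r.flags & READ2 != 0
}

/// Select the records belonging to the chunk `[start, end)`.
fn select_records(records: &[Rec], start: u64, end: u64) -> Vec<&Rec> {
    let mut out: Vec<&Rec> = Vec::new();
    for (i, rec) in records.iter().enumerate() {
        if rec.offset < start {
            continue;
        }
        if rec.offset >= end {
            // Read one extra record past the boundary, but keep it only
            // if it completes the pair started by the last record.
            if let Some(last) = out.last().copied() {
                if is_first_of_pair(last) && is_second_of_pair(rec) && last.name == rec.name {
                    out.push(rec);
                }
            }
            break;
        }
        // Assumption: a leading mate was already emitted by the previous chunk.
        if out.is_empty()
            && is_second_of_pair(rec)
            && i > 0
            && is_first_of_pair(&records[i - 1])
            && records[i - 1].name == rec.name
        {
            continue;
        }
        out.push(rec);
    }
    out
}
```

With these two rules together, a pair that straddles a chunk boundary is emitted exactly once, by the chunk that contains its first read.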

§Functions

process_blocks
Process a byte range from a BAM file and output interleaved FASTQ.