biometal
ARM-native bioinformatics library with streaming architecture and evidence-based optimization
What Makes biometal Different?
Most bioinformatics tools require you to download entire datasets before analysis. biometal streams data directly from the network, enabling analysis of terabyte-scale datasets on consumer hardware without downloading.
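To make the streaming idea concrete, here is a minimal pure-Python sketch of record-at-a-time FASTQ parsing. It is not biometal's parser: the names `FastqRecord` and `stream_fastq` are hypothetical, and it assumes the simple four-line FASTQ layout. The point is only that a generator holds one record at a time, so memory use does not grow with input size.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def stream_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one FASTQ record at a time; only the current record is held in memory."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


# Any file-like object works: a local file, a decompressing wrapper, or a network response.
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
total_bases = sum(len(record.sequence) for record in stream_fastq(demo))
print(total_bases)  # 8
```

Because the consumer pulls records lazily, the same loop works whether the handle wraps a small test file or a multi-terabyte stream.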
Key Features
- Streaming Architecture (Rule 5)
  - Constant ~5 MB memory footprint regardless of dataset size
  - Analyze 5TB datasets on laptops without downloading
  - 99.5% memory reduction compared to batch processing
- ARM-Native Performance (Rule 1)
  - 16-25× speedup using ARM NEON SIMD
  - Works across Mac (Apple Silicon), AWS Graviton, Ampere, Raspberry Pi
  - Automatic fallback to scalar code on x86_64
- Network Streaming (Rule 6)
  - Stream directly from HTTP/HTTPS sources
  - SRA toolkit integration (no local copy needed)
  - Smart LRU caching minimizes network requests
  - Background prefetching hides latency
- Intelligent I/O (Rules 3-4)
  - 6.5× speedup from parallel bgzip decompression
  - Additional 2.5× from memory-mapped I/O (large files on macOS)
  - Combined 16.3× I/O speedup
- Evidence-Based Design
  - Every optimization validated with statistical rigor (N=30, 95% CI)
  - 1,357 experiments, 40,710 measurements
  - Full methodology: apple-silicon-bio-bench
Quick Start
Rust Installation
[dependencies]
biometal = "1.0"
Python Installation
# Install from PyPI
pip install biometal-rs
# Then import as 'biometal'
Note: The package name is biometal-rs on PyPI (the biometal name was already taken), but you import it as biometal in your Python code. See FAQ for details.
Alternative - Build from source:
Requirements:
- Python 3.9+ (tested on 3.14)
- Rust toolchain (for building from source)
Usage
Rust: Basic Usage
use FastqStream;
// Stream FASTQ from local file (constant memory)
let stream = from_path?;
for record in stream
Network Streaming
use DataSource;
use FastqStream;
// Stream directly from URL (no download!)
let source = Http;
let stream = new?;
// Analyze 5TB dataset without downloading
for record in stream
SRA Streaming (No Download!)
use DataSource;
use FastqStream;
// Stream directly from NCBI SRA (no local download!)
let source = Sra; // E. coli dataset
let stream = new?;
for record in stream
Operations with Auto-Optimization
use operations;
// ARM NEON automatically enabled on ARM platforms
let counts = base_counting?;
let gc = gc_content?;
// 16-25× faster on ARM, automatic scalar fallback on x86_64
Python: Basic Usage
# Stream FASTQ from local file (constant memory)
=
# Process one record at a time
# Memory stays constant at ~5 MB
=
Python: ARM NEON Operations
# ARM NEON automatically enabled on ARM platforms
# 16-25× faster on Mac ARM, automatic scalar fallback on x86_64
# GC content calculation
= b
= # 20.3× speedup on ARM
# Base counting
= # 16.7× speedup on ARM
# Quality scoring
=
= # 25.1× speedup on ARM
# K-mer extraction (for ML preprocessing)
=
Python: Example Workflow
# Analyze FASTQ file with streaming (constant memory)
=
= 0
= 0.0
= 0
# Count bases (ARM NEON accelerated)
=
+=
# Calculate GC content (ARM NEON accelerated)
=
+=
# Check quality (ARM NEON accelerated)
+= 1
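For readers who want to see what the three accelerated operations compute, the following is a plain scalar reference in Python. It is an illustration only, not biometal's code: the function names are hypothetical, and Phred+33 quality encoding is assumed.

```python
def count_bases(sequence: bytes) -> dict:
    """Count A, C, G and T occurrences; any other symbol is ignored."""
    return {base: sequence.count(base.encode()) for base in "ACGT"}


def gc_content(sequence: bytes) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return (sequence.count(b"G") + sequence.count(b"C")) / len(sequence)


def mean_quality(quality: bytes, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(q - offset for q in quality) / len(quality)


print(count_bases(b"ATGCATGC"))  # {'A': 2, 'C': 2, 'G': 2, 'T': 2}
print(gc_content(b"ATGCATGC"))   # 0.5
print(mean_quality(b"IIII"))     # 40.0
```

All three are element-wise passes over a byte array, which is the kind of workload SIMD instructions handle well.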
K-mer Operations (Evidence-Based)
biometal provides k-mer operations optimized based on ASBB Entry 034 findings.
Key finding: K-mer operations are data-structure-bound (hashing plus HashMap updates), not compute-bound. Unlike element-wise operations (base counting, GC content), k-mer operations spend 50-60% of runtime on hash computation and 30-40% on data structure operations, so NEON and GPU acceleration provide little or no benefit.
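A small sketch shows where that time goes. The helper names `kmers_of` and `spectrum_of` are hypothetical and this is not biometal's implementation; it only shows that counting k-mers amounts to one hash-table update per k-mer, with very little arithmetic to vectorize.

```python
from collections import Counter
from typing import Iterable, List


def kmers_of(sequence: bytes, k: int) -> List[bytes]:
    """Return every overlapping k-mer; empty if the sequence is shorter than k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def spectrum_of(sequences: Iterable[bytes], k: int) -> Counter:
    """Count k-mer frequencies; each update hashes the k-mer and probes the table."""
    spectrum = Counter()
    for sequence in sequences:
        spectrum.update(kmers_of(sequence, k))
    return spectrum


print(kmers_of(b"ATGCATGC", 3))
# [b'ATG', b'TGC', b'GCA', b'CAT', b'ATG', b'TGC']
print(spectrum_of([b"ATGCATGC"], 3)[b"ATG"])  # 2
```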
Rust: K-mer Operations
use ;
// 1. Simple k-mer extraction (scalar-only, optimal)
let sequence = b"ATGCATGCATGC";
let kmers = extract_kmers; // Returns Vec<Vec<u8>>
// 2. Minimizers (minimap2-style, scalar-only)
let minimizers = extract_minimizers; // k=6, w=10
for minimizer in minimizers
// 3. K-mer spectrum (frequency counting, scalar-only)
let sequences = vec!;
let spectrum = kmer_spectrum; // HashMap<Vec<u8>, usize>
// 4. Parallel extraction (opt-in for large datasets, 2.2× speedup)
let extractor = with_parallel; // 4 threads (optimal per Entry 034)
let large_dataset: = /* 10K+ sequences */;
let kmers = extractor.extract; // 2.2× faster
Python: K-mer Operations
# 1. Simple k-mer extraction (scalar-only, optimal)
= b
= # Returns list[bytes]
# 2. Minimizers (minimap2-style, scalar-only)
=
# 3. K-mer spectrum (frequency counting, scalar-only)
=
= # Returns dict
# 4. Parallel extraction (opt-in for large datasets, 2.2× speedup)
=
= # 10K+ sequences
= # 2.2× faster
Evidence (Entry 034):
- Minimizers: 1.02-1.26× (NEON/Parallel) → Scalar-only
- K-mer Spectrum: 0.95-1.88× (sometimes SLOWER with parallel!) → Scalar-only
- K-mer Extraction: 2.19-2.38× (Parallel-4t) → Opt-in parallel
This is consistent with minimap2's scalar design and identifies a 2.2× optimization opportunity for DNABert preprocessing.
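For reference, window minimizers can be sketched in a few lines. This is a simplified illustration with a hypothetical `minimizers` helper: it orders k-mers lexicographically, whereas minimap2 orders them by a hash, and it is not biometal's implementation.

```python
from typing import List, Tuple


def minimizers(sequence: bytes, k: int, w: int) -> List[Tuple[int, bytes]]:
    """Return (position, k-mer) for the smallest k-mer in each window of w consecutive k-mers.

    Ties go to the leftmost k-mer, and a minimizer shared by adjacent windows is reported once.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    selected: List[Tuple[int, bytes]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])  # leftmost minimum
        position = start + offset
        if not selected or selected[-1][0] != position:
            selected.append((position, kmers[position]))
    return selected


print(minimizers(b"ACGTACGT", k=3, w=3))
# [(0, b'ACG'), (1, b'CGT'), (4, b'ACG')]
```

The inner loop is dominated by comparisons and branches, which fits the small gains reported for minimizers above.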
Sequence Manipulation Operations (Phase 4)
biometal provides comprehensive sequence manipulation primitives for read processing pipelines. All operations maintain production quality with proper error handling.
Python: Sequence Operations
# 1. Reverse complement (standard molecular biology operation)
= b
= # b"GCATGCAT"
# 2. Complement only (preserves 5'→3' orientation)
= # b"TACGTACG"
# 3. Reverse only (no complementation)
= # b"CGTACGTA"
# 4. Sequence validation
# 5. Count invalid bases (for QC)
=
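The semantics of these operations can be pinned down with a short pure-Python sketch. The function names mirror the operations described but are stand-ins, not biometal's API. The input `ATGCATGC` is an assumption, chosen because it reproduces the reverse-complement and complement outputs quoted in the comments above.

```python
# Translation table for the four DNA bases (upper and lower case); other symbols pass through.
_COMPLEMENT = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def complement(seq: bytes) -> bytes:
    return seq.translate(_COMPLEMENT)


def reverse(seq: bytes) -> bytes:
    return seq[::-1]


def reverse_complement(seq: bytes) -> bytes:
    return seq.translate(_COMPLEMENT)[::-1]


def count_invalid_bases(seq: bytes, alphabet: bytes = b"ACGTN") -> int:
    """Number of symbols outside the allowed alphabet."""
    return sum(1 for base in seq if base not in alphabet)


seq = b"ATGCATGC"
print(reverse_complement(seq))         # b'GCATGCAT'
print(complement(seq))                 # b'TACGTACG'
print(reverse(seq))                    # b'CGTACGTA'
print(count_invalid_bases(b"ATGXN"))   # 1
```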
Python: Record Operations
=
# 1. Extract region [start, end)
=
# 2. Reverse complement record (preserves quality alignment)
=
# Both sequence AND quality are reversed
# 3. Get sequence length
=
# 4. Length filtering
# 5. Convert FASTQ → FASTA (drops quality scores)
=
break
Python: Quality-Based Trimming
=
# 1. Fixed position trimming
= # Remove first 10bp
= # Remove last 5bp
=
# 2. Quality-based trimming (Phred+33, Q20 = 99% accuracy)
=
=
=
# 3. Sliding window trimming (Trimmomatic-style)
# Trim when 4bp window average drops below Q20
=
# 4. QC pipeline: trim + length filter
=
# Keep only ≥50bp after trimming
break
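The two quality-based strategies can be sketched as follows. This is a simplified illustration with hypothetical function names, assuming Phred+33 encoding. The sliding-window version scans from the 5' end and cuts at the first window whose mean drops below the threshold, which follows the general idea of Trimmomatic's SLIDINGWINDOW step rather than reproducing it exactly.

```python
from typing import Tuple


def trim_quality_end(sequence: bytes, quality: bytes,
                     min_quality: int = 20, offset: int = 33) -> Tuple[bytes, bytes]:
    """Drop bases from the 3' end while their Phred score is below min_quality."""
    end = len(quality)
    while end > 0 and quality[end - 1] - offset < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


def trim_sliding_window(sequence: bytes, quality: bytes, window: int = 4,
                        min_quality: int = 20, offset: int = 33) -> Tuple[bytes, bytes]:
    """Cut the read at the first window whose mean Phred score is below min_quality.

    Reads shorter than the window are returned unchanged.
    """
    for start in range(len(quality) - window + 1):
        scores = [q - offset for q in quality[start:start + window]]
        if sum(scores) / window < min_quality:
            return sequence[:start], quality[:start]
    return sequence, quality


seq, qual = b"ACGTACGTAC", b"IIIIII####"   # 'I' is Q40, '#' is Q2
print(trim_quality_end(seq, qual))      # (b'ACGTAC', b'IIIIII')
print(trim_sliding_window(seq, qual))   # (b'ACGTA', b'IIIII')
```

In this example the window average first falls below Q20 one base before the low-quality run starts, so the sliding window trims slightly more than per-base end trimming.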
Python: Quality-Based Masking
=
# Mask low-quality bases with 'N' (preserves length unlike trimming)
=
# Count masked bases (for QC metrics)
=
= /
# Quality filter: reject if >10% masked
break
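A minimal sketch of masking, again with hypothetical names and assuming Phred+33 encoding, shows how length is preserved and how the masked fraction used by the 10% filter is derived. It is an illustration, not biometal's implementation.

```python
def mask_low_quality(sequence: bytes, quality: bytes,
                     min_quality: int = 20, offset: int = 33) -> bytes:
    """Replace every base whose Phred score is below min_quality with 'N'."""
    return bytes(
        ord("N") if q - offset < min_quality else base
        for base, q in zip(sequence, quality)
    )


def masked_fraction(masked: bytes) -> float:
    """Fraction of 'N' bases; this also counts any N already present in the read."""
    return masked.count(b"N") / len(masked) if masked else 0.0


masked = mask_low_quality(b"ACGTACGTAC", b"IIII#IIII#")
print(masked)                          # b'ACGTNCGTAN'
print(masked_fraction(masked))         # 0.2
print(masked_fraction(masked) > 0.10)  # True, so this read would be rejected
```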
Python: Complete QC Pipeline
# Quality control pipeline: trim → filter → mask
=
= 0
= 0
= 0
# Step 1: Quality-based trimming (Q20, Trimmomatic-style)
=
# Step 2: Length filter (keep 50-150bp)
+= 1
continue
# Step 3: Mask remaining low-quality bases
=
# Step 4: Final QC check (<10% masked)
= /
+= 1
continue
+= 1
# Write to output or process further
Use Cases:
- Trimming: Remove low-quality ends before alignment (preserves high-quality core)
- Masking: Variant calling pipelines (preserves read structure for alignment)
- Region extraction: Extract specific genomic windows or features
- Reverse complement: Convert reads to correct strand orientation
- FASTQ→FASTA: Convert after quality filtering for downstream tools
Performance
Memory Efficiency
| Dataset Size | Traditional | biometal | Reduction |
|---|---|---|---|
| 100K sequences | 134 MB | 5 MB | 96.3% |
| 1M sequences | 1,344 MB | 5 MB | 99.5% |
| 5TB dataset | 5,000 GB | 5 MB | 99.9999% |
ARM NEON Speedup (Mac Apple Silicon)
Optimized for Apple Silicon - All optimizations validated on Mac M3 Max (1,357 experiments, N=30):
| Operation | Scalar | NEON | Speedup |
|---|---|---|---|
| Base counting | 315 Kseq/s | 5,254 Kseq/s | 16.7× |
| GC content | 294 Kseq/s | 5,954 Kseq/s | 20.3× |
| Quality filter | 245 Kseq/s | 6,143 Kseq/s | 25.1× |
Cross-Platform Performance (Validated Nov 2025)
| Platform | Base Counting | GC Content | Quality | Status |
|---|---|---|---|---|
| Mac M3 (target) | 16.7× | 20.3× | 25.1× | ✅ Optimized |
| AWS Graviton | 10.7× | 6.9× | 1.9× | ✅ Works (portable) |
| x86_64 Intel | 1.0× | 1.0× | 1.0× | ✅ Works (portable) |
Note: biometal is optimized for Mac ARM (consumer hardware democratization). Other platforms are supported with correct, production-ready code but not specifically optimized. See Cross-Platform Testing Results for details.
I/O Optimization
| File Size | Standard | Optimized | Speedup |
|---|---|---|---|
| Small (<50 MB) | 12.3s | 1.9s | 6.5× |
| Large (≥50 MB) | 12.3s | 0.75s | 16.3× |
Democratizing Bioinformatics
biometal addresses four barriers that lock researchers out of genomics:
1. Economic Barrier
- Problem: Most tools require $50K+ servers
- Solution: Consumer ARM laptops ($1,400) deliver production performance
- Impact: Small labs and LMIC researchers can compete
2. Environmental Barrier
- Problem: HPC clusters consume massive energy (300× excess for many workloads)
- Solution: ARM efficiency inherent in architecture
- Impact: Reduced carbon footprint for genomics research
3. Portability Barrier
- Problem: Vendor lock-in (x86-only, cloud-only tools)
- Solution: Works across ARM ecosystem (Mac, Graviton, Ampere, RPi)
- Impact: No platform dependencies, true portability
4. Data Access Barrier ⭐
- Problem: 5TB datasets require 5TB storage + days to download
- Solution: Network streaming with smart caching
- Impact: Analyze 5TB datasets on 24GB laptops without downloading
Evidence Base
biometal's design is grounded in comprehensive experimental validation:
- Experiments: 1,357 total (40,710 measurements with N=30)
- Statistical rigor: 95% confidence intervals, Cohen's d effect sizes
- Cross-platform: Mac M4 Max, AWS Graviton 3
- Lab notebook: 33 entries documenting full experimental log
See OPTIMIZATION_RULES.md for detailed evidence links.
Full methodology: apple-silicon-bio-bench
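As a reminder of what these statistics mean, the sketch below computes a 95% confidence interval for N=30 samples and Cohen's d using the standard textbook formulas. The exact procedure used in the benchmark suite is not described here, so this is an assumption, and the timing values are synthetic placeholders rather than measured data.

```python
import math
import statistics
from typing import Sequence, Tuple

T_CRIT_95_DF29 = 2.045  # two-sided 95% Student's t critical value for 29 degrees of freedom


def mean_ci95(samples: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) of a 95% confidence interval for exactly 30 samples."""
    if len(samples) != 30:
        raise ValueError("the hard-coded critical value assumes N=30")
    mean = statistics.fmean(samples)
    half_width = T_CRIT_95_DF29 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - half_width, mean + half_width


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


# Synthetic timings in seconds, for illustration only.
scalar = [10.0 + 0.1 * (i % 3 - 1) for i in range(30)]
neon = [t - 9.0 for t in scalar]
mean, low, high = mean_ci95(scalar)
print(low < mean < high)            # True
print(cohens_d(scalar, neon) > 0)   # True
```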
Publications (in preparation):
- DAG Framework: BMC Bioinformatics
- biometal Library: Bioinformatics (Application Note) or JOSS
- Four-Pillar Democratization: GigaScience
Platform Support
Optimization Strategy
biometal is optimized for Mac ARM (M1/M2/M3/M4) based on 1,357 experiments on Mac M3 Max. This aligns with our democratization mission: enable world-class bioinformatics on affordable consumer hardware ($1,000-2,000 MacBooks, not $50,000 servers).
Other platforms are supported with portable, correct code but not specifically optimized:
| Platform | Performance | Test Status | Strategy |
|---|---|---|---|
| Mac ARM (M1/M2/M3/M4) | 16-25× speedup | ✅ 121/121 tests pass | Optimized (target platform) |
| AWS Graviton | 1.9-10.7× speedup (operation-dependent) | ✅ 121/121 tests pass | Portable (works well) |
| Linux x86_64 | 1× (scalar) | ✅ 118/118 tests pass | Portable (fallback) |
Feature Support Matrix
| Feature | macOS ARM | Linux ARM | Linux x86_64 |
|---|---|---|---|
| ARM NEON SIMD | ✅ | ✅ | ❌ (scalar fallback) |
| Parallel Bgzip | ✅ | ✅ | ✅ |
| Smart mmap | ✅ | ⏳ | ❌ |
| Network Streaming | ✅ | ✅ | ✅ |
| Python Bindings | ✅ | ✅ | ✅ |
Validation: Cross-platform testing completed Nov 2025 on AWS Graviton 3 and x86_64. All tests pass. See results/cross_platform/FINDINGS.md for full details.
Roadmap
v1.0.0 (Released November 5, 2025) ✅
- Streaming FASTQ/FASTA parsers (constant memory)
- ARM NEON operations (16-25× speedup)
- Network streaming (HTTP/HTTPS, SRA)
- Python bindings (PyO3 0.27, Python 3.9-3.14)
- Cross-platform validation (Mac ARM, Graviton, x86_64)
- Production-grade quality (121 tests, Grade A+)
Future Considerations (Community Driven)
- Extended operation coverage (alignment, assembly)
- Additional format support (BAM/SAM, VCF)
- Publish to crates.io and PyPI
- Metal GPU acceleration (Mac-specific)
SRA Streaming: Analysis Without Downloads
One of biometal's most powerful features is direct streaming from NCBI's Sequence Read Archive (SRA) without local downloads.
Why This Matters
Traditional workflow:
- Download 5 GB SRA dataset → 30 minutes + 5 GB disk space
- Decompress → 15 GB disk space
- Process → Additional memory
- Total: 45 minutes + 20 GB resources before analysis even starts
biometal workflow:
- Start analysis immediately → 0 wait time, ~5 MB memory
- Stream directly from NCBI S3 → No disk space needed
- Background prefetching hides latency → Near-local performance
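The prefetching idea can be sketched with the standard library. This is not how biometal implements it: `prefetching_blocks` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real ranged HTTP request. The sketch keeps a fixed number of block requests in flight, so the consumer rarely waits on the network.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator


def prefetching_blocks(fetch: Callable[[int], bytes], n_blocks: int,
                       depth: int = 4) -> Iterator[bytes]:
    """Yield blocks 0..n_blocks-1 in order while keeping up to `depth` fetches in flight."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        in_flight = deque()
        next_index = 0
        while next_index < n_blocks or in_flight:
            # Top up the queue of outstanding requests.
            while next_index < n_blocks and len(in_flight) < depth:
                in_flight.append(pool.submit(fetch, next_index))
                next_index += 1
            # Wait for the oldest block; newer ones keep downloading in the background.
            yield in_flight.popleft().result()


def fake_fetch(index: int) -> bytes:
    """Stand-in for a ranged network read (hypothetical)."""
    return f"block-{index};".encode()


data = b"".join(prefetching_blocks(fake_fetch, n_blocks=6))
print(data)  # b'block-0;block-1;block-2;block-3;block-4;block-5;'
```

Results are consumed in submission order, so the output stays sequential even if later blocks finish first.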
Supported Accessions
- SRR (Run): Most common, represents a sequencing run
- SRX (Experiment): Collection of runs
- SRS (Sample): Biological sample
- SRP (Study): Collection of experiments
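A rough way to recognize the four accession types listed above is a prefix check. The sketch below is an assumption-laden illustration with a hypothetical `classify_accession` helper: it accepts only the four prefixes named here followed by digits, does not enforce digit counts, and is not biometal's own accession check.

```python
import re
from typing import Optional

_ACCESSION = re.compile(r"^(SRR|SRX|SRS|SRP)\d+$")
_KINDS = {"SRR": "run", "SRX": "experiment", "SRS": "sample", "SRP": "study"}


def classify_accession(text: str) -> Optional[str]:
    """Return 'run', 'experiment', 'sample' or 'study', or None if the text does not match."""
    match = _ACCESSION.match(text.strip())
    return _KINDS[match.group(1)] if match else None


print(classify_accession("SRR390728"))                   # run
print(classify_accession("SRP000001"))                   # study
print(classify_accession("https://example.org/r.fq.gz")) # None
```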
Basic SRA Usage
use DataSource;
use ;
use FastqStream;
// Stream from SRA accession
let source = Sra;
let stream = new?;
for record in stream
Real-World Example: E. coli Analysis
# Run the E. coli streaming example
# Process ~250,000 reads with only ~5 MB memory
# No download required!
See examples/sra_ecoli.rs for complete example.
Performance Tuning
biometal automatically configures optimal settings for most use cases. For custom tuning:
use ;
let url = sra_to_url?;
let reader = new?
.with_prefetch_count // Prefetch 8 blocks ahead
.with_chunk_size; // 128 KB chunks
// See docs/PERFORMANCE_TUNING.md for detailed guide
SRA URL Conversion
use ;
// Check if string is SRA accession
if is_sra_accession
Memory Guarantees
- Streaming buffer: ~5 MB (constant)
- LRU cache: 50 MB (byte-bounded, automatic eviction)
- Prefetch: ~256 KB (4 blocks × 64 KB)
- Total: ~55 MB regardless of SRA file size
Compared with holding a 5 GB SRA file locally, that is roughly a 99% reduction in resident data.
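The budget above is simple addition, and the byte-bounded eviction policy can be sketched with an ordered dictionary. The class name `ByteBoundedLRU` is hypothetical and this is an illustration of the technique, not biometal's cache.

```python
from collections import OrderedDict
from typing import Hashable, Optional

KB, MB = 1024, 1024 * 1024

# Streaming buffer + LRU cache + prefetch (4 blocks of 64 KB), as listed above.
budget_bytes = 5 * MB + 50 * MB + 4 * 64 * KB
print(budget_bytes / MB)  # 55.25


class ByteBoundedLRU:
    """LRU cache limited by total payload bytes rather than by entry count."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.current_bytes = 0
        self._blocks: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        block = self._blocks.get(key)
        if block is not None:
            self._blocks.move_to_end(key)  # mark as most recently used
        return block

    def put(self, key: Hashable, block: bytes) -> None:
        if key in self._blocks:
            self.current_bytes -= len(self._blocks.pop(key))
        self._blocks[key] = block
        self.current_bytes += len(block)
        # Evict least recently used blocks until the byte budget is respected.
        while self.current_bytes > self.max_bytes and self._blocks:
            _, evicted = self._blocks.popitem(last=False)
            self.current_bytes -= len(evicted)


cache = ByteBoundedLRU(max_bytes=10)
cache.put("a", b"1234")
cache.put("b", b"5678")
cache.get("a")            # touch "a" so that "b" becomes the eviction candidate
cache.put("c", b"9abc")   # 12 bytes exceeds the budget, so "b" is evicted
print(cache.get("b"), cache.current_bytes)  # None 8
```

Bounding by bytes rather than by entry count keeps the ceiling predictable even when block sizes vary.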
Examples
| Example | Dataset | Size | Demo |
|---|---|---|---|
| sra_streaming.rs | Demo mode | N/A | Capabilities overview |
| sra_ecoli.rs | E. coli K-12 | ~40 MB | Real SRA streaming |
| prefetch_tuning.rs | E. coli K-12 | ~40 MB | Performance tuning |
Example Use Cases
1. Large-Scale Quality Control
use ;
// Stream 5TB dataset without downloading
let stream = from_url?;
let mut total = 0;
let mut high_quality = 0;
for record in stream
println!;
2. BERT Preprocessing Pipeline (DNABert/ML)
use ;
use DataSource;
// Stream from SRA (no local copy!)
let source = Sra;
let stream = new?;
// Extract k-mers for DNABert training
for record in stream
Python equivalent:
=
# Extract k-mers for DNABert (k=3, 4, 5, or 6 typical)
=
# Feed to model - constant memory!
For large batches (10K+ sequences), use parallel extraction:
# Opt-in parallel for 2.2× speedup (Entry 034)
=
=
= # 2.2× faster
3. Metagenomics Filtering
use ;
let input = from_path?;
let mut output = create?;
for record in input
// Memory: constant ~5 MB
// Speed: 16-25× faster on ARM
FAQ
Why is the package called biometal-rs on PyPI but biometal everywhere else?
The biometal name was already taken on PyPI when we published v1.0.0, so we used biometal-rs (following the Rust convention). However:
- GitHub repository: shandley/biometal
- Python import: import biometal (not biometal_rs)
- Rust crate: biometal
- PyPI package: biometal-rs (install name only)
This means you install with:
pip install biometal-rs
But use it as:
import biometal  # Not biometal_rs!
This is a common pattern for Rust-based Python packages and provides the best user experience (clean import name).
What platforms are supported?
Pre-built wheels available for:
- macOS ARM (M1/M2/M3/M4) - Optimized with NEON (16-25× speedup)
- macOS x86_64 (Intel Macs) - Scalar fallback
- Linux x86_64 - Scalar fallback
Coming soon:
- Linux ARM (Graviton, Raspberry Pi) - Will be added in v1.0.1
Build from source: All other platforms can build from the source distribution (requires Rust toolchain).
Does it work on Windows?
Currently untested. Building from source may work with the Rust toolchain installed, but we haven't validated it. Community contributions for Windows support are welcome!
Why ARM-native? What about x86_64?
biometal is designed to democratize bioinformatics by enabling world-class performance on consumer hardware. Modern ARM laptops (like MacBooks with M-series chips) cost $1,400 vs $50,000+ for traditional HPC servers.
Performance philosophy:
- Mac ARM (M1/M2/M3/M4): Optimized target - 16-25× NEON speedup
- Other platforms: Correct, production-ready code with scalar fallback
The library works correctly on x86_64 (all tests pass); it is just not specifically optimized for it. Our mission is to enable field researchers, students, and small labs in LMICs to do cutting-edge work on affordable hardware.
How do I get support?
- Bug reports: GitHub Issues
- Questions: GitHub Discussions
- Documentation: https://docs.rs/biometal
Contributing
We welcome contributions! biometal is built on evidence-based optimization, so new features should:
- Have clear use cases
- Be validated experimentally (when adding optimizations)
- Maintain platform portability
- Follow the optimization rules in OPTIMIZATION_RULES.md
See CLAUDE.md for development guidelines.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Citation
If you use biometal in your research, please cite:
For the experimental methodology, see:
- Status: v1.0.0 - Production Release 🎉
- Released: November 5, 2025
- Grade: A+ (rust-code-quality-reviewer)
- Tests: 121 passing (87 unit + 7 integration + 27 doc)
- Evidence Base: 1,357 experiments, 40,710 measurements
- Mission: Democratizing bioinformatics compute