bam_tide
Fast, reproducible BAM → binned coverage exporters and validation utilities written in Rust.
This repository currently ships two main command-line tools:
- bam-coverage: compute binned coverage from a position-sorted BAM and write bedGraph or BigWig (depending on the build/features of your binary).
- bw-compare: validate Rust-generated BigWigs against a reference (typically deeptools bamCoverage) and report per-chromosome and global statistics, including a final TOTAL line.
Goal: provide bamCoverage-compatible filtering semantics (include/exclude SAM flags, MAPQ, duplicates, secondary/supplementary) with much higher performance and simple, scriptable reproducibility.
Installation
Option 1: Build from source (recommended)
Prerequisites:
- Rust toolchain (stable) via rustup
- A C toolchain (for some dependencies) as needed on your system
Binaries will be available here:
Option 2: Use inside a container (optional)
If you run on HPC and prefer containers, you can build an Apptainer/Singularity image around a release build of these binaries. (A recipe is not included yet—see Future developments.)
Quickstart
1) Generate coverage with bam-coverage
Minimal example:
Common options:
- --width 50 to change bin size (default: 50)
- --min-mapping-quality 30 to require MAPQ ≥ 30
- --sam-flag-exclude 256 to exclude secondary alignments (deeptools-compatible)
- --sam-flag-include 64 to include only Read1
Example (exclude secondary + supplementary alignments):
--sam-flag-exclude is a bitmask: any read with (flag & mask) != 0 is discarded.
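The mask semantics can be illustrated with a minimal Python sketch. The helper keep_read and the constant names are hypothetical and not part of bam_tide; the bit values come from the SAM specification, and the include rule (a read is kept only if all mask bits are set, tested after exclusion) follows the --sam-flag-include help text in the command reference.

```python
# SAM flag bits as defined in the SAM specification.
READ1 = 0x40           # 64
SECONDARY = 0x100      # 256
QC_FAIL = 0x200        # 512
DUPLICATE = 0x400      # 1024
SUPPLEMENTARY = 0x800  # 2048


def keep_read(flag, exclude_mask=None, include_mask=None):
    """Return True if a read with this SAM flag passes both masks.

    Hypothetical helper for illustration; not the bam_tide implementation.
    """
    # Exclusion: discard the read if ANY bit of the exclude mask is set.
    if exclude_mask is not None and (flag & exclude_mask) != 0:
        return False
    # Inclusion (tested after exclusion): keep only if ALL mask bits are set.
    if include_mask is not None and (flag & include_mask) != include_mask:
        return False
    return True


# Masks combine with bitwise OR: secondary + supplementary = 256 + 2048.
exclude_sec_supp = SECONDARY | SUPPLEMENTARY
print(exclude_sec_supp)
```

Under these assumptions, passing the combined value to --sam-flag-exclude would discard both secondary and supplementary alignments in one pass.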
2) Validate results with bw-compare
Compare a Python (deeptools) BigWig to a Rust BigWig:
This prints a per-chromosome report and finishes with a TOTAL line that summarizes all chromosomes.
Write the report to a file:
See the full benchmark results here:
Command reference
bam-coverage --help
Shared CLI options for coverage exporters (bedGraph / bigWig)
Usage: bam-coverage [OPTIONS] --bam <BAM> --outfile <OUTFILE>
Options:
-b, --bam <BAM>
Input BAM file (sorted by chromosome position)
-o, --outfile <OUTFILE>
Output file (bedGraph or BigWig depending on binary)
-n, --normalize <NORMALIZE>
Normalize the data somehow
[default: not]
[possible values: not, rpkm, cpm, bpm, rpgc]
-w, --width <WIDTH>
Bin width for coverage calculation
[default: 50]
--only-r1
Collect only R1 areas
--min-mapping-quality <MIN_MAPPING_QUALITY>
Minimum mapping quality to include a read
[default: 0]
--include-secondary
Include secondary alignments
--include-supplementary
Include supplementary alignments
--include-duplicates
Include duplicate-marked reads
--sam-flag-exclude <SAM_FLAG_EXCLUDE>
Exclude reads with ANY of these SAM flag bits set (equivalent to deeptools --samFlagExclude).
The value is a bitmask of SAM flags. Any read with (read_flag & mask) != 0 will be discarded.
Examples:
256 -> exclude secondary alignments
512 -> exclude QC-failed reads
1024 -> exclude PCR/optical duplicates
2048 -> exclude supplementary alignments
2816 -> exclude secondary + QC-fail + supplementary
Default: None (no flag-based exclusion, matches bamCoverage defaults)
--sam-flag-include <SAM_FLAG_INCLUDE>
Include only reads that have ALL of these SAM flag bits set. Applied after the exclusion test (equivalent to deeptools --samFlagInclude).
The value is a bitmask of SAM flags. A read is kept only if: (read_flag & mask) == mask
Examples:
64 -> include only read1
128 -> include only read2
2 -> include only properly paired reads
Default: None (no include constraint, matches bamCoverage defaults)
-h, --help
Print help (see a summary with '-h')
bw-compare --help
Compare two BigWig files (typically python bamCoverage vs. rust bam-coverage) and report per-chromosome and global differences.
The tool bins both BigWigs with the same bin width and compares the values position by position. It reports several statistics describing how different the signals are.
Output columns:
- n_over_eps: Number of bins where |python - rust| > eps
- frac_n_over_eps: Fraction of bins over eps
- mean_abs: Mean absolute difference
- var_abs: Variance of absolute differences
- rmse: Root mean squared error
- max_abs: Maximum absolute difference
- pearson_rho: Pearson correlation between tracks
A final TOTAL line summarizes all chromosomes.
If --outfile is not given, a report file will be created automatically:
bw_compare_<rust_basename>_w<bin_width>.txt
Usage: bw-compare [OPTIONS] --python-bw <FILE> --rust-bw <FILE>
Options:
--python-bw <FILE>
Python-generated BigWig (reference)
--rust-bw <FILE>
Rust-generated BigWig (to be validated)
--bin-width <INT>
Bin width used during coverage generation (must match both files)
[default: 50]
--eps <FLOAT>
Epsilon threshold for counting a bin as different (|python - rust| > eps)
[default: 0.00001]
--outfile <FILE>
Optional output file for the comparison report.
If not provided, the report is written to:
bw_compare_<rust_basename>_w<bin_width>.txt
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
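The output columns described in the bw-compare help can be sketched in plain Python as follows. This is an illustration of the listed statistics, not the tool's actual code: the function name compare_bins is hypothetical, and the variance is computed as a population variance (dividing by n), which is an assumption since the help text does not say which convention bw-compare uses.

```python
import math


def compare_bins(python_vals, rust_vals, eps=1e-5):
    """Compute the comparison statistics for two equally binned tracks.

    Hypothetical helper; mirrors the column names in the bw-compare help.
    """
    if len(python_vals) != len(rust_vals) or not python_vals:
        raise ValueError("tracks must be non-empty and have the same number of bins")
    n = len(python_vals)
    abs_diffs = [abs(p - r) for p, r in zip(python_vals, rust_vals)]

    n_over_eps = sum(1 for d in abs_diffs if d > eps)
    mean_abs = sum(abs_diffs) / n
    # Assumption: population variance (divide by n, not n - 1).
    var_abs = sum((d - mean_abs) ** 2 for d in abs_diffs) / n
    rmse = math.sqrt(sum(d * d for d in abs_diffs) / n)

    # Pearson correlation between the two tracks.
    mean_p = sum(python_vals) / n
    mean_r = sum(rust_vals) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(python_vals, rust_vals))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in python_vals))
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in rust_vals))
    # Undefined (NaN) when either track is constant.
    pearson_rho = cov / (sd_p * sd_r) if sd_p > 0 and sd_r > 0 else float("nan")

    return {
        "n_over_eps": n_over_eps,
        "frac_n_over_eps": n_over_eps / n,
        "mean_abs": mean_abs,
        "var_abs": var_abs,
        "rmse": rmse,
        "max_abs": max(abs_diffs),
        "pearson_rho": pearson_rho,
    }


stats = compare_bins([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.5])
print(stats)
```

In this toy input a single bin differs by 0.5, so one of four bins exceeds eps while the correlation stays close to 1; this is why the threshold count and the correlation are best read together.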
Reproducibility notes
To get reproducible and comparable coverage tracks:
- Use a position-sorted BAM
bam-coverage expects the BAM to be sorted by chromosome position.
- Match bin width
bam-coverage --width and bw-compare --bin-width must match (and must also match your bamCoverage --binSize if you compare to deeptools).
- Match filtering semantics
- MAPQ threshold: --min-mapping-quality
- Include/exclude: --sam-flag-exclude / --sam-flag-include
- Secondary/supplementary/duplicates: --include-secondary, --include-supplementary, --include-duplicates
- Normalization
- Ensure both tools use the same normalization scheme (e.g. CPM/RPKM/BPM/RPGC) and genome size settings (where applicable).
- Floating point
- Minor floating point differences can occur; use bw-compare --eps to set a meaningful tolerance and rely on correlation (pearson_rho) plus RMSE/mean absolute difference for validation.
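To see why bin width and normalization have to be matched together, here is a small sketch of two of the normalization schemes as they are defined in the deeptools documentation (CPM and RPKM). The function names are hypothetical, and whether bam-coverage computes these values in exactly this way is an assumption; the point is only that RPKM depends on the bin width while CPM does not.

```python
def cpm(reads_in_bin, total_mapped_reads):
    """Counts per million: reads in the bin per million mapped reads."""
    return reads_in_bin / (total_mapped_reads / 1e6)


def rpkm(reads_in_bin, total_mapped_reads, bin_width_bp):
    """Reads per kilobase per million: CPM divided by the bin length in kb."""
    return reads_in_bin / ((total_mapped_reads / 1e6) * (bin_width_bp / 1e3))


# Same read count, same library size, two different bin widths.
print(cpm(10, 2_000_000))
print(rpkm(10, 2_000_000, 50))
print(rpkm(10, 2_000_000, 100))
```

With these definitions, doubling the bin width halves the RPKM value for the same read count, so a mismatch in width shows up as a systematic scale difference rather than as floating point noise.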
Example validation workflow
# Python reference (deeptools)
# Rust candidate
# Compare
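The three steps can be sketched as a small Python helper that builds matching command lines without running them. The file names and the function validation_commands are hypothetical. The bam-coverage and bw-compare flags are taken from the help output above; the deeptools flags (-b, -o, --binSize, --samFlagExclude, --minMappingQuality) are the standard bamCoverage options. The sketch does not set --include-secondary, --include-supplementary or --include-duplicates; whether they are needed to match your deeptools settings should be checked against the help text.

```python
import shlex


def validation_commands(bam, python_bw, rust_bw, bin_width=50,
                        flag_exclude=None, min_mapq=0):
    """Build the deeptools, bam-coverage and bw-compare command lines.

    Hypothetical helper: one set of parameters feeds all three tools so
    that bin width and filters cannot drift apart.
    """
    width = str(bin_width)
    deeptools = ["bamCoverage", "-b", bam, "-o", python_bw, "--binSize", width]
    rust = ["bam-coverage", "--bam", bam, "--outfile", rust_bw, "--width", width]
    if flag_exclude is not None:
        deeptools += ["--samFlagExclude", str(flag_exclude)]
        rust += ["--sam-flag-exclude", str(flag_exclude)]
    if min_mapq > 0:
        deeptools += ["--minMappingQuality", str(min_mapq)]
        rust += ["--min-mapping-quality", str(min_mapq)]
    compare = ["bw-compare", "--python-bw", python_bw, "--rust-bw", rust_bw,
               "--bin-width", width]
    return [deeptools, rust, compare]


# Print shell-quoted commands; pass each list to subprocess.run to execute.
for argv in validation_commands("sample.bam", "python.bw", "rust.bw",
                                flag_exclude=256):
    print(shlex.join(argv))
```

Generating all three commands from the same arguments is one way to enforce the matching rules listed under Reproducibility notes.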
Benchmarks (example)
In a small validation experiment across multiple SAM flag exclusion masks, the Rust implementation produced numerically indistinguishable results (Pearson r ≈ 1.0) while being ~10× faster and using ~10× less peak memory than deeptools bamCoverage (exact numbers depend on dataset, IO, and chosen flags).
Future developments
Planned improvements to increase parity with deeptools bamCoverage and to improve reproducibility/UX:
- Feature parity (selected examples)
- more read extension / fragment handling options (paired-end fragment coverage semantics)
- region-restricted output (chromosome/interval subsets)
- smoothing / rolling aggregation options
- advanced normalization configurations (explicit effective genome size, RPGC parameters)
- blacklist / region exclusion
- Better packaging
- published releases with prebuilt binaries for common Linux targets
- container recipes for Apptainer/Singularity and Docker
- Performance and QA
- expanded test suite (known-answer tests and randomized property tests)
- continuous benchmarking and regression checks
- improved bw-compare reporting (optional CSV/TSV output, plots)
If you rely on a particular bamCoverage option that is currently missing, please open an issue describing:
- the exact deeptools CLI you use
- a small test BAM (or synthetic minimal example)
- the expected behavior
License
See LICENSE in this repository.
Acknowledgements
- Inspired by the deeptools bamCoverage interface and semantics.
- Uses the Rust ecosystem for HTS parsing and BigWig writing (see Cargo.toml for dependencies).