check_build
A fast, memory-efficient tool to verify VCF files against hg19 and hg38 reference genomes. Also available as a library for general-purpose use beyond VCF.
Quick Start
What build is my file?
# Output: Hg38 (100.0% match, high confidence)
Full verification:
Installation
Or from source:
Usage
CLI
# Simple build detection
# Full verification with summary
# Quiet mode (no progress bars)
# Summary only (no mismatch details)
# Single reference
# Custom reference paths
Library
Add to Cargo.toml:
[dependencies]
check_build = { git = "https://github.com/SauersML/check_build" }
Simple usage:
use check_build::detect_build;
let result = detect_build?;
println!; // "Hg38 (100.0% match, high confidence)"
Full control:
use ;
let result = new
.quiet
.verify_both?;
println!;
println!;
// Detailed detection with edge case handling
match result.detect
Features
- Fast: Parallel verification of hg19/hg38 using rayon
- Memory-efficient: Streams references, processes one contig at a time
- Auto-download: Fetches reference FASTAs if not present
- Edge case handling: Detects ambiguous, unknown, or corrupt files
- Dual interface: Both CLI and library
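To make the edge cases above concrete, here is a minimal sketch of how per-build match counts could be turned into a build call. It is not the crate's actual implementation: the `BuildCall` enum, the `classify` function, and both thresholds (`MIN_MATCH`, `MIN_GAP`) are hypothetical names and values chosen for illustration.

```rust
/// Possible outcomes of build detection (hypothetical enum for illustration).
#[derive(Debug, PartialEq)]
enum BuildCall {
    Hg19,
    Hg38,
    Ambiguous,
    Unknown,
    NoData,
}

// Assumed thresholds; the real tool may use different values or logic.
const MIN_MATCH: f64 = 0.90; // best build must match at least this fraction
const MIN_GAP: f64 = 0.05; // builds closer than this are treated as ambiguous

/// Classify from (matches, total) counts for hg19 and hg38.
fn classify(hg19: (usize, usize), hg38: (usize, usize)) -> BuildCall {
    // No variants were checked against either reference.
    if hg19.1 == 0 && hg38.1 == 0 {
        return BuildCall::NoData;
    }
    let rate = |(m, t): (usize, usize)| if t == 0 { 0.0 } else { m as f64 / t as f64 };
    let (r19, r38) = (rate(hg19), rate(hg38));

    // Low match on both builds: possibly a corrupt file or another assembly.
    if r19.max(r38) < MIN_MATCH {
        return BuildCall::Unknown;
    }
    // Both builds match about equally well.
    if (r19 - r38).abs() < MIN_GAP {
        return BuildCall::Ambiguous;
    }
    if r38 > r19 {
        BuildCall::Hg38
    } else {
        BuildCall::Hg19
    }
}

fn main() {
    println!("{:?}", classify((600, 1000), (1000, 1000)));
    println!("{:?}", classify((990, 1000), (985, 1000)));
    println!("{:?}", classify((300, 1000), (280, 1000)));
    println!("{:?}", classify((0, 0), (0, 0)));
}
```

Checking the "low on both" case before the "similar" case means two equally poor match rates are reported as unknown rather than ambiguous.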
How It Works
- Splits VCF by contig into temp files
- Streams each reference FASTA (never loads full genome)
- Verifies REF alleles match reference bases
- Reports match rates and infers build
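The REF-allele check in the steps above can be sketched as follows. This is an illustrative example under stated assumptions, not the crate's own code: the `Variant` struct and `count_ref_matches` function are hypothetical, the contig sequence is assumed to be already in memory as bytes, and the comparison is case-insensitive because reference FASTAs often use lowercase for soft-masked regions.

```rust
/// One VCF record reduced to the fields needed for REF checking.
/// (Hypothetical type for illustration.)
struct Variant {
    pos: usize,         // 1-based position, as in the VCF POS column
    ref_allele: String, // contents of the VCF REF column
}

/// Returns (matches, total) for the variants on a single contig.
fn count_ref_matches(contig_seq: &[u8], variants: &[Variant]) -> (usize, usize) {
    let mut matches = 0;
    let mut total = 0;
    for v in variants {
        // Skip records that cannot be checked at all.
        if v.pos == 0 || v.ref_allele.is_empty() {
            continue;
        }
        total += 1;
        let start = v.pos - 1; // convert to a 0-based index
        let end = start + v.ref_allele.len();
        // A REF allele running past the contig end counts as a mismatch.
        if end <= contig_seq.len()
            && contig_seq[start..end].eq_ignore_ascii_case(v.ref_allele.as_bytes())
        {
            matches += 1;
        }
    }
    (matches, total)
}

fn main() {
    let contig = b"ACGTACGTAC";
    let variants = vec![
        Variant { pos: 1, ref_allele: "A".to_string() },
        Variant { pos: 3, ref_allele: "GT".to_string() },
        Variant { pos: 5, ref_allele: "C".to_string() },
        Variant { pos: 10, ref_allele: "CG".to_string() },
    ];
    let (matches, total) = count_ref_matches(contig, &variants);
    println!("{matches}/{total} REF alleles match");
}
```

Running this per contig and summing the counts for each reference gives the match rates used to infer the build.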
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success (build detected or verification passed) |
| 1 | Error (file not found, download failed, etc.) |
| 2 | Ambiguous (matches both builds similarly) |
| 3 | Unknown (low match on both, possibly corrupt) |
| 4 | No data (VCF had no valid variants) |
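
For callers that wrap the CLI, the table above maps naturally onto a small enum. The sketch below is an assumption about how such a mapping could look, not the tool's internal code: `Outcome` and `exit_status` are hypothetical names, and only the numeric codes come from the table.

```rust
use std::process::ExitCode;

/// Outcomes corresponding to the rows of the exit code table
/// (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Success,
    Error,
    Ambiguous,
    Unknown,
    NoData,
}

/// Map an outcome to its documented process exit status.
fn exit_status(outcome: Outcome) -> u8 {
    match outcome {
        Outcome::Success => 0,
        Outcome::Error => 1,
        Outcome::Ambiguous => 2,
        Outcome::Unknown => 3,
        Outcome::NoData => 4,
    }
}

fn main() -> ExitCode {
    // A real program would compute the outcome; this one always succeeds.
    let outcome = Outcome::Success;
    ExitCode::from(exit_status(outcome))
}
```

Keeping the mapping in one exhaustive `match` means the compiler flags any new outcome that has not been assigned a code.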
License
MIT