genome
The jq of genomics. Fast, local, human-readable variant analysis.
$ genome query rs80357906
╭──────────────────────────────────────────────────────────────╮
│ rs80357906 · BRCA1 · Pathogenic │
├──────────────────────────────────────────────────────────────┤
│ Location chr17:43092919 (GRCh38) │
│ Change G > A (missense) │
│ Gene BRCA1 │
│ │
│ Clinical │
│ ├─ ClinVar Pathogenic (★★★★ reviewed) │
│ ├─ Condition Hereditary breast and ovarian cancer │
│ └─ Reviewed 2024-08-15 │
│ │
│ Population Frequency │
│ ├─ Global 0.00003 (1 in 33333) │
│ ├─ European 0.00004 │
│ ├─ African 0.00001 │
│ └─ East Asian 0.00002 │
╰──────────────────────────────────────────────────────────────╯
genome combines ClinVar, gnomAD, dbSNP, PharmGKB, and UniProt into one local database. Query any variant, annotate VCF files, compare two genomes, or predict variant effects with AlphaGenome. Apart from the opt-in AlphaGenome predictions, everything runs on your machine and no data leaves your computer.
Install
From source (Rust)
Pre-built binaries
Download from GitHub Releases for macOS, Linux, and Windows.
Bioconda
Quick Start
Download the variant database (ClinVar + gnomAD exomes, ~5 GB).
Query a variant.
That's it. Everything is local from this point on.
Commands
genome query
Look up variants by rsID, genomic coordinates, HGVS notation, or gene name.
# By rsID
# By coordinates (chr:pos:ref:alt)
# By gene (shows known pathogenic variants)
# Multiple variants at once
# JSON output for scripting
# Compact output for pipelines
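As an illustration of the two simplest query forms above, here is a minimal Python sketch that tells an rsID apart from a `chr:pos:ref:alt` coordinate string. It is not genome's own parser (which is written in Rust); the `Coordinate` type and `parse_query` function are hypothetical names, and HGVS notation and gene names are deliberately left out.

```python
import re
from typing import NamedTuple, Union


class Coordinate(NamedTuple):
    """Hypothetical container for a chr:pos:ref:alt query."""
    chrom: str
    pos: int
    ref: str
    alt: str


RSID = re.compile(r"^rs\d+$")


def parse_query(text: str) -> Union[str, Coordinate]:
    """Return the rsID unchanged, or a Coordinate for chr:pos:ref:alt input."""
    text = text.strip()
    if RSID.match(text):
        return text
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected an rsID or chr:pos:ref:alt, got {text!r}")
    chrom, pos, ref, alt = parts
    if not pos.isdigit():
        raise ValueError(f"position must be a positive integer, got {pos!r}")
    # Alleles are normalised to upper case so 'g' and 'G' compare equal.
    return Coordinate(chrom, int(pos), ref.upper(), alt.upper())


print(parse_query("rs80357906"))
print(parse_query("chr17:43092919:G:A"))
```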
genome annotate
Annotate a VCF file against the local database. Streams the file record by record, so memory stays flat regardless of file size.
# Annotate a VCF
# Only clinically significant variants
# Generate an HTML report
# Pipe from bcftools
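The record-by-record streaming described above can be sketched in Python with a generator. This is an assumption-laden illustration, not genome's implementation: `iter_vcf_records`, `annotate`, and the in-memory `KNOWN` dictionary (standing in for the local database) are hypothetical. Only one line is held at a time, which is why memory does not grow with file size.

```python
import io
from typing import Dict, Iterable, Iterator, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def iter_vcf_records(lines: Iterable[str]) -> Iterator[VariantKey]:
    """Yield one key per ALT allele, reading the input one line at a time."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        # The first five VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield (chrom, int(pos), ref, alt)


def annotate(lines: Iterable[str],
             known: Dict[VariantKey, str]) -> Iterator[Tuple[VariantKey, str]]:
    """Pair each variant with its label; nothing is accumulated in memory."""
    for key in iter_vcf_records(lines):
        yield key, known.get(key, "not in database")


# Tiny made-up input; a real file handle works the same way as StringIO.
SAMPLE_VCF = (
    "##fileformat=VCFv4.3\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr17\t43092919\t.\tG\tA\t.\t.\t.\n"
    "chr1\t12345\t.\tC\tT,G\t.\t.\t.\n"
)
KNOWN = {("chr17", 43092919, "G", "A"): "Pathogenic"}

for key, label in annotate(io.StringIO(SAMPLE_VCF), KNOWN):
    print(key, label)
```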
genome compare
Compare variants between two individuals. Shows shared vs. unique variants, estimates genetic relatedness, and highlights clinically significant differences.
# Basic comparison
# With kinship estimation
# Only clinical differences
# JSON output
Shared variants: 3,847,291 (87.2%)
Unique to person1: 298,412 (6.8%)
Unique to person2: 264,891 (6.0%)
Kinship estimate: 0.49
Likely relationship: Parent/Child
Clinically significant differences: 12
rs80357906 BRCA1 Pathogenic A=0/1 B=0/0
rs121913529 TP53 Pathogenic A=0/0 B=0/1
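The shared/unique breakdown in the sample output can be reproduced with plain set arithmetic. The Python sketch below assumes each variant is reduced to a `(chrom, pos, ref, alt)` key and that percentages are taken over the union of both call sets, which matches the figures shown; `pct` and `compare` are hypothetical helpers, and kinship estimation is not covered.

```python
from typing import Dict, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def pct(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1) if total else 0.0


def compare(a: Set[VariantKey], b: Set[VariantKey]) -> Dict[str, Tuple[int, float]]:
    """Return (count, percent) for shared and unique variants."""
    shared = a & b
    only_a = a - b
    only_b = b - a
    total = len(a | b)  # every distinct variant seen in either sample
    return {
        "shared": (len(shared), pct(len(shared), total)),
        "unique_a": (len(only_a), pct(len(only_a), total)),
        "unique_b": (len(only_b), pct(len(only_b), total)),
    }


person1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
person2 = {("chr1", 200, "C", "T"), ("chr2", 50, "G", "A"), ("chr3", 75, "T", "C")}
print(compare(person1, person2))
```

Applied to the counts in the sample output, `pct(3847291, 3847291 + 298412 + 264891)` gives 87.2, consistent with the shared percentage shown.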
genome extract
Extract variants from CRAM or BAM alignment files. Streams the file without loading it into memory.
# Extract from CRAM
# Specific region only
# With quality filters
genome predict
Predict variant effects using Google DeepMind's AlphaGenome API. Opt-in: requires an API key and shows a clear warning before sending any data.
# Set your API key
# Predict a single variant
# Predict with specific output tracks
genome db
Manage the local variant database.
# Install (choose a tier)
# Check what's installed
# Update to latest
# Remove everything
genome config
Output Formats
Every command supports three output formats via --format.
human (default): Box-drawing terminal output with colors. Designed to be read by humans.
json: Structured JSON. Pipe to jq or consume from scripts.
compact: Tab-separated single-line output. One variant per line, easy to grep.
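To make the difference between the json and compact formats concrete, here is a small Python sketch that renders one record both ways. The field names (`rsid`, `gene`, `significance`, and so on) and the column order are assumptions for illustration; the actual schema emitted by genome is not specified here.

```python
import json
from typing import Any, Dict, List

# Hypothetical record; the real field names may differ.
record: Dict[str, Any] = {
    "rsid": "rs80357906",
    "gene": "BRCA1",
    "significance": "Pathogenic",
    "chrom": "chr17",
    "pos": 43092919,
}


def to_json(rec: Dict[str, Any]) -> str:
    """Structured output: one JSON object, easy to pipe into jq."""
    return json.dumps(rec, sort_keys=True)


def to_compact(rec: Dict[str, Any], columns: List[str]) -> str:
    """Tab-separated output: one variant per line, '.' for missing fields."""
    return "\t".join(str(rec.get(col, ".")) for col in columns)


print(to_json(record))
print(to_compact(record, ["rsid", "gene", "significance"]))
```

The compact form trades structure for greppability: a fixed column order means `grep` and `cut` work without a JSON parser.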
Database Sources
| Source | What | License | Included In |
|---|---|---|---|
| ClinVar | Clinical variant classifications | Public domain | lite, standard, full |
| gnomAD | Population allele frequencies | CC0 1.0 | standard, full |
| dbSNP | rsID reference catalog | Public domain | full |
| PharmGKB | Drug-gene-variant interactions | CC BY-SA 4.0 | full |
| UniProt | Protein-level variant annotations | CC BY 4.0 | full |
All sources are open and redistributable under the licenses listed above. genome downloads and indexes them locally when you run genome db install. No ongoing connection is required after installation.
Privacy
genome is built around a simple principle: your genomic data never leaves your machine.
- All file processing is local. VCF, BAM, and CRAM files are read and processed entirely on your computer.
- The database is local. Variant lookups query a SQLite file on disk. No API calls.
- AlphaGenome is opt-in. Prediction requests send only the variant's coordinates and alleles (chr, pos, ref, alt) to Google's API. No sample identifiers or personal data. A warning is shown before the first call.
- No telemetry. No analytics, crash reports, or phone-home. The binary runs fully offline after database installation.
- No accounts. No signup, no login, no tokens (except the optional AlphaGenome API key).
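What "variant lookups query a SQLite file on disk" means in practice can be sketched with Python's standard `sqlite3` module. The table name, columns, and the `lookup` helper are hypothetical and do not reflect genome's real schema; the point is only that a lookup is an indexed query against a local file, with no network call involved.

```python
import sqlite3
from typing import Optional, Tuple


def open_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database; a real one would be a file on disk."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants ("
        "rsid TEXT PRIMARY KEY, gene TEXT, significance TEXT)"
    )
    conn.execute(
        "INSERT INTO variants VALUES (?, ?, ?)",
        ("rs80357906", "BRCA1", "Pathogenic"),
    )
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, rsid: str) -> Optional[Tuple[str, str, str]]:
    """Indexed lookup by primary key; returns None when the rsID is unknown."""
    return conn.execute(
        "SELECT rsid, gene, significance FROM variants WHERE rsid = ?",
        (rsid,),
    ).fetchone()


conn = open_demo_db()
print(lookup(conn, "rs80357906"))
print(lookup(conn, "rs1"))
```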
Performance
| Operation | Target |
|---|---|
| genome query rs12345 | < 100 ms |
| genome annotate throughput | 7,000+ variants/sec |
| genome extract streaming | 50,000+ reads/sec |
| genome compare (two WGS VCFs) | < 5 min |
| Memory usage (any operation) | < 512 MB |
| Binary size | < 20 MB |
genome uses streaming I/O throughout. Processing a 100 GB CRAM file takes roughly the same amount of memory as processing a 100 MB one.
Architecture
┌─────────┐    ┌──────────────┐    ┌──────────────────┐    ┌──────────────┐
│ Input   │───▶│ Streaming    │───▶│ Annotation       │───▶│ Output       │
│ Files   │    │ Parser       │    │ Engine           │    │ Formatter    │
│ .cram   │    │ (noodles)    │    │ (local SQLite)   │    │              │
│ .bam    │    │              │    │                  │    │ human        │
│ .vcf    │    └──────────────┘    └────────┬─────────┘    │ json         │
└─────────┘                                 │              │ compact      │
                                   ┌────────┴─────────┐    └──────────────┘
                                   │ Local Database   │
                                   │ ClinVar, gnomAD  │
                                   │ dbSNP, PharmGKB  │
                                   └──────────────────┘
Built with:
- Rust for performance and single-binary distribution
- noodles for pure-Rust BAM/CRAM/VCF parsing with async streaming
- SQLite for the local variant database (zero config, fast indexed queries)
- AlphaGenome for variant effect prediction (optional API)
Disclaimer
genome is a research and exploration tool. It is not a clinical diagnostic tool and should not be used as the sole basis for medical decisions. Always consult a qualified healthcare professional or genetic counselor for clinical interpretation of genetic variants.
License
MIT