genome-sh 0.1.0

The jq of genomics. Fast, local, human-readable variant analysis.

$ genome query rs80357906

╭──────────────────────────────────────────────────────────────╮
│  rs80357906 · BRCA1 · Pathogenic                             │
├──────────────────────────────────────────────────────────────┤
│  Location     chr17:43092919 (GRCh38)                        │
│  Change       G > A (missense)                               │
│  Gene         BRCA1                                          │
│                                                              │
│  Clinical                                                    │
│  ├─ ClinVar       Pathogenic (★★★★ reviewed)                 │
│  ├─ Condition     Hereditary breast and ovarian cancer       │
│  └─ Reviewed      2024-08-15                                 │
│                                                              │
│  Population Frequency                                        │
│  ├─ Global        0.00003 (1 in 33333)                       │
│  ├─ European      0.00004                                    │
│  ├─ African       0.00001                                    │
│  └─ East Asian    0.00002                                    │
╰──────────────────────────────────────────────────────────────╯

genome combines ClinVar, gnomAD, dbSNP, PharmGKB, and UniProt into one local database. Query any variant, annotate VCF files, compare two genomes, or predict variant effects with AlphaGenome. Apart from the opt-in AlphaGenome predictions, everything runs on your machine and no data leaves your computer.

Install

From source (Rust)

cargo install genome-sh

Pre-built binaries

Download from GitHub Releases for macOS, Linux, and Windows.

Bioconda

conda install -c bioconda genome-sh

Quick Start

Download the variant database (ClinVar + gnomAD exomes, ~5 GB).

genome db install standard

Query a variant.

genome query rs80357906

That's it. Everything is local from this point on.

Commands

genome query

Look up variants by rsID, genomic coordinates, HGVS notation, or gene name.

# By rsID
genome query rs80357906

# By coordinates (chr:pos:ref:alt)
genome query chr17:43092919:G:A

# By gene (shows known pathogenic variants)
genome query BRCA2

# Multiple variants at once
genome query rs80357906 rs121913529 rs28897696

# JSON output for scripting
genome query rs80357906 --format json

# Compact output for pipelines
genome query rs80357906 --format compact
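
The four input forms listed above can usually be told apart by shape alone. The following is a minimal Python sketch of that idea, assuming simplified patterns; `classify_query` and its regular expressions are hypothetical illustrations and not genome's actual parser.

```python
import re

# Hypothetical, simplified patterns; real identifiers have more edge cases.
RSID = re.compile(r"^rs\d+$")
COORDS = re.compile(r"^(chr)?([0-9]{1,2}|X|Y|M|MT):\d+:[ACGT]+:[ACGT]+$")
HGVS = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?:[cgnmp]\..+$")
GENE = re.compile(r"^[A-Z][A-Z0-9-]*$")


def classify_query(term: str) -> str:
    """Return which of the four documented input forms a query term resembles."""
    if RSID.match(term):
        return "rsid"
    if COORDS.match(term):
        return "coordinates"
    if HGVS.match(term):
        return "hgvs"
    if GENE.match(term):
        return "gene"
    return "unknown"


for term in ["rs80357906", "chr17:43092919:G:A", "NM_007294.4:c.5266dup", "BRCA2"]:
    print(term, "->", classify_query(term))
```

The order of the checks matters: the gene pattern is the loosest, so it is tried last.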

genome annotate

Annotate a VCF file against the local database. Streams the file record by record, so memory stays flat regardless of file size.

# Annotate a VCF
genome annotate input.vcf.gz -o annotated.vcf.gz

# Only clinically significant variants
genome annotate input.vcf.gz --filter clinical

# Generate an HTML report
genome annotate input.vcf.gz --report report.html

# Pipe from bcftools
bcftools view sample.vcf.gz chr17 | genome annotate - --format json
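
Record-by-record streaming, as described above, can be sketched in Python with a generator that holds one VCF line at a time. This is an illustration only: `annotate_stream` and the dictionary standing in for the local database are hypothetical, and genome itself is implemented in Rust.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def annotate_stream(vcf: TextIO, db: Dict[VariantKey, str]) -> Iterator[Tuple[VariantKey, str]]:
    """Yield (variant, annotation) pairs one record at a time.

    Only the current line is held in memory, so usage does not grow with file size.
    """
    for line in vcf:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # multi-allelic sites list several ALT alleles
            key = (chrom, pos, ref, alt)
            yield key, db.get(key, "not found")


# Toy stand-ins for a VCF file and the local database (illustrative values only).
sample_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr17\t43092919\t.\tG\tA\t50\tPASS\t.\n"
    "chr1\t12345\t.\tC\tT,G\t40\tPASS\t.\n"
)
toy_db = {("chr17", 43092919, "G", "A"): "Pathogenic"}

results = list(annotate_stream(sample_vcf, toy_db))
for key, annotation in results:
    print(key, annotation)
```

For a compressed file on disk, the same function should work on the handle returned by `gzip.open(path, "rt")` from the standard library.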

genome compare

Compare variants between two individuals. Shows shared vs. unique variants, estimates genetic relatedness, and highlights clinically significant differences.

# Basic comparison
genome compare person1.vcf.gz person2.vcf.gz

# With kinship estimation
genome compare person1.vcf.gz person2.vcf.gz --kinship

# Only clinical differences
genome compare person1.vcf.gz person2.vcf.gz --clinical

# JSON output
genome compare person1.vcf.gz person2.vcf.gz --format json

Example output:

Shared variants:    3,847,291 (87.2%)
Unique to person1:    298,412 (6.8%)
Unique to person2:    264,891 (6.0%)

Kinship estimate:   0.49
Likely relationship: Parent/Child

Clinically significant differences: 12
  rs80357906     BRCA1      Pathogenic            A=0/1 B=0/0
  rs121913529    TP53       Pathogenic            A=0/0 B=0/1
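
The shared and unique counts come down to set operations on variant keys. Below is a minimal Python sketch, assuming each variant is identified by (chrom, pos, ref, alt) and that percentages are taken over the union of both sets, which is consistent with the three percentages in the example output summing to 100%. `compare_variants` is a hypothetical helper; it ignores genotypes and does not attempt kinship estimation.

```python
from typing import Dict, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def compare_variants(a: Set[VariantKey], b: Set[VariantKey]) -> Dict[str, Tuple[int, float]]:
    """Count shared and unique variants, with each count as a percentage of the union."""
    total = len(a | b)

    def with_pct(count: int) -> Tuple[int, float]:
        return count, (round(100 * count / total, 1) if total else 0.0)

    return {
        "shared": with_pct(len(a & b)),
        "unique_to_a": with_pct(len(a - b)),
        "unique_to_b": with_pct(len(b - a)),
    }


# Toy variant sets (illustrative values only).
person1 = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr17", 43092919, "G", "A")}
person2 = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "T", "C")}

summary = compare_variants(person1, person2)
print(summary)
```

Holding both sets in memory is fine for a toy example; a streaming tool would more likely walk two sorted files in parallel instead.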

genome extract

Extract variants from CRAM or BAM alignment files. Streams the file without loading it into memory.

# Extract from CRAM
genome extract sample.cram --ref GRCh38 -o variants.vcf.gz

# Specific region only
genome extract sample.cram --region chr17:43000000-44000000

# With quality filters
genome extract sample.bam --min-quality 30 --min-depth 10

genome predict

Predict variant effects using Google DeepMind's AlphaGenome API. Opt-in: requires an API key and shows a clear warning before sending any data.

# Set your API key
genome config set alphagenome-api-key <your-key>

# Predict a single variant
genome predict chr17:43092919:G:A

# Predict with specific output tracks
genome predict chr17:43092919:G:A --tracks expression,splicing

genome db

Manage the local variant database.

# Install (choose a tier)
genome db install lite       # ClinVar only (~170 MB download)
genome db install standard   # + gnomAD exomes (~5 GB download)
genome db install full       # + dbSNP, PharmGKB, UniProt (~15 GB download)

# Check what's installed
genome db status

# Update to latest
genome db update

# Remove everything
genome db remove all

genome config

genome config set format json              # Default output format
genome config set reference GRCh38         # Default reference genome
genome config set alphagenome-api-key KEY  # AlphaGenome API key
genome config list                         # Show all settings

Output Formats

Every command supports three output formats via --format.

human (default): Colored box-drawing output for the terminal, meant to be read rather than parsed.

json: Structured JSON. Pipe to jq or consume from scripts.

genome query BRCA1 --format json | jq '.[].clinvar.significance'
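
The same filter can be written in a script. The jq expression above implies an array of records, each with a nested `clinvar.significance` value; the sketch below assumes only that shape. The sample payload and its `id` field are hypothetical, since the full JSON schema is not described here.

```python
import json

# Hypothetical payload shaped after the jq expression above
# (an array of records, each with clinvar.significance). The "id" field is assumed.
raw = """
[
  {"id": "variant-1", "clinvar": {"significance": "Pathogenic"}},
  {"id": "variant-2", "clinvar": {"significance": "Benign"}},
  {"id": "variant-3"}
]
"""

records = json.loads(raw)

# Equivalent of jq '.[].clinvar.significance', tolerating records without ClinVar data.
significances = [rec.get("clinvar", {}).get("significance") for rec in records]
print(significances)

pathogenic_ids = [
    rec["id"]
    for rec in records
    if rec.get("clinvar", {}).get("significance") == "Pathogenic"
]
print(pathogenic_ids)
```

In a pipeline, the payload would come from standard input via `json.load(sys.stdin)` rather than from a string literal.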

compact: Tab-separated output, one variant per line. Easy to grep.

genome query BRCA1 --format compact | grep "Pathogenic"

Database Sources

Source    What                               License        Included In
ClinVar   Clinical variant classifications   Public domain  lite, standard, full
gnomAD    Population allele frequencies      ODbL           standard, full
dbSNP     rsID reference catalog             Public domain  full
PharmGKB  Drug-gene-variant interactions     CC BY-SA 4.0   full
UniProt   Protein-level variant annotations  CC BY 4.0      full

All databases are open and freely redistributable. genome downloads and indexes them locally during genome db install. No ongoing connection is required after installation.

Privacy

genome is built around a simple principle: your genomic data never leaves your machine.

  • All file processing is local. VCF, BAM, and CRAM files are read and processed entirely on your computer.
  • The database is local. Variant lookups query a SQLite file on disk. No API calls.
  • AlphaGenome is opt-in. Prediction requests send only genomic coordinates (chr, pos, ref, alt) to Google's API. No sample identifiers or personal data. A warning is shown before the first call.
  • No telemetry. No analytics, crash reports, or phone-home. The binary runs fully offline after database installation.
  • No accounts. No signup, no login, no tokens (except the optional AlphaGenome API key).
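
The AlphaGenome bullet above says only chr, pos, ref, and alt are sent. One way to picture that kind of data minimization is to build the outgoing payload from exactly those four fields and nothing else. The Python sketch below is a hypothetical illustration; `minimal_payload` is not genome's code and the dictionary is not the actual request format.

```python
from typing import Dict, Union


def minimal_payload(vcf_line: str) -> Dict[str, Union[str, int]]:
    """Keep only chr, pos, ref, alt from a VCF data line; drop everything else.

    Sample columns, genotypes, IDs, and INFO fields are never copied into the result.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    return {"chr": fields[0], "pos": int(fields[1]), "ref": fields[3], "alt": fields[4]}


# A data line with an ID, INFO, FORMAT, and a per-sample genotype column (toy values).
line = "chr17\t43092919\tsample-variant-id\tG\tA\t50\tPASS\tDP=35\tGT\t0/1\n"
payload = minimal_payload(line)
print(payload)
```

Building the payload by allow-listing fields, rather than deleting sensitive ones, means a new column in the input cannot leak by accident.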

Performance

Operation                      Target
genome query rs12345           < 100 ms
genome annotate throughput     7,000+ variants/sec
genome extract streaming       50,000+ reads/sec
genome compare (two WGS VCFs)  < 5 min
Memory usage (any operation)   < 512 MB
Binary size                    < 20 MB

genome uses streaming I/O throughout, so processing a 100 GB CRAM file should use about the same amount of memory as processing a 100 MB one.

Architecture

┌──────────┐    ┌──────────────┐    ┌──────────────────┐    ┌──────────────┐
│ Input    │───▶│  Streaming   │───▶│  Annotation      │───▶│  Output      │
│ Files    │    │  Parser      │    │  Engine          │    │  Formatter   │
│ .cram    │    │  (noodles)   │    │  (local SQLite)  │    │              │
│ .bam     │    │              │    │                  │    │  human       │
│ .vcf     │    └──────────────┘    └────────┬─────────┘    │  json        │
└──────────┘                                 │              │  compact     │
                                    ┌────────┴─────────┐    └──────────────┘
                                    │  Local Database  │
                                    │  ClinVar, gnomAD │
                                    │  dbSNP, PharmGKB │
                                    └──────────────────┘

Built with:

  • Rust for performance and single-binary distribution
  • noodles for pure-Rust BAM/CRAM/VCF parsing with async streaming
  • SQLite for the local variant database (zero config, fast indexed queries)
  • AlphaGenome for variant effect prediction (optional API)
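
To show what "fast indexed queries" means for SQLite, here is a minimal Python sketch using the standard `sqlite3` module. The table layout and index names are hypothetical, since the real database schema is not documented here; the inserted row reuses the values from the example output at the top of this document.

```python
import sqlite3

# Hypothetical schema; the real database layout is not documented here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE variants (
        rsid TEXT,
        chrom TEXT NOT NULL,
        pos INTEGER NOT NULL,
        ref TEXT NOT NULL,
        alt TEXT NOT NULL,
        significance TEXT
    )
    """
)
# Indexes let lookups by rsID or by position avoid scanning the whole table.
conn.execute("CREATE INDEX idx_variants_rsid ON variants (rsid)")
conn.execute("CREATE INDEX idx_variants_locus ON variants (chrom, pos, ref, alt)")

conn.execute(
    "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)",
    ("rs80357906", "chr17", 43092919, "G", "A", "Pathogenic"),
)

row = conn.execute(
    "SELECT chrom, pos, significance FROM variants WHERE rsid = ?",
    ("rs80357906",),
).fetchone()
print(row)

# EXPLAIN QUERY PLAN reports whether SQLite will use the index for this lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM variants WHERE rsid = ?", ("rs80357906",)
).fetchall()
print(plan)
```

With an index in place, the query plan should report a search using `idx_variants_rsid` instead of a full table scan, which is what keeps single-variant lookups fast as the table grows.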

Disclaimer

genome is a research and exploration tool. It is not a clinical diagnostic tool and should not be used as the sole basis for medical decisions. Always consult a qualified healthcare professional or genetic counselor for clinical interpretation of genetic variants.

License

MIT