bv
A uv-style tool manager for bioinformatics.
bv installs bioinformatics tools as containers, pins them to exact digests in a lockfile, and makes any analysis environment reproducible with a single bv sync. It works with Docker on laptops and Apptainer/Singularity on HPC clusters: the same manifest and the same lockfile drive either backend.
Quickstart
Requires git plus a container runtime (Docker or Apptainer/Singularity). No other dependencies.
Install a runtime
Pick whichever fits your machine. Docker is typical on laptops; Apptainer is typical on shared HPC nodes.
# Docker (rootless, Linux). On a GPU box you'll also want nvidia-container-toolkit.
# Apptainer (no root needed, works on most HPC clusters)
Install bv
Or with Cargo:
Five commands to a reproducible analysis
Example: homology search pipeline (two tools)
bv run mounts your current directory as /workspace inside the container.
# Download a sample protein sequence (human p53, ~400 aa)
# Add both tools at once
# Step 1: build a BLAST protein database
# Step 2: BLAST search (tabular output)
# Step 3: build an HMM profile from the BLAST hits
# Step 4: search with the HMM profile
bv run <binary> looks up the binary name in the project's binary index and routes to the right container automatically. No need to specify the tool name.
Your project directory:
homology-project/
bv.toml # declares blast + hmmer
bv.lock # pinned image digests and binary index
.bv/bin/ # generated shims (gitignored)
p53.fasta
p53_db.* # BLAST database files
blast_hits.tsv
p53.hmm
hmmer_hits.txt
Commit the project files; collaborators reproduce the exact environment:
# On another machine:
Using tools from scripts and pipelines
bv exec
bv exec runs any command with all project binaries prepended to PATH. It is the right form for scripts, Makefiles, and CI.
On Unix, bv exec replaces itself with the child process via exec(2). Signals, exit codes, and HPC schedulers see the child directly; there is no extra layer in ps.
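The behaviour described above can be sketched in a few lines of Python. This is an illustration, not bv's actual implementation: it assumes the binaries are exposed through the `.bv/bin/` shim directory, and the function names `build_exec_env` and `exec_in_project` are hypothetical.

```python
import os
from pathlib import Path


def build_exec_env(project_dir, base_env):
    """Return a copy of base_env with the project's shim dir first on PATH.

    Assumption: project binaries are exposed via <project>/.bv/bin.
    """
    shim_dir = str(Path(project_dir) / ".bv" / "bin")
    env = dict(base_env)
    old_path = env.get("PATH", "")
    env["PATH"] = shim_dir + (os.pathsep + old_path if old_path else "")
    return env


def exec_in_project(argv, project_dir="."):
    """Replace the current process with argv (exec(2) semantics on Unix).

    On success this call never returns, so the scheduler, signals, and
    exit codes all apply to the child command directly.
    """
    env = build_exec_env(project_dir, os.environ)
    os.execvpe(argv[0], argv, env)
```

Because the process image is replaced rather than forked, no wrapper process is left behind to forward signals or translate exit codes.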
Makefile:
Snakemake:
bv shell
bv shell starts an interactive subshell with all project binaries on PATH. The prompt changes to show the active project.
Exiting the subshell returns to the original environment cleanly. BV_ACTIVE is set to the project name while inside, so scripts can detect activation.
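A script can use BV_ACTIVE to check whether it is running inside a bv shell. The helper below is a minimal sketch; the function name `active_project` is hypothetical, and only the BV_ACTIVE variable itself comes from bv.

```python
import os


def active_project(env=None):
    """Return the active bv project name, or None outside a bv shell."""
    env = os.environ if env is None else env
    name = env.get("BV_ACTIVE", "")
    # An unset or empty variable both mean "not activated".
    return name or None
```

A pipeline entry point could call `active_project()` and exit early with a hint to run `bv shell` when it returns None.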
Binary routing
Every binary a tool exposes is listed in bv.lock and gets a shim in .bv/bin/. bv run <binary> and bv exec <binary> both route through this index.
Binary Tool
----------------------------
blastn blast 2.15.0
blastp blast 2.15.0
makeblastdb blast 2.15.0
tblastn blast 2.15.0
hmmbuild hmmer 3.3.2
hmmsearch hmmer 3.3.2
hmmscan hmmer 3.3.2
If two tools expose the same binary name, bv lock fails with a clear error. Resolve it in bv.toml:
[]
= "samtools" # this tool wins when multiple tools expose samtools
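To make the routing and conflict rule concrete, here is a small sketch of how a binary index could be built. It is not bv's code: the `preferred` mapping stands in for the bv.toml override, and the tool name `htslib-tools` in the usage note is made up for illustration.

```python
def build_binary_index(tools, preferred=None):
    """Map each binary name to the tool that provides it.

    tools:     {tool_id: [binary, ...]}
    preferred: {binary: tool_id}, used to resolve name collisions
    Raises ValueError when a collision has no valid preference.
    """
    preferred = preferred or {}
    owners = {}
    for tool, binaries in tools.items():
        for binary in binaries:
            owners.setdefault(binary, []).append(tool)

    index, conflicts = {}, {}
    for binary, providers in owners.items():
        if len(providers) == 1:
            index[binary] = providers[0]
        elif preferred.get(binary) in providers:
            index[binary] = preferred[binary]
        else:
            conflicts[binary] = providers

    if conflicts:
        detail = "; ".join(f"{b}: {', '.join(t)}" for b, t in sorted(conflicts.items()))
        raise ValueError(f"binary name conflict ({detail})")
    return index
```

With `{"samtools": ["samtools"], "htslib-tools": ["samtools", "bgzip"]}` the call fails unless `preferred={"samtools": "samtools"}` is supplied, which mirrors the lock-time error and its override.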
Discovery: bv search and the registry website
# Search for tools by name, description, or I/O type
# Browse the full registry with filters at:
# https://tejasprabhune.github.io/bv-registry/
Each tool in the registry carries a tier:
| Tier | Meaning |
|---|---|
| `core` | Typed I/O complete, from a recognized publisher, actively maintained |
| `community` | Typed I/O present, basic checks pass |
| `experimental` | Basic checks pass; may lack typed I/O. Hidden by default. |
Typed I/O and tool introspection
Manifests declare typed inputs and outputs from the bv-types vocabulary. This powers composition, validation, and integrations.
# Human-readable schema
# Stable JSON output (for scripting)
# MCP tool descriptor (for Claude and other AI assistants)
# JSON Schema for the tool's inputs
Example MCP output:
Backend selection: Docker and Apptainer
bv auto-detects the available runtime. Docker is preferred on laptops; Apptainer is preferred on HPC clusters where Docker is unavailable.
Pin the backend in bv.toml:
[]
= "apptainer" # docker | apptainer | auto (default)
Or use the BV_BACKEND environment variable:
GPU support works on both backends:
| Backend | GPU flag |
|---|---|
| Docker | --gpus all (nvidia-container-toolkit required) |
| Apptainer | --nv (uses host NVIDIA libraries) |
The manifest declares the GPU requirement; the runtime handles the flag automatically.
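The selection logic and the GPU flag mapping can be sketched as follows. Several details are assumptions rather than documented behaviour: that BV_BACKEND takes precedence over bv.toml, that auto-detection probes `docker` before `apptainer` and `singularity`, and the function names themselves.

```python
import os
import shutil

# GPU flags per backend, taken from the table above.
GPU_FLAGS = {"docker": ["--gpus", "all"], "apptainer": ["--nv"]}


def select_backend(configured="auto", env=None, which=shutil.which):
    """Pick a backend: env override, then configured value, then auto-detect."""
    env = os.environ if env is None else env
    choice = env.get("BV_BACKEND") or configured  # assumed precedence
    if choice in GPU_FLAGS:
        return choice
    if choice != "auto":
        raise ValueError(f"unknown backend: {choice!r}")
    if which("docker"):
        return "docker"
    if which("apptainer") or which("singularity"):
        return "apptainer"
    raise RuntimeError("no container runtime found on PATH")


def gpu_args(backend, needs_gpu):
    """Return the extra runtime flags for a tool that declares a GPU need."""
    return list(GPU_FLAGS[backend]) if needs_gpu else []
```

The `which` parameter is injectable so the detection order can be exercised without either runtime installed.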
Cache mounts
Apptainer runs containers with a read-only root filesystem, so any tool that downloads model weights or scratches to disk inside the image will fail (e.g. ColabFold writing to /cache/colabfold). bv binds writable host directories into the container for paths that the tool needs to write. The set of paths is resolved in three layers:
- Tool manifest (`cache_paths` in the registry entry): the tool author's authoritative list. ColabFold's manifest declares `cache_paths = ["/cache/colabfold"]`.
- User overrides (`[[cache]]` in `bv.toml`): point any container path at a different host directory (e.g. a shared NFS cache).
- Apptainer fallbacks: for tools whose manifest hasn't declared any cache paths yet, bv auto-binds the well-known `/cache` and `/root/.cache` so common Bioconda images don't fail outright.
Each container path defaults to a host directory under ~/.cache/bv/<tool>/. Docker skips the Apptainer fallbacks (its writable upper layer covers the same need); manifest and user entries apply on both backends.
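The three layers can be read as a small resolution function. The sketch below is one plausible reading, not bv's implementation: the override keys (`tool`, `container`, `host`) and the exact default sub-directory under `~/.cache/bv/<tool>/` are assumptions made for illustration.

```python
APPTAINER_FALLBACKS = ("/cache", "/root/.cache")


def default_host_dir(tool, container_path, cache_root="~/.cache/bv"):
    """Assumed default: mirror the container path under the tool's cache dir."""
    return f"{cache_root}/{tool}/{container_path.strip('/')}"


def resolve_cache_binds(tool, backend, manifest_paths=(), user_overrides=()):
    """Return {container_path: host_path} for one tool.

    Layer 1: paths declared in the tool manifest.
    Layer 3: Apptainer-only fallbacks, used when the manifest declares none.
    Layer 2: user overrides, applied last so they win on either backend.
    """
    container_paths = list(manifest_paths)
    if backend == "apptainer" and not container_paths:
        container_paths = list(APPTAINER_FALLBACKS)

    binds = {path: default_host_dir(tool, path) for path in container_paths}

    for override in user_overrides:
        if override["tool"] in ("*", tool):
            host = override["host"].replace("{tool}", tool)
            binds[override["container"]] = host
    return binds
```

Applying user overrides last means a shared NFS directory replaces the per-user default for the same container path instead of adding a second bind.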
User overrides in bv.toml:
# applies to every tool; {tool} is replaced with the tool id
[[cache]]
= "*"
= "/cache"
= "~/.cache/bv/{tool}"
# tool-specific: redirect colabfold weights to a shared cache
[[cache]]
= "colabfold"
= "/cache/colabfold"
= "/srv/shared/colabfold-weights"
Tool authors declare what their image needs in the registry manifest:
[]
= "colabfold"
# ...
cache_paths = ["/cache/colabfold"]
Conformance testing
bv conformance <tool> pulls the tool's image and smoke-tests every binary it exposes. For each binary in [tool.binaries], bv tries --version, -version, --help, -h, -v, version (in that order) and considers the binary alive if any of them exits 0. This catches broken images, missing shared libraries, and binaries that segfault on startup.
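The probe loop is simple enough to sketch. The version below is illustrative only: `first_passing_probe` is a hypothetical name, the timeout is an arbitrary choice, and `command_prefix` stands for whatever invocation runs the binary inside its container.

```python
import subprocess

# Probe arguments in the order listed above.
PROBE_ARGS = ["--version", "-version", "--help", "-h", "-v", "version"]


def first_passing_probe(command_prefix, probes=PROBE_ARGS, timeout=30,
                        run=subprocess.run):
    """Return the first probe arg that exits 0, or None if none does.

    command_prefix is the argv that launches the binary, e.g. a container
    runtime invocation followed by the binary name.
    """
    for arg in probes:
        try:
            result = run(
                [*command_prefix, arg],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            # A binary that cannot start or hangs simply fails this probe.
            continue
        if result.returncode == 0:
            return arg
    return None
```

A binary counts as alive when the function returns anything other than None; a segfault or a missing shared library shows up as a non-zero exit on every probe.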
Most tools need no extra config. For unusual binaries, add a [tool.smoke] block to the manifest:
[tool.smoke]
= { = "--check" } # pin a specific probe arg
= ["server-daemon"] # binaries with no safe probe arg
Conformance runs in CI on every PR to bv-registry. Today it's a smoke check only; running tools on canonical inputs and validating typed outputs is on the v2 roadmap.
Publishing a tool
# From a local directory with a Dockerfile
# From a GitHub repo (auto-clones it)
# Non-interactive (reads bv-publish.toml)
# Build and inspect the manifest without pushing
Interactive example with a new Python tool:
# Detected requirements.txt (Python)
# Generated Dockerfile.bv
#
# Tool name [my-docking-tool]:
# Version [0.1.0]:
# Description: Fast molecular docking
#
# Inputs
# Add input? [y/n]: y
# Name: ligand
# Type (? to list): pdb
# Mount path [/workspace/ligand]: /workspace/ligand.pdb
# Add another? [y/n]: n
#
# Building image as ghcr.io/bv-registry/my-docking-tool:0.1.0 ...
# PR opened: https://github.com/tejasprabhune/bv-registry/pull/143
For automated publishing on every GitHub release, add to .github/workflows/bv-publish.yml:
on:
release:
types:
jobs:
publish:
uses: tejasprabhune/bv/.github/workflows/bv-publish.yml@main
with:
tool-name: my-docking-tool
secrets:
GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
BV_REGISTRY_TOKEN: ${{ secrets.BV_REGISTRY_TOKEN }}
Auto-ingestion from Bioconda: bv-ingest
bv-ingest scrapes Bioconda recipes and auto-generates draft manifests for any tool that has a BioContainers image. Binary names are extracted from recipe test.commands and build.run_exports and written into [tool.binaries] automatically.
# Ingest 10 tools from Bioconda (dry run)
# Ingest a specific tool
# Review manifests that need typed I/O
# Promote a reviewed manifest to the main registry
The nightly GitHub Actions workflow runs automatically and opens PRs to bv-registry for newly discovered tools.
Reference data
For tools that need large reference databases:
Project files
bv.toml declares what you want:
[]
= "homology-project"
[]
= "https://github.com/tejasprabhune/bv-registry"
[]
= "auto" # optional; defaults to auto-detect
[[]]
= "blast"
= "=2.15.0"
[[]]
= "hmmer"
bv.lock pins the exact state, including the binary routing index:
= 1
[]
= "blast"
= "2.15.0"
= "ncbi/blast:2.15.0"
= "sha256:abc123..."
= "sha256:def456..."
= "2024-01-15T10:00:00Z"
= ["blastn", "blastp", "makeblastdb", "tblastn", "tblastx"]
[]
= "hmmer"
= "3.3.2"
= "quay.io/biocontainers/hmmer:3.3.2--h87f3376_2"
= "sha256:789abc..."
= ["hmmbuild", "hmmsearch", "hmmscan", "jackhmmer", "phmmer"]
[]
= "blast"
= "blast"
= "blast"
= "hmmer"
= "hmmer"
= "hmmer"
Both files belong in version control. bv run always uses the pinned digest. .bv/ (the generated shim directory) is gitignored automatically.
Reproducibility in CI
- run: bv sync --frozen # fails if bv.toml and bv.lock are inconsistent
- run: bv lock --check # fails if bv.lock would change
- run: bv exec snakemake --cores 4
Commands
| Command | Description |
|---|---|
| `bv add <tool>[@ver]` | Add tools and pull their images |
| `bv remove <tool>` | Remove a tool |
| `bv run <binary\|tool> []` | Run a binary or tool in its container |
| `bv exec <command>` | Run a command with all project binaries on PATH |
| `bv shell [--shell <sh>]` | Start an interactive subshell with binaries on PATH |
| `bv list` | Show installed tools with tier, digest, and size |
| `bv list --binaries` | Show the binary routing table |
| `bv search <query>` | Search the registry (text, type, tier filters) |
| `bv show <tool>` | Show typed I/O schema and metadata |
| `bv info <tool>` | Show lockfile-level detail |
| `bv lock [--check]` | Regenerate `bv.lock`; `--check` exits 1 if anything would change |
| `bv sync [--frozen]` | Pull all locked images and regenerate shims |
| `bv conformance <tool>` | Run the conformance test suite for a tool |
| `bv publish <source>` | Build and publish a tool to bv-registry |
| `bv data fetch <dataset>` | Download a reference dataset |
| `bv data list` | List locally cached datasets |
| `bv doctor` | Check runtimes, hardware, cache, and project state |
The registry
Tools live in tejasprabhune/bv-registry, a plain git repo of TOML manifests:
bv-registry/
tools/
blast/2.14.0.toml 2.15.0.toml
hmmer/3.3.2.toml
mmseqs2/17.0.0.toml
colabfold/1.6.0.toml
proteinmpnn/1.0.1.toml
data/
pdbaa/2024_01.toml
index.json # generated search index
Browse and filter at https://tejasprabhune.github.io/bv-registry/
A full manifest:
[]
= "blast"
= "2.15.0"
= "BLAST+ Basic Local Alignment Search Tool"
= "https://blast.ncbi.nlm.nih.gov/Blast.cgi"
= "Public Domain"
= "core"
= ["github:ncbi"]
[]
= "docker"
= "ncbi/blast:2.15.0"
[]
= 4
= 8.0
= 2.0
[[]]
= "query"
= "fasta"
= "one"
= "Query sequences in FASTA format"
[[]]
= "output"
= "blast_tab"
= "one"
= "Tabular alignment results (outfmt 6)"
[]
= "blastn"
= "-query {query} -db {db} -out {output} -num_threads {cpu_cores}"
[]
= [
"blastn", "blastp", "tblastn", "tblastx",
"makeblastdb", "blastdbcmd", "blastdb_aliastool",
]
The [tool.smoke] block is optional and only needed for unusual binaries. Most manifests omit it.
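The entrypoint template in the manifest above uses `{name}` placeholders for typed inputs, outputs, and hardware values. One way such a template could be expanded into an argument vector is sketched below; `render_entrypoint` is a hypothetical helper, and bv's real substitution rules may differ.

```python
import shlex


def render_entrypoint(command, template, values):
    """Expand {placeholders} in an args template into an argv list.

    The template is split into tokens before substitution, so a value
    containing spaces still ends up as a single argument.
    """
    args = [token.format(**values) for token in shlex.split(template)]
    return [command, *args]
```

For the BLAST manifest, passing `query`, `db`, `output`, and `cpu_cores` values yields an argv that starts with `blastn -query ...`; a missing placeholder raises `KeyError` instead of producing a half-filled command.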
Multi-script tools (subcommands)
ML repos typically bundle several entry scripts (train.py, sample.py, eval.py). Declare these as [tool.subcommands] instead of cramming them through one [tool.entrypoint]:
[]
= "genie2"
= "1.0.0"
[]
= "docker"
= "ghcr.io/aqlaboratory/genie2:1.0.0"
[]
= 4
= 16.0
[]
= true
= 16
= "12.1"
[tool.subcommands]
= ["python", "genie/train.py"]
= ["python", "genie/sample_unconditional.py"]
= ["python", "genie/sample_scaffold.py"]
Use:
Subcommand names stay namespaced under the tool id, so generic names (train, eval) don't collide across tools. They are not added to PATH and don't appear in bv list --binaries (use [tool.binaries] for that). A manifest must declare [tool.entrypoint], [tool.subcommands], or both.
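The namespacing rule amounts to a two-level lookup: tool id first, subcommand name second. The sketch below illustrates it; the function name, the data layout, and the subcommand and tool names used in the usage note are all hypothetical.

```python
def resolve_subcommand(subcommands_by_tool, tool, name, extra_args=()):
    """Return the argv for <tool> <name>, with user arguments appended.

    subcommands_by_tool: {tool_id: {subcommand_name: [argv, ...]}}
    """
    try:
        base = subcommands_by_tool[tool][name]
    except KeyError:
        raise KeyError(f"tool {tool!r} has no subcommand {name!r}") from None
    return [*base, *extra_args]
```

Two tools can both define `train` without clashing, because `("genie2", "train")` and `("other-tool", "train")` are distinct keys and neither is placed on PATH.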
The default registry is used automatically. Override with --registry <url> or BV_REGISTRY=<url> for private registries.
Workspace layout
| Crate | Role |
|---|---|
| `bv-cli` | Binary, clap CLI, command implementations |
| `bv-core` | Manifest/lockfile types, cache layout, errors |
| `bv-runtime` | ContainerRuntime trait + Docker implementation |
| `bv-runtime-apptainer` | Apptainer/Singularity implementation |
| `bv-index` | IndexBackend trait + Git registry implementation |
| `bv-types` | Bioinformatics type vocabulary (20 types) |
| `bv-conformance` | Conformance test runner for registry manifests |
Development
See CONTRIBUTING.md for contribution guidelines.
License
Apache-2.0. See LICENSE.