rivet-cli 0.7.7

Not sure if Rivet fits your problem? docs/who-is-this-for.md is a 60-second fit-check.

rivet basic workflow — init, doctor, check, run, state

30-second quickstart

brew install panchenkoai/rivet/rivet

export DATABASE_URL="postgresql://user:pass@host/db"
rivet init --source-env DATABASE_URL --table orders -o rivet.yaml
rivet run -c rivet.yaml

Output: Parquet files in ./output/. Full walkthrough: docs/getting-started.md. Want to try without your own DB? docs/pilot/demo-quickstart.md runs the whole flow against a pre-seeded 14-table fixture in ~10 min.

Why Rivet

Rivet tries to make database extraction boring:

Plan before running — rivet plan seals the extraction intent into a reviewable JSON artifact before any writes happen. Review it like a migration.
Protect the source — server-side cursor + FETCH N on PostgreSQL (longest single query: 0.19s on a 2M-row table); adaptive PK-range chunking on MySQL (longest single query: 9s, vs 137–208s for the other tools benchmarked). Neither shape holds an open transaction for minutes.
Knows you're behind a pooler — auto-detects pgBouncer / Odyssey on Postgres and ProxySQL / MaxScale on MySQL. Uses SET LOCAL inside RAII-guarded transactions so session state never leaks into the pool.
Write in resumable units — chunk checkpoints, not one giant transaction. If the job crashes or the network blips, the next rivet run --resume continues from the last committed chunk.
Record everything — run journal, file manifest, schema-drift tracker, all in .rivet_state.db. Every run is reconstructible. rivet state shows exactly what committed and what didn't.
Validate outputs — quality gates (row count, null ratio, uniqueness via xxHash3), rivet validate, rivet reconcile, rivet repair. Know before your downstream pipeline does.
Notice when the source changes — column adds/removes/retypes trigger on_schema_drift: warn|continue|fail on the next run. Shape drift in TEXT/JSON columns is tracked via byte-width sampling.

The execution contract behind each of these — what is guaranteed, what is at-least-once, what isn't covered — is in docs/semantics.md.
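To make the chunk-and-checkpoint idea concrete, here is a minimal Python sketch. It is not Rivet's implementation, which is written in Rust and chunks adaptively. The sketch assumes an integer primary key, a fixed chunk width, and a set of committed chunk ids standing in for the checkpoints Rivet records in .rivet_state.db. The names Chunk, plan_chunks, pending_chunks and chunk_sql are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: int
    lo: int  # inclusive lower PK bound
    hi: int  # exclusive upper PK bound


def plan_chunks(pk_min: int, pk_max: int, width: int) -> list[Chunk]:
    """Split the key space [pk_min, pk_max] into half-open ranges of `width` keys."""
    if width <= 0:
        raise ValueError("width must be positive")
    chunks: list[Chunk] = []
    lo = pk_min
    while lo <= pk_max:
        hi = min(lo + width, pk_max + 1)
        chunks.append(Chunk(len(chunks), lo, hi))
        lo = hi
    return chunks


def pending_chunks(chunks: list[Chunk], committed: set[int]) -> list[Chunk]:
    """On resume, skip every chunk whose checkpoint is already committed."""
    return [c for c in chunks if c.chunk_id not in committed]


def chunk_sql(table: str, pk: str, chunk: Chunk) -> str:
    # One short, bounded range scan per chunk instead of one long SELECT *.
    # Values are inlined for readability; real code should bind parameters.
    return (f"SELECT * FROM {table} WHERE {pk} >= {chunk.lo} "
            f"AND {pk} < {chunk.hi} ORDER BY {pk}")


chunks = plan_chunks(1, 2_000_000, 250_000)
todo = pending_chunks(chunks, committed={0, 1, 2})  # crashed after chunk 2
print(len(chunks), [c.chunk_id for c in todo])  # 8 [3, 4, 5, 6, 7]
print(chunk_sql("orders", "id", todo[0]))
```

A crash costs at most the chunk in flight: the rerun recomputes the same ranges and skips the committed ones. That only holds if both runs derive identical ranges, which is why the sketch computes them purely from pk_min, pk_max and width.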

Trust contracts

Question	Where to look
What happens if the process is killed mid-export?	docs/semantics.md § Crash semantics
What does Rivet not guarantee?	docs/semantics.md § Known non-guarantees
What is actually tested in PR CI vs nightly vs manual?	docs/reliability-matrix.md
Which PostgreSQL / MySQL versions are exercised?	docs/reference/compatibility.md
How are credentials handled? Where do sensitive artifacts land?	SECURITY.md
What permissions does Rivet need on S3 / GCS / Azure?	docs/cloud-permissions.md
How were the benchmark numbers produced — can I rerun them?	docs/bench/

Sensitive local artifacts. Generated files — .rivet_state.db, plan.json, *.journal.jsonl, and exported Parquet/CSV — may contain query SQL, cursor values, table metadata, and the data itself. Do not commit them. See SECURITY.md § Sensitive local artifacts for a .gitignore snippet.

Source pressure, measured

"Source-safe" is easy to claim and hard to verify, so Rivet publishes a reproducible cross-tool benchmark harness against identical fixtures (22 PG tables / 17 MySQL tables, including a 2M-row × 20-column content_items table).

The primary metric is the longest single SQL statement: the one that decides whether your DBA's statement_timeout cuts you off mid-run.

PostgreSQL — server-side cursor enables sub-second longest query

Tool	Longest single query	Peak RSS
rivet	0.19s (`FETCH 142 FROM _rive`)	443 MB
dlt	1.20s (`FETCH FORWARD 10000`) — 3.2 GB temp_bytes	1.4 GB
sling	134s (`SELECT * FROM content_items`)	6.0 GB

MySQL — no server-side cursor; chunked range scans are the fastest available shape

Tool	Longest single query	Peak RSS
rivet	9s (chunked + cursor)	280 MB
sling	137s	6.3 GB
dlt	208s	1.2 GB

The MySQL gap vs PostgreSQL is architectural: PostgreSQL exposes DECLARE … CURSOR / FETCH N, which lets Rivet issue many short FETCH statements against one server-side cursor; MySQL's protocol has no widely supported equivalent in the current client stack. See the MySQL parity roadmap for what's planned.
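On the client side, that PostgreSQL pattern is a simple loop: open a cursor inside a transaction, FETCH fixed-size batches until one comes back short, then close it. The Python sketch below simulates the server with an in-memory list so it runs without a database. FakeServerCursor, stream_batches and the cursor name rivet_cur are hypothetical; only the logged SQL (BEGIN, DECLARE name NO SCROLL CURSOR FOR, FETCH n FROM, CLOSE, COMMIT) is real PostgreSQL syntax.

```python
from typing import Iterator


class FakeServerCursor:
    """Stand-in for a PostgreSQL server-side cursor: the result set stays on the
    'server' and each FETCH returns at most n rows. It logs the SQL it stands for."""

    def __init__(self, name: str, query: str, rows: list[tuple]) -> None:
        self.name = name
        self._rows = rows
        self._pos = 0
        # Without WITH HOLD, a PostgreSQL cursor only lives inside a transaction.
        self.log = ["BEGIN", f"DECLARE {name} NO SCROLL CURSOR FOR {query}"]

    def fetch(self, n: int) -> list[tuple]:
        self.log.append(f"FETCH {n} FROM {self.name}")
        batch = self._rows[self._pos:self._pos + n]
        self._pos += len(batch)
        return batch

    def close(self) -> None:
        self.log += [f"CLOSE {self.name}", "COMMIT"]


def stream_batches(cur: FakeServerCursor, n: int) -> Iterator[list[tuple]]:
    """Client loop: pull batches of n rows until the cursor is drained."""
    try:
        while True:
            batch = cur.fetch(n)
            if not batch:
                break
            yield batch
            if len(batch) < n:  # a short batch means nothing is left
                break
    finally:
        cur.close()


rows = [(i,) for i in range(1, 11)]
cur = FakeServerCursor("rivet_cur", "SELECT id FROM orders ORDER BY id", rows)
print([len(b) for b in stream_batches(cur, 4)])  # [4, 4, 2]
print(cur.log.count("FETCH 4 FROM rivet_cur"))   # 3
```

Each FETCH is its own short statement, so what a statement_timeout sees is the time to produce one batch, not the time to read the whole table.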

Failure count across all tables: rivet 0 / 22 (PG), 0 / 17 (MySQL). At least one other tool in the suite failed on at least one table.

How Rivet wins on these axes is not magic; it's the deliberately boring extraction shape: PK-auto-resolved chunks, a server-side cursor with a work_mem-aware FETCH N cap on PG, and an Arrow-memory-budgeted row buffer on MySQL. The "one big SELECT * into a giant client-side buffer" shape that most alternatives use produces both the multi-minute single-query holds and the multi-GB RSS.
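One plausible way to size a batch like the FETCH N cap or the MySQL row buffer is to divide a per-batch memory budget by an estimated average row width and then clamp the result. Wide TEXT/JSON rows then get small batches, and narrow rows hit an upper cap. The hypothetical Python sketch below shows that rule. It is not Rivet's formula, and the 4 MiB budget and 10,000-row cap are made-up values.

```python
def rows_per_fetch(budget_bytes: int, avg_row_bytes: int,
                   min_rows: int = 1, max_rows: int = 10_000) -> int:
    """Largest batch whose estimated size fits in budget_bytes, clamped to
    [min_rows, max_rows]. All defaults here are illustrative assumptions."""
    if budget_bytes <= 0 or avg_row_bytes <= 0:
        raise ValueError("budget and row width must be positive")
    n = budget_bytes // avg_row_bytes
    return max(min_rows, min(max_rows, n))


BUDGET = 4 * 1024 * 1024                          # hypothetical 4 MiB per batch
print(rows_per_fetch(BUDGET, 100))                # narrow rows hit the cap: 10000
print(rows_per_fetch(BUDGET, 32 * 1024))          # 32 KiB JSON rows: 128
print(rows_per_fetch(BUDGET, 8 * 1024 * 1024))    # one huge row still makes progress: 1
```

The lower clamp matters: without it, a single row larger than the budget would yield a batch size of zero and the export would stall.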

The numbers above use each tool at its defaults. We also published a steelman re-run that gives each competitor its best plausible configuration. Short version: on narrow tables the gap closes; on the wide content_items fixture Rivet's edge survives largely intact.

Methodology, exact configs, raw gtime -v output, and DB-side counter deltas: docs/bench/ — one-command repro.

AI-native DB observability — `rivet-mcp`

rivet-mcp is a Model Context Protocol server binary that lets an AI agent answer "is this database healthy enough to extract from right now?" — before any rows are touched.

Exposed read-only surfaces:

PostgreSQL — pg_stat_activity (active queries, lock waits, idle-in-transaction), pg_stat_statements top I/O, checkpoint pressure (pg_stat_bgwriter), pgBouncer pool saturation and client wait time
MySQL — SHOW PROCESSLIST (running queries and duration)

Works out of the box with Claude Desktop, Claude Code, and any MCP-compatible client. Runs as a separate binary and never requires write access to the source database.

rivet-mcp --source-env DATABASE_URL

Add to your MCP client config:

{
  "mcpServers": {
    "rivet": {
      "command": "rivet-mcp",
      "args": ["--source-env", "DATABASE_URL"],
      "env": { "DATABASE_URL": "postgresql://..." }
    }
  }
}

What Rivet is (and is not)

What Rivet does	What you bring
Queries PostgreSQL 12–16 and MySQL 5.7 / 8.0	The database and credentials
Streams rows → Arrow → Parquet or CSV	A destination (local path, S3 bucket, GCS bucket, Azure container)
Retries failed batches with exponential backoff	Orchestration (cron, Airflow, dbt, etc.)
Validates row counts, null ratios, and uniqueness	Your warehouse or downstream pipeline
Checkpoints progress — resume after crashes	Schema management on the warehouse side
Protects the source DB — longest single query ~0.2s on PG / ~9s on MySQL on 2M-row tables	—
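Retrying with exponential backoff means waiting base, base·factor, base·factor², … seconds between attempts, up to a cap, and giving up after a fixed number of tries. The Python sketch below shows that shape. The parameter values, the retry_on tuple and the function names are assumptions for illustration, not Rivet's configuration. The sleep function is injectable so the example runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, factor: float, cap: float, n: int) -> list[float]:
    """Delays before retries 1..n: base, base*factor, base*factor**2, ... capped."""
    return [min(cap, base * factor ** i) for i in range(n)]


def retry(op: Callable[[], T], attempts: int = 5, base: float = 0.5,
          factor: float = 2.0, cap: float = 30.0,
          retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); on a retryable error wait and try again, re-raising after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delays = backoff_delays(base, factor, cap, attempts - 1)
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:  # anything else propagates immediately
            if attempt == attempts - 1:
                raise
            sleep(delays[attempt])
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_batch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network blip")
    return "batch written"


slept: list[float] = []
print(retry(flaky_batch, sleep=slept.append))  # batch written
print(slept)                                   # [0.5, 1.0]
```

Production retry loops usually add random jitter to each delay so that many workers retrying at once do not hit the source in lockstep. It is left out here to keep the output deterministic.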

Supported destinations: local filesystem, Amazon S3, Google Cloud Storage, Azure Blob Storage, stdout. Export modes: full, incremental (cursor-based), chunked, time_window. Formats: Parquet (zstd / snappy / gzip / lz4 / none) and CSV.
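Incremental (cursor-based) mode boils down to remembering the highest cursor value exported so far and asking only for rows above it on the next run. The sketch uses Python's built-in sqlite3 so it runs anywhere. The function name, the orders table and the updated_at column are hypothetical, and Rivet's real queries against PostgreSQL and MySQL will differ.

```python
import sqlite3
from typing import Any, Optional


def extract_incremental(conn: sqlite3.Connection, table: str, cursor_col: str,
                        last_value: Optional[Any]) -> tuple[list[tuple], Any]:
    """Return rows whose cursor_col is above last_value, plus the new high-water mark."""
    # Identifiers are interpolated (they cannot be bound), so they must come
    # from trusted config; the cursor value itself is a bound parameter.
    if last_value is None:  # first run: nothing saved yet, read everything
        cur = conn.execute(f"SELECT * FROM {table} ORDER BY {cursor_col}")
    else:
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {cursor_col} > ? ORDER BY {cursor_col}",
            (last_value,))
    cols = [d[0] for d in cur.description]
    rows = cur.fetchall()
    mark = rows[-1][cols.index(cursor_col)] if rows else last_value
    return rows, mark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2024-01-01"), (2, "2024-01-02")])

rows, mark = extract_incremental(conn, "orders", "updated_at", None)
print(len(rows), mark)   # 2 2024-01-02
conn.execute("INSERT INTO orders VALUES (3, '2024-01-03')")
rows, mark = extract_incremental(conn, "orders", "updated_at", mark)
print(rows, mark)        # [(3, '2024-01-03')] 2024-01-03
```

A strict > comparison can miss rows that share the last exported cursor value but commit after the run. Tie-breaking on the primary key, or re-reading the boundary value and de-duplicating, are the usual mitigations.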

Not for you if you need:

CDC / streaming — Rivet reads a snapshot per run; no replication slot or event log. Use Debezium or Estuary.
Connectors to SaaS sources — no Salesforce, Stripe, HubSpot, etc. Use Airbyte or Fivetran.
An integrated extract-and-load product — Rivet stops at "file in a bucket." Use dlt or Sling if you want the warehouse load included.
Loading or transformation — bring dbt, Spark, or your own loader.
A Kubernetes data platform — Rivet runs as a single binary in a Job or CronJob; a full operator is a different architecture.

Documentation language: English-only. See CONTRIBUTING.md.

Core workflow

rivet init      # scaffold config from a live DB (discovers tables, infers cursors)
rivet doctor    # verify credentials and destination auth before the run
rivet check     # validate config logic, warn about chunking and cursor choices
rivet plan      # seal execution intent — reviewable JSON artifact, no writes yet
rivet run       # execute the plan; checkpoint each chunk
rivet validate  # verify row counts and manifest against the destination
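The quality gates mentioned earlier (row count, null ratio, uniqueness) amount to a few checks over the exported rows. In this Python sketch, hashlib's 64-bit BLAKE2b stands in for xxHash3, which is not in the standard library. check_gates, key_hash and the thresholds are hypothetical names and values, not Rivet's config keys.

```python
import hashlib
from typing import Any


def key_hash(value: Any) -> bytes:
    # Stand-in for xxHash3: a 64-bit BLAKE2b digest of the value's repr().
    return hashlib.blake2b(repr(value).encode(), digest_size=8).digest()


def check_gates(rows: list[dict], expected_rows: int, column: str,
                max_null_ratio: float, unique: bool) -> list[str]:
    """Return human-readable gate failures; an empty list means every gate passed."""
    failures = []
    if len(rows) != expected_rows:
        failures.append(f"row count {len(rows)} != expected {expected_rows}")
    values = [r.get(column) for r in rows]
    ratio = (sum(v is None for v in values) / len(values)) if values else 0.0
    if ratio > max_null_ratio:
        failures.append(f"{column}: null ratio {ratio:.2f} > {max_null_ratio:.2f}")
    if unique:
        seen: set[bytes] = set()
        for v in values:
            if v is None:  # NULLs are covered by the null-ratio gate
                continue
            h = key_hash(v)
            if h in seen:
                failures.append(f"{column}: duplicate value {v!r}")
                break
            seen.add(h)
    return failures


rows = [{"id": 1, "email": "a@x"}, {"id": 2, "email": None}, {"id": 2, "email": "c@x"}]
print(check_gates(rows, 3, "id", 0.0, unique=True))      # ['id: duplicate value 2']
print(check_gates(rows, 3, "email", 0.5, unique=True))   # []
```

Hashing keeps the uniqueness set at 8 bytes per value regardless of column width. The trade-off is a very small chance that two distinct values collide and are reported as a duplicate.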

Branch commands: rivet apply (apply a saved plan), rivet reconcile (compare manifest vs destination), rivet repair (re-upload orphaned chunks), rivet state (inspect progression and checkpoints).
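At its core, comparing a manifest against the destination is a set difference over object keys, plus a size check on keys present in both. Below is a hypothetical Python sketch. It assumes the manifest and the bucket listing are both available as {object_key: size_in_bytes} maps. The object keys are made up, and this is not Rivet's manifest format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconcileReport:
    missing: list[str]        # in the manifest, absent from the destination
    unexpected: list[str]     # in the destination, absent from the manifest
    size_mismatch: list[str]  # in both, but the byte sizes differ


def reconcile(manifest: dict[str, int], destination: dict[str, int]) -> ReconcileReport:
    """Compare {object_key: size_in_bytes} maps from a run manifest and a bucket listing."""
    return ReconcileReport(
        missing=sorted(k for k in manifest if k not in destination),
        unexpected=sorted(k for k in destination if k not in manifest),
        size_mismatch=sorted(k for k in manifest
                             if k in destination and destination[k] != manifest[k]),
    )


manifest = {"orders/part-0000.parquet": 1024, "orders/part-0001.parquet": 2048}
listing = {"orders/part-0000.parquet": 1024, "orders/part-0002.parquet": 512}
report = reconcile(manifest, listing)
print(report.missing)     # ['orders/part-0001.parquet']
print(report.unexpected)  # ['orders/part-0002.parquet']
```

What rivet repair then does with the differences is covered in the reconcile + repair walkthrough.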

For a first run, rivet init + rivet run is enough. The full workflow is for production pipelines where "it ran" is not sufficient — you need a verifiable record of what was written.

Stateless deployment

By default Rivet keeps cursors, manifests, chunk checkpoints, and the run journal in a SQLite file (.rivet_state.db) next to your config — perfect for local and single-node runs. For ephemeral containers / Kubernetes pods, set RIVET_STATE_URL to a PostgreSQL connection string and Rivet creates and migrates the state schema on first connect — no manual DDL, no init job. Details: docs/reference/cli.md § State backend.

export RIVET_STATE_URL="postgresql://rivet:secret@state-db.internal/rivet_state?sslmode=require"
rivet run -c rivet.yaml
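The backend choice described above reduces to one rule: use RIVET_STATE_URL when it is set, otherwise a .rivet_state.db file in the config's directory. Here is a hypothetical Python sketch of that rule. state_backend is not a Rivet function, and treating an empty variable as unset is an assumption.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def state_backend(config_path: str, env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (kind, location): RIVET_STATE_URL if set, else SQLite next to the config."""
    url = env.get("RIVET_STATE_URL")
    if url:  # assumption: an empty variable counts as unset
        return ("postgresql", url)
    return ("sqlite", str(Path(config_path).parent / ".rivet_state.db"))


print(state_backend("pipelines/rivet.yaml", {}))
print(state_backend("rivet.yaml", {"RIVET_STATE_URL": "postgresql://rivet@state-db.internal/rivet_state"}))
```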

More walkthroughs

plan / apply · plan campaign — multi-export waves · reconcile + repair · parallel cards UI · composite cursor (COALESCE fallback) · pool detection · discovery artifact (rivet init --discover) · post-run inspect. Source scripts in docs/gifs/.

Installation

Names. The project and CLI are Rivet; the command is rivet. On crates.io the package is published as rivet-cli because the crate name rivet was already taken. Homebrew and release archives install the rivet binary.

Homebrew (macOS / Linux) — recommended

brew install panchenkoai/rivet/rivet
rivet --version

cargo install (crates.io)

Requires Rust 1.94+:

cargo install rivet-cli
rivet --version

Pre-built binaries

Download the latest release for your platform from GitHub Releases:

# macOS (Apple Silicon)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-aarch64-apple-darwin.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/

# macOS (Intel)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-x86_64-apple-darwin.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/

# Linux (arm64)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-aarch64-unknown-linux-gnu.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/

rivet --version

Docker

docker run --rm ghcr.io/panchenkoai/rivet:latest --version

docker run --rm \
  -e DATABASE_URL="postgresql://user:pass@host.docker.internal:5432/db" \
  -v "$(pwd)/examples/rivet.yaml:/config/rivet.yaml" \
  -v "$(pwd)/output:/output" \
  ghcr.io/panchenkoai/rivet:latest \
  run --config /config/rivet.yaml

Inside a container, localhost refers to the container itself, not your machine. Use host.docker.internal (Docker Desktop), or add --add-host=host.docker.internal:host-gateway on Linux. See Getting Started for details.

Build from source