<div align="center">
<img src="docs/assets/rho-icon.png" alt="rho" width="104" height="104">

# rho

**A partial solution to the lethal trifecta for agents.**

A secure, decentralized network for agentic data science: collaborate
end-to-end-encrypted over git repos (and other storage) with anyone, then let
agents run tool calls against private data, without the agent ever touching the
data, the keys, or the network it could leak through.

[crates.io](https://crates.io/crates/rho-cli) ·
[CI](https://github.com/madhavajay/rho/actions/workflows/ci.yml) ·
[License](./LICENSE)

</div>

---

## rho = Pi + Gondolin + Nostr + Git
Modern agents are dangerous in exactly the situation data science needs them:
pointed at **private data**, fed **untrusted content** (a collaborator's code, a
dataset description, a web page), and handed the **ability to communicate
outward**. Simon Willison calls that combination the
[**lethal trifecta**](https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/) —
any one of the three is fine; together they let an attacker turn "summarize this"
into "exfiltrate everything."
rho doesn't pretend prompt injection is solved. Instead it **breaks the
trifecta apart** and wires four proven pieces together so the agent never holds
all three capabilities at once:

| Piece | Role |
|---|---|
| 🤖 **[Pi](https://pi.dev)** | The agent harness. Plans, reads allowed files, writes code, and *proposes* tool calls, but is treated as untrusted and can never self-approve a protected action. |
| ⧉ **[Gondolin](https://github.com/earendil-works/gondolin)** | Local Linux micro-VM sandbox. Protected code runs here with host-side, default-deny network and read-only data mounts: JavaScript-programmable policy, not the honor system. |
| 🔑 **[Nostr](https://nostr.com)** | Decentralized identity + messaging. Every account has a permanent `id/rho/…` and a Nostr controller key; encrypted messages and signed records replicate across relays, no central server required. |
| 🌳 **[Git](https://git-scm.com)** | The collaboration substrate. Repos are the workspace, **files are the protocol**, and everything sensitive is encrypted *in the git objects* while staying readable in your working tree. |

The result: collaborators share work over ordinary git repos, the agent does the
thinking on **mock data**, and the one moment real data is touched happens in a
sandbox, behind a deterministic host check, after an explicit human grant.
## How it works
rho's central bet is a hard split between **planning** and **protected
execution**:
```mermaid
flowchart LR
subgraph U[Collaborator · untrusted]
A[Pi agent] -->|writes code +<br/>signed request| R[encrypted request<br/>in git PR]
end
R --> H
subgraph O[Owner · host-controlled]
H{deterministic<br/>validation} -->|sig ✓ · digest ✓<br/>· policy ✓| G[⧉ Gondolin<br/>micro-VM]
H -.->|reject| X[no run]
G -->|real data<br/>read-only| OUT[result]
end
OUT -->|encrypted to<br/>requester| REL[release in git]
```
- **Agents propose, they don't decide.** A collaborator's agent may write code
and emit a *signed request* — it cannot grant itself access, run against real
data, or release an output. Those are deterministic host actions plus optional
human approval.
- **Protected tools only ever run in one place.** `rho run` is the single trust
boundary: it validates the request signature and code digest, checks policy,
then executes inside Gondolin with a fixed environment, synthetic DNS, and a
default-deny network. The agent that wrote the code is nowhere near it.
- **Files are the protocol.** Identities, permissions, requests, approval grants,
run receipts, and results are all inspectable, versioned files. Sensitive ones
are encrypted with [`age`](https://github.com/FiloSottile/age)-based recipient
envelopes via git clean/smudge filters — the ciphertext lives in git, the
plaintext only in authorized working trees.
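
The deterministic host check described above can be pictured as three ordered, fail-closed tests. The sketch below is illustrative only: `RunRequest`, `validate`, and the `allowed_tools` policy shape are hypothetical names, not rho's actual file format or API, and signature verification is passed in as a callable because the real scheme (Nostr controller keys) is outside the Python standard library.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RunRequest:
    """Hypothetical request shape; rho's real request files may differ."""
    requester: str
    tool: str
    code: bytes
    code_sha256: str   # digest the requester committed to when signing
    signature: bytes


def validate(
    request: RunRequest,
    verify_signature: Callable[[RunRequest], bool],
    allowed_tools: dict[str, set[str]],
) -> tuple[bool, str]:
    """Run the checks in a fixed order and reject on the first failure."""
    if not verify_signature(request):
        return False, "signature rejected"
    # Recompute the digest from the bytes that would actually be executed.
    if hashlib.sha256(request.code).hexdigest() != request.code_sha256:
        return False, "code digest mismatch"
    # Default deny: a requester with no policy entry gets an empty tool set.
    if request.tool not in allowed_tools.get(request.requester, set()):
        return False, "policy denies this tool"
    return True, "ok"
```

Nothing in this function depends on model output, so the same request always yields the same decision, which is the property the planning/execution split relies on.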
### Twins: mock data for thinking, real data for answers
Every dataset is a **twin** — a private `real` side that never leaves the owner's
machine, and a `mock` side that's committed to the repo. Collaborators (and their
agents) develop and test against the mock; the real side is mounted read-only,
inside the sandbox, only after the owner grants the exact action and input
hashes. Mock generation preserves shape and semantics while minimizing leakage
from the source.
```text
datasets/prices/
  dataset.yaml          # name, uuid, schema (committed)
  mock/prices-mock.csv  # shareable twin (committed)
~/rho/alice/.../private/prices/real/prices-real.csv  # never committed
```
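
As a rough illustration of the twin idea, a mock can be derived from nothing more than the real file's header and column types. This is a minimal sketch, not rho's mock generator: `make_mock_csv` is a hypothetical helper, it preserves shape only (column names and a numeric/text distinction), and real semantics-preserving generation would need considerably more care.

```python
import csv
import io
import random


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def make_mock_csv(real_csv: str, rows: int = 5, seed: int = 0) -> str:
    """Build a synthetic CSV that keeps only the header and column types."""
    reader = csv.reader(io.StringIO(real_csv))
    header = next(reader)
    sample = next(reader, None)  # used only to classify columns, never copied
    numeric = [
        sample is not None and i < len(sample) and _is_number(sample[i])
        for i in range(len(header))
    ]
    rng = random.Random(seed)  # seeded so the committed mock is reproducible
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for n in range(rows):
        writer.writerow(
            [
                f"{rng.uniform(0, 100):.2f}" if is_num else f"{name}-{n}"
                for name, is_num in zip(header, numeric)
            ]
        )
    return out.getvalue()
```

Because no real value is ever written to the output, the only information that crosses over is the schema, which `dataset.yaml` already commits.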
## Quickstart
**CLI** — straight from crates.io, or from source:
```sh
# from crates.io
cargo install rho-cli --bin rho
# or from source (Rust toolchain required)
git clone git@github.com:madhavajay/rho.git
cd rho
./install.sh # cargo install --path . --bin rho
rho --version
```
**Desktop app** — download from the
[Releases](https://github.com/madhavajay/rho/releases) page *(macOS / Linux /
Windows binaries coming soon)*.
Create an identity backed by your GitHub handle and a freshly generated Nostr
controller key:
```sh
rho id init --github alice --generate-ssh-key --display-name "Alice"
```
### End-to-end: from empty repo to released result
The whole collaboration lives in one git repo, mostly in one PR. `--profile`
selects which identity acts (here **alice** owns the data, **bob** collaborates).
**Stage 1 — Owner creates the project**
One command runs `git init`, sets up governance, creates the GitHub repo, and
makes the initial push, all signed.
```sh
rho --profile alice repo create alice/genomics --public --yes
```
**Stage 2 — Collaborator joins, owner admits**
Bob opens a join PR carrying his public identity; alice admits him on the same
PR and merges.
```sh
rho --profile bob repo join alice/genomics --pr
rho --profile alice repo admit-pr 1 --pr
rho --profile alice repo merge-pr 1 --merge --delete-branch
```
**Stage 3 — Owner publishes a twin dataset**
The private `real` side stays local; the `mock` side is committed for everyone
to develop against.
```sh
rho --profile alice dataset --name prices \
--real data/private/prices-real.csv \
--mock data/mock/prices-mock.csv
rho --profile alice publish alice <uuid>
```
**Stage 4 — Collaborator requests a run**
Bob (or his agent) writes code against the mock, then submits a *signed* request
for a real-data run — opening a PR. He cannot run it himself.
```sh
rho --profile bob request submit-run req-001 alice/genomics \
--to alice --tool run_real --dataset prices \
--code workspace/sum_prices.py \
--command "python3 sum_prices.py DATASET_CSV" --tier real --pr
```
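
For orientation, the script named in the request above might look like the sketch below. It is hypothetical: the `price` column name is an assumption, and so is the idea that `DATASET_CSV` is replaced with a CSV path before the command runs. The point is that the same file runs unchanged against the mock during development and against the real side inside the sandbox.

```python
import csv
import sys


def sum_column(csv_path: str, column: str = "price") -> float:
    """Sum one numeric column of a CSV file that has a header row."""
    with open(csv_path, newline="") as handle:
        return sum(float(row[column]) for row in csv.DictReader(handle))


if __name__ == "__main__" and len(sys.argv) == 2:
    # The single argument is the dataset path supplied by the runner.
    print(sum_column(sys.argv[1]))
```

During development Bob would point it at the committed mock, for example `python3 sum_prices.py datasets/prices/mock/prices-mock.csv`.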
**Stage 5 — Owner approves and runs in the sandbox**
alice's host validates the signature and code digest, then executes inside
Gondolin against the real data — results pushed back to the same PR.
```sh
rho --profile alice run approve-pr 3 --runner gondolin --pr
```
**Stage 6 — Owner releases, collaborator verifies**
The result is encrypted to bob and released; bob verifies the full chain from
request → grant → receipt → result.
```sh
rho --profile alice result release req-001 --to bob --pr
rho --profile bob result verify req-001
```
Nothing proprietary lands in history: every stage is plain `git` plus signed,
encrypted files. A reviewer can read each governance change in the diff.
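
One way to think about the request → grant → receipt → result chain that bob verifies in Stage 6 is as a sequence of records where each one commits to the digest of its predecessor. The sketch below is an assumption about the general technique (a hash chain over canonical JSON), not rho's actual record format; `link`, `digest`, and `verify_chain` are hypothetical names, and the signatures that rho attaches to each record are left out.

```python
import hashlib
import json
from typing import Optional

CHAIN = ["request", "grant", "receipt", "result"]


def digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def link(kind: str, body: dict, prev: Optional[dict] = None) -> dict:
    """Create a record that commits to its predecessor's digest."""
    return {
        "kind": kind,
        "body": body,
        "prev": digest(prev) if prev is not None else None,
    }


def verify_chain(records: list[dict]) -> bool:
    """Accept only the exact four-step order with intact back-links."""
    if [r.get("kind") for r in records] != CHAIN:
        return False
    if records[0].get("prev") is not None:
        return False
    return all(
        cur.get("prev") == digest(prev)
        for prev, cur in zip(records, records[1:])
    )
```

Editing any earlier record changes its digest, so every later back-link stops matching and verification fails.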
## Commands

| Area | Commands |
|---|---|
| **Identity** | `rho id init · show · export · import · list · verify-github` |
| **Repo & collaboration** | `rho repo create · join · admit-pr · merge-pr · sync · doctor · protect-path · install-filters · create-pr` |
| **Data (twins)** | `rho dataset --name … · set --public · bind · list · remove` · `rho publish` |
| **Requests & runs** | `rho request submit-run · pending · review` · `rho run approve-pr · proposal-action · grant-action · controlled-action · status` |
| **Results** | `rho result release · release-pr · verify` |
| **Crypto** | `rho crypto sign · verify · view` |
| **Repo plumbing** | `rho status · commit · gh · env · version` |

Every command takes `--profile <identity>` for multi-identity work and aims to
infer the rest (root, `--from`, SSH key, `gh` account) from context. Run
`rho --help` for the grouped list, or `rho repo doctor` to validate a checkout.
## Desktop app
rho also ships as a [Tauri desktop app](desktop) (macOS / Linux / Windows) that
drives the same `rho_core` engine directly — no CLI shell-out. It collapses the
flow above to **Add** and **Create**: roles are auto-detected from repo state and
outsiders auto-join, so the cryptography and git choreography stay out of the way.
## Status
rho is early and built in the open. What works end-to-end today (covered by
scenario tests in `tests/e2e/`):
- ✅ GitHub-backed identities with Nostr controller keys, signed and exportable
- ✅ Join / admit / merge collaboration over real PRs
- ✅ Recipient-encrypted inbox paths and transparent repo-key paths via git filters
- ✅ Signed governance, approval grants, run receipts, and a verifiable result chain
- ✅ Twin datasets (real/mock), including Hugging Face public sources
- ✅ Sandboxed real-data runs through Gondolin with host-side network/FS policy
- ✅ Tamper-rejection across envelopes, signatures, and code digests
The roadmap — pluggable storage/transport/identity providers, a Nostr relay for
discovery and chat, and first-class `join`/`admit` ergonomics — lives in
[TODO.md](TODO.md). Design notes are in [docs/](docs/architecture/overview.md),
and the identity model is written up in [identity.md](identity.md).
## Development
```sh
./rho <command> # run the debug build via the dev shim, e.g. ./rho status
./test.sh # unit + scenario tests
bash tests/e2e/local-git-pi-sandbox-encrypted.sh # cached e2e
RHO_LOCAL_GIT_PI_LIVE=1 bash tests/e2e/local-git-pi-sandbox-encrypted.sh # live Pi + Gondolin
```
Agent contributors: see [AGENTS.md](AGENTS.md).
## License
[Apache-2.0](./LICENSE)