# Rho Live Tutorial Flow Notes
Repo used during this run:
```text
https://github.com/madhavajay/rho-live-clean-test-20260616-codex1
```
Users:
```text
owner: madhavajay
collaborator: madhavajay-test
```
Local project layout:
```text
~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1
~/rho/madhavajay-test/projects/rho-live-clean-test-20260616-codex1
```
## Final Tested Flow
### 1. Create Project
```sh
./rho --profile madhavajay repo create madhavajay/rho-live-clean-test-20260616-codex1 --public --yes
```
What happened:
- Created the local project under `~/rho/madhavajay/projects/...`.
- Initialized Git and Rho governance.
- Added `github/madhavajay` as owner.
- Protected the owner inbox path.
- Installed `rho-crypt` filters.
- Signed governance.
- Created the GitHub repo.
- Committed and pushed initial project state.
### 2. Collaborator Join PR
```sh
./rho --profile madhavajay-test repo join madhavajay/rho-live-clean-test-20260616-codex1 --pr
```
What happened:
- Forked the repo for `madhavajay-test`.
- Created/reused the checkout under `~/rho/madhavajay-test/projects/...`.
- Configured `upstream` for the owner repo and `origin` for the collaborator fork.
- Auto-detected the right SSH key for `madhavajay-test`.
- Created `madhavajay-test/join-rho`.
- Added the collaborator participant file.
- Opened the join PR.
### 3. Owner Admits Collaborator On Same PR
```sh
./rho --profile madhavajay repo admit-pr 1 \
--root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 \
--pr
```
What happened:
- Checked out PR `#1`.
- Verified `github/madhavajay-test`.
- Added collaborator membership and inbox permissions.
- Signed governance.
- Committed admission changes directly onto the join PR branch.
- Pushed back to the existing PR branch.
This is the preferred pattern: do not create a second admin PR for admission when the original PR branch can be updated.
### 4. Merge Join/Admission PR
```sh
./rho --profile madhavajay repo merge-pr 1 \
--root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 \
--merge \
--delete-branch
```
### 5. Publish Dataset
Create the local fixture directories; the `prices-real.csv` and `prices-mock.csv` files referenced below go in them:
```sh
mkdir -p ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1/data/private
mkdir -p ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1/data/mock
```
Generate the twin dataset:
```sh
./rho --profile madhavajay dataset \
--name prices \
--owner madhavajay \
--real ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1/data/private/prices-real.csv \
--mock ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1/data/mock/prices-mock.csv \
--share-dir ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1/users/madhavajay/datasets/share \
--private-dir ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1/users/madhavajay/datasets/private \
--uuid 0f1e2d3c-4b5a-4678-9abc-202606160001
```
Publish it so that it lands under the name-based path:
```sh
./rho --profile madhavajay publish madhavajay 0f1e2d3c-4b5a-4678-9abc-202606160001 \
--source-root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1/users \
--target-root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1/datasets
```
Commit and push:
```sh
git -C ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 add datasets
./rho --profile madhavajay commit -C ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 -m "Publish prices mock dataset"
git -C ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 push origin main
```
Final public paths:
```text
datasets/prices/dataset.yaml
datasets/prices/mock/prices-mock.csv
```
The UUID stays inside `dataset.yaml`.
### 6. Collaborator Submits Request PR
Sync first:
```sh
./rho --profile madhavajay-test repo sync madhavajay/rho-live-clean-test-20260616-codex1
```
Create code:
```text
workspace/sum_prices.py
```
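The contents of `workspace/sum_prices.py` were not recorded in these notes. As a minimal sketch only, a script compatible with the `python3 sum_prices.py DATASET_CSV` command could look like the following. The `price` column name is an assumption, and the two-decimal output format is chosen to match the style of the outputs observed later in this run.

```python
import csv
import sys
from decimal import Decimal

# Assumption: the CSV has a header row with a "price" column.
PRICE_COLUMN = "price"


def sum_prices(lines, column=PRICE_COLUMN):
    """Sum one numeric column from an iterable of CSV lines (or an open file)."""
    total = Decimal("0")
    for row in csv.DictReader(lines):
        total += Decimal(row[column])
    return total


# Only runs when exactly one CSV path is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], newline="") as handle:
        print(f"{sum_prices(handle):.2f}")
```

Using `Decimal` instead of `float` keeps currency-style sums exact to the cent.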
Submit request:
```sh
./rho --profile madhavajay-test request submit-run req-prices-total-001 \
madhavajay/rho-live-clean-test-20260616-codex1 \
--to madhavajay \
--tool run_real \
--dataset prices \
--code workspace/sum_prices.py \
--command "python3 sum_prices.py DATASET_CSV" \
--tier real \
--pr
```
What was inferred:
- `--root` from the repo slug plus `--profile`.
- `--from` from `--profile madhavajay-test`.
- Branch name as `madhavajay-test/req-prices-total-001`.
- Dataset UUID from `datasets/prices/dataset.yaml`.
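The first three inferences above follow simple naming rules. The sketch below is illustrative only: `infer_request_defaults` is a hypothetical helper, not part of Rho, and it assumes the `~/rho/<profile>/projects/<repo>` layout and the `<profile>/<request-id>` branch pattern seen in this run.

```python
from pathlib import Path


def infer_request_defaults(profile, repo_slug, request_id, home=None):
    """Derive root, sender, and branch name from the profile and repo slug."""
    home = Path.home() if home is None else Path(home)
    # The slug is "<owner>/<repo>"; only the repo part names the local checkout.
    _owner, repo_name = repo_slug.split("/", 1)
    return {
        "root": home / "rho" / profile / "projects" / repo_name,
        "from": profile,
        "branch": f"{profile}/{request_id}",
    }


defaults = infer_request_defaults(
    "madhavajay-test",
    "madhavajay/rho-live-clean-test-20260616-codex1",
    "req-prices-total-001",
)
print(defaults["branch"])  # madhavajay-test/req-prices-total-001
```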
### 7. Owner Reviews, Approves, Runs, And Pushes Back To Same PR
```sh
./rho --profile madhavajay run approve-pr 3 \
--root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 \
--private-root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1/users/madhavajay/datasets/private \
--runner local \
--run-id run-run-real-prices-total-001 \
--action-id act-run-real-prices-total-001 \
--pr
```
What happened:
- Checked out PR `#3`.
- Verified the encrypted request and request signature.
- Verified code digest.
- Ran against the mock dataset.
- Ran against the private real dataset.
- Wrote approval grant and run receipt.
- Committed only:
```text
rho/approval-grants/req-prices-total-001/
rho/run-receipts/req-prices-total-001/
```
- Pushed those artifacts back to the same request PR branch.
Observed outputs:
```text
mock: 30.00
real: 1113.03
```
### 8. Merge Request/Approval/Run PR
```sh
./rho --profile madhavajay repo merge-pr 3 \
--root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 \
--merge \
--delete-branch
```
### 9. Result Release
In this run, the result was released in a separate PR because PR `#3` had already been merged:
```sh
./rho --profile madhavajay result release req-prices-total-001 \
--root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 \
--to madhavajay-test \
--run-id run-run-real-prices-total-001 \
--runner local \
--pr
```
This opened PR `#4`, containing:
```text
rho/messages/inbox/id/github/madhavajay-test/req-prices-total-001/result.yaml
rho/messages/inbox/id/github/madhavajay-test/req-prices-total-001/result.rhosig.yaml
rho/messages/inbox/id/github/madhavajay-test/req-prices-total-001/attachments/stdout.txt
```
Important filename rule:
- The result file keeps its normal name in the working tree, e.g. `stdout.txt` or `results.csv`.
- Git stores it encrypted through Rho filters.
- Do not rename it to `*.rhoenc` in the user-facing tree.
Merge result PR:
```sh
./rho --profile madhavajay repo merge-pr 4 \
--root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 \
--merge \
--delete-branch
```
### 10. Collaborator Verifies Result
```sh
./rho --profile madhavajay-test repo sync madhavajay/rho-live-clean-test-20260616-codex1
./rho --profile madhavajay-test result verify req-prices-total-001 \
--root ~/rho/madhavajay-test/projects/rho-live-clean-test-20260616-codex1 \
--to madhavajay-test
```
Expected attachment:
```text
~/rho/madhavajay-test/projects/rho-live-clean-test-20260616-codex1/rho/messages/inbox/id/github/madhavajay-test/req-prices-total-001/attachments/stdout.txt
```
Expected contents:
```text
1113.03
```
## Improvements Identified
### Prefer Updating Existing PRs
Default behavior should use the existing PR branch whenever possible:
- `repo admit-pr` should update the join PR, not create a second admin PR.
- `run approve-pr` should update the request PR with approval grants and run receipts.
- `result release` should be able to update the same request PR before it is merged.
Preferred flow for the next run:
```text
request PR:
  collaborator request
  owner approval grant
  owner run receipt
  owner encrypted result
  merge PR when done
```
Only fall back to a new owner PR when the owner cannot push to the original PR branch.
### Result Release Should Target A Request PR
Implemented command:
```sh
rho --profile madhavajay result release-pr 3 \
--root ~/rho/madhavajay/projects/rho-live-clean-test-20260616-codex1 \
--run-id run-run-real-prices-total-001 \
--to madhavajay-test \
--pr
```
This should:
- Check out PR `#3`.
- Release the result into the collaborator inbox.
- Commit result files onto that same PR branch.
- Push back to the PR branch.
### Inline Payload Threshold
Small encrypted payloads can stay inline because they are convenient and atomic.
Large payloads should be written as attachment files that keep their normal filename and are referenced from the YAML.
Preferred rule:
```text
small payload <= inline threshold:
  keep inline in YAML
large payload > inline threshold:
  write attachment file with the normal filename
  reference it from YAML metadata
```
Example:
```text
result.yaml
attachments/results.csv
```
Not:
```text
attachments/results.csv.rhoenc
```
The working tree should show:
```text
attachments/results.csv
```
The Git object should be encrypted by Rho.
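The threshold rule and the filename rule in this section can be sketched together. This is an illustration under stated assumptions: `plan_payload` is a hypothetical helper, and the 4096-byte threshold is a placeholder because these notes do not fix a number.

```python
# Hypothetical value; the notes do not specify the inline threshold.
INLINE_THRESHOLD_BYTES = 4096


def plan_payload(filename, payload, threshold=INLINE_THRESHOLD_BYTES):
    """Decide whether a payload stays inline in YAML or becomes an attachment."""
    if len(payload) <= threshold:
        return {"mode": "inline", "name": filename}
    # The attachment keeps its normal filename; no ".rhoenc" suffix is added,
    # since encryption is expected to happen in the Git filter layer.
    return {"mode": "attachment", "path": f"attachments/{filename}"}


print(plan_payload("stdout.txt", b"1113.03\n")["mode"])  # inline
```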
### Preserve YAML Envelope Metadata For Actions
For structured action files, the outer YAML metadata is valuable and should remain inspectable:
```text
request.yaml
result.yaml
approval grant yaml
run receipt yaml
signature yaml
```
These can remain recipient-envelope YAML documents when protected. The metadata helps reviewers understand:
- what the object is
- who it is for
- what repo/path/purpose it belongs to
- which recipients can open it
The “avoid base64 in YAML” concern mainly applies to large data/artifact payloads, not small structured action envelopes.
### Dataset Paths Should Be Name-Based
Published public datasets should use:
```text
datasets/prices/
```
not:
```text
datasets/0f1e2d3c-4b5a-4678-9abc-202606160001/
```
The UUID should remain inside the manifest:
```yaml
dataset:
  uuid: "0f1e2d3c-4b5a-4678-9abc-202606160001"
  name: "prices"
```
Commands should accept either `prices` or the UUID, and report an error when the identifier is ambiguous.
Implemented first pass:
```sh
rho dataset --name prices --real data/private/prices-real.csv --mock data/mock/prices-mock.csv
rho dataset publish prices --pr
```
Local twin bundles now default to:
```text
users/<owner>/datasets/share/<dataset-name>/
users/<owner>/datasets/private/<dataset-name>/
```
The UUID stays in `dataset.yaml`; paths use the dataset name.
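Accepting either a name or a UUID, with an error on ambiguity, could work roughly as follows. This is a sketch, not Rho's implementation: `resolve_dataset` and `DatasetLookupError` are hypothetical names, and each manifest is assumed to be a dict holding the `name` and `uuid` fields from `dataset.yaml`.

```python
class DatasetLookupError(Exception):
    """Raised when an identifier matches no dataset or more than one."""


def resolve_dataset(identifier, manifests):
    """Return the single manifest whose name or UUID equals the identifier."""
    matches = [m for m in manifests if identifier in (m["name"], m["uuid"])]
    if not matches:
        raise DatasetLookupError(f"no dataset matches {identifier!r}")
    if len(matches) > 1:
        uuids = ", ".join(m["uuid"] for m in matches)
        raise DatasetLookupError(f"{identifier!r} is ambiguous: {uuids}")
    return matches[0]


manifests = [{"name": "prices", "uuid": "0f1e2d3c-4b5a-4678-9abc-202606160001"}]
print(resolve_dataset("prices", manifests)["uuid"])
# 0f1e2d3c-4b5a-4678-9abc-202606160001
```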
### External Public Dataset Variants
External data should be represented as a normal dataset variant, not as a
separate command family or a Git submodule by default.
Implemented first pass:
```sh
rho dataset set mydataset \
--public repo:huggingface:madhavajay/1kgp-bv-all \
--owner madhavajay \
--pr
```
This writes:
```text
datasets/mydataset/dataset.yaml
```
with:
```yaml
variants:
  public:
    tier: "public"
    source:
      kind: "huggingface"
      repo: "madhavajay/1kgp-bv-all"
      url: "https://huggingface.co/datasets/madhavajay/1kgp-bv-all"
      revision: "main"
    materialization:
      mode: "on_demand"
      path: ".rho/external/datasets/mydataset/public"
```
Public-only and mock-only dataset manifests are valid. This allows requests to
target a public/mock interface before a real private twin exists. The owner can
later attach or maintain a private `real` side that matches the same logical
interface.
Still needed:
- `rho dataset fetch <name> --variant public` to materialize external sources.
- Schema/interface checks to prove the private `real` side matches the public or
mock side.
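As one possible starting point for the interface check, the sketch below compares only the header rows of two CSV files. It is an assumption about what "matching" could mean for CSV datasets; a real check would likely also cover column types and other formats. Both function names are hypothetical.

```python
import csv


def csv_header(lines):
    """Return the header row from an iterable of CSV lines (or an open file)."""
    return next(csv.reader(lines), [])


def interfaces_match(mock_lines, real_lines):
    """True when both sides expose the same columns in the same order."""
    return csv_header(mock_lines) == csv_header(real_lines)


mock = ["date,price", "2026-01-01,10.00"]
real = ["date,price", "2026-01-01,371.01"]
print(interfaces_match(mock, real))  # True
```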
### Profile-Global Private Bindings
Private real dataset locations should be reusable across projects and kept out
of Git. They live under the profile home:
```text
~/rho/<profile>/datasets/<dataset-name>/bindings.yaml
```
Add or update a private real binding:
```sh
rho --profile madhavajay dataset bind 1kgp-bv-all \
--root ~/rho/madhavajay/projects/<repo> \
--real /Users/madhavajay/data/private/1kgp-bv-all
```
The binding stores the repo dataset UUID when `--root` can resolve it. At run
time, Rho resolves real data by UUID first, then by dataset name. This keeps a
single real binding reusable across projects while avoiding accidental matches
between unrelated datasets with the same name.
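The UUID-first lookup order can be sketched as below. The layout of `bindings.yaml` is not shown in these notes, so the list-of-dicts shape with `name`, `uuid`, and `real` keys is an assumption, and `resolve_binding` is a hypothetical helper. The notes also do not say how a name match with a conflicting UUID is handled, so this sketch does not model that case.

```python
def resolve_binding(bindings, dataset_name, dataset_uuid=None):
    """Find a private real binding: match by UUID first, then by name."""
    if dataset_uuid is not None:
        for binding in bindings:
            if binding.get("uuid") == dataset_uuid:
                return binding
    # Fall back to the dataset name when no UUID match exists.
    for binding in bindings:
        if binding.get("name") == dataset_name:
            return binding
    return None


bindings = [
    {"name": "1kgp-bv-all", "uuid": None, "real": "/data/a"},
    {"name": "renamed", "uuid": "u-123", "real": "/data/b"},
]
print(resolve_binding(bindings, "1kgp-bv-all", "u-123")["real"])  # /data/b
```

In the usage line the UUID match wins even though another binding has the requested name, which is the ordering the paragraph above describes.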
Inspect repo datasets and local bindings:
```sh
rho --profile madhavajay dataset list \
--root ~/rho/madhavajay/projects/<repo>
```
Remove only the local private binding:
```sh
rho --profile madhavajay dataset remove 1kgp-bv-all --binding --yes
```
Remove only the repo dataset manifest folder:
```sh
rho --profile madhavajay dataset remove 1kgp-bv-all \
--root ~/rho/madhavajay/projects/<repo> \
--repo \
--yes
```
Owner approval can now omit `--private-root` when a matching binding exists:
```sh
rho --profile madhavajay run approve-pr 3 \
--root ~/rho/madhavajay/projects/<repo> \
--runner local \
--pr
```
### Profile Defaults
Commands should keep reducing repeated flags:
- infer `--from` from `--profile`
- infer project root from repo slug and profile
- accept `madhavajay` as shorthand for `github/madhavajay`
- use `~/rho/<profile>/projects/<repo>` by default
- auto-switch `gh` only when the active account differs from the profile's GitHub account
- auto-detect the matching SSH key for each GitHub profile
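The identity shorthand in the list above amounts to a small normalization step. A minimal sketch, where `normalize_identity` is a hypothetical helper and `github` as the default provider is assumed from the `github/madhavajay` form used in these notes:

```python
def normalize_identity(value, default_provider="github"):
    """Expand a bare username into the "<provider>/<user>" form."""
    return value if "/" in value else f"{default_provider}/{value}"


print(normalize_identity("madhavajay"))         # github/madhavajay
print(normalize_identity("github/madhavajay"))  # github/madhavajay
```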
### Local-Only Paths
These should remain untracked:
```text
.rho/
data/
users/
```
Committed/public artifacts should live under:
```text
rho/
datasets/
workspace/
```
Private real datasets stay under the owner's local `users/.../datasets/private` tree.
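The split between local-only and committed paths can be expressed as a check on the top-level directory. This is a sketch for illustration: `classify_path` is a hypothetical helper, and the two sets simply restate the lists above.

```python
from pathlib import PurePosixPath

LOCAL_ONLY = {".rho", "data", "users"}
COMMITTED = {"rho", "datasets", "workspace"}


def classify_path(repo_relative_path):
    """Classify a repo-relative path by its top-level directory."""
    parts = PurePosixPath(repo_relative_path).parts
    top = parts[0] if parts else ""
    if top in LOCAL_ONLY:
        return "local-only"
    if top in COMMITTED:
        return "committed"
    return "unknown"


print(classify_path("datasets/prices/dataset.yaml"))  # committed
print(classify_path("data/private/prices-real.csv"))  # local-only
```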