burn_p2p 🔥🤝
burn_p2p turns a burn learner into a decentralized training network.
core shape:
- trainers lease shard slices, sync the latest visible canonical head, run one train window, then publish candidate updates
- reducers are non-authoritative proposal builders; they can reduce early, but validators independently rescreen the cohort and can locally re-reduce if reducer output is missing or wrong
- only validator quorum can attest a reduction, issue merge/quorum certificates, and advance the canonical head
- quarantined or revoked peers are filtered out of future trainer, reducer, and validator pools
- cheap bootstrap/coherence seeds handle ingress, discovery, relay fallback, and browser-edge http state
- the same network can include native peers, browser peers, viewers, reducers, validators, and trainer pools on different hardware classes
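a rough reading of one window's lifecycle, sketched below as a plain rust enum. this is a reading aid only, not the crate's actual types:

```rust
// illustrative only: a reading aid for the flow above, not burn_p2p's API
enum WindowStage {
    // trainers lease shard slices, sync the visible canonical head,
    // run one train window, and publish candidate updates
    CandidatesPublished,
    // reducers (optional) assemble a non-authoritative merge proposal
    ProposalBuilt,
    // validators rescreen the cohort, attest the reduction, and issue
    // merge/quorum certificates
    QuorumAttested,
    // only after attestation and merge promotion does the canonical head advance
    HeadAdvanced,
}
```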
install
```toml
[dependencies]
burn_p2p = { version = "=0.21.2", features = ["burn"] }
```
happy path
```rust
// import paths and argument values in this snippet are reconstructed
// placeholders; swap in your own learner, loaders, network config,
// storage path, and bootstrap address
use burn_p2p::Learner;
use burn_p2p::from_loaders;

let mut trainer = from_loaders(learner, train_loader, valid_loader)
    .trainer()?
    .with_network(network_config)?
    .with_storage(storage_dir)
    .with_bootstrap_peer(bootstrap_addr)
    .spawn()?;

let experiment = trainer.experiment();
let mut session = trainer.continuous_trainer()?;
let outcome = session.train_next_window()?;
println!("{outcome:?}");
```
keep your existing burn model, optimizer, scheduler, and loaders.
use train_window_once(...) instead when you want a single strictly
orchestrated training window with no retained session state.
for revisions using TrainingProtocol::DiLoCo, use train_protocol_once(...)
or diloco_round_once(...). train_window_once(...) intentionally stays an
artifact-window primitive so artifact publication and DiLoCo checkpoint rounds
do not share an ambiguous return contract.
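a hedged sketch of how those primitives differ in practice. only the method names come from the text above; the receiver and return bindings are assumptions:

```rust
// hedged sketch: method names from the guidance above; the receiver and
// return shapes are assumptions
// a single strictly orchestrated artifact window, no retained session state
let window_outcome = trainer.train_window_once()?;

// a TrainingProtocol::DiLoCo revision runs checkpoint rounds instead of
// artifact windows, via train_protocol_once(...) or the shorthand below
let round_outcome = trainer.diloco_round_once()?;
```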
burn_p2p handles:
- head sync
- lease-scoped shard assignment
- window-by-window training publication
- checkpoint/artifact movement
- reducer proposal flow
- validator attestation and promotion
- peer discovery, relay fallback, and control-plane sync
safety boundary
the important trust split is:
- trainer updates are candidate inputs, not canonical state
- reducer output is a proposal, not canonical state
- validator quorum is the authority boundary for canonical promotion
that means:
- a bad trainer can be rejected, downweighted, quarantined, or left out of the merge
- a bad reducer can waste time or bandwidth, but validators still recompute the expected accepted cohort locally before promotion
- if a dedicated reducer is silent or serves a mismatched aggregate, validators fall back to local reduction instead of letting the reducer stall the window
- a canonical head only moves after validator attestation and merge promotion
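a minimal sketch of that promotion rule, with illustrative names (the quorum knob itself is discussed in the operating guidance below):

```rust
// illustrative only, not the crate's API: a reducer proposal never
// advances the head by itself; only validator attestations reaching
// quorum do
fn head_can_advance(validator_attestations: usize, validator_quorum: usize) -> bool {
    validator_attestations >= validator_quorum
}
```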
the main operating guidance for adversarial or semi-trusted deployments is:
- keep reducers and validators as separate roles
- run more than one validator; `validator_quorum = 1` is a lab mode, not a safe default
- use admission/auth for untrusted membership
- treat reducers as optional acceleration, not as authorities
this is a decentralized training protocol, not a full bft ledger. if validator quorum is compromised, canonical safety is compromised too.
most deployments should separate:
- cheap bootstrap/coherence seeds
- reducer nodes
- validator / authority nodes
- trainer pools
the reference deploy shape follows that split directly:
- bootstrap seeds are public ingress/discovery nodes
- validators and reducers stay private by default
- browser edge is an explicit opt-in profile, not the baseline split-fleet role
for trainer nodes, from_loaders(...) is still the main public entrypoint. use
from_learner(...) for reducer, validator, viewer, and helper-style runtime
roles.
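a hedged sketch of the two entrypoints side by side; apart from from_loaders(...), trainer(...), and spawn(...) seen in the happy path above, the builder and role-selection calls here are placeholders:

```rust
// trainer node: loaders + learner (argument values are placeholders)
let trainer_node = from_loaders(learner, train_loader, valid_loader)
    .trainer()?
    .spawn()?;

// reducer / validator / viewer / helper-style runtime roles start from the
// learner only; the role-selection call below is a placeholder name
let validator_node = from_learner(learner)
    .validator()? // placeholder; not the crate's confirmed role API
    .spawn()?;
```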
data
one lease is one micro-epoch. that is the unit that drives publish cadence and canonical reconcile.
use with_sharded_dataset(...) when data already lives as prepared shard
files.
use LeaseDataPipeline<Device, Batch> when batches should be rebuilt from
indices, samplers, seeds, recipes, or custom lease metadata.
pipeline kinds stay simple:
- ShardedStatic: shard files
- IndexedDataset: dataset + sampler scope
- GeneratedDataset: deterministic generation
- Custom: anything else
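a reading-aid enum for those kinds; the variant names mirror the list above, but the payloads are assumptions, not the crate's definitions:

```rust
// reading aid only: payload fields are illustrative assumptions
enum DataPipelineKind {
    // prepared shard files already on disk
    ShardedStatic { shard_files: Vec<std::path::PathBuf> },
    // dataset + sampler scope, batches rebuilt from indices
    IndexedDataset { sampler_scope: String },
    // deterministic generation from seeds/recipes
    GeneratedDataset { seed: u64, recipe: String },
    // anything else, driven by custom lease metadata
    Custom,
}
```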
burn uses .with_data_pipeline(...). python/torch uses
PythonTorchProject::new_with_data_pipeline(...).
both adapters expose the same inspection surface:
- data_pipeline_descriptor()
- data_pipeline_kind()
- local_upstream_root()
local_upstream_root() only returns Some(...) for local shard-backed
pipelines.
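a hedged usage sketch of that inspection surface; `project` stands in for either adapter, and the printing is illustrative — only the three method names come from the list above:

```rust
// hedged sketch: only the three method names are from the docs above
let descriptor = project.data_pipeline_descriptor();
let kind = project.data_pipeline_kind();
println!("pipeline: {descriptor:?} ({kind:?})");

// only local shard-backed pipelines report an upstream root
if let Some(root) = project.local_upstream_root() {
    println!("prepared shards under {root:?}");
}
```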
native peers exchange control-plane state, heads, checkpoints, and artifacts over the peer network.
browser peers fetch only the active lease-scoped shard data through the browser
edge. today that path is peer-backed (p2p-artifact-via-edge): native peers
sync the prepared shard bundle over the overlay, and the edge serves only the
leased slice to the browser.
workspace map
crates:
- burn_p2p: public runtime facade, trainer/reducer/validator orchestration, validation, and promotion flow
- burn_p2p_app: reference dioxus app and browser-edge product surface
- burn_p2p_auth_external: trusted-ingress and provider-mapped external auth connectors
- burn_p2p_auth_github: github-focused auth connector wrapper with github api defaults
- burn_p2p_auth_oauth: generic oauth connector wrapper
- burn_p2p_auth_oidc: oidc connector wrapper
- burn_p2p_bootstrap: coherence seed, browser-edge http surface, auth issuance, and deployment-facing daemon
- burn_p2p_browser: browser runtime bridge and wasm transport glue
- burn_p2p_checkpoint: checkpoint encoding, replay, and import/export surfaces
- burn_p2p_core: shared ids, schema, manifests, signatures, and protocol wire types
- burn_p2p_dataloader: prepared-shard and lease-data-loader support
- burn_p2p_engine: execution/runtime engine glue used by higher-level roles
- burn_p2p_experiment: experiment/revision/head metadata and experiment-local helpers
- burn_p2p_limits: capability probing, budgets, and placement heuristics
- burn_p2p_metrics: runtime metrics and reporting surface
- burn_p2p_publish: artifact export, publication targets, and download tickets
- burn_p2p_python: subprocess-backed python/torch workload adapter
- burn_p2p_security: auth connectors, admission policy, certificate issuance, and peer auth verification
- burn_p2p_social: social/profile/feed-facing data surface
- burn_p2p_swarm: native libp2p transport, discovery, relay/rendezvous, and control-plane sync
- burn_p2p_testkit: deterministic harnesses, multiprocess soak tools, and integration support
- burn_p2p_views: query/view models used by app/bootstrap/social surfaces
- burn_p2p_workload: backend-neutral workload and lease-data-pipeline seam
top-level support:
- examples/mnist_p2p_demo: mixed-fleet burn demo used by `cargo xtask e2e mnist`
- examples/torch_mnist_p2p_demo: python/torch subprocess-backed mnist demo on the same runtime
- deploy/: reference configs, compose stacks, and cloud deployment assets
- docs/: protocol, operator, deployment, and roadmap docs
- xtask/: repo automation for ci, demos, deploy flows, and release/publish checks
same experiment layout works across native and browser peers. browser-facing runtime and ui live in the companion crates above.
see it working
single-machine mixed-fleet mnist sanity run: `cargo xtask e2e mnist` (the mixed-fleet burn demo in examples/mnist_p2p_demo)
python/torch adapter sanity run: see examples/torch_mnist_p2p_demo, the subprocess-backed variant on the same runtime
best follow-up docs:
- docs/examples/mnist.md
- docs/downstream-burn-guide.md
- docs/learning-dynamics.md
- docs/protocol-shape.md
- docs/formal-verification-plan.md
- docs/production-roadmap.md
- deploy/README.md
- docs/operator-runbook.md
- docs/features.md
non-burn runtime? implement P2pWorkload directly.