Module setup

Zero-config setup protocol for master and worker nodes.

Runs before the normal inference lifecycle:

1. Workers advertise themselves via mDNS, and the master discovers them.
2. The master computes layer assignments from each node’s estimated compute and GPU VRAM.
3. The master connects to each worker, authenticates, assigns layers, and pushes model data if the worker doesn’t have it cached.
4. Workers load their assigned layers and signal readiness.
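
The ordering of these steps, seen from a single worker, can be sketched as a small state machine. This is only an illustration: the type and function names (WorkerPhase, SetupEvent, next_phase, run_setup) are hypothetical and are not part of this module's API, and the real protocol may track more state than shown here.

```rust
/// Hypothetical phases a worker passes through during setup (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerPhase {
    /// Advertising via mDNS, waiting for the master to connect.
    Advertising,
    /// The master has connected and authenticated.
    Authenticated,
    /// Layers are assigned; model data may still need to arrive.
    AwaitingModel,
    /// Model data is available (cached or pushed); layers are loading.
    Loading,
    /// Assigned layers are loaded and readiness has been signalled.
    Ready,
}

/// Hypothetical events that move a worker from one phase to the next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SetupEvent {
    MasterAuthenticated,
    LayersAssigned,
    /// Either the cache already held the data or the master pushed it.
    ModelDataAvailable,
    LayersLoaded,
}

/// Returns the next phase, or None if the event is not valid in this phase.
pub fn next_phase(phase: WorkerPhase, event: SetupEvent) -> Option<WorkerPhase> {
    use SetupEvent::*;
    use WorkerPhase::*;
    match (phase, event) {
        (Advertising, MasterAuthenticated) => Some(Authenticated),
        (Authenticated, LayersAssigned) => Some(AwaitingModel),
        (AwaitingModel, ModelDataAvailable) => Some(Loading),
        (Loading, LayersLoaded) => Some(Ready),
        _ => None,
    }
}

/// Applies a sequence of events starting from Advertising.
/// Stops and returns None at the first out-of-order event.
pub fn run_setup(events: &[SetupEvent]) -> Option<WorkerPhase> {
    events
        .iter()
        .try_fold(WorkerPhase::Advertising, |phase, &event| next_phase(phase, event))
}
```

Rejecting out-of-order events, rather than ignoring them, makes it easy to see that a worker cannot report readiness before it has been authenticated and given an assignment.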

After setup, both sides proceed with normal Context::from_args() / inference.

Functions

compute_layer_assignments: Compute layer assignments proportional to each worker’s estimated TFLOPS, accounting for the master’s own compute so it retains its fair share of layers. When layer_size_bytes > 0, each worker’s assignment is capped by its per-GPU VRAM to prevent out-of-memory errors on multi-GPU nodes. The master’s local layers are also capped by master_max_layers to avoid OOM.
master_setup: Run the full zero-config master setup.
worker_setup
worker_setup_with_progress
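
The proportional-with-caps behaviour described for compute_layer_assignments can be illustrated with a minimal sketch. It is not the crate's implementation or signature: NodeInfo and assign_layers are hypothetical names, the master is modelled as just another node, and each node's cap (max_layers) is assumed to have been derived beforehand, for example from per-GPU VRAM divided by layer_size_bytes for a worker or from master_max_layers for the master.

```rust
/// Hypothetical description of one node (master or worker); not the crate's type.
#[derive(Debug, Clone)]
pub struct NodeInfo {
    /// Estimated compute throughput of the node.
    pub tflops: f64,
    /// Most layers this node can hold without running out of memory.
    pub max_layers: usize,
}

/// Splits `total_layers` across `nodes` in proportion to TFLOPS, never
/// exceeding a node's `max_layers`. Layers that a capped node cannot take
/// are redistributed among nodes that still have room.
/// Returns None if the combined capacity is too small.
pub fn assign_layers(nodes: &[NodeInfo], total_layers: usize) -> Option<Vec<usize>> {
    let capacity: usize = nodes.iter().map(|n| n.max_layers).sum();
    if capacity < total_layers {
        return None;
    }
    let mut assigned = vec![0usize; nodes.len()];
    let mut remaining = total_layers;
    while remaining > 0 {
        // Nodes that can still accept at least one more layer.
        let open: Vec<usize> = (0..nodes.len())
            .filter(|&i| assigned[i] < nodes[i].max_layers)
            .collect();
        let total_tflops: f64 = open.iter().map(|&i| nodes[i].tflops).sum();
        let mut given = 0usize;
        for &i in &open {
            let share = if total_tflops > 0.0 {
                nodes[i].tflops / total_tflops
            } else {
                1.0 / open.len() as f64
            };
            let want = (share * remaining as f64).floor() as usize;
            let room = nodes[i].max_layers - assigned[i];
            // Never hand out more than is left in this round.
            let take = want.min(room).min(remaining - given);
            assigned[i] += take;
            given += take;
        }
        if given == 0 {
            // Rounding left nothing to distribute: give one layer to the
            // fastest node that still has room.
            let best = open
                .iter()
                .copied()
                .max_by(|&a, &b| nodes[a].tflops.total_cmp(&nodes[b].tflops))?;
            assigned[best] += 1;
            given = 1;
        }
        remaining -= given;
    }
    Some(assigned)
}
```

For example, with a master at 10 TFLOPS and a worker at 30 TFLOPS, both able to hold 100 layers, 32 layers split 8 and 24. If the worker's cap drops to 10, the worker takes 10 and the master absorbs the remaining 22.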