Expand description
Layer assignment solvers for pipeline-parallel inference.
Determines how to partition a model’s decoder layers across multiple nodes. Two strategies:
- Proportional: layers proportional to available RAM (good default)
- Bandwidth-aware: minimize bottleneck link cost
With 2-4 nodes (typical Apple Silicon home cluster), exhaustive search over contiguous splits is feasible — no MILP solver needed.
Functions§
- assign_
layers_ bandwidth_ aware - Divide layers to minimize bottleneck latency, accounting for per-node bandwidth.
- assign_
layers_ proportional - Divide
num_layersacross nodes proportionally toavailable_ram.