Module layer_assignment

Layer assignment solvers for pipeline-parallel inference.

Determines how to partition a model’s decoder layers across multiple nodes. Two strategies:

  • Proportional: layers proportional to available RAM (good default)
  • Bandwidth-aware: minimize bottleneck link cost
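As a rough illustration of the proportional strategy, the sketch below splits a layer count by RAM share and hands leftover layers to the nodes with the largest remainders. The function name `proportional_split`, its signature, and the largest-remainder rounding are assumptions made for this example; they are not taken from `assign_layers_proportional`, whose actual parameters are not shown on this page.

```rust
/// Hypothetical helper (not the crate's API): split `num_layers` across nodes
/// in proportion to `available_ram`, given in any consistent unit.
/// Rounding uses the largest-remainder method, which is an assumption here.
fn proportional_split(num_layers: usize, available_ram: &[u64]) -> Vec<usize> {
    let total: u128 = available_ram.iter().map(|&r| r as u128).sum();
    if total == 0 {
        // No RAM information (or no nodes): assign nothing rather than guess.
        return vec![0; available_ram.len()];
    }

    // Floor of each node's ideal share, plus the remainder used for tie-breaking.
    let mut counts = Vec::with_capacity(available_ram.len());
    let mut remainders = Vec::with_capacity(available_ram.len());
    for (i, &ram) in available_ram.iter().enumerate() {
        let scaled = num_layers as u128 * ram as u128;
        counts.push((scaled / total) as usize);
        remainders.push((scaled % total, i));
    }

    // Hand out leftover layers, largest remainder first (lower index wins ties).
    let assigned: usize = counts.iter().sum();
    remainders.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(&b.1)));
    for &(_, i) in remainders.iter().take(num_layers - assigned) {
        counts[i] += 1;
    }
    counts
}
```

Each entry is the number of consecutive layers given to the node at that index, so the counts always sum to `num_layers` when any RAM is reported.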

With 2-4 nodes (typical for an Apple Silicon home cluster), exhaustive search over contiguous splits is feasible, so no MILP solver is needed.
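To see why exhaustive search is cheap at this scale: splitting L layers into n non-empty contiguous stages gives C(L-1, n-1) candidates, which is 4,495 for 32 layers on 4 nodes. The sketch below enumerates those splits and keeps the cheapest one. All names (`best_contiguous_split`, `search`, `bottleneck`) and the cost model are hypothetical; the module's real cost function is not described on this page.

```rust
/// Hypothetical sketch: try every contiguous split of `num_layers` into
/// `num_nodes` non-empty stages and keep the one with the lowest cost.
/// `cost` is a caller-supplied bottleneck model (an assumption here).
fn best_contiguous_split<F>(num_layers: usize, num_nodes: usize, cost: F) -> Option<Vec<usize>>
where
    F: Fn(&[usize]) -> f64,
{
    if num_nodes == 0 || num_layers < num_nodes {
        return None;
    }
    let mut best: Option<(f64, Vec<usize>)> = None;
    let mut current = Vec::with_capacity(num_nodes);
    search(num_layers, num_nodes, &mut current, &cost, &mut best);
    best.map(|(_, split)| split)
}

fn search<F>(
    remaining: usize,
    nodes_left: usize,
    current: &mut Vec<usize>,
    cost: &F,
    best: &mut Option<(f64, Vec<usize>)>,
) where
    F: Fn(&[usize]) -> f64,
{
    if nodes_left == 1 {
        // The last node takes whatever is left; score the complete split.
        current.push(remaining);
        let c = cost(current.as_slice());
        if best.as_ref().map_or(true, |(b, _)| c < *b) {
            *best = Some((c, current.clone()));
        }
        current.pop();
        return;
    }
    // Leave at least one layer for each node still to be filled.
    for take in 1..=(remaining - (nodes_left - 1)) {
        current.push(take);
        search(remaining - take, nodes_left - 1, current, cost, best);
        current.pop();
    }
}

/// Illustrative cost only: the slowest stage, where a stage's time is its
/// layer count divided by that node's bandwidth (hypothetical model).
fn bottleneck(split: &[usize], bandwidth: &[f64]) -> f64 {
    split
        .iter()
        .zip(bandwidth)
        .map(|(&layers, &bw)| layers as f64 / bw)
        .fold(0.0, f64::max)
}
```

A call such as `best_contiguous_split(6, 2, |s| bottleneck(s, &[2.0, 1.0]))` would weigh the five possible two-node splits of six layers. Passing the cost as a closure keeps the enumeration independent of whichever latency model is actually used.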

Functions

assign_layers_bandwidth_aware
Divide layers to minimize bottleneck latency, accounting for per-node bandwidth.
assign_layers_proportional
Divide num_layers across nodes proportionally to available_ram.