Module tp_executor


Tensor-Parallel model executor.

Wraps TpDecodeGroup to run decode across multiple GPUs, synchronizing them with an NCCL all-reduce. Prefill runs through candle on GPU 0 only; decode runs on sharded runners across all GPUs.
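
To make the role of the all-reduce concrete, here is a minimal CPU-only sketch of a sharded matrix-vector product. It is an illustration under assumptions, not this module's implementation: the function names (all_reduce_sum, partial_matvec, tp_matvec) are hypothetical, the column-wise sharding scheme is assumed rather than taken from the source, and the summation loop merely stands in for the NCCL collective. No TpDecodeGroup or candle APIs are used.

```rust
use std::ops::Range;

/// CPU stand-in for a sum all-reduce: afterwards every rank's buffer holds
/// the element-wise sum of all ranks' inputs. (Hypothetical helper, not NCCL.)
fn all_reduce_sum(buffers: &mut [Vec<f32>]) {
    if buffers.is_empty() {
        return;
    }
    let len = buffers[0].len();
    assert!(
        buffers.iter().all(|b| b.len() == len),
        "all ranks must contribute buffers of equal length"
    );
    // Reduce: accumulate every rank's contribution.
    let mut total = vec![0.0f32; len];
    for buf in buffers.iter() {
        for (t, v) in total.iter_mut().zip(buf.iter()) {
            *t += *v;
        }
    }
    // Broadcast: every rank receives the same reduced result.
    for buf in buffers.iter_mut() {
        buf.copy_from_slice(&total);
    }
}

/// One rank's share of y = W * x, using only the columns it owns.
fn partial_matvec(w: &[Vec<f32>], x: &[f32], cols: Range<usize>) -> Vec<f32> {
    w.iter()
        .map(|row| cols.clone().map(|j| row[j] * x[j]).sum::<f32>())
        .collect()
}

/// Splits the columns of `w` across `world_size` simulated ranks, computes
/// the partial products, and combines them with the all-reduce above.
fn tp_matvec(w: &[Vec<f32>], x: &[f32], world_size: usize) -> Vec<f32> {
    assert!(world_size > 0, "need at least one rank");
    let chunk = (x.len() + world_size - 1) / world_size;
    let mut partials: Vec<Vec<f32>> = (0..world_size)
        .map(|rank| {
            let lo = (rank * chunk).min(x.len());
            let hi = ((rank + 1) * chunk).min(x.len());
            partial_matvec(w, x, lo..hi)
        })
        .collect();
    all_reduce_sum(&mut partials);
    // After the all-reduce every rank holds the full result; return rank 0's.
    partials.swap_remove(0)
}
```

Calling tp_matvec with any positive world_size gives the same result as the unsharded product (up to floating-point summation order), which is the property the all-reduce provides: each rank computes only part of the sum, yet every rank ends the step holding the complete output.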

Feature-gated: only available when the tensor-parallel feature is enabled.
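
As a rough sketch of what feature gating looks like from calling code, the function below picks a decode path at compile time with the standard cfg! macro. Only the feature name tensor-parallel comes from the description above; the function decode_backend and the "single-gpu" fallback label are hypothetical and exist purely for illustration.

```rust
/// Reports which decode path this build would select.
/// `cfg!` is evaluated at compile time, so the unused branch is trivially
/// optimized away. (Hypothetical helper; not part of this module.)
pub fn decode_backend() -> &'static str {
    if cfg!(feature = "tensor-parallel") {
        // Built with `--features tensor-parallel`.
        "tensor-parallel"
    } else {
        // Feature disabled: assumed single-GPU fallback for this sketch.
        "single-gpu"
    }
}
```

In a Cargo project the feature would be switched on with cargo build --features tensor-parallel, assuming the crate declares a feature of that name in its manifest. Items that must not compile at all without the feature are typically annotated with #[cfg(feature = "tensor-parallel")] instead.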