Tensor-Parallel model executor.
Wraps TpDecodeGroup for multi-GPU decode with NCCL all-reduce. Prefill uses candle on GPU 0; decode uses sharded runners on all GPUs.
Feature-gated: only available when the tensor-parallel feature is enabled.
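
The decode path described above (each GPU runs a sharded runner, then an all-reduce combines the results) can be illustrated with a CPU-only sketch. Everything here is an assumption made for illustration: `ShardRunner`, `partial_forward`, `all_reduce_sum`, and `decode_step` are hypothetical names, not this crate's API, and the in-process sum stands in for the NCCL all-reduce. The sketch assumes a row-parallel split, where each shard owns a slice of the input dimension and produces a partial output that must be summed across shards.

```rust
/// Hypothetical stand-in for one GPU's decode runner.
/// Holds the rows [in_start, in_start + weight.len()) of a weight matrix.
struct ShardRunner {
    weight: Vec<Vec<f32>>, // this shard's rows, each of length out_dim
    in_start: usize,       // offset of this shard's slice in the input vector
}

impl ShardRunner {
    /// Computes a partial output using only this shard's slice of the input.
    fn partial_forward(&self, input: &[f32]) -> Vec<f32> {
        let out_dim = self.weight[0].len();
        let mut out = vec![0.0; out_dim];
        for (i, row) in self.weight.iter().enumerate() {
            let x = input[self.in_start + i];
            for (o, w) in out.iter_mut().zip(row) {
                *o += x * w;
            }
        }
        out
    }
}

/// CPU stand-in for an all-reduce with a sum operator:
/// returns the element-wise sum of every shard's partial output.
fn all_reduce_sum(partials: &[Vec<f32>]) -> Vec<f32> {
    let mut total = vec![0.0; partials[0].len()];
    for p in partials {
        for (t, v) in total.iter_mut().zip(p) {
            *t += v;
        }
    }
    total
}

/// One decode step: every shard computes its partial, then the partials are reduced.
fn decode_step(shards: &[ShardRunner], input: &[f32]) -> Vec<f32> {
    let partials: Vec<Vec<f32>> = shards.iter().map(|s| s.partial_forward(input)).collect();
    all_reduce_sum(&partials)
}

fn main() {
    // A 4x2 weight matrix split by rows across two simulated devices.
    let shards = vec![
        ShardRunner { weight: vec![vec![1.0, 2.0], vec![3.0, 4.0]], in_start: 0 },
        ShardRunner { weight: vec![vec![5.0, 6.0], vec![7.0, 8.0]], in_start: 2 },
    ];
    let out = decode_step(&shards, &[1.0, 1.0, 1.0, 1.0]);
    println!("{:?}", out); // prints [16.0, 20.0]
}
```

In a real multi-GPU setup the partial outputs would live on separate devices and the reduction would be a collective call rather than a loop; the sketch only shows why a sum across shards reproduces the unsharded result.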