Module tensor_parallel


TensorParallelTrainer: weight-sharded matmul. Each replica owns a slice of the weight matrix; the activations are split to match, each shard runs a partial matmul on its slice, and the partial results are summed via AllReduce.

F6 ships the public surface plus a host-side reference implementation. Each shard implements ShardProtocol: it receives a partial input slice and returns its partial output. The trainer collects all partials and sums them.
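
The split-then-sum scheme described above can be sketched on the host as follows. This is an illustration only, not the crate's API: the trait PartialMatmul, the struct RowShard, and the function sharded_matmul are hypothetical names, and the actual ShardProtocol signature may differ. The sketch assumes the weight matrix is split by rows (along the input dimension), so each shard's partial output has the full output width and an element-wise sum recovers the complete product.

```rust
/// Hypothetical stand-in for a shard trait; the real `ShardProtocol`
/// signature is not shown in this page and may differ.
trait PartialMatmul {
    /// Takes this shard's slice of the input, returns its partial output.
    fn partial_forward(&self, input_slice: &[f32]) -> Vec<f32>;
}

/// A shard owning a contiguous block of rows of the weight matrix W (K x N).
struct RowShard {
    rows: Vec<Vec<f32>>, // owned rows of W, each of length `n_out`
    n_out: usize,        // output width N
}

impl PartialMatmul for RowShard {
    fn partial_forward(&self, input_slice: &[f32]) -> Vec<f32> {
        // One input element per owned weight row.
        assert_eq!(input_slice.len(), self.rows.len());
        let mut out = vec![0.0; self.n_out];
        for (x, row) in input_slice.iter().zip(&self.rows) {
            for (o, w) in out.iter_mut().zip(row) {
                *o += x * w;
            }
        }
        out
    }
}

/// Host-side reference: split the input across shards in order,
/// run each partial matmul, and sum the partials element-wise
/// (the role AllReduce plays across real devices).
fn sharded_matmul(shards: &[RowShard], input: &[f32]) -> Vec<f32> {
    let n_out = shards.first().map_or(0, |s| s.n_out);
    let mut total = vec![0.0; n_out];
    let mut offset = 0;
    for shard in shards {
        let k = shard.rows.len();
        let partial = shard.partial_forward(&input[offset..offset + k]);
        for (t, p) in total.iter_mut().zip(&partial) {
            *t += p;
        }
        offset += k;
    }
    // Every input element must have been consumed by exactly one shard.
    assert_eq!(offset, input.len());
    total
}
```

With a 4 x 2 weight matrix split into two shards of two rows each, calling sharded_matmul with a length-4 input gives the same result as the unsharded product, since the row-wise partial products simply add.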

Structs§

ShardStepResult
TensorParallelConfig
TensorParallelTrainer

Enums§

TensorParallelMsg

Traits§

ShardProtocol