Crate atomr_accel_train


Distributed training blueprints on atomr-accel-cuda.

use atomr_accel_train::prelude::*;

Modules

data_parallel
DataParallelTrainer — replicates a model across N replicas, splits a mini-batch evenly, runs forward+backward per replica, aggregates loss/grad-norm, and applies an optimizer step.
loss
Loss kinds.
optimizer
Optimizer kinds. F4 ships SGD and AdamW configs; the actual parameter-update kernels land in F4.x, once the gradient buffers are flowing through NCCL.
parameter_server
AsyncParameterServer — central parameter store with async gradient pushes and async weight pulls.
pipeline_parallel
PipelineParallelTrainer — stage-pipelined model across N GPUs/actors.
prelude
Canonical re-exports. use atomr_accel_train::prelude::*;.
tensor_parallel
TensorParallelTrainer — weight-sharded matmul: each replica owns a slice of the weight matrix; activations are split, each shard runs a partial matmul, then results are summed via AllReduce.
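
The weight-sharded matmul described for tensor_parallel can be illustrated with a minimal CPU-only sketch. It does not use the atomr_accel_train API: Matrix, all_reduce_sum, and tensor_parallel_matmul are hypothetical names introduced here, and the AllReduce is simulated by a plain element-wise sum. It also assumes the weight matrix is sliced along its rows (the inner dimension), which is one layout consistent with "partial matmul, then results are summed"; the crate may shard differently.

```rust
// Hypothetical CPU-only sketch; none of these names come from atomr_accel_train.

/// Row-major dense matrix.
#[derive(Clone, Debug, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Matrix {
    fn new(rows: usize, cols: usize, data: Vec<f32>) -> Self {
        assert_eq!(data.len(), rows * cols);
        Matrix { rows, cols, data }
    }

    fn at(&self, r: usize, c: usize) -> f32 {
        self.data[r * self.cols + c]
    }

    /// Plain (unsharded) matmul, used as the reference result.
    fn matmul(&self, rhs: &Matrix) -> Matrix {
        assert_eq!(self.cols, rhs.rows);
        let mut out = vec![0.0; self.rows * rhs.cols];
        for i in 0..self.rows {
            for k in 0..self.cols {
                for j in 0..rhs.cols {
                    out[i * rhs.cols + j] += self.at(i, k) * rhs.at(k, j);
                }
            }
        }
        Matrix::new(self.rows, rhs.cols, out)
    }

    /// Columns [start, end): the activation slice handed to one shard.
    fn col_slice(&self, start: usize, end: usize) -> Matrix {
        let mut data = Vec::with_capacity(self.rows * (end - start));
        for r in 0..self.rows {
            data.extend_from_slice(&self.data[r * self.cols + start..r * self.cols + end]);
        }
        Matrix::new(self.rows, end - start, data)
    }

    /// Rows [start, end): the weight slice one shard owns (assumed layout).
    fn row_slice(&self, start: usize, end: usize) -> Matrix {
        let data = self.data[start * self.cols..end * self.cols].to_vec();
        Matrix::new(end - start, self.cols, data)
    }
}

/// Stand-in for AllReduce(sum): element-wise sum of every shard's partial output.
fn all_reduce_sum(partials: &[Matrix]) -> Matrix {
    let mut acc = partials[0].clone();
    for p in &partials[1..] {
        for (a, b) in acc.data.iter_mut().zip(&p.data) {
            *a += *b;
        }
    }
    acc
}

/// Sharded matmul: split the inner dimension evenly across `shards` workers.
fn tensor_parallel_matmul(x: &Matrix, w: &Matrix, shards: usize) -> Matrix {
    assert_eq!(x.cols, w.rows);
    assert!(shards > 0 && x.cols % shards == 0, "inner dim must divide evenly");
    let step = x.cols / shards;
    let partials: Vec<Matrix> = (0..shards)
        .map(|s| {
            let (lo, hi) = (s * step, (s + 1) * step);
            // Each shard multiplies its activation slice by its weight slice.
            x.col_slice(lo, hi).matmul(&w.row_slice(lo, hi))
        })
        .collect();
    all_reduce_sum(&partials)
}

fn main() {
    let x = Matrix::new(2, 4, vec![1., 2., 3., 4., 5., 6., 7., 8.]);
    let w = Matrix::new(4, 2, vec![1., 0., 0., 1., 1., 1., 2., 0.]);
    let sharded = tensor_parallel_matmul(&x, &w, 2);
    // The summed partial products match the unsharded product.
    assert_eq!(sharded, x.matmul(&w));
    println!("{:?}", sharded.data);
}
```

Each partial product has the full output shape, so the reduction is a sum rather than a concatenation. In a real multi-GPU setting that sum would presumably be the collective AllReduce the module description mentions.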