pub struct DistributedDataParallel<M: Module> { /* private fields */ }
Wrapper that enables distributed data parallel training.
DDP replicates the model across multiple processes and synchronizes gradients during the backward pass.
Implementations
impl<M: Module> DistributedDataParallel<M>
pub fn new(module: M, process_group: ProcessGroup) -> Self
Creates a new DDP wrapper.
pub fn broadcast_buffers(self, broadcast: bool) -> Self
Sets whether to broadcast buffers from rank 0.
pub fn gradient_as_bucket_view(self, bucket_view: bool) -> Self
Sets whether to use gradient bucketing.
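Because both setters take `self` by value and return `Self`, they can be chained directly after `new`. The sketch below illustrates that call shape only. Every type in it (`Module`, `ProcessGroup`, `Linear`), the field names, and the default values are stand-ins defined here for illustration and are assumptions, not the crate's actual definitions.

```rust
// Stand-in types for illustration only; the real crate provides its own
// `Module` trait and `ProcessGroup` type.
pub trait Module {}

pub struct ProcessGroup {
    pub rank: usize,       // hypothetical field
    pub world_size: usize, // hypothetical field
}

pub struct Linear; // hypothetical model type
impl Module for Linear {}

pub struct DistributedDataParallel<M: Module> {
    module: M,
    process_group: ProcessGroup,
    broadcast_buffers: bool,       // field name assumed
    gradient_as_bucket_view: bool, // field name assumed
}

impl<M: Module> DistributedDataParallel<M> {
    pub fn new(module: M, process_group: ProcessGroup) -> Self {
        // The defaults chosen here are assumptions for this sketch.
        Self {
            module,
            process_group,
            broadcast_buffers: true,
            gradient_as_bucket_view: false,
        }
    }

    // Consuming setter: takes ownership, updates one option, hands the value back.
    pub fn broadcast_buffers(mut self, broadcast: bool) -> Self {
        self.broadcast_buffers = broadcast;
        self
    }

    pub fn gradient_as_bucket_view(mut self, bucket_view: bool) -> Self {
        self.gradient_as_bucket_view = bucket_view;
        self
    }

    pub fn module_mut(&mut self) -> &mut M {
        &mut self.module
    }

    pub fn process_group(&self) -> &ProcessGroup {
        &self.process_group
    }
}

/// Builds a wrapper and sets both options through the chained setters.
pub fn build_ddp(rank: usize, world_size: usize) -> DistributedDataParallel<Linear> {
    DistributedDataParallel::new(Linear, ProcessGroup { rank, world_size })
        .broadcast_buffers(false)
        .gradient_as_bucket_view(true)
}
```

Since the setters consume the wrapper, the result of each call has to be bound or chained; calling one and discarding the return value drops the wrapper.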
pub fn module_mut(&mut self) -> &mut M
Returns a mutable reference to the underlying module.
pub fn process_group(&self) -> &ProcessGroup
Returns the process group.
pub fn sync_parameters(&mut self)
Synchronizes model parameters across all processes. Call this once at the start of training so that every rank starts from identical parameters (broadcast from rank 0).
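The effect of a broadcast from rank 0 can be pictured with a single-process simulation. This is only a sketch of the semantics, not the crate's implementation: `broadcast_from_rank0` is a hypothetical helper, and each inner `Vec<f32>` stands in for the flattened parameters held by one rank.

```rust
/// Simulates a broadcast from rank 0 inside one process.
/// `replicas[r]` holds the flattened parameters of rank `r` (an assumption
/// of this sketch); after the call every replica equals rank 0's copy.
pub fn broadcast_from_rank0(replicas: &mut [Vec<f32>]) {
    // Split off rank 0 so it can be read while the other ranks are written.
    if let Some((root, rest)) = replicas.split_first_mut() {
        let root = &*root; // rank 0 is only read from here on
        for params in rest.iter_mut() {
            // Overwrite this rank's parameters with rank 0's values.
            params.clone_from(root);
        }
    }
}
```

Rank 0's own values are left untouched, which is why differently initialized replicas all converge on its initialization.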
pub fn sync_gradients(&self)
Synchronizes gradients across all processes. Call this after the backward pass: it all-reduces the gradients so that every rank ends up with the average gradient across all ranks.
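The averaging all-reduce described above can be sketched as a single-process simulation: with N ranks, each element of every rank's gradient is replaced by the sum of that element over all ranks, divided by N. `all_reduce_mean` is a hypothetical helper written for this illustration; a real backend would exchange data between processes rather than loop over an in-memory slice.

```rust
/// Simulates an averaging all-reduce inside one process.
/// `grads[r]` holds the flattened gradient of rank `r` (an assumption of
/// this sketch); after the call every rank holds the element-wise mean.
pub fn all_reduce_mean(grads: &mut [Vec<f32>]) {
    let world_size = grads.len();
    if world_size == 0 {
        return;
    }
    let len = grads[0].len();

    // Reduce step: element-wise sum over all ranks.
    let mut sum = vec![0.0_f32; len];
    for g in grads.iter() {
        assert_eq!(g.len(), len, "all ranks must hold gradients of the same length");
        for (s, x) in sum.iter_mut().zip(g) {
            *s += *x;
        }
    }

    // Turn the sum into a mean.
    for s in sum.iter_mut() {
        *s /= world_size as f32;
    }

    // "All" step: every rank receives the same averaged gradient.
    for g in grads.iter_mut() {
        g.copy_from_slice(&sum);
    }
}
```

Because every rank ends up with the same averaged gradient, identical optimizer steps keep the replicas' parameters in agreement after each update.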
Trait Implementations
impl<M: Module> Module for DistributedDataParallel<M>
fn is_training(&self) -> bool
Returns whether the module is in training mode.
fn named_parameters(&self) -> HashMap<String, Parameter>
Returns named parameters of this module.
fn num_parameters(&self) -> usize
Returns the number of trainable parameters.
fn set_training(&mut self, _training: bool)
Sets the training mode.
Auto Trait Implementations
impl<M> Freeze for DistributedDataParallel<M> where
    M: Freeze,
impl<M> !RefUnwindSafe for DistributedDataParallel<M>
impl<M> Send for DistributedDataParallel<M>
impl<M> Sync for DistributedDataParallel<M>
impl<M> Unpin for DistributedDataParallel<M> where
    M: Unpin,
impl<M> UnsafeUnpin for DistributedDataParallel<M> where
    M: UnsafeUnpin,
impl<M> !UnwindSafe for DistributedDataParallel<M>
Blanket Implementations
impl<T> BorrowMut<T> for T where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.