

Function exec_launch 

pub fn exec_launch(
    node: &NodeConfig,
    model_path: &Path,
    data_path: &Path,
    checkpoint_dir: &Path,
    rank: u32,
    epochs: u32,
) -> Result<Child, String>

Execute a training job on a remote or local node.

For local nodes, the process is spawned directly. For SSH nodes, the command is piped over stdin to `ssh host bash`, so the remote bash reads and runs it.

On success, returns the Child process handle so the caller can monitor the job. On failure, returns an error message as a String.
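
The local/SSH split described above can be sketched with `std::process::Command`. This is a minimal illustration under stated assumptions, not the crate's actual implementation: `Target`, `shell_quote`, `command_line`, `spawn_with_stdin`, and `launch` are hypothetical names, the real `NodeConfig` fields are not shown on this page, and piping stdout for monitoring is an assumption.

```rust
use std::io::Write;
use std::process::{Child, Command, Stdio};

/// Hypothetical stand-in for the node kinds; the real `NodeConfig` is not shown here.
enum Target {
    Local,
    Ssh { host: String },
}

/// Quote one argument for a POSIX shell by wrapping it in single quotes.
fn shell_quote(arg: &str) -> String {
    format!("'{}'", arg.replace('\'', r"'\''"))
}

/// Join a program and its arguments into a single shell command line.
fn command_line(program: &str, args: &[&str]) -> String {
    let mut parts = vec![shell_quote(program)];
    parts.extend(args.iter().map(|a| shell_quote(a)));
    parts.join(" ")
}

/// Spawn `cmd` with piped stdin, write `script` to it, then close stdin.
fn spawn_with_stdin(mut cmd: Command, script: &str) -> Result<Child, String> {
    let mut child = cmd
        .stdin(Stdio::piped())
        .stdout(Stdio::piped()) // assumption: the caller monitors stdout
        .spawn()
        .map_err(|e| format!("failed to spawn: {e}"))?;
    let mut stdin = child
        .stdin
        .take()
        .ok_or_else(|| "stdin was not piped".to_string())?;
    stdin
        .write_all(script.as_bytes())
        .map_err(|e| format!("failed to write command: {e}"))?;
    drop(stdin); // closing stdin signals end of input to the shell
    Ok(child)
}

/// Spawn directly for a local target, or pipe the command line to `ssh <host> bash`.
fn launch(target: &Target, program: &str, args: &[&str]) -> Result<Child, String> {
    match target {
        Target::Local => Command::new(program)
            .args(args)
            .stdout(Stdio::piped())
            .spawn()
            .map_err(|e| format!("failed to spawn {program}: {e}")),
        Target::Ssh { host } => {
            let mut ssh = Command::new("ssh");
            ssh.arg(host).arg("bash");
            spawn_with_stdin(ssh, &format!("{}\n", command_line(program, args)))
        }
    }
}
```

A caller would typically hold on to the returned `Child` and use `wait()` or `wait_with_output()` to observe the job's exit status. Sending the command over stdin rather than as an `ssh` argument avoids a second round of quoting by the remote login shell, which is one plausible reason for the design described here.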