pub fn exec_launch(
    node: &NodeConfig,
    model_path: &Path,
    data_path: &Path,
    checkpoint_dir: &Path,
    rank: u32,
    epochs: u32,
) -> Result<Child, String>
Execute a training job on a remote or local node.
For local nodes, spawns the process directly.
For SSH nodes, pipes the command via stdin to ssh host bash.
Returns the child process handle for monitoring.
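
The local/SSH split described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual implementation: the `Target` enum, the `launch` function, and the use of `sh -c` for the local case are all hypothetical stand-ins, since the real `NodeConfig` fields and the way the training command is built are not shown here.

```rust
use std::io::Write;
use std::process::{Child, Command, Stdio};

/// Hypothetical stand-in for the node configuration (the real `NodeConfig` is not shown).
enum Target {
    Local,
    Ssh { host: String },
}

/// Spawn `script` locally, or pipe it over stdin to `ssh <host> bash`.
/// Returns the child handle so the caller can monitor or wait on it.
fn launch(target: &Target, script: &str) -> Result<Child, String> {
    match target {
        // Assumption: the local case runs the command through `sh -c`.
        Target::Local => Command::new("sh")
            .arg("-c")
            .arg(script)
            .stdout(Stdio::piped())
            .spawn()
            .map_err(|e| format!("local spawn failed: {e}")),
        Target::Ssh { host } => {
            let mut child = Command::new("ssh")
                .arg(host)
                .arg("bash")
                .stdin(Stdio::piped())
                .stdout(Stdio::piped())
                .spawn()
                .map_err(|e| format!("ssh spawn failed: {e}"))?;
            // Take ownership of stdin so it is closed when this scope ends;
            // the remote bash then sees EOF and runs the script to completion.
            let mut stdin = child
                .stdin
                .take()
                .ok_or_else(|| "ssh stdin unavailable".to_string())?;
            stdin
                .write_all(script.as_bytes())
                .map_err(|e| format!("failed to write script to ssh: {e}"))?;
            drop(stdin);
            Ok(child)
        }
    }
}
```

A caller would typically hold on to the returned `Child` and call `wait_with_output()` (or poll `try_wait()`) to observe the job's exit status. Sending the command over stdin rather than as an ssh argument avoids a second layer of shell quoting on the remote side.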