Enum TrainCommands 

pub enum TrainCommands {
    Plan {
        data: Option<PathBuf>,
        model_size: String,
        model_path: Option<PathBuf>,
        num_classes: usize,
        task: String,
        config: Option<PathBuf>,
        output: PathBuf,
        strategy: String,
        budget: usize,
        scout: bool,
        max_epochs: usize,
        learning_rate: Option<f32>,
        lora_rank: Option<usize>,
        batch_size: Option<usize>,
        val_data: Option<PathBuf>,
        test_data: Option<PathBuf>,
        format: String,
    },
    Apply {
        plan: Option<PathBuf>,
        config: Option<PathBuf>,
        task: String,
        data: Option<PathBuf>,
        model_size: String,
        model_path: Option<PathBuf>,
        num_classes: usize,
        output: PathBuf,
        strategy: String,
        budget: usize,
        scout: bool,
        max_epochs: usize,
        learning_rate: Option<f32>,
        lora_rank: Option<usize>,
        batch_size: Option<usize>,
        distributed: bool,
        world_size: Option<usize>,
        rank: Option<usize>,
        coordinator_addr: Option<String>,
        deterministic: bool,
        seed: Option<u64>,
    },
    Watch {
        config: PathBuf,
        max_restarts: usize,
        heartbeat_timeout: u64,
        backoff_initial: u64,
        backoff_max: u64,
    },
    Sweep {
        config: PathBuf,
        strategy: String,
        num_configs: usize,
        output_dir: PathBuf,
        seed: u64,
    },
    Archive {
        checkpoint_dir: PathBuf,
        output: PathBuf,
        version: Option<String>,
        notes: Option<String>,
    },
    Submit {
        cluster: PathBuf,
        model: PathBuf,
        adapters: Vec<String>,
        rank: u32,
        epochs: u32,
        budget_mb: u64,
        dry_run: bool,
    },
    ClusterStatus {
        cluster: PathBuf,
    },
}

Training pipeline subcommands (forjar-style plan/apply).

Thin CLI wrappers around entrenar’s training plan/apply infrastructure.

Variants§

Plan

Generate a training plan without touching the GPU.

Validates data quality, checks model compatibility, builds HPO search space, estimates resource usage, and runs pre-flight checks. Outputs a serializable plan manifest (text, JSON, or YAML).

Analogous to forjar plan — shows what will happen before committing GPU time.
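
The three output formats named above could be modelled as a small enum parsed from the `format` string. This is only a sketch under stated assumptions: `OutputFormat` and `parse_format` are hypothetical names that do not come from the crate, and case-insensitive matching is an assumption.

```rust
/// Hypothetical mirror of the plan output formats listed in the docs.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum OutputFormat {
    Text,
    Json,
    Yaml,
}

/// Parse the `format` field. Case-insensitive matching is an assumption.
fn parse_format(s: &str) -> Result<OutputFormat, String> {
    match s.to_ascii_lowercase().as_str() {
        "text" => Ok(OutputFormat::Text),
        "json" => Ok(OutputFormat::Json),
        "yaml" => Ok(OutputFormat::Yaml),
        other => Err(format!(
            "unknown format '{}' (expected text, json, or yaml)",
            other
        )),
    }
}
```

Parsing into an enum once, early, keeps later code from comparing raw strings in several places.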

Fields

§data: Option<PathBuf>

Path to training data (JSONL); required for --task classify

§model_size: String

Model size: “0.5B”, “9B”, “7B”, “13B”

§model_path: Option<PathBuf>

Path to model weights directory

§num_classes: usize

Number of output classes

§task: String

Task type: classify, pretrain

§config: Option<PathBuf>

YAML training config (for --task pretrain)

§output: PathBuf

Output directory for checkpoints

§strategy: String

HPO strategy: tpe, grid, random, manual

§budget: usize

HPO budget (number of trials)

§scout: bool

Scout mode: 1 epoch per trial for fast exploration

§max_epochs: usize

Maximum epochs per trial

§learning_rate: Option<f32>

Manual learning rate (only used with --strategy manual)

§lora_rank: Option<usize>

Manual LoRA rank (only used with --strategy manual)

§batch_size: Option<usize>

Manual batch size (only used with --strategy manual)

§val_data: Option<PathBuf>

Validation data file (JSONL)

§test_data: Option<PathBuf>

Test data file (JSONL)

§format: String

Output format: text, json, yaml
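
The field notes above tie `data` to the classify task and `config` to the pretrain task. A pre-flight check along those lines might look like the sketch below. `PlanInputs` and `check_task_inputs` are hypothetical names, and treating `config` as mandatory for pretrain is an assumption (the docs only say the config is "for" that task).

```rust
use std::path::PathBuf;

/// Cut-down, hypothetical mirror of the `Plan` inputs relevant to this check.
struct PlanInputs {
    task: String,
    data: Option<PathBuf>,
    config: Option<PathBuf>,
}

/// Check that the input associated with the chosen task is present.
fn check_task_inputs(p: &PlanInputs) -> Result<(), String> {
    match p.task.as_str() {
        // Documented: training data is required for the classify task.
        "classify" if p.data.is_none() => {
            Err("--task classify requires training data".to_string())
        }
        // Assumption: the YAML config is mandatory for the pretrain task.
        "pretrain" if p.config.is_none() => {
            Err("--task pretrain requires a YAML config".to_string())
        }
        "classify" | "pretrain" => Ok(()),
        other => Err(format!("unknown task '{}'", other)),
    }
}
```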

Apply

Execute a training plan (allocate GPU, run trials).

Reads a previously generated plan (YAML/JSON) and executes it:

  • Manual strategy: single training run with specified hyperparameters
  • HPO strategy: multiple trials with automatic hyperparameter tuning

Analogous to forjar apply — commits resources and executes the plan.
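
The split between the manual and HPO strategies can be sketched as a function that returns how many runs to launch and how many epochs each gets. `trial_schedule` is a hypothetical helper. Ignoring `budget` for the manual strategy and applying scout mode to it as well are assumptions rather than documented behaviour.

```rust
/// Return (number of runs, epochs per run) for a given strategy.
fn trial_schedule(
    strategy: &str,
    budget: usize,
    scout: bool,
    max_epochs: usize,
) -> (usize, usize) {
    // Manual strategy: a single run with the specified hyperparameters.
    // Any other strategy (tpe, grid, random): one run per trial in the budget.
    let runs = if strategy == "manual" { 1 } else { budget };
    // Scout mode: 1 epoch per trial; otherwise up to `max_epochs`.
    let epochs = if scout { 1 } else { max_epochs };
    (runs, epochs)
}
```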

Fields

§plan: Option<PathBuf>

Path to a saved plan file (YAML or JSON from apr train plan)

§config: Option<PathBuf>

YAML training config (for --task pretrain)

§task: String

Task type: classify, pretrain

§data: Option<PathBuf>

Path to training data (JSONL)

§model_size: String

Model size: “0.5B”, “9B”, “7B”, “13B”

§model_path: Option<PathBuf>

Path to model weights directory

§num_classes: usize

Number of output classes

§output: PathBuf

Output directory for checkpoints and leaderboard

§strategy: String

HPO strategy: tpe, grid, random, manual

§budget: usize

HPO budget (number of trials)

§scout: bool

Scout mode: 1 epoch per trial

§max_epochs: usize

Maximum epochs per trial

§learning_rate: Option<f32>

Manual learning rate (only used with --strategy manual)

§lora_rank: Option<usize>

Manual LoRA rank (only used with --strategy manual)

§batch_size: Option<usize>

Manual batch size (only used with --strategy manual)

§distributed: bool

Enable distributed data-parallel training

§world_size: Option<usize>

Total number of workers (default: auto-detect GPUs)

§rank: Option<usize>

This worker’s global rank (default: 0 = coordinator)

§coordinator_addr: Option<String>

Coordinator address for distributed training (default: 0.0.0.0:9000)

§deterministic: bool

Enable bitwise deterministic training (CUBLAS_WORKSPACE_CONFIG, cuDNN deterministic)

§seed: Option<u64>

Random seed for reproducibility (default: from YAML or 42)
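
Several of the optional fields above document a default. One way to resolve them is sketched here. `Resolved` and `resolve` are hypothetical names, and giving a command-line seed priority over the YAML seed is an assumption; the docs only say the default is "from YAML or 42".

```rust
/// Hypothetical resolved settings for one distributed worker.
#[derive(Debug, PartialEq)]
struct Resolved {
    rank: usize,
    coordinator_addr: String,
    seed: u64,
}

/// Fill in the documented defaults for unset options.
fn resolve(
    rank: Option<usize>,
    coordinator_addr: Option<String>,
    cli_seed: Option<u64>,
    yaml_seed: Option<u64>,
) -> Resolved {
    Resolved {
        // Default rank 0 is the coordinator.
        rank: rank.unwrap_or(0),
        coordinator_addr: coordinator_addr
            .unwrap_or_else(|| "0.0.0.0:9000".to_string()),
        // Assumed precedence: CLI seed, then YAML seed, then 42.
        seed: cli_seed.or(yaml_seed).unwrap_or(42),
    }
}
```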

Watch

Watch a training run with automatic restart on crash and hang detection.

Monitors a running or to-be-started training process:

  • Detects crashes (SIGABRT, SIGSEGV, OOM) and restarts with backoff
  • Detects hangs via heartbeat/training_state.json staleness
  • Captures GPU state and crash diagnostics
  • Auto-enables CUDA_LAUNCH_BLOCKING on async crash pattern

Sovereign Rust replacement for train-guard.sh.
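
Hang detection by heartbeat staleness reduces to comparing the age of the last heartbeat against a threshold. The sketch below shows that comparison in isolation. `is_stale` is a hypothetical helper, and treating a heartbeat timestamp in the future as fresh is an assumption.

```rust
use std::time::{Duration, SystemTime};

/// True when the last heartbeat is older than `timeout_secs`.
fn is_stale(last_heartbeat: SystemTime, now: SystemTime, timeout_secs: u64) -> bool {
    match now.duration_since(last_heartbeat) {
        Ok(age) => age > Duration::from_secs(timeout_secs),
        // The heartbeat is later than `now` (clock skew): treat it as fresh.
        Err(_) => false,
    }
}
```

In practice `last_heartbeat` could be taken from the modification time of the state file, for example via `std::fs::metadata(path)?.modified()?`.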

Fields

§config: PathBuf

YAML training config to run and watch

§max_restarts: usize

Maximum number of restart attempts

§heartbeat_timeout: u64

Heartbeat staleness threshold in seconds

§backoff_initial: u64

Initial backoff delay in seconds

§backoff_max: u64

Maximum backoff delay in seconds
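
The three backoff-related fields suggest a capped, growing delay between restarts. A minimal sketch follows. Doubling the delay on each attempt is an assumption (the docs name only the initial and maximum delays), and both function names are hypothetical.

```rust
/// Delay in seconds before restart number `attempt` (0-based).
/// Assumption: the delay doubles on each attempt, capped at `backoff_max`.
fn backoff_delay(attempt: u32, backoff_initial: u64, backoff_max: u64) -> u64 {
    // 2^attempt, saturating instead of overflowing for large attempts.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    backoff_initial.saturating_mul(factor).min(backoff_max)
}

/// Full schedule of delays, one per allowed restart.
fn backoff_schedule(max_restarts: usize, backoff_initial: u64, backoff_max: u64) -> Vec<u64> {
    (0..max_restarts)
        .map(|i| {
            let attempt = u32::try_from(i).unwrap_or(u32::MAX);
            backoff_delay(attempt, backoff_initial, backoff_max)
        })
        .collect()
}
```

With an initial delay of 5 seconds, a maximum of 60 seconds, and five restarts, this yields delays of 5, 10, 20, 40, and 60 seconds.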

Sweep

Generate hyperparameter sweep configs from a base YAML.

Creates N training configs with varied hyperparameters using grid or random search. Each config is a complete YAML that can be passed to apr train apply --task pretrain --config <file>.

Sovereign Rust replacement for hyperparam-sweep.py.

Fields

§config: PathBuf

Base YAML training config to sweep from

§strategy: String

Search strategy: grid or random

§num_configs: usize

Number of configs to generate (random) or max combinations (grid)

§output_dir: PathBuf

Output directory for generated configs

§seed: u64

Seed for random search reproducibility
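
The difference between the two search strategies can be sketched on two hyperparameters. Everything here is illustrative: `SweepPoint`, `grid`, `random`, and `next_u64` are hypothetical names, the choice of learning rate and batch size as swept values is an assumption, and the SplitMix64 generator stands in for whatever random source the tool actually uses.

```rust
/// One generated hyperparameter combination (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct SweepPoint {
    learning_rate: f32,
    batch_size: usize,
}

/// Grid search: Cartesian product, truncated to at most `num_configs` entries.
fn grid(lrs: &[f32], batch_sizes: &[usize], num_configs: usize) -> Vec<SweepPoint> {
    lrs.iter()
        .flat_map(|&lr| {
            batch_sizes.iter().map(move |&bs| SweepPoint {
                learning_rate: lr,
                batch_size: bs,
            })
        })
        .take(num_configs)
        .collect()
}

/// SplitMix64 step: a tiny deterministic generator, so that the same seed
/// always produces the same sweep.
fn next_u64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Random search: draw exactly `num_configs` points, picking each value from
/// its candidate list. Panics if either list is empty.
fn random(lrs: &[f32], batch_sizes: &[usize], num_configs: usize, seed: u64) -> Vec<SweepPoint> {
    let mut state = seed;
    (0..num_configs)
        .map(|_| SweepPoint {
            learning_rate: lrs[(next_u64(&mut state) % lrs.len() as u64) as usize],
            batch_size: batch_sizes[(next_u64(&mut state) % batch_sizes.len() as u64) as usize],
        })
        .collect()
}
```

In this sketch `num_configs` is an upper bound for the grid and an exact count for random search, which matches the wording of the `num_configs` field.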

Archive

Archive a checkpoint into a release bundle.

Packages model weights, config, training state, and metadata into a self-contained directory with integrity manifest.

Fields

§checkpoint_dir: PathBuf

Path to checkpoint directory

§output: PathBuf

Output archive directory

§version: Option<String>

Release version tag (e.g., “v1.0”)

§notes: Option<String>

Release notes
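
An integrity manifest typically records one digest per bundled file. The sketch below illustrates the idea with FNV-1a, chosen only because it fits in a few lines of standard-library Rust. The hash actually used by the tool is not stated here, a real manifest would more likely use a cryptographic digest such as SHA-256, and the line format and function names are hypothetical.

```rust
/// FNV-1a 64-bit hash. A stand-in only: not suitable for tamper detection.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Build manifest lines of the form "<hex digest>  <file name>",
/// sorted by file name so the manifest is stable across runs.
fn manifest(files: &[(&str, &[u8])]) -> Vec<String> {
    let mut sorted: Vec<&(&str, &[u8])> = files.iter().collect();
    sorted.sort_by(|a, b| a.0.cmp(b.0));
    sorted
        .iter()
        .map(|(name, data)| format!("{:016x}  {}", fnv1a64(data), name))
        .collect()
}
```

Verification then consists of recomputing each digest and comparing it with the recorded line.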

Submit

Submit multi-adapter training jobs to a cluster (GPU-SHARE Phase 3).

Reads a cluster.yaml config, places adapter jobs across nodes using the greedy placement algorithm, and generates launch commands.
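
The exact greedy criterion is not spelled out above. One plausible reading, sketched here, assigns each adapter to the node that currently has the most free VRAM. `Node` and `place` are hypothetical names, and both the node shape and the "most free VRAM first" rule are assumptions.

```rust
/// A cluster node with some free VRAM (hypothetical shape; the real
/// cluster.yaml schema is not shown in these docs).
#[derive(Debug, Clone)]
struct Node {
    name: String,
    free_mb: u64,
}

/// Greedily assign each adapter to the node with the most free VRAM.
/// Returns one entry per adapter: Some(node name), or None if nothing fits.
fn place(nodes: &mut [Node], num_adapters: usize, budget_mb: u64) -> Vec<Option<String>> {
    let mut placement = Vec::with_capacity(num_adapters);
    for _ in 0..num_adapters {
        // Pick the node with the largest remaining VRAM.
        let best = nodes.iter_mut().max_by_key(|n| n.free_mb);
        match best {
            Some(node) if node.free_mb >= budget_mb => {
                // Reserve the adapter's budget on the chosen node.
                node.free_mb -= budget_mb;
                placement.push(Some(node.name.clone()));
            }
            // No node has enough room left for this adapter.
            _ => placement.push(None),
        }
    }
    placement
}
```

Choosing the emptiest node first spreads adapters across nodes; a packing-oriented variant would pick the fullest node that still fits.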

Fields

§cluster: PathBuf

Path to cluster config YAML

§model: PathBuf

Model checkpoint path (.apr)

§adapters: Vec<String>

Adapter specs: DATA:CHECKPOINT pairs (one per adapter)

§rank: u32

LoRA rank

§epochs: u32

Number of training epochs

§budget_mb: u64

Estimated VRAM budget per adapter (MB)

§dry_run: bool

Dry run: show placement and commands without executing
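
Each entry in `adapters` is a DATA:CHECKPOINT pair, so it has to be split before use. A minimal sketch of that step follows. `parse_adapter_spec` is a hypothetical helper, and splitting on the first colon is an assumption (it would mis-handle paths that themselves contain a colon).

```rust
/// Split one `DATA:CHECKPOINT` spec into its two parts.
fn parse_adapter_spec(spec: &str) -> Result<(String, String), String> {
    match spec.split_once(':') {
        // Both halves must be non-empty to count as a valid pair.
        Some((data, ckpt)) if !data.is_empty() && !ckpt.is_empty() => {
            Ok((data.to_string(), ckpt.to_string()))
        }
        _ => Err(format!("expected DATA:CHECKPOINT, got '{}'", spec)),
    }
}
```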

ClusterStatus

Show cluster status: nodes, GPUs, adapter capacity (GPU-SHARE Phase 3).

Reads a cluster.yaml config and displays node health, VRAM availability, and adapter placement capacity.

Fields

§cluster: PathBuf

Path to cluster config YAML
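
Adapter placement capacity can be thought of as how many more adapters fit into each node's free VRAM. The sketch below assumes a fixed per-adapter VRAM budget in MB. `adapter_capacity` is a hypothetical helper, and how the real command derives the per-adapter figure is not stated in these docs.

```rust
/// Number of additional adapters that fit on each node, given the free VRAM
/// per node and an assumed fixed budget per adapter (both in MB).
fn adapter_capacity(free_mb_per_node: &[u64], budget_mb: u64) -> Vec<u64> {
    free_mb_per_node
        .iter()
        .map(|&free| {
            // Guard against division by zero for a zero budget.
            if budget_mb == 0 {
                0
            } else {
                free / budget_mb
            }
        })
        .collect()
}
```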

Trait Implementations§

impl Debug for TrainCommands

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl FromArgMatches for TrainCommands

fn from_arg_matches(__clap_arg_matches: &ArgMatches) -> Result<Self, Error>

Instantiate Self from ArgMatches, parsing the arguments as needed.

fn from_arg_matches_mut(__clap_arg_matches: &mut ArgMatches) -> Result<Self, Error>

Instantiate Self from ArgMatches, parsing the arguments as needed.

fn update_from_arg_matches(&mut self, __clap_arg_matches: &ArgMatches) -> Result<(), Error>

Assign values from ArgMatches to self.

fn update_from_arg_matches_mut<'b>(&mut self, __clap_arg_matches: &mut ArgMatches) -> Result<(), Error>

Assign values from ArgMatches to self.

impl Subcommand for TrainCommands

fn augment_subcommands<'b>(__clap_app: Command) -> Command

Append to Command so it can instantiate Self via FromArgMatches::from_arg_matches_mut.

fn augment_subcommands_for_update<'b>(__clap_app: Command) -> Command

Append to Command so it can instantiate self via FromArgMatches::update_from_arg_matches_mut.

fn has_subcommand(__clap_name: &str) -> bool

Test whether Self can parse a specific subcommand.

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized

impl<T> Borrow<T> for T where T: ?Sized

impl<T> BorrowMut<T> for T where T: ?Sized

impl<T> Conv for T

impl<T> Downcast<T> for T

impl<T> FmtForward for T

impl<T> From<T> for T

impl<T> Instrument for T

impl<T, U> Into<U> for T where U: From<T>

impl<T> IntoEither for T

impl<T> Pipe for T where T: ?Sized

impl<T> Pointable for T

impl<T> PolicyExt for T where T: ?Sized

impl<T> Same for T

impl<T> Tap for T

impl<T> TryConv for T

impl<T, U> TryFrom<U> for T where U: Into<T>

impl<T, U> TryInto<U> for T where U: TryFrom<T>

impl<T> Upcast<T> for T

impl<V, T> VZip<V> for T where V: MultiLane<T>

impl<T> WithSubscriber for T

impl<T> Allocation for T where T: RefUnwindSafe + Send + Sync

impl<A, B, T> HttpServerConnExec<A, B> for T where B: Body

impl<T> WasmNotSend for T where T: Send

impl<T> WasmNotSendSync for T

impl<T> WasmNotSync for T where T: Sync