pub enum TokenizeCommands {
    Plan {
        data: PathBuf,
        vocab_size: usize,
        algorithm: String,
        output: PathBuf,
        format: String,
    },
    Apply {
        data: PathBuf,
        vocab_size: usize,
        algorithm: String,
        output: PathBuf,
        max_lines: usize,
    },
}
Tokenizer training pipeline subcommands (forjar-style plan/apply).
Thin CLI wrappers around aprender’s BPE training infrastructure. Trains a BPE vocabulary from a text corpus for use in model training.
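To illustrate what training a BPE vocabulary involves, the following standard-library-only sketch learns merges by repeatedly fusing the most frequent adjacent symbol pair across whitespace-split words. It is not aprender's implementation: the function train_bpe and its signature are hypothetical. Real trainers usually also add end-of-word markers, byte-level alphabets, and incremental pair bookkeeping.

```rust
use std::collections::HashMap;

/// Hypothetical minimal BPE trainer (not aprender's API).
/// Splits the corpus on whitespace, starts from single characters,
/// and repeatedly merges the most frequent adjacent symbol pair.
/// Returns the learned merges in the order they were applied.
fn train_bpe(corpus: &str, num_merges: usize) -> Vec<(String, String)> {
    // Word frequency table: each distinct word is stored as a symbol sequence.
    let mut words: HashMap<Vec<String>, usize> = HashMap::new();
    for word in corpus.split_whitespace() {
        let symbols: Vec<String> = word.chars().map(|c| c.to_string()).collect();
        *words.entry(symbols).or_insert(0) += 1;
    }

    let mut merges = Vec::new();
    for _ in 0..num_merges {
        // Count adjacent symbol pairs, weighted by how often each word occurs.
        let mut pair_counts: HashMap<(String, String), usize> = HashMap::new();
        for (symbols, &freq) in &words {
            for pair in symbols.windows(2) {
                *pair_counts
                    .entry((pair[0].clone(), pair[1].clone()))
                    .or_insert(0) += freq;
            }
        }

        // Most frequent pair; ties go to the lexicographically smallest pair.
        let best = pair_counts
            .into_iter()
            .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(&a.0)));
        let Some(((left, right), _)) = best else {
            break; // no adjacent pairs left: every word is a single symbol
        };

        // Rewrite every word, replacing each occurrence of the pair with the merged symbol.
        let merged = format!("{left}{right}");
        let mut next: HashMap<Vec<String>, usize> = HashMap::new();
        for (symbols, freq) in words {
            let mut out = Vec::with_capacity(symbols.len());
            let mut i = 0;
            while i < symbols.len() {
                if i + 1 < symbols.len() && symbols[i] == left && symbols[i + 1] == right {
                    out.push(merged.clone());
                    i += 2;
                } else {
                    out.push(symbols[i].clone());
                    i += 1;
                }
            }
            *next.entry(out).or_insert(0) += freq;
        }
        words = next;
        merges.push((left, right));
    }
    merges
}

fn main() {
    for (a, b) in train_bpe("low low low lower lowest", 2) {
        println!("{a} {b}");
    }
}
```

On the toy corpus in main, the first merge is "l o" (tied with "o w", both occurring five times, and chosen by the lexicographic tie-break) and the second is "lo w". The explicit tie-break keeps the output deterministic even though HashMap iteration order is randomized.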
Variants
Plan
Validate inputs and estimate tokenizer training time/resources.
Checks that the input corpus exists, counts lines/bytes, estimates vocabulary coverage, and reports expected training time. Outputs a serializable plan manifest (text, JSON, or YAML).
Analogous to forjar plan — shows what will happen before committing.
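The line and byte counts that Plan reports can be gathered in a single streaming pass over the corpus. The sketch below uses a hypothetical PlanManifest struct and build_plan function (neither comes from the crate) and renders the manifest in the three formats mentioned above. The format strings "text", "json", and "yaml" are assumed spellings, and the coverage and training-time estimates are left out because the document does not say how they are computed.

```rust
use std::io::{self, BufRead};

/// Hypothetical plan manifest; the CLI's real manifest fields are not documented here.
#[derive(Debug, PartialEq)]
struct PlanManifest {
    lines: usize,
    bytes: usize,
    vocab_size: usize,
    algorithm: String,
}

/// Count corpus lines and exact byte size by streaming, without loading the whole file.
fn build_plan<R: BufRead>(mut reader: R, vocab_size: usize, algorithm: &str) -> io::Result<PlanManifest> {
    let (mut lines, mut bytes) = (0usize, 0usize);
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // read_until keeps the trailing '\n', so n is the exact byte length of the line.
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break; // EOF
        }
        lines += 1;
        bytes += n;
    }
    Ok(PlanManifest { lines, bytes, vocab_size, algorithm: algorithm.to_string() })
}

impl PlanManifest {
    /// Render as text, JSON, or YAML. Assumes `algorithm` needs no escaping
    /// (e.g. "bpe"); real code would use a proper serializer.
    fn render(&self, format: &str) -> Result<String, String> {
        match format {
            "text" => Ok(format!(
                "corpus: {} lines, {} bytes; target vocab {} ({})",
                self.lines, self.bytes, self.vocab_size, self.algorithm
            )),
            "json" => Ok(format!(
                "{{\"lines\":{},\"bytes\":{},\"vocab_size\":{},\"algorithm\":\"{}\"}}",
                self.lines, self.bytes, self.vocab_size, self.algorithm
            )),
            "yaml" => Ok(format!(
                "lines: {}\nbytes: {}\nvocab_size: {}\nalgorithm: {}\n",
                self.lines, self.bytes, self.vocab_size, self.algorithm
            )),
            other => Err(format!("unsupported format: {other}")),
        }
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // &[u8] implements BufRead and stands in for a buffered corpus file.
    let corpus: &[u8] = b"hello world\nfoo bar\n";
    let plan = build_plan(corpus, 32000, "bpe")?;
    println!("{}", plan.render("json")?);
    Ok(())
}
```

Counting with read_until rather than lines() keeps the byte total exact, including a final line that lacks a trailing newline.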
Fields
data: PathBuf
vocab_size: usize
algorithm: String
output: PathBuf
format: String
Apply
Train a tokenizer on the corpus.
Reads the input corpus, trains a BPE, WordPiece, or Unigram tokenizer, and writes vocab.json and merges.txt to the output directory.
Analogous to forjar apply — commits resources and executes the plan.
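The output step of Apply can be sketched as below. It assumes the common GPT-2-style layout (vocab.json maps each token to an integer id; merges.txt lists one space-separated merge pair per line, in merge order) and assumes max_lines caps how many corpus lines are read. The document states neither detail, and read_corpus, json_escape, vocab_json, merges_txt, and write_artifacts are hypothetical helpers.

```rust
use std::fs;
use std::io::{self, BufRead};
use std::path::Path;

/// Read at most `max_lines` corpus lines (assumed meaning of the max_lines field).
fn read_corpus<R: BufRead>(reader: R, max_lines: usize) -> io::Result<Vec<String>> {
    // lines() is lazy, so take() stops reading once the cap is reached.
    reader.lines().take(max_lines).collect()
}

/// Minimal JSON string escaping for token text.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// vocab.json contents: token -> id, with ids assigned in list order.
fn vocab_json(tokens: &[String]) -> String {
    let entries: Vec<String> = tokens
        .iter()
        .enumerate()
        .map(|(id, tok)| format!("\"{}\":{}", json_escape(tok), id))
        .collect();
    format!("{{{}}}", entries.join(","))
}

/// merges.txt contents: one "left right" pair per line, in merge order.
/// Assumes tokens contain no spaces (true for whitespace-split BPE).
fn merges_txt(merges: &[(String, String)]) -> String {
    merges.iter().map(|(a, b)| format!("{a} {b}\n")).collect()
}

/// Write both artifacts into the output directory, creating it if needed.
fn write_artifacts(output: &Path, tokens: &[String], merges: &[(String, String)]) -> io::Result<()> {
    fs::create_dir_all(output)?;
    fs::write(output.join("vocab.json"), vocab_json(tokens))?;
    fs::write(output.join("merges.txt"), merges_txt(merges))?;
    Ok(())
}

fn main() -> io::Result<()> {
    // &[u8] implements BufRead; only the first two lines are kept.
    let lines = read_corpus(&b"first line\nsecond line\nthird line\n"[..], 2)?;
    println!("read {} corpus lines", lines.len());

    // Toy training result: base symbols plus one learned merge.
    let tokens: Vec<String> = ["l", "o", "lo"].iter().map(|s| s.to_string()).collect();
    let merges = vec![("l".to_string(), "o".to_string())];

    let out = std::env::temp_dir().join("tokenize_apply_sketch");
    write_artifacts(&out, &tokens, &merges)?;
    println!("{}", fs::read_to_string(out.join("vocab.json"))?);
    Ok(())
}
```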
Fields
data: PathBuf
vocab_size: usize
algorithm: String
output: PathBuf
max_lines: usize
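Downstream code typically matches on the parsed variant. The sketch below mirrors TokenizeCommands using only standard-library types. It drops the clap derive that the FromArgMatches and Subcommand impls listed under Trait Implementations suggest the real type uses. The describe function is a hypothetical stand-in for the real handlers, and the field values in main are illustrative, not defaults.

```rust
use std::path::PathBuf;

// Std-only mirror of TokenizeCommands. The real type appears to derive
// clap's Subcommand; that is omitted so the sketch needs no external crates.
#[derive(Debug)]
enum TokenizeCommands {
    Plan { data: PathBuf, vocab_size: usize, algorithm: String, output: PathBuf, format: String },
    Apply { data: PathBuf, vocab_size: usize, algorithm: String, output: PathBuf, max_lines: usize },
}

/// Hypothetical dispatcher: a real handler would call into the training code;
/// this one only summarizes what each variant would do.
fn describe(cmd: &TokenizeCommands) -> String {
    match cmd {
        TokenizeCommands::Plan { data, vocab_size, algorithm, output, format: fmt } => format!(
            "plan: {algorithm} vocab={vocab_size} corpus={} -> {} as {fmt}",
            data.display(),
            output.display()
        ),
        TokenizeCommands::Apply { data, vocab_size, algorithm, output, max_lines } => format!(
            "apply: {algorithm} vocab={vocab_size} corpus={} (max {max_lines} lines) -> {}",
            data.display(),
            output.display()
        ),
    }
}

fn main() {
    let cmd = TokenizeCommands::Apply {
        data: PathBuf::from("corpus.txt"),
        vocab_size: 32000,
        algorithm: "bpe".to_string(),
        output: PathBuf::from("tokenizer"),
        max_lines: 100_000,
    };
    println!("{}", describe(&cmd));
}
```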
Trait Implementations
impl Debug for TokenizeCommands
impl FromArgMatches for TokenizeCommands
fn from_arg_matches(__clap_arg_matches: &ArgMatches) -> Result<Self, Error>
fn from_arg_matches_mut(__clap_arg_matches: &mut ArgMatches) -> Result<Self, Error>
fn update_from_arg_matches(&mut self, __clap_arg_matches: &ArgMatches) -> Result<(), Error>
Assign values from ArgMatches to self.
fn update_from_arg_matches_mut<'b>(&mut self, __clap_arg_matches: &mut ArgMatches) -> Result<(), Error>
Assign values from ArgMatches to self.
impl Subcommand for TokenizeCommands
fn augment_subcommands<'b>(__clap_app: Command) -> Command
fn augment_subcommands_for_update<'b>(__clap_app: Command) -> Command
Append to Command so it can instantiate self via FromArgMatches::update_from_arg_matches_mut.
fn has_subcommand(__clap_name: &str) -> bool
Test whether Self can parse a specific subcommand.
Auto Trait Implementations
impl Freeze for TokenizeCommands
impl RefUnwindSafe for TokenizeCommands
impl Send for TokenizeCommands
impl Sync for TokenizeCommands
impl Unpin for TokenizeCommands
impl UnsafeUnpin for TokenizeCommands
impl UnwindSafe for TokenizeCommands
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.