pub struct TokenDataset {
    pub tokens: Vec<usize>,
    pub tokenizer: CharTokenizer,
}
A tokenized text dataset for training.
Fields
tokens: Vec<usize>
tokenizer: CharTokenizer
Implementations
impl TokenDataset
pub fn shakespeare() -> Self
Build from the built-in Shakespeare corpus.
pub fn vocab_size(&self) -> usize
Vocabulary size.
pub fn sample_window(&self, length: usize, seed: u64) -> &[usize]
Sample a deterministic window of length tokens.
The start position is derived from seed for reproducibility.
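The page does not say how the start position is derived from seed. As a minimal sketch, assuming the seed is simply reduced modulo the number of valid start positions (the free function below is a hypothetical stand-in, not the crate's implementation):

```rust
/// Hypothetical stand-in for `TokenDataset::sample_window`.
/// Assumption: the start index is `seed` modulo the number of valid starts;
/// the real method may hash the seed or use a seeded RNG instead.
fn sample_window(tokens: &[usize], length: usize, seed: u64) -> &[usize] {
    assert!(length <= tokens.len(), "window is longer than the dataset");
    // A window of `length` tokens can start at indices 0..=len - length.
    let num_starts = (tokens.len() - length + 1) as u64;
    let start = (seed % num_starts) as usize;
    &tokens[start..start + length]
}
```

Because the start index depends only on seed and the data, calling the function twice with the same arguments returns the same slice.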
pub fn num_windows(&self, length: usize) -> usize
Number of possible windows of the given length.
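The counting rule is not spelled out here. One plausible reading, assuming overlapping windows that may start at every token position (a hypothetical free function, not the crate's code):

```rust
/// Hypothetical stand-in for `TokenDataset::num_windows`.
/// Assumption: windows overlap and may start at any index, so a dataset of
/// `n` tokens has `n - length + 1` windows. The real method may count
/// differently, for example by reserving one extra token for a target.
fn num_windows(tokens: &[usize], length: usize) -> usize {
    if length == 0 || length > tokens.len() {
        // No meaningful window exists for these lengths.
        return 0;
    }
    tokens.len() - length + 1
}
```

Under this assumption, a dataset of 5 tokens has 3 windows of length 3 and exactly 1 window of length 5.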
pub fn from_jsonl(path: &Path) -> Result<Self>
Load from a JSONL file where each line is a quoted text string. Lines are unescaped (\n → newline) and concatenated with newline separators.
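The decoding step described above might look roughly like the following sketch. It works on already-loaded file content rather than a Path, the helper name decode_jsonl_text is hypothetical, and skipping blank lines is an assumption. Only the \n escape mentioned in the description is handled, so this is not a general JSON string decoder.

```rust
/// Hypothetical helper mirroring the decoding that `from_jsonl` describes.
/// Each non-empty line is treated as one double-quoted string.
fn decode_jsonl_text(content: &str) -> String {
    content
        .lines()
        .map(str::trim)
        // Assumption: blank lines carry no text and are skipped.
        .filter(|line| !line.is_empty())
        .map(|line| {
            // Drop the surrounding double quotes when both are present.
            let inner = line
                .strip_prefix('"')
                .and_then(|rest| rest.strip_suffix('"'))
                .unwrap_or(line);
            // Turn the two-character escape `\n` into a real newline.
            inner.replace("\\n", "\n")
        })
        .collect::<Vec<String>>()
        // Concatenate the decoded lines with newline separators.
        .join("\n")
}
```

The resulting string would then be passed to the tokenizer to produce the tokens field.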
Auto Trait Implementations
impl Freeze for TokenDataset
impl RefUnwindSafe for TokenDataset
impl Send for TokenDataset
impl Sync for TokenDataset
impl Unpin for TokenDataset
impl UnsafeUnpin for TokenDataset
impl UnwindSafe for TokenDataset
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.