
Module dataset


Structs§

BatchRequest
OpenAI Batch API format - JSONL entries
DatasetEntry
A processed dataset entry ready for simulation
DatasetIterator
Iterator over dataset entries, parsing JSON but NOT tokenizing. Tokenization happens in batches on a background thread for performance.
DatasetLoader
Dataset loader that provides lazy iteration over entries
Message
RequestBody
UnparsedEntry
Unparsed entry from dataset (before tokenization)

Enums§

PromptInput
RequestInput

Type Aliases§

BatchTokenizerFn
A batch tokenizer function that takes multiple prompt inputs and returns multiple token vectors. This is much faster than tokenizing one at a time.
TokenizerFn
A tokenizer function that takes either chat messages or a raw prompt and returns tokenized output. This allows different implementations (tiktoken, transformers.js, etc.) to be passed in from the CLI or WASM interface. The tokenizer should apply the appropriate chat template for chat-style requests.