pub struct LlamaBatch { /* private fields */ }
A safe wrapper around llama_batch.
Implementations

impl LlamaBatch
pub fn clear(&mut self)
Clear the batch. This does not free the memory associated with the batch; it only resets the number of tokens to 0.
pub fn add(
    &mut self,
    LlamaToken: LlamaToken,
    pos: llama_pos,
    seq_ids: &[i32],
    logits: bool,
) -> Result<(), BatchAddError>
Add a token to the batch for the sequences seq_ids at position pos. If logits is true, the
token's logits will be initialized and can be read after the next decode.
Panics

Panics if:
- self.llama_batch.n_tokens does not fit into a usize
- seq_ids.len() does not fit into a llama_seq_id
Errors

Returns an error if there is insufficient space in the buffer.
pub fn add_sequence(
    &mut self,
    tokens: &[LlamaToken],
    seq_id: i32,
    logits_all: bool,
) -> Result<(), BatchAddError>
Add a sequence of tokens to the batch for the given sequence id. If logits_all is true, the
tokens' logits will be initialized and can be read after the next decode.
Either way, the last token in the sequence will have its logits set to true.
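The interplay of clear, add, and add_sequence can be sketched with a stand-in type that models only the behavior documented here (capacity-checked appends, clear retaining the allocation, logits always enabled on the last token of a sequence). The names mirror the crate, but this is an illustrative model, not the crate's real implementation.

```rust
// Stand-in modeling the documented LlamaBatch semantics; NOT the crate's code.
#[derive(Debug, PartialEq)]
struct BatchAddError; // stand-in for the crate's BatchAddError

struct Batch {
    capacity: usize,
    tokens: Vec<i32>,       // token ids
    pos: Vec<i32>,          // position of each token
    seq_ids: Vec<Vec<i32>>, // sequences each token belongs to
    logits: Vec<bool>,      // whether logits are requested per token
}

impl Batch {
    fn new(n_tokens: usize) -> Self {
        Batch {
            capacity: n_tokens,
            tokens: Vec::with_capacity(n_tokens),
            pos: Vec::with_capacity(n_tokens),
            seq_ids: Vec::with_capacity(n_tokens),
            logits: Vec::with_capacity(n_tokens),
        }
    }

    // Reset the token count to 0 without freeing the buffers.
    fn clear(&mut self) {
        self.tokens.clear();
        self.pos.clear();
        self.seq_ids.clear();
        self.logits.clear();
    }

    // Append one token; fails when the buffer is full.
    fn add(&mut self, token: i32, pos: i32, seq_ids: &[i32], logits: bool)
        -> Result<(), BatchAddError>
    {
        if self.tokens.len() >= self.capacity {
            return Err(BatchAddError);
        }
        self.tokens.push(token);
        self.pos.push(pos);
        self.seq_ids.push(seq_ids.to_vec());
        self.logits.push(logits);
        Ok(())
    }

    // Append a whole sequence at positions 0..n; the last token always
    // gets logits enabled, earlier ones only when logits_all is true.
    fn add_sequence(&mut self, tokens: &[i32], seq_id: i32, logits_all: bool)
        -> Result<(), BatchAddError>
    {
        let last = tokens.len().saturating_sub(1);
        for (i, &tok) in tokens.iter().enumerate() {
            self.add(tok, i as i32, &[seq_id], logits_all || i == last)?;
        }
        Ok(())
    }
}

fn main() {
    let mut batch = Batch::new(4);
    batch.add_sequence(&[10, 20, 30], 0, false).unwrap();
    assert_eq!(batch.pos, vec![0, 1, 2]);
    assert_eq!(batch.seq_ids[0], vec![0]);
    assert_eq!(batch.logits, vec![false, false, true]); // only the last token
    batch.clear();
    assert_eq!(batch.tokens.len(), 0); // emptied, capacity retained
    assert!(batch.add_sequence(&[1, 2, 3, 4, 5], 0, true).is_err()); // over capacity
    println!("ok");
}
```

Keeping logits disabled on intermediate tokens is the common pattern during prompt processing, since only the final position's logits are needed to sample the next token.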
Errors

Returns an error if there is insufficient space in the buffer.
Panics
pub fn new(n_tokens: usize, n_seq_max: i32) -> Self
Create a new LlamaBatch that can contain up to n_tokens tokens.
Arguments

- n_tokens: the maximum number of tokens that can be added to the batch
- n_seq_max: the maximum number of sequences that can be added to the batch (generally 1 unless you know what you are doing)
Panics
Panics if n_tokens is greater than i32::MAX.
pub fn get_one(tokens: &mut [LlamaToken]) -> llama_batch
Create a batch from a slice of tokens for simple one-shot decoding.
The returned batch uses the provided token buffer directly and does not own the memory. All tokens are assigned to sequence 0 and logits are enabled for the last token only.
Note: The returned batch does NOT free memory on drop — it borrows from the input
slice. The caller must ensure tokens outlives the returned batch.
Panics
Panics if tokens.len() does not fit into an i32.
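The borrowing contract of get_one can be illustrated with a stand-in type whose lifetime parameter makes the compiler enforce "tokens must outlive the batch"; the function below models only the documented behavior (length must fit in an i32, sequence 0, logits on the last token only) and is not the crate's real implementation.

```rust
// Stand-in illustrating get_one's contract; NOT the crate's code.
struct BorrowedBatch<'a> {
    tokens: &'a [i32],  // borrowed, not owned: nothing to free on drop
    seq_id: i32,        // all tokens belong to sequence 0
    logits: Vec<bool>,  // true only for the last token
}

fn get_one(tokens: &[i32]) -> BorrowedBatch<'_> {
    // Mirrors the documented panic: the length must fit into an i32.
    let n = i32::try_from(tokens.len()).expect("token count exceeds i32::MAX");
    let mut logits = vec![false; n as usize];
    if let Some(last) = logits.last_mut() {
        *last = true; // logits enabled for the final token only
    }
    BorrowedBatch { tokens, seq_id: 0, logits }
}

fn main() {
    let prompt = [1, 2, 3];
    let batch = get_one(&prompt);
    // The borrow checker rejects any use of `batch` after `prompt` is dropped.
    assert_eq!(batch.tokens, &[1, 2, 3][..]);
    assert_eq!(batch.seq_id, 0);
    assert_eq!(batch.logits, vec![false, false, true]);
    println!("ok");
}
```

Because the view borrows rather than owns, this shape suits one-shot decoding of a prompt that is already held in a buffer, with no extra allocation or copy.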