pub struct LlamaSampler { /* private fields */ }
A safe wrapper around llama_sampler.
§Implementations

impl LlamaSampler
pub fn sample(&self, ctx: &LlamaContext<'_>, idx: i32) -> LlamaToken
Samples and accepts a token from the idx-th output of the last evaluation.
pub fn apply(&mut self, data_array: &mut LlamaTokenDataArray)
Applies this sampler to a LlamaTokenDataArray.
pub fn accept(&mut self, token: LlamaToken)
Accepts a token from the sampler, possibly updating the internal state of certain samplers (e.g. grammar, repetition, etc.)
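Samplers such as grammar or repetition penalties are stateful: they only behave correctly if they observe every token that was actually chosen, which is what accept is for. A minimal pure-Rust sketch of that pattern (this is not the crate's internals; RepetitionTracker and the plain integer token ids are illustrative):

```rust
/// Illustrative stateful sampler: remembers accepted tokens and dampens
/// their logits on the next step (hypothetical, not crate API).
pub struct RepetitionTracker {
    pub history: Vec<usize>, // token ids seen so far
    pub penalty: f32,
}

impl RepetitionTracker {
    /// Record a token that was actually selected.
    pub fn accept(&mut self, token: usize) {
        self.history.push(token);
    }

    /// Shrink the logit of every previously seen token toward zero.
    pub fn apply(&self, logits: &mut [f32]) {
        for &t in &self.history {
            if let Some(l) = logits.get_mut(t) {
                if *l > 0.0 {
                    *l /= self.penalty;
                } else {
                    *l *= self.penalty;
                }
            }
        }
    }
}
```

For example, after `accept(1)` with `penalty = 2.0`, applying to logits `[1.0, 4.0, -2.0]` halves only index 1, yielding `[1.0, 2.0, -2.0]`.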
pub fn accept_many(
    &mut self,
    tokens: impl IntoIterator<Item = impl Borrow<LlamaToken>>,
)
Accepts several tokens from the sampler or context, possibly updating the internal state of certain samplers (e.g. grammar, repetition, etc.)
pub fn with_tokens(
    self,
    tokens: impl IntoIterator<Item = impl Borrow<LlamaToken>>,
) -> Self
Builder-style variant of Self::accept_many: accepts several tokens, possibly updating the internal state of certain samplers (e.g. grammar, repetition, etc.), then returns the sampler.
pub fn chain(samplers: impl IntoIterator<Item = Self>, no_perf: bool) -> Self
Combines a list of samplers into a single sampler that applies each component sampler one after another.
If you are using a chain to select a token, the chain should always end with one of
LlamaSampler::greedy, LlamaSampler::dist, LlamaSampler::mirostat, or
LlamaSampler::mirostat_v2.
§Panics
Panics if llama.cpp returns a null pointer.
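Conceptually, a chain just runs each component sampler over the candidate logits in order, with the terminal stage selecting a token. A hypothetical stand-alone sketch of that flow (the names Stage and run_chain are illustrative, not crate API):

```rust
/// A chain stage transforms the candidate logits in place.
type Stage = Box<dyn Fn(&mut Vec<f32>)>;

/// Run every stage in order, then pick a token greedily
/// (the terminal step a real chain would end with).
fn run_chain(stages: &[Stage], logits: &mut Vec<f32>) -> usize {
    for s in stages {
        s(logits);
    }
    // Index of the largest remaining logit.
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```

For example, a single temperature-0.5 stage (`|l| l.iter_mut().for_each(|x| *x /= 0.5)`) applied to logits `[0.0, 1.0, 2.0]` rescales them to `[0.0, 2.0, 4.0]` and the greedy terminal step picks index 2.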
pub fn chain_simple(samplers: impl IntoIterator<Item = Self>) -> Self
Same as Self::chain with no_perf = false.
§Panics
Panics if llama.cpp returns a null pointer.
§Example
use llama_cpp_4::token::{
LlamaToken,
data::LlamaTokenData,
data_array::LlamaTokenDataArray
};
use llama_cpp_4::sampling::LlamaSampler;
let mut data_array = LlamaTokenDataArray::new(vec![
LlamaTokenData::new(LlamaToken(0), 0., 0.),
LlamaTokenData::new(LlamaToken(1), 1., 0.),
LlamaTokenData::new(LlamaToken(2), 2., 0.),
], false);
data_array.apply_sampler(&mut LlamaSampler::chain_simple([
LlamaSampler::temp(0.5),
LlamaSampler::greedy(),
]));
assert_eq!(data_array.data[0].logit(), 0.);
assert_eq!(data_array.data[1].logit(), 2.);
assert_eq!(data_array.data[2].logit(), 4.);
assert_eq!(data_array.data.len(), 3);
assert_eq!(data_array.selected_token(), Some(LlamaToken(2)));

pub fn temp(t: f32) -> Self
Updates the logits l_i’ = l_i/t. When t <= 0.0, the maximum logit is kept at its original
value, the rest are set to -inf.
§Panics
Panics if llama.cpp returns a null pointer.
§Example
use llama_cpp_4::token::{
LlamaToken,
data::LlamaTokenData,
data_array::LlamaTokenDataArray
};
use llama_cpp_4::sampling::LlamaSampler;
let mut data_array = LlamaTokenDataArray::new(vec![
LlamaTokenData::new(LlamaToken(0), 0., 0.),
LlamaTokenData::new(LlamaToken(1), 1., 0.),
LlamaTokenData::new(LlamaToken(2), 2., 0.),
], false);
data_array.apply_sampler(&mut LlamaSampler::temp(0.5));
assert_eq!(data_array.data[0].logit(), 0.);
assert_eq!(data_array.data[1].logit(), 2.);
assert_eq!(data_array.data[2].logit(), 4.);

pub fn temp_ext(t: f32, delta: f32, exponent: f32) -> Self
Dynamic temperature implementation (a.k.a. entropy-based temperature) described in the paper https://arxiv.org/abs/2309.02772.
§Panics
Panics if llama.cpp returns a null pointer.
pub fn top_k(k: i32) -> Self
Top-K sampling described in academic paper “The Curious Case of Neural Text Degeneration” https://arxiv.org/abs/1904.09751.
§Panics
Panics if llama.cpp returns a null pointer.
§Example
use llama_cpp_4::token::{
LlamaToken,
data::LlamaTokenData,
data_array::LlamaTokenDataArray
};
use llama_cpp_4::sampling::LlamaSampler;
let mut data_array = LlamaTokenDataArray::new(vec![
LlamaTokenData::new(LlamaToken(0), 0., 0.),
LlamaTokenData::new(LlamaToken(1), 1., 0.),
LlamaTokenData::new(LlamaToken(2), 2., 0.),
LlamaTokenData::new(LlamaToken(3), 3., 0.),
], false);
data_array.apply_sampler(&mut LlamaSampler::top_k(2));
assert_eq!(data_array.data.len(), 2);
assert_eq!(data_array.data[0].id(), LlamaToken(3));
assert_eq!(data_array.data[1].id(), LlamaToken(2));

pub fn typical(p: f32, min_keep: usize) -> Self
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
§Panics
Panics if llama.cpp returns a null pointer.
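As a rough sketch of the idea: locally typical sampling ranks tokens by how close their surprisal (-ln p) is to the distribution's entropy, then keeps the closest tokens until their cumulative probability reaches p. A self-contained approximation (tie-breaking and numeric details may differ from the crate's implementation):

```rust
/// Convert logits to probabilities (numerically stable softmax).
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Indices kept by (a sketch of) locally typical sampling.
fn typical_keep(logits: &[f32], p: f32, min_keep: usize) -> Vec<usize> {
    let probs = softmax(logits);
    let entropy: f32 = probs
        .iter()
        .map(|&q| if q > 0.0 { -q * q.ln() } else { 0.0 })
        .sum();
    // Rank tokens by |surprisal - entropy|, smallest first.
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| {
        let da = ((-probs[a].ln()) - entropy).abs();
        let db = ((-probs[b].ln()) - entropy).abs();
        da.partial_cmp(&db).unwrap()
    });
    // Keep the most "typical" tokens until cumulative probability >= p.
    let mut kept = Vec::new();
    let mut cum = 0.0;
    for &i in &idx {
        kept.push(i);
        cum += probs[i];
        if cum >= p && kept.len() >= min_keep {
            break;
        }
    }
    kept
}
```

On a uniform distribution every token is equally typical, so `typical_keep(&[1.0, 1.0, 1.0, 1.0], 0.5, 1)` keeps exactly two of the four tokens.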
pub fn top_p(p: f32, min_keep: usize) -> Self
Nucleus sampling described in academic paper “The Curious Case of Neural Text Degeneration” https://arxiv.org/abs/1904.09751.
§Panics
Panics if llama.cpp returns a null pointer.
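Nucleus sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches p. A self-contained sketch of that rule (top_p_keep is an illustrative name, not crate API; a real implementation works on the token data array in place):

```rust
/// Convert logits to probabilities (numerically stable softmax).
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Indices kept by nucleus (top-p) sampling, never fewer than min_keep.
fn top_p_keep(logits: &[f32], p: f32, min_keep: usize) -> Vec<usize> {
    let probs = softmax(logits);
    // Sort indices by probability, highest first.
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let mut kept = Vec::new();
    let mut cum = 0.0;
    for &i in &idx {
        kept.push(i);
        cum += probs[i];
        if cum >= p && kept.len() >= min_keep {
            break;
        }
    }
    kept
}
```

With one dominant logit, `top_p_keep(&[0.0, 0.0, 10.0], 0.5, 1)` keeps only index 2, since that token alone already carries well over half the probability mass.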
pub fn min_p(p: f32, min_keep: usize) -> Self
Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841.
§Panics
Panics if llama.cpp returns a null pointer.
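Min-p keeps every token whose probability is at least p times the probability of the most likely token. A self-contained sketch of that rule (min_p_keep and the min_keep fallback shown here are illustrative assumptions):

```rust
/// Convert logits to probabilities (numerically stable softmax).
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Indices kept by min-p: probability >= p * (top probability),
/// falling back to the min_keep most probable tokens if too few survive.
fn min_p_keep(logits: &[f32], p: f32, min_keep: usize) -> Vec<usize> {
    let probs = softmax(logits);
    let max_p = probs.iter().cloned().fold(0.0_f32, f32::max);
    let mut kept: Vec<usize> = (0..probs.len())
        .filter(|&i| probs[i] >= p * max_p)
        .collect();
    if kept.len() < min_keep {
        let mut idx: Vec<usize> = (0..probs.len()).collect();
        idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
        kept = idx.into_iter().take(min_keep).collect();
    }
    kept
}
```

For logits `[0.0, 2.0, 4.0]` the probabilities are roughly `[0.016, 0.117, 0.867]`, so `min_p_keep(&[0.0, 2.0, 4.0], 0.2, 1)` keeps only index 2 (0.117 falls below the 0.2 × 0.867 ≈ 0.173 cutoff).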
pub fn xtc(p: f32, t: f32, min_keep: usize, seed: u32) -> Self
XTC sampler as described in https://github.com/oobabooga/text-generation-webui/pull/6335.
§Panics
Panics if llama.cpp returns a null pointer.
pub fn grammar(
    model: &LlamaModel,
    grammar_str: &str,
    grammar_root: &str,
) -> Self
Grammar sampler
§Panics
- If either of grammar_str or grammar_root contains null bytes.
- If llama.cpp returns a null pointer.
pub fn dry(
    &self,
    model: &LlamaModel,
    n_ctx_train: i32,
    multiplier: f32,
    base: f32,
    allowed_length: i32,
    penalty_last_n: i32,
    seq_breakers: impl IntoIterator<Item = impl AsRef<[u8]>>,
) -> Self
DRY sampler, designed by p-e-w, as described in https://github.com/oobabooga/text-generation-webui/pull/5677; ports the Koboldcpp implementation authored by pi6am: https://github.com/LostRuins/koboldcpp/pull/982
§Panics
- If any string in seq_breakers contains null bytes.
- If llama.cpp returns a null pointer.
pub fn penalties(
    n_vocab: i32,
    special_eos_id: f32,
    linefeed_id: f32,
    penalty_last_n: f32,
) -> Self
Penalizes tokens for being present in the context.
Parameters:
- n_vocab: LlamaModel::n_vocab
- special_eos_id: LlamaModel::token_eos
- linefeed_id: LlamaModel::token_nl
- penalty_last_n: last n tokens to penalize (0 = disable penalty, -1 = context size)
§Panics
Panics if llama.cpp returns a null pointer.
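The arithmetic behind context penalties can be sketched in plain Rust. The version below follows the classic llama.cpp-style formulation (repeat penalty shrinks a seen token's logit toward zero, then frequency and presence terms are subtracted); the parameter names repeat, freq, and presence are illustrative assumptions, not this method's arguments:

```rust
use std::collections::HashMap;

/// Illustrative context-presence penalty (hypothetical, not crate API).
/// `recent` holds the token ids in the penalty window.
fn penalize(logits: &mut [f32], recent: &[usize], repeat: f32, freq: f32, presence: f32) {
    // Count occurrences of each token in the window.
    let mut counts: HashMap<usize, u32> = HashMap::new();
    for &t in recent {
        *counts.entry(t).or_insert(0) += 1;
    }
    for (&t, &n) in &counts {
        if let Some(l) = logits.get_mut(t) {
            // Repeat penalty: shrink the logit toward zero...
            if *l > 0.0 {
                *l /= repeat;
            } else {
                *l *= repeat;
            }
            // ...then subtract frequency and presence terms.
            *l -= n as f32 * freq + presence;
        }
    }
}
```

For example, with `repeat = 2.0`, `freq = 0.1`, `presence = 0.5`, a token with logit 2.0 that appeared twice ends up at 2.0/2.0 - 2×0.1 - 0.5 = 0.3, while unseen tokens are untouched.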
pub fn penalties_simple(model: &LlamaModel, penalty_last_n: i32) -> Self
Same as Self::penalties, but with n_vocab, special_eos_id, and linefeed_id
initialized from model, penalize_nl = false, and ignore_eos = true.
Parameters:
- model: The model's tokenizer to use to initialize the sampler.
- penalty_last_n: last n tokens to penalize (0 = disable penalty, -1 = context size)
§Panics
Panics if llama.cpp returns a null pointer.
pub fn mirostat(n_vocab: i32, seed: u32, tau: f32, eta: f32, m: i32) -> Self
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
§Panics
Panics if llama.cpp returns a null pointer.
§Parameters:
- n_vocab: LlamaModel::n_vocab
- seed: Seed to initialize random generation with.
- tau: The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
- eta: The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
- m: The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
pub fn mirostat_v2(seed: u32, tau: f32, eta: f32) -> Self
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
§Panics
Panics if llama.cpp returns a null pointer.
§Parameters:
- seed: Seed to initialize random generation with.
- tau: The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
- eta: The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
pub fn dist(seed: u32) -> Self
Selects a token at random based on each token’s probabilities.
§Panics
Panics if llama.cpp returns a null pointer.
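Distribution sampling converts logits to probabilities and draws one index, seeded for reproducibility. A self-contained sketch (a tiny linear congruential generator stands in for the crate's seeded RNG, so the exact draws will differ):

```rust
/// Convert logits to probabilities (numerically stable softmax).
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Draw one token index proportionally to its probability.
fn dist_sample(logits: &[f32], seed: u32) -> usize {
    let probs = softmax(logits);
    // One LCG step -> uniform value in [0, 1) (illustrative RNG only).
    let state = seed.wrapping_mul(1664525).wrapping_add(1013904223);
    let u = (state >> 8) as f32 / (1u32 << 24) as f32;
    // Inverse-CDF walk: first index whose cumulative probability exceeds u.
    let mut cum = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        cum += p;
        if u < cum {
            return i;
        }
    }
    probs.len() - 1
}
```

With one overwhelmingly likely token, e.g. logits `[0.0, 0.0, 50.0]`, any seed selects index 2.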
pub fn greedy() -> Self
Selects the most likely token.
§Panics
Panics if llama.cpp returns a null pointer.
§Example
use llama_cpp_4::token::{
LlamaToken,
data::LlamaTokenData,
data_array::LlamaTokenDataArray
};
use llama_cpp_4::sampling::LlamaSampler;
let mut data_array = LlamaTokenDataArray::new(vec![
LlamaTokenData::new(LlamaToken(0), 0., 0.),
LlamaTokenData::new(LlamaToken(1), 1., 0.),
], false);
data_array.apply_sampler(&mut LlamaSampler::greedy());
assert_eq!(data_array.data.len(), 2);
assert_eq!(data_array.selected_token(), Some(LlamaToken(1)));

pub fn top_n_sigma(n: f32) -> Self
Top-N sigma sampling.
Keeps tokens within N standard deviations of the maximum logit.
§Panics
Panics if llama.cpp returns a null pointer.
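The rule above can be sketched directly: compute the standard deviation of the logits and mask everything more than n sigma below the maximum. A self-contained illustration (the population-variance choice and the masking-to-negative-infinity detail are assumptions about how such a filter is typically realized):

```rust
/// Mask logits more than `n` standard deviations below the maximum.
fn top_n_sigma_mask(logits: &mut [f32], n: f32) {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let len = logits.len() as f32;
    let mean = logits.iter().sum::<f32>() / len;
    // Population variance of the logits.
    let var = logits.iter().map(|&l| (l - mean).powi(2)).sum::<f32>() / len;
    let sigma = var.sqrt();
    for l in logits.iter_mut() {
        if *l < max - n * sigma {
            *l = f32::NEG_INFINITY; // excluded from sampling
        }
    }
}
```

For logits `[0.0, 4.0, 8.0]` with `n = 1.0`: sigma ≈ 3.27, so the cutoff is ≈ 4.73 and only the logit 8.0 survives.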
pub fn adaptive_p(target: f32, decay: f32, seed: u32) -> Self
pub fn logit_bias(n_vocab: i32, biases: &[(LlamaToken, f32)]) -> Self
Logit bias sampler.
Applies additive bias to specific token logits before sampling.
§Panics
Panics if llama.cpp returns a null pointer.
§Parameters
- n_vocab: Number of tokens in the vocabulary (LlamaModel::n_vocab).
- biases: Slice of (token_id, bias) pairs.
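The operation itself is a straightforward per-token addition. A self-contained sketch (plain usize token ids stand in for LlamaToken; a bias of negative infinity effectively bans a token):

```rust
/// Add each (token, bias) pair to the matching logit.
fn apply_logit_bias(logits: &mut [f32], biases: &[(usize, f32)]) {
    for &(token, bias) in biases {
        if let Some(l) = logits.get_mut(token) {
            *l += bias;
        }
    }
}
```

For example, biasing token 0 by +5.0 and token 2 by `f32::NEG_INFINITY` turns logits `[0.0, 1.0, 2.0]` into `[5.0, 1.0, -inf]`, boosting the former and banning the latter.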
pub fn infill(model: &LlamaModel) -> Self
Infill sampler.
Reorders token probabilities for fill-in-the-middle tasks.
§Panics
Panics if llama.cpp returns a null pointer.
pub fn get_seed(&self) -> u32
Get the seed of the sampler.
Returns LLAMA_DEFAULT_SEED if the sampler is not seeded.
pub fn chain_n(&self) -> i32
Get the number of samplers in a chain.
Returns 0 if this sampler is not a chain.
pub fn chain_remove(&mut self, i: i32) -> Self
Remove and return the sampler at position i from a chain.
The returned sampler is owned by the caller and will be freed on drop.
§Panics
Panics if i is out of range or if llama.cpp returns a null pointer.
pub fn grammar_lazy(
    model: &LlamaModel,
    grammar_str: &str,
    grammar_root: &str,
    trigger_words: &[&str],
    trigger_tokens: &[LlamaToken],
) -> Self
Grammar sampler with lazy activation.
The grammar is only activated when one of the trigger words or trigger tokens is encountered.
§Panics
- If grammar_str or grammar_root contains null bytes.
- If any trigger word contains null bytes.
- If llama.cpp returns a null pointer.
pub fn grammar_lazy_patterns(
    model: &LlamaModel,
    grammar_str: &str,
    grammar_root: &str,
    trigger_patterns: &[&str],
    trigger_tokens: &[LlamaToken],
) -> Self
Grammar sampler with lazy activation via regex patterns.
The grammar is only activated when one of the trigger patterns or trigger tokens matches.
§Panics
- If grammar_str or grammar_root contains null bytes.
- If any trigger pattern contains null bytes.
- If llama.cpp returns a null pointer.
pub fn clone_sampler(&self) -> Self
Clone this sampler.
Creates an independent copy of this sampler with the same state.
§Panics
Panics if llama.cpp returns a null pointer.
pub fn perf_print(&self)
Print sampler performance data.
pub fn perf_reset(&mut self)
Reset sampler performance counters.
pub fn perf_data(&self) -> llama_perf_sampler_data
Get sampler performance data.
pub unsafe fn chain_get_ptr(&self, i: i32) -> *mut llama_sampler
Get a non-owning reference to the ith sampler in a chain.
§Safety
The returned pointer is owned by the chain. Do not free it or use it after the chain is dropped or modified.
pub unsafe fn from_raw(
    iface: *mut llama_sampler_i,
    ctx: llama_sampler_context_t,
) -> Self
pub fn common() -> Self
Creates a new instance of LlamaSampler with common sampling parameters.
This function initializes a LlamaSampler using default values from common_sampler_params
and configures it with common settings such as top_k, top_p, temperature, and seed values.
§Panics
Panics if llama.cpp returns a null pointer.
§Returns
A LlamaSampler instance configured with the common sampling parameters.