Struct LlamaSampler

pub struct LlamaSampler { /* private fields */ }

A safe wrapper around llama_sampler.

Implementations§

impl LlamaSampler

pub fn new() -> Self

Creates a new sampler with default parameters.

§Panics

Panics if llama.cpp returns a null pointer.

pub fn sample(&self, ctx: &LlamaContext<'_>, idx: i32) -> LlamaToken

Samples and accepts a token from the idx-th output of the last evaluation.

pub fn apply(&mut self, data_array: &mut LlamaTokenDataArray)

Applies this sampler to a LlamaTokenDataArray.

pub fn accept(&mut self, token: LlamaToken)

Accepts a token from the sampler, possibly updating the internal state of certain samplers (e.g. grammar, repetition, etc.)

pub fn accept_many( &mut self, tokens: impl IntoIterator<Item = impl Borrow<LlamaToken>>, )

Accepts several tokens from the sampler or context, possibly updating the internal state of certain samplers (e.g. grammar, repetition, etc.)

pub fn with_tokens( self, tokens: impl IntoIterator<Item = impl Borrow<LlamaToken>>, ) -> Self

Consuming variant of Self::accept_many: accepts several tokens, possibly updating the internal state of certain samplers (e.g. grammar, repetition, etc.), and returns the sampler for use in builder-style chains.

pub fn chain(samplers: impl IntoIterator<Item = Self>, no_perf: bool) -> Self

Combines a list of samplers into a single sampler that applies each component sampler one after another.

If you are using a chain to select a token, the chain should always end with one of LlamaSampler::greedy, LlamaSampler::dist, LlamaSampler::mirostat, and LlamaSampler::mirostat_v2.

§Panics

Panics if llama.cpp returns a null pointer.

pub fn chain_simple(samplers: impl IntoIterator<Item = Self>) -> Self

Same as Self::chain with no_perf = false.

§Panics

Panics if llama.cpp returns a null pointer.

§Example
use llama_cpp_4::token::{
   LlamaToken,
   data::LlamaTokenData,
   data_array::LlamaTokenDataArray
};
use llama_cpp_4::sampling::LlamaSampler;

let mut data_array = LlamaTokenDataArray::new(vec![
    LlamaTokenData::new(LlamaToken(0), 0., 0.),
    LlamaTokenData::new(LlamaToken(1), 1., 0.),
    LlamaTokenData::new(LlamaToken(2), 2., 0.),
], false);

data_array.apply_sampler(&mut LlamaSampler::chain_simple([
    LlamaSampler::temp(0.5),
    LlamaSampler::greedy(),
]));

assert_eq!(data_array.data[0].logit(), 0.);
assert_eq!(data_array.data[1].logit(), 2.);
assert_eq!(data_array.data[2].logit(), 4.);

assert_eq!(data_array.data.len(), 3);
assert_eq!(data_array.selected_token(), Some(LlamaToken(2)));

pub fn temp(t: f32) -> Self

Updates the logits l_i’ = l_i/t. When t <= 0.0, the maximum logit is kept at its original value, the rest are set to -inf.

§Panics

Panics if llama.cpp returns a null pointer.

§Example
use llama_cpp_4::token::{
   LlamaToken,
   data::LlamaTokenData,
   data_array::LlamaTokenDataArray
};
use llama_cpp_4::sampling::LlamaSampler;

let mut data_array = LlamaTokenDataArray::new(vec![
    LlamaTokenData::new(LlamaToken(0), 0., 0.),
    LlamaTokenData::new(LlamaToken(1), 1., 0.),
    LlamaTokenData::new(LlamaToken(2), 2., 0.),
], false);

data_array.apply_sampler(&mut LlamaSampler::temp(0.5));

assert_eq!(data_array.data[0].logit(), 0.);
assert_eq!(data_array.data[1].logit(), 2.);
assert_eq!(data_array.data[2].logit(), 4.);

pub fn temp_ext(t: f32, delta: f32, exponent: f32) -> Self

Dynamic temperature implementation (a.k.a. entropy) described in the paper https://arxiv.org/abs/2309.02772.

§Panics

Panics if llama.cpp returns a null pointer.
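The dynamic-temperature idea can be sketched independently of this crate: scale the temperature by the normalized entropy of the distribution, so a peaked (confident) distribution gets a temperature near t - delta and a flat one near t + delta. The formula below follows my reading of llama.cpp's entropy sampler and the paper; treat it as an illustrative approximation, not this crate's implementation.

```rust
// Softmax over raw logits, numerically stabilized by subtracting the max.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Pick an effective temperature in [t - delta, t + delta] based on the
// normalized entropy of the distribution (0 = fully peaked, 1 = uniform).
fn dyn_temp(logits: &[f32], t: f32, delta: f32, exponent: f32) -> f32 {
    if delta <= 0.0 {
        return t; // degenerate case: behaves like a plain temperature
    }
    let probs = softmax(logits);
    let entropy: f32 = probs
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -(p * p.ln()))
        .sum();
    let max_entropy = (logits.len() as f32).ln(); // entropy of the uniform case
    let norm = if max_entropy > 0.0 { entropy / max_entropy } else { 0.0 };
    let (min_t, max_t) = ((t - delta).max(0.0), t + delta);
    min_t + (max_t - min_t) * norm.powf(exponent)
}

fn main() {
    // A peaked distribution gets a temperature near t - delta,
    // a flat one near t + delta.
    println!("peaked: {:.3}", dyn_temp(&[10.0, 0.0, 0.0], 1.0, 0.5, 1.0));
    println!("flat:   {:.3}", dyn_temp(&[1.0, 1.0, 1.0], 1.0, 0.5, 1.0));
}
```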

pub fn top_k(k: i32) -> Self

Top-K sampling described in academic paper “The Curious Case of Neural Text Degeneration” https://arxiv.org/abs/1904.09751.

§Panics

Panics if llama.cpp returns a null pointer.

§Example
use llama_cpp_4::token::{
   LlamaToken,
   data::LlamaTokenData,
   data_array::LlamaTokenDataArray
};
use llama_cpp_4::sampling::LlamaSampler;

let mut data_array = LlamaTokenDataArray::new(vec![
    LlamaTokenData::new(LlamaToken(0), 0., 0.),
    LlamaTokenData::new(LlamaToken(1), 1., 0.),
    LlamaTokenData::new(LlamaToken(2), 2., 0.),
    LlamaTokenData::new(LlamaToken(3), 3., 0.),
], false);

data_array.apply_sampler(&mut LlamaSampler::top_k(2));

assert_eq!(data_array.data.len(), 2);
assert_eq!(data_array.data[0].id(), LlamaToken(3));
assert_eq!(data_array.data[1].id(), LlamaToken(2));

pub fn typical(p: f32, min_keep: usize) -> Self

Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.

§Panics

Panics if llama.cpp returns a null pointer.
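The core of locally typical sampling can be sketched in plain Rust: compute the distribution's entropy, rank tokens by how close their surprisal (-ln p) is to that entropy, and keep the most "typical" tokens until their cumulative probability reaches p. This is my paraphrase of the paper's procedure, not the crate's code.

```rust
// Return the indices of the tokens kept by a locally-typical filter.
fn typical_filter(logits: &[f32], p: f32, min_keep: usize) -> Vec<usize> {
    // softmax
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let probs: Vec<f32> = exps.iter().map(|e| e / sum).collect();

    // entropy of the full distribution
    let h: f32 = probs.iter().map(|&q| -(q * q.ln())).sum();

    // rank tokens by |surprisal - entropy|, smallest (most typical) first
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| {
        let da = (-probs[a].ln() - h).abs();
        let db = (-probs[b].ln() - h).abs();
        da.partial_cmp(&db).unwrap()
    });

    // keep typical tokens until cumulative probability reaches p
    let mut kept = Vec::new();
    let mut cum = 0.0;
    for &i in &idx {
        kept.push(i);
        cum += probs[i];
        if cum >= p && kept.len() >= min_keep {
            break;
        }
    }
    kept
}

fn main() {
    println!("{:?}", typical_filter(&[0.0, 0.0, 0.0, 0.0], 0.5, 1));
}
```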

pub fn top_p(p: f32, min_keep: usize) -> Self

Nucleus sampling described in academic paper “The Curious Case of Neural Text Degeneration” https://arxiv.org/abs/1904.09751.

§Panics

Panics if llama.cpp returns a null pointer.
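Nucleus sampling reduces to a simple filter, sketched here in plain Rust independent of the crate: sort tokens by probability and keep the smallest prefix whose cumulative probability reaches p, never dropping below min_keep tokens.

```rust
// Return the indices of the tokens kept by a top-p (nucleus) filter,
// ordered from most to least probable.
fn top_p_filter(logits: &[f32], p: f32, min_keep: usize) -> Vec<usize> {
    // softmax
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let probs: Vec<f32> = exps.iter().map(|e| e / sum).collect();

    // sort indices by probability, descending
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    // accumulate until we reach p (and have at least min_keep tokens)
    let mut kept = Vec::new();
    let mut cum = 0.0;
    for &i in &idx {
        kept.push(i);
        cum += probs[i];
        if cum >= p && kept.len() >= min_keep {
            break;
        }
    }
    kept
}

fn main() {
    // logits chosen so probabilities are 0.5, 0.25, 0.125, 0.125
    println!("{:?}", top_p_filter(&[1.3862944, 0.6931472, 0.0, 0.0], 0.7, 1));
}
```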

pub fn min_p(p: f32, min_keep: usize) -> Self

Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841.

§Panics

Panics if llama.cpp returns a null pointer.
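Min-p is simpler than top-p: the cutoff is relative to the most likely token. A plain-Rust sketch (illustrating the algorithm from the linked PR, not this crate's code): drop every token whose probability is below p times the maximum probability, keeping at least min_keep tokens.

```rust
// Return the indices of the tokens kept by a min-p filter,
// ordered from most to least probable.
fn min_p_filter(logits: &[f32], p: f32, min_keep: usize) -> Vec<usize> {
    // softmax
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let probs: Vec<f32> = exps.iter().map(|e| e / sum).collect();

    let p_max = probs.iter().cloned().fold(0.0_f32, f32::max);

    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    // keep tokens that clear the relative threshold p * p_max
    let kept: Vec<usize> = idx
        .iter()
        .cloned()
        .filter(|&i| probs[i] >= p * p_max)
        .collect();
    if kept.len() >= min_keep {
        kept
    } else {
        idx[..min_keep].to_vec() // fall back to the min_keep most likely tokens
    }
}

fn main() {
    // logits chosen so probabilities are 0.5, 0.25, 0.125, 0.125
    println!("{:?}", min_p_filter(&[1.3862944, 0.6931472, 0.0, 0.0], 0.3, 1));
}
```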

pub fn xtc(p: f32, t: f32, min_keep: usize, seed: u32) -> Self

XTC sampler as described in https://github.com/oobabooga/text-generation-webui/pull/6335.

§Panics

Panics if llama.cpp returns a null pointer.
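As I understand the XTC ("exclude top choices") proposal in the linked PR: with probability p, every token whose probability is at least the threshold t is removed except the least likely of them, which deliberately pushes sampling away from the most obvious continuations. The sketch below is my paraphrase; the `roll` parameter stands in for a uniform draw in [0, 1) that a real implementation would take from a seeded RNG.

```rust
// probs_desc must be sorted in descending order. Returns the surviving
// probabilities after one (possibly inactive) XTC step.
fn xtc_filter(probs_desc: &[f32], p: f32, t: f32, roll: f32) -> Vec<f32> {
    if roll >= p {
        return probs_desc.to_vec(); // sampler not activated this step
    }
    // count tokens at or above the threshold
    let above = probs_desc.iter().take_while(|&&q| q >= t).count();
    if above < 2 {
        probs_desc.to_vec() // fewer than two top choices: nothing to exclude
    } else {
        // drop all above-threshold tokens except the least likely of them
        probs_desc[above - 1..].to_vec()
    }
}

fn main() {
    // 0.5 and 0.3 both clear the 0.25 threshold; only 0.3 survives.
    println!("{:?}", xtc_filter(&[0.5, 0.3, 0.2], 1.0, 0.25, 0.0));
}
```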

pub fn grammar( model: &LlamaModel, grammar_str: &str, grammar_root: &str, ) -> Self

Grammar sampler

§Panics
  • If either of grammar_str or grammar_root contain null bytes.
  • If llama.cpp returns a null pointer.

pub fn dry( model: &LlamaModel, n_ctx_train: i32, multiplier: f32, base: f32, allowed_length: i32, penalty_last_n: i32, seq_breakers: impl IntoIterator<Item = impl AsRef<[u8]>>, ) -> Self

DRY sampler, designed by p-e-w, as described in: https://github.com/oobabooga/text-generation-webui/pull/5677, porting Koboldcpp implementation authored by pi6am: https://github.com/LostRuins/koboldcpp/pull/982

§Panics
  • If any string in seq_breakers contains null bytes.
  • If llama.cpp returns a null pointer.

pub fn penalties( n_vocab: i32, special_eos_id: i32, linefeed_id: i32, penalty_last_n: i32, penalty_repeat: f32, penalty_freq: f32, penalty_present: f32, penalize_nl: bool, ignore_eos: bool, ) -> Self

Penalizes tokens for being present in the context.

§Panics

Panics if llama.cpp returns a null pointer.
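The penalty arithmetic can be sketched independently of the crate. The combination below follows the classic CTRL-style repetition penalty (shrink positive logits, push negative ones further down) plus OpenAI-style frequency and presence penalties, which is my understanding of what llama.cpp applies; treat the exact formulas as illustrative.

```rust
use std::collections::HashMap;

// Penalize every token id that appears in `recent` (the penalty window).
fn apply_penalties(
    logits: &mut [f32],
    recent: &[usize],
    penalty_repeat: f32,
    penalty_freq: f32,
    penalty_present: f32,
) {
    // count occurrences of each recent token
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &tok in recent {
        *counts.entry(tok).or_insert(0) += 1;
    }
    for (&tok, &n) in &counts {
        let l = &mut logits[tok];
        // repetition penalty: divide positive logits, multiply negative ones
        *l = if *l > 0.0 { *l / penalty_repeat } else { *l * penalty_repeat };
        // frequency penalty scales with the count; presence penalty is flat
        *l -= n as f32 * penalty_freq + penalty_present;
    }
}

fn main() {
    let mut logits = [2.0, -1.0, 0.5];
    apply_penalties(&mut logits, &[0, 0], 2.0, 0.1, 0.2);
    println!("{:?}", logits); // token 0 is penalized, tokens 1 and 2 untouched
}
```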

pub fn penalties_simple(model: &LlamaModel, penalty_last_n: i32) -> Self

Same as Self::penalties, but with n_vocab, special_eos_id, and linefeed_id initialized from model, penalize_nl = false, and ignore_eos = true.

Parameters:

  • model: The model’s tokenizer to use to initialize the sampler.
  • penalty_last_n: last n tokens to penalize (0 = disable penalty, -1 = context size)
§Panics

Panics if llama.cpp returns a null pointer.

pub fn mirostat(n_vocab: i32, seed: u32, tau: f32, eta: f32, m: i32) -> Self

Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.

§Panics

Panics if llama.cpp returns a null pointer.

§Parameters
  • n_vocab: LlamaModel::n_vocab
  • seed: Seed to initialize random generation with.
  • tau: The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
  • eta: The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
  • m: The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
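Both Mirostat variants share the same feedback step: after a token is sampled, its observed surprise (-log2 of its probability) is compared to the target tau, and mu is nudged by the learning rate eta. A plain-Rust sketch of that step (my paraphrase of the paper, not this crate's code):

```rust
// One step of the Mirostat feedback loop: move mu toward keeping the
// observed surprise of sampled tokens close to the target tau.
fn mirostat_update(mu: f32, sampled_prob: f32, tau: f32, eta: f32) -> f32 {
    let surprise = -sampled_prob.log2(); // observed surprise of the sampled token
    mu - eta * (surprise - tau)          // error-proportional correction
}

fn main() {
    // A token with surprise exactly tau leaves mu unchanged; a more
    // surprising token lowers mu, tightening the next truncation.
    println!("{}", mirostat_update(10.0, 0.25, 2.0, 0.1));
    println!("{}", mirostat_update(10.0, 0.0625, 2.0, 0.1));
}
```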

pub fn mirostat_v2(seed: u32, tau: f32, eta: f32) -> Self

Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.

§Panics

Panics if llama.cpp returns a null pointer.

§Parameters
  • seed: Seed to initialize random generation with.
  • tau: The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
  • eta: The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.

pub fn dist(seed: u32) -> Self

Selects a token at random based on each token’s probabilities.

§Panics

Panics if llama.cpp returns a null pointer.
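Sampling from the final distribution is inverse-transform sampling: draw a uniform number and walk the cumulative probabilities. A self-contained sketch (a tiny LCG stands in for the seeded RNG a real implementation would use):

```rust
// Pick an index from a probability distribution by inverse-transform
// sampling. `probs` is assumed to sum to 1.
fn sample_dist(probs: &[f32], seed: u32) -> usize {
    // toy LCG in place of a real seeded RNG, for illustration only
    let r = (seed.wrapping_mul(1664525).wrapping_add(1013904223) >> 8) as f32
        / (1u32 << 24) as f32; // uniform in [0, 1)
    let mut cum = 0.0;
    for (i, &q) in probs.iter().enumerate() {
        cum += q;
        if r < cum {
            return i;
        }
    }
    probs.len() - 1 // guard against rounding shortfall
}

fn main() {
    println!("{}", sample_dist(&[0.1, 0.6, 0.3], 1234));
}
```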

pub fn greedy() -> Self

Selects the most likely token.

§Panics

Panics if llama.cpp returns a null pointer.

§Example
use llama_cpp_4::token::{
   LlamaToken,
   data::LlamaTokenData,
   data_array::LlamaTokenDataArray
};
use llama_cpp_4::sampling::LlamaSampler;

let mut data_array = LlamaTokenDataArray::new(vec![
    LlamaTokenData::new(LlamaToken(0), 0., 0.),
    LlamaTokenData::new(LlamaToken(1), 1., 0.),
], false);

data_array.apply_sampler(&mut LlamaSampler::greedy());

assert_eq!(data_array.data.len(), 2);
assert_eq!(data_array.selected_token(), Some(LlamaToken(1)));

pub fn top_n_sigma(n: f32) -> Self

Top-N sigma sampling.

Keeps tokens within N standard deviations of the maximum logit.

§Panics

Panics if llama.cpp returns a null pointer.
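One plausible reading of this filter, sketched in plain Rust (illustrative, not the crate's code): compute the standard deviation of all logits and keep only tokens whose logit is within n standard deviations of the maximum.

```rust
// Return the indices of tokens whose logit is at least max - n * stddev.
// Uses the population standard deviation over all logits; the exact
// statistic llama.cpp uses may differ.
fn top_n_sigma_filter(logits: &[f32], n: f32) -> Vec<usize> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mean = logits.iter().sum::<f32>() / logits.len() as f32;
    let var = logits.iter().map(|l| (l - mean).powi(2)).sum::<f32>()
        / logits.len() as f32;
    let std = var.sqrt();
    let threshold = max - n * std;
    (0..logits.len()).filter(|&i| logits[i] >= threshold).collect()
}

fn main() {
    // With one dominant logit, only that token survives a 1-sigma cut.
    println!("{:?}", top_n_sigma_filter(&[0.0, 1.0, 10.0], 1.0));
}
```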

pub fn adaptive_p(target: f32, decay: f32, seed: u32) -> Self

Adaptive P sampling.

§Panics

Panics if llama.cpp returns a null pointer.

§Parameters
  • target: Target probability.
  • decay: Decay rate.
  • seed: Random seed.

pub fn logit_bias(n_vocab: i32, biases: &[(LlamaToken, f32)]) -> Self

Logit bias sampler.

Applies additive bias to specific token logits before sampling.

§Panics

Panics if llama.cpp returns a null pointer.

§Parameters
  • n_vocab: Number of tokens in the vocabulary (LlamaModel::n_vocab).
  • biases: Slice of (token_id, bias) pairs.
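The underlying operation is a straightforward additive pass over the logits, sketched here in plain Rust (token ids as `usize` indices for simplicity): a large negative bias effectively bans a token, a large positive one forces it.

```rust
// Add each bias to the corresponding token's logit before any other
// sampling step. Out-of-range token ids are ignored in this sketch.
fn apply_logit_bias(logits: &mut [f32], biases: &[(usize, f32)]) {
    for &(token, bias) in biases {
        if let Some(l) = logits.get_mut(token) {
            *l += bias;
        }
    }
}

fn main() {
    let mut logits = [0.0, 1.0, 2.0];
    // strongly discourage token 1, mildly boost token 2
    apply_logit_bias(&mut logits, &[(1, -100.0), (2, 0.5)]);
    println!("{:?}", logits);
}
```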

pub fn infill(model: &LlamaModel) -> Self

Infill sampler.

Reorders token probabilities for fill-in-the-middle tasks.

§Panics

Panics if llama.cpp returns a null pointer.

pub fn get_seed(&self) -> u32

Get the seed of the sampler.

Returns LLAMA_DEFAULT_SEED if the sampler is not seeded.

pub fn name(&self) -> String

Get the name of the sampler.

§Panics

Panics if the name is not valid UTF-8.

pub fn reset(&mut self)

Reset the sampler state (e.g. grammar, repetition penalties).

pub fn chain_n(&self) -> i32

Get the number of samplers in a chain.

Returns 0 if this sampler is not a chain.

pub fn chain_remove(&mut self, i: i32) -> Self

Remove and return the sampler at position i from a chain.

The returned sampler is owned by the caller and will be freed on drop.

§Panics

Panics if i is out of range or if llama.cpp returns a null pointer.

pub fn grammar_lazy( model: &LlamaModel, grammar_str: &str, grammar_root: &str, trigger_words: &[&str], trigger_tokens: &[LlamaToken], ) -> Self

Grammar sampler with lazy activation.

The grammar is only activated when one of the trigger words or trigger tokens is encountered.

§Panics
  • If grammar_str or grammar_root contain null bytes.
  • If any trigger word contains null bytes.
  • If llama.cpp returns a null pointer.

pub fn grammar_lazy_patterns( model: &LlamaModel, grammar_str: &str, grammar_root: &str, trigger_patterns: &[&str], trigger_tokens: &[LlamaToken], ) -> Self

Grammar sampler with lazy activation via regex patterns.

The grammar is only activated when one of the trigger patterns or trigger tokens matches.

§Panics
  • If grammar_str or grammar_root contain null bytes.
  • If any trigger pattern contains null bytes.
  • If llama.cpp returns a null pointer.

pub fn clone_sampler(&self) -> Self

Clone this sampler.

Creates an independent copy of this sampler with the same state.

§Panics

Panics if llama.cpp returns a null pointer.

pub fn perf_print(&self)

Print sampler performance data.

pub fn perf_reset(&mut self)

Reset sampler performance counters.

pub fn perf_data(&self) -> llama_perf_sampler_data

Get sampler performance data.

pub unsafe fn chain_get_ptr(&self, i: i32) -> *mut llama_sampler

Get a non-owning reference to the i-th sampler in a chain.

§Safety

The returned pointer is owned by the chain. Do not free it or use it after the chain is dropped or modified.

pub unsafe fn from_raw( iface: *mut llama_sampler_i, ctx: llama_sampler_context_t, ) -> Self

Create a sampler from a raw interface and context.

§Safety

The caller must ensure that iface and ctx are valid and that the interface functions properly manage the context lifetime.

§Panics

Panics if llama.cpp returns a null pointer.

pub fn common() -> Self

Creates a new instance of LlamaSampler with common sampling parameters.

This function initializes a LlamaSampler using default values from common_sampler_params and configures it with common settings such as top_k, top_p, temperature, and seed values.

§Panics

Panics if llama.cpp returns a null pointer.

§Returns

A LlamaSampler instance configured with the common sampling parameters.

Trait Implementations§

impl Debug for LlamaSampler

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for LlamaSampler

fn default() -> Self

Returns the “default value” for a type.

impl Drop for LlamaSampler

fn drop(&mut self)

Executes the destructor for this type.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.
impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.
impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.
impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self). That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.