Enum SamplingMode

pub enum SamplingMode {
    Greedy,
    TopP {
        p: Probability<f64>,
        min_keep: NonZeroUsize,
    },
    TopK {
        k: NonZeroUsize,
    },
    MinP {
        p: Probability<f32>,
        min_keep: NonZeroUsize,
    },
    TailFree {
        z: Probability<f32>,
        min_keep: NonZeroUsize,
    },
    LocallyTypical {
        p: Probability<f32>,
        min_keep: NonZeroUsize,
    },
    Mirostat {
        tau: f32,
        eta: f32,
        max_keep: Option<NonZeroUsize>,
    },
    MirostatV2 {
        tau: f32,
        eta: f32,
        max_keep: Option<NonZeroUsize>,
    },
    SplitP {
        min_keep: NonZeroUsize,
        max_keep: Option<NonZeroUsize>,
    },
    SplitL {
        min_keep: NonZeroUsize,
        max_keep: Option<NonZeroUsize>,
    },
}

Variants§

§

Greedy

Greedy sampling. The most likely next token is always chosen, so the output is deterministic for a given prompt. Not very useful unless you want to regurgitate the training data.
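
As a rough illustration of what "always choose the most likely token" means, the sketch below picks the index of the largest logit. `greedy_pick` is a hypothetical helper written for this example and is not part of this crate's API; it assumes the logits arrive as a plain `&[f32]`.

```rust
/// Return the index of the largest logit, or `None` for an empty slice.
/// Hypothetical helper for illustration only.
fn greedy_pick(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        // `total_cmp` is a total order on floats, so the comparison never panics.
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index)
}
```

Calling `greedy_pick(&[0.1, 2.5, -1.0])` selects index 1, and the same input always yields the same choice.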

§

TopP

Top-p (nucleus) sampling. A token is chosen from the smallest set of most likely tokens whose cumulative probability is greater than or equal to p.

Fields

§p: Probability<f64>

Reasonable values are between 0.9 and 0.95. Higher values give more diverse, but potentially less coherent, output.

§min_keep: NonZeroUsize

Minimum number of candidates to keep per token.
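
The interaction between p and min_keep can be sketched as follows. This is a minimal illustration, assuming the probabilities are already normalized and sorted in descending order; `top_p_cutoff` is a hypothetical name, and the crate's actual implementation may differ.

```rust
/// Number of candidates to keep: the shortest prefix whose cumulative
/// probability reaches `p`, but never fewer than `min_keep`.
/// Assumes `sorted_probs` is sorted in descending order (illustrative helper).
fn top_p_cutoff(sorted_probs: &[f64], p: f64, min_keep: usize) -> usize {
    let mut cumulative = 0.0_f64;
    // If the threshold is never reached, keep everything.
    let mut keep = sorted_probs.len();
    for (i, &prob) in sorted_probs.iter().enumerate() {
        cumulative += prob;
        if cumulative >= p && i + 1 >= min_keep {
            keep = i + 1;
            break;
        }
    }
    keep
}
```

With probabilities `[0.5, 0.3, 0.15, 0.05]` and p = 0.9, three candidates are kept; raising min_keep to 4 keeps all four regardless of p.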

§

TopK

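Top-k sampling. A token is chosen from the k most likely tokens. This is not very good, because a fixed k ignores how the probability mass is actually distributed.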

Fields

§k: NonZeroUsize

The top k tokens are kept. Reasonable values are between 30 and 40.
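
A minimal sketch of the truncation step, assuming raw logits in a `&[f32]`. `top_k` is a hypothetical helper for illustration, not this crate's API.

```rust
/// Return the `k` highest-scoring candidates as `(token_index, logit)` pairs,
/// ordered from most to least likely. Illustrative helper only.
fn top_k(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut candidates: Vec<(usize, f32)> =
        logits.iter().copied().enumerate().collect();
    // Stable sort, descending by logit.
    candidates.sort_by(|a, b| b.1.total_cmp(&a.1));
    candidates.truncate(k);
    candidates
}
```

The kept set has the same size whether the model is confident or uncertain, which is the main weakness of this mode.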

§

MinP

Min-p sampling. p sets the minimum probability required to keep a token; tokens below that threshold are cut off. The threshold is p scaled by the top token’s probability, which balances diversity and quality.

It is described in detail in the following pull request: https://github.com/ggerganov/llama.cpp/pull/3841

Fields

§p: Probability<f32>

The minimum probability to keep a token. This is scaled by the top token’s probability. Reasonable values are 0.05 to 0.3. Higher means less diversity.

§min_keep: NonZeroUsize

Minimum number of candidates to keep per token.
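
The scaling by the top token's probability can be sketched like this. It is an illustration under the assumption that probabilities are normalized and sorted in descending order; `min_p_cutoff` is a hypothetical name.

```rust
/// Number of candidates whose probability is at least `p * top_probability`,
/// but never fewer than `min_keep` (capped at the number available).
/// Assumes `sorted_probs` is sorted in descending order (illustrative helper).
fn min_p_cutoff(sorted_probs: &[f32], p: f32, min_keep: usize) -> usize {
    let Some(&top) = sorted_probs.first() else {
        return 0;
    };
    // The threshold adapts to how confident the model is.
    let threshold = p * top;
    let passing = sorted_probs
        .iter()
        .take_while(|&&prob| prob >= threshold)
        .count();
    passing.max(min_keep).min(sorted_probs.len())
}
```

For `[0.6, 0.2, 0.1, 0.05, 0.05]` with p = 0.1 the threshold is about 0.06, so three candidates survive. When the top probability is lower, the threshold drops with it and more candidates pass.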

§

TailFree

Tail free sampling.

“TFS first converts logits output by a model into probabilities using the softmax function before sorting them in descending order. It then calculates the first and second derivatives. As the tokens are discrete, this can be found with subtraction. The magnitude of each second derivative is then taken and normalized so that they sum to 1. Finally, a threshold z is used to determine what part of the cumulative distribution of the second derivative weights to define the ‘tail’ of the distribution to be at.”

https://www.trentonbricken.com/Tail-Free-Sampling/

Fields

§z: Probability<f32>

Reasonable values are between 0.25 and 0.75. The higher, the more diverse the output, but also potentially less coherent.

§min_keep: NonZeroUsize

Minimum number of candidates to keep per token.
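
The quoted steps can be sketched roughly as below. This is one possible reading, assuming normalized probabilities sorted in descending order; the exact off-by-one convention for where the tail starts is an assumption of this sketch, and `tail_free_cutoff` is a hypothetical name.

```rust
/// Number of candidates to keep under tail free sampling.
/// Assumes `sorted_probs` is sorted in descending order (illustrative helper).
fn tail_free_cutoff(sorted_probs: &[f32], z: f32, min_keep: usize) -> usize {
    let n = sorted_probs.len();
    if n < 3 {
        return n; // Not enough points for a second derivative.
    }
    // Discrete first derivative: differences between neighbours.
    let first: Vec<f32> = sorted_probs.windows(2).map(|w| w[0] - w[1]).collect();
    // Magnitude of the discrete second derivative.
    let second: Vec<f32> = first.windows(2).map(|w| (w[0] - w[1]).abs()).collect();
    let total: f32 = second.iter().sum();
    if total <= 0.0 {
        return n; // Flat curvature: nothing identifies a tail.
    }
    let mut cumulative = 0.0_f32;
    let mut cutoff = n;
    for (i, weight) in second.iter().enumerate() {
        // Normalize so that the weights sum to 1.
        cumulative += weight / total;
        if cumulative > z {
            cutoff = i; // Assumption: the tail starts at this index.
            break;
        }
    }
    cutoff.max(min_keep.max(1)).min(n)
}
```

For `[0.5, 0.3, 0.1, 0.05, 0.05]` the normalized second-derivative weights are roughly `[0, 0.75, 0.25]`, so z = 0.9 keeps two candidates and z = 0.5 keeps one.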

§

LocallyTypical

Locally typical sampling.

“First, we compute the conditional entropy, which is an O(|V|) operation. Second, we sort words by their absolute distance from $H(\hat{p}(\cdot \mid Y_{<t} = y_{<t}))$, which can be done in O(|V| log |V|) time with standard sorting algorithms. Finally, we greedily take words from this list until their cumulative probability exceeds the threshold p, which again takes O(|V|) time. Thus, creating our altered distribution has time complexity O(|V| log |V|).”

https://arxiv.org/pdf/2202.00666.pdf

Fields

§p: Probability<f32>

Probability. Reasonable values are between 0.2 and 0.95. For story generation, lower is better. For summarization, higher is better.

§min_keep: NonZeroUsize

Minimum number of candidates to keep per token.
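
The three quoted steps might look like this in code. It is a sketch under the assumption that the input is a normalized probability distribution; `locally_typical` is a hypothetical helper, and entropy is measured in nats here.

```rust
/// Return the token indices kept by locally typical sampling, ordered from
/// most to least typical. Illustrative helper only.
fn locally_typical(probs: &[f32], p: f32, min_keep: usize) -> Vec<usize> {
    // Step 1: entropy of the distribution (zero-probability tokens add nothing).
    let entropy: f32 = probs
        .iter()
        .filter(|&&q| q > 0.0)
        .map(|&q| -q * q.ln())
        .sum();
    // Step 2: sort tokens by |surprisal - entropy|, smallest distance first.
    let mut order: Vec<usize> = (0..probs.len()).filter(|&i| probs[i] > 0.0).collect();
    order.sort_by(|&a, &b| {
        let da = (-probs[a].ln() - entropy).abs();
        let db = (-probs[b].ln() - entropy).abs();
        da.total_cmp(&db)
    });
    // Step 3: take tokens until the cumulative probability exceeds p.
    let mut cumulative = 0.0_f32;
    let mut keep = order.len();
    for (rank, &i) in order.iter().enumerate() {
        cumulative += probs[i];
        if cumulative > p && rank + 1 >= min_keep {
            keep = rank + 1;
            break;
        }
    }
    order.truncate(keep);
    order
}
```

For `[0.5, 0.25, 0.125, 0.125]` the token with probability 0.25 is the most typical one, so it is kept before the most likely token.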

§

Mirostat

Mirostat sampling.

“a neural text decoding algorithm that directly controls the perplexity of the generated text over a wide range of text length. Notably, for longer texts and certain ranges of input parameters, top-k and top-p sampling fall into boredom and confusion traps which cause low-quality texts; Mirostat avoids both traps.”

https://arxiv.org/pdf/2007.14966.pdf

Fields

§tau: f32

Tau. Target entropy. A good value is 3.0 according to this paper: https://arxiv.org/pdf/2202.00666.pdf

llama.cpp uses a default of 5.0.

§eta: f32

Eta. Learning rate. A good value is 0.1.

§max_keep: Option<NonZeroUsize>

Maximum number of candidates to keep. In the original paper and code the default is 100 and the name is m.
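
How tau and eta interact can be shown with the feedback step alone. The sketch below assumes the usual Mirostat formulation, in which a running threshold (called `mu` here, commonly initialized to `2 * tau`) is nudged after every sampled token. `mirostat_update` is a hypothetical helper, not this crate's API.

```rust
/// One Mirostat feedback step: move `mu` against the error between the
/// observed surprise of the sampled token (in bits) and the target `tau`.
/// Illustrative helper only.
fn mirostat_update(mu: f32, tau: f32, eta: f32, sampled_prob: f32) -> f32 {
    let observed_surprise = -sampled_prob.log2();
    let error = observed_surprise - tau;
    // `eta` is the learning rate: larger values react faster but oscillate more.
    mu - eta * error
}
```

With tau = 5.0 and eta = 0.1, sampling a token of probability 0.25 (2 bits of surprise) raises `mu` from 10.0 to about 10.3, allowing more surprising tokens on the next step.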

§

MirostatV2

Mirostat V.2 sampling.

“Here we provide an alternate algorithm for perplexity control, Alg. 2, which does not depend on the distribution of the underlying LM. In this sense, Alg. 2 controls perplexity in more general sequential generative models than Alg. 1 where the underlying distribution may not be Zipfian. In our work, we choose Alg. 1 since it has only an additional constant time complexity compared to top-k sampling. Whereas Alg. 2 has additional time complexity that depends on target cross-entropy rate and vocabulary size, which may vary with different LMs.”

§Note:

The bit about time complexity is not relevant to this implementation since we truncate the candidates to a fixed size like v1.

https://arxiv.org/pdf/2007.14966.pdf

Fields

§tau: f32

Tau. Target entropy. A good value is 3.0 according to the paper and HF’s experiments in https://arxiv.org/pdf/2202.00666.pdf

llama.cpp uses a default of 5.0.

§eta: f32

Eta. Learning rate. A good value is 0.1.

§max_keep: Option<NonZeroUsize>

Maximum number of candidates to keep. Defaults to 100. The original implementation does not support this. If identical behavior is desired, set this to the vocabulary size.
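
A rough sketch of the truncation that distinguishes V.2, including the fixed-size cap described above. It assumes normalized probabilities sorted in descending order and a running surprise threshold `mu` measured in bits; `mirostat_v2_truncate` is a hypothetical name and the crate's real implementation may differ.

```rust
/// Keep candidates whose surprise (`-log2(p)`) does not exceed `mu`, after
/// first capping the list at `max_keep`, then renormalize what is left.
/// Always keeps the top candidate. Illustrative helper only.
fn mirostat_v2_truncate(sorted_probs: &[f32], mu: f32, max_keep: Option<usize>) -> Vec<f32> {
    let limit = max_keep.unwrap_or(sorted_probs.len());
    let mut kept: Vec<f32> = sorted_probs
        .iter()
        .copied()
        .take(limit)
        .enumerate()
        .take_while(|&(i, prob)| i == 0 || -prob.log2() <= mu)
        .map(|(_, prob)| prob)
        .collect();
    // Renormalize so the surviving candidates sum to 1.
    let total: f32 = kept.iter().sum();
    if total > 0.0 {
        for prob in &mut kept {
            *prob /= total;
        }
    }
    kept
}
```

Passing `None` for `max_keep` in this sketch corresponds to considering the whole vocabulary.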

§

SplitP

Split P sampling. This cuts the tail off where the difference between adjacent probabilities is greatest, that is, where the slope of the sorted distribution is steepest.

Fields

§min_keep: NonZeroUsize

Minimum number of candidates to keep.

§max_keep: Option<NonZeroUsize>

Maximum number of candidates to keep.

§

SplitL

Split L sampling. This cuts the tail off where the difference between adjacent logits is greatest, that is, where the slope of the sorted logits is steepest.

Fields

§min_keep: NonZeroUsize

Minimum number of candidates to keep.

§max_keep: Option<NonZeroUsize>

Maximum number of candidates to keep.
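
Both split modes describe the same cut applied to different inputs, so one sketch can cover them. The helper below is one plausible reading, assuming the values (probabilities for SplitP, logits for SplitL) are sorted in descending order and that the cut must fall between min_keep and max_keep. `split_cutoff` is a hypothetical name, not this crate's API.

```rust
/// Number of candidates to keep: the cut is placed at the largest drop
/// between adjacent values, restricted to `min_keep..=max_keep`.
/// Assumes `sorted_values` is sorted in descending order (illustrative helper).
fn split_cutoff(sorted_values: &[f32], min_keep: usize, max_keep: Option<usize>) -> usize {
    let n = sorted_values.len();
    let upper = max_keep.unwrap_or(n).max(1).min(n);
    let lower = min_keep.max(1).min(upper);
    let mut best = upper;
    let mut best_gap = f32::NEG_INFINITY;
    for keep in lower..=upper {
        if keep >= n {
            break; // No value after the last one, so no gap to measure.
        }
        // Drop between the last kept value and the first discarded one.
        let gap = sorted_values[keep - 1] - sorted_values[keep];
        if gap > best_gap {
            best_gap = gap;
            best = keep;
        }
    }
    best
}
```

For probabilities `[0.4, 0.35, 0.1, 0.08, 0.07]` the steepest drop follows the second value, so two candidates are kept. The same function applied to logits `[5.0, 1.0, 0.5, 0.0]` keeps one.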

Implementations§

impl SamplingMode

pub const ALL: [Self; 10]

pub const fn name(&self) -> &'static str

pub const fn help(&self) -> &'static str

pub const fn top_p() -> Self

pub const fn top_k() -> Self

pub const fn min_p() -> Self

pub const fn tail_free() -> Self

pub const fn locally_typical() -> Self

pub const fn mirostat() -> Self

pub const fn mirostat_v2() -> Self

pub const fn split_p() -> Self

pub const fn split_l() -> Self

pub fn draw_inner(&mut self, ui: &mut Ui) -> Response

pub fn draw(&mut self, ui: &mut Ui, index: usize) -> Response

Trait Implementations§

impl Clone for SamplingMode

fn clone(&self) -> SamplingMode

fn clone_from(&mut self, source: &Self)

impl Debug for SamplingMode

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for SamplingMode

fn default() -> Self

impl<'de> Deserialize<'de> for SamplingMode

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

impl PartialEq for SamplingMode

fn eq(&self, other: &SamplingMode) -> bool

fn ne(&self, other: &Rhs) -> bool

impl Serialize for SamplingMode

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

impl StructuralPartialEq for SamplingMode

Auto Trait Implementations§

impl Freeze for SamplingMode

impl RefUnwindSafe for SamplingMode

impl Send for SamplingMode

impl Sync for SamplingMode

impl Unpin for SamplingMode

impl UnsafeUnpin for SamplingMode

impl UnwindSafe for SamplingMode

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> IntoCollection<T> for T

fn into_collection<A>(self) -> SmallVec<A>
where A: Array<Item = T>,
