llm-samplers
Token samplers for large language models, written in Rust!
Status
This crate is extremely early in development and poorly tested. See src/tests.rs for some usage examples.
There is also a fairly simple example of using Mirostat in my RWKV project: https://github.com/KerfuffleV2/smolrsrwkv/blob/60b8e8bfe64f157f1800445128af3b4adbbc64c1/smolrwkv-cli/src/main.rs#L139-L164
For notes on migrating from 0.0.6 to 0.0.7, see below.
Samplers
The term "sampler" is used loosely here; perhaps it should be renamed in the future. Right now a "sampler" could be something that manipulates the list of logits (for example, a top-k sampler might prune the list to the top K entries), something that actually picks a token, or both!
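To make the two roles concrete, here is a minimal std-only sketch. It is not this crate's API: the Candidate struct, the SketchSampler trait, SketchTopK, SketchGreedy, and candidates_from are all hypothetical names invented for illustration. It only shows how one stage can prune the list without choosing anything, while another chooses a token without changing the list.

```rust
/// A candidate token: its ID and its current logit.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Candidate {
    pub token_id: u32,
    pub logit: f32,
}

/// Hypothetical trait for illustration only; NOT this crate's API.
/// A sampler may edit the candidate list, pick a token, or both.
pub trait SketchSampler {
    fn apply(&mut self, candidates: &mut Vec<Candidate>) -> Option<u32>;
}

/// Filtering sampler: keeps only the `k` highest logits and picks nothing.
pub struct SketchTopK(pub usize);

impl SketchSampler for SketchTopK {
    fn apply(&mut self, candidates: &mut Vec<Candidate>) -> Option<u32> {
        // Sort descending by logit, then drop everything past the first k.
        candidates.sort_by(|a, b| b.logit.total_cmp(&a.logit));
        candidates.truncate(self.0);
        None
    }
}

/// Selecting sampler: leaves the list alone and returns the best token.
pub struct SketchGreedy;

impl SketchSampler for SketchGreedy {
    fn apply(&mut self, candidates: &mut Vec<Candidate>) -> Option<u32> {
        candidates
            .iter()
            .max_by(|a, b| a.logit.total_cmp(&b.logit))
            .map(|c| c.token_id)
    }
}

/// Builds a candidate list where each token ID is the index of its logit.
pub fn candidates_from(logits: &[f32]) -> Vec<Candidate> {
    logits
        .iter()
        .enumerate()
        .map(|(i, &logit)| Candidate { token_id: i as u32, logit })
        .collect()
}
```

Applying SketchTopK(2) and then SketchGreedy to the same list prunes first and picks second, which is the same division of labour described above.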
- Flat bias - biases tokens by the specified amount
- Frequency / presence - applies frequency and presence penalties
- Greedy - picks the token ID with the highest probability
- Locally typical
- Mirostat V1
- Mirostat V2
- Random distribution - picks a token ID based on weighted probabilities
- Repetition - applies a repetition penalty
- Tail free
- Temperature
- Top-K
- Top-P
- Min-P
- Top-A
Real descriptions may (or may not) happen eventually. For now, you can check out the llama.cpp main example README for a brief overview of some of the types of sampler: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md#generation-flags
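Until real descriptions exist, the following std-only sketch gives a rough idea of what two of the listed filters do, based on their commonly used definitions. It is an assumption-laden illustration rather than this crate's implementation: the function names softmax, top_p, and min_p are hypothetical, and the crate's own samplers may differ in details such as tie handling or a minimum number of tokens to keep.

```rust
/// Converts logits to probabilities with a numerically stable softmax.
pub fn softmax(logits: &[f32]) -> Vec<f32> {
    // Subtracting the maximum keeps exp() from overflowing.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Top-P (nucleus) filter: keeps the smallest set of most-probable tokens
/// whose cumulative probability reaches `p`. Returns the kept token IDs,
/// most probable first. Token IDs are assumed to be indices into `probs`.
pub fn top_p(probs: &[f32], p: f32) -> Vec<u32> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    let mut kept = Vec::new();
    let mut cumulative = 0.0f32;
    for i in order {
        kept.push(i as u32);
        cumulative += probs[i];
        if cumulative >= p {
            break;
        }
    }
    kept
}

/// Min-P filter: keeps tokens whose probability is at least `min_p` times
/// the probability of the most likely token. Returns the kept token IDs.
pub fn min_p(probs: &[f32], min_p: f32) -> Vec<u32> {
    let max = probs.iter().copied().fold(0.0f32, f32::max);
    let threshold = max * min_p;
    probs
        .iter()
        .enumerate()
        .filter(|&(_, &pr)| pr >= threshold)
        .map(|(i, _)| i as u32)
        .collect()
}
```

Both filters work on probabilities, so in this sketch softmax would run first; what is left afterwards would then be handed to a sampler that actually picks a token.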
Example
You probably won't usually want to use individual Samplers. The most typical
use case is going to be chaining a number of samplers together.
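As a rough picture of what chaining means, here is a minimal std-only sketch. It deliberately does not use this crate's SamplerChain: the Logits alias, the Stage type, and the functions flat_bias, temperature_stage, and run_chain are hypothetical names made up for this illustration. Each stage rewrites the logits in place, and a final greedy step picks the token.

```rust
/// (token ID, logit) pairs, using u32 IDs and f32 logits.
pub type Logits = Vec<(u32, f32)>;

/// A hypothetical chain stage: any function that rewrites the logits in place.
pub type Stage = Box<dyn Fn(&mut Logits)>;

/// Adds `bias` to the logit of `token_id`, if that token is present.
pub fn flat_bias(token_id: u32, bias: f32) -> Stage {
    Box::new(move |logits: &mut Logits| {
        for (id, logit) in logits.iter_mut() {
            if *id == token_id {
                *logit += bias;
            }
        }
    })
}

/// Divides every logit by `t` (assumed to be greater than zero).
pub fn temperature_stage(t: f32) -> Stage {
    Box::new(move |logits: &mut Logits| {
        for (_, logit) in logits.iter_mut() {
            *logit /= t;
        }
    })
}

/// Runs every stage in order, then greedily picks the highest remaining logit.
/// Returns None if there are no logits at all.
pub fn run_chain(stages: &[Stage], mut logits: Logits) -> Option<u32> {
    for stage in stages {
        stage(&mut logits);
    }
    logits
        .iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|&(id, _)| id)
}
```

With logits 0.1, 0.2, 0.3, 0.4 for tokens 0 through 3, a chain of flat_bias(3, f32::NEG_INFINITY) followed by temperature_stage(0.8) bans token 3, so the greedy step falls back to token 2.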
A simple example of constructing a [SamplerChain]:
use Result;
use *;
The previous example is simple but not very realistic: the greedy sampler doesn't even care about temperature. Now let's look at something a bit more complicated:
use Result;
use ;
use *;
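The remark that the greedy sampler does not care about temperature can be checked with a few lines of std-only Rust. This is a sketch under the usual assumption that temperature divides every logit by the same positive value; argmax and scale are hypothetical helper names, not functions from this crate.

```rust
/// Index of the largest logit, or None for an empty slice.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// Temperature scaling: divides each logit by `temperature` (must be > 0).
pub fn scale(logits: &[f32], temperature: f32) -> Vec<f32> {
    logits.iter().map(|&l| l / temperature).collect()
}
```

Dividing by a positive number preserves the ordering of the logits, so argmax(&scale(&logits, t)) matches argmax(&logits) for any reasonable t > 0. Temperature only matters once a later stage samples from the resulting distribution instead of taking the maximum.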
0.0.6 to 0.0.7 Migration
Unfortunately, this involved some breaking changes. Basically, the samplers and chains no
longer take token ID and logits type parameters. You can have your token IDs in any
color you like, as long as it's u32. Same for logits: they're always f32 now.
For example, where previously you would have written SampleRandDistrib::<u32>::new or SampleMirostat2::<u32, f32>::new,
you now only need SampleRandDistrib::new or SampleMirostat2::new. The same goes for creating chains: SamplerChain::<u32, f32>::new
becomes SamplerChain::new.
Links
Note: Crate/docs version likely won't match this repo.
Credits
The initial version closely follows the samplers in the llama.cpp project (although it is not a line-by-line port). Thanks!