§RLlama
RLlama is a Rust implementation of the quantized Llama 7B language model.
Llama 7B is a relatively small but performant language model that can easily be run on your local machine.
This library uses the Candle ML framework to run Llama.
§Usage
use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() {
    // Download (if necessary) and load the default quantized Llama model.
    let mut model = Llama::new().await.unwrap();
    let prompt = "The capital of France is ";
    // Calling the model like a function starts streaming a completion.
    let mut stream = model(prompt);
    print!("{prompt}");
    // Print each token as it is generated.
    while let Some(token) = stream.next().await {
        print!("{token}");
    }
}
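For chat-tuned models, the crate also exposes a chat interface (see LlamaChatSession under Structs below). The following is a minimal sketch, assuming Llama::new_chat, the chat() extension method, and with_system_prompt are re-exported through the prelude; the exact method names may differ between versions.

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() {
    // `new_chat` loads a chat-tuned model rather than a base completion model.
    let model = Llama::new_chat().await.unwrap();
    // Assumption: `chat()` and `with_system_prompt` are available via the prelude.
    let mut chat = model.chat().with_system_prompt("You are a helpful assistant.");
    // Like the model itself, the chat session can be called to stream a reply.
    let mut stream = chat("What is the capital of France?");
    while let Some(token) = stream.next().await {
        print!("{token}");
    }
}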
Modules§
- prelude: A prelude of commonly used items in kalosm-llama.
Structs§
- AttentionMask
- Cache
- KvCache: A growable kv cache. This cache wraps candle's KvCache with exponentially larger allocations as the sequence length increases.
- Llama: A quantized Llama language model with support for streaming generation.
- LlamaBuilder: A builder with configuration for a Llama model (see the sketch after this list).
- LlamaCache: A cache for Llama inference. This cache significantly speeds up generation of sequential text.
- LlamaChatSession: A Llama chat session.
- LlamaSession: A Llama session with cached state for the currently fed prompt.
- LlamaSource: A source for the Llama model.
- MaskCache
- TensorCache: A growable tensor cache. This cache wraps candle's [Cache] with exponentially larger allocations as the sequence length increases.
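LlamaBuilder and LlamaSource together control which model weights are loaded. A minimal sketch of configuring a model through the builder; LlamaSource::mistral_7b() is assumed here as one of the bundled source constructors, and the available sources vary by version.

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() {
    // Configure the model explicitly instead of using Llama::new().
    // Assumption: `mistral_7b` is one of the bundled LlamaSource constructors.
    let mut model = Llama::builder()
        .with_source(LlamaSource::mistral_7b())
        .build()
        .await
        .unwrap();
    let mut stream = model("The capital of France is ");
    while let Some(token) = stream.next().await {
        print!("{token}");
    }
}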
Enums§
- CacheError
- LlamaSourceError: Errors that can occur when loading the Llama model.
Functions§
- accelerated_device_if_available: Create a candle device that uses any available accelerator (used in the sketch below).
- copy_tensor_into_vec: Clear a Vec<T> and copy the contents of a tensor into it.
- maybe_autoreleasepool: Wrap a closure in a release pool if the metal feature is enabled.
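A minimal sketch of how the device and tensor helpers fit together. The signature of copy_tensor_into_vec is an assumption (a tensor reference plus a &mut Vec<T>, returning a candle Result), as is the direct dependency on candle_core; neither is confirmed by this page.

use candle_core::{DType, Tensor};
use kalosm_llama::{accelerated_device_if_available, copy_tensor_into_vec};

fn main() -> candle_core::Result<()> {
    // Falls back to the CPU when no Metal/CUDA accelerator is compiled in.
    let device = accelerated_device_if_available()?;
    let tensor = Tensor::ones((8,), DType::F32, &device)?;
    // Assumed signature: the Vec is cleared, then filled with the tensor's
    // contents, so its allocation can be reused across calls.
    let mut values: Vec<f32> = Vec::new();
    copy_tensor_into_vec(&tensor, &mut values)?;
    assert_eq!(values.len(), 8);
    Ok(())
}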