Crate kalosm_llama

§RLlama

RLlama is a Rust implementation of the quantized Llama 7B language model.

Llama 7B is a relatively small but performant language model that can easily be run on your local machine.

This library uses Candle to run Llama.

§Usage

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() {
    // Download (on first run) and load the default quantized model.
    let mut model = Llama::new().await.unwrap();
    let prompt = "The capital of France is ";
    // Calling the model with a prompt returns a stream of generated tokens.
    let mut stream = model(prompt);

    print!("{prompt}");
    // Print each token as it arrives.
    while let Some(token) = stream.next().await {
        print!("{token}");
    }
}
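
A specific model can be selected through LlamaBuilder and LlamaSource (both listed under Structs below). A minimal sketch, assuming a with_source builder method and a source constructor named LlamaSource::llama_8b; check those types for the exact API in your version:

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() {
    // `with_source` and `LlamaSource::llama_8b` are assumed names; see
    // LlamaBuilder and LlamaSource for the methods your version exposes.
    let mut model = Llama::builder()
        .with_source(LlamaSource::llama_8b())
        .build()
        .await
        .unwrap();

    // A configured model streams text exactly like the default one.
    let mut stream = model("The capital of France is ");
    while let Some(token) = stream.next().await {
        print!("{token}");
    }
}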

Modules§

prelude
A prelude of commonly used items in kalosm-llama.

Structs§

AttentionMask
Cache
KvCache
A growable KV cache. This cache wraps candle's KvCache with exponentially larger allocations as the sequence length increases.
Llama
A quantized Llama language model with support for streaming generation.
LlamaBuilder
A builder with configuration for a Llama model.
LlamaCache
A cache for Llama inference. This cache significantly speeds up generation of sequential text.
LlamaChatSession
A Llama chat session (see the chat sketch after this list).
LlamaSession
A Llama session with cached state for the prompt that has been fed so far.
LlamaSource
A source for the Llama model.
MaskCache
TensorCache
A growable tensor cache. This cache wraps candle's Cache with exponentially larger allocations as the sequence length increases.
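
LlamaChatSession backs multi-turn chat on top of the same cache machinery. A minimal sketch, assuming a chat-tuned default model behind Llama::new_chat and a chat() method from the prelude that opens a session; both names are assumptions, so check your version's docs:

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() {
    // `Llama::new_chat` is assumed to load a chat-tuned default model.
    let model = Llama::new_chat().await.unwrap();
    // The session (backed by LlamaChatSession) keeps the KV cache warm
    // between turns, so follow-up messages do not reprocess the history.
    let mut chat = model.chat();
    let mut stream = chat("What is the capital of France?");
    while let Some(token) = stream.next().await {
        print!("{token}");
    }
}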

Enums§

CacheError
LlamaSourceError
Errors that can occur when loading the Llama model (see the handling sketch after this list).
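
Loading can fail, for example when the download is interrupted or the weights are invalid. A minimal sketch of handling the failure instead of unwrapping, assuming Llama::new returns a Result whose error is (or wraps) LlamaSourceError:

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() {
    match Llama::new().await {
        Ok(mut model) => {
            let mut stream = model("The capital of France is ");
            while let Some(token) = stream.next().await {
                print!("{token}");
            }
        }
        // The error's Display output describes what went wrong while
        // fetching or loading the model.
        Err(err) => eprintln!("failed to load the model: {err}"),
    }
}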

Functions§

accelerated_device_if_available
Create a candle device that uses any available accelerator (see the sketch after this list).
copy_tensor_into_vec
Clear a Vec<T> and copy the contents of a tensor into it.
maybe_autoreleasepool
Wrap a closure in an autorelease pool if the metal feature is enabled.
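
A minimal sketch combining the two helpers above; the exact signatures are assumptions (a fallible device constructor and copy_tensor_into_vec taking a &Tensor and a &mut Vec<T>), so check the function docs before relying on them:

use candle_core::{DType, Tensor};
use kalosm_llama::{accelerated_device_if_available, copy_tensor_into_vec};

fn main() -> candle_core::Result<()> {
    // Prefers Metal or CUDA when compiled in, otherwise falls back to the CPU.
    let device = accelerated_device_if_available()?;
    let tensor = Tensor::zeros((4,), DType::F32, &device)?;
    // Assumed behavior: clears the Vec, then copies the tensor's contents into it.
    let mut out: Vec<f32> = Vec::new();
    copy_tensor_into_vec(&tensor, &mut out)?;
    println!("{out:?}");
    Ok(())
}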