Crate kalosm_llama

§RLlama

RLlama is a Rust implementation of the quantized Llama 7B language model.

Llama 7B is a relatively small but performant language model that can easily be run on your local machine.

This library uses Candle, Hugging Face's machine-learning framework for Rust, to run Llama.

§Usage

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the default quantized Llama model.
    let mut model = Llama::default();
    let prompt = "The capital of France is ";
    // Stream tokens as the model generates them.
    let mut result = model.stream_text(prompt).await?;

    print!("{prompt}");
    while let Some(token) = result.next().await {
        print!("{token}");
    }
    Ok(())
}
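
Because the stream yields tokens one at a time, the output can just as easily be accumulated for later use instead of printed. A minimal sketch using only the API shown above; the token's exact type is not stated on this page, so it is converted through Display, which the printing loop already requires:

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut model = Llama::default();
    let mut result = model.stream_text("The capital of France is ").await?;

    // Collect the streamed tokens into a single completion string.
    let mut completion = String::new();
    while let Some(token) = result.next().await {
        // `to_string` works for any Display type, matching the
        // `print!("{token}")` usage in the example above.
        completion.push_str(&token.to_string());
    }
    println!("{completion}");
    Ok(())
}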

Modules§

  • A prelude of commonly used items in kalosm-llama.

Structs§

  • A quantized Llama language model with support for streaming generation.
  • A builder with configuration for a Llama model (see the sketch after this list).
  • A cache for Llama inference; it significantly speeds up generation of sequential text.
  • The inner, synchronous Llama model.
  • A Llama session.
  • A source for the Llama model.
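
The builder and source types above replace Llama::default() when a specific model is wanted. A minimal sketch of how they might fit together; Llama::builder(), with_source, an async build(), and a LlamaSource constructor such as llama_7b() are assumptions, since the item names and signatures are not shown on this page:

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // `Llama::builder()` and `LlamaSource::llama_7b()` are assumed entry
    // points; substitute whichever source constructor your version of
    // kalosm_llama actually provides.
    let mut model = Llama::builder()
        .with_source(LlamaSource::llama_7b())
        .build()
        .await?;

    let mut result = model.stream_text("The capital of France is ").await?;
    while let Some(token) = result.next().await {
        print!("{token}");
    }
    Ok(())
}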

Enums§

  • A source for a file, either from Hugging Face or a local path.