Crate kalosm_llama

§RLlama

RLlama is a Rust implementation of the quantized Llama 7B language model.

Llama 7B is a relatively small but performant language model that can easily be run on your local machine.

This library uses Candle, Hugging Face's machine-learning framework for Rust, to run Llama.

§Usage

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the default quantized Llama model.
    let mut model = Llama::default();
    let prompt = "The capital of France is ";
    // Stream tokens as the model generates them.
    let mut result = model.stream_text(prompt).await?;

    print!("{prompt}");
    while let Some(token) = result.next().await {
        print!("{token}");
    }
    Ok(())
}
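
Because the stream yields tokens one at a time, the output can just as easily be accumulated for later use instead of printed. A minimal sketch using only the API shown above; the token's exact type is not stated on this page, so it is converted through Display, which the printing loop already requires:

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut model = Llama::default();
    let mut result = model.stream_text("The capital of France is ").await?;

    // Collect the streamed tokens into a single completion string.
    let mut completion = String::new();
    while let Some(token) = result.next().await {
        // `to_string` works for any Display type, matching the
        // `print!("{token}")` usage in the example above.
        completion.push_str(&token.to_string());
    }
    println!("{completion}");
    Ok(())
}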

Modules§

  • A prelude of commonly used items in kalosm-llama.

Structs§

  • A quantized Llama language model with support for streaming generation.
  • A builder with configuration for a Llama model (see the sketch after this list).
  • A cache for Llama inference; it significantly speeds up generation of sequential text.
  • The inner, synchronous Llama model.
  • A Llama session.
  • A source for the Llama model.
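
The builder and source types above replace Llama::default() when a specific model is wanted. A minimal sketch of how they might fit together; Llama::builder(), with_source, an async build(), and a LlamaSource constructor such as llama_7b() are assumptions, since the item names and signatures are not shown on this page:

use kalosm_llama::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // `Llama::builder()` and `LlamaSource::llama_7b()` are assumed entry
    // points; substitute whichever source constructor your version of
    // kalosm_llama actually provides.
    let mut model = Llama::builder()
        .with_source(LlamaSource::llama_7b())
        .build()
        .await?;

    let mut result = model.stream_text("The capital of France is ").await?;
    while let Some(token) = result.next().await {
        print!("{token}");
    }
    Ok(())
}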

Enums§

  • A source for a file, either from Hugging Face or a local path.