Crate llama_cpp

High-level bindings to llama.cpp’s C API, providing a predictable, safe, and high-performance medium for interacting with Large Language Models (LLMs) on consumer-grade hardware.

Along with llama.cpp itself, this crate is still in an early state, and breaking changes may occur between versions. The high-level API, however, is fairly settled.

To get started, create a LlamaModel and a LlamaSession:

use llama_cpp::LlamaModel;

// Create a model from anything that implements `AsRef<Path>`:
let model = LlamaModel::load_from_file("path_to_model.gguf").expect("Could not load model");

// A `LlamaModel` holds the weights shared across many _sessions_; while your model may be
// several gigabytes large, a session is typically a few dozen to a hundred megabytes!
let mut ctx = model.create_session();

// You can feed anything that implements `AsRef<[u8]>` into the model's context.
ctx.advance_context("This is the story of a man named Stanley.").unwrap();

// LLMs are typically used to predict the next word in a sequence. Let's generate some tokens!
let max_tokens = 1024;
let mut decoded_tokens = 0;

// `ctx.start_completing` creates a worker thread that generates tokens. When the completion
// handle is dropped, token generation stops!

let mut completions = ctx.start_completing();

while let Some(next_token) = completions.next_token() {
    println!("{}", String::from_utf8_lossy(next_token.as_bytes()));

    decoded_tokens += 1;

    if decoded_tokens >= max_tokens {
        break;
    }
}
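
Because a LlamaModel holds only the shared weights, several sessions can be spun up from one model without reloading it. Below is a minimal sketch reusing only the calls from the example above; the path and prompts are placeholders:

use llama_cpp::LlamaModel;

// Load the multi-gigabyte weights exactly once.
let model = LlamaModel::load_from_file("path_to_model.gguf").expect("Could not load model");

// Each session owns its own (comparatively small) context, so the two
// contexts below evolve independently while sharing one copy of the weights.
let mut story_ctx = model.create_session();
let mut chat_ctx = model.create_session();

story_ctx.advance_context("Once upon a time,").unwrap();
chat_ctx.advance_context("The following is a friendly conversation.").unwrap();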

Dependencies

This crate depends on (and builds atop) llama_cpp_sys, which in turn builds llama.cpp from source. You’ll need libclang, cmake, and a C/C++ toolchain (clang is preferred) at a minimum. See llama_cpp_sys for more details.

The bundled GGML and llama.cpp binaries are statically linked by default, and their logs are re-routed through tracing instead of stderr. If you’re getting stuck, setting up tracing for more debug information should be at the top of your troubleshooting list!
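
For example, one minimal way to surface those logs is to install a subscriber before loading a model. The sketch below assumes the tracing and tracing-subscriber crates have been added as dependencies; neither is re-exported by this crate:

// Assumes `tracing` and `tracing-subscriber` are in your Cargo.toml.
// Emit everything at DEBUG level and above, including the
// re-routed llama.cpp and GGML logs.
tracing_subscriber::fmt()
    .with_max_level(tracing::Level::DEBUG)
    .init();

// ... load your model and run sessions as usual ...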

Undefined Behavior / Panic Safety

It should be impossible to trigger undefined behavior from this crate, and any UB is considered a critical bug. UB triggered downstream in llama.cpp or ggml should have issues filed and mirrored in llama_cpp-rs’s issue tracker.

While panics are considered less critical, this crate should never panic, and any panic should be considered a bug. We don’t want your control flow!

Minimum Supported Rust Version (MSRV) Policy

This crate supports Rust 1.73.0 and above.

License

MIT or Apache 2.0 (the “Rust” license), at your option.
