# llama-cpp-v3
Safe and ergonomic Rust wrapper for [llama.cpp](https://github.com/ggml-org/llama.cpp) with **runtime dynamic loading** and **auto-downloading** support.
## Features
- 🚀 **Runtime Backend Switching**: Switch between CPU, CUDA, Vulkan, and SYCL without recompiling.
- 📦 **Zero-Configuration Build**: No need for a C++ compiler or `llama.cpp` source locally.
- ⏬ **Auto-Download**: Automatically downloads pre-built `llama.cpp` binaries from GitHub releases based on the selected backend and OS.
- 🛡️ **Safe API**: RAII-style wrappers for models, contexts, and samplers.
- 🔄 **Latest llama.cpp**: Support for the modern GGUF and vocabulary APIs.
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
llama-cpp-v3 = "0.1.1" # Example version
```
## Quick Start
```rust
use llama_cpp_v3::{LlamaBackend, LlamaModel, LlamaContext, LlamaSampler, LoadOptions, Backend};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Load the backend (downloads the shared library if missing)
    let backend = LlamaBackend::load(LoadOptions {
        backend: Backend::Cpu,
        app_name: "my_app",
        version: None, // uses latest
        explicit_path: None,
        cache_dir: None,
    })?;

    // 2. Load a model
    let model = LlamaModel::load_from_file(&backend, "tinyllama.gguf", LlamaModel::default_params(&backend))?;

    // 3. Create a context
    let mut ctx = LlamaContext::new(&model, LlamaContext::default_params(&model))?;

    // 4. Tokenize and generate
    let tokens = model.tokenize("Hello, my name is", true, true)?;
    // ... fill batch and decode ...

    Ok(())
}
```
## How it Works
This crate uses `llama-cpp-sys-v3` to dynamically load `llama.dll` (Windows) or `libllama.so` (Linux). When you call `LlamaBackend::load`, the library:
1. Checks the local cache for the specified backend.
2. If missing, it uses the GitHub API to find the latest `llama.cpp` release.
3. Downloads the appropriate binary zip for your OS and architecture.
4. Extracts and loads the library at runtime.
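The cache-or-download resolution in steps 1&ndash;2 can be sketched roughly as follows. This is a minimal, self-contained illustration, not the crate's actual implementation: the `Backend` enum here is a local stand-in, and names like `cached_library_path` and `resolve_library` are hypothetical.

```rust
use std::path::{Path, PathBuf};

// Local stand-in for the crate's Backend enum (illustrative only).
#[derive(Debug, Clone, Copy)]
enum Backend { Cpu, Cuda, Vulkan, Sycl }

impl Backend {
    fn tag(self) -> &'static str {
        match self {
            Backend::Cpu => "cpu",
            Backend::Cuda => "cuda",
            Backend::Vulkan => "vulkan",
            Backend::Sycl => "sycl",
        }
    }
}

/// Platform-specific shared-library file name.
fn library_file_name() -> &'static str {
    if cfg!(windows) { "llama.dll" } else { "libllama.so" }
}

/// Step 1: where the cached library for this backend/version would live.
fn cached_library_path(cache_dir: &Path, backend: Backend, version: &str) -> PathBuf {
    cache_dir
        .join(version)
        .join(backend.tag())
        .join(library_file_name())
}

/// Steps 1-2: return the cached path if present; otherwise signal that a
/// release lookup and download (steps 3-4) would be needed.
fn resolve_library(cache_dir: &Path, backend: Backend, version: &str) -> Result<PathBuf, String> {
    let path = cached_library_path(cache_dir, backend, version);
    if path.exists() {
        Ok(path)
    } else {
        Err(format!(
            "not cached; would download release '{}' for backend '{}'",
            version,
            backend.tag()
        ))
    }
}

fn main() {
    let cache = PathBuf::from("/tmp/my_app_cache");
    match resolve_library(&cache, Backend::Cuda, "some-release-tag") {
        Ok(p) => println!("loading {}", p.display()),
        Err(why) => println!("cache miss: {}", why),
    }
}
```

Keying the cache path on both the release version and the backend tag is what lets several backends (and several versions) coexist on disk, so switching backends later is just a different path lookup rather than a re-download of everything.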
## License
MIT