# llama-cpp-v3
Safe and ergonomic Rust wrapper for llama.cpp with runtime dynamic loading and auto-downloading support.
## Features
- 🚀 Runtime Backend Switching: Switch between CPU, CUDA, Vulkan, and SYCL without recompiling.
- 📦 Zero-Configuration Build: No need for a C++ compiler or the `llama.cpp` source locally.
- ⏬ Auto-Download: Automatically downloads pre-built `llama.cpp` binaries from GitHub releases based on the selected backend and OS.
- 🛡️ Safe API: RAII-style wrappers for models, contexts, and samplers.
- 🔄 Latest llama.cpp: Support for the modern GGUF and vocabulary APIs.
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
llama-cpp-v3 = "0.1.1" # Example version
```
## Quick Start
```rust
use llama_cpp_v3::LlamaBackend;
```
## How it Works
This crate uses `llama-cpp-sys-v3` to dynamically load `llama.dll` (Windows) or `libllama.so` (Linux). When you call `LlamaBackend::load`, the library:
- Checks the local cache for the specified backend.
- If missing, uses the GitHub API to find the latest `llama.cpp` release.
- Downloads the appropriate binary zip for your OS and architecture.
- Extracts and loads the library at runtime.
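The resolution flow above can be sketched roughly as follows. This is an illustrative model, not the crate's actual implementation: the `Backend` enum, `cached_library_path`, `release_asset_name`, and the cache/asset naming layout are all assumptions made for the example.

```rust
use std::path::PathBuf;

// Hypothetical backend selector; the crate's real type may differ.
#[derive(Debug, Clone, Copy)]
enum Backend {
    Cpu,
    Cuda,
    Vulkan,
    Sycl,
}

impl Backend {
    fn as_str(&self) -> &'static str {
        match self {
            Backend::Cpu => "cpu",
            Backend::Cuda => "cuda",
            Backend::Vulkan => "vulkan",
            Backend::Sycl => "sycl",
        }
    }
}

/// Platform-specific dynamic-library file name, as described above.
fn lib_name() -> &'static str {
    if cfg!(windows) { "llama.dll" } else { "libllama.so" }
}

/// Step 1: where a cached copy of the library might live
/// (illustrative layout: <cache_root>/<release_tag>/<backend>/<lib>).
fn cached_library_path(cache_root: &str, backend: Backend, tag: &str) -> PathBuf {
    PathBuf::from(cache_root)
        .join(tag)
        .join(backend.as_str())
        .join(lib_name())
}

/// Steps 2-3: the release asset a downloader could look for
/// (illustrative naming scheme, not the real release asset names).
fn release_asset_name(backend: Backend, os: &str, arch: &str) -> String {
    format!("llama-{}-{}-{}.zip", backend.as_str(), os, arch)
}

fn main() {
    // On a cache miss for this path, step 4 would download, extract,
    // and dlopen/LoadLibrary the file at runtime.
    let path = cached_library_path("/home/user/.cache/llama-cpp-v3", Backend::Cuda, "b4500");
    println!("cache path: {}", path.display());
    println!("asset: {}", release_asset_name(Backend::Cuda, "linux", "x86_64"));
}
```

The key design point is that the backend choice is a runtime value rather than a compile-time feature flag, which is what makes switching between CPU, CUDA, Vulkan, and SYCL possible without recompiling.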
## License
MIT