llama-cpp-sys-4
Raw bindgen-generated bindings to llama.cpp,
plus the C/C++ build logic that compiles the library.
llama.cpp version: b8249 · Crate version: 0.2.5
Unless you need access to a symbol not yet exposed by llama-cpp-4,
use that crate instead — it provides a safe API over these raw bindings.
What's included
- llama_* functions and types from llama.h
- ggml_* functions and types from ggml/include/ggml.h
- LLAMA_* constants
- common_tokenize and common_token_to_piece from common/common.h
- The entire llama.cpp static library (or shared, with dynamic-link)
Feature flags
| Feature | Description |
|---|---|
| openmp | OpenMP multi-threading (default on; auto-detected on ARM platforms) |
| cuda | NVIDIA GPU (requires CUDA toolkit) |
| metal | Apple GPU (macOS/iOS only) |
| vulkan | Vulkan GPU backend |
| native | -march=native: tune for the build machine's CPU |
| rpc | Remote compute backend |
| dynamic-link | Link against a pre-installed shared libllama instead of building from source |
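
As a rough illustration of how a build script can react to these flags: Cargo exposes each enabled feature to build.rs as an environment variable named CARGO_FEATURE_<NAME>, upper-cased and with - replaced by _. The sketch below derives those names and maps each feature to a ggml build option. The option names (GGML_CUDA and so on) are upstream CMake options, but the mapping itself and the helper names are assumptions for illustration, not a copy of this crate's build.rs.

```rust
/// Name of the environment variable Cargo sets for an enabled feature when
/// it runs a build script: `CARGO_FEATURE_` + upper-cased feature name,
/// with `-` replaced by `_`.
fn feature_env_var(feature: &str) -> String {
    format!("CARGO_FEATURE_{}", feature.to_uppercase().replace('-', "_"))
}

/// Hypothetical mapping from a crate feature to a ggml build option.
/// Whether this crate's build.rs uses exactly this mapping is an assumption.
fn ggml_option(feature: &str) -> Option<&'static str> {
    match feature {
        "openmp" => Some("GGML_OPENMP"),
        "cuda" => Some("GGML_CUDA"),
        "metal" => Some("GGML_METAL"),
        "vulkan" => Some("GGML_VULKAN"),
        "native" => Some("GGML_NATIVE"),
        "rpc" => Some("GGML_RPC"),
        // dynamic-link changes how the library is linked, not a ggml option.
        _ => None,
    }
}

/// Collect `OPTION=ON` strings for every feature whose environment variable
/// `is_set` reports as present.
fn enabled_options(is_set: impl Fn(&str) -> bool) -> Vec<String> {
    ["openmp", "cuda", "metal", "vulkan", "native", "rpc"]
        .iter()
        .copied()
        .filter(|f| is_set(feature_env_var(f).as_str()))
        .filter_map(ggml_option)
        .map(|opt| format!("{}=ON", opt))
        .collect()
}
```

Inside a real build script the predicate would typically be `|name| std::env::var_os(name).is_some()`; taking it as a parameter keeps the sketch independent of the process environment.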
Building
The crate compiles llama.cpp from the vendored submodule at build time using
cc + cmake-style flags. No external llama.cpp installation is required.
# CPU only (default)
cargo build
# Metal (macOS)
cargo build --features metal
# CUDA
cargo build --features cuda
# OpenMPI (distributed inference)
Build dependencies
- clang, required by bindgen to parse the C++ headers
- A C++17 compiler (GCC 9+, Clang 10+, MSVC 2019+)
- cmake is not required; the build is driven entirely by build.rs
Regenerating bindings
Bindings are regenerated automatically whenever build.rs or wrapper.h
changes. The allowlist covers llama_*, ggml_*, LLAMA_*, and the two
common_* functions.
# Force a full rebuild including binding regeneration
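
The regeneration trigger and the allowlist described in this section can be sketched in plain Rust. Cargo reruns a build script when a path named in a cargo:rerun-if-changed directive changes; the first helper formats those directives. The second helper approximates the allowlist with simple prefix checks. Both function names are hypothetical, and the real build.rs passes patterns to bindgen rather than checking symbols by hand.

```rust
/// Build-script directives asking Cargo to rerun build.rs when any of
/// `paths` changes. A build script would print each string to stdout.
fn rerun_directives(paths: &[&str]) -> Vec<String> {
    paths
        .iter()
        .map(|p| format!("cargo:rerun-if-changed={}", p))
        .collect()
}

/// Approximation of the allowlist: the llama_*, ggml_* and LLAMA_* families
/// plus the two common_* functions, which are matched by exact name.
fn is_allowlisted(symbol: &str) -> bool {
    symbol.starts_with("llama_")
        || symbol.starts_with("ggml_")
        || symbol.starts_with("LLAMA_")
        || symbol == "common_tokenize"
        || symbol == "common_token_to_piece"
}
```

Matching the two common_* functions by exact name rather than by prefix keeps the rest of common/common.h out of the generated bindings.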
Notable API changes (b4689 → b8249)
These are the upstream llama.cpp breaks handled in this crate:
| Removed / renamed | Replacement |
|---|---|
| llama_kv_cache_* functions | llama_memory_* via llama_get_memory(ctx) |
| llama_set_adapter_lora + llama_rm_adapter_lora | llama_set_adapters_lora (batch API) |
| context_params.flash_attn: bool | context_params.flash_attn_type: llama_flash_attn_type |
| llama-sampling.h | llama-sampler.h |
| C++11 build flag | C++17 required by new common.h (std::string_view) |
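
To make the flash_attn change concrete, here is a minimal migration sketch. It uses a local stand-in enum rather than the generated bindings, so it compiles on its own. The numeric values follow upstream's llama_flash_attn_type (auto = -1, disabled = 0, enabled = 1); the Rust type and function names are hypothetical, and the bindgen-generated representation in this crate may differ.

```rust
/// Local stand-in for upstream's llama_flash_attn_type.
/// Not the generated binding; values mirror the upstream enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
enum FlashAttnType {
    Auto = -1,
    Disabled = 0,
    Enabled = 1,
}

/// Translate the old boolean field into the new three-state setting.
fn from_legacy_flag(flash_attn: bool) -> FlashAttnType {
    if flash_attn {
        FlashAttnType::Enabled
    } else {
        FlashAttnType::Disabled
    }
}

/// Callers that never set the old flag can defer the choice to the library.
fn from_optional_flag(flash_attn: Option<bool>) -> FlashAttnType {
    flash_attn.map_or(FlashAttnType::Auto, from_legacy_flag)
}
```

The old boolean had no way to express "let the library decide", which is why code that previously left the field at its default may prefer the auto variant over a direct true/false translation.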
Bindgen configuration
Key decisions in build.rs:
- derive_partialeq(true) with no_partialeq(...) overrides for structs containing function-pointer fields (avoids the unpredictable_function_pointer_comparisons lint).
- opaque_type("std::.*"): C++ STL types are treated as opaque.
- OpenMP auto-detection: reads GGML_OPENMP_ENABLED from the CMake cache rather than relying solely on the openmp feature flag, because some ARM toolchains enable OpenMP unconditionally.
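
The OpenMP auto-detection can be sketched as a small parser over the text of a CMake cache file, whose entries have the form KEY:TYPE=VALUE. The key GGML_OPENMP_ENABLED comes from the description above; the helper names, the accepted truthy spellings, and the fallback to the feature flag when the key is absent are assumptions, not the crate's actual logic.

```rust
/// Return the value of `key` from CMake cache contents, if present.
/// Entries look like `KEY:TYPE=VALUE`; lines starting with `#` or `//`
/// are comments and are skipped.
fn cache_value<'a>(cache: &'a str, key: &str) -> Option<&'a str> {
    cache
        .lines()
        .map(str::trim)
        .filter(|line| !line.starts_with('#') && !line.starts_with("//"))
        .find_map(|line| {
            let (name, value) = line.split_once('=')?;
            // Drop the `:TYPE` suffix, if any, before comparing.
            let name = name.split(':').next()?;
            if name == key { Some(value) } else { None }
        })
}

/// Interpret a cache value as a boolean (assumed set of truthy spellings).
fn is_truthy(value: &str) -> bool {
    matches!(
        value.to_ascii_uppercase().as_str(),
        "ON" | "1" | "TRUE" | "YES" | "Y"
    )
}

/// Prefer what the cache records; fall back to the Cargo feature flag
/// only when the cache has no entry for the key.
fn openmp_enabled(cache: &str, feature_requested: bool) -> bool {
    match cache_value(cache, "GGML_OPENMP_ENABLED") {
        Some(value) => is_truthy(value),
        None => feature_requested,
    }
}
```

Letting the cache override the feature flag is what allows linking to stay correct on toolchains that enable OpenMP regardless of the requested features.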