llama-cpp-sys-4 0.2.2

Low-level bindings to llama.cpp

Raw bindgen-generated bindings to llama.cpp, plus the C/C++ build logic that compiles the library.

llama.cpp version: b8249 · Crate version: 0.2.2

Unless you need access to a symbol not yet exposed by llama-cpp-4, use that crate instead — it provides a safe API over these raw bindings.


What's included

  • llama_* functions and types from llama.h
  • ggml_* functions and types from ggml/include/ggml.h
  • LLAMA_* constants
  • common_tokenize and common_token_to_piece from common/common.h
  • The entire llama.cpp static library (or shared, with dynamic-link)
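For symbols not yet wrapped by llama-cpp-4, the raw bindings can be called directly through unsafe FFI. A minimal sketch, assuming the bindgen output is re-exported at the crate root (llama_backend_init, llama_print_system_info, and llama_backend_free are real llama.h symbols); it is illustrative only, since it needs the built library to link:

```rust
// Hedged sketch: driving the raw bindings directly, no safe wrapper.
use llama_cpp_sys_4::{llama_backend_init, llama_backend_free, llama_print_system_info};
use std::ffi::CStr;

fn main() {
    unsafe {
        llama_backend_init();                            // initialize ggml backends
        let info = CStr::from_ptr(llama_print_system_info());
        println!("{}", info.to_string_lossy());          // CPU/GPU capability summary
        llama_backend_free();                            // release backend state
    }
}
```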

Feature flags

| Feature | Description |
|---|---|
| openmp | OpenMP multi-threading (default on; auto-detected on ARM platforms) |
| cuda | NVIDIA GPU (requires CUDA toolkit) |
| metal | Apple GPU (macOS/iOS only) |
| vulkan | Vulkan GPU backend |
| native | -march=native — tune for the build machine's CPU |
| rpc | Remote compute backend |
| dynamic-link | Link against a pre-installed shared libllama instead of building from source |
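As a sketch, a downstream crate would select a backend in its Cargo.toml like this (the version requirement is illustrative):

```toml
[dependencies]
llama-cpp-sys-4 = { version = "0.2", features = ["cuda"] }

# Or link a pre-installed shared libllama instead of building from source:
# llama-cpp-sys-4 = { version = "0.2", features = ["dynamic-link"] }
```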

Building

The crate compiles llama.cpp from the vendored submodule at build time using cc + cmake-style flags. No external llama.cpp installation is required.

# CPU only (default)
cargo build -p llama-cpp-sys-4

# Metal (macOS)
cargo build -p llama-cpp-sys-4 --features metal

# CUDA
cargo build -p llama-cpp-sys-4 --features cuda

# Vulkan
cargo build -p llama-cpp-sys-4 --features vulkan

Build dependencies

  • clang — required by bindgen to parse the C++ headers
  • A C++17 compiler (GCC 9+, Clang 10+, MSVC 2019+)
  • cmake is not required — the build is driven entirely by build.rs

Regenerating bindings

Bindings are regenerated automatically whenever build.rs or wrapper.h changes. The allowlist covers llama_*, ggml_*, LLAMA_*, and the two common_* functions.

# Force a full rebuild including binding regeneration
touch llama-cpp-sys-4/wrapper.h
cargo build -p llama-cpp-sys-4

Notable API changes (b4689 → b8249)

These are the breaking upstream llama.cpp changes handled in this crate:

| Removed / renamed | Replacement |
|---|---|
| llama_kv_cache_* functions | llama_memory_* via llama_get_memory(ctx) |
| llama_set_adapter_lora + llama_rm_adapter_lora | llama_set_adapters_lora (batch API) |
| context_params.flash_attn: bool | context_params.flash_attn_type: llama_flash_attn_type |
| llama-sampling.h | llama-sampler.h |
| C++11 build flag | C++17 required by new common.h (std::string_view) |
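A hedged sketch of the KV-cache migration, assuming the b8249 C signatures (llama_get_memory returning a llama_memory_t handle, llama_memory_clear taking that handle and a bool) are exposed verbatim by the bindings; `ctx` is a previously created context, and the snippet is not compiled here:

```rust
// Before (b4689): llama_kv_cache_clear(ctx);
// After (b8249): go through the context's memory handle instead.
unsafe {
    let mem = llama_get_memory(ctx); // llama_memory_t for this context
    llama_memory_clear(mem, true);   // `true` also clears the data buffers
}
```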

Bindgen configuration

Key decisions in build.rs:

  • derive_partialeq(true) with no_partialeq(...) overrides for structs containing function-pointer fields (avoids the unpredictable_function_pointer_comparisons lint).
  • opaque_type("std::.*") — C++ STL types are treated as opaque blobs rather than bound field-by-field.
  • OpenMP auto-detection — reads GGML_OPENMP_ENABLED from the CMake cache rather than relying solely on the openmp feature flag, because some ARM toolchains enable OpenMP unconditionally.