Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
llama-crab
Safe, ergonomic Rust bindings for llama.cpp.
llama-crab provides:
- Low-level FFI bindings to
llama.cpp,ggml,ggufandmtmdthroughllama-crab-sys. - A safe high-level Rust API for model loading, text completion, chat completion and infill.
- Sampling chains, grammar-constrained decoding and JSON-Schema to GBNF conversion.
- Chat templates, tool-call parsing and OpenAI-compatible data structures.
- Embeddings, reranking, prompt cache, session state and speculative decoding.
- Multimodal support through
mtmdfor vision and audio capable GGUF models. - Hardware backends for CPU, Metal, CUDA, Vulkan and ROCm through Cargo features.
Documentation is available at docs.rs/llama-crab and in the mdBook user guide.
Installation
Add the crate to your Cargo.toml:
[]
= "0.1"
By default, llama-crab enables CPU OpenMP support and Apple Metal on aarch64 macOS. To choose backends explicitly, disable default features and enable the ones you need:
[]
= { = "0.1", = false, = ["cuda", "openmp"] }
The crate builds the bundled llama.cpp sources through CMake. You need:
- Rust 1.88 or newer.
- CMake 3.18 or newer.
- A C and C++ compiler supported by
llama.cpp. - A platform SDK when using GPU backends such as Metal, CUDA, Vulkan or ROCm.
Cargo Features
| Feature | Description |
|---|---|
openmp |
CPU backend with OpenMP. Enabled by default. |
metal |
Apple Metal backend. Enabled by default on aarch64 macOS. |
cuda |
NVIDIA CUDA backend. |
cuda-no-vmm |
CUDA backend without virtual memory management. |
vulkan |
Vulkan backend. |
rocm |
AMD ROCm/HIP backend. |
mtmd |
Multimodal support through mtmd.h; enables image/audio helpers. |
common |
Builds llama.cpp common utilities used by chat and grammar helpers. |
llguidance |
Enables the llguidance sampler integration. |
hf-tokenizer |
Enables Hugging Face tokenizer support. |
disk-cache |
Enables the persistent sled-backed prompt cache. |
dynamic-link |
Links llama.cpp as a shared object. |
dynamic-backends |
Loads GGML backends dynamically. |
system-ggml |
Uses a system GGML installation instead of the bundled copy. |
Basic Usage
Load a GGUF model and generate a text completion:
use ;
Chat Completion
Chat completion accepts a list of role-based messages. Built-in templates can be selected explicitly when you need deterministic formatting.
use BuiltinTemplate;
use ;
use ;
JSON Schema and Grammar-Constrained Decoding
llama-crab can convert JSON Schema into GBNF grammar and use grammar samplers to constrain model output.
use json_schema_grammar;
use json;
let schema = json!;
let grammar = json_schema_grammar?;
# let _ = grammar;
# Ok::
See the structured example for a complete program.
Tool Calling
The chat module includes incremental tool-call parsing for common model formats, including ChatML, Mistral, Llama 3, Functionary and plain JSON object output.
use ;
let mut parser = new;
let calls = parser.feed;
# let _ = calls;
# Ok::
See Chat & tool calling for supported formats and parser behavior.
Embeddings and Reranking
Enable embeddings when loading the model, then call Llama::embed to get an optionally L2-normalized vector.
use ;
See Embeddings & reranking, embeddings and embedding_search.
Multimodal Models
The mtmd feature exposes llama.cpp's multimodal pipeline for GGUF models that use a paired projector.
[]
= { = "0.1", = ["mtmd"] }
Supported workflows include:
- Loading a text model and an
mmprojprojector. - Decoding local images into
MtmdBitmap. - Tokenizing text and media together with
MtmdContext. - Evaluating multimodal chunks and continuing generation with normal samplers.
See Multimodal, vision, mtmd, and the integration tests under llama-crab/tests.
Speculative Decoding
Prompt-lookup speculative decoding is available through the speculative module. It can draft candidate tokens from repeated n-grams in the prompt and verify them with the main model.
See Speculative decoding and the speculative example.
Examples
The repository contains runnable example crates under examples/. The helper script downloads known-good GGUF fixtures on first run.
Each example is a standalone Cargo crate and can be copied into another project.
Documentation
To serve the guide locally:
Crates
| Crate | Description |
|---|---|
llama-crab |
Safe high-level API and Rust abstractions. |
llama-crab-sys |
Low-level FFI package that builds and links llama.cpp. |
Most applications should depend on llama-crab. Use llama-crab-sys only when you need direct access to raw llama.cpp symbols.
Development
Clone with submodules:
Common checks:
The minimum supported Rust version is 1.88 and is pinned in rust-toolchain.toml.
License
Licensed under the MIT License. See LICENSE-MIT.
Acknowledgements
llama-crab builds on llama.cpp.
Inspired by llama-cpp-rs and the feature completeness of llama-cpp-python.