llama-crab
Safe, ergonomic and complete Rust bindings for llama.cpp.
Inspired by llama-cpp-rs and the feature completeness of llama-cpp-python.
llama-crab provides two crates:
| Crate | Purpose |
|---|---|
| llama-crab-sys | Low-level, hand-curated FFI over llama.h, ggml.h, gguf.h (and mtmd.h) generated via bindgen and cmake. |
| llama-crab | Safe, idiomatic Rust API: LlamaModel, LlamaContext, sampling chains, chat templates, tool calling, multimodal, speculative decoding, caching, embeddings, reranking. |
Quickstart
Add to your Cargo.toml:
[dependencies]
llama-crab = "0.1"
Load a GGUF model and generate text:
use ;
Feature matrix
| Feature | Status |
|---|---|
| GGUF model loading (mmap, mlock) | ✅ |
| Multi-GPU layer offload (Metal, CUDA, Vulkan, HIP) | ✅ |
| KV cache quantization (Q2_K … Q8_K, IQ*) | ✅ |
| RoPE scaling (linear, yarn, longrope) | ✅ |
| Flash attention, SWA, MTP | ✅ |
| All sampling strategies (greedy, top-k/p, min-p, typical, xtc, mirostat v1/v2, dry, adaptive_p, infill, logit-bias, grammar, …) | ✅ |
| Custom samplers (Rust C-ABI vtable) | ✅ |
| GBNF grammar + JSON schema constrained decoding | ✅ |
| Chat templates (Jinja2 subset + 20+ builtins) | ✅ |
| Tool calling (functionary v1/v2, chatml, hermes, qwen, llama-3) | ✅ |
| Streaming JSON parsers (incremental tool-call deltas) | ✅ |
| Embeddings (mean/cls/last pooling + L2 normalize) | ✅ |
| Reranking (rank pooling) | ✅ |
| FIM infill (PSM/SPM) | ✅ |
| Speculative decoding (prompt-lookup n-gram + custom draft models) | ✅ |
| State save/load (full + per-sequence, with flags) | ✅ |
| Prompt + KV cache (RAM/Disk, prefix-match) | ✅ |
| Multimodal (mtmd): vision + audio chat handlers | ✅ (feature mtmd) |
| HF AutoTokenizer (feature hf-tokenizer) | ✅ |
| llguidance (feature llguidance) | ✅ |
| OpenAI-compatible HTTP server | ⛔ out of scope for v0.1 (planned as llama-crab-server) |
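
To make the embeddings row above concrete, here is a minimal, standalone sketch of what "mean pooling + L2 normalize" computes. It does not use the llama-crab API; the function names `mean_pool` and `l2_normalize` are hypothetical and exist only for this illustration.

```rust
/// Mean-pool per-token embeddings into a single vector.
/// Returns `None` if there are no tokens or the rows differ in length.
/// (Hypothetical helper for illustration; not part of llama-crab.)
fn mean_pool(token_embeddings: &[Vec<f32>]) -> Option<Vec<f32>> {
    let dim = token_embeddings.first()?.len();
    if token_embeddings.iter().any(|row| row.len() != dim) {
        return None;
    }
    // Sum each dimension across all tokens.
    let mut pooled = vec![0.0f32; dim];
    for row in token_embeddings {
        for (acc, value) in pooled.iter_mut().zip(row) {
            *acc += *value;
        }
    }
    // Divide by the token count to get the mean.
    let count = token_embeddings.len() as f32;
    for acc in pooled.iter_mut() {
        *acc /= count;
    }
    Some(pooled)
}

/// Scale a vector to unit Euclidean length in place.
/// A zero vector is left unchanged to avoid dividing by zero.
/// (Hypothetical helper for illustration; not part of llama-crab.)
fn l2_normalize(vector: &mut [f32]) {
    let norm = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in vector.iter_mut() {
            *x /= norm;
        }
    }
}
```

With unit-length vectors, the dot product of two embeddings equals their cosine similarity, which is why normalization is commonly paired with pooling.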
Backends
| Backend | Feature | Default? |
|---|---|---|
| CPU (OpenMP) | openmp | ✅ |
| Apple Metal (macOS/iOS) | metal | ✅ on macOS aarch64 |
| NVIDIA CUDA | cuda | – |
| NVIDIA CUDA (no VMM) | cuda-no-vmm | – |
| Vulkan | vulkan | – |
| AMD ROCm/HIP | rocm | – |
| Dynamic linking | dynamic-link | – |
| System GGML | system-ggml | – |
| Dynamic backends | dynamic-backends | – |
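
As a sketch of how a non-default backend might be enabled, assuming the names in the Feature column are Cargo features of the llama-crab crate (the table suggests this, but the exact manifest entry is an assumption):

```toml
[dependencies]
# Assumption: `cuda` is a Cargo feature of llama-crab, as listed in the Backends table.
llama-crab = { version = "0.1", features = ["cuda"] }
```

The same pattern would apply to the other non-default rows (for example `vulkan` or `rocm`), each of which typically requires the corresponding vendor toolchain to be installed at build time.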
License
Dual-licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.