# slm_ikllama_sys
Raw Rust FFI bindings for [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp) —
a performance-oriented fork of llama.cpp with improved CPU/GPU inference, new quantization
types, first-class DeepSeek/MLA support, and fused MoE operations.
The `ik_llama.cpp` source tree is included as a git submodule and compiled from source
during `cargo build` via a CMake-based `build.rs`.
## What this crate provides
- `bindings.rs` — generated by `bindgen` at build time from `ik_llama.cpp/include/llama.h`
and `ik_llama.cpp/ggml/include/ggml.h`; exposes the full `llama_*` and `ggml_*` C API.
- `ik_llama_cpp_wrapper` — a thin C++ static library (`wrapper.cpp`) compiled with the `cc` crate
that gives Rust a stable link point into the ik_llama.cpp headers.
- Shared libraries (`libllama.so`, `libggml.so`, …) built by CMake and hard-linked into
the Cargo target directory so they are found at runtime.
This crate is **not intended for direct use** — consume it through [`slm_ikllama`](../slm_ikllama),
which implements the `slm_inference` trait layer on top of these bindings.
## Build
The build script compiles ik_llama.cpp with CMake. A few environment variables control
the build:

| Variable | Default | Description |
|---|---|---|
| `LLAMA_LIB_PROFILE` | `Release` | CMake build profile |
| `LLAMA_STATIC_CRT` | `0` | Link against static CRT (Windows MSVC) |
| `BUILD_DEBUG` | unset | Print verbose build diagnostics |
| `CMAKE_VERBOSE` | unset | Make CMake very verbose |
| `CMAKE_*` | — | Any `CMAKE_*` env var is forwarded to CMake as-is |
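
The profile default and the `CMAKE_*` pass-through described in the table can be sketched roughly as follows. This is an illustrative sketch only, not the crate's actual `build.rs`: the helper names `cmake_profile` and `forwarded_cmake_vars` are hypothetical, and the environment is passed in as a map so the logic can be exercised without touching the real process environment.

```rust
use std::collections::BTreeMap;

/// Resolve the CMake build profile, falling back to `Release` when
/// `LLAMA_LIB_PROFILE` is unset (the default listed in the table).
/// Hypothetical helper; the real build script may be organized differently.
fn cmake_profile(env: &BTreeMap<String, String>) -> String {
    env.get("LLAMA_LIB_PROFILE")
        .cloned()
        .unwrap_or_else(|| "Release".to_string())
}

/// Collect every variable whose name starts with `CMAKE_` so it can be
/// handed to CMake unchanged. Note that this simple prefix filter also
/// picks up `CMAKE_VERBOSE`; whether the real script special-cases it is
/// not something this sketch assumes.
fn forwarded_cmake_vars(env: &BTreeMap<String, String>) -> Vec<(String, String)> {
    env.iter()
        .filter(|(key, _)| key.starts_with("CMAKE_"))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```

In a build script, the map could be populated with `std::env::vars().collect::<BTreeMap<_, _>>()` before calling either helper.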
## Features

| Feature | Description |
|---|---|
| `cuda` *(default)* | Enable NVIDIA CUDA via `GGML_CUDA`; links `cudart`, `cublas`, `cublasLt` |
| `native` *(default)* | Compile for the host CPU architecture (`CMAKE_CUDA_ARCHITECTURES=native`, `GGML_NATIVE=ON`) |
When `native` is off, the build targets a broad x86-64 baseline (AVX2, AVX512-BF16/VBMI/VNNI).
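
One way to picture how the two features translate into CMake cache definitions is the sketch below. It is an assumption-laden illustration, not the crate's build script: `cmake_defines` is a hypothetical helper, the explicit `GGML_NATIVE=OFF` is assumed rather than confirmed, and the non-native CUDA architecture list is taken from the platform notes in this README.

```rust
/// Map the `cuda` and `native` features to CMake definitions.
/// Hypothetical helper for illustration only.
fn cmake_defines(cuda: bool, native: bool) -> Vec<(&'static str, &'static str)> {
    let mut defines = Vec::new();

    if cuda {
        defines.push(("GGML_CUDA", "ON"));
        // With `native`, the GPU architecture is auto-detected; without it,
        // a fixed list of architectures is targeted.
        let archs = if native { "native" } else { "86;89;120" };
        defines.push(("CMAKE_CUDA_ARCHITECTURES", archs));
    }

    // Assumption: the flag is set to OFF explicitly when the feature is
    // disabled, instead of being left at the CMake default.
    defines.push(("GGML_NATIVE", if native { "ON" } else { "OFF" }));

    defines
}
```

A build script would typically call this as `cmake_defines(cfg!(feature = "cuda"), cfg!(feature = "native"))` and pass each pair to CMake as a `-D` definition.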
## Platform notes
- **Linux** — links `stdc++` dynamically; shared `.so` files are placed in the Cargo target directory.
- **macOS** — links `Foundation`, `Metal`, `MetalKit`, `Accelerate` frameworks; `.dylib` files are placed in the target directory.
- **Windows MSVC** — MSVC include paths are discovered via the `cc` crate; `.dll` files are copied to the target and `deps` directories.
- **CUDA architectures** — with `native`, the GPU arch is auto-detected; without it, `86;89;120` (RTX 30/40/50xx) are targeted.
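
The per-platform linking described above could be expressed as Cargo link directives along these lines. This is a minimal sketch under stated assumptions: `link_directives` is a hypothetical helper, the `target_os` strings follow the values Cargo exposes to build scripts through `CARGO_CFG_TARGET_OS`, and the exact directives emitted by the real `build.rs` may differ.

```rust
/// Build the `cargo:rustc-link-lib` lines for a given target OS.
/// Hypothetical helper; library names are the ones listed in this README.
fn link_directives(target_os: &str, cuda: bool) -> Vec<String> {
    let mut lines = Vec::new();

    match target_os {
        // Linux links the C++ standard library dynamically.
        "linux" => lines.push("cargo:rustc-link-lib=dylib=stdc++".to_string()),
        // macOS links the Apple frameworks used by the Metal backend.
        "macos" => {
            for framework in ["Foundation", "Metal", "MetalKit", "Accelerate"] {
                lines.push(format!("cargo:rustc-link-lib=framework={}", framework));
            }
        }
        // Other targets add nothing in this sketch.
        _ => {}
    }

    // The `cuda` feature additionally links the CUDA runtime and cuBLAS.
    if cuda {
        for lib in ["cudart", "cublas", "cublasLt"] {
            lines.push(format!("cargo:rustc-link-lib={}", lib));
        }
    }

    lines
}
```

A build script would print each returned line with `println!` so that Cargo picks it up.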