slm_ikllama_sys 0.1.1

ik_llama.cpp Rust sys bindings

# slm_ikllama_sys

Raw Rust FFI bindings for [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp) —
a performance-oriented fork of llama.cpp with improved CPU/GPU inference, new quantization
types, first-class DeepSeek/MLA support, and fused MoE operations.

The `ik_llama.cpp` source tree is included as a git submodule and compiled from source
during `cargo build` via a CMake-based `build.rs`.

## What this crate provides

- `bindings.rs` — generated by `bindgen` at build time from `ik_llama.cpp/include/llama.h`
  and `ik_llama.cpp/ggml/include/ggml.h`; exposes the full `llama_*` and `ggml_*` C API.
- `ik_llama_cpp_wrapper` — a thin C++ static library (`wrapper.cpp`) compiled with `cc`
  that gives Rust a stable link point into the ik_llama.cpp headers.
- Shared libraries (`libllama.so`, `libggml.so`, …) built by CMake and hard-linked into
  the Cargo target directory so they are found at runtime.
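
The hard-linking step in the last bullet can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual `build.rs`: the helper name `link_or_copy` and the copy fallback are assumptions introduced here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hard-link `src` to `dst`, falling back to a plain copy when linking fails
/// (for example, when the two paths live on different filesystems).
/// Hypothetical helper; the real build script may behave differently.
pub fn link_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    // Remove a stale artifact left by a previous build, if there is one.
    // `symlink_metadata` also detects dangling symlinks.
    if fs::symlink_metadata(dst).is_ok() {
        fs::remove_file(dst)?;
    }
    match fs::hard_link(src, dst) {
        Ok(()) => Ok(()),
        // Linking is not always possible; copying gives the same end result.
        Err(_) => fs::copy(src, dst).map(|_| ()),
    }
}
```

A build script would call this once per built library, with `dst` inside the Cargo target directory.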

This crate is **not intended for direct use** — consume it through [`slm_ikllama`](../slm_ikllama),
which implements the `slm_inference` trait layer on top of these bindings.

## Build

The build script compiles ik_llama.cpp with CMake. The following environment variables
control the build:

| Variable | Default | Description |
|---|---|---|
| `LLAMA_LIB_PROFILE` | `Release` | CMake build profile |
| `LLAMA_STATIC_CRT` | `0` | Link against static CRT (Windows MSVC) |
| `BUILD_DEBUG` | unset | Print verbose build diagnostics |
| `CMAKE_VERBOSE` | unset | Make CMake very verbose |
| `CMAKE_*` | | Any `CMAKE_*` env var is forwarded to CMake as-is |
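
As a rough sketch of how a build script might interpret these variables, the function below folds an iterator of environment pairs into a settings struct. The `BuildEnv` type, its field names, the `== "1"` check for `LLAMA_STATIC_CRT`, and the decision not to forward `CMAKE_VERBOSE` are all assumptions made for illustration; only the variable names and defaults come from the table.

```rust
/// Build settings derived from the environment (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct BuildEnv {
    pub profile: String,
    pub static_crt: bool,
    pub verbose: bool,
    /// `CMAKE_*` variables to pass through to CMake unchanged.
    pub forwarded: Vec<(String, String)>,
}

pub fn read_build_env<I>(vars: I) -> BuildEnv
where
    I: IntoIterator<Item = (String, String)>,
{
    // Start from the documented defaults.
    let mut env = BuildEnv {
        profile: "Release".to_string(),
        static_crt: false,
        verbose: false,
        forwarded: Vec::new(),
    };
    for (key, value) in vars {
        if key == "LLAMA_LIB_PROFILE" {
            env.profile = value;
        } else if key == "LLAMA_STATIC_CRT" {
            // Assumption: only the literal "1" enables the static CRT.
            env.static_crt = value == "1";
        } else if key == "CMAKE_VERBOSE" {
            // Assumption: any value enables verbose mode, and the variable
            // itself is treated as a switch rather than forwarded.
            env.verbose = true;
        } else if key.starts_with("CMAKE_") {
            env.forwarded.push((key, value));
        }
    }
    // Sort for a deterministic order regardless of how the OS lists variables.
    env.forwarded.sort();
    env
}
```

In a real build script the input would be `std::env::vars()`; taking an iterator keeps the logic testable without touching the process environment.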

## Features

| Feature | Description                                                                                 |
|---|---------------------------------------------------------------------------------------------|
| `cuda` *(default)* | Enable NVIDIA CUDA via `GGML_CUDA`; links `cudart`, `cublas`, `cublasLt`                    |
| `native` *(default)* | Compile for the host CPU and GPU architecture (`GGML_NATIVE=ON`, `CMAKE_CUDA_ARCHITECTURES=native`) |

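When `native` is off, the build targets a fixed x86-64 feature set (AVX2, AVX512-BF16/VBMI/VNNI)
instead of the host CPU's. The resulting libraries are only expected to run on CPUs that
support these extensions.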
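
The mapping from Cargo features to CMake definitions might look roughly like the sketch below. The function name `cmake_defines` and the explicit `GGML_NATIVE=OFF` are assumptions; the non-native CUDA architecture list is the one given under Platform notes. The flags that select the AVX feature set are not shown because this README does not name them.

```rust
/// Translate the two Cargo features into CMake cache definitions.
/// Hypothetical helper; in a build script the booleans would typically come
/// from `cfg!(feature = "cuda")` and `cfg!(feature = "native")`.
pub fn cmake_defines(cuda: bool, native: bool) -> Vec<(&'static str, &'static str)> {
    let mut defs = Vec::new();
    // Assumption: the option is set explicitly in both cases.
    defs.push(("GGML_NATIVE", if native { "ON" } else { "OFF" }));
    if cuda {
        defs.push(("GGML_CUDA", "ON"));
        // With `native`, CMake detects the installed GPU; otherwise a fixed
        // list of compute capabilities is targeted.
        let archs = if native { "native" } else { "86;89;120" };
        defs.push(("CMAKE_CUDA_ARCHITECTURES", archs));
    }
    defs
}
```

Each pair would then be handed to CMake as a `-D<name>=<value>` definition.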

## Platform notes

- **Linux** — links `stdc++` dynamically; shared `.so` files are placed in the Cargo target directory.
- **macOS** — links `Foundation`, `Metal`, `MetalKit`, `Accelerate` frameworks; `.dylib` files are placed in the target directory.
- **Windows MSVC** — MSVC include paths are discovered via the `cc` crate; `.dll` files are copied to the target and `deps` directories.
- **CUDA architectures** — with `native`, the GPU arch is auto-detected; without it, `86;89;120` (RTX 30/40/50xx) are targeted.
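
The per-platform linking described above corresponds to `cargo:rustc-link-lib` directives that a build script prints to stdout. The following is a minimal sketch under stated assumptions: the function `link_directives` is hypothetical, the CUDA library list is taken from the Features table, and the MSVC include-path discovery is omitted.

```rust
/// Produce the `cargo:rustc-link-lib` lines for a target OS.
/// Hypothetical helper; a build script would read the OS from the
/// `CARGO_CFG_TARGET_OS` environment variable that Cargo sets.
pub fn link_directives(target_os: &str, cuda: bool) -> Vec<String> {
    let mut out = Vec::new();
    match target_os {
        "linux" => {
            // Link the C++ standard library dynamically.
            out.push("cargo:rustc-link-lib=dylib=stdc++".to_string());
        }
        "macos" => {
            for fw in ["Foundation", "Metal", "MetalKit", "Accelerate"] {
                out.push(format!("cargo:rustc-link-lib=framework={}", fw));
            }
        }
        // Windows and other targets add no extra system libraries in this sketch.
        _ => {}
    }
    if cuda {
        for lib in ["cudart", "cublas", "cublasLt"] {
            out.push(format!("cargo:rustc-link-lib={}", lib));
        }
    }
    out
}
```

Printing each returned line with `println!` from `build.rs` is what makes Cargo pass the libraries to the linker.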