swink-agent-local-llm
On-device LLM inference for swink-agent powered by llama.cpp — ship an agent that runs with no network and no API keys.
Features
- SmolLM3-3B (default, GGUF Q4_K_M, ~1.92 GB): text generation, tool use, and reasoning on CPU-only hardware
- Gemma 4 E2B (gemma4 feature, ~3.5 GB): 128K context with native thinking mode and tool calling
- EmbeddingGemma-300M (<200 MB): text embeddings for semantic search and RAG
- GGUF weights are lazily downloaded from HuggingFace on first use (hf-hub)
- GPU acceleration: metal (Apple), cuda (NVIDIA), vulkan (cross-platform); CPU-only works by default
- default_local_connection() returns a ready ModelConnection; drop it into ModelConnections alongside remote adapters
- Models are designed for Arc<> sharing across concurrent tasks
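The Arc<> sharing pattern mentioned in the last bullet can be sketched with the standard library alone. This is a minimal illustration, not this crate's API: StubModel and run_concurrently are hypothetical names, and the "inference" is a string echo standing in for a real model call.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a loaded model; not a type from this crate.
struct StubModel {
    name: String,
}

impl StubModel {
    /// Pretend inference: takes `&self`, so many tasks can call it at once.
    fn generate(&self, prompt: &str) -> String {
        format!("[{}] echo: {}", self.name, prompt)
    }
}

/// Runs one prompt per thread against a single shared model instance.
fn run_concurrently(model: Arc<StubModel>, prompts: &[&str]) -> Vec<String> {
    let handles: Vec<_> = prompts
        .iter()
        .map(|p| {
            // Cloning the Arc copies a pointer, not the underlying model data.
            let model = Arc::clone(&model);
            let prompt = p.to_string();
            thread::spawn(move || model.generate(&prompt))
        })
        .collect();

    // Join in spawn order so outputs line up with the input prompts.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

Because generate only needs a shared reference, no lock is required here; a model with mutable per-request state would need its own synchronization or a per-task context.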
Quick Start
[dependencies]
swink-agent = "0.9.0"
swink-agent-local-llm = { version = "0.9.0", features = ["metal"] } # or "cuda", "vulkan", or none for CPU
tokio = { version = "1", features = ["full"] }
use *;
use default_local_connection;
async
Architecture
LocalStreamFn implements the core StreamFn trait by driving a loaded LocalModel through a token-by-token generation loop, emitting the same AssistantMessageEvent stream as remote adapters. ModelPreset holds the catalog of supported GGUF weights and download URLs; EmbeddingModel runs a separate llama.cpp context tuned for pooled-output embeddings. First-run downloads show progress via a ProgressCallbackFn hook.
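The token-by-token loop described above can be sketched roughly as follows. Everything in this block is an assumption made for illustration: SketchEvent, ScriptedModel, and stream_events are hypothetical names, the event variants are guesses rather than the real AssistantMessageEvent, and a standard-library channel stands in for whatever stream type StreamFn actually returns.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical event type; the crate's real AssistantMessageEvent
/// may carry different variants.
#[derive(Debug, PartialEq)]
enum SketchEvent {
    Start,
    TextDelta(String),
    Done,
}

/// Stand-in "model": yields one scripted token per call, then None.
struct ScriptedModel {
    tokens: Vec<String>,
    pos: usize,
}

impl ScriptedModel {
    fn next_token(&mut self) -> Option<String> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }
}

/// Drives the model token by token on a worker thread and streams events back.
fn stream_events(mut model: ScriptedModel) -> Receiver<SketchEvent> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A send error means the consumer hung up, so generation stops early.
        if tx.send(SketchEvent::Start).is_err() {
            return;
        }
        while let Some(tok) = model.next_token() {
            if tx.send(SketchEvent::TextDelta(tok)).is_err() {
                return;
            }
        }
        let _ = tx.send(SketchEvent::Done);
    });
    rx
}
```

A consumer iterates the receiver until the sender is dropped, so it handles events from this stand-in the same way it would handle events from any other producer of the same type. That is the property the shared event stream gives local and remote adapters.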
swink-agent-local-llm depends on llama-cpp-2, which builds a C++ backend via cmake and generates bindings via bindgen. Contributor machines need LLVM/libclang available; set LIBCLANG_PATH to the LLVM bin directory if auto-discovery fails. Expect ~5 minutes on the first build.
No unsafe code in this crate (#![forbid(unsafe_code)]). The unsafe required for FFI into llama.cpp is encapsulated in the upstream llama-cpp-2 sys crate.
Part of the swink-agent workspace — see the main README for workspace overview and setup.