llm-kernel
Foundation library for Rust AI-native apps — provider catalog, LLM client, MCP server, search, telemetry, and safety
Overview
llm-kernel provides the foundational layer for building LLM-powered tools, agents, and servers in Rust:
- Provider catalog — 16 built-in providers, 114 models with metadata, pricing, and capabilities
- Async client — trait-based client for OpenAI and Anthropic with SSE streaming
- Model discovery — dynamic model discovery from models.dev, Ollama, and OpenAI-compatible endpoints
- Credential vault — dotenv-style API key management with atomic writes
- Config loader — TOML config with auto-create from template
- Knowledge graph — SQLite-backed graph with FTS5 search, smart recall, BFS traversal, async wrappers
- MCP server — JSON-RPC 2.0 server framework with stdio transport and Bearer auth
- Embedding — provider trait + cosine similarity, local ONNX (44 models), Qwen3 (candle), Nomic V2 MoE (candle), OpenAI (remote)
- Search — Reciprocal Rank Fusion for hybrid search result merging
- Token estimation — zero-dependency Unicode-script heuristic token counting
- Telemetry — enum-gated events with no PII, console and noop sinks
- Safety — secret masking, error classification, output sanitization
- Install wizard — MCP config generation for Claude Desktop, Cursor, Copilot, OpenCode, Cline
Feature flags
Each module is gated behind a feature flag so you only pay for what you use.
| Feature | Description | Default |
|---|---|---|
| provider | Provider catalog, model descriptors, pricing | ✅ |
| client-async | Async LLM client (reqwest) with streaming | |
| discovery | Dynamic model discovery (models.dev, Ollama, OpenAI-compat) | |
| secrets | SecretVault credential management | |
| store | SQLite init helpers (WAL, FTS5, schema versioning) | |
| config | TOML config loader | |
| graph | Knowledge graph — SQLite, FTS5, smart recall, BFS traversal | |
| graph-async | Async graph wrappers (requires tokio) | |
| graph-pool | Multi-connection async graph pool (AsyncPoolGraph, WAL concurrency) | |
| mcp | MCP server — JSON-RPC 2.0, stdio transport, Bearer auth | |
| tokens | Token estimation with Unicode-script heuristics | |
| install | AI tool installation wizard | |
| search | Hybrid search with Reciprocal Rank Fusion | |
| embedding | Embedding provider trait + cosine similarity | |
| embedding-openai | OpenAI text-embedding client (sync HTTP) | |
| embedding-fastembed | Local ONNX embedding via fastembed-rs (44 models) | |
| embedding-fastembed-qwen3 | Qwen3 embedding via candle backend | |
| embedding-fastembed-nomic-moe | Nomic V2 MoE embedding via candle backend | |
| telemetry | Enum-gated telemetry events, no PII | |
| safety | Secret masking, error classification, output sanitization | |
| full | All features | |
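As a rough illustration of how a feature gate keeps code out of a build, the sketch below uses Rust's standard `#[cfg(feature = "...")]` attribute with a hypothetical `graph` module stub. It is not llm-kernel's actual module layout; it only shows that code behind a disabled feature is never compiled.

```rust
// Hypothetical illustration of compile-time feature gating.
// Built without `--features graph`, only the fallback module exists,
// so the gated implementation is never compiled or linked.
#[cfg(feature = "graph")]
mod graph {
    pub fn available() -> bool {
        true
    }
}

#[cfg(not(feature = "graph"))]
mod graph {
    pub fn available() -> bool {
        false
    }
}

fn main() {
    if graph::available() {
        println!("graph feature enabled");
    } else {
        println!("graph feature disabled");
    }
}
```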
Quick start
Add to your Cargo.toml:
[dependencies]
llm-kernel = "0.1.0"
The provider feature is enabled by default. For the async client:
[dependencies]
llm-kernel = { version = "0.1.0", features = ["client-async"] }
For the knowledge graph with async wrappers:
[dependencies]
llm-kernel = { version = "0.1.0", features = ["graph", "graph-async"] }
For local embedding (ONNX, no API key):
[dependencies]
llm-kernel = { version = "0.1.0", features = ["embedding-fastembed"] }
Usage
Provider catalog
The embedded catalog contains 16 providers with 114 models aligned to the models.dev schema.
use *;
let catalog = embedded;
// List all providers
for id in catalog.ids
// Query models for a provider
for model in catalog.models_for
// Find a specific model
if let Some = catalog.find_model
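The catalog queries above can be pictured as a map from provider id to a list of model descriptors. The following is a minimal std-only sketch; `Catalog`, `ModelInfo`, `sample_catalog`, and the provider/model ids are hypothetical, and the method names only echo the example above without claiming llm-kernel's real signatures.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified model descriptor; the real catalog also carries
// pricing, modalities, and capability flags.
#[derive(Debug, PartialEq)]
struct ModelInfo {
    id: String,
    context_limit: u32,
}

// Hypothetical catalog keyed by provider id. BTreeMap keeps ids sorted,
// so `ids()` iterates in a stable order.
struct Catalog {
    by_provider: BTreeMap<String, Vec<ModelInfo>>,
}

impl Catalog {
    fn ids(&self) -> impl Iterator<Item = &str> {
        self.by_provider.keys().map(String::as_str)
    }

    // Unknown providers yield an empty slice rather than an error.
    fn models_for(&self, provider: &str) -> &[ModelInfo] {
        self.by_provider
            .get(provider)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }

    fn find_model(&self, provider: &str, model_id: &str) -> Option<&ModelInfo> {
        self.models_for(provider).iter().find(|m| m.id == model_id)
    }
}

// Placeholder data for illustration only.
fn sample_catalog() -> Catalog {
    let mut by_provider = BTreeMap::new();
    by_provider.insert(
        "provider-b".to_string(),
        vec![ModelInfo { id: "model-y".to_string(), context_limit: 32_000 }],
    );
    by_provider.insert(
        "provider-a".to_string(),
        vec![
            ModelInfo { id: "model-x".to_string(), context_limit: 128_000 },
            ModelInfo { id: "model-z".to_string(), context_limit: 8_000 },
        ],
    );
    Catalog { by_provider }
}

fn main() {
    let catalog = sample_catalog();
    for id in catalog.ids() {
        println!("{id}: {} models", catalog.models_for(id).len());
    }
    if let Some(m) = catalog.find_model("provider-a", "model-z") {
        println!("found {} ({} tokens)", m.id, m.context_limit);
    }
}
```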
Async chat completion
use *;
let config = ModelConfig ;
let client = new?;
let response = client.complete.await?;
println!;
println!;
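The LLMClient trait lets callers switch between OpenAI and Anthropic without changing call sites. A synchronous sketch of that trait-object pattern follows; `ChatClient`, the mock clients, and `client_for` are hypothetical, and no network calls are made.

```rust
// Hypothetical, synchronous stand-in for a unified client trait. The real
// LLMClient is async and talks to provider HTTP APIs; this only shows how a
// trait object keeps calling code provider-agnostic.
trait ChatClient {
    fn provider(&self) -> &'static str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

// Mock implementations that echo the prompt instead of calling an API.
struct MockOpenAi;
struct MockAnthropic;

impl ChatClient for MockOpenAi {
    fn provider(&self) -> &'static str {
        "openai"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if prompt.is_empty() {
            return Err("empty prompt".to_string());
        }
        Ok(format!("[openai] {prompt}"))
    }
}

impl ChatClient for MockAnthropic {
    fn provider(&self) -> &'static str {
        "anthropic"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if prompt.is_empty() {
            return Err("empty prompt".to_string());
        }
        Ok(format!("[anthropic] {prompt}"))
    }
}

// Choose an implementation at runtime by provider id.
fn client_for(provider: &str) -> Option<Box<dyn ChatClient>> {
    match provider {
        "openai" => Some(Box::new(MockOpenAi)),
        "anthropic" => Some(Box::new(MockAnthropic)),
        _ => None,
    }
}

fn main() {
    for name in ["openai", "anthropic"] {
        let client = client_for(name).expect("known provider");
        match client.complete("Hello") {
            Ok(text) => println!("{}: {text}", client.provider()),
            Err(e) => eprintln!("error: {e}"),
        }
    }
}
```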
Streaming
use *;
let config = ModelConfig ;
let client = new?;
let stream = client.stream_complete.await?;
// Stream yields Delta, Usage, and Done events
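Streaming responses arrive as Server-Sent Events. Assuming an OpenAI-style stream that ends with a `data: [DONE]` sentinel, a minimal parser could look like this. Real payloads are JSON and would be deserialized into Delta and Usage events; this sketch skips that by keeping the raw data text, and `StreamEvent`/`parse_sse` are hypothetical names.

```rust
// Minimal Server-Sent Events parser sketch. An event is a group of lines
// terminated by a blank line; multiple `data:` lines are joined with '\n'.
// Per the SSE format, an unterminated event at end of input is discarded.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    Delta(String),
    Done,
}

fn parse_sse(input: &str) -> Vec<StreamEvent> {
    let mut events = Vec::new();
    let mut data_lines: Vec<&str> = Vec::new();

    for line in input.lines() {
        if line.is_empty() {
            // Blank line: dispatch the buffered event, if any.
            if data_lines.is_empty() {
                continue;
            }
            let data = data_lines.join("\n");
            data_lines.clear();
            if data == "[DONE]" {
                events.push(StreamEvent::Done);
                break;
            }
            events.push(StreamEvent::Delta(data));
        } else if let Some(rest) = line.strip_prefix("data:") {
            // A single leading space after the colon is not part of the value.
            data_lines.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Other fields (event:, id:, retry:) and ':' comments are ignored.
    }
    events
}

fn main() {
    let raw = "data: Hel\n\ndata: lo\n\n: keep-alive\n\ndata: [DONE]\n\n";
    for event in parse_sse(raw) {
        println!("{event:?}");
    }
}
```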
Model discovery
use ;
// Fetch from models.dev (caches to disk)
let payload = fetch_and_cache?;
for model in &payload.models
// Load from cache (no network)
if let Some = load_cache?
// Discover local Ollama models
let ollama_models = fetch_ollama_models?;
for name in &ollama_models
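The cache-then-network pattern above can be sketched with a freshness check on the cache file's modification time. `is_fresh`, `load_if_fresh`, and the 24-hour TTL are assumptions for illustration, not llm-kernel's cache policy.

```rust
use std::fs;
use std::path::Path;
use std::time::Duration;

// Hypothetical freshness check: reuse the cache only if the file exists and
// was modified less than `ttl` ago; otherwise the caller should refetch.
fn is_fresh(path: &Path, ttl: Duration) -> bool {
    let Ok(meta) = fs::metadata(path) else {
        return false; // missing or unreadable cache
    };
    let Ok(modified) = meta.modified() else {
        return false; // platform does not report mtime
    };
    // An mtime in the future (clock skew) is treated as age zero.
    let age = modified.elapsed().unwrap_or(Duration::ZERO);
    age < ttl
}

fn load_if_fresh(path: &Path, ttl: Duration) -> Option<String> {
    if is_fresh(path, ttl) {
        fs::read_to_string(path).ok()
    } else {
        None
    }
}

fn main() {
    let path = std::env::temp_dir().join("models-cache-sketch.json");
    fs::write(&path, "{\"models\":[]}").expect("write cache");
    match load_if_fresh(&path, Duration::from_secs(24 * 60 * 60)) {
        Some(text) => println!("cache hit: {text}"),
        None => println!("cache stale or missing; refetch"),
    }
}
```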
Credential vault
use *;
let vault = load_from?;
vault.set;
vault.save_to?;
// Redact credentials for logging
println!;
// → "sk-abcd...7890"
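A std-only sketch of the vault's three jobs: parsing dotenv text, saving atomically (write a temp file, fsync, then rename), and redacting for logs. The redaction rule (first 7 and last 4 characters, full masking below 16 characters) is inferred from the sample output above and is an assumption; `parse_dotenv`, `save_atomic`, and `redact` are hypothetical, and symlink guards are not shown.

```rust
use std::collections::BTreeMap;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

// Parse `KEY=VALUE` lines, skipping blanks and `#` comments and removing
// one pair of surrounding double quotes from the value.
fn parse_dotenv(text: &str) -> BTreeMap<String, String> {
    let mut vars = BTreeMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim();
            let value = value
                .strip_prefix('"')
                .and_then(|v| v.strip_suffix('"'))
                .unwrap_or(value);
            vars.insert(key.trim().to_string(), value.to_string());
        }
    }
    vars
}

// Write to a sibling temp file, fsync, then rename over the target so
// readers never see a half-written file (rename is atomic on POSIX).
fn save_atomic(path: &Path, vars: &BTreeMap<String, String>) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut file = File::create(&tmp)?;
        for (key, value) in vars {
            writeln!(file, "{key}={value}")?;
        }
        file.sync_all()?;
    }
    fs::rename(&tmp, path)
}

// Keep a short prefix and suffix; mask short values entirely.
fn redact(secret: &str) -> String {
    let chars: Vec<char> = secret.chars().collect();
    if chars.len() < 16 {
        return "[REDACTED]".to_string();
    }
    let head: String = chars[..7].iter().collect();
    let tail: String = chars[chars.len() - 4..].iter().collect();
    format!("{head}...{tail}")
}

fn main() -> io::Result<()> {
    let vars = parse_dotenv("# API keys\nOPENAI_API_KEY=\"sk-abcdefghijklmnop1234567890\"\n");
    let path = std::env::temp_dir().join("vault-sketch.env");
    save_atomic(&path, &vars)?;
    for (key, value) in &vars {
        println!("{key}={}", redact(value));
    }
    fs::remove_file(&path)
}
```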
TOML config
use load_toml_config;
use Deserialize;
let config: AppConfig = load_toml_config?;
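Auto-creating a config from a template amounts to "read the file, or write the template if it is missing". The sketch below does that with std only; `load_or_create` and `DEFAULT_CONFIG` are hypothetical, and the TOML deserialization step that load_toml_config performs is left out.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical default template written on first run.
const DEFAULT_CONFIG: &str = "# app config\nmodel = \"example-model\"\ntemperature = 0.2\n";

// Return the config text, creating the file (and its parent directories)
// from the template if it does not exist yet. Other I/O errors propagate.
fn load_or_create(path: &Path, template: &str) -> io::Result<String> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(text),
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            if let Some(dir) = path.parent() {
                fs::create_dir_all(dir)?;
            }
            fs::write(path, template)?;
            Ok(template.to_string())
        }
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("config-sketch").join("config.toml");
    let text = load_or_create(&path, DEFAULT_CONFIG)?;
    println!("{text}");
    Ok(())
}
```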
SQLite store
use init_schema;
let ddl = "CREATE TABLE items (id TEXT PRIMARY KEY, content TEXT);";
let conn = init_schema?;
// WAL mode, busy timeout, and schema versioning applied automatically
Knowledge graph
use *;
use Connection;
let conn = open_in_memory.unwrap;
init_graph_schema.unwrap;
// Create nodes
upsert_node.unwrap;
// Connect with edges
append_edge.unwrap;
// Smart recall with composite scoring
let results = smart_recall.unwrap;
for scored in &results
// Lifecycle management
decay_importance.unwrap;
tag_stale_nodes.unwrap;
let stats = compute_stats.unwrap;
println!;
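The graph API runs on SQLite, but the ideas behind traversal and lifecycle management are easy to show in memory: a depth-limited breadth-first search over an adjacency list, and an exponential importance decay. `bfs`, `decayed_importance`, and the half-life model are assumptions; llm-kernel's decay_importance may use a different formula.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Depth-limited BFS from `start`. Returns (node, depth) pairs in visit
// order; `visited` stops cycles from looping forever.
fn bfs<'a>(
    edges: &HashMap<&'a str, Vec<&'a str>>,
    start: &'a str,
    max_depth: usize,
) -> Vec<(&'a str, usize)> {
    let mut visited = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0)]);
    let mut order = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        order.push((node, depth));
        if depth == max_depth {
            continue; // do not expand past the depth limit
        }
        for &next in edges.get(node).into_iter().flatten() {
            if visited.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    order
}

// Hypothetical decay model: importance halves every `half_life_days`.
fn decayed_importance(importance: f64, age_days: f64, half_life_days: f64) -> f64 {
    importance * 0.5_f64.powf(age_days / half_life_days)
}

fn main() {
    let edges = HashMap::from([
        ("rust", vec!["cargo", "tokio"]),
        ("cargo", vec!["crates.io"]),
        ("tokio", vec!["rust"]),
    ]);
    for (node, depth) in bfs(&edges, "rust", 2) {
        println!("{node} (depth {depth})");
    }
    println!("{:.3}", decayed_importance(1.0, 14.0, 30.0));
}
```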
MCP server
use ;
use json;
let mut server = new;
server.register_tool;
// Runs JSON-RPC 2.0 over stdio with Bearer auth
server.run_stdio.await?;
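For the Bearer auth mentioned above, a server has to extract the token and compare it without leaking timing information. The sketch below is hypothetical, including the error code -32001 (JSON-RPC 2.0 reserves -32000 to -32099 for implementation-defined server errors). Production code would typically rely on a vetted constant-time crate such as `subtle`, and this comparison still reveals whether the lengths match.

```rust
// Compare without an early exit on the first differing byte, so timing
// does not reveal how much of the token matched.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

// Hypothetical check of an `Authorization` header value.
fn is_authorized(authorization: Option<&str>, expected_token: &str) -> bool {
    match authorization.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) => constant_time_eq(token.trim().as_bytes(), expected_token.as_bytes()),
        None => false,
    }
}

// JSON-RPC 2.0 error response for a rejected request (code is an assumption).
fn unauthorized_response(id: u64) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{id},"error":{{"code":-32001,"message":"Unauthorized"}}}}"#)
}

fn main() {
    let header = Some("Bearer s3cret");
    if is_authorized(header, "s3cret") {
        println!("authorized");
    } else {
        println!("{}", unauthorized_response(1));
    }
}
```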
Token estimation
use estimate_tokens;
let tokens = estimate_tokens;
println!;
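A Unicode-script heuristic estimates tokens by classifying each character and applying a per-script ratio. The ratios below (one token per CJK or Hangul character, about four characters per token otherwise, whitespace ignored) are illustrative assumptions, not llm-kernel's calibrated values, and `estimate_tokens_sketch` is a hypothetical name.

```rust
// True for Hiragana/Katakana, common CJK ideograph blocks, and Hangul syllables.
fn is_cjk(c: char) -> bool {
    matches!(c as u32,
        0x3040..=0x30FF     // Hiragana and Katakana
        | 0x3400..=0x4DBF   // CJK Unified Ideographs Extension A
        | 0x4E00..=0x9FFF   // CJK Unified Ideographs
        | 0xAC00..=0xD7AF   // Hangul Syllables
    )
}

// Hypothetical heuristic: CJK chars count ~1 token each, other
// non-whitespace chars ~4 per token.
fn estimate_tokens_sketch(text: &str) -> usize {
    let mut cjk = 0usize;
    let mut other = 0usize;
    for c in text.chars() {
        if c.is_whitespace() {
            continue;
        } else if is_cjk(c) {
            cjk += 1;
        } else {
            other += 1;
        }
    }
    // Round up so any non-empty non-CJK text costs at least one token.
    cjk + other.div_ceil(4)
}

fn main() {
    for text in ["Hello, world!", "日本語", "안녕"] {
        println!("{text}: ~{} tokens", estimate_tokens_sketch(text));
    }
}
```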
Embedding + search
use ;
use ;
// Cosine similarity between vectors
let sim = cosine_similarity;
// Reciprocal Rank Fusion for hybrid search
let bm25 = vec!;
let vector = vec!;
let merged = rrf_fuse;
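For reference, cosine similarity is dot(a, b) / (|a| |b|), and Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over the ranked lists that contain it, with k = 60 in the original RRF paper. The functions below are a std-only sketch of both; their names and signatures are not llm-kernel's.

```rust
use std::collections::HashMap;

// dot(a, b) / (|a| * |b|); returns 0.0 for zero vectors or length mismatch.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Each list adds 1 / (k + rank) per document (rank is 1-based).
// Results are sorted best-first, ties broken by id for determinism.
fn rrf<'a>(lists: &[&[&'a str]], k: f64) -> Vec<(&'a str, f64)> {
    let mut scores: HashMap<&'a str, f64> = HashMap::new();
    for list in lists {
        for (i, doc) in list.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(b.0)));
    fused
}

fn main() {
    let bm25 = ["doc1", "doc2", "doc3"];
    let vector = ["doc3", "doc1"];
    for (doc, score) in rrf(&[&bm25[..], &vector[..]], 60.0) {
        println!("{doc}: {score:.5}");
    }
    println!("cos = {}", cosine(&[1.0, 2.0], &[2.0, 4.0]));
}
```

Documents that appear in both lists (doc1, doc3) outrank one that appears in only one (doc2), which is the point of fusing lexical and vector results.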
Local ONNX embedding (fastembed-rs)
44 models via ONNX Runtime — no API key, no network after first download.
use ;
let provider = new?;
let result = provider.embed?;
assert_eq!;
Qwen3 embedding (candle)
Pure Rust GPU/CPU inference via candle-nn — no ONNX Runtime.
use ;
let provider = new?;
let result = provider.embed?;
Nomic V2 MoE embedding (candle)
Lightweight MoE model — 8 experts, top-2 routing, 305M active params.
use ;
let provider = new?;
let result = provider.embed?;
assert_eq!;
Safety utilities
use ;
// Mask secrets in logs
let safe = mask_secrets;
// → "Authorization: Bearer [REDACTED]"
// Classify errors
let category = classify_failure;
// → ErrorCategory::Timeout
// Sanitize untrusted output
let clean = sanitize_output?;
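The three safety utilities can each be approximated with plain string processing. This sketch is hypothetical throughout: the Bearer-token boundary rule, the ANSI/bidi/NUL filter set, and the ErrorCategory variants other than Timeout are assumptions rather than llm-kernel's actual rules.

```rust
// Replace the token after each "Bearer " with [REDACTED]; the token is
// assumed to end at whitespace or a quote character.
fn mask_bearer(input: &str) -> String {
    const MARKER: &str = "Bearer ";
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find(MARKER) {
        let after = pos + MARKER.len();
        out.push_str(&rest[..after]);
        let tail = &rest[after..];
        let end = tail
            .find(|c: char| c.is_whitespace() || c == '"' || c == '\'')
            .unwrap_or(tail.len());
        if end > 0 {
            out.push_str("[REDACTED]");
        }
        rest = &tail[end..];
    }
    out.push_str(rest);
    out
}

fn is_bidi_control(c: char) -> bool {
    matches!(c, '\u{061C}' | '\u{200E}' | '\u{200F}' | '\u{202A}'..='\u{202E}' | '\u{2066}'..='\u{2069}')
}

// Drop ANSI CSI sequences (ESC '[' ... final byte 0x40-0x7E), lone ESC,
// NUL bytes, and bidi control characters.
fn sanitize(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1B}' {
            if chars.peek() == Some(&'[') {
                chars.next();
                for n in chars.by_ref() {
                    if ('\u{40}'..='\u{7E}').contains(&n) {
                        break;
                    }
                }
            }
            continue;
        }
        if c == '\0' || is_bidi_control(c) {
            continue;
        }
        out.push(c);
    }
    out
}

// Hypothetical variants; only Timeout appears in the README example.
#[derive(Debug, PartialEq)]
enum ErrorCategory {
    Timeout,
    RateLimit,
    Auth,
    Other,
}

fn classify(message: &str) -> ErrorCategory {
    let m = message.to_ascii_lowercase();
    if m.contains("timed out") || m.contains("timeout") {
        ErrorCategory::Timeout
    } else if m.contains("429") || m.contains("rate limit") {
        ErrorCategory::RateLimit
    } else if m.contains("401") || m.contains("unauthorized") {
        ErrorCategory::Auth
    } else {
        ErrorCategory::Other
    }
}

fn main() {
    println!("{}", mask_bearer("Authorization: Bearer sk-abc123"));
    println!("{}", sanitize("\u{1b}[31mred\u{1b}[0m text"));
    println!("{:?}", classify("request timed out after 30s"));
}
```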
Model metadata
Each model in the catalog includes:
| Field | Description |
|---|---|
| cost | Per-million-token pricing (input, output, cache_read, cache_write) |
| limit | Context and output token limits |
| modalities | Input/output modalities (text, image, audio) |
| capabilities | Flags: attachment, reasoning, temperature, tool_call, streaming |
| knowledge | Training data cutoff date |
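Because prices are quoted per million tokens, the cost of a request is each token count times its price, divided by 1,000,000. With illustrative (made-up) prices of $3/M input and $15/M output, 10,000 input tokens and 2,000 output tokens cost 0.03 + 0.03 = $0.06. The structs below are a hypothetical mirror of the cost fields and treat the four token counts as disjoint buckets, which in practice depends on how each provider reports usage.

```rust
// Hypothetical mirror of the `cost` fields: USD per one million tokens.
#[derive(Debug, Clone, Copy)]
struct Cost {
    input: f64,
    output: f64,
    cache_read: f64,
    cache_write: f64,
}

// Hypothetical usage counts, assumed not to overlap.
#[derive(Debug, Clone, Copy)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
    cache_read_tokens: u64,
    cache_write_tokens: u64,
}

// tokens * (price per million) / 1,000,000, summed over all buckets.
fn request_cost(cost: &Cost, usage: &Usage) -> f64 {
    const PER_MILLION: f64 = 1_000_000.0;
    (usage.input_tokens as f64 * cost.input
        + usage.output_tokens as f64 * cost.output
        + usage.cache_read_tokens as f64 * cost.cache_read
        + usage.cache_write_tokens as f64 * cost.cache_write)
        / PER_MILLION
}

fn main() {
    // Illustrative prices only, not taken from the catalog.
    let cost = Cost { input: 3.0, output: 15.0, cache_read: 0.3, cache_write: 3.75 };
    let usage = Usage { input_tokens: 10_000, output_tokens: 2_000, cache_read_tokens: 0, cache_write_tokens: 0 };
    println!("${:.4}", request_cost(&cost, &usage));
}
```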
Why llm-kernel?
| | llm-kernel | rig | langchain-rust |
|---|---|---|---|
| Provider catalog | ✅ 16 providers, 114 models built-in | Manual config | Manual config |
| Feature gates | ✅ 20 independent modules | Monolithic | Monolithic |
| Local embedding | ✅ 44 ONNX + Qwen3 + Nomic MoE | ❌ | ❌ |
| MCP server | ✅ JSON-RPC 2.0 | ❌ | ❌ |
| Knowledge graph | ✅ SQLite + FTS5 + smart recall | ❌ | ❌ |
| Mandatory deps | serde only | reqwest, tokio, … | Many |
| Chains / agents | ❌ | ✅ | ✅ |
| RAG pipelines | ❌ | ✅ | ✅ |
llm-kernel is a lightweight foundation layer — compose it with rig or langchain-rust when you need chains, agents, or RAG.
Architecture
┌──────────────────────────────────────────┐
│ Your app │
├──────────────────────────────────────────┤
│ prelude │ ← use llm_kernel::prelude::*;
├───────────────┬──────────┬───────────────┤
│ provider │ client │ discovery │ ← catalog, async LLM, model discovery
│ catalog │ async │ │
├───────────────┴──────────┴───────────────┤
│ graph │ mcp │ embedding │ search │ ← graph, MCP, ONNX/Qwen3/Nomic embed, RRF
├──────────────────────────────────────────┤
│ tokens │ telemetry │ safety │ install │ ← token est., events, masking, wizard
├──────────────────────────────────────────┤
│ secrets │ config │ store │ ← vault, TOML, SQLite infra
└──────────────────────────────────────────┘
- LLMClient trait — unified interface for OpenAIClient and AnthropicClient
- EmbeddingProvider trait — unified interface for FastembedProvider (ONNX), Qwen3Provider (candle), NomicMoeProvider (candle), OpenAIEmbeddingClient (remote)
- ProviderIndex — zero-copy access to the embedded catalog, queryable by provider or model
- McpServer — JSON-RPC 2.0 server with stdio transport, Bearer auth, and tool registration
- SecretVault — HashMap<String, String> with dotenv load/save and symlink guards
- graph — SQLite knowledge graph with FTS5 search, composite-scoring recall, BFS traversal, and importance decay
- TelemetryEvent — enum-gated variants for structured observability (no PII)
- safety — secret masking, error classification, and bidi/ANSI/null sanitization
Benchmarks
Criterion benchmarks under benches/:
Examples
# List all providers and models (no API key needed)
# OpenAI chat (requires OPENAI_API_KEY)
# Anthropic streaming (requires ANTHROPIC_API_KEY)
Requirements
- Rust 1.92+ (edition 2024)
Contributing
See CONTRIBUTING.md. PRs welcome.
License
Apache-2.0 © 2026 EpicCounty