# lmcpp – llama.cpp's llama-server for Rust
## Fully Managed

- Automated Toolchain – Downloads, builds, and manages the llama.cpp toolchain with [LmcppToolChain] (see the sketch below).
- Supported Platforms – Linux, macOS, and Windows with CPU, CUDA, and Metal support.
- Multiple Versions – Each release tag and backend is cached separately, allowing you to install multiple versions of llama.cpp side by side.
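For orientation, here is a hypothetical sketch of driving the toolchain directly. The builder and the `run()` call are assumed API shapes, not documented signatures; in normal use, [LmcppServerLauncher] (shown later) drives the toolchain for you.

```rust
use lmcpp::*;

// Hypothetical sketch: the builder and `run()` below are assumed API
// shapes, not the crate's documented signatures. The point is the flow:
// pick a llama.cpp release tag and backend, and the toolchain downloads
// (or builds) and caches that combination. Each tag/backend pair is
// cached separately, so several versions can coexist on one machine.
fn main() -> LmcppResult<()> {
    let toolchain = LmcppToolChain::builder().build();
    let _installed = toolchain.run()?; // resolves to a cached llama-server binary
    Ok(())
}
```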
## Blazing Fast UDS

- UDS IPC – Integrates with `llama-server`'s Unix-domain-socket client on Linux, macOS, and Windows (illustrated below).
- Fast! – Is it faster than HTTP over TCP? Yes. Is it measurably faster? Maybe.
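To make the transport concrete, here is a minimal standard-library sketch of HTTP over a Unix domain socket (Unix only; the socket path is hypothetical). lmcpp's typed helpers handle this wiring for you.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

// Minimal illustration of HTTP-over-UDS: the same HTTP bytes, but the
// transport is a filesystem socket instead of a TCP port. The socket
// path below is hypothetical; lmcpp manages the real one internally.
fn main() -> std::io::Result<()> {
    let mut stream = UnixStream::connect("/tmp/llama-server.sock")?;
    stream.write_all(b"GET /props HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    println!("{response}");
    Ok(())
}
```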
## Fully Typed / Fully Documented

- Server Args – All `llama-server` arguments are implemented by [ServerArgs] (example below).
- Endpoints – Each endpoint has request and response types defined.
- Good Docs – Every parameter was researched to improve upon the original `llama-server` documentation.
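As a sketch of the typed surface: only `hf_repo` appears in this README's example, so the other builder methods below (`ctx_size`, `threads`) are assumptions about how [ServerArgs] might name `llama-server`'s `--ctx-size` and `--threads` flags.

```rust
use lmcpp::*;

fn main() -> LmcppResult<()> {
    // `hf_repo` is taken from the quick-start example below; `ctx_size`
    // and `threads` are assumed names for llama-server's --ctx-size and
    // --threads flags. Check the [ServerArgs] docs for the real methods.
    let args = ServerArgs::builder()
        .hf_repo("bartowski/google_gemma-3-1b-it-qat-GGUF")?
        .ctx_size(4096)
        .threads(8)
        .build();

    let _server = LmcppServerLauncher::builder()
        .server_args(args)
        .load()?;
    Ok(())
}
```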
## CLI Tools & Web UI

- `lmcpp-toolchain-cli` – Manage the llama.cpp toolchain: download, build, cache.
- `lmcpp-server-cli` – Start, stop, and list servers.
- Easy Web UI – Use [LmcppServerLauncher::webui] to start with HTTP and the Web UI enabled.
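A minimal end-to-end example of the Rust API: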
```rust
use lmcpp::*;

fn main() -> LmcppResult<()> {
    let server = LmcppServerLauncher::builder()
        .server_args(
            ServerArgs::builder()
                .hf_repo("bartowski/google_gemma-3-1b-it-qat-GGUF")?
                .build(),
        )
        .load()?;

    let res = server.completion(
        CompletionRequest::builder()
            .prompt("Tell me a joke about Rust.")
            .n_predict(64),
    )?;

    println!("Completion response: {:#?}", res.content);
    Ok(())
}
```
To launch the Web UI from the CLI:

```sh
# With the default model:
cargo run --bin lmcpp-server-cli -- --webui

# Or with a specific model from a URL:
cargo run --bin lmcpp-server-cli -- --webui -u https://huggingface.co/bartowski/google_gemma-3-1b-it-qat-GGUF/blob/main/google_gemma-3-1b-it-qat-Q4_K_M.gguf

# Or with a specific local model:
cargo run --bin lmcpp-server-cli -- --webui -l /path/to/local/model.gguf
```
## How It Works

```text
Your Rust App
     │
     ├─→ LmcppToolChain      (downloads / builds / caches)
     │            ↓
     ├─→ LmcppServerLauncher (spawns & monitors)
     │            ↓
     └─→ LmcppServer         (typed handle over UDS)
                  │
                  ├─→ completion()     → text generation
                  └─→ other endpoints  → tokenize, embeddings, props, …
```
## Endpoints ⇄ Typed Helpers

| HTTP Route | Helper on `LmcppServer` | Request type | Response type |
|---|---|---|---|
| `POST /completion` | `completion()` | [CompletionRequest] | [CompletionResponse] |
| `POST /infill` | `infill()` | [InfillRequest] | [CompletionResponse] |
| `POST /embeddings` | `embeddings()` | [EmbeddingsRequest] | [EmbeddingsResponse] |
| `POST /tokenize` | `tokenize()` | [TokenizeRequest] | [TokenizeResponse] |
| `POST /detokenize` | `detokenize()` | [DetokenizeRequest] | [DetokenizeResponse] |
| `GET /props` | `props()` | – | [PropsResponse] |
| custom | `status()` ¹ | – | [ServerStatus] |
| OpenAI-compatible | `open_ai_v1_*()` | [serde_json::Value] | [serde_json::Value] |

¹ Internal helper for server health.
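For illustration, a sketch of one of the non-completion helpers. The `content` builder method is an assumption that mirrors the `content` field of `llama-server`'s `POST /tokenize` body; consult the [TokenizeRequest] docs for the real methods.

```rust
use lmcpp::*;

fn main() -> LmcppResult<()> {
    let server = LmcppServerLauncher::builder()
        .server_args(
            ServerArgs::builder()
                .hf_repo("bartowski/google_gemma-3-1b-it-qat-GGUF")?
                .build(),
        )
        .load()?;

    // `content` is an assumed builder method, mirroring the `content`
    // field of llama-server's POST /tokenize request body.
    let tokens = server.tokenize(
        TokenizeRequest::builder().content("Hello, world!"),
    )?;
    println!("Tokens: {tokens:#?}");
    Ok(())
}
```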
## Supported Platforms

| Platform | CPU | CUDA | Metal | Binary Sources |
|---|---|---|---|---|
| Linux x64 | ✅ | ✅ | – | Pre-built + Source |
| macOS ARM | ✅ | – | ✅ | Pre-built + Source |
| macOS x64 | ✅ | – | ✅ | Pre-built + Source |
| Windows x64 | ✅ | ✅ | – | Pre-built + Source |