chat-mistralrs
Local-inference provider for chat-rs, built on mistral.rs. Loads weights in-process, with no HTTP server and no daemon. Geared toward local multimodal and agentic workflows: Qwen2.5-VL and similar text/image/audio models work today, with structured outputs and tool calling planned.
Install
[dependencies]
… = "0.4.0"
chat-mistralrs = "0.1.5"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
Or via the umbrella crate: chat-rs = { version = "0.5.0", features = ["mistralrs"] }.
Usage
use chat_mistralrs::MistralRsBuilder;
use …;

let client = MistralRsBuilder::new()
    .with_model(…)
    .with_gguf_file(…)
    .build()
    .await?;
let mut chat = …::new().with_model(…).build();
let mut msgs = …::from_user(…);
let response = chat.complete(…).await?;
On first use, model files are fetched into the standard Hugging Face cache (~/.cache/huggingface/). Set HF_TOKEN for gated repos.
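To check where downloaded weights will land, the cache root can be computed up front. The sketch below assumes the conventional Hugging Face behavior, where an `HF_HOME` environment variable overrides the default `~/.cache/huggingface` location; this README does not state that the override applies here, and the function names are hypothetical helpers, not part of this crate.

```rust
use std::path::PathBuf;

/// Resolve the Hugging Face cache root from explicit inputs.
/// `hf_home` mirrors the HF_HOME variable (assumed override, see above);
/// `home` mirrors the user's home directory.
fn hf_cache_root(hf_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    match hf_home {
        // A non-empty HF_HOME wins over the default location.
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        // Otherwise fall back to <home>/.cache/huggingface.
        _ => home.map(|h| PathBuf::from(h).join(".cache").join("huggingface")),
    }
}

/// Convenience wrapper that reads the real environment.
fn hf_cache_root_from_env() -> Option<PathBuf> {
    let hf_home = std::env::var("HF_HOME").ok();
    let home = std::env::var("HOME").ok();
    hf_cache_root(hf_home.as_deref(), home.as_deref())
}
```

Keeping the pure function separate from the environment lookup makes the resolution rule easy to test without touching process state.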
Capabilities
- Completions: text, image, and audio inputs work today
- Streaming: token-by-token output (requires the stream feature)
- Tool calling & structured outputs: planned
Device Selection
use …;

let client = MistralRsBuilder::new()
    .with_model(…)
    .with_device(…) // or a specific Cpu / Cuda / Metal device
    .build()
    .await?;
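For intuition about what a non-specific device choice might do, here is a minimal sketch of one plausible preference order: CUDA first, then Metal, then CPU. The `Device` enum and `pick_device` function are hypothetical illustrations and are not the crate's actual types or selection logic, which this README does not describe.

```rust
/// Hypothetical device enum for illustration only; not the crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Device {
    Cpu,
    /// CUDA device with the given ordinal.
    Cuda(usize),
    Metal,
}

/// Assumed preference order: first CUDA GPU, then Metal, then CPU.
/// Availability is passed in explicitly so the rule stays pure.
fn pick_device(cuda_devices: usize, metal_available: bool) -> Device {
    if cuda_devices > 0 {
        Device::Cuda(0)
    } else if metal_available {
        Device::Metal
    } else {
        Device::Cpu
    }
}
```

Pinning a specific device, as the comment in the snippet above suggests, bypasses any such ordering and is the safer choice on machines with several accelerators.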
Versioning
Tracks the latest mistral.rs release without pinning; upstream churn is treated as normal maintenance.
Feature Flags
Streaming is gated on the stream feature:
chat-mistralrs = { version = "0.1.4", features = ["stream"] }
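A downstream crate that wants streaming to stay optional can forward its own feature to this one (for example `stream = ["chat-mistralrs/stream"]` in its `[features]` table) and branch on it at compile time. The sketch below assumes the downstream feature is also named `stream`; both helper functions are hypothetical and only show the gating pattern, not this crate's streaming API.

```rust
/// Reports whether this build was compiled with the (assumed) `stream`
/// feature of the downstream crate enabled.
fn streaming_enabled() -> bool {
    cfg!(feature = "stream")
}

/// Choose how output is delivered, based on the compile-time feature.
fn output_mode() -> &'static str {
    if streaming_enabled() {
        "token-by-token"
    } else {
        "whole-response"
    }
}
```

Because `cfg!` is evaluated at compile time, the unused branch is trivially removed by the optimizer, and builds without the feature never need the streaming code path.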