Local-inference provider for chat-rs, built on mistral.rs.
Loads model weights in-process, with no HTTP server or daemon involved. On
first use, model files are downloaded into the standard Hugging Face cache
(~/.cache/huggingface/). If HF_TOKEN is set in the environment it is used for
the download; a token is only required for gated repos.
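use chat_mistralrs::MistralRsBuilder;

// Must run inside an async function whose return type supports `?`.
let client = MistralRsBuilder::new()
    // Hugging Face repo that hosts the model.
    .with_model("Qwen/Qwen2.5-3B-Instruct-GGUF")
    // GGUF weights file to load from that repo.
    .with_gguf_file("qwen2.5-3b-instruct-q4_k_m.gguf")
    .build()
    .await?;

See providers/AGENTS.md for the overall provider architecture.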
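The cache location and token behaviour described above can be checked before the first model load. The following is a minimal, std-only sketch and not part of this crate: the helper names `default_hf_cache` and `has_token` are hypothetical, and reading `HOME` assumes a Unix-like environment. It only inspects the environment and does not download anything.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: the default Hugging Face cache directory
/// (~/.cache/huggingface/) for a given home directory.
fn default_hf_cache(home: &str) -> PathBuf {
    PathBuf::from(home).join(".cache").join("huggingface")
}

/// Hypothetical helper: treat an unset or blank HF_TOKEN as "no token".
fn has_token(token: Option<&str>) -> bool {
    token.map_or(false, |t| !t.trim().is_empty())
}

fn main() {
    // Assumption: HOME is set (Unix-like systems); fall back to the current directory.
    let home = env::var("HOME").unwrap_or_else(|_| String::from("."));
    let cache = default_hf_cache(&home);
    println!("cache dir: {} (exists: {})", cache.display(), cache.exists());

    // A token only matters for gated repos; public repos download without one.
    let token = env::var("HF_TOKEN").ok();
    if has_token(token.as_deref()) {
        println!("HF_TOKEN is set");
    } else {
        println!("HF_TOKEN is not set; gated repos will not be downloadable");
    }
}
```

Keeping the two helpers free of direct environment access makes them easy to unit-test; only `main` touches the process environment.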
Re-exports§
- pub use builder::DeviceChoice;
- pub use builder::MistralRsBuilder;
- pub use builder::WithModel;
- pub use builder::WithoutModel;
- pub use client::MistralRsClient;
Modules§
Enums§
- IsqType: Re-exported for ergonomic use in builder calls. In-situ quantization type specifying the format to apply to model weights.