Crate mistralrs

This crate is the Rust SDK for mistral.rs, providing an asynchronous interface for LLM inference.

To get started loading a model, check out the model builders listed under Structs below, such as TextModelBuilder, VisionModelBuilder, GgufModelBuilder, DiffusionModelBuilder, and SpeechModelBuilder.

For loading multiple models simultaneously, use MultiModelBuilder. The returned Model supports _with_model method variants and runtime model management (unload/reload).
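The multi-model flow described above can be sketched roughly as follows. This is a hypothetical sketch, not verbatim API: the method names `add_model` (registering a builder under an ID) and `send_chat_request_with_model` (a `_with_model` variant targeting one registered model) are assumptions and should be checked against the MultiModelBuilder and Model docs.

```rust
use anyhow::Result;
use mistralrs::{
    IsqType, MultiModelBuilder, TextMessageRole, TextMessages, TextModelBuilder,
};

#[tokio::main]
async fn main() -> Result<()> {
    // Hypothetical sketch: `add_model` and `send_chat_request_with_model`
    // are assumed names; consult the MultiModelBuilder and Model docs for
    // the exact signatures.
    let model = MultiModelBuilder::new()
        .add_model(
            "phi",
            TextModelBuilder::new("microsoft/Phi-3.5-mini-instruct".to_string())
                .with_isq(IsqType::Q8_0),
        )
        .build()
        .await?;

    let messages = TextMessages::new()
        .add_message(TextMessageRole::User, "Hello!");

    // `_with_model` variants route the request to a specific registered
    // model by its ID.
    let response = model
        .send_chat_request_with_model(messages, "phi")
        .await?;
    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    Ok(())
}
```

Requests without a `_with_model` suffix would go to the default model, while unload/reload lets individual models be swapped out at runtime.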

§Example

use anyhow::Result;
use mistralrs::{
    IsqType, PagedAttentionMetaBuilder, TextMessageRole, TextMessages, TextModelBuilder,
};

#[tokio::main]
async fn main() -> Result<()> {
    let model = TextModelBuilder::new("microsoft/Phi-3.5-mini-instruct".to_string())
        .with_isq(IsqType::Q8_0)
        .with_logging()
        .with_paged_attn(|| PagedAttentionMetaBuilder::default().build())?
        .build()
        .await?;

    let messages = TextMessages::new()
        .add_message(
            TextMessageRole::System,
            "You are an AI agent with a specialty in programming.",
        )
        .add_message(
            TextMessageRole::User,
            "Hello! How are you? Please write a generic binary search function in Rust.",
        );

    let response = model.send_chat_request(messages).await?;

    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    dbg!(
        response.usage.avg_prompt_tok_per_sec,
        response.usage.avg_compl_tok_per_sec
    );

    Ok(())
}

§Streaming example

use anyhow::Result;
use mistralrs::{
    ChatCompletionChunkResponse, ChunkChoice, Delta, IsqType, PagedAttentionMetaBuilder,
    Response, TextMessageRole, TextMessages, TextModelBuilder,
};

#[tokio::main]
async fn main() -> Result<()> {
    let model = TextModelBuilder::new("microsoft/Phi-3.5-mini-instruct".to_string())
        .with_isq(IsqType::Q8_0)
        .with_logging()
        .with_paged_attn(|| PagedAttentionMetaBuilder::default().build())?
        .build()
        .await?;

    let messages = TextMessages::new()
        .add_message(
            TextMessageRole::System,
            "You are an AI agent with a specialty in programming.",
        )
        .add_message(
            TextMessageRole::User,
            "Hello! How are you? Please write a generic binary search function in Rust.",
        );

    let mut stream = model.stream_chat_request(messages).await?;
    while let Some(chunk) = stream.next().await {
        if let Response::Chunk(ChatCompletionChunkResponse { choices, .. }) = chunk {
            if let Some(ChunkChoice {
                delta:
                    Delta {
                        content: Some(content),
                        ..
                    },
                ..
            }) = choices.first()
            {
                print!("{}", content);
            }
        }
    }
    Ok(())
}

§MCP example

The MCP client integrates seamlessly with mistral.rs model builders:

use mistralrs::{TextModelBuilder, IsqType, McpClientConfig, McpServerConfig, McpServerSource};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mcp_config = McpClientConfig {
        servers: vec![/* your server configs */],
        auto_register_tools: true,
        tool_timeout_secs: Some(30),
        max_concurrent_calls: Some(5),
    };
     
    let model = TextModelBuilder::new("path/to/model".to_string())
        .with_isq(IsqType::Q8_0)
        .with_mcp_client(mcp_config)  // MCP tools automatically registered
        .build()
        .await?;
     
    // MCP tools are now available for automatic tool calling
    Ok(())
}

Re-exports§

pub use model_builder_trait::AnyModelBuilder;
pub use model_builder_trait::MultiModelBuilder;
pub use mistralrs_core::llguidance;
pub use schemars;

Modules§

core
Low-level types and internals re-exported from mistralrs_core.
model_builder_trait
speech_utils

Structs§

Agent
An agent that runs an agentic loop with tool calling
AgentBuilder
Builder for creating agents with a fluent API
AgentConfig
Configuration for the agentic loop
AgentResponse
Final response from the agent
AgentStep
Represents a single step in the agent execution
AgentStream
Stream of agent events during execution
AnyMoeConfig
AnyMoeModelBuilder
AudioInput
Raw audio input consisting of PCM samples and a sample rate.
CalledFunction
Called function with name and arguments
ChatCompletionChunkResponse
Chat completion streaming request chunk.
ChatCompletionResponse
An OpenAI compatible chat completion response.
Choice
Chat completion choice.
ChunkChoice
Chat completion streaming chunk choice.
CompletionResponse
An OpenAI compatible completion response.
Delta
Delta in content for streaming response.
DiffusionGenerationParams
DiffusionModelBuilder
Configure a diffusion model with the various parameters for loading, running, and other inference behaviors.
DrySamplingParams
EmbeddingModelBuilder
Configure an embedding model with the various parameters for loading, running, and other inference behaviors.
EmbeddingRequest
A validated embedding request constructed via EmbeddingRequestBuilder.
EmbeddingRequestBuilder
Builder for configuring embedding requests.
Function
Function definition for a tool
GgufLoraModelBuilder
Wrapper of GgufModelBuilder for LoRA models.
GgufModelBuilder
Configure a text GGUF model with the various parameters for loading, running, and other inference behaviors.
GgufXLoraModelBuilder
Wrapper of GgufModelBuilder for X-LoRA models.
LayerTopology
Logprobs
Logprobs per token.
LoraModelBuilder
Wrapper of TextModelBuilder for LoRA models.
McpClient
MCP client that manages connections to multiple MCP servers
McpClientConfig
Configuration for MCP client integration
McpServerConfig
Configuration for an individual MCP server
McpToolInfo
Information about a tool discovered from an MCP server
MistralRs
The MistralRs struct handles sending requests to multiple engines. It is the core multi-threaded component of mistral.rs, and uses mpsc Sender and Receiver primitives to send and receive requests to the appropriate engine based on model ID.
Model
The object used to interact with the model. This can be used with many varieties of models,
and as such may be created with one of the model builders above.
NormalRequest
A normal request to the MistralRs engine.
PagedAttentionConfig
All memory counts are in MB. The default block size is 32.
PagedAttentionMetaBuilder
Builder for PagedAttention metadata.
RequestBuilder
A way to add messages with finer control given.
ResponseMessage
Chat completion response message.
SamplingParams
Sampling params are used to control sampling.
SearchFunctionParameters
SearchResult
SpeculativeConfig
Metadata for a speculative pipeline
SpeechModelBuilder
Configure a speech model with the various parameters for loading, running, and other inference behaviors.
Tensor
The core struct for manipulating tensors.
TextMessages
Plain text (chat) messages.
TextModelBuilder
Configure a text model with the various parameters for loading, running, and other inference behaviors.
TextSpeculativeBuilder
Tool
Tool definition
ToolCallResponse
ToolResult
Result of a tool execution
TopLogprob
Top-n logprobs element
Topology
UqffEmbeddingModelBuilder
Configure a UQFF embedding model with the various parameters for loading, running, and other inference behaviors. This wraps and implements DerefMut for the EmbeddingModelBuilder, so users should take care not to call UQFF-related methods.
UqffTextModelBuilder
Configure a UQFF text model with the various parameters for loading, running, and other inference behaviors. This wraps and implements DerefMut for the TextModelBuilder, so users should take care not to call UQFF-related methods.
UqffVisionModelBuilder
Configure a UQFF vision model with the various parameters for loading, running, and other inference behaviors. This wraps and implements DerefMut for the VisionModelBuilder, so users should take care not to call UQFF-related methods.
Usage
OpenAI compatible (superset) usage during a request.
VisionMessages
Text (chat) messages with images and/or audios.
VisionModelBuilder
Configure a vision model with the various parameters for loading, running, and other inference behaviors.
WebSearchOptions
XLoraModelBuilder
Wrapper of TextModelBuilder for X-LoRA models.

Enums§

AgentEvent
Events yielded during agent streaming
AgentStopReason
Reason why the agent stopped executing
AnyMoeExpertType
AutoDeviceMapParams
Constraint
Control the constraint with llguidance.
DType
The different types of elements allowed in tensors.
DefaultSchedulerMethod
The scheduler method controls how sequences are scheduled during each step of the engine. It is consulted at each scheduling step unless there are only running sequences, only waiting sequences, or none at all; when it applies, it determines which waiting sequences are allowed to run.
Device
Cpu, Cuda, or Metal
DeviceMapSetting
DiffusionLoaderType
The architecture to load the diffusion model as.
EmbeddingRequestInput
An individual embedding input.
ImageGenerationResponseFormat
Image generation response format
IsqType
McpServerSource
Supported MCP server transport sources
MemoryGpuConfig
ModelCategory
Category of the model. This can also be used to extract model-category specific tools, such as the vision model prompt prefixer.
ModelDType
DType for the model.
Request
A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage
Message or messages for a Request.
Response
The response enum, which contains three classes of variants: error, chat, and completion responses.
ResponseOk
SchedulerConfig
SearchEmbeddingModel
Embedding model used for ranking web search results internally.
SpeechLoaderType
StopTokens
Stop sequences or ids.
TextMessageRole
A chat message role.
TokenSource
The source of the HF token.
ToolCallType
ToolCallbackType
Unified tool callback that can be sync or async
ToolChoice
ToolType
Type of tool

Traits§

CustomLogitsProcessor
Customizable logits processor.
RequestLike
A type which can be used as a chat request.

Functions§

best_device
Gets the best available device: CUDA if compiled with CUDA support, Metal if compiled with Metal support, otherwise CPU.
cross_entropy_loss
The cross-entropy loss.
initialize_logging
This should be called to initialize the debug flag and logging. This should not be called in mistralrs-core code due to Rust usage.
paged_attn_supported
Returns true if built with CUDA (requires Unix) or with Metal.
parse_isq_value
Parse ISQ value.

Type Aliases§

AsyncToolCallback
Async tool callback type for native async tool support
LlguidanceGrammar
MessageContent
Result
SearchCallback
Callback used to override how search results are gathered. The returned vector must be sorted in decreasing order of relevance.
ToolCallback
Callback used for custom tool functions. Receives the called function (name and JSON arguments) and returns the tool output as a string.

Attribute Macros§

tool
The #[tool] attribute macro for defining tools.