candle-pipelines 0.0.4

Simple, intuitive pipelines for local LLM inference in Rust, powered by Candle. Inspired by Python's Transformers library.
Documentation

candle-pipelines

CI

[!warning] This crate is under active development. APIs may change as features are still being added, and things tweaked.

Simple, intuitive pipelines for local LLM inference in Rust, powered by Candle. API inspired by Python's Transformers.

Available Pipelines

Note: Currently, models are accessible through these pipelines only. Direct model interface coming eventually!

Text Generation Pipeline

Generate text for various applications, supports general completions, as well as function/tool calling, and streamed responses.


Qwen3
Optimized for tool calling and structured output

 Parameter Sizes:
├── 0.6B
├── 1.7B
├── 4B
├── 8B
├── 14B
└── 32B

→ View on HuggingFace


Gemma3
Google's models for general language tasks

 Parameter Sizes:
├── 1B
├── 4B
├── 12B
└── 27B

→ View on HuggingFace

Analysis Pipelines

ModernBERT powers three specialized analysis tasks with shared architecture:


Fill Mask Pipeline

Complete missing words in text

 Available Sizes:
├── Base
└── Large

→ View on HuggingFace


Sentiment Analysis Pipeline

Analyze emotional tone in multiple languages

 Available Sizes:
├── Base
└── Large

→ View on HuggingFace


Zero-shot Classification Pipeline

Classify text without training examples

 Available Sizes:
├── Base
└── Large

→ View on HuggingFace


Technical Note: All ModernBERT pipelines share the same backbone architecture, loading task-specific finetuned weights as needed.

Usage

At this point in development the only way to interact with the models is through the given pipelines, I plan to eventually provide a simple interface to work with the models directly.

Inference will be quite slow at the moment, this is mostly due to not using the CUDA feature when compiling candle. I will be working on integrating this smoothly in future updates for much faster inference.

Text Generation

There are two basic ways to generate text:

  1. By providing a simple prompt string.
  2. By providing a list of messages for chat-like interactions.

Providing a single prompt

Use the completion method for straightforward text generation from a single prompt string.

use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3Size};

#[tokio::main]
async fn main() -> Result<()> {
    // 1. Create the pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .temperature(0.7)
        .top_k(40)
        .build()
        .await?;

    // 2. Generate a completion
    let completion = pipeline.completion("What is the meaning of life?").await?;
    println!("{}", completion);

    Ok(())
}

Providing a list of messages

For more conversational interactions, you can pass a list of messages to the completion method.

The Message struct represents a single message in a chat and has a role (system, user, assistant, or tool) and content. You can create messages using:

  • Message::system(content: &str): For system prompts.
  • Message::user(content: &str): For user prompts.
  • Message::assistant(content: &str): For model responses.
  • Message::tool(content: &str): For tool/function results returned to the model.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3Size, Message};

#[tokio::main]
async fn main() -> Result<()> {
    // 1. Create the pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .temperature(0.7)
        .top_k(40)
        .build()
        .await?;

    // 2. Create the messages
    let messages = vec![
        Message::system("You are a helpful assistant."),
        Message::user("What is the meaning of life?"),
    ];

    // 3. Generate a completion
    let completion = pipeline.completion(&messages).await?;
    println!("{}", completion);

    Ok(())
}

Tool Calling

Using tools with models is also made extremely easy, you just define tools using the #[tool] macro, register them with the pipeline, and they're used automatically when relevant.

use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{tool, tools, ErrorStrategy};
use candle_pipelines::text_generation::{Qwen3Size, TextGenerationPipelineBuilder};

// 1. Define tools using the #[tool] macro
#[tool(retries = 5)]  // optional: configure retry attempts
/// Get the humidity for a given city.
fn get_humidity(city: String) -> Result<String> {
    Ok(format!("The humidity is 50% in {}.", city))
}

#[tool]  // defaults to 3 retries
/// Get the temperature for a given city in degrees celsius.
fn get_temperature(city: String) -> Result<String> {
    Ok(format!("The temperature is 20 degrees celsius in {}.", city))
}

#[tokio::main]
async fn main() -> Result<()> {
    // 2. Create the pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .max_len(8192)
        .tool_error_strategy(ErrorStrategy::ReturnToModel)  // let model handle tool errors
        .build()
        .await?;

    // 3. Register tools (enabled by default)
    pipeline.register_tools(tools![get_temperature, get_humidity]).await;

    // 4. Get a completion (tools are used automatically)
    let completion = pipeline.completion("What's the temp and humidity like in Tokyo?").await?;
    println!("{}", completion);

    Ok(())
}

Tools can also be asynchronous, allowing you to perform network or file I/O directly inside the handler:

use candle_pipelines::error::Result;
use candle_pipelines::text_generation::tool;

#[tool]
/// Echoes a message after waiting for a bit.
async fn delayed_echo(message: String) -> Result<String> {
    tokio::time::sleep(std::time::Duration::from_millis(25)).await;
    Ok(message)
}

Streaming Completions

Use completion_stream to receive tokens as they're generated. If tools are enabled and registered, they're used automatically.

Instead of returning the completion this method returns a stream you can iterate on to receive tokens individually as they are generated by the model instead of just receiving them all at once at the end.

use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3Size};
use std::io::Write;

#[tokio::main]
async fn main() -> Result<()> {
    // 1. Create the pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .max_len(1024)
        .build()
        .await?;

    // 2. Get a completion using stream method
    let mut stream = pipeline.completion_stream(
        "Explain the concept of Large Language Models in simple terms.",
    ).await?;

    // 3. Do something with tokens as they are generated
    while let Some(tok) = stream.next().await {
        print!("{}", tok?);
        std::io::stdout().flush().unwrap();
    }

    Ok(())
}

XML Parsing for Structured Output

You can use the XmlParserBuilder to parse structured outputs from models. This is particularly useful for parsing tool calls and reasoning traces from streams.

use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{
    Qwen3Size, TagParts, TextGenerationPipelineBuilder, XmlParserBuilder,
};

#[tokio::main]
async fn main() -> Result<()> {
    // 1. Build a regular pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .max_len(1024)
        .build()
        .await?;

    // 2. Create XML parser for specific tags
    let parser = XmlParserBuilder::new()
        .register_tag("think")
        .register_tag("tool_result")
        .register_tag("tool_call")
        .build();

    // 3. Generate streaming completion
    let stream = pipeline
        .completion_stream("Explain your reasoning step by step.")
        .await?;

    // 4. Wrap stream with XML parser
    let mut event_stream = parser.wrap_stream(stream);

    // 5. Process events based on tags
    while let Some(event) = event_stream.next().await {
        match event.tag() {
            Some("think") => match event.part() {
                TagParts::Start => println!("[THINKING]"),
                TagParts::Content => print!("{}", event.get_content()),
                TagParts::End => println!("[END THINKING]"),
            },
            None => match event.part() {
                TagParts::Content => print!("{}", event.get_content()),
                _ => {}
            },
            _ => {}
        }
    }

    Ok(())
}

The XML parser emits events as XML tags are encountered in the stream, enabling real-time processing of structured outputs without waiting for the full response.

Fill Mask (ModernBERT)

use candle_pipelines::error::Result;
use candle_pipelines::fill_mask::{FillMaskPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    // 1. Build the pipeline
    let pipeline = FillMaskPipelineBuilder::modernbert(ModernBertSize::Base).build()?;

    // 2. Fill the mask
    let prediction = pipeline.predict("The capital of France is [MASK].")?;

    println!("{}: {:.2}", prediction.word, prediction.score);
    // Output: Paris: 0.98
    Ok(())
}

Sentiment Analysis (ModernBERT Finetune)

use candle_pipelines::error::Result;
use candle_pipelines::sentiment::{SentimentAnalysisPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    // 1. Build the pipeline
    let pipeline = SentimentAnalysisPipelineBuilder::modernbert(ModernBertSize::Base).build()?;

    // 2. Analyze sentiment
    let result = pipeline.predict("I love using Rust for my projects!")?;

    println!("Sentiment: {} (confidence: {:.2})", result.label, result.score);
    // Output: Sentiment: positive (confidence: 0.98)
    Ok(())
}

Zero-Shot Classification (ModernBERT NLI Finetune)

Zero-shot classification offers two methods for different use cases:

Single-Label Classification (classify)

Use when you want to classify text into one of several mutually exclusive categories. Probabilities sum to 1.0.

use candle_pipelines::error::Result;
use candle_pipelines::zero_shot::{ZeroShotClassificationPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    // 1. Build the pipeline
    let pipeline = ZeroShotClassificationPipelineBuilder::modernbert(ModernBertSize::Base).build()?;

    // 2. Single-label classification
    let text = "The Federal Reserve raised interest rates.";
    let candidate_labels = &["economics", "politics", "technology", "sports"];
    let results = pipeline.classify(text, candidate_labels)?;

    println!("Text: {}", text);
    for result in results {
        println!("- {}: {:.4}", result.label, result.score);
    }
    // Example output (probabilities sum to 1.0):
    // - economics: 0.8721
    // - politics: 0.1134
    // - technology: 0.0098
    // - sports: 0.0047
    
    Ok(())
}

Multi-Label Classification (classify_multi_label)

Use when labels can be independent and multiple labels could apply to the same text. Returns raw entailment probabilities.

use candle_pipelines::error::Result;
use candle_pipelines::zero_shot::{ZeroShotClassificationPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    // 1. Build the pipeline
    let pipeline = ZeroShotClassificationPipelineBuilder::modernbert(ModernBertSize::Base).build()?;

    // 2. Multi-label classification
    let text = "I love reading books about machine learning and artificial intelligence.";
    let candidate_labels = &["technology", "education", "reading", "science"];
    let results = pipeline.classify_multi_label(text, candidate_labels)?;

    println!("Text: {}", text);
    for result in results {
        println!("- {}: {:.4}", result.label, result.score);
    }
    // Example output (independent probabilities):
    // - technology: 0.9234
    // - education: 0.8456
    // - reading: 0.9567
    // - science: 0.7821
    
    Ok(())
}

Future Plans

  • Add more model families and sizes
  • Support additional pipelines (summarization, classification)