candle-pipelines

[!warning]
This crate is under active development. APIs may change as features are added and refined.
Simple, intuitive pipelines for local LLM inference in Rust, powered by Candle. API inspired by Python's Transformers.
Available Pipelines
Note: Currently, models are accessible through these pipelines only. Direct model interface coming eventually!
Text Generation Pipeline
Generate text for various applications. Supports completions, tool calling, and token-by-token iteration.
Qwen3
Optimized for tool calling and structured output
Parameter Sizes:
├── 0.6B
├── 1.7B
├── 4B
├── 8B
├── 14B
└── 32B
→ View on HuggingFace
Gemma3
Google's models for general language tasks
Parameter Sizes:
├── 1B
├── 4B
├── 12B
└── 27B
→ View on HuggingFace
Analysis Pipelines
ModernBERT powers three specialized analysis tasks with shared architecture:
Fill Mask Pipeline
Complete missing words in text
Available Sizes:
├── Base
└── Large
→ View on HuggingFace
Sentiment Analysis Pipeline
Analyze emotional tone in multiple languages
Available Sizes:
├── Base
└── Large
→ View on HuggingFace
Zero-shot Classification Pipeline
Classify text without training examples
Available Sizes:
├── Base
└── Large
→ View on HuggingFace
Technical Note: All ModernBERT pipelines share the same backbone architecture, loading task-specific finetuned weights as needed.
Usage
At this point in development, the only way to interact with the models is through the given pipelines. I plan to eventually provide a simple interface for working with the models directly.
Inference is currently quite slow, mostly because candle is compiled without its CUDA feature. I plan to integrate CUDA support in future updates for much faster inference.
Text Generation
There are two basic ways to generate text:
- By providing a simple prompt string.
- By providing a list of messages for chat-like interactions.
Providing a single prompt
Use the run method for straightforward text generation from a single prompt string.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3Size};

fn main() -> Result<()> {
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .temperature(0.7)
        .top_k(40)
        .build()?;

    let output = pipeline.run("What is the meaning of life?")?;
    println!("{}", output.text);
    println!("Generated {} tokens", output.stats.tokens_generated);
    Ok(())
}
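For readers unfamiliar with the `temperature` and `top_k` settings used above, the following is a minimal, standalone sketch of what these two knobs conventionally do to a model's raw logits. It is not taken from this crate: the function name `top_k_probs` is hypothetical, and the crate's actual sampler may differ in detail.

```rust
/// Turn raw logits into a sampling distribution using temperature scaling
/// followed by top-k filtering. Illustrative only; not the crate's code.
fn top_k_probs(logits: &[f32], temperature: f32, top_k: usize) -> Vec<f32> {
    assert!(!logits.is_empty(), "need at least one logit");
    assert!(temperature > 0.0, "temperature must be positive");

    // Dividing by the temperature sharpens (< 1.0) or flattens (> 1.0) the distribution.
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();

    // Indices of the k largest scaled logits (at least one is always kept).
    let mut order: Vec<usize> = (0..scaled.len()).collect();
    order.sort_by(|&a, &b| scaled[b].total_cmp(&scaled[a]));
    order.truncate(top_k.max(1));

    // Softmax over the survivors only; every other token gets probability 0.
    let max = scaled[order[0]];
    let mut probs = vec![0.0_f32; scaled.len()];
    let mut sum = 0.0_f32;
    for &i in &order {
        probs[i] = (scaled[i] - max).exp();
        sum += probs[i];
    }
    for p in &mut probs {
        *p /= sum;
    }
    probs
}

fn main() {
    let logits = [2.0, 1.0, 0.5, -1.0];
    let probs = top_k_probs(&logits, 0.7, 2);
    println!("{:?}", probs);
}
```

With `top_k = 2`, only the two highest-scoring tokens keep a non-zero probability, and a lower temperature shifts more of the mass onto the top token.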
Providing a list of messages
For more conversational interactions, you can pass a list of messages to the run method.
The Message struct represents a single message in a chat and has a role (system, user, assistant, or tool) and content. You can create messages using:
- Message::system(content: &str): For system prompts.
- Message::user(content: &str): For user prompts.
- Message::assistant(content: &str): For model responses.
- Message::tool(content: &str): For tool/function results returned to the model.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3Size, Message};

fn main() -> Result<()> {
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .temperature(0.7)
        .top_k(40)
        .build()?;

    let messages = vec![
        Message::system("You are a helpful assistant."),
        Message::user("What is the meaning of life?"),
    ];

    let output = pipeline.run(&messages)?;
    println!("{}", output.text);
    Ok(())
}
Tool Calling
Using tools with models is straightforward: define tools with the #[tool] macro, register them with the pipeline, and they are executed automatically when the model calls them.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{tool, tools, ErrorStrategy};
use candle_pipelines::text_generation::{Qwen3Size, TextGenerationPipelineBuilder};

#[tool(retries = 5)]
fn get_humidity(city: String) -> Result<String> {
    Ok(format!("The humidity is 50% in {}.", city))
}

#[tool]
fn get_temperature(city: String) -> Result<String> {
    Ok(format!("The temperature is 20 degrees celsius in {}.", city))
}

fn main() -> Result<()> {
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .max_len(8192)
        .tool_error_strategy(ErrorStrategy::ReturnToModel)
        .build()?;

    pipeline.register_tools(tools![get_temperature, get_humidity]);

    let output = pipeline.run("What's the temp and humidity like in Tokyo?")?;
    println!("{}", output.text);
    Ok(())
}
Tools can also be asynchronous, allowing you to perform network or file I/O directly inside the handler:
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::tool;

#[tool]
async fn delayed_echo(message: String) -> Result<String> {
    tokio::time::sleep(std::time::Duration::from_millis(25)).await;
    Ok(message)
}
Token Iteration
Use run_iter to receive tokens as they're generated. Fully sync - no async runtime needed.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3Size};
use std::io::Write;

fn main() -> Result<()> {
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .max_len(1024)
        .build()?;

    let mut tokens = pipeline.run_iter(
        "Explain the concept of Large Language Models in simple terms.",
    )?;

    for tok in &mut tokens {
        print!("{}", tok?);
        std::io::stdout().flush().unwrap();
    }

    let stats = tokens.stats();
    println!("\n\nGenerated {} tokens", stats.tokens_generated);
    Ok(())
}
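The example above loops over `&mut tokens` rather than `tokens`, which is what allows `tokens.stats()` to be called once the loop finishes. The sketch below illustrates that borrowing pattern with the standard library only. `TokenStream`, `tokens_generated`, and `collect_with_count` are hypothetical stand-ins, not the crate's types.

```rust
/// Hypothetical stand-in for the iterator returned by `run_iter`.
struct TokenStream {
    pending: std::vec::IntoIter<&'static str>,
    tokens_generated: usize,
}

impl TokenStream {
    fn new(tokens: Vec<&'static str>) -> Self {
        Self { pending: tokens.into_iter(), tokens_generated: 0 }
    }

    /// Number of tokens yielded so far.
    fn tokens_generated(&self) -> usize {
        self.tokens_generated
    }
}

impl Iterator for TokenStream {
    // Each item is fallible, mirroring the `tok?` in the example above.
    type Item = Result<String, String>;

    fn next(&mut self) -> Option<Self::Item> {
        let tok = self.pending.next()?;
        self.tokens_generated += 1;
        Some(Ok(tok.to_string()))
    }
}

/// Drain the stream by mutable reference, then read the counter.
fn collect_with_count(mut stream: TokenStream) -> Result<(String, usize), String> {
    let mut text = String::new();
    // `&mut stream` borrows the iterator instead of consuming it...
    for tok in &mut stream {
        text.push_str(&tok?);
    }
    // ...so it is still available here, after the loop.
    Ok((text, stream.tokens_generated()))
}

fn main() -> Result<(), String> {
    let stream = TokenStream::new(vec!["Hello", ", ", "world"]);
    let (text, count) = collect_with_count(stream)?;
    println!("{} ({} tokens)", text, count);
    Ok(())
}
```

Writing `for tok in stream` instead would move the iterator into the loop, and the later call to read its counter would not compile.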
XML Parsing for Structured Output
Use XmlParserBuilder to parse structured outputs from models - useful for tool calls and reasoning traces.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{
    Qwen3Size, TagParts, TextGenerationPipelineBuilder, XmlParserBuilder,
};

fn main() -> Result<()> {
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3Size::Size0_6B)
        .max_len(1024)
        .build()?;

    let parser = XmlParserBuilder::new()
        .register_tag("think")
        .register_tag("tool_result")
        .register_tag("tool_call")
        .build();

    let tokens = pipeline.run_iter("Explain your reasoning step by step.")?;
    let events = parser.wrap_iterator(tokens);

    for event in events {
        match event.tag() {
            Some("think") => match event.part() {
                TagParts::Start => println!("[THINKING]"),
                TagParts::Content => print!("{}", event.get_content()),
                TagParts::End => println!("[END THINKING]"),
            },
            None => match event.part() {
                TagParts::Content => print!("{}", event.get_content()),
                _ => {}
            },
            _ => {}
        }
    }
    Ok(())
}
The XML parser emits events as tags are encountered, enabling real-time processing without waiting for the full response.
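The event-driven behaviour described above can be illustrated with a small buffer-based scanner. This is a simplified sketch under stated assumptions, not the implementation behind XmlParserBuilder: `TagStream`, `Event`, and `feed` are hypothetical names, nesting and attributes are not handled, and text after an unmatched `<` stays buffered until more input arrives.

```rust
/// Events emitted while scanning streamed text (hypothetical type).
#[derive(Debug, PartialEq)]
enum Event {
    Start(String),
    Content { tag: Option<String>, text: String },
    End(String),
}

/// Hypothetical incremental scanner for a fixed set of registered tags.
struct TagStream {
    tags: Vec<String>,
    buffer: String,
    open: Option<String>,
}

impl TagStream {
    fn new(tags: &[&str]) -> Self {
        Self {
            tags: tags.iter().map(|t| t.to_string()).collect(),
            buffer: String::new(),
            open: None,
        }
    }

    /// Build a content event attributed to the currently open tag, if any.
    fn content(&self, text: &str) -> Event {
        Event::Content { tag: self.open.clone(), text: text.to_string() }
    }

    /// Consume one chunk and return every event that is now decidable.
    fn feed(&mut self, chunk: &str) -> Vec<Event> {
        self.buffer.push_str(chunk);
        let mut events = Vec::new();
        loop {
            let lt = match self.buffer.find('<') {
                Some(i) => i,
                None => {
                    // No '<' at all: everything buffered is plain content.
                    if !self.buffer.is_empty() {
                        events.push(self.content(&self.buffer));
                        self.buffer.clear();
                    }
                    break;
                }
            };
            if lt > 0 {
                events.push(self.content(&self.buffer[..lt]));
                self.buffer.drain(..lt);
            }
            // The buffer now starts with '<'; look for the end of the candidate tag.
            let pos = match self.buffer[1..].find(|c: char| c == '<' || c == '>') {
                Some(i) => i + 1,
                None => break, // Possibly a tag split across chunks: wait for more input.
            };
            if self.buffer.as_bytes()[pos] == b'<' {
                // A second '<' came first, so the leading one was literal text.
                events.push(self.content(&self.buffer[..pos]));
                self.buffer.drain(..pos);
                continue;
            }
            let inner = self.buffer[1..pos].to_string();
            let (closing, name) = match inner.strip_prefix('/') {
                Some(rest) => (true, rest.to_string()),
                None => (false, inner.clone()),
            };
            if self.tags.contains(&name) {
                if closing {
                    self.open = None;
                    events.push(Event::End(name));
                } else {
                    self.open = Some(name.clone());
                    events.push(Event::Start(name));
                }
            } else {
                // Unregistered tags pass through as ordinary text.
                events.push(self.content(&self.buffer[..=pos]));
            }
            self.buffer.drain(..=pos);
        }
        events
    }
}

fn main() {
    let mut stream = TagStream::new(&["think"]);
    // The opening tag is deliberately split across two chunks.
    for chunk in ["<thi", "nk>Let me ", "see</think>Done"] {
        for event in stream.feed(chunk) {
            println!("{:?}", event);
        }
    }
}
```

The key design point is that content is emitted as soon as it cannot be part of a tag, while a partial tag such as `<thi` is held back until the next chunk resolves it.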
Fill Mask (ModernBERT)
use candle_pipelines::error::Result;
use candle_pipelines::fill_mask::{FillMaskPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    let pipeline = FillMaskPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
    let prediction = pipeline.predict("The capital of France is [MASK].")?;
    println!("{}: {:.2}", prediction.word, prediction.score);
    Ok(())
}
Sentiment Analysis (ModernBERT Finetune)
use candle_pipelines::error::Result;
use candle_pipelines::sentiment::{SentimentAnalysisPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    let pipeline = SentimentAnalysisPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
    let result = pipeline.predict("I love using Rust for my projects!")?;
    println!("Sentiment: {} (confidence: {:.2})", result.label, result.score);
    Ok(())
}
Zero-Shot Classification (ModernBERT NLI Finetune)
Zero-shot classification offers two methods for different use cases:
Single-Label Classification (classify)
Use when you want to classify text into one of several mutually exclusive categories. Probabilities sum to 1.0.
use candle_pipelines::error::Result;
use candle_pipelines::zero_shot::{ZeroShotClassificationPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    let pipeline = ZeroShotClassificationPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
    let text = "The Federal Reserve raised interest rates.";
    let candidate_labels = &["economics", "politics", "technology", "sports"];

    let results = pipeline.classify(text, candidate_labels)?;
    println!("Text: {}", text);
    for result in results {
        println!("- {}: {:.4}", result.label, result.score);
    }
    Ok(())
}
Multi-Label Classification (classify_multi_label)
Use when labels are independent and several may apply to the same text. Returns raw entailment probabilities, so the scores need not sum to 1.0.
use candle_pipelines::error::Result;
use candle_pipelines::zero_shot::{ZeroShotClassificationPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    let pipeline = ZeroShotClassificationPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
    let text = "I love reading books about machine learning and artificial intelligence.";
    let candidate_labels = &["technology", "education", "reading", "science"];

    let results = pipeline.classify_multi_label(text, candidate_labels)?;
    println!("Text: {}", text);
    for result in results {
        println!("- {}: {:.4}", result.label, result.score);
    }
    Ok(())
}
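To make the difference between the two scoring modes concrete, here is a standalone sketch of how NLI-based zero-shot scores are commonly normalized (this is the approach used by Python's Transformers pipeline). Whether this crate computes its scores in exactly this way is an assumption, and `NliLogits`, `single_label_scores`, and `multi_label_scores` are hypothetical names used only for illustration.

```rust
/// Numerically stable softmax.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// NLI logits for one candidate label (hypothetical type).
struct NliLogits {
    entailment: f32,
    contradiction: f32,
}

/// Single-label: softmax over the entailment logits of all labels,
/// so the labels compete and the scores sum to 1.0.
fn single_label_scores(per_label: &[NliLogits]) -> Vec<f32> {
    let entail: Vec<f32> = per_label.iter().map(|l| l.entailment).collect();
    softmax(&entail)
}

/// Multi-label: each label is scored on its own as
/// P(entailment) versus P(contradiction), independent of the other labels.
fn multi_label_scores(per_label: &[NliLogits]) -> Vec<f32> {
    per_label
        .iter()
        .map(|l| softmax(&[l.contradiction, l.entailment])[1])
        .collect()
}

fn main() {
    // Made-up logits: two labels fit the text well, one does not.
    let logits = [
        NliLogits { entailment: 3.0, contradiction: -2.0 },
        NliLogits { entailment: 2.5, contradiction: -1.0 },
        NliLogits { entailment: -1.0, contradiction: 2.0 },
    ];
    println!("single-label: {:?}", single_label_scores(&logits));
    println!("multi-label:  {:?}", multi_label_scores(&logits));
}
```

In the single-label case the two well-fitting labels have to share the probability mass, whereas in the multi-label case both can score close to 1.0 at the same time.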
Future Plans
- Add more model families and sizes
- Support additional pipelines (summarization, classification)