candle-pipelines v0.0.1
[!warning] This crate is under active development. APIs may change as features are added and refined.
Simple, intuitive pipelines for local LLM inference in Rust, powered by Candle. API inspired by Python's Transformers.
Available Pipelines
Note: Currently, models are accessible through these pipelines only. Direct model interface coming eventually!
Text Generation Pipeline
Generate text for a variety of applications. Supports general completions, function/tool calling, and streamed responses.
Qwen3
Optimized for tool calling and structured output
Parameter Sizes:
├── 0.6B
├── 1.7B
├── 4B
├── 8B
├── 14B
└── 32B
Gemma3
Google's models for general language tasks
Parameter Sizes:
├── 1B
├── 4B
├── 12B
└── 27B
Analysis Pipelines
ModernBERT powers three specialized analysis tasks with shared architecture:
Fill Mask Pipeline
Complete missing words in text
Available Sizes:
├── Base
└── Large
Sentiment Analysis Pipeline
Analyze emotional tone in multiple languages
Available Sizes:
├── Base
└── Large
Zero-shot Classification Pipeline
Classify text without training examples
Available Sizes:
├── Base
└── Large
Technical Note: All ModernBERT pipelines share the same backbone architecture, loading task-specific finetuned weights as needed.
Usage
At this point in development, the only way to interact with the models is through the provided pipelines. I plan to eventually provide a simple interface for working with the models directly.
Inference is currently quite slow, mostly because candle is compiled without the CUDA feature. I will be working on integrating this smoothly in future updates for much faster inference.
Text Generation
There are two basic ways to generate text:
- By providing a simple prompt string.
- By providing a list of messages for chat-like interactions.
Providing a single prompt
Use the completion method for straightforward text generation from a single prompt string.
use ;
async
Providing a list of messages
For more conversational interactions, you can pass a list of messages to the completion method.
The Message struct represents a single message in a chat and has a role (system, user, or assistant) and content. You can create messages using:
- Message::system(content: &str): For system prompts.
- Message::user(content: &str): For user prompts.
- Message::assistant(content: &str): For model responses.
use ;
async
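To make the shape of a chat history concrete, here is a minimal, standalone sketch of a message type with a role and content plus the three constructors listed above. The `Role` enum, the field layout, and `build_conversation` are assumptions made for illustration; this is not the crate's actual definition, which may differ.

```rust
// Hypothetical stand-in for the crate's Message type (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    content: String,
}

impl Message {
    // Constructor for system prompts.
    fn system(content: &str) -> Self {
        Self { role: Role::System, content: content.to_string() }
    }

    // Constructor for user prompts.
    fn user(content: &str) -> Self {
        Self { role: Role::User, content: content.to_string() }
    }

    // Constructor for model responses.
    fn assistant(content: &str) -> Self {
        Self { role: Role::Assistant, content: content.to_string() }
    }
}

// Builds an ordered chat history: system prompt first, then alternating turns.
fn build_conversation() -> Vec<Message> {
    vec![
        Message::system("You are a helpful assistant."),
        Message::user("What is the capital of France?"),
        Message::assistant("The capital of France is Paris."),
        Message::user("And of Italy?"),
    ]
}

fn main() {
    for message in build_conversation() {
        println!("{:?}: {}", message.role, message.content);
    }
}
```

Ending the list with a user message is the usual pattern, since the model is then asked to produce the next assistant turn.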
Tool Calling
Using tools with models is straightforward: define tools with the tool macro, register them with the pipeline, and enable tool usage.
use ;
use Result;
// 1. Define the tools
/// Get the weather for a given city in degrees celsius.
async
Tools can also be asynchronous, allowing you to perform network or file I/O directly inside the handler:
use Result;
use tool;
/// Echoes a message after waiting for a bit.
async
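Conceptually, registering tools means keeping a lookup from tool name to handler so that a tool call emitted by the model can be dispatched. The sketch below shows that idea with the standard library only. `ToolRegistry`, `ToolFn`, `demo_registry`, and the single-string argument are hypothetical names and simplifications, not the crate's API, and the weather value is placeholder data.

```rust
use std::collections::HashMap;

// Hypothetical handler type: takes one string argument, returns text or an error.
type ToolFn = Box<dyn Fn(&str) -> Result<String, String>>;

// Hypothetical registry mapping tool names to handlers.
struct ToolRegistry {
    tools: HashMap<String, ToolFn>,
}

impl ToolRegistry {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    // Store a handler under the name the model will use to call it.
    fn register<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + 'static,
    {
        self.tools.insert(name.to_string(), Box::new(handler));
    }

    // Dispatch a call by name; unknown names surface as an error.
    fn call(&self, name: &str, arg: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => tool(arg),
            None => Err(format!("unknown tool: {name}")),
        }
    }
}

fn demo_registry() -> ToolRegistry {
    let mut registry = ToolRegistry::new();
    registry.register("get_weather", |city| {
        // Placeholder result; a real tool would query a weather service.
        Ok(format!("The weather in {city} is 20 degrees celsius."))
    });
    registry
}

fn main() {
    let registry = demo_registry();
    println!("{:?}", registry.call("get_weather", "Tokyo"));
    println!("{:?}", registry.call("get_time", "Tokyo"));
}
```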
Streaming Completions
Use completion_stream to receive tokens as they're generated. If tools are enabled and registered, they're used automatically.
Instead of returning the finished completion, this method returns a stream that you can iterate over to receive tokens individually as the model generates them.
The stream is wrapped in a CompletionStream helper with methods like collect()
to gather the full response or take(n) to grab the first few chunks. Both
helpers now return a Result to surface any errors that may occur during
streaming.
use ;
use StreamExt;
use Write;
async
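The behaviour of collect() and take(n) on a fallible stream can be illustrated with a synchronous stand-in. The sketch below models each chunk as a Result and uses a plain std iterator in place of the async stream; `Chunk`, `collect_all`, and `take_chunks` are hypothetical names, and the real CompletionStream methods are async and may have different signatures.

```rust
// Synchronous stand-in for a stream item: a text chunk or an error message.
type Chunk = Result<String, String>;

// Gather all chunks into one String, stopping at the first error.
fn collect_all<I: Iterator<Item = Chunk>>(stream: I) -> Result<String, String> {
    stream.collect()
}

// Grab only the first `n` chunks, again surfacing any error among them.
fn take_chunks<I: Iterator<Item = Chunk>>(stream: I, n: usize) -> Result<Vec<String>, String> {
    stream.take(n).collect()
}

fn main() {
    let chunks: Vec<Chunk> = vec![
        Ok("Hello".to_string()),
        Ok(", ".to_string()),
        Ok("world".to_string()),
    ];
    println!("{:?}", collect_all(chunks.clone().into_iter()));
    println!("{:?}", take_chunks(chunks.into_iter(), 2));

    // An error in the middle of the stream is returned instead of partial text.
    let failing: Vec<Chunk> = vec![Ok("Hel".to_string()), Err("model error".to_string())];
    println!("{:?}", collect_all(failing.into_iter()));
}
```

Returning a Result from both helpers means a failure partway through generation is reported to the caller rather than silently yielding a truncated response.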
XML Parsing for Structured Output
You can build pipelines with XML parsing capabilities to handle structured outputs from models. This is particularly useful for parsing tool calls and reasoning traces.
use ;
async
The XML parser also works with streaming completions, emitting events as XML tags are encountered in the stream. This enables real-time processing of structured outputs without waiting for the full response.
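As a rough picture of how events can be emitted while chunks are still arriving, here is a small incremental tag scanner. It is only a sketch of the general technique, not the crate's parser: `Event` and `StreamingTagParser` are hypothetical names, attributes are not handled, and a literal `<` in ordinary text would be treated as the start of a tag.

```rust
// Hypothetical event type emitted as tags and text are recognised.
#[derive(Debug, PartialEq)]
enum Event {
    Open(String),
    Text(String),
    Close(String),
}

// Keeps unprocessed input between chunks so tags split across chunks still parse.
#[derive(Default)]
struct StreamingTagParser {
    buffer: String,
}

impl StreamingTagParser {
    // Feed one chunk and return every event that can be decided so far.
    fn feed(&mut self, chunk: &str) -> Vec<Event> {
        self.buffer.push_str(chunk);
        let mut events = Vec::new();
        loop {
            match self.buffer.find('<') {
                // No tag start in the buffer: all of it is plain text.
                None => {
                    if !self.buffer.is_empty() {
                        events.push(Event::Text(std::mem::take(&mut self.buffer)));
                    }
                    break;
                }
                Some(start) => {
                    // Emit any text that precedes the tag.
                    if start > 0 {
                        let text: String = self.buffer.drain(..start).collect();
                        events.push(Event::Text(text));
                    }
                    match self.buffer.find('>') {
                        // Tag is incomplete: keep it buffered until more input arrives.
                        None => break,
                        Some(end) => {
                            let tag: String = self.buffer.drain(..=end).collect();
                            let name = &tag[1..tag.len() - 1];
                            if let Some(closing) = name.strip_prefix('/') {
                                events.push(Event::Close(closing.to_string()));
                            } else {
                                events.push(Event::Open(name.to_string()));
                            }
                        }
                    }
                }
            }
        }
        events
    }
}

fn main() {
    let mut parser = StreamingTagParser::default();
    // The closing tag is deliberately split across two chunks.
    for chunk in ["<think>Let me", " check</thi", "nk>done"] {
        println!("{:?}", parser.feed(chunk));
    }
}
```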
Fill Mask (ModernBERT)
use ;
Sentiment Analysis (ModernBERT Finetune)
use ;
Zero-Shot Classification (ModernBERT NLI Finetune)
Zero-shot classification offers two methods for different use cases:
Single-Label Classification (classify)
Use when you want to classify text into one of several mutually exclusive categories. Probabilities sum to 1.0.
use ;
Multi-Label Classification (classify_multi_label)
Use when labels can be independent and multiple labels could apply to the same text. Returns raw entailment probabilities.
use ;
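The difference between the two methods comes down to how NLI logits are normalised. The sketch below follows the scheme used by Python Transformers' zero-shot pipeline: single-label applies one softmax over the entailment logits of all labels, while multi-label scores each label on its own from its contradiction and entailment logits. Whether this crate computes its scores in exactly this way is an assumption, and the function names and logit values are made up for illustration.

```rust
// Numerically stable softmax over a slice of logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Single-label: labels compete, so the probabilities sum to 1.0.
fn single_label(entailment_logits: &[f32]) -> Vec<f32> {
    softmax(entailment_logits)
}

// Multi-label: each label is judged independently from its own
// (contradiction, entailment) logit pair, so scores need not sum to 1.0.
fn multi_label(logit_pairs: &[(f32, f32)]) -> Vec<f32> {
    logit_pairs
        .iter()
        .map(|&(contradiction, entailment)| softmax(&[contradiction, entailment])[1])
        .collect()
}

fn main() {
    // Made-up logits for three candidate labels.
    let single = single_label(&[2.0, 1.0, 0.1]);
    println!("single-label: {:?} (sum = {})", single, single.iter().sum::<f32>());

    let multi = multi_label(&[(0.0, 3.0), (0.0, 2.0), (3.0, 0.0)]);
    println!("multi-label:  {:?}", multi);
}
```

In the multi-label case two labels can both score highly for the same text, which is why it suits tagging, while the single-label case always forces a ranking among the candidates.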
Future Plans
- Add more model families and sizes
- Support additional pipelines (summarization, classification)