transformers v0.0.9
[!warning] This crate is under active development. APIs may change as features are added and existing behavior is adjusted.
Transformers provides a simple, intuitive interface for Rust developers who want to work with Large Language Models locally, powered by the Candle crate. Its API is inspired by Python's Transformers library.
Supported Pipelines
- Text Generation
- Sentiment Analysis
- Zero Shot Classification
- Fill Mask
Currently Implemented Models
- Qwen3 (Text Generation)
  - 0.6B
  - 1.7B
  - 4B
  - 8B
  - 14B
  - 32B
- Gemma3 (Text Generation)
  - 1B
  - 4B
  - 12B
  - 27B
- ModernBERT (ZeroShot, Sentiment Analysis, Fill Mask)
  - Base
  - Large
All ModernBERT-based pipelines share the same backbone architecture while loading task-specific finetuned checkpoints.
Usage
At this point in development, the only way to interact with the models is through the provided pipelines. I plan to eventually provide a simple interface for working with the models directly.
Inference is currently quite slow, mostly because Candle is compiled without its CUDA feature. I plan to integrate CUDA support smoothly in future updates for much faster inference.
Text Generation
There are two basic ways to generate text:
- By providing a simple prompt string.
- By providing a list of messages for chat-like interactions.
Providing a single prompt
Use the prompt_completion method for straightforward text generation from a single prompt string.
Providing a list of messages
For more conversational interactions, you can use the message_completion method, which takes a vector of Message structs.
The Message struct represents a single message in a chat and has a role (system, user, or assistant) and content. You can create messages using:
- Message::system(content: &str): For system prompts.
- Message::user(content: &str): For user prompts.
- Message::assistant(content: &str): For model responses.
use TextGenerationPipelineBuilder;
use Messages;
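As a rough illustration of the message structure described above, here is a self-contained stand-in. The Role enum, the field layout, and the example_conversation helper are assumptions made for this sketch; they mirror the description (a role plus content, with system, user, and assistant constructors) but are not the crate's actual definitions.

```rust
/// Stand-in for the role of a chat message (system, user, or assistant).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    System,
    User,
    Assistant,
}

/// Stand-in for a chat message. NOT the crate's actual definition.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Message {
    role: Role,
    content: String,
}

impl Message {
    /// Creates a system prompt.
    fn system(content: &str) -> Self {
        Self { role: Role::System, content: content.to_string() }
    }

    /// Creates a user prompt.
    fn user(content: &str) -> Self {
        Self { role: Role::User, content: content.to_string() }
    }

    /// Creates a model response.
    fn assistant(content: &str) -> Self {
        Self { role: Role::Assistant, content: content.to_string() }
    }
}

/// Hypothetical helper: builds a short conversation in chronological order.
fn example_conversation() -> Vec<Message> {
    vec![
        Message::system("You are a helpful assistant."),
        Message::user("What is the capital of France?"),
        Message::assistant("The capital of France is Paris."),
        Message::user("And of Italy?"),
    ]
}

fn main() {
    for message in example_conversation() {
        println!("{:?}: {}", message.role, message.content);
    }
}
```

A vector shaped like this is the kind of input the text says message_completion takes; the earlier assistant message gives the model context for the follow-up question.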
Tool Calling
Using tools with models is also straightforward: define tools with the tool macro and register them with the pipeline.
Once tools are registered, call prompt_completion_with_tools. There is also a message_completion_with_tools method if you'd like to use tools in a conversational context.
use TextGenerationPipelineBuilder;
use Messages;
// 1. Define the tools
/// Get the weather for a given city in degrees celsius.
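To make the define-then-register idea concrete, the sketch below shows one way tool dispatch can work in general. It does not use the crate's tool macro or pipeline: Tool, ToolRegistry, weather_registry, and the canned temperatures are all hypothetical names and values invented for this illustration.

```rust
use std::collections::HashMap;

/// Hypothetical tool: a description shown to the model plus a callable.
struct Tool {
    description: &'static str,
    run: fn(&str) -> String,
}

/// Get the weather for a given city in degrees Celsius.
/// Returns canned values; a real tool would query a weather service.
fn get_weather(city: &str) -> String {
    let degrees = match city {
        "Tokyo" => 22,
        "Oslo" => 5,
        _ => 15,
    };
    format!("{degrees} degrees Celsius in {city}")
}

/// Hypothetical registry mapping tool names to tools.
struct ToolRegistry {
    tools: HashMap<&'static str, Tool>,
}

impl ToolRegistry {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    /// Registers a tool under the name the model will use to request it.
    fn register(&mut self, name: &'static str, tool: Tool) {
        self.tools.insert(name, tool);
    }

    /// Returns the description of a registered tool, if any.
    fn describe(&self, name: &str) -> Option<&'static str> {
        self.tools.get(name).map(|tool| tool.description)
    }

    /// Runs the named tool with one argument; None if the tool is unknown.
    fn call(&self, name: &str, argument: &str) -> Option<String> {
        self.tools.get(name).map(|tool| (tool.run)(argument))
    }
}

/// Builds a registry containing only the weather tool.
fn weather_registry() -> ToolRegistry {
    let mut registry = ToolRegistry::new();
    registry.register(
        "get_weather",
        Tool {
            description: "Get the weather for a given city in degrees Celsius.",
            run: get_weather,
        },
    );
    registry
}

fn main() {
    let registry = weather_registry();
    if let Some(description) = registry.describe("get_weather") {
        println!("tool: {description}");
    }
    // Pretend the model asked to call get_weather with "Tokyo".
    if let Some(result) = registry.call("get_weather", "Tokyo") {
        println!("result: {result}");
    }
}
```

In a setup like this, the tool's result would be fed back to the model so it can finish its answer; how the crate does that internally is not shown here.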
Streaming Completions
Each of the methods above, for both regular generation and tool calling, has a streaming version:
- prompt_completion_stream
- message_completion_stream
- prompt_completion_stream_with_tools
- message_completion_stream_with_tools
Instead of returning the finished completion, these methods return a stream that you can iterate over to receive tokens individually as the model generates them, rather than all at once at the end.
use TextGenerationPipelineBuilder;
use Messages;
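The difference between a streamed and a non-streamed completion can be illustrated without the crate at all. In the sketch below, a worker thread stands in for the model and a standard-library channel stands in for the stream; fake_completion_stream is a hypothetical name, and the crate's actual stream type may well differ (for example, it may be asynchronous).

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a streaming completion: sends whitespace-delimited "tokens"
/// one at a time from a worker thread that plays the role of the model.
fn fake_completion_stream(text: &str) -> mpsc::Receiver<String> {
    let (sender, receiver) = mpsc::channel();
    let tokens: Vec<String> = text.split_inclusive(' ').map(str::to_string).collect();
    thread::spawn(move || {
        for token in tokens {
            // Stop early if the consumer has dropped the receiver.
            if sender.send(token).is_err() {
                break;
            }
        }
    });
    receiver
}

fn main() {
    let stream = fake_completion_stream("Rust is a systems programming language.");
    let mut full = String::new();
    // Each token is handled as soon as it arrives, not after the whole text is ready.
    for token in stream {
        print!("{token}");
        full.push_str(&token);
    }
    println!();
    println!("received {} bytes in total", full.len());
}
```

Concatenating the streamed tokens gives the same text a non-streaming call would return; the benefit is that output can be shown while generation is still running.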
Fill Mask (ModernBERT)
Sentiment Analysis (ModernBERT Finetune)
Zero-Shot Classification (ModernBERT NLI Finetune)
Zero-shot classification offers two methods for different use cases:
Single-Label Classification (predict)
Use when you want to classify text into one of several mutually exclusive categories. Probabilities sum to 1.0.
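One common way to obtain probabilities that sum to 1.0 across mutually exclusive labels is a softmax over one score per label. The sketch below shows that normalization in isolation; whether predict does exactly this internally is an assumption, and the labels and scores are made up for the example.

```rust
/// Converts one raw score per label into probabilities that sum to 1.0 (softmax).
fn softmax(scores: &[f32]) -> Vec<f32> {
    // Subtract the maximum score first for numerical stability.
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let labels = ["sports", "politics", "technology"];
    // Hypothetical per-label entailment scores for one input text.
    let scores = [0.3_f32, -1.2, 2.5];
    let probabilities = softmax(&scores);
    for (label, probability) in labels.iter().zip(&probabilities) {
        println!("{label}: {probability:.3}");
    }
}
```

Because the labels compete for a fixed total of 1.0, raising one label's score necessarily lowers the probability of the others.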
Multi-Label Classification (predict_multi_label)
Use when labels can be independent and multiple labels could apply to the same text. Returns raw entailment probabilities.
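For independent labels, each label can be scored on its own, so the results need not sum to 1.0. The sketch below assumes each label yields an entailment logit and a contradiction logit and applies a two-way softmax per label, which is how multi-label zero-shot classification is often done with NLI models. The function names and logit values are hypothetical, and the crate's internals may differ.

```rust
/// Two-way softmax over (entailment, contradiction) logits for a single label.
/// Algebraically equal to a sigmoid of the logit difference.
fn entailment_probability(entailment: f32, contradiction: f32) -> f32 {
    1.0 / (1.0 + (contradiction - entailment).exp())
}

/// Scores every label independently; the results are not normalized across labels.
fn multi_label_probabilities(logits: &[(f32, f32)]) -> Vec<f32> {
    logits
        .iter()
        .map(|&(entailment, contradiction)| entailment_probability(entailment, contradiction))
        .collect()
}

fn main() {
    let labels = ["urgent", "billing", "spam"];
    // Hypothetical (entailment, contradiction) logits for one input text.
    let logits = [(3.0_f32, -1.0_f32), (2.0, 0.0), (-2.0, 1.0)];
    let probabilities = multi_label_probabilities(&logits);
    for (label, probability) in labels.iter().zip(&probabilities) {
        // 0.5 is an arbitrary example threshold for treating a label as applicable.
        let verdict = if *probability >= 0.5 { "applies" } else { "does not apply" };
        println!("{label}: {probability:.3} ({verdict})");
    }
}
```

Here two labels can both score highly for the same text, which is exactly the case single-label normalization cannot express.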
Future Plans
- Add more model families and sizes
- Support additional pipelines (summarization, classification)
- CUDA support for faster inference
- Direct model interface (beyond pipelines)
Credits
A special thanks to Diaconu Radu-Mihai for transferring the transformers crate name on crates.io.