candle-pipelines 0.0.7

# candle-pipelines

<!-- CI / Workflow Badges -->
[<img alt="crates.io" src="https://img.shields.io/crates/v/candle-pipelines.svg?style=for-the-badge&color=fc8d62&logo=rust" height="19">](https://crates.io/crates/candle-pipelines)
[<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-candle--pipelines-66c2a5?style=for-the-badge&labelColor=555555&logo=docs.rs" height="19">](https://docs.rs/candle-pipelines)
![CI](https://github.com/ljt019/candle_pipelines/actions/workflows/ci.yml/badge.svg)

> [!warning]
> ***This crate is under active development. APIs may change as features are still being added, and things tweaked.***

Simple, intuitive pipelines for local LLM inference in Rust, powered by [Candle](https://github.com/huggingface/candle). API inspired by Python's [Transformers](https://huggingface.co/docs/transformers).

## Available Pipelines

***Note**: Currently, models are accessible through these pipelines only. Direct model interface coming eventually!*

### Text Generation Pipeline

Generate text for various applications. Supports completions, tool calling, and token-by-token iteration.

---

**Qwen3**  
*Optimized for tool calling and structured output*

```markdown
 Parameter Sizes:
├── 0.6B
├── 1.7B
├── 4B
├── 8B
├── 14B
└── 32B
```

[→ View on HuggingFace](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f)

---

**Gemma3**  
*Google's models for general language tasks*

```markdown
 Parameter Sizes:
├── 1B
├── 4B
├── 12B
└── 27B
```

[→ View on HuggingFace](https://huggingface.co/collections/google/gemma-3-release-67c6c6f89c4f76621268bb6d)

---

**Llama 3.2**
*Meta's compact instruction-tuned models*

```markdown
 Parameter Sizes:
├── 1B
└── 3B
```

[→ View on HuggingFace](https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf)

---

**OLMo-3**
*Allen AI's open language models with tool support*

```markdown
 Parameter Sizes:
├── 7B
└── 32B
```

[→ View on HuggingFace](https://huggingface.co/collections/allenai/olmo-2-67b8b5f3e1b4c2b6c2b1b0a0)

### Analysis Pipelines

ModernBERT powers three specialized analysis tasks with shared architecture:

---

#### **Fill Mask Pipeline**
*Complete missing words in text*

```markdown
 Available Sizes:
├── Base
└── Large
```

[→ View on HuggingFace](https://huggingface.co/answerdotai/ModernBERT-base)

---

#### **Sentiment Analysis Pipeline**
*Analyze emotional tone in multiple languages*

```markdown
 Available Sizes:
├── Base
└── Large
```

[→ View on HuggingFace](https://huggingface.co/clapAI/modernBERT-base-multilingual-sentiment)

---

#### **Zero-shot Classification Pipeline**
*Classify text without training examples*

```markdown
 Available Sizes:
├── Base
└── Large
```

[→ View on HuggingFace](https://huggingface.co/MoritzLaurer/ModernBERT-base-zeroshot-v2.0)

---

***Technical Note**: All ModernBERT pipelines share the same backbone architecture, loading task-specific finetuned weights as needed.*

## Usage

At this point in development the only way to interact with the models is through the given pipelines, I plan to eventually provide a simple interface to work with the models directly.

Inference will be quite slow at the moment, this is mostly due to not using the CUDA feature when compiling candle. I will be working on integrating this smoothly in future updates for much faster inference.

### Text Generation

There are two basic ways to generate text:

1. By providing a simple prompt string.
2. By providing a list of messages for chat-like interactions.

#### Providing a single prompt

Use the `run` method for straightforward text generation from a single prompt string.

```rust
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3};

fn main() -> Result<()> {
    // 1. Create the pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
        .temperature(0.7)
        .top_k(40)
        .build()?;

    // 2. Generate a completion - returns Output { text, stats }
    let output = pipeline.run("What is the meaning of life?")?;
    println!("{}", output.text);
    println!("Generated {} tokens", output.stats.tokens_generated);

    Ok(())
}
```

#### Providing a list of messages

For more conversational interactions, you can pass a list of messages to the `run` method.

The `Message` struct represents a single message in a chat and has a `role` (system, user, assistant, or tool) and `content`. You can create messages using:

- `Message::system(content: &str)`: For system prompts.
- `Message::user(content: &str)`: For user prompts.
- `Message::assistant(content: &str)`: For model responses.
- `Message::tool(content: &str)`: For tool/function results returned to the model.

```rust
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3, Message};

fn main() -> Result<()> {
    // 1. Create the pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
        .temperature(0.7)
        .top_k(40)
        .build()?;

    // 2. Create the messages
    let messages = vec![
        Message::system("You are a helpful assistant."),
        Message::user("What is the meaning of life?"),
    ];

    // 3. Generate a completion
    let output = pipeline.run(&messages)?;
    println!("{}", output.text);

    Ok(())
}
```

#### Tool Calling

Using tools with models is also made extremely easy, you just define tools using the `#[tool]` macro, register them with the pipeline, and they're executed automatically when the model calls them.

```rust
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{tool, tools, ErrorStrategy};
use candle_pipelines::text_generation::{Qwen3, TextGenerationPipelineBuilder};

// 1. Define tools using the #[tool] macro
#[tool(retries = 5)]  // optional: configure retry attempts
/// Get the humidity for a given city.
fn get_humidity(city: String) -> Result<String> {
    Ok(format!("The humidity is 50% in {}.", city))
}

#[tool]  // defaults to 3 retries
/// Get the temperature for a given city in degrees celsius.
fn get_temperature(city: String) -> Result<String> {
    Ok(format!("The temperature is 20 degrees celsius in {}.", city))
}

fn main() -> Result<()> {
    // 2. Create the pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
        .max_len(8192)
        .tool_error_strategy(ErrorStrategy::ReturnToModel)  // let model handle tool errors
        .build()?;

    // 3. Register tools (enabled by default)
    pipeline.register_tools(tools![get_temperature, get_humidity]);

    // 4. Get a completion - tools are used automatically
    let output = pipeline.run("What's the temp and humidity like in Tokyo?")?;
    println!("{}", output.text);

    Ok(())
}
```

Tools can also be asynchronous, allowing you to perform network or file I/O directly inside the handler:

```rust
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::tool;

#[tool]
/// Echoes a message after waiting for a bit.
async fn delayed_echo(message: String) -> Result<String> {
    tokio::time::sleep(std::time::Duration::from_millis(25)).await;
    Ok(message)
}
```

#### Token Iteration

Use `run_iter` to receive tokens as they're generated. Fully sync - no async runtime needed.

```rust
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3};
use std::io::Write;

fn main() -> Result<()> {
    // 1. Create the pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
        .max_len(1024)
        .build()?;

    // 2. Iterate over tokens as they're generated
    let mut tokens = pipeline.run_iter(
        "Explain the concept of Large Language Models in simple terms.",
    )?;

    // 3. Print tokens as they arrive
    for tok in &mut tokens {
        print!("{}", tok?);
        std::io::stdout().flush().unwrap();
    }

    // 4. Get stats after iteration
    let stats = tokens.stats();
    println!("\n\nGenerated {} tokens", stats.tokens_generated);

    Ok(())
}
```

#### XML Parsing for Structured Output

Use `XmlParser` to parse structured outputs from models - useful for reasoning traces like `<think>` blocks.

```rust
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{
    Event, Qwen3, TagPart, TextGenerationPipelineBuilder, XmlTag,
};

// 1. Define which tags to parse using an enum
#[derive(Debug, Clone, PartialEq, XmlTag)]
enum Tags {
    Think,      // matches <think>
    Answer,     // matches <answer>
}

fn main() -> Result<()> {
    // 2. Build a regular pipeline
    let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
        .max_len(1024)
        .build()?;

    // 3. Create parser from tag enum
    let parser = Tags::parser();

    // 4. Get token iterator and wrap with XML parser
    let tokens = pipeline.run_iter("Think step by step, then answer.")?;
    let events = parser.parse_iter(tokens);

    // 5. Process events using pattern matching
    for event in events {
        match event? {
            Event::Tag { tag: Tags::Think, part } => match part {
                TagPart::Opened { .. } => println!("[THINKING]"),
                TagPart::Content { text } => print!("{}", text),
                TagPart::Closed { .. } => println!("[END THINKING]"),
            },
            Event::Tag { tag: Tags::Answer, part } => match part {
                TagPart::Content { text } => print!("{}", text),
                _ => {}
            },
            Event::Content { text } => print!("{}", text),
        }
    }

    Ok(())
}
```

The XML parser emits events as tags are encountered, enabling real-time processing without waiting for the full response.

### Fill Mask (ModernBERT)

```rust
use candle_pipelines::error::Result;
use candle_pipelines::fill_mask::{FillMaskPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    // 1. Build the pipeline
    let pipeline = FillMaskPipelineBuilder::modernbert(ModernBertSize::Base).build()?;

    // 2. Fill the mask
    let output = pipeline.run("The capital of France is [MASK].")?;

    println!("{}: {:.2}", output.prediction.token, output.prediction.score);
    // Output: Paris: 0.98
    Ok(())
}
```

### Sentiment Analysis (ModernBERT Finetune)

```rust
use candle_pipelines::error::Result;
use candle_pipelines::sentiment::{SentimentAnalysisPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    // 1. Build the pipeline
    let pipeline = SentimentAnalysisPipelineBuilder::modernbert(ModernBertSize::Base).build()?;

    // 2. Analyze sentiment
    let output = pipeline.run("I love using Rust for my projects!")?;

    println!("Sentiment: {} (confidence: {:.2})", output.prediction.label, output.prediction.score);
    // Output: Sentiment: positive (confidence: 0.98)
    Ok(())
}
```

### Zero-Shot Classification (ModernBERT NLI Finetune)

Zero-shot classification offers two methods for different use cases:

#### Single-Label Classification (`run`)

Use when you want to classify text into one of several **mutually exclusive** categories. Probabilities sum to 1.0.

```rust
use candle_pipelines::error::Result;
use candle_pipelines::zero_shot::{ZeroShotClassificationPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    // 1. Build the pipeline
    let pipeline = ZeroShotClassificationPipelineBuilder::modernbert(ModernBertSize::Base).build()?;

    // 2. Single-label classification
    let text = "The Federal Reserve raised interest rates.";
    let labels = &["economics", "politics", "technology", "sports"];
    let output = pipeline.run(text, labels)?;

    println!("Text: {}", text);
    for p in &output.predictions {
        println!("- {}: {:.4}", p.label, p.score);
    }
    // Example output (probabilities sum to 1.0):
    // - economics: 0.8721
    // - politics: 0.1134
    // - technology: 0.0098
    // - sports: 0.0047
    
    Ok(())
}
```

#### Multi-Label Classification (`run_multi_label`)

Use when labels can be **independent** and multiple labels could apply to the same text. Returns raw entailment probabilities.

```rust
use candle_pipelines::error::Result;
use candle_pipelines::zero_shot::{ZeroShotClassificationPipelineBuilder, ModernBertSize};

fn main() -> Result<()> {
    // 1. Build the pipeline
    let pipeline = ZeroShotClassificationPipelineBuilder::modernbert(ModernBertSize::Base).build()?;

    // 2. Multi-label classification
    let text = "I love reading books about machine learning and artificial intelligence.";
    let labels = &["technology", "education", "reading", "science"];
    let output = pipeline.run_multi_label(text, labels)?;

    println!("Text: {}", text);
    for p in &output.predictions {
        println!("- {}: {:.4}", p.label, p.score);
    }
    // Example output (independent probabilities):
    // - technology: 0.9234
    // - education: 0.8456
    // - reading: 0.9567
    // - science: 0.7821
    
    Ok(())
}
```

## Future Plans

- Add more model families and sizes
- Support additional pipelines (summarization, classification)