chat-mlx 0.0.0

Local-inference chat-rs provider (and CLI) for MiniCPM5 / Llama / Qwen models on Apple Silicon via MLX.
use mlx_rs::{Array, error::Exception};

/// A per-step logit transform for constrained decoding: `mask` restricts which
/// tokens may be sampled next, and `accept` advances internal state with the
/// token that was chosen. Implemented by `parsers::json::JsonConstraint`.
pub trait LogitMask {
    /// Return `logits` with disallowed tokens pushed to `-inf`.
    fn mask(&self, logits: &Array) -> Result<Array, Exception>;

    /// Record the token that was actually sampled.
    fn accept(&mut self, token: u32);
}
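
The mask/accept contract above can be sketched without MLX. This is a minimal illustration and not the crate's implementation: `SliceLogitMask`, `NoImmediateRepeat`, and `greedy_step` are hypothetical names, `&[f32]` / `Vec<f32>` stand in for `mlx_rs::Array`, and `String` stands in for `Exception`, so the example runs on the standard library alone.

```rust
/// Stand-in for `LogitMask` (assumption: slices replace `mlx_rs::Array`
/// and `String` replaces `Exception` so no MLX runtime is needed).
pub trait SliceLogitMask {
    /// Return `logits` with disallowed tokens pushed to `-inf`.
    fn mask(&self, logits: &[f32]) -> Result<Vec<f32>, String>;

    /// Record the token that was actually sampled.
    fn accept(&mut self, token: u32);
}

/// Hypothetical constraint: the token sampled at the previous step may not
/// be sampled again at the next one.
#[derive(Default)]
pub struct NoImmediateRepeat {
    last: Option<u32>,
}

impl NoImmediateRepeat {
    pub fn new() -> Self {
        Self::default()
    }
}

impl SliceLogitMask for NoImmediateRepeat {
    fn mask(&self, logits: &[f32]) -> Result<Vec<f32>, String> {
        let mut out = logits.to_vec();
        if let Some(last) = self.last {
            // Fail instead of panicking if the recorded token does not fit
            // the vocabulary size of these logits.
            let slot = out.get_mut(last as usize).ok_or_else(|| {
                format!("token {last} is outside a vocabulary of {} entries", logits.len())
            })?;
            *slot = f32::NEG_INFINITY;
        }
        Ok(out)
    }

    fn accept(&mut self, token: u32) {
        self.last = Some(token);
    }
}

/// One greedy decoding step: mask the logits, take the best remaining
/// token, then advance the constraint's state with that token.
pub fn greedy_step<M: SliceLogitMask>(mask: &mut M, logits: &[f32]) -> Result<u32, String> {
    let masked = mask.mask(logits)?;
    let (best, _) = masked
        .iter()
        .enumerate()
        .filter(|(_, l)| l.is_finite())
        .max_by(|a, b| a.1.total_cmp(b.1))
        .ok_or_else(|| "every token was masked out".to_string())?;
    let token = best as u32;
    mask.accept(token);
    Ok(token)
}
```

Calling `greedy_step` repeatedly with the same logits shows the state carried by `accept`: the highest-scoring token is chosen first, and on the next step it is masked out so the runner-up is chosen instead. A real constraint such as a JSON grammar would keep richer state, but the call order (mask, sample, accept) is the same.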