# lfm

Rust ONNX inference for LiquidAI LFM2.5-VL — a 450M-parameter vision-language model with schema-constrained sampling via llguidance. Implements the engine-agnostic `llmtask::Task` contract, so any Task written against `llmtask` runs through `lfm` unchanged.
## Overview

`lfm` is the LiquidAI LFM2.5-VL inference engine, built on Rust + ONNX Runtime + llguidance:
- **`Engine`** — sync, single-threaded; built on `ort` 2.0. `Engine::run<T: Task<Value = serde_json::Value>>` accepts any `llmtask::Task` whose grammar is JSON Schema, Lark, or Regex; schema-constrained sampling is enforced by llguidance token-mask filtering. `Engine::generate` is the unconstrained path for free-form text.
- **`ImageAnalysisTask`** — built-in image-analysis preset that produces the canonical `llmtask::ImageAnalysis` output type, sharing the schema and parser with `qwen`.
- **Bundled assets** — the `bundled` feature ships LFM2.5-VL's tokenizer, chat template, and preprocessor configs as `include_bytes!`. `Engine::from_onnx_dir` then accepts a directory containing only the three ONNX graphs; no separate tokenizer download required.
- **Wasm-friendly preprocessing** — `preproc::Preprocessor`, `TileGrid`, and EXIF-aware decode helpers compile under `--no-default-features --features decoders` (no `ort`, no `tokenizers`).
## Why an `llmtask`-driven engine?

A bespoke `lfm::Task` would force every prompt + schema + parser to be rewritten against the next inference engine. Implementing `llmtask::Task` instead means the same Task code targets `lfm` (llguidance), `qwen` (mistralrs), or any future llmtask-compatible backend without modification — only the hardware backend selection differs.
```text
                            ┌──────────────────────────┐
YourTask: impl Task   ──▶   │  llmtask::Task contract  │   ──▶   lfm / qwen / …
                            │    prompt + Grammar      │
                            │    parse → Output        │
                            └──────────────────────────┘
```
Because `lfm`'s backend is llguidance, all three `llmtask::Grammar` variants (JSON Schema, Lark, Regex) are accepted — engines that only speak JSON Schema (e.g. `qwen`) reject the others via `UnsupportedGrammar`, and the caller can route to `lfm`.
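To make that concrete, here is a minimal sketch of a custom `Task`. The trait item names below (`prompt`, `grammar`, `parse`) and the error type are assumptions; this README only fixes the contract shape: prompt plus `Grammar` in, parsed `Output` out, with `Value = serde_json::Value` for `Engine::run`:

```rust
// Hedged sketch: trait item names and signatures are assumptions;
// only the contract shape comes from this README.
use llmtask::{Grammar, Task};

struct DominantColor;

impl Task for DominantColor {
    type Value = serde_json::Value; // required by Engine::run's bound
    type Output = String;

    fn prompt(&self) -> String {
        "Name the dominant color in the image, one lowercase word.".into()
    }

    fn grammar(&self) -> Grammar {
        // Regex is native to llguidance, so lfm accepts this Task.
        // A JSON-Schema-only engine (e.g. qwen) would return
        // UnsupportedGrammar, and the caller could fall back to lfm.
        Grammar::Regex("[a-z]+".into())
    }

    fn parse(&self, raw: &str) -> Result<Self::Output, String> {
        Ok(raw.trim().to_owned())
    }
}
```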
## Features

- **All three `Grammar` variants** — JSON Schema, Lark, and Regex are all native to llguidance, so any `llmtask::Task` runs through `Engine::run`. The HIR-anchored regex validator on the `Grammar` side matches engine semantics exactly (no substring vs. full-match drift).
- **Bundled tokenizer + configs** (`bundled` feature, default) — `Engine::from_onnx_dir` accepts an ONNX-only directory; the tokenizer, chat template, and preprocessor configs are embedded in the binary at compile time. `Engine::from_dir` is the strict constructor that byte-validates a supplied tokenizer + chat template against the bundled blobs to catch silent prompt-envelope drift.
- **Hybrid KV/conv-state cache decoder** — the LFM2 architecture has 10 conv-state layers and 6 attention layers at sparse indices; `decoder.rs` manages the non-contiguous cache layout transparently.
- **Wasm-friendly preprocessing** — drop the `inference` and `bundled` defaults to get a pure-CPU image-preprocessing surface (`Preprocessor`, `TileGrid`, EXIF-aware decode) usable from `wasm32-unknown-unknown`.
- **GPU acceleration** — `cuda`, `tensorrt`, `directml`, `rocm`, and `coreml` ORT execution providers are gated behind feature flags; none are required for CPU inference.
- **Admission-control DoS guards** — bounded request shape (max messages, max content parts), a text-size cap, an image-count lower bound from `min_image_tokens`, a header-time decoded-buffer cap, and a special-token denylist seeded from the live tokenizer's `added_vocabulary`. All run *before* any image decode or template render; a configuration sketch follows this list.
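A sketch of what tightening those guards could look like. Apart from `min_image_tokens`, every field name below is hypothetical; this README only states which limits exist, not how they are spelled:

```rust
// Hypothetical field names: only min_image_tokens appears in this README.
use lfm::RequestOptions;

let limits = RequestOptions {
    max_messages: 8,            // bounded request shape
    max_content_parts: 16,      // bounded request shape
    max_text_bytes: 64 * 1024,  // text-size cap
    min_image_tokens: 64,       // the image-count lower bound derives from this
    ..RequestOptions::default()
};
```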
## Examples

### From a HuggingFace download (`tokenizer.json` + configs in dir)
A minimal sketch of the strict path. Only the type and method names (`Engine::from_dir`, `Engine::generate`, `Options`, `SmolStr`) come from this crate's surface; the exact signatures are assumptions:
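```rust
use lfm::{Engine, Options};
use smol_str::SmolStr;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Directory holding the three ONNX graphs plus tokenizer.json and
    // the JSON configs from the HuggingFace download.
    let dir = std::env::var("LFM_MODEL_PATH")?;

    // Strict constructor: byte-validates the supplied tokenizer + chat
    // template against the bundled blobs.
    let mut engine = Engine::from_dir(&dir, Options::default())?;

    // Unconstrained free-form generation (illustrative signature).
    let image = std::fs::read("photo.jpg")?;
    let text = engine.generate(SmolStr::new("Describe this image."), &[image])?;
    println!("{text}");
    Ok(())
}
```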
### ONNX-only dir + bundled tokenizer

If you've downloaded just the ONNX files (not `tokenizer.json` and the JSON configs), use `Engine::from_onnx_dir`. The tokenizer + configs are embedded in the binary and written to a temp file on first use.
A sketch under the same caveats as above: the names are real, the signatures are assumptions:
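```rust
use lfm::{Engine, Options};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Directory containing only vision_encoder.onnx, embed_tokens.onnx,
    // and decoder_model_merged.onnx. The tokenizer + configs are embedded
    // in the binary and materialized to a temp file on first use.
    let dir = std::env::var("LFM_MODEL_PATH")?;
    let _engine = Engine::from_onnx_dir(&dir, Options::default())?;
    // Generation then proceeds exactly as with Engine::from_dir above.
    Ok(())
}
```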
### Structured output via the `ImageAnalysisTask` preset
A sketch of the constrained path; the `ImageAnalysisTask` constructor argument and the `Engine::run` call shape are assumptions:
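```rust
use lfm::{Engine, ImageAnalysisTask, Options};
use smol_str::SmolStr;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut engine =
        Engine::from_onnx_dir(std::env::var("LFM_MODEL_PATH")?, Options::default())?;

    // Built-in preset: prompt + JSON-Schema grammar + parser that yields
    // the canonical llmtask::ImageAnalysis output type.
    let task = ImageAnalysisTask::new(SmolStr::new("product photo")); // hypothetical ctor arg

    // Engine::run enforces the Task's Grammar via llguidance token masks,
    // then hands the completion to the Task's parser (illustrative signature).
    let image = std::fs::read("photo.jpg")?;
    let analysis = engine.run(&task, &[image])?;
    println!("{analysis:?}");
    Ok(())
}
```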
## Installation

```toml
[dependencies]
lfm = "0.1"
```
Download the ONNX artifacts from `LiquidAI/LFM2.5-VL-450M-ONNX` on HuggingFace and set `LFM_MODEL_PATH` to the directory containing them:
```text
vision_encoder.onnx
embed_tokens.onnx
decoder_model_merged.onnx
tokenizer.json            (optional — bundled if absent and the `bundled` feature is on)
```
## Cargo features

Defaults: `["inference", "bundled", "decoders"]`.

| Feature | Default | What it adds |
|---|---|---|
| `inference` | yes | Pulls `ort`, `tokenizers`, `llguidance`, `minijinja`. Activates `Engine`. Native targets only. |
| `bundled` | yes | Embeds `tokenizer.json` + JSON configs (~4.5 MB) at compile time; adds `Engine::from_onnx_dir`. Implies `inference`. |
| `decoders` | yes | Activates JPEG/PNG decoding via the `image` crate. |
| `serde` | no | `Serialize`/`Deserialize` on `Options`, `RequestOptions`, `ThreadOptions`, `ImageBudget`. |
| `cuda` | no | NVIDIA GPUs (Linux / Windows). Requires CUDA toolkit + cuDNN. Implies `inference`. |
| `tensorrt` | no | NVIDIA, optimized inference. Falls back to CUDA, then CPU. Implies `inference`. |
| `directml` | no | Windows GPUs (any vendor) via DirectX 12. Implies `inference`. |
| `rocm` | no | AMD GPUs (Linux). Requires ROCm SDK. Implies `inference`. |
| `coreml` | no | macOS / iOS via Core ML (Neural Engine + GPU + Metal). Implies `inference`. |
| `integration` | no | Enables the integration test (`tests/integration.rs`). Requires `LFM_MODEL_PATH`. |
GPU execution-provider features are off by default — none are required for CPU inference, and each requires its vendor SDK at build time.
## Wasm / preprocessing-only build

The public surface under `--no-default-features --features decoders` is `preproc::Preprocessor`, `preproc::TileGrid`, `preproc::PreprocessedImage`, `preproc::decode_bytes_with_orientation`, `options::*`, and `error::*`.
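A sketch of that surface; the item names are listed above, but the signatures and the `Preprocessor` method are assumptions:

```rust
// Compile with: --no-default-features --features decoders
// Item names come from this README; signatures are assumptions.
use lfm::preproc::{decode_bytes_with_orientation, Preprocessor};

fn preprocess(jpeg: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
    // EXIF-aware decode: applies the orientation tag before tiling.
    let image = decode_bytes_with_orientation(jpeg)?;
    // Tile + normalize into the tensors the vision encoder expects
    // (hypothetical method name).
    let _preprocessed = Preprocessor::default().run(&image)?;
    Ok(())
}
```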
## Architecture
Per-image vision encoding → text+image embedding splice → hybrid KV/conv cache decoder loop → optional schema-constrained sampling.
| Graph | Role | Size |
|---|---|---|
| `vision_encoder.onnx` | SigLIP2 image encoder — single image per call | ~86M params |
| `embed_tokens.onnx` | Token embedding lookup table | — |
| `decoder_model_merged.onnx` | LFM2 hybrid LM: 10 conv-state + 6 KV-attn layers (sparse cache) | ~350M params |
The decoder manages a sparse hybrid cache: conv-state layers store recurrent state (not KV pairs), so cache indices are non-contiguous. Schema-constrained sampling is handled by llguidance masking the logits at each step to enforce the `Grammar` from the `Task`.
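The masking step itself is small; an illustrative filter (not the crate's actual code):

```rust
// llguidance yields a per-step allow-mask over the vocabulary; disallowed
// logits are forced to -inf so sampling can only pick tokens that keep the
// output inside the Grammar.
fn apply_token_mask(logits: &mut [f32], allowed: &[bool]) {
    for (logit, &ok) in logits.iter_mut().zip(allowed) {
        if !ok {
            *logit = f32::NEG_INFINITY;
        }
    }
}
```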
**Multi-image note:** the vision encoder accepts one image per call, and batched multi-image calls produce silently wrong embeddings, so `Engine::generate`/`run` iterate per image and concatenate the flat `image_features` outputs in source order.
## MSRV

Rust 1.95.
## License

`lfm` is dual-licensed under the MIT license and the Apache License, Version 2.0.
The LFM2.5-VL model weights this crate runs are governed by the LFM Open License v1.0. Verify your use case complies with Liquid AI's terms separately from this crate's license.
Copyright (c) 2026 FinDIT Studio authors.