lfm 0.1.1

Rust ONNX inference for LiquidAI LFM2.5-VL — a 450M-parameter vision-language model with schema-constrained sampling via llguidance. Implements the engine-agnostic llmtask::Task contract, so any Task written against llmtask runs through lfm unchanged.

Overview

lfm is an inference engine for LiquidAI LFM2.5-VL built on Rust + ONNX Runtime + llguidance:

  • Engine — sync, single-threaded; built on ort 2.0. Engine::run<T: Task<Value = serde_json::Value>> accepts any llmtask::Task whose grammar is JSON Schema, Lark, or Regex. Schema-constrained sampling is enforced by llguidance token-mask filtering. Engine::generate is the unconstrained path for free-form text.
  • ImageAnalysisTask — built-in image-analysis preset that produces the canonical llmtask::ImageAnalysis output type, sharing the schema and parser with qwen.
  • Bundled assets — the bundled feature ships LFM2.5-VL's tokenizer, chat template, and preprocessor configs as include_bytes!. Engine::from_onnx_dir then accepts a directory containing only the three ONNX graphs; no separate tokenizer download required.
  • Wasm-friendly preprocessing — preproc::Preprocessor, TileGrid, and EXIF-aware decode helpers compile under --no-default-features --features decoders (no ort, no tokenizers).

Why an llmtask-driven engine?

A bespoke lfm::Task would force every prompt + schema + parser to be rewritten against the next inference engine. Implementing llmtask::Task instead means the same Task code targets lfm (llguidance), qwen (mistralrs), or any future llmtask-compatible backend without modification — only the hardware backend selection differs.

                                ┌──────────────────────────┐
   YourTask: impl Task   ──▶    │   llmtask::Task contract │   ──▶  lfm / qwen / …
                                │     prompt + Grammar     │
                                │     parse → Output       │
                                └──────────────────────────┘

Because lfm's backend is llguidance, all three llmtask::Grammar variants (JSON Schema, Lark, Regex) are accepted — engines that only speak JSON Schema (e.g. qwen) reject the others via UnsupportedGrammar, and the caller can route to lfm.
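
To make the contract concrete, here is a minimal sketch of the kind of prompt + schema + parse bundle a Task carries. It is illustrative only: the struct and method names below are hypothetical stand-ins, not the literal llmtask::Task trait, whose exact signatures live in the llmtask docs.

// Hypothetical example type and methods; not part of lfm or llmtask.
use serde_json::{json, Value};

struct PlateReadTask;

impl PlateReadTask {
    // The instruction the engine renders into LFM2.5-VL's chat template.
    fn prompt(&self) -> &str {
        "Read the license plate in the image and answer as JSON."
    }

    // A JSON Schema grammar; through lfm/llguidance, a Lark or Regex grammar
    // would be equally acceptable.
    fn grammar(&self) -> Value {
        json!({
            "type": "object",
            "properties": { "plate": { "type": "string" } },
            "required": ["plate"]
        })
    }

    // Turn the schema-constrained completion back into a typed value.
    fn parse(&self, raw: &str) -> serde_json::Result<Value> {
        serde_json::from_str(raw)
    }
}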

Features

  • All three Grammar variants — JSON Schema, Lark, and Regex are all native to llguidance, so any llmtask::Task runs through Engine::run. The HIR-anchored regex validator on the Grammar side matches engine semantics exactly (no substring vs. full-match drift).
  • Bundled tokenizer + configs (bundled feature, default) — Engine::from_onnx_dir accepts an ONNX-only directory; tokenizer / chat template / preprocessor configs are embedded in the binary at compile time. Engine::from_dir is the strict constructor that byte-validates a supplied tokenizer + chat template against the bundled blobs to catch silent prompt-envelope drift.
  • Hybrid KV/conv-state cache decoder — LFM2 architecture has 10 conv-state layers and 6 attention layers at sparse indices. decoder.rs manages the non-contiguous cache layout transparently.
  • Wasm-friendly preprocessing — drop the inference and bundled defaults to get a pure-CPU image-preprocessing surface (Preprocessor, TileGrid, EXIF-aware decode) usable from wasm32-unknown-unknown.
  • GPU acceleration — cuda, tensorrt, directml, rocm, coreml ORT execution providers gated behind feature flags. None are required for CPU inference.
  • Admission-control DoS guards — bounded request shape (max messages, max content parts), text-size cap, image-count lower bound from min_image_tokens, header-time decoded-buffer cap, and a special-token denylist seeded from the live tokenizer's added_vocabulary. All run BEFORE any image decode or template render.

Example

From a HuggingFace download (tokenizer.json + configs in dir)

use lfm::{
    ChatContent, ChatMessage, ContentPart, Engine, ImageInput, Options,
    RequestOptions,
};
use smol_str::SmolStr;

fn main() -> lfm::Result<()> {
    let model_dir = std::env::var("LFM_MODEL_PATH")
        .expect("set LFM_MODEL_PATH=/path/to/LFM2.5-VL-450M-ONNX");

    let mut engine = Engine::from_dir(&model_dir, Options::default())?;

    let messages = vec![ChatMessage {
        role: SmolStr::new_static("user"),
        content: ChatContent::Parts(vec![
            ContentPart::Image,
            ContentPart::Text("Describe this image.".into()),
        ]),
    }];
    let images = vec![ImageInput::Path(std::path::Path::new("photo.jpg"))];

    let text = engine.generate(&messages, &images, &RequestOptions::default())?;
    println!("{text}");
    Ok(())
}

ONNX-only dir + bundled tokenizer

If you've downloaded just the ONNX files (not tokenizer.json and the JSON configs), use Engine::from_onnx_dir. The tokenizer + configs are embedded in the binary and written to a temp file on first use.

use lfm::{Engine, Options, RequestOptions};

fn main() -> lfm::Result<()> {
    let onnx_dir = std::env::var("LFM_ONNX_PATH")
        .expect("set LFM_ONNX_PATH=/path/with/onnx-files-only");
    let mut engine = Engine::from_onnx_dir(onnx_dir, Options::default())?;
    // … same usage as Engine::from_dir
    let _ = engine; let _ = RequestOptions::default();
    Ok(())
}

Structured output via the ImageAnalysisTask preset

use lfm::{
    ChatContent, ChatMessage, ContentPart, Engine, ImageAnalysisTask, ImageInput,
    Options, RequestOptions, Task,
};
use smol_str::SmolStr;

fn main() -> lfm::Result<()> {
    let model_dir = std::env::var("LFM_MODEL_PATH").unwrap();
    let mut engine = Engine::from_dir(&model_dir, Options::default())?;
    let task = ImageAnalysisTask::default();

    let messages = vec![ChatMessage {
        role: SmolStr::new_static("user"),
        content: ChatContent::Parts(vec![
            ContentPart::Image,
            ContentPart::Text(task.prompt().to_owned()),
        ]),
    }];
    let images = vec![ImageInput::Path(std::path::Path::new("frame.jpg"))];

    let analysis = engine.run(&task, &messages, &images, &RequestOptions::default())?;
    println!("{analysis:#?}");
    Ok(())
}

Installation

[dependencies]
lfm = "0.1"

Download the ONNX artifacts from LiquidAI/LFM2.5-VL-450M-ONNX and set LFM_MODEL_PATH to the directory containing them:

vision_encoder.onnx
embed_tokens.onnx
decoder_model_merged.onnx
tokenizer.json   (optional — bundled if absent and `bundled` feature is on)
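
One way to fetch the artifacts, assuming the Hugging Face hub CLI (huggingface-cli) is installed; any download method that yields the same files works, and LFM_MODEL_PATH should point at whichever directory ends up holding the .onnx graphs:

huggingface-cli download LiquidAI/LFM2.5-VL-450M-ONNX --local-dir LFM2.5-VL-450M-ONNX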

Cargo features

Defaults: ["inference", "bundled", "decoders"].

Feature      Default  What it adds
inference    yes      Pulls ort, tokenizers, llguidance, minijinja. Activates Engine. Native targets only.
bundled      yes      Embeds tokenizer.json + JSON configs (~4.5 MB) at compile time; adds Engine::from_onnx_dir. Implies inference.
decoders     yes      Activates JPEG/PNG decoding via the image crate.
serde        no       Serialize/Deserialize on Options, RequestOptions, ThreadOptions, ImageBudget.
cuda         no       NVIDIA GPUs (Linux / Windows). Requires CUDA toolkit + cuDNN. Implies inference.
tensorrt     no       NVIDIA, optimized inference. Falls back to CUDA, then CPU. Implies inference.
directml     no       Windows GPUs (any vendor) via DirectX 12. Implies inference.
rocm         no       AMD GPUs (Linux). Requires ROCm SDK. Implies inference.
coreml       no       macOS / iOS via Core ML (Neural Engine + GPU + Metal). Implies inference.
integration  no       Enables the integration test (tests/integration.rs). Requires LFM_MODEL_PATH.

GPU execution-provider features are off by default — none are required for CPU inference, and each requires its vendor SDK at build time.
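
For example, to opt into the CUDA execution provider (the other GPU features follow the same pattern and can be combined with the defaults):

[dependencies]
lfm = { version = "0.1", features = ["cuda"] }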

Wasm / preprocessing-only build

cargo build --target wasm32-unknown-unknown --no-default-features --features decoders

The public surface under --no-default-features --features decoders is preproc::Preprocessor, preproc::TileGrid, preproc::PreprocessedImage, preproc::decode_bytes_with_orientation, options::*, and error::*.

Architecture

Per-image vision encoding → text+image embedding splice → hybrid KV/conv cache decoder loop → optional schema-constrained sampling.

Graph                      Role                                                             Size
vision_encoder.onnx        SigLIP2 image encoder — single image per call                    ~86M params
embed_tokens.onnx          Token embedding lookup table
decoder_model_merged.onnx  LFM2 hybrid LM: 10 conv-state + 6 KV-attn layers (sparse cache)  ~350M params

The decoder manages a sparse hybrid cache: conv-state layers store recurrent state (not KV pairs), so cache indices are non-contiguous. Schema-constrained sampling is handled by llguidance masking the logits at each step to enforce the Grammar from the Task.
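
As an illustration of what that per-step masking amounts to (a generic sketch, not lfm's actual decoder code), each decode step intersects the model's logits with the set of tokens the grammar currently allows:

// Generic sketch of token-mask constrained sampling; not lfm's internals.
// `allowed` stands in for the per-step mask a grammar engine such as
// llguidance produces; banned tokens get -inf so they can never be emitted.
fn constrained_step(logits: &mut [f32], allowed: &[bool]) -> usize {
    for (logit, &ok) in logits.iter_mut().zip(allowed) {
        if !ok {
            *logit = f32::NEG_INFINITY;
        }
    }
    // Greedy argmax over the surviving tokens; a real engine may instead
    // sample with temperature / top-p over the masked distribution.
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
        .unwrap_or(0)
}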

Multi-image note: the vision encoder accepts one image per call. Batching multiple images into a single encoder call produces silently wrong embeddings, so Engine::generate/run iterate per image and concatenate the flat image_features outputs in source order.

MSRV

Rust 1.95.

License

lfm is dual-licensed under the MIT license and the Apache License, Version 2.0.

The LFM2.5-VL model weights this crate runs are governed by the LFM Open License v1.0. Verify your use case complies with Liquid AI's terms separately from this crate's license.

Copyright (c) 2026 FinDIT Studio authors.