qwen3-vl 0.1.1

Qwen3-VL vision-language structured-output engine over mistralrs, implementing the engine-agnostic llmtask::Task contract

Qwen3-VL-2B structured-output inference engine — async, mistralrs-backed, JSON-Schema-constrained. Implements the engine-agnostic llmtask::Task contract so the same prompt + schema + parser runs on any llmtask-compatible backend (lfm, qwen3-vl, …) without translation.

Overview

qwen3-vl runs the Qwen3-VL-2B-Instruct vision-language model through mistralrs with JSON-Schema-constrained sampling. It implements the engine-agnostic llmtask::Task contract — so any Task written against llmtask runs through qwen3-vl unchanged, and the public API stays backend-pluggable.

  • Engine — async, mistralrs-backed Qwen3-VL inference. Engine::run<T: Task<Value = serde_json::Value>> accepts any JSON-schema task; the result is decoded by the task's parse impl.
  • ImageAnalysisTask — built-in image-analysis preset (single-image VLM scene description). Owns the prompt, the JSON schema, and the resilient parser ported from the legacy findit-qwen service. Produces the canonical llmtask::ImageAnalysis output type.
  • CPU by default, opt-in GPU: qwen3-vl does not re-export mistralrs's hardware-backend features. Consumers depend on mistralrs directly with the desired backend (metal / cuda / cudnn / …); Cargo unifies feature sets and qwen3-vl picks up the selection.

Why an llmtask-driven engine?

A bespoke qwen3_vl::Task would force every prompt + schema + parser to be rewritten against the next inference engine. Implementing llmtask::Task instead means the same Task code targets qwen3-vl (mistralrs), lfm (llguidance), or any future llmtask-compatible backend without modification — only the hardware backend selection differs.

                                ┌──────────────────────────┐
   YourTask: impl Task   ──▶    │   llmtask::Task contract │   ──▶  qwen3-vl / lfm / …
                                │     prompt + Grammar     │
                                │     parse → Output       │
                                └──────────────────────────┘
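
The diagram can be read as a small trait boundary. The sketch below is a self-contained illustration of that idea, not the real llmtask API: `SketchTask`, `SketchEngine`, `BackendA`, `BackendB`, and `CountTask` are hypothetical stand-ins, the grammar is reduced to a plain string, and the backends return canned text instead of running a model. It only shows why a task written against a shared contract can run on more than one backend unchanged.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for the task contract: prompt + grammar + parser.
/// The real `llmtask::Task` trait may have a different shape.
trait SketchTask {
    type Output: Debug;
    fn prompt(&self) -> String;
    /// Stand-in for a JSON Schema / grammar constraint.
    fn grammar(&self) -> String;
    fn parse(&self, raw: &str) -> Result<Self::Output, String>;
}

/// Hypothetical stand-in for a backend. It only sees the contract,
/// never the concrete task type's internals.
trait SketchEngine {
    fn generate(&self, prompt: &str, grammar: &str) -> String;

    fn run<T: SketchTask>(&self, task: &T) -> Result<T::Output, String> {
        let raw = self.generate(&task.prompt(), &task.grammar());
        task.parse(&raw)
    }
}

// Two fake backends that return canned text instead of running a model.
struct BackendA;
struct BackendB;

impl SketchEngine for BackendA {
    fn generate(&self, _prompt: &str, _grammar: &str) -> String {
        "3".to_string()
    }
}

impl SketchEngine for BackendB {
    fn generate(&self, _prompt: &str, _grammar: &str) -> String {
        " 7\n".to_string()
    }
}

/// One task definition, reused on both backends without modification.
struct CountTask;

impl SketchTask for CountTask {
    type Output = u32;

    fn prompt(&self) -> String {
        "How many objects are visible? Answer with one integer.".to_string()
    }

    fn grammar(&self) -> String {
        r#"{"type":"integer","minimum":0}"#.to_string()
    }

    fn parse(&self, raw: &str) -> Result<u32, String> {
        raw.trim()
            .parse::<u32>()
            .map_err(|e| format!("bad integer {raw:?}: {e}"))
    }
}

fn main() {
    let task = CountTask;
    println!("backend A: {:?}", BackendA.run(&task));
    println!("backend B: {:?}", BackendB.run(&task));
}
```

The task owns the prompt, the constraint, and the parser, so swapping the backend changes only which `generate` implementation is called.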

Features

  • Async, single-engine inference: Engine::run(&task, images).await. No built-in cancellation token; to cancel early, wrap the call with tokio::time::timeout or tokio::select!.
  • Bounded inference timeout — every Engine::run is wrapped in tokio::time::timeout(EngineOptions::inference_timeout) (default 300 s). A stuck model (Metal JIT stall, GPU memory exhaustion) surfaces as Error::InferenceTimeout instead of blocking the caller indefinitely.
  • finish_reason discipline — mistralrs's Choice::finish_reason != "stop" (e.g. "length", "model_length") is surfaced as Error::Truncated BEFORE the parser runs, so partial JSON can never silently land in a downstream search index.
  • Sampler-options validation: RequestOptions::validate rejects out-of-range values (negative temperature, top_p > 1.0, top_k = 0) at the engine boundary instead of hitting undefined behavior inside mistralrs's sampler.
  • Resilient JSON parser (ImageAnalysisTask): TagList / DetectionLabels accept list-or-string forms; #[serde(deny_unknown_fields)] on the schema struct; required arrays set to null are rejected (not coerced to empty); an indexable-content gate surfaces decoder/model regressions as JsonParseError::NoUsableFields by default.
  • Indexing-safe greedy default: EngineOptions::new embeds RequestOptions::deterministic() (greedy, temperature = 0.0) so retries / timeouts / backfills produce bit-stable ImageAnalysis across runs. Swap to the model-card stochastic sampler with .with_request(RequestOptions::new()) or Engine::run_with.
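
Two of the boundary checks listed above, sampler validation and the finish_reason gate, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: `SketchRequestOptions`, `SketchError`, and `check_finish_reason` are hypothetical names, the field types are guesses, and the lower bound on `top_p` is an assumption (the text only names `top_p > 1.0`).

```rust
/// Hypothetical mirror of the sampler options named above; the real
/// `RequestOptions` fields and error type may differ.
#[derive(Debug, Clone, PartialEq)]
struct SketchRequestOptions {
    temperature: f64,
    top_p: f64,
    top_k: usize,
}

#[derive(Debug, PartialEq)]
enum SketchError {
    /// Names the offending option.
    InvalidOption(&'static str),
    /// Carries the finish reason reported by the backend.
    Truncated(String),
}

impl SketchRequestOptions {
    /// Reject out-of-range values before they reach the sampler.
    fn validate(&self) -> Result<(), SketchError> {
        if self.temperature.is_nan() || self.temperature < 0.0 {
            return Err(SketchError::InvalidOption("temperature"));
        }
        // Assumption: top_p must lie in [0, 1]. NaN fails `contains` too.
        if !(0.0..=1.0).contains(&self.top_p) {
            return Err(SketchError::InvalidOption("top_p"));
        }
        if self.top_k == 0 {
            return Err(SketchError::InvalidOption("top_k"));
        }
        Ok(())
    }
}

/// Gate that runs before any parsing: only "stop" counts as a complete answer.
fn check_finish_reason(finish_reason: &str) -> Result<(), SketchError> {
    if finish_reason == "stop" {
        Ok(())
    } else {
        Err(SketchError::Truncated(finish_reason.to_string()))
    }
}

fn main() {
    let greedy = SketchRequestOptions { temperature: 0.0, top_p: 1.0, top_k: 1 };
    println!("{:?}", greedy.validate());

    let bad = SketchRequestOptions { top_k: 0, ..greedy.clone() };
    println!("{:?}", bad.validate());

    println!("{:?}", check_finish_reason("length"));
}
```

Running the finish-reason check before the parser means a truncated response is reported as truncation rather than as a confusing JSON error, or worse, as a partially filled result.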

Example

use qwen3_vl::{Engine, EngineOptions, image_analysis::ImageAnalysisTask};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Engine-level default sampler is greedy (deterministic) so retries
    // and backfills produce bit-stable ImageAnalysis values. Swap to the
    // Qwen3-VL model card stochastic profile via
    // `.with_request(RequestOptions::new())`, or override per-call with
    // `Engine::run_with`.
    let engine = Engine::load(EngineOptions::new("/path/to/qwen3-vl-2b")).await?;
    let task = ImageAnalysisTask::new();

    let images = vec![
        image::open("scene_keyframe_1.jpg")?,
        image::open("scene_keyframe_2.jpg")?,
    ];

    let result = engine.run(&task, images).await?;
    println!("scene: {:?}, tags: {:?}", result.scene(), result.tags());
    Ok(())
}

Engine::run consumes Vec<DynamicImage> because mistralrs 0.8's MultimodalMessages::add_image_message takes the vec by value — borrowing would force a silent .to_vec() clone of decoded image data.
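
The ownership trade-off can be illustrated without mistralrs. In the sketch below, `SketchImage` and `run_by_value` are hypothetical stand-ins for `DynamicImage` and `Engine::run`; the point is only that a by-value parameter makes any copy an explicit decision at the call site.

```rust
/// Hypothetical stand-in for a decoded image.
#[derive(Clone, Debug, PartialEq)]
struct SketchImage {
    pixels: Vec<u8>,
}

/// Takes the images by value, like the `Engine::run` signature described above.
/// A real engine would hand `images` to the model; this one only totals
/// the bytes it consumed.
fn run_by_value(images: Vec<SketchImage>) -> usize {
    images.into_iter().map(|img| img.pixels.len()).sum()
}

fn main() {
    let images = vec![
        SketchImage { pixels: vec![0; 4] },
        SketchImage { pixels: vec![0; 6] },
    ];

    // The images are needed again afterwards, so the clone is written out
    // and its cost is visible to the caller.
    let first = run_by_value(images.clone());

    // Last use: the vec is moved into the call with no copy.
    let second = run_by_value(images);

    println!("bytes consumed: {first} and {second}");
}
```

A caller that only runs each batch once never pays for a copy, while a caller that needs the images afterwards opts into the clone knowingly.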

Per-call sampler override

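use qwen3_vl::{Engine, RequestOptions, image_analysis::ImageAnalysisTask};

async fn describe(
    engine: Engine,
    task: ImageAnalysisTask,
    images: Vec<image::DynamicImage>,
) -> Result<(), Box<dyn std::error::Error>> {
    // Start from the stochastic profile and adjust it for this call only;
    // the engine-level default stays untouched.
    let opts = RequestOptions::new()
        .with_temperature(0.3)
        .with_top_k(50);
    let result = engine.run_with(&task, images, &opts).await?;
    println!("scene: {:?}, tags: {:?}", result.scene(), result.tags());
    Ok(())
}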

Installation

[dependencies]
qwen3-vl = "0.1"

use qwen3_vl::{Engine, EngineOptions};

Hardware backend selection

Default features are CPU-only — qwen3-vl builds out of the box on every host mistralrs supports. To enable a hardware backend (Metal, CUDA, etc.), depend on mistralrs directly and select its feature; Cargo unifies feature sets across all references to the same crate, so qwen3-vl automatically picks up your selection:

[dependencies]
qwen3-vl  = "0.1"
# Pick at most one primary GPU backend; accelerated BLAS / cuDNN /
# NCCL / flash-attn options layer on top.
mistralrs = { version = "0.8", features = ["metal"] }    # Apple Metal
# mistralrs = { version = "0.8", features = ["cuda"] }   # NVIDIA CUDA
# mistralrs = { version = "0.8", features = ["accelerate"] }  # Apple Accelerate BLAS (CPU)

The full backend matrix mistralrs supports: metal, cuda, cudnn, flash-attn, accelerate, mkl, nccl, ring. Each may require an external toolchain (Xcode Command Line Tools for metal / accelerate, the CUDA toolkit for cuda, etc.) — see the mistralrs README for prerequisites.

Cargo features

Feature       Default  What it adds
integration   no       Enables tests/integration_scene.rs (needs QWEN_MODEL_PATH and ~4 GB of weights)
trace-output  no       Logs raw model output at tracing::trace level; heavyweight, debugging only

MSRV

Rust 1.95.

License

qwen3-vl is dual-licensed under the MIT license and the Apache License, Version 2.0.

The Qwen3-VL model weights this crate runs are governed by their own license — see the model card for terms.

Copyright (c) 2026 FinDIT Studio authors.