burn_autogaze 0.21.6

AutoGaze inference, fixation traces, and crisp mask visualization for Burn
Documentation

burn_autogaze

test deploy github pages crates.io docs.rs

Burn-native inference for the NVIDIA AutoGaze model, with fixation traces, multi-scale token-cell masks, interframe reconstruction, and native/wasm Bevy demos.

input mask output

features

high-level

  • Loads Hugging Face config.json and model.safetensors weights.
  • Runs on ndarray, WebGPU, CUDA, and wasm/WebGPU backends.
  • Supports the realtime resize path used for low-latency streams.
  • Supports AnyRes-style tiled inference for high-resolution inspection.
  • Decodes NVIDIA multi-scale masks as 2x2, 4x4, 7x7, and 14x14 token cells, including tiled source-frame remapping.
  • Provides interframe output accumulation, gaze/update ratio, and output PSNR helpers.
  • Exposes tensor-native pipeline nodes for camera, file, RGBA, Burn tensor, and downstream sparse-token integrations.
  • Ships a wasm-bindgen API and a symmetric native/wasm Bevy viewer.

cargo features

feature default target notes
ndarray yes native CPU reference backend
webgpu yes native/web Burn WebGPU backend
wgpu no native/web Burn WGPU backend without selecting the WebGPU compiler feature
cuda no native CUDA backend
wasm no wasm32 wasm-bindgen API over Burn WebGPU
bevy-web-demo no wasm32 compatibility alias for demo builds

burn support

burn_autogaze burn burn-store status
0.21.x 0.21.x 0.21.x current
<0.21 <0.21 <0.21 not supported in this repo

quick start

use burn::backend::{wgpu, WebGpu};
use burn_autogaze::{
    AutoGazeClipShape, AutoGazeInferenceMode, AutoGazePipeline, AutoGazeRgbaClipShape,
};

let device = wgpu::WgpuDevice::default();
wgpu::init_setup::<wgpu::graphics::AutoGraphicsApi>(&device, Default::default());

let pipeline = AutoGazePipeline::<WebGpu>::from_hf_dir("/path/to/AutoGaze", &device)?;

let shape = AutoGazeClipShape::new(16, 3, 720, 1280);
let frames = vec![0.0; shape.num_values()];
let trace = pipeline.trace_clip_from_frames_with_mode(
    &frames,
    shape,
    10,
    AutoGazeInferenceMode::ResizeToModelInput,
)?;

let rgba = vec![0_u8; shape.clip_len * shape.height * shape.width * 4];
let rgba_trace = pipeline.trace_rgba_clip_with_mode(
    &rgba,
    AutoGazeRgbaClipShape::new(shape.clip_len, shape.height, shape.width),
    10,
    AutoGazeInferenceMode::ResizeToModelInput,
    &device,
)?;

# Ok::<(), anyhow::Error>(())

ResizeToModelInput is the recommended realtime path. The model uses a fixed per-frame token vocabulary (265 tokens for the NVIDIA multi-scale config), so high-resolution operation is handled by tiled chunking rather than by passing a larger patch sequence into one decoder call. TiledResizeToGrid resizes frames into complete 224px tile grids before stitching each scale back into source coordinates.

bevy viewer

cargo run -p bevy_burn_autogaze
cargo run -p bevy_burn_autogaze -- --mode realtime
cargo run -p bevy_burn_autogaze -- --mode tiled --visualization-mode interframe

The no-arg native default starts with patch-diff sparse masks to avoid NVIDIA model load/init overhead: realtime camera input, adaptive display transfer, interframe output, PSNR overlay, a live quality slider, and no periodic visualization keyframes. Camera frames keep updating with the latest accepted mask while the next sparse-mask inference job is in flight. Pass --mask-source autogaze to run the NVIDIA model, and --max-gaze-tokens-each-frame 0 for its full configured budget. The Bevy crate selects native or wasm dependencies by target, so platform features are not needed for normal runs.

See crates/bevy_burn_autogaze/README.md for CLI/query parameters, static-source browser runs, performance summaries, and Pages demo notes.

wasm

cd web
npm run build:wasm
npm run serve

WasmAutoGaze.create(configJson, safetensors) performs async WebGPU setup, loads model bytes, accepts RGBA clips, and returns mask/output buffers plus ratio and PSNR metrics. See web/README.md for the wasm-bindgen API.

docs

validation

cargo run -p xtask -- release-readiness
cargo test
cargo clippy --all-targets -- -D warnings
cargo check -p bevy_burn_autogaze --target wasm32-unknown-unknown

Hardware-specific CUDA/WebGPU tests and browser perf checks are documented in docs/validation.md.