# burn_autogaze
Burn-native inference for the NVIDIA AutoGaze model, with fixation traces, multi-scale token-cell masks, interframe reconstruction, and native/wasm Bevy demos.
| input | mask | output |
|---|---|---|
## features

### high-level

- Loads Hugging Face `config.json` and `model.safetensors` weights.
- Runs on ndarray, WebGPU, CUDA, and wasm/WebGPU backends.
- Supports the realtime resize path used for low-latency streams.
- Supports AnyRes-style tiled inference for high-resolution inspection.
- Decodes NVIDIA multi-scale masks as `2x2`, `4x4`, `7x7`, and `14x14` token cells, including tiled source-frame remapping.
- Provides interframe output accumulation, gaze/update ratio, and output PSNR helpers.
- Exposes tensor-native pipeline nodes for camera, file, RGBA, Burn tensor, and downstream sparse-token integrations.
- Ships a wasm-bindgen API and a symmetric native/wasm Bevy viewer.
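The interframe accumulation, gaze/update ratio, and PSNR helpers mentioned above can be sketched in a few lines; `accumulate` and `psnr` are illustrative names for this example, not the crate's API:

```rust
// Minimal sketch of interframe accumulation: only gazed token cells are
// refreshed, the return value is the gaze/update ratio, and PSNR compares
// the accumulated canvas against a reference frame.
fn accumulate(canvas: &mut [f32], update: &[f32], mask: &[bool]) -> f32 {
    let mut updated = 0usize;
    for i in 0..canvas.len() {
        if mask[i] {
            canvas[i] = update[i]; // refresh only the masked cells
            updated += 1;
        }
    }
    updated as f32 / canvas.len() as f32 // gaze/update ratio
}

fn psnr(a: &[f32], b: &[f32], peak: f32) -> f32 {
    let mse: f32 =
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>() / a.len() as f32;
    10.0 * (peak * peak / mse).log10()
}

fn main() {
    let mut canvas = vec![0.0f32; 4];
    let update = [1.0f32; 4];
    let mask = [true, false, true, false];
    let ratio = accumulate(&mut canvas, &update, &mask);
    assert_eq!(ratio, 0.5);
    // Canvas vs. the full update: MSE = 0.5, so PSNR ≈ 3.01 dB.
    assert!((psnr(&canvas, &update, 1.0) - 3.0103).abs() < 1e-3);
    println!("ratio = {ratio}");
}
```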
### cargo features

| feature | default | target | notes |
|---|---|---|---|
| `ndarray` | yes | native | CPU reference backend |
| `webgpu` | yes | native/web | Burn WebGPU backend |
| `wgpu` | no | native/web | Burn WGPU backend without selecting the WebGPU compiler feature |
| `cuda` | no | native | CUDA backend |
| `wasm` | no | wasm32 | wasm-bindgen API over Burn WebGPU |
| `bevy-web-demo` | no | wasm32 | compatibility alias for demo builds |
## burn support

| burn_autogaze | burn | burn-store | status |
|---|---|---|---|
| 0.21.x | 0.21.x | 0.21.x | current |
| <0.21 | <0.21 | <0.21 | not supported in this repo |
## quick start
```rust
// Quick-start sketch: import paths, type names, and argument lists are
// illustrative assumptions — see docs/api.md for the exact API. The method
// and mode names match the pipeline API described in this README.
use burn::backend::Wgpu;
use burn_autogaze::{AutoGazePipeline, FrameShape, ResizeToModelInput};

let device = Default::default();

let pipeline = AutoGazePipeline::<Wgpu>::from_hf_dir("path/to/hf-export", &device)?;
let shape = FrameShape::new(224, 224); // hypothetical frame-shape type
let frames = vec![vec![0u8; 224 * 224 * 3]];
let trace = pipeline.trace_clip_from_frames_with_mode(frames, shape, ResizeToModelInput)?;

let rgba = vec![vec![0u8; 224 * 224 * 4]];
let rgba_trace = pipeline.trace_rgba_clip_with_mode(rgba, shape, ResizeToModelInput)?;
# Ok::<(), Box<dyn std::error::Error>>(())
```
`ResizeToModelInput` is the recommended realtime path. The model uses a fixed
per-frame token vocabulary (265 tokens for the NVIDIA multi-scale config), so
high-resolution operation is handled by tiled chunking rather than by passing a
larger patch sequence into one decoder call. `TiledResizeToGrid` resizes frames
into complete 224px tile grids before stitching each scale back into source
coordinates.
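Both numbers in this paragraph can be checked directly: the token vocabulary is the sum of squared cell counts over the four mask scales, and the tiled path covers a frame with whole 224px tiles. The helper names below are illustrative, and ceiling division is an assumption about how a "complete" grid is chosen:

```rust
// Fixed per-frame token vocabulary: one token per cell at each mask scale.
fn token_vocab(scales: &[usize]) -> usize {
    scales.iter().map(|s| s * s).sum()
}

// Complete 224px tile grid covering a frame (ceiling division per axis).
fn tile_grid(width: usize, height: usize, tile: usize) -> (usize, usize) {
    (width.div_ceil(tile), height.div_ceil(tile))
}

fn main() {
    // NVIDIA multi-scale config: 2x2 + 4x4 + 7x7 + 14x14 cells = 265 tokens.
    assert_eq!(token_vocab(&[2, 4, 7, 14]), 265);

    // A 1920x1080 frame maps to a 9x5 grid of 224px tiles, each tile
    // decoded with its own fixed 265-token budget.
    assert_eq!(tile_grid(1920, 1080, 224), (9, 5));
    println!("tokens per frame: {}", token_vocab(&[2, 4, 7, 14]));
}
```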
## bevy viewer

The no-arg native default starts with patch-diff sparse masks to avoid NVIDIA
model load/init overhead: realtime camera input, adaptive display transfer,
interframe output, PSNR overlay, a live quality slider, and no periodic
visualization keyframes. Camera frames keep updating with the latest accepted
mask while the next sparse-mask inference job is in flight. Pass
`--mask-source autogaze` to run the NVIDIA model, and
`--max-gaze-tokens-each-frame 0` for its full configured budget. The Bevy crate
selects native or wasm dependencies by target, so platform features are not
needed for normal runs.
See `crates/bevy_burn_autogaze/README.md` for CLI/query parameters, static-source browser runs, performance summaries, and Pages demo notes.
## wasm

`WasmAutoGaze.create(configJson, safetensors)` performs async WebGPU setup,
loads model bytes, accepts RGBA clips, and returns mask/output buffers plus
ratio and PSNR metrics. See `web/README.md` for the wasm-bindgen API.
## docs

- `docs/api.md`: core pipeline, visualization, readout, and sparse-token adapters.
- `docs/README.md`: checked-in birds assets and regeneration command.
- `docs/benchmarking.md`: benchmark lanes and useful filters.
- `docs/performance-goal-loop.md`: iterative performance, quality, metrics, and hot-path cleanup loop.
- `docs/validation.md`: release, wasm, browser, and upstream fixture gates.
- `docs/sparse-readout-integration.md`: boundary between AutoGaze geometry and downstream sparse patchification.
## validation

Hardware-specific CUDA/WebGPU tests and browser perf checks are documented in
`docs/validation.md`.