# burn_autogaze
[](https://github.com/mosure/burn_autogaze/actions?query=workflow%3Atest)
[](https://github.com/mosure/burn_autogaze/actions?query=workflow%3A%22deploy+github+pages%22)
[](https://crates.io/crates/burn_autogaze)
[](https://docs.rs/burn_autogaze)
Burn-native inference for the
[NVIDIA AutoGaze](https://huggingface.co/nvidia/AutoGaze) model, with
fixation traces, multi-scale token-cell masks, interframe reconstruction, and
native/wasm Bevy demos.
| <img src="./docs/autogaze_birds_input.gif" alt="birds input clip" title="https://www.vecteezy.com/free-videos Wildlife Stock Videos by Vecteezy"> | <img src="./docs/autogaze_birds_mask.gif" alt="scale-colored autogaze token-cell mask"> | <img src="./docs/autogaze_birds_output.gif" alt="autogaze interframe output stream"> |
## features
### high-level
- Loads Hugging Face `config.json` and `model.safetensors` weights.
- Runs on ndarray, WebGPU, CUDA, and wasm/WebGPU backends.
- Supports the realtime resize path used for low-latency streams.
- Supports AnyRes-style tiled inference for high-resolution inspection.
- Decodes NVIDIA multi-scale masks as `2x2`, `4x4`, `7x7`, and `14x14` token
cells, including tiled source-frame remapping.
- Provides interframe output accumulation, gaze/update ratio, and output PSNR
helpers.
- Exposes tensor-native pipeline nodes for camera, file, RGBA, Burn tensor, and
downstream sparse-token integrations.
- Ships a wasm-bindgen API and a symmetric native/wasm Bevy viewer.
### cargo features
| `ndarray` | yes | native | CPU reference backend |
| `webgpu` | yes | native/web | Burn WebGPU backend |
| `wgpu` | no | native/web | Burn WGPU backend without selecting the WebGPU compiler feature |
| `cuda` | no | native | CUDA backend |
| `wasm` | no | wasm32 | wasm-bindgen API over Burn WebGPU |
| `bevy-web-demo` | no | wasm32 | compatibility alias for demo builds |
## burn support
| burn_autogaze | burn | burn-store | status |
| `<0.21` | `<0.21` | `<0.21` | not supported in this repo |
## quick start
```rust,no_run
use burn::backend::{wgpu, WebGpu};
use burn_autogaze::{
AutoGazeClipShape, AutoGazeInferenceMode, AutoGazePipeline, AutoGazeRgbaClipShape,
};
let device = wgpu::WgpuDevice::default();
wgpu::init_setup::<wgpu::graphics::AutoGraphicsApi>(&device, Default::default());
let pipeline = AutoGazePipeline::<WebGpu>::from_hf_dir("/path/to/AutoGaze", &device)?;
let shape = AutoGazeClipShape::new(16, 3, 720, 1280);
let frames = vec![0.0; shape.num_values()];
let trace = pipeline.trace_clip_from_frames_with_mode(
&frames,
shape,
10,
AutoGazeInferenceMode::ResizeToModelInput,
)?;
let rgba = vec![0_u8; shape.clip_len * shape.height * shape.width * 4];
let rgba_trace = pipeline.trace_rgba_clip_with_mode(
&rgba,
AutoGazeRgbaClipShape::new(shape.clip_len, shape.height, shape.width),
10,
AutoGazeInferenceMode::ResizeToModelInput,
&device,
)?;
# Ok::<(), anyhow::Error>(())
```
`ResizeToModelInput` is the recommended realtime path. The model uses a fixed
per-frame token vocabulary (`265` tokens for the NVIDIA multi-scale config), so
high-resolution operation is handled by tiled chunking rather than by passing a
larger patch sequence into one decoder call. `TiledResizeToGrid` resizes frames
into complete `224px` tile grids before stitching each scale back into source
coordinates.
## bevy viewer
```sh
cargo run -p bevy_burn_autogaze
cargo run -p bevy_burn_autogaze -- --mode realtime
cargo run -p bevy_burn_autogaze -- --mode tiled --visualization-mode interframe
```
The no-arg native default starts with patch-diff sparse masks to avoid NVIDIA
model load/init overhead: realtime camera input, adaptive display transfer,
interframe output, PSNR overlay, a live quality slider, and no periodic
visualization keyframes. Camera frames keep updating with the latest accepted
mask while the next sparse-mask inference job is in flight. Pass
`--mask-source autogaze` to run the NVIDIA model, and
`--max-gaze-tokens-each-frame 0` for its full configured budget. The Bevy crate
selects native or wasm dependencies by target, so platform features are not
needed for normal runs.
See [crates/bevy_burn_autogaze/README.md](./crates/bevy_burn_autogaze/README.md)
for CLI/query parameters, static-source browser runs, performance summaries,
and Pages demo notes.
## wasm
```sh
cd web
npm run build:wasm
npm run serve
```
`WasmAutoGaze.create(configJson, safetensors)` performs async WebGPU setup,
loads model bytes, accepts RGBA clips, and returns mask/output buffers plus
ratio and PSNR metrics. See [web/README.md](./web/README.md) for the
wasm-bindgen API.
## docs
- [docs/api.md](./docs/api.md): core pipeline, visualization, readout, and
sparse-token adapters.
- [docs/README.md](./docs/README.md): checked-in birds assets and regeneration
command.
- [docs/benchmarking.md](./docs/benchmarking.md): benchmark lanes and useful
filters.
- [docs/performance-goal-loop.md](./docs/performance-goal-loop.md):
iterative performance, quality, metrics, and hot-path cleanup loop.
- [docs/validation.md](./docs/validation.md): release, wasm, browser, and
upstream fixture gates.
- [docs/sparse-readout-integration.md](./docs/sparse-readout-integration.md):
boundary between AutoGaze geometry and downstream sparse patchification.
## validation
```sh
cargo run -p xtask -- release-readiness
cargo test
cargo clippy --all-targets -- -D warnings
cargo check -p bevy_burn_autogaze --target wasm32-unknown-unknown
```
Hardware-specific CUDA/WebGPU tests and browser perf checks are documented in
[docs/validation.md](./docs/validation.md).