pub const DEFAULT_BATCH_SIZE: usize = 8;Expand description
Default training batch size.
§⚠ Memory warning
The ViT-B sensor encoder produces N = 2 448 patch tokens per sample.
Attention score tensors scale as B × H × chunk × N, so even with
attn_chunk_size = 64 a batch of 8 samples at fp32 consumes:
8 × 12 × 64 × 2448 × 4 bytes ≈ 60 MB per chunk (forward only)The Burn autodiff tape holds ALL chunk intermediates simultaneously
during the backward pass — multiply by ceil(N / chunk) chunks and
by depth layers. Keep batch_size ≤ 8 for ViT-B on a 16 GB GPU.
Use --cpu with a smaller model config for quick experiments.