pub const DEFAULT_BATCH_SIZE: usize = 32;
Number of messages processed per batch of model inference and writes. e5 truncates each input at 512 tokens, so a 32-row batch pads to at most 32 × 512 token positions, which keeps the transient attention memory bounded.
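
The bound can be made concrete with a little arithmetic. The following is a minimal sketch, not this crate's implementation: `MAX_TOKENS`, `max_padded_positions`, `max_attention_entries`, and `batch_lengths` are hypothetical names introduced for illustration, and the attention figure assumes standard self-attention, where each head holds one score per pair of token positions in a row.

```rust
/// Mirrors the documented constant.
pub const DEFAULT_BATCH_SIZE: usize = 32;

/// Assumption taken from the doc text: e5 truncates each input at 512 tokens.
/// The name `MAX_TOKENS` is hypothetical.
pub const MAX_TOKENS: usize = 512;

/// Upper bound on token positions in one batch when every row is
/// padded to the truncation limit.
pub const fn max_padded_positions(batch_size: usize) -> usize {
    batch_size * MAX_TOKENS
}

/// Upper bound on attention-score entries per head, per layer,
/// assuming standard self-attention (batch x seq x seq).
pub const fn max_attention_entries(batch_size: usize) -> usize {
    batch_size * MAX_TOKENS * MAX_TOKENS
}

/// Hypothetical helper: sizes of the batches produced when `messages`
/// is split into chunks of `batch_size`. Panics if `batch_size` is 0,
/// because `slice::chunks` does.
pub fn batch_lengths<T>(messages: &[T], batch_size: usize) -> Vec<usize> {
    messages.chunks(batch_size).map(|batch| batch.len()).collect()
}
```

Under these assumptions a full batch pads to at most 16,384 token positions and 8,388,608 attention-score entries per head, per layer. Splitting 70 messages with the default size gives batches of 32, 32, and 6, so only the final batch is short.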