pub fn matmul_vec(
    output: &mut [f32],
    input: &[f32],
    weight: &[f32],
    k: usize,
    n: usize,
)
Optimized matrix-vector multiply using NEON dot products.
For single-token inference (m=1), computes dot products between the input vector and each weight row.
Weight layout: [n, k] (row-major), so weight row j starts at offset j*k and spans k elements.
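
For illustration, the computation described above can be sketched as a portable scalar loop. This is not the NEON implementation: `matmul_vec_reference` is a hypothetical name, and the length checks encode assumed shapes (`input` holds `k` elements, `output` holds `n`, `weight` holds `n * k`) that the documentation implies but does not state outright.

```rust
/// Hypothetical scalar reference for the row-major [n, k] layout.
/// Not the crate's NEON code path; intended only to show the indexing.
fn matmul_vec_reference(
    output: &mut [f32],
    input: &[f32],
    weight: &[f32],
    k: usize,
    n: usize,
) {
    // Assumed shapes (not confirmed by the original docs).
    assert!(input.len() >= k, "input must hold at least k elements");
    assert!(output.len() >= n, "output must hold at least n elements");
    assert!(weight.len() >= n * k, "weight must hold at least n * k elements");

    for j in 0..n {
        // Weight row j occupies weight[j*k .. (j+1)*k].
        let row = &weight[j * k..(j + 1) * k];
        // output[j] is the dot product of the input vector with row j.
        output[j] = row.iter().zip(&input[..k]).map(|(w, x)| w * x).sum();
    }
}
```

A scalar version like this can serve as a correctness baseline when comparing against a vectorized path, since both should agree up to floating-point rounding differences caused by summation order.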