pub fn parallel_sgemv(
matrix: &Array2<f32>,
vector: &ArrayView1<'_, f32>,
) -> Array1<f32>
Parallel matrix-vector multiply via row-sharded BLAS sgemv.
See the call site in search_semantic for the rationale; in short,
Accelerate’s level-2 BLAS is single-threaded on macOS, so we shard
the matrix into row-chunks and call sgemv per worker to saturate
aggregate memory bandwidth.
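
The row-sharding idea described above can be sketched with only the standard library. This is a minimal illustration, not the crate's implementation: `sharded_matvec` and its `workers` parameter are hypothetical names, the matrix is assumed to be a flat row-major `&[f32]` rather than an `Array2<f32>`, and a plain dot-product loop stands in for the per-chunk BLAS sgemv call.

```rust
use std::thread;

/// Hypothetical sketch: multiply a row-major `rows x cols` matrix by `vector`,
/// giving each worker thread one contiguous block of rows.
fn sharded_matvec(
    matrix: &[f32],
    rows: usize,
    cols: usize,
    vector: &[f32],
    workers: usize,
) -> Vec<f32> {
    assert_eq!(matrix.len(), rows * cols);
    assert_eq!(vector.len(), cols);

    let mut out = vec![0.0f32; rows];
    if rows == 0 || cols == 0 {
        return out;
    }

    // Ceiling division so every row lands in exactly one chunk.
    let workers = workers.max(1);
    let rows_per_chunk = (rows + workers - 1) / workers;

    thread::scope(|s| {
        // Pair each block of matrix rows with the matching block of output rows.
        let row_blocks = matrix.chunks(rows_per_chunk * cols);
        let out_blocks = out.chunks_mut(rows_per_chunk);
        for (m_chunk, o_chunk) in row_blocks.zip(out_blocks) {
            s.spawn(move || {
                // Stand-in for the per-chunk sgemv call (assumption: the real
                // function hands this block to BLAS instead).
                for (row, o) in m_chunk.chunks_exact(cols).zip(o_chunk.iter_mut()) {
                    *o = row.iter().zip(vector).map(|(a, b)| a * b).sum();
                }
            });
        }
    });

    out
}
```

Each worker writes to a disjoint slice of the output, so no locking is needed; the scoped threads are joined before `out` is returned.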
Panics
Panics if ndarray returns a non-contiguous slice from
Array2::slice(s![start..end, ..]). Row slices of a row-major
matrix are always contiguous, so this is structurally unreachable;
the panic guards against future layout changes that would silently
break correctness.
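
The contiguity argument can be illustrated on a flat buffer. This is a sketch under assumptions: `row_block` and `column_offsets` are hypothetical helpers, and a flat row-major `&[f32]` stands in for the `Array2<f32>`. In ndarray terms, the guard corresponds to a view's `as_slice()` returning `None` when the data is not contiguous in standard order.

```rust
/// Hypothetical helper: rows `start..end` of a row-major buffer with `cols` columns.
fn row_block(matrix: &[f32], cols: usize, start: usize, end: usize) -> &[f32] {
    // Row r occupies indices r*cols .. (r+1)*cols, so consecutive rows are
    // adjacent in memory and a row range is a single contiguous slice.
    &matrix[start * cols..end * cols]
}

/// Flat offsets of column `c`: one element per row, `cols` apart, which is
/// why a column range is not contiguous whenever `cols > 1`.
fn column_offsets(rows: usize, cols: usize, c: usize) -> Vec<usize> {
    (0..rows).map(|r| r * cols + c).collect()
}
```

A change to column-major storage would swap these roles, which is the kind of layout change the panic is meant to surface.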