pub fn spmv_parallel(matrix: &CsrMatrix, x: &[f64]) -> Result<Vec<f64>>
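Parallel sparse matrix-vector multiplication (SpMV), computing y = A·x with row-based parallelism.
Each output element y[i] depends only on row i of the matrix and on x, so rows are processed independently and no two workers write to the same output element. The same row-wise decomposition also makes this approach suitable for GPU execution.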
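
The row-wise decomposition described above can be sketched on the CPU with scoped threads from the standard library. This is only an illustration under stated assumptions, not the actual implementation. The `CsrMatrix` field names (`n_rows`, `n_cols`, `row_ptr`, `col_idx`, `values`), the function name `spmv_parallel_sketch`, and the `String` error type are hypothetical; the real type and `Result` alias may differ.

```rust
use std::thread;

/// Hypothetical CSR layout, assumed for this sketch only.
pub struct CsrMatrix {
    pub n_rows: usize,
    pub n_cols: usize,
    pub row_ptr: Vec<usize>, // length n_rows + 1; row i spans row_ptr[i]..row_ptr[i + 1]
    pub col_idx: Vec<usize>, // column index of each stored value
    pub values: Vec<f64>,    // stored (non-zero) values
}

/// Row-parallel y = A * x. The error type is an assumption.
pub fn spmv_parallel_sketch(matrix: &CsrMatrix, x: &[f64]) -> Result<Vec<f64>, String> {
    if x.len() != matrix.n_cols {
        return Err(format!(
            "dimension mismatch: matrix has {} columns, x has {} entries",
            matrix.n_cols,
            x.len()
        ));
    }
    if matrix.row_ptr.len() != matrix.n_rows + 1 {
        return Err("row_ptr must have n_rows + 1 entries".to_string());
    }

    let mut y = vec![0.0; matrix.n_rows];
    if matrix.n_rows == 0 {
        return Ok(y);
    }

    // Split the output into contiguous blocks of rows, one block per thread.
    let n_threads = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk = (matrix.n_rows + n_threads - 1) / n_threads;

    thread::scope(|s| {
        for (c, y_chunk) in y.chunks_mut(chunk).enumerate() {
            let first_row = c * chunk;
            // Each thread owns a disjoint slice of y, so no locking is needed.
            s.spawn(move || {
                for (k, out) in y_chunk.iter_mut().enumerate() {
                    let row = first_row + k;
                    let (start, end) = (matrix.row_ptr[row], matrix.row_ptr[row + 1]);
                    // Dot product of the stored entries of this row with x.
                    *out = (start..end)
                        .map(|j| matrix.values[j] * x[matrix.col_idx[j]])
                        .sum();
                }
            });
        }
    });

    Ok(y)
}
```

The sketch assumes well-formed CSR data (non-decreasing `row_ptr`, column indices below `n_cols`); malformed input would panic on an out-of-bounds index rather than return an error. On a GPU the same idea would typically assign one thread or one warp to each row, and rows with very different numbers of stored values can lead to load imbalance in either setting.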