pub fn gpu_matrix_add(a_data: Vec<f64>, b_data: Vec<f64>) -> PyResult<Vec<f64>>
Add two row-major matrices element-wise.
Both vectors must have the same length (= rows × cols).
Returns an error (PyValueError) if the two lengths differ.
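The length check and element-wise sum described above can be sketched in plain Rust. This is only an illustration under stated assumptions, not the actual implementation: `matrix_add_checked` is a hypothetical name, the work is done on the CPU rather than the GPU, and a `String` error stands in for `PyValueError` so the snippet compiles without the `pyo3` crate.

```rust
/// Hypothetical CPU stand-in for `gpu_matrix_add` (illustration only).
/// Both slices hold a matrix flattened in row-major order; because the
/// shapes are identical, element-wise addition reduces to adding the
/// flat buffers index by index.
/// A `String` error stands in for `PyValueError` (assumption).
pub fn matrix_add_checked(a_data: &[f64], b_data: &[f64]) -> Result<Vec<f64>, String> {
    // Reject inputs whose flattened lengths (rows * cols) differ.
    if a_data.len() != b_data.len() {
        return Err(format!(
            "length mismatch: a has {} elements, b has {}",
            a_data.len(),
            b_data.len()
        ));
    }
    // Pair up corresponding elements and sum each pair.
    Ok(a_data
        .iter()
        .zip(b_data.iter())
        .map(|(x, y)| x + y)
        .collect())
}
```

With two 2 × 2 matrices flattened as `[1.0, 2.0, 3.0, 4.0]` and `[10.0, 20.0, 30.0, 40.0]`, the sketch yields `Ok([11.0, 22.0, 33.0, 44.0])`; passing slices of different lengths yields `Err(..)`. Note that only the flat lengths are compared, so two matrices of different shape but equal element count (say 2 × 3 and 3 × 2) would pass this check.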