pub fn to_col_major<'a, S: Data<Elem = f64>>(
a: &'a ArrayBase<S, Ix2>,
) -> Cow<'a, [f64]>
Repack a 2D ndarray::ArrayBase (row-major) into the column-major
layout expected by every cuBLAS / cuSOLVER entry point.
Walks each column once via ndarray’s iter (no per-element bounds checks)
and extends into a pre-sized Vec. On large-scale inputs (n≈3×10⁵,
p≈35) this replaces a per-element a[[row, col]] indexing loop that
dominated the host side of every GPU dispatch.
Fast path: if the input is already in F-order (column-major and contiguous in memory), its raw buffer is borrowed directly, with no allocation and no copy. Standard row-major arrays still go through the permutation path.
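
The two paths described above can be illustrated with a minimal, std-only sketch. It is not this crate's implementation: `to_col_major_sketch` and the `Order` enum are hypothetical names introduced for illustration, and the input is assumed to be a flat, contiguous slice with an explicit layout tag rather than an `ndarray::ArrayBase`. Only the borrow-versus-repack decision and the column-by-column walk mirror the description.

```rust
use std::borrow::Cow;

/// Hypothetical storage-order tag for this sketch (not part of ndarray).
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Order {
    RowMajor,
    ColMajor,
}

/// Return `data` (an `nrows x ncols` matrix stored contiguously in `order`)
/// as a column-major slice, borrowing when no repacking is needed.
pub fn to_col_major_sketch<'a>(
    data: &'a [f64],
    nrows: usize,
    ncols: usize,
    order: Order,
) -> Cow<'a, [f64]> {
    assert_eq!(data.len(), nrows * ncols, "buffer length must equal nrows * ncols");
    match order {
        // Fast path: already column-major, so borrow without allocating or copying.
        Order::ColMajor => Cow::Borrowed(data),
        // Permutation path: walk each column once and append to a pre-sized Vec.
        Order::RowMajor => {
            let mut out = Vec::with_capacity(nrows * ncols);
            for col in 0..ncols {
                // Elements of one column sit `ncols` apart in row-major storage.
                out.extend(data.iter().skip(col).step_by(ncols.max(1)).copied());
            }
            Cow::Owned(out)
        }
    }
}
```

Returning `Cow` lets a caller treat both cases uniformly as `&[f64]` (for example when handing a pointer to a GPU library), while the fast path skips the allocation entirely.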