pub fn lu_decomp(a: &[Vec<f64>]) -> (Vec<Vec<f64>>, Vec<Vec<f64>>, Vec<usize>)
LU decomposition with partial pivoting.
Returns (L, U, P) where P is the permutation vector.