pub struct DataSet {
    pub x: Array2<f64>,
    pub y: Array1<f64>,
    pub feature_names: Vec<String>,
    pub target_name: String,
}
A dataset of input features x (shape [n_rows, n_vars]) and targets y ([n_rows]).
Fields§
x: Array2<f64>
Feature matrix, one row per observation.
y: Array1<f64>
Target vector, one entry per observation.
feature_names: Vec<String>
Names of the feature columns (length n_vars).
target_name: String
Name of the target column.
Implementations§
impl DataSet
pub fn from_arrays(x: Array2<f64>, y: Array1<f64>) -> Result<Self>
Build a dataset from in-memory arrays.
§Errors
Returns PhopError::ShapeMismatch if the row counts of x and y differ.
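The row-count check described above can be sketched without the crate. This is a minimal std-only illustration, not the crate's implementation: `ShapeError` and `check_rows` are hypothetical names standing in for `PhopError::ShapeMismatch` and the internal validation, and `Vec<Vec<f64>>` stands in for `Array2<f64>`.

```rust
/// Hypothetical stand-in for `PhopError::ShapeMismatch`.
#[derive(Debug, PartialEq)]
pub struct ShapeError {
    pub x_rows: usize,
    pub y_rows: usize,
}

/// Accept the pair only when `x` and `y` have the same number of rows.
/// `x` is a row-major stand-in for the feature matrix.
pub fn check_rows(x: &[Vec<f64>], y: &[f64]) -> Result<(), ShapeError> {
    if x.len() == y.len() {
        Ok(())
    } else {
        Err(ShapeError { x_rows: x.len(), y_rows: y.len() })
    }
}
```

Carrying both row counts in the error is a design choice of this sketch; the source does not say what the real error variant contains.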
pub fn standardized(&self) -> (DataSet, Standardizer)
Produce a z-scored copy of the dataset together with the Standardizer that maps
predictions back to the original target units.
Each feature column and the target are centered and scaled to unit variance; constant columns (zero variance) are left centered with a unit scale so the transform stays finite.
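The per-column transform can be illustrated with a small std-only sketch. It assumes a population standard deviation (dividing by n) and a non-empty column; the source does not state which variance estimator the crate uses. `ColumnStats`, `column_stats`, `z_score`, and `unscale` are hypothetical names, not part of the documented API.

```rust
/// Mean and scale of one column; a hypothetical stand-in for the
/// per-column state a `Standardizer` might hold.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ColumnStats {
    pub mean: f64,
    pub scale: f64,
}

/// Compute the mean and population standard deviation of a non-empty column.
/// A zero-variance column gets scale 1.0 so the transform stays finite.
pub fn column_stats(values: &[f64]) -> ColumnStats {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    let scale = if std > 0.0 { std } else { 1.0 };
    ColumnStats { mean, scale }
}

/// Center and scale every value of the column.
pub fn z_score(values: &[f64], s: ColumnStats) -> Vec<f64> {
    values.iter().map(|v| (v - s.mean) / s.scale).collect()
}

/// Map a standardized value back to the original units.
pub fn unscale(z: f64, s: ColumnStats) -> f64 {
    z * s.scale + s.mean
}
```

Applying `unscale` with the target column's statistics is the step that maps a prediction made in standardized units back to the original target units.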
pub fn select(&self, rows: &[usize]) -> Result<DataSet>
Build a sub-dataset from the given row indices (used by minibatching).
§Errors
Returns PhopError::ShapeMismatch if any index is out of range.
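A std-only sketch of row selection with the bounds check described above. `RowOutOfRange` and `select_rows` are hypothetical names, `Vec<Vec<f64>>` stands in for `Array2<f64>`, and the sketch assumes `x` and `y` already have the same number of rows.

```rust
/// Hypothetical stand-in for the error returned on a bad row index.
#[derive(Debug, PartialEq)]
pub struct RowOutOfRange {
    pub index: usize,
    pub n_rows: usize,
}

/// Copy the requested rows of `x` and `y`, in the order given.
/// Assumes `x.len() == y.len()`.
pub fn select_rows(
    x: &[Vec<f64>],
    y: &[f64],
    rows: &[usize],
) -> Result<(Vec<Vec<f64>>, Vec<f64>), RowOutOfRange> {
    let n_rows = y.len();
    // Validate every index before copying anything.
    if let Some(&index) = rows.iter().find(|&&r| r >= n_rows) {
        return Err(RowOutOfRange { index, n_rows });
    }
    let xs: Vec<Vec<f64>> = rows.iter().map(|&r| x[r].clone()).collect();
    let ys: Vec<f64> = rows.iter().map(|&r| y[r]).collect();
    Ok((xs, ys))
}
```

Validating all indices first means a failed call never returns a partially built result.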
pub fn minibatches(&self, size: usize, seed: u64) -> Vec<DataSet>
Partition the data axis into shuffled minibatches of (at most) size rows.
The shuffle is seeded for reproducibility. Minibatching is the Risk T3 mitigation: it bounds
per-step memory by letting the optimizer consume the data in chunks. A size of 0, or any size
>= the row count, yields a single batch containing all rows.
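The partitioning rule can be sketched over row indices alone. This is an illustration under stated assumptions, not the crate's code: the generator (`SplitMix64`), the Fisher-Yates shuffle, and the name `minibatch_indices` are all choices made for this sketch, and the source does not say which random number generator the crate uses or whether the single-batch case is shuffled (here it is not).

```rust
/// Tiny SplitMix64 generator, used here only to make the sketch
/// deterministic without external crates.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Split `0..n_rows` into shuffled batches of at most `size` indices.
/// A `size` of 0, or one >= `n_rows`, yields a single batch with all rows.
pub fn minibatch_indices(n_rows: usize, size: usize, seed: u64) -> Vec<Vec<usize>> {
    let mut order: Vec<usize> = (0..n_rows).collect();
    if size == 0 || size >= n_rows {
        return vec![order];
    }
    let mut rng = SplitMix64::new(seed);
    // Fisher-Yates shuffle (the modulo introduces a negligible bias here).
    for i in (1..n_rows).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        order.swap(i, j);
    }
    // Every batch has `size` indices except possibly the last one.
    order.chunks(size).map(|c| c.to_vec()).collect()
}
```

Each index batch could then be passed to a row-selection routine such as `select` to materialize the corresponding sub-dataset.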
pub fn from_csv<P: AsRef<Path>>(path: P) -> Result<Self>
Load a dataset from a CSV file.
The file is expected to have a header row. By default the last column is
taken as the target y and all preceding columns as features x.
§Errors
Returns an error if the file cannot be read or parsed, or if it has fewer than two columns.
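The default layout (header row, last column as target) can be illustrated with a std-only parser over in-memory text. This is a simplified sketch: it splits on commas and does not handle quoted fields, and `ParsedCsv` and `parse_csv_last_target` are hypothetical names. The crate's actual parser and error types are not shown in the source.

```rust
/// Hypothetical plain-data result; stands in for a `DataSet`.
#[derive(Debug)]
pub struct ParsedCsv {
    pub feature_names: Vec<String>,
    pub target_name: String,
    pub x: Vec<Vec<f64>>,
    pub y: Vec<f64>,
}

/// Parse comma-separated text with a header row; the last column is the target.
pub fn parse_csv_last_target(text: &str) -> Result<ParsedCsv, String> {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header: Vec<String> = lines
        .next()
        .ok_or_else(|| "empty input".to_string())?
        .split(',')
        .map(|s| s.trim().to_string())
        .collect();
    if header.len() < 2 {
        return Err("need at least two columns".to_string());
    }
    let n_vars = header.len() - 1;
    let mut x = Vec::new();
    let mut y = Vec::new();
    for (i, line) in lines.enumerate() {
        let row = line
            .split(',')
            .map(|s| s.trim().parse::<f64>())
            .collect::<Result<Vec<f64>, _>>()
            .map_err(|e| format!("data row {}: {}", i + 1, e))?;
        if row.len() != header.len() {
            return Err(format!("data row {}: wrong number of fields", i + 1));
        }
        // All columns but the last are features; the last is the target.
        x.push(row[..n_vars].to_vec());
        y.push(row[n_vars]);
    }
    Ok(ParsedCsv {
        feature_names: header[..n_vars].to_vec(),
        target_name: header[n_vars].clone(),
        x,
        y,
    })
}
```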
pub fn from_csv_with_target<P: AsRef<Path>>(
    path: P,
    target: Option<usize>,
) -> Result<Self>
Load a dataset from a CSV file, optionally choosing which column is the target.
target is a 0-based column index; None selects the last column. All other columns
become features x, preserving their header order.
§Errors
Returns an error if the file cannot be read or parsed, has fewer than two columns, or if
target is out of range.
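How an optional target index partitions the columns can be sketched as a pure function over column counts. `split_columns` is a hypothetical helper written for this illustration, and the `String` error stands in for whatever error type the crate returns.

```rust
/// Resolve the target column and list the remaining feature columns.
/// `None` selects the last column; features keep their header order.
pub fn split_columns(
    n_cols: usize,
    target: Option<usize>,
) -> Result<(Vec<usize>, usize), String> {
    if n_cols < 2 {
        return Err(format!("need at least two columns, found {}", n_cols));
    }
    let t = target.unwrap_or(n_cols - 1);
    if t >= n_cols {
        return Err(format!("target index {} out of range for {} columns", t, n_cols));
    }
    // Every column except the target becomes a feature, in original order.
    let features: Vec<usize> = (0..n_cols).filter(|&c| c != t).collect();
    Ok((features, t))
}
```

With four columns, `None` gives features 0, 1, 2 and target 3, while `Some(1)` gives features 0, 2, 3 and target 1.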
pub fn from_csv_columns<P: AsRef<Path>>(
    path: P,
    features: &[usize],
    target: usize,
) -> Result<Self>
Load a dataset from a CSV file selecting an explicit subset of feature columns and a target column (all 0-based indices). Feature columns appear in the order given.
§Errors
Returns an error if the file cannot be read or parsed or has fewer than two columns, if any index is out of range, or if the target appears among the features.
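The index checks listed above can be sketched as follows. `validate_selection` and `pick` are hypothetical helpers, and the `String` error stands in for the crate's error type. Duplicate feature indices are not rejected here, since the source does not say how the crate treats them.

```rust
/// Check an explicit feature/target selection against the column count.
pub fn validate_selection(
    n_cols: usize,
    features: &[usize],
    target: usize,
) -> Result<(), String> {
    if n_cols < 2 {
        return Err(format!("need at least two columns, found {}", n_cols));
    }
    // Look for the first feature or target index that is out of range.
    let mut indices = features.iter().chain(std::iter::once(&target));
    if let Some(&bad) = indices.find(|&&c| c >= n_cols) {
        return Err(format!("column index {} out of range for {} columns", bad, n_cols));
    }
    if features.contains(&target) {
        return Err(format!("target column {} is also listed as a feature", target));
    }
    Ok(())
}

/// Extract the selected features (in the order given) and the target
/// from one parsed row. Assumes the selection was validated first.
pub fn pick(row: &[f64], features: &[usize], target: usize) -> (Vec<f64>, f64) {
    let x: Vec<f64> = features.iter().map(|&c| row[c]).collect();
    (x, row[target])
}
```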
impl DataSet
pub fn to_dimensionless(
    &self,
    feature_dims: &[Dimension],
) -> Result<(DataSet, Vec<Vec<i32>>)>
Reduce the features to their dimensionless Buckingham-π groups, given each feature’s
Dimension. The new feature columns are the monomials ∏ xᵢ^{eᵢ} for each π-group; the
target is left unchanged. The π-group exponent vectors are returned alongside.
§Errors
Returns PhopError::ShapeMismatch if feature_dims.len() != n_vars, or
PhopError::NotConverged if the inputs are dimensionally independent (no π-groups exist).
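The monomial step can be illustrated on its own. The sketch below takes the π-group exponent vectors as given and only evaluates ∏ xᵢ^{eᵢ} for one row; how the crate derives the exponents from each `Dimension` is not shown in the source. `pi_groups` is a hypothetical name.

```rust
/// Evaluate one dimensionless monomial per exponent vector for a single row.
/// `exponents[k][i]` is the integer power of feature `i` in π-group `k`.
/// Assumes every exponent vector has the same length as `row`.
pub fn pi_groups(row: &[f64], exponents: &[Vec<i32>]) -> Vec<f64> {
    exponents
        .iter()
        .map(|e| {
            row.iter()
                .zip(e)
                .map(|(x, &p)| x.powi(p))
                .product::<f64>()
        })
        .collect()
}
```

For a pendulum with features ordered as period T, length L, and gravitational acceleration g, the exponent vector [2, -1, 1] corresponds to the dimensionless group T²g/L.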
Trait Implementations§
Auto Trait Implementations§
impl Freeze for DataSet
impl RefUnwindSafe for DataSet
impl Send for DataSet
impl Sync for DataSet
impl Unpin for DataSet
impl UnsafeUnpin for DataSet
impl UnwindSafe for DataSet
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<T> Read<Exclusive, BecauseExclusive> for T
where
    T: ?Sized,
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.