#[non_exhaustive]
pub struct Dataset {
    pub features: Vec<Vec<f64>>,
    pub target: Vec<f64>,
    pub feature_names: Vec<String>,
    pub target_name: String,
    pub class_labels: Option<Vec<String>>,
    /* private fields */
}
A tabular dataset with features and a target column.
Features are stored column-major (features[feature_idx][sample_idx])
for cache-friendly access during tree split evaluation.
Fields (Non-exhaustive)
This struct is marked as non-exhaustive. Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
features: Vec<Vec<f64>>
Feature columns: features[feature_idx][sample_idx].
target: Vec<f64>
Target values: target[sample_idx].
feature_names: Vec<String>
Feature column names.
target_name: String
Target column name.
class_labels: Option<Vec<String>>
Class label mapping (index → label string) for classification tasks.
Implementations
impl Dataset
pub fn new(
    features: Vec<Vec<f64>>,
    target: Vec<f64>,
    feature_names: Vec<String>,
    target_name: impl Into<String>,
) -> Self
Create a dataset from pre-computed features and target.
Panics
Panics if feature columns have mismatched lengths, or if
feature_names.len() != features.len().
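The two panic conditions can be checked up front before calling the constructor. The following is a minimal standalone sketch, not the crate's code: `check_shape` is a hypothetical helper, and it returns a `Result` instead of panicking so that callers can report the problem themselves.

```rust
/// Hypothetical pre-flight check mirroring the documented panic conditions.
/// Not part of the crate; it operates on plain vectors.
fn check_shape(features: &[Vec<f64>], feature_names: &[String]) -> Result<(), String> {
    // One name per feature column.
    if feature_names.len() != features.len() {
        return Err(format!(
            "expected {} feature names, got {}",
            features.len(),
            feature_names.len()
        ));
    }
    // Every column must hold the same number of samples.
    if let Some(first) = features.first() {
        let n_samples = first.len();
        for (j, col) in features.iter().enumerate() {
            if col.len() != n_samples {
                return Err(format!(
                    "column {j} has {} samples, expected {n_samples}",
                    col.len()
                ));
            }
        }
    }
    Ok(())
}

fn main() {
    let names = vec!["a".to_string(), "b".to_string()];
    let good = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let ragged = vec![vec![1.0, 2.0], vec![3.0]];
    assert!(check_shape(&good, &names).is_ok());
    assert!(check_shape(&ragged, &names).is_err());
}
```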
pub fn from_matrix(
    matrix: DenseMatrix,
    target: Vec<f64>,
    feature_names: Vec<String>,
    target_name: impl Into<String>,
) -> Self
Create a dataset from a DenseMatrix, target, and column names.
The features field is populated from the matrix for backward compatibility.
pub fn matrix(&self) -> &DenseMatrix
The contiguous column-major feature matrix.
Lazily built from features on first access. Subsequent calls
return the cached matrix without recomputation.
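The build-once, reuse-afterwards behaviour described here is commonly implemented with a once-initialised cell. The sketch below illustrates that pattern with `std::sync::OnceLock`; it is an assumption about the general technique, not the crate's implementation. `LazyColumns`, its fields, and `invalidate` are hypothetical names, and a plain `Vec<f64>` stands in for `DenseMatrix`.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a dataset with a lazily built contiguous copy.
struct LazyColumns {
    features: Vec<Vec<f64>>,
    flat: OnceLock<Vec<f64>>, // column-major contiguous buffer, built on demand
}

impl LazyColumns {
    fn new(features: Vec<Vec<f64>>) -> Self {
        Self { features, flat: OnceLock::new() }
    }

    /// Build the contiguous buffer on first access, then return the cached copy.
    fn matrix(&self) -> &[f64] {
        self.flat
            .get_or_init(|| self.features.iter().flatten().copied().collect())
    }

    /// Drop the cached buffer so the next `matrix()` call rebuilds it.
    fn invalidate(&mut self) {
        self.flat.take();
    }
}

fn main() {
    let mut ds = LazyColumns::new(vec![vec![1.0, 2.0], vec![3.0, 4.0]]);
    assert_eq!(ds.matrix().to_vec(), vec![1.0, 2.0, 3.0, 4.0]);

    // An in-place mutation leaves the cache stale until it is invalidated.
    ds.features[0][0] = 10.0;
    assert_eq!(ds.matrix()[0], 1.0);
    ds.invalidate();
    assert_eq!(ds.matrix()[0], 10.0);
}
```

The stale read in `main` shows why a cache of this kind needs an explicit invalidation step after the source columns are mutated in place.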
pub fn n_features(&self) -> usize
Number of features (columns).
pub fn sample(&self, idx: usize) -> Vec<f64>
Get a single sample (row) as a vector of feature values.
pub fn feature_matrix(&self) -> Vec<Vec<f64>>
Get the feature matrix as row-major [n_samples][n_features].
pub fn flat_feature_matrix(&mut self) -> &[f64]
Get a contiguous row-major feature buffer, computing on first call.
Layout: [sample_0_feat_0, sample_0_feat_1, ..., sample_n_feat_m].
Subsequent calls return the cached slice without recomputation.
pub fn sample_row<'a>(&self, cache: &'a [f64], idx: usize) -> &'a [f64]
Get a zero-copy row slice from a pre-computed flat feature buffer.
cache should be the result of Dataset::flat_feature_matrix.
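To make the row-major layout and the zero-copy row lookup concrete, here is a small standalone sketch over plain vectors. It does not use the crate; `flatten_row_major` and `row` are hypothetical helpers that only mirror the documented layout.

```rust
/// Transpose column-major columns (features[feature_idx][sample_idx])
/// into one contiguous row-major buffer.
fn flatten_row_major(features: &[Vec<f64>]) -> Vec<f64> {
    let n_features = features.len();
    let n_samples = features.first().map_or(0, |c| c.len());
    let mut flat = Vec::with_capacity(n_samples * n_features);
    for i in 0..n_samples {
        for col in features {
            flat.push(col[i]);
        }
    }
    flat
}

/// Borrow row `idx` from the flat buffer without copying.
fn row<'a>(flat: &'a [f64], n_features: usize, idx: usize) -> &'a [f64] {
    &flat[idx * n_features..(idx + 1) * n_features]
}

fn main() {
    // Two features, three samples.
    let features = vec![vec![1.0, 2.0, 3.0], vec![10.0, 20.0, 30.0]];
    let flat = flatten_row_major(&features);
    assert_eq!(flat, vec![1.0, 10.0, 2.0, 20.0, 3.0, 30.0]);
    assert_eq!(row(&flat, features.len(), 1).to_vec(), vec![2.0, 20.0]);
}
```

Because the row is a slice into the flat buffer, looking up many samples allocates nothing beyond the one-time transpose.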
pub fn subset(&self, indices: &[usize]) -> Self
Create a subset of this dataset with the given sample indices.
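With column-major storage, taking a subset of samples means gathering the same indices from every column and from the target. The sketch below shows that gather on plain vectors; `subset_columns` is a hypothetical helper, and whether the crate accepts repeated or out-of-order indices is an assumption made here for illustration.

```rust
/// Gather the given sample indices from each feature column and the target.
/// Indices may repeat or be out of order in this sketch.
fn subset_columns(
    features: &[Vec<f64>],
    target: &[f64],
    indices: &[usize],
) -> (Vec<Vec<f64>>, Vec<f64>) {
    let cols: Vec<Vec<f64>> = features
        .iter()
        .map(|col| indices.iter().map(|&i| col[i]).collect::<Vec<f64>>())
        .collect();
    let tgt: Vec<f64> = indices.iter().map(|&i| target[i]).collect();
    (cols, tgt)
}

fn main() {
    let features = vec![vec![1.0, 2.0, 3.0], vec![10.0, 20.0, 30.0]];
    let target = vec![0.0, 1.0, 2.0];
    let (cols, tgt) = subset_columns(&features, &target, &[2, 0]);
    assert_eq!(cols, vec![vec![3.0, 1.0], vec![30.0, 10.0]]);
    assert_eq!(tgt, vec![2.0, 0.0]);
}
```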
pub fn sync_matrix(&mut self)
Clear the cached matrix so it will be lazily rebuilt from features
on the next call to matrix().
Call this after mutating features in place (e.g. after a
transformer’s transform() step).
pub fn invalidate_matrix(&mut self)
Mark the matrix cache as stale after in-place feature mutations.
The matrix will be lazily rebuilt from features on next access.
pub fn validate_finite(&self) -> Result<()>
Returns Err(InvalidData) if any feature or target value is NaN or ±Inf.
pub fn validate_no_inf(&self) -> Result<()>
Returns Err(InvalidData) if any feature or target value is ±Inf.
Unlike validate_finite, this allows NaN
values (useful for imputers that intentionally handle NaN).
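The difference between the two checks comes down to `f64::is_finite` (rejects both NaN and ±Inf) versus `f64::is_infinite` (rejects only ±Inf). A standalone sketch of that distinction follows; `check_finite` and `check_no_inf` are hypothetical helpers, and a `String` error stands in for the crate's `InvalidData` error.

```rust
/// Iterate over every feature value followed by every target value.
fn all_values<'a>(
    features: &'a [Vec<f64>],
    target: &'a [f64],
) -> impl Iterator<Item = f64> + 'a {
    features.iter().flatten().chain(target.iter()).copied()
}

/// Reject NaN and ±Inf.
fn check_finite(features: &[Vec<f64>], target: &[f64]) -> Result<(), String> {
    if all_values(features, target).all(f64::is_finite) {
        Ok(())
    } else {
        Err("found NaN or infinite value".to_string())
    }
}

/// Reject ±Inf only; NaN is allowed through.
fn check_no_inf(features: &[Vec<f64>], target: &[f64]) -> Result<(), String> {
    if all_values(features, target).any(f64::is_infinite) {
        Err("found infinite value".to_string())
    } else {
        Ok(())
    }
}

fn main() {
    let with_nan = vec![vec![1.0, f64::NAN]];
    let target = [0.0, 1.0];
    assert!(check_finite(&with_nan, &target).is_err());
    assert!(check_no_inf(&with_nan, &target).is_ok());
    assert!(check_no_inf(&with_nan, &[f64::INFINITY, 1.0]).is_err());
}
```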
pub fn with_class_labels(self, labels: Vec<String>) -> Self
Attach class labels for classification.
pub fn from_sparse(
    csc: CscMatrix,
    target: Vec<f64>,
    feature_names: Vec<String>,
    target_name: impl Into<String>,
) -> Self
Create a dataset from a sparse CSC matrix.
The features field is left empty. Call ensure_dense
before accessing features directly on a sparse dataset.
pub fn sparse_csc(&self) -> Option<&CscMatrix>
Get the sparse CSC matrix if available.
pub fn sparse_csr(&self) -> Option<CsrMatrix>
Get the sparse CSR matrix (converted from CSC on demand).
pub fn summary(&self) -> Vec<ColumnStats>
Compute descriptive statistics for every feature column and the target.
Returns one ColumnStats per feature (in order) followed by one for
the target column. NaN values are excluded from all computations.
Standard deviation uses ddof=1 (sample std) to match pandas.
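As an illustration of the NaN-excluding, ddof=1 convention, the sketch below computes the mean and sample standard deviation of one column. `mean_and_std` is a hypothetical helper and not the crate's `ColumnStats` computation; returning NaN for the standard deviation when only one non-NaN value remains is a choice made for this sketch.

```rust
/// Mean and sample standard deviation (ddof = 1) over the non-NaN values.
/// Returns None when every value is NaN or the slice is empty.
fn mean_and_std(values: &[f64]) -> Option<(f64, f64)> {
    let kept: Vec<f64> = values.iter().copied().filter(|v| !v.is_nan()).collect();
    let n = kept.len();
    if n == 0 {
        return None;
    }
    let mean = kept.iter().sum::<f64>() / n as f64;
    if n < 2 {
        // Sample std is undefined for a single observation.
        return Some((mean, f64::NAN));
    }
    // Sum of squared deviations, divided by n - 1 rather than n.
    let ss: f64 = kept.iter().map(|v| (v - mean).powi(2)).sum();
    Some((mean, (ss / (n as f64 - 1.0)).sqrt()))
}

fn main() {
    let (mean, std) = mean_and_std(&[1.0, f64::NAN, 2.0, 3.0]).unwrap();
    assert_eq!(mean, 2.0);
    assert_eq!(std, 1.0);
}
```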
pub fn describe(&self)
Print a pandas-style descriptive statistics table to stdout.
Internally calls summary().
pub fn ensure_dense(&mut self)
Populate the features field from sparse storage.
No-op if the dataset is already dense. After calling this,
features[j][i] is available as usual.
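Expanding compressed sparse column (CSC) storage into dense columns is a direct walk over the column pointers. The sketch below uses a minimal hypothetical `Csc` struct with the conventional `col_ptr` / `row_idx` / `values` arrays; the crate's `CscMatrix` may be laid out differently, so treat the field names as assumptions.

```rust
/// Hypothetical minimal CSC layout, for illustration only.
struct Csc {
    n_rows: usize,
    col_ptr: Vec<usize>, // length n_cols + 1; column j spans col_ptr[j]..col_ptr[j + 1]
    row_idx: Vec<usize>, // row index of each stored value
    values: Vec<f64>,    // stored non-zero values
}

/// Expand into dense column-major storage: result[j][i] is column j, row i.
fn to_dense_columns(m: &Csc) -> Vec<Vec<f64>> {
    let n_cols = m.col_ptr.len().saturating_sub(1);
    (0..n_cols)
        .map(|j| {
            // Entries that are not stored are implicit zeros.
            let mut col = vec![0.0; m.n_rows];
            for k in m.col_ptr[j]..m.col_ptr[j + 1] {
                col[m.row_idx[k]] = m.values[k];
            }
            col
        })
        .collect()
}

fn main() {
    // 3 rows x 2 columns: column 0 holds 5.0 at row 1,
    // column 1 holds 7.0 at row 0 and 9.0 at row 2.
    let m = Csc {
        n_rows: 3,
        col_ptr: vec![0, 1, 3],
        row_idx: vec![1, 0, 2],
        values: vec![5.0, 7.0, 9.0],
    };
    let dense = to_dense_columns(&m);
    assert_eq!(dense, vec![vec![0.0, 5.0, 0.0], vec![7.0, 0.0, 9.0]]);
}
```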
Trait Implementations
Auto Trait Implementations
impl !Freeze for Dataset
impl RefUnwindSafe for Dataset
impl Send for Dataset
impl Sync for Dataset
impl Unpin for Dataset
impl UnsafeUnpin for Dataset
impl UnwindSafe for Dataset
Blanket Implementations
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.