Struct Dataset 

#[non_exhaustive]
pub struct Dataset {
    pub features: Vec<Vec<f64>>,
    pub target: Vec<f64>,
    pub feature_names: Vec<String>,
    pub target_name: String,
    pub class_labels: Option<Vec<String>>,
    /* private fields */
}

A tabular dataset with features and a target column.

Features are stored column-major (features[feature_idx][sample_idx]), so all values of a single feature sit contiguously in memory. This keeps access cache-friendly during tree split evaluation.
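To make the layout concrete, the following minimal sketch uses plain nested vectors rather than the crate's Dataset type. The helper names feature_column and sample_row_copy are hypothetical and exist only for illustration.

```rust
/// Column-major storage: the outer index selects a feature,
/// the inner index selects a sample.
fn feature_column(features: &[Vec<f64>], feature_idx: usize) -> &[f64] {
    // All values of one feature are contiguous, so this is a cheap borrow.
    &features[feature_idx]
}

/// Gathering one sample touches every column, so the values must be copied.
fn sample_row_copy(features: &[Vec<f64>], sample_idx: usize) -> Vec<f64> {
    features.iter().map(|col| col[sample_idx]).collect()
}
```

Scanning one feature across all samples, as a split search does, reads a single contiguous slice, while assembling one row requires one lookup per column.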

Fields (Non-exhaustive)§

This struct is marked as non-exhaustive.
Non-exhaustive structs may gain additional fields in the future. External crates therefore cannot construct them with the traditional Struct { .. } syntax, cannot match against them without a wildcard .., and cannot use struct update syntax.
§features: Vec<Vec<f64>>

Feature columns: features[feature_idx][sample_idx].

§target: Vec<f64>

Target values: target[sample_idx].

§feature_names: Vec<String>

Feature column names.

§target_name: String

Target column name.

§class_labels: Option<Vec<String>>

Class label mapping (index → label string) for classification tasks.

Implementations§

impl Dataset

pub fn new(
    features: Vec<Vec<f64>>,
    target: Vec<f64>,
    feature_names: Vec<String>,
    target_name: impl Into<String>,
) -> Self

Create a dataset from pre-computed features and target.

§Panics

Panics if feature columns have mismatched lengths, or if feature_names.len() != features.len().
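The two documented panic conditions can be sketched with a stand-in type. MiniDataset below is hypothetical and is not the crate's implementation; it only mirrors the checks described above and performs no other validation.

```rust
/// Hypothetical stand-in that mirrors the documented checks only.
struct MiniDataset {
    features: Vec<Vec<f64>>,
    target: Vec<f64>,
    feature_names: Vec<String>,
    target_name: String,
}

impl MiniDataset {
    fn new(
        features: Vec<Vec<f64>>,
        target: Vec<f64>,
        feature_names: Vec<String>,
        target_name: impl Into<String>,
    ) -> Self {
        // Documented condition: one name per feature column.
        assert_eq!(
            feature_names.len(),
            features.len(),
            "feature_names.len() must equal features.len()"
        );
        // Documented condition: every column holds the same number of samples.
        if let Some(first) = features.first() {
            assert!(
                features.iter().all(|col| col.len() == first.len()),
                "feature columns have mismatched lengths"
            );
        }
        Self {
            features,
            target,
            feature_names,
            target_name: target_name.into(),
        }
    }
}
```

Because target_name is impl Into<String>, callers can pass either a &str or an owned String.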

pub fn from_matrix(
    matrix: DenseMatrix,
    target: Vec<f64>,
    feature_names: Vec<String>,
    target_name: impl Into<String>,
) -> Self

Create a dataset from a DenseMatrix, target, and column names.

The features field is populated from the matrix for backward compatibility.

pub fn matrix(&self) -> &DenseMatrix

The contiguous column-major feature matrix.

Lazily built from features on first access. Subsequent calls return the cached matrix without recomputation.
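One common way to get this build-once behavior behind a &self method is std::cell::OnceCell. The sketch below is an assumption about the general pattern, not the crate's actual code: LazyColumns, its flat buffer (standing in for DenseMatrix), and the builds counter are all hypothetical.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical stand-in: column-major columns plus a lazily built
/// contiguous copy of them.
struct LazyColumns {
    features: Vec<Vec<f64>>,
    flat: OnceCell<Vec<f64>>,
    builds: Cell<usize>, // counts cache builds, for demonstration only
}

impl LazyColumns {
    fn new(features: Vec<Vec<f64>>) -> Self {
        Self { features, flat: OnceCell::new(), builds: Cell::new(0) }
    }

    /// Contiguous column-major buffer, built on first access and reused after.
    fn matrix(&self) -> &[f64] {
        self.flat.get_or_init(|| {
            self.builds.set(self.builds.get() + 1);
            self.features.iter().flatten().copied().collect()
        })
    }

    /// Drop the cache so the next `matrix()` call rebuilds it.
    fn invalidate(&mut self) {
        self.flat.take();
    }
}
```

In this sketch the cache goes stale if features is mutated in place, which is why an explicit invalidation step is needed before the next matrix() call.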

pub fn n_samples(&self) -> usize

Number of samples (rows).

pub fn n_features(&self) -> usize

Number of features (columns).

pub fn n_classes(&self) -> usize

Number of unique classes in the target (for classification).

pub fn feature(&self, idx: usize) -> &[f64]

Get a single feature column by index.

pub fn sample(&self, idx: usize) -> Vec<f64>

Get a single sample (row) as a vector of feature values.

pub fn feature_matrix(&self) -> Vec<Vec<f64>>

Get the feature matrix as row-major [n_samples][n_features].

pub fn flat_feature_matrix(&mut self) -> &[f64]

Get a contiguous row-major feature buffer, computing on first call.

Layout: [sample_0_feat_0, sample_0_feat_1, ..., sample_n_feat_m]. Subsequent calls return the cached slice without recomputation.

pub fn sample_row<'a>(&self, cache: &'a [f64], idx: usize) -> &'a [f64]

Get a zero-copy row slice from a pre-computed flat feature buffer.

cache should be the result of Dataset::flat_feature_matrix.
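The row-major layout and the zero-copy row lookup can be illustrated with two free functions. This is a sketch with hypothetical names (flatten_row_major, row_slice). Unlike the documented sample_row, it takes n_features explicitly because it has no dataset to read it from.

```rust
/// Build a row-major buffer [s0_f0, s0_f1, ..., sN_fM] from column-major columns.
fn flatten_row_major(features: &[Vec<f64>]) -> Vec<f64> {
    let n_features = features.len();
    let n_samples = features.first().map_or(0, |col| col.len());
    let mut flat = Vec::with_capacity(n_samples * n_features);
    for i in 0..n_samples {
        for col in features {
            flat.push(col[i]);
        }
    }
    flat
}

/// Borrow row `idx` from the flat buffer without copying.
fn row_slice(cache: &[f64], n_features: usize, idx: usize) -> &[f64] {
    let start = idx * n_features;
    &cache[start..start + n_features]
}
```

Since the returned slice borrows from cache rather than from the dataset, its lifetime is tied to the buffer, which matches the 'a in the sample_row signature.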

pub fn subset(&self, indices: &[usize]) -> Self

Create a subset of this dataset with the given sample indices.
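With column-major storage, taking a subset means gathering the selected sample indices from every column and from the target. The sketch below uses a hypothetical helper, subset_columns, on plain vectors. It assumes that the order of indices is preserved and that repeated indices are allowed, neither of which is stated above.

```rust
/// Gather the given sample indices from each feature column and the target.
/// Assumption: output order follows `indices`, and duplicates are kept.
fn subset_columns(
    features: &[Vec<f64>],
    target: &[f64],
    indices: &[usize],
) -> (Vec<Vec<f64>>, Vec<f64>) {
    let cols: Vec<Vec<f64>> = features
        .iter()
        .map(|col| indices.iter().map(|&i| col[i]).collect())
        .collect();
    let tgt: Vec<f64> = indices.iter().map(|&i| target[i]).collect();
    (cols, tgt)
}
```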

pub fn sync_matrix(&mut self)

Clear the cached matrix so it will be lazily rebuilt from features on the next call to matrix().

Call this after mutating features in place (e.g. after a transformer’s transform() step).

pub fn invalidate_matrix(&mut self)

Mark the matrix cache as stale after in-place feature mutations.

The matrix will be lazily rebuilt from features on next access.

pub fn validate_finite(&self) -> Result<()>

Returns Err(InvalidData) if any feature or target value is NaN or ±Inf.

pub fn validate_no_inf(&self) -> Result<()>

Returns Err(InvalidData) if any feature or target value is ±Inf.

Unlike validate_finite, this allows NaN values (useful for imputers that intentionally handle NaN).
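The difference between the two checks comes down to f64::is_finite versus f64::is_infinite. The sketch below shows this on a plain slice. The function names are hypothetical, and a String error stands in for the crate's InvalidData error.

```rust
/// Rejects NaN and ±Inf (mirrors the validate_finite description).
fn check_finite(values: &[f64]) -> Result<(), String> {
    match values.iter().position(|v| !v.is_finite()) {
        Some(i) => Err(format!("non-finite value at index {i}")),
        None => Ok(()),
    }
}

/// Rejects ±Inf only; NaN passes (mirrors the validate_no_inf description).
fn check_no_inf(values: &[f64]) -> Result<(), String> {
    match values.iter().position(|v| v.is_infinite()) {
        Some(i) => Err(format!("infinite value at index {i}")),
        None => Ok(()),
    }
}
```

A column with missing values encoded as NaN passes check_no_inf but fails check_finite, which is the case the imputer note above describes.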

pub fn with_class_labels(self, labels: Vec<String>) -> Self

Attach class labels for classification.

pub fn from_sparse(
    csc: CscMatrix,
    target: Vec<f64>,
    feature_names: Vec<String>,
    target_name: impl Into<String>,
) -> Self

Create a dataset from a sparse CSC matrix.

The features field is left empty. Call ensure_dense before accessing features directly on a sparse dataset.

pub fn is_sparse(&self) -> bool

Whether this dataset uses sparse storage.

pub fn sparse_csc(&self) -> Option<&CscMatrix>

Get the sparse CSC matrix if available.

pub fn sparse_csr(&self) -> Option<CsrMatrix>

Get the sparse CSR matrix (converted from CSC on demand).

pub fn summary(&self) -> Vec<ColumnStats>

Compute descriptive statistics for every feature column and the target.

Returns one ColumnStats per feature (in order) followed by one for the target column. NaN values are excluded from all computations. Standard deviation uses ddof=1 (sample std) to match pandas.
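The NaN handling and the ddof=1 convention can be shown for a single column. This is a sketch, not the crate's ColumnStats computation: mean_and_sample_std is a hypothetical helper, and returning NaN for the standard deviation when fewer than two values remain is an assumption chosen to match pandas.

```rust
/// Mean and sample standard deviation (ddof = 1) over the non-NaN values.
/// Returns None if no non-NaN values remain.
fn mean_and_sample_std(values: &[f64]) -> Option<(f64, f64)> {
    // Exclude NaN before computing anything.
    let clean: Vec<f64> = values.iter().copied().filter(|v| !v.is_nan()).collect();
    let n = clean.len();
    if n == 0 {
        return None;
    }
    let mean = clean.iter().sum::<f64>() / n as f64;
    if n < 2 {
        // Assumption: the sample std of a single value is undefined (NaN).
        return Some((mean, f64::NAN));
    }
    let sum_sq: f64 = clean.iter().map(|v| (v - mean).powi(2)).sum();
    // ddof = 1: divide by n - 1 rather than n.
    Some((mean, (sum_sq / (n as f64 - 1.0)).sqrt()))
}
```

For the input [1.0, 2.0, 3.0, NaN], the NaN is dropped, giving a mean of 2.0 and a sample standard deviation of 1.0.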

pub fn describe(&self)

Print a pandas-style descriptive statistics table to stdout.

Internally calls summary().

pub fn ensure_dense(&mut self)

Populate the features field from sparse storage.

No-op if the dataset is already dense. After calling this, features[j][i] is available as usual.
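Densifying compressed sparse column storage into features[j][i] form can be sketched as follows. MiniCsc is a hypothetical minimal CSC layout (column pointers, row indices, values). The crate's CscMatrix may use different field names or index types, and filling unstored entries with 0.0 is an assumption.

```rust
/// Hypothetical minimal CSC layout; the crate's CscMatrix may differ.
struct MiniCsc {
    n_rows: usize,
    col_ptr: Vec<usize>, // length n_cols + 1; column j spans col_ptr[j]..col_ptr[j + 1]
    row_idx: Vec<usize>, // row index of each stored value
    values: Vec<f64>,    // stored (non-zero) values
}

/// Expand CSC storage into dense column-major columns: out[j][i].
fn csc_to_columns(m: &MiniCsc) -> Vec<Vec<f64>> {
    let n_cols = m.col_ptr.len().saturating_sub(1);
    // Assumption: entries not stored in the sparse matrix are 0.0.
    let mut cols = vec![vec![0.0; m.n_rows]; n_cols];
    for j in 0..n_cols {
        for k in m.col_ptr[j]..m.col_ptr[j + 1] {
            cols[j][m.row_idx[k]] = m.values[k];
        }
    }
    cols
}
```

CSC already groups values by column, so each stored entry is written straight into its dense column without any transposition.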

Trait Implementations§

impl Clone for Dataset

fn clone(&self) -> Dataset

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Dataset

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
