Struct FittedOneHotEncoder

pub struct FittedOneHotEncoder<F> { /* private fields */ }

A fitted one-hot encoder holding the sorted-unique category set per input column, plus the precomputed output-column layout.

Created by calling Fit::fit on a OneHotEncoder. The stored category sets mirror scikit-learn's OneHotEncoder.categories_ (a list of arrays holding the actual sorted-unique values, _BaseEncoder._fit:99).
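
As a rough illustration of what fit stores, the sketch below computes one sorted-unique category set per input column. It is not ferrolearn's code: it uses f64 for F and a row-major Vec<Vec<f64>> in place of Array2<F>, the name fit_categories is hypothetical, and it assumes the input contains no NaN (NaN has no order under partial_cmp).

```rust
/// Hypothetical sketch (not ferrolearn's API): collect the sorted-unique
/// values of each column. `rows` is row-major and every row is assumed to
/// have the same length. Assumes no NaN, since NaN cannot be ordered.
fn fit_categories(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n_features = rows.first().map_or(0, |r| r.len());
    (0..n_features)
        .map(|j| {
            // Gather column j.
            let mut col: Vec<f64> = rows.iter().map(|r| r[j]).collect();
            // Sort ascending, then drop adjacent duplicates.
            col.sort_by(|a, b| a.partial_cmp(b).expect("NaN is not supported in this sketch"));
            col.dedup();
            col
        })
        .collect()
}
```

On the fixture [[2,0],[5,1],[9,0],[5,1]] this yields [[2.0, 5.0, 9.0], [0.0, 1.0]].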

Implementations

impl<F: Float + Send + Sync + 'static> FittedOneHotEncoder<F>

pub fn categories(&self) -> &[Vec<F>]

Return the learned sorted-unique category set for each input column (categories_).

categories()[j][idx] is the value encoded by output column offsets[j] + idx. Mirrors scikit-learn’s OneHotEncoder.categories_.
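
The offsets[j] + idx addressing amounts to a prefix sum over the block widths. In this sketch, block_offsets and output_column are hypothetical helpers standing in for the encoder's private layout, and the layout assumes no dropped and no infrequent-grouped categories, so each block is exactly categories_[j].len() wide; the returned total then equals n_output_features().

```rust
/// Hypothetical sketch: start column of each per-feature block, plus the
/// total output width. Assumes block j is exactly `categories[j].len()`
/// columns wide (no drop, no infrequent grouping).
fn block_offsets(categories: &[Vec<f64>]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(categories.len());
    let mut total = 0;
    for cats in categories {
        offsets.push(total); // block j starts where block j-1 ended
        total += cats.len();
    }
    (offsets, total)
}

/// Output column that encodes `categories[j][idx]`.
fn output_column(offsets: &[usize], j: usize, idx: usize) -> usize {
    offsets[j] + idx
}
```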

pub fn n_categories(&self) -> Vec<usize>

Return the number of distinct categories for each input feature column, i.e. the width of each per-feature one-hot block.

pub fn n_features(&self) -> usize

Return the number of input feature columns.

pub fn n_output_features(&self) -> usize

Return the total number of output columns (Σ categories_[j].len()).

pub fn handle_unknown(&self) -> OneHotHandleUnknown

Return the configured unknown-category strategy (handle_unknown), threaded from the unfitted OneHotEncoder.

pub fn drop_idx_(&self) -> &[Option<usize>]

Return the per-feature drop index (drop_idx_).

drop_idx_()[j] is Some(d) if category categories_[j][d] is dropped from feature j’s one-hot block (its block width is one less than categories_[j].len(), and that category encodes to an all-zero block), or None if no category is dropped from that feature. Mirrors scikit-learn’s public drop_idx_ attribute (_encoders.py:608-615). With drop=None (the default) every entry is None.
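
To make the dropped-category rule concrete, the sketch below builds the one-hot block for a single category index of one feature, given an optional drop index. encode_with_drop is hypothetical. It assumes the remaining categories keep their relative order once categories_[j][d] is removed (indices above d shift left by one); that column order is an assumption of this sketch, not something stated above.

```rust
/// Hypothetical sketch: one-hot block for category index `idx` of a feature
/// with `n_cats` categories. With `drop = Some(d)` the block is one column
/// narrower, category `d` encodes to all zeros, and indices above `d` are
/// assumed to shift left by one.
fn encode_with_drop(n_cats: usize, idx: usize, drop: Option<usize>) -> Vec<f64> {
    match drop {
        None => {
            let mut block = vec![0.0; n_cats];
            block[idx] = 1.0;
            block
        }
        Some(d) => {
            let mut block = vec![0.0; n_cats - 1];
            if idx != d {
                // Skip over the dropped column.
                let col = if idx > d { idx - 1 } else { idx };
                block[col] = 1.0;
            }
            block
        }
    }
}
```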

pub fn infrequent_categories(&self) -> Vec<Vec<F>>

Return the infrequent category values for each feature (infrequent_categories_).

infrequent_categories()[j] is the sorted list of category values from categories_[j] that were grouped into the single trailing “infrequent” output column (because their training count fell below min_frequency and/or beyond the max_categories limit). An EMPTY inner Vec means feature j had no infrequent categories (scikit-learn returns None there; an empty list is the representable equivalent). With infrequent grouping disabled every entry is empty. Mirrors scikit-learn’s OneHotEncoder.infrequent_categories_ (_encoders.py:254-262,:625-633): category[indices] over _infrequent_indices.
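
The min_frequency part of that rule can be sketched as a count threshold over one training column. This hypothetical helper covers only an absolute (integer) min_frequency; the max_categories limit and fractional thresholds are left out. Filtering the already-sorted categories_[j] keeps the result sorted, and a feature with no infrequent values yields an empty Vec, as described above.

```rust
/// Hypothetical sketch of the `min_frequency` rule only (no `max_categories`):
/// a category is infrequent when its training count is below `min_frequency`.
/// `categories` is the sorted-unique set for one feature, `column` its training values.
fn infrequent_by_min_frequency(categories: &[f64], column: &[f64], min_frequency: usize) -> Vec<f64> {
    categories
        .iter()
        .copied()
        .filter(|&c| column.iter().filter(|&&v| v == c).count() < min_frequency)
        .collect() // stays sorted because `categories` is sorted
}
```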

pub fn inverse_transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError>

Invert a one-hot encoded matrix back to the original category values.

For each input feature j the per-feature block x[:, offsets[j] .. offsets[j] + categories_[j].len()] is reduced to a single category via argmax (the index of the maximum value in the block, first-max on ties — numpy argmax semantics), and the original value categories_[j][argmax] is written to out[[i, j]]. This mirrors scikit-learn’s OneHotEncoder.inverse_transform (sklearn/preprocessing/_encoders.py:1136-1139): labels = sub.argmax(axis=1); X_tr[:, i] = cats[labels].

After the argmax, a block that sums to zero (an all-zero block) cannot be inverted, because no category is marked in it. With no drop and handle_unknown='error' (the default; REQ-4/5), this is an error, matching sklearn's ValueError("Samples [...] can not be inverted when drop=None and handle_unknown='error' because they contain all zeros") (_encoders.py:1160-1168). A well-formed one-hot row produced by Transform::transform under these settings has exactly one 1 per block, so argmax always finds it and the block sum is never zero.

§Errors

FerroError::InsufficientSamples if x has zero rows (sklearn check_array requires a minimum of 1 sample).
FerroError::ShapeMismatch if x.ncols() != self.n_output_features() (sklearn's "Shape of the passed X data is not correct" ValueError, _encoders.py:1100-1104).
FerroError::InvalidParameter if any per-feature block is all-zero (the sklearn all-zeros ValueError, _encoders.py:1164-1168).

Never panics: every block slice is bounds-checked (R-CODE-2).
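
The inversion procedure above (first-max argmax per block, all-zero rejection, and the row and shape checks) can be sketched without ndarray. Assumptions: f64 for F, Vec<Vec<f64>> for Array2<F>, String errors standing in for the FerroError variants, no drop, and no infrequent grouping; inverse_transform here is a hypothetical free function, not the method itself. Slicing each block through get mirrors the no-panic guarantee.

```rust
/// Hypothetical sketch of the inverse: per feature, take the first argmax of
/// its block and look the category up; a zero-sum block is rejected.
fn inverse_transform(categories: &[Vec<f64>], x: &[Vec<f64>]) -> Result<Vec<Vec<f64>>, String> {
    if x.is_empty() {
        return Err("need at least one sample".to_string());
    }
    let n_output: usize = categories.iter().map(|c| c.len()).sum();
    let mut out = Vec::with_capacity(x.len());
    for (i, row) in x.iter().enumerate() {
        if row.len() != n_output {
            return Err(format!("row {i}: expected {n_output} columns, got {}", row.len()));
        }
        let mut decoded = Vec::with_capacity(categories.len());
        let mut offset = 0;
        for (j, cats) in categories.iter().enumerate() {
            // Bounds-checked slice: a bad layout becomes an Err, not a panic.
            let block = row
                .get(offset..offset + cats.len())
                .ok_or_else(|| format!("feature {j}: block out of bounds"))?;
            offset += cats.len();
            if block.iter().sum::<f64>() == 0.0 {
                return Err(format!("sample {i}, feature {j}: all-zero block cannot be inverted"));
            }
            // First index of the maximum (numpy argmax: earliest wins on ties).
            let mut arg = 0;
            for (k, &v) in block.iter().enumerate() {
                if v > block[arg] {
                    arg = k;
                }
            }
            decoded.push(cats[arg]);
        }
        out.push(decoded);
    }
    Ok(out)
}
```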

pub fn get_feature_names_out(&self) -> Vec<String>

Return the output feature names, one per output column.

For each input feature j, for each category c in categories_[j], emits format!("x{j}_{c}") where c is rendered to match Python’s str(np.float64(c)). This mirrors scikit-learn’s OneHotEncoder.get_feature_names_out with the default input_features (["x0", "x1", ...]) and the "concat" name combiner (feature + "_" + str(category), _encoders.py:1217,1224). For the whole-number fixture [[2,0],[5,1],[9,0],[5,1]] this yields ["x0_2.0", "x0_5.0", "x0_9.0", "x1_0.0", "x1_1.0"].

§Float-rendering divergence (HONEST, R-HONEST-3)

The category is rendered via Self::category_label, which appends .0 to integer-valued floats (2.0 → "2.0", -3.0 → "-3.0", matching Python) and uses Rust’s shortest round-trip Display otherwise (2.5 → "2.5"). For category values in the usual categorical range (small whole or fractional numbers) this is byte-identical to Python. It DIVERGES for extreme magnitudes: Python’s repr/str switches to scientific notation at |v| >= 1e16 and 0 < |v| < 1e-4 (1e+20, 1e-07), while Rust’s Display prints the full decimal (100000000000000000000, 0.0000001). Such values are not plausible one-hot categories; the divergence is documented rather than papered over. NaN renders as "nan" (matching Python’s str(nan)).
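
A sketch of the labelling rule, assuming category_label behaves exactly as the paragraph above describes. The real helper is Self::category_label; category_label and feature_names_out here are hypothetical free functions over f64.

```rust
/// Hypothetical sketch of the label rule: integer-valued floats get a trailing
/// ".0" (Python-style), NaN becomes "nan", everything else uses Rust's Display.
fn category_label(c: f64) -> String {
    if c.is_nan() {
        "nan".to_string()
    } else if c.is_finite() && c.fract() == 0.0 {
        format!("{c}.0") // Display prints 2.0 as "2", so append ".0"
    } else {
        format!("{c}")
    }
}

/// Names of the form "x{j}_{label}" for every category of every feature.
fn feature_names_out(categories: &[Vec<f64>]) -> Vec<String> {
    categories
        .iter()
        .enumerate()
        .flat_map(|(j, cats)| cats.iter().map(move |&c| format!("x{j}_{}", category_label(c))))
        .collect()
}
```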

impl<F: Float + Send + Sync + 'static> FittedOneHotEncoder<F>

Convenience: encode a 1-D array of numeric categories.

This wraps the input in a single-column Array2<F> and returns the encoded result with one-hot columns for that single feature, matching the membership encoding of Transform::transform.

pub fn transform_1d(&self, x: &[F]) -> Result<Array2<F>, FerroError>

Transform a 1-D slice of numeric category values.

§Errors

Returns an error if the encoder was fitted on more than one column, or if any value is an unknown category (not in the learned categories_[0]).
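
A sketch of the 1-D path, assuming the same exact-equality membership test as Transform::transform, with a one-element categories slice standing in for a single-column fit. transform_1d here is a hypothetical free function returning Vec<Vec<f64>> rather than Array2<F>, and String errors replace FerroError.

```rust
/// Hypothetical sketch: refuse multi-column encoders, then encode each value
/// against the single learned category set.
fn transform_1d(categories: &[Vec<f64>], x: &[f64]) -> Result<Vec<Vec<f64>>, String> {
    if categories.len() != 1 {
        return Err(format!("encoder was fitted on {} columns, expected 1", categories.len()));
    }
    let cats = &categories[0];
    x.iter()
        .map(|&v| -> Result<Vec<f64>, String> {
            // Exact-equality membership test; a miss is an unknown category.
            let idx = cats
                .iter()
                .position(|&c| c == v)
                .ok_or_else(|| format!("unknown category {v}"))?;
            let mut row = vec![0.0; cats.len()];
            row[idx] = 1.0;
            Ok(row)
        })
        .collect()
}
```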

Trait Implementations

impl<F: Clone> Clone for FittedOneHotEncoder<F>

fn clone(&self) -> FittedOneHotEncoder<F>

Returns a duplicate of the value.

1.0.0 (const: unstable)

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<F: Debug> Debug for FittedOneHotEncoder<F>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<F: Float + Send + Sync + 'static> Transform<ArrayBase<OwnedRepr<F>, Dim<[usize; 2]>>> for FittedOneHotEncoder<F>

fn transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError>

Transform numeric categorical data into a dense one-hot encoded matrix.

Each value is one-hot by category membership: for input column j the value x[[i, j]] is matched (by exact equality) against categories_[j], and the bit at output column offsets[j] + idx is set, where idx is the value’s position in the sorted-unique set. The per-feature one-hot blocks are concatenated left-to-right, matching scikit-learn’s OneHotEncoder(sparse_output=False) output column layout (_BaseEncoder._transform, _encoders.py:206-240).

A value not present in categories_[j] is an unknown category. Its handling depends on the configured handle_unknown (OneHotEncoder::with_handle_unknown):

OneHotHandleUnknown::Error (the default): returns an error, matching sklearn’s handle_unknown='error' (ValueError("Found unknown categories … during transform"), _encoders.py:209-214).
OneHotHandleUnknown::Ignore: leaves that feature’s one-hot block all-zero for this row (no column is set), matching sklearn’s handle_unknown='ignore' (_encoders.py:215-240: the unknown row is masked out so no encoded column is set). Every KNOWN feature still emits its normal one-hot bit.

The +/-inf rejection (#2225), the ncols guard, and the 0-row handling are unaffected by handle_unknown: a non-finite +/-inf value is invalid input (not an unknown category) and still errors even in Ignore mode.

§Errors

Returns FerroError::ShapeMismatch if the number of columns does not match the number of features seen during fitting.

Returns FerroError::InvalidParameter if any value is an unknown category (not in the learned categories_[j] set) AND handle_unknown is OneHotHandleUnknown::Error (the default); under OneHotHandleUnknown::Ignore an unknown category never errors. Also returned if any value is +/-infinite (invalid input, #2225).
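
The membership encoding, the handle_unknown branch, and the error cases above can be sketched together. HandleUnknown is a hypothetical stand-in for OneHotHandleUnknown, and, as elsewhere, f64, Vec<Vec<f64>>, and String errors replace F, Array2<F>, and FerroError. An empty input simply yields an empty output in this sketch; the real 0-row behavior is not modeled.

```rust
/// Hypothetical stand-in for OneHotHandleUnknown.
#[derive(Clone, Copy, PartialEq, Debug)]
enum HandleUnknown {
    Error,
    Ignore,
}

/// Hypothetical sketch of the membership encoding with per-feature blocks
/// concatenated left to right.
fn transform(categories: &[Vec<f64>], x: &[Vec<f64>], mode: HandleUnknown) -> Result<Vec<Vec<f64>>, String> {
    let n_output: usize = categories.iter().map(|c| c.len()).sum();
    let mut out = Vec::with_capacity(x.len());
    for (i, row) in x.iter().enumerate() {
        // Column-count guard (ShapeMismatch in ferrolearn).
        if row.len() != categories.len() {
            return Err(format!("expected {} columns, got {}", categories.len(), row.len()));
        }
        let mut encoded = vec![0.0; n_output];
        let mut offset = 0;
        for (j, (&v, cats)) in row.iter().zip(categories).enumerate() {
            // +/-inf is invalid input and is rejected regardless of `mode`.
            if v.is_infinite() {
                return Err(format!("sample {i}, feature {j}: infinite value"));
            }
            match cats.iter().position(|&c| c == v) {
                Some(idx) => encoded[offset + idx] = 1.0,
                // Unknown category in Ignore mode: leave this block all-zero.
                None if mode == HandleUnknown::Ignore => {}
                None => return Err(format!("sample {i}, feature {j}: unknown category {v}")),
            }
            offset += cats.len();
        }
        out.push(encoded);
    }
    Ok(out)
}
```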

type Output = ArrayBase<OwnedRepr<F>, Dim<[usize; 2]>>

The transformed output type.

type Error = FerroError

The error type returned by transform.

Auto Trait Implementations

impl<F> UnwindSafe for FittedOneHotEncoder<F>
where F: UnwindSafe,

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> DistributionExt for T
where T: ?Sized,

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
