pub struct FittedOneHotEncoder<F> { /* private fields */ }
A fitted one-hot encoder holding the sorted-unique category set per input column, plus the precomputed output-column layout.
Created by calling Fit::fit on a OneHotEncoder. Mirrors
scikit-learn’s OneHotEncoder.categories_ (a list of arrays of the actual
sorted-unique values, _BaseEncoder._fit:99).
Implementations§
impl<F: Float + Send + Sync + 'static> FittedOneHotEncoder<F>
pub fn categories(&self) -> &[Vec<F>]
Return the learned sorted-unique category set for each input column
(categories_).
categories()[j][idx] is the value encoded by output column
offsets[j] + idx. Mirrors scikit-learn’s OneHotEncoder.categories_.
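The column layout described above can be sketched without the crate. This is a standalone illustration, not ferrolearn's implementation: `block_offsets` and `value_for_column` are hypothetical helpers, and the sketch assumes `offsets` is the running sum of the per-feature category counts, with no dropped or infrequent categories.

```rust
/// Hypothetical helper: start column of each feature's one-hot block,
/// assuming blocks are laid out left to right and nothing is dropped.
fn block_offsets(categories: &[Vec<f64>]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(categories.len());
    let mut next = 0;
    for cats in categories {
        offsets.push(next);
        next += cats.len();
    }
    offsets
}

/// Hypothetical helper: map an output column back to
/// (input feature index, category value encoded by that column).
fn value_for_column(categories: &[Vec<f64>], col: usize) -> Option<(usize, f64)> {
    let offsets = block_offsets(categories);
    for (j, cats) in categories.iter().enumerate() {
        let start = offsets[j];
        if col >= start && col < start + cats.len() {
            // categories[j][idx] is the value encoded by column offsets[j] + idx
            return Some((j, cats[col - start]));
        }
    }
    None // column index past the last block
}
```

With category sets `[[2, 5, 9], [0, 1]]` the offsets are `[0, 3]`, so output column 4 belongs to feature 1 and encodes the value 1.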
pub fn n_categories(&self) -> Vec<usize>
Return the number of distinct categories for each input feature column, i.e. the width of each per-feature one-hot block.
pub fn n_features(&self) -> usize
Return the number of input feature columns.
pub fn n_output_features(&self) -> usize
Return the total number of output columns (Σ categories_[j].len()).
pub fn handle_unknown(&self) -> OneHotHandleUnknown
Return the configured unknown-category strategy (handle_unknown),
threaded from the unfitted OneHotEncoder.
pub fn drop_idx_(&self) -> &[Option<usize>]
Return the per-feature drop index (drop_idx_).
drop_idx_()[j] is Some(d) if category categories_[j][d] is dropped
from feature j’s one-hot block (its block width is one less than
categories_[j].len(), and that category encodes to an all-zero block),
or None if no category is dropped from that feature. Mirrors
scikit-learn’s public drop_idx_ attribute (_encoders.py:608-615). With
drop=None (the default) every entry is None.
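A minimal sketch of how a drop index changes a block, assuming the remaining categories keep their order and shift left past the dropped slot (the scikit-learn convention; whether ferrolearn lays the block out identically is an assumption here). `block_widths` and `encode_block` are hypothetical names, not part of the crate.

```rust
/// Hypothetical helper: width of each feature's one-hot block,
/// one less than the category count when a category is dropped.
fn block_widths(n_categories: &[usize], drop_idx: &[Option<usize>]) -> Vec<usize> {
    n_categories
        .iter()
        .zip(drop_idx)
        .map(|(&n, d)| if d.is_some() { n.saturating_sub(1) } else { n })
        .collect()
}

/// Hypothetical helper: encode category index `idx` (assumed < `n`)
/// into one feature block, honouring an optional dropped index.
fn encode_block(idx: usize, n: usize, drop: Option<usize>) -> Vec<f64> {
    let width = if drop.is_some() { n.saturating_sub(1) } else { n };
    let mut block = vec![0.0; width];
    match drop {
        Some(d) if idx == d => {}                   // dropped category: all-zero block
        Some(d) if idx > d => block[idx - 1] = 1.0, // shift left past the dropped slot
        _ => block[idx] = 1.0,                      // no drop, or index before the drop
    }
    block
}
```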
pub fn infrequent_categories(&self) -> Vec<Vec<F>>
Return the infrequent category values for each feature
(infrequent_categories_).
infrequent_categories()[j] is the sorted list of category values from
categories_[j] that were grouped into the single trailing “infrequent”
output column (because their training count fell below min_frequency
and/or beyond the max_categories limit). An EMPTY inner Vec means
feature j had no infrequent categories (scikit-learn returns None
there; an empty list is the representable equivalent). With infrequent
grouping disabled every entry is empty. Mirrors scikit-learn’s
OneHotEncoder.infrequent_categories_
(_encoders.py:254-262,:625-633): category[indices] over
_infrequent_indices.
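As a rough illustration of the min_frequency half of this rule only (the max_categories limit is not modelled), the sketch below counts each distinct value in one training column and keeps those whose count falls below the threshold. `infrequent_by_min_frequency` is a hypothetical helper and assumes an absolute integer threshold and NaN-free input.

```rust
/// Hypothetical helper: sorted category values of one column whose
/// training count is strictly below `min_frequency`.
fn infrequent_by_min_frequency(column: &[f64], min_frequency: usize) -> Vec<f64> {
    // Sort so equal values are adjacent (this sketch assumes no NaN).
    let mut sorted = column.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("no NaN in this sketch"));

    // Run-length count the sorted values: (category, count).
    let mut counts: Vec<(f64, usize)> = Vec::new();
    for v in sorted {
        if let Some(last) = counts.last_mut() {
            if last.0 == v {
                last.1 += 1;
                continue;
            }
        }
        counts.push((v, 1));
    }

    // Keep the categories seen fewer than `min_frequency` times.
    counts
        .into_iter()
        .filter(|&(_, n)| n < min_frequency)
        .map(|(c, _)| c)
        .collect()
}
```

An empty result corresponds to the empty inner Vec described above.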
pub fn inverse_transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError>
Invert a one-hot encoded matrix back to the original category values.
For each input feature j the per-feature block
x[:, offsets[j] .. offsets[j] + categories_[j].len()] is reduced to a
single category via argmax (the index of the maximum value in the
block, first-max on ties — numpy argmax semantics), and the original
value categories_[j][argmax] is written to out[[i, j]]. This mirrors
scikit-learn’s OneHotEncoder.inverse_transform
(sklearn/preprocessing/_encoders.py:1136-1139):
labels = sub.argmax(axis=1); X_tr[:, i] = cats[labels].
After the argmax, an all-zero block (a row whose per-feature block
sums to zero) cannot be inverted. With no drop and the default
handle_unknown='error' (the only mode ferrolearn ships — REQ-4/5), this
is an error, matching sklearn’s
ValueError("Samples [...] can not be inverted when drop=None and handle_unknown='error' because they contain all zeros")
(_encoders.py:1160-1168). A proper one-hot row from
Transform::transform has exactly one 1 per block, so argmax always
finds it and the block sum is never zero.
§Errors
FerroError::InsufficientSamples if x has zero rows (sklearn check_array requires a minimum of 1 sample).
FerroError::ShapeMismatch if x.ncols() != n_output (sklearn’s “Shape of the passed X data is not correct” ValueError, _encoders.py:1100-1104).
FerroError::InvalidParameter if any per-feature block is all-zero (the sklearn all-zeros ValueError, _encoders.py:1164-1168).
Never panics: every block slice is bounds-checked (R-CODE-2).
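The argmax-then-check procedure can be sketched for a single row with plain slices. This is an illustration under stated assumptions, not the crate's code: `invert_row` is a hypothetical helper, errors are plain strings rather than FerroError, and no category is dropped.

```rust
/// Hypothetical helper: invert one encoded row back to category values.
fn invert_row(row: &[f64], categories: &[Vec<f64>]) -> Result<Vec<f64>, String> {
    let n_output: usize = categories.iter().map(Vec::len).sum();
    if row.len() != n_output {
        return Err(format!("expected {n_output} columns, got {}", row.len()));
    }
    let mut out = Vec::with_capacity(categories.len());
    let mut start = 0;
    for (j, cats) in categories.iter().enumerate() {
        let block = &row[start..start + cats.len()];
        // First-max argmax: only a strictly greater value replaces the best.
        let mut best = 0;
        for (k, &v) in block.iter().enumerate() {
            if v > block[best] {
                best = k;
            }
        }
        // An all-zero block carries no category and cannot be inverted.
        if block.iter().sum::<f64>() == 0.0 {
            return Err(format!("feature {j}: all-zero block cannot be inverted"));
        }
        out.push(cats[best]);
        start += cats.len();
    }
    Ok(out)
}
```

A tie such as `[0.5, 0.5, 0.0]` resolves to the first maximum, so the first category of that block is returned.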
pub fn get_feature_names_out(&self) -> Vec<String>
Return the output feature names, one per output column.
For each input feature j, for each category c in categories_[j],
emits format!("x{j}_{c}") where c is rendered to match Python’s
str(np.float64(c)). This mirrors scikit-learn’s
OneHotEncoder.get_feature_names_out with the default input_features
(["x0", "x1", ...]) and the "concat" name combiner
(feature + "_" + str(category), _encoders.py:1217,1224). For the
whole-number fixture [[2,0],[5,1],[9,0],[5,1]] this yields
["x0_2.0", "x0_5.0", "x0_9.0", "x1_0.0", "x1_1.0"].
§Float-rendering divergence (HONEST, R-HONEST-3)
The category is rendered via Self::category_label, which appends .0
to integer-valued floats (2.0 → "2.0", -3.0 → "-3.0", matching
Python) and uses Rust’s shortest round-trip Display otherwise
(2.5 → "2.5"). For category values in the usual categorical range
(small whole or fractional numbers) this is byte-identical to Python.
It DIVERGES for extreme magnitudes: Python’s repr/str switches to
scientific notation at |v| >= 1e16 and 0 < |v| < 1e-4
(1e+20, 1e-07), while Rust’s Display prints the full decimal
(100000000000000000000, 0.0000001). Such values are not plausible
one-hot categories; the divergence is documented rather than papered over.
NaN renders as "nan" (matching Python’s str(nan)).
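The rendering rule can be approximated in a few lines. The sketch below is an assumption about what Self::category_label does, reconstructed from the description above and fixed to f64; the free functions `category_label` and `feature_names` are illustrative stand-ins, not the crate's API.

```rust
/// Illustrative stand-in for the label rule described above.
fn category_label(v: f64) -> String {
    if v.is_nan() {
        // Rust's Display prints "NaN"; Python's str prints "nan".
        "nan".to_string()
    } else if v.is_finite() && v.fract() == 0.0 {
        // Display prints 2.0 as "2", so append ".0" to match Python.
        format!("{v}.0")
    } else {
        // Shortest round-trip decimal, never scientific notation.
        format!("{v}")
    }
}

/// Illustrative stand-in: one "x{j}_{label}" name per output column.
fn feature_names(categories: &[Vec<f64>]) -> Vec<String> {
    categories
        .iter()
        .enumerate()
        .flat_map(|(j, cats)| {
            cats.iter()
                .map(move |&c| format!("x{j}_{}", category_label(c)))
        })
        .collect()
}
```

For a small value such as 1e-7 this sketch yields "0.0000001", which is the documented divergence from Python's "1e-07".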
impl<F: Float + Send + Sync + 'static> FittedOneHotEncoder<F>
Convenience: encode a 1-D array of numeric categories.
This wraps the input in a single-column Array2<F> and returns the encoded
result with one-hot columns for that single feature, matching the membership
encoding of Transform::transform.
pub fn transform_1d(&self, x: &[F]) -> Result<Array2<F>, FerroError>
Transform a 1-D slice of numeric category values.
§Errors
Returns an error if the encoder was fitted on more than one column, or if
any value is an unknown category (not in the learned categories_[0]).
Trait Implementations§
impl<F: Clone> Clone for FittedOneHotEncoder<F>
fn clone(&self) -> FittedOneHotEncoder<F>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl<F: Debug> Debug for FittedOneHotEncoder<F>
impl<F: Float + Send + Sync + 'static> Transform<ArrayBase<OwnedRepr<F>, Dim<[usize; 2]>>> for FittedOneHotEncoder<F>
fn transform(&self, x: &Array2<F>) -> Result<Array2<F>, FerroError>
Transform numeric categorical data into a dense one-hot encoded matrix.
Each value is one-hot by category membership: for input column j the
value x[[i, j]] is matched (by exact equality) against categories_[j],
and the bit at output column offsets[j] + idx is set, where idx is the
value’s position in the sorted-unique set. The per-feature one-hot blocks
are concatenated left-to-right, matching scikit-learn’s
OneHotEncoder(sparse_output=False) output column layout
(_BaseEncoder._transform, _encoders.py:206-240).
A value not present in categories_[j] is an unknown category. Its
handling depends on the configured handle_unknown
(OneHotEncoder::with_handle_unknown):
OneHotHandleUnknown::Error (the default): returns an error, matching sklearn’s handle_unknown='error' (ValueError("Found unknown categories … during transform"), _encoders.py:209-214).
OneHotHandleUnknown::Ignore: leaves that feature’s one-hot block all-zero for this row (no column is set), matching sklearn’s handle_unknown='ignore' (_encoders.py:215-240: the unknown row is masked out so no encoded column is set). Every KNOWN feature still emits its normal one-hot bit.
The +/-inf rejection (#2225), the ncols guard, and the 0-row handling are
unaffected by handle_unknown: a non-finite +/-inf value is invalid input
(not an unknown category) and still errors even in Ignore mode.
§Errors
Returns FerroError::ShapeMismatch if the number of columns does not
match the number of features seen during fitting.
Returns FerroError::InvalidParameter if any value is an unknown
category (not in the learned categories_[j] set) AND handle_unknown
is OneHotHandleUnknown::Error (the default); under
OneHotHandleUnknown::Ignore an unknown category never errors. Also
returned if any value is +/-infinite (invalid input, #2225).
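The membership encoding and the two unknown-category strategies can be sketched for one row as follows. This is a standalone illustration under stated assumptions: `UnknownPolicy` and `encode_row` are hypothetical stand-ins for OneHotHandleUnknown and Transform::transform, errors are plain strings, and no category is dropped.

```rust
/// Hypothetical stand-in for the unknown-category strategy.
#[derive(Clone, Copy, Debug, PartialEq)]
enum UnknownPolicy {
    Error,
    Ignore,
}

/// Hypothetical helper: one-hot encode a single input row.
fn encode_row(
    row: &[f64],
    categories: &[Vec<f64>],
    policy: UnknownPolicy,
) -> Result<Vec<f64>, String> {
    if row.len() != categories.len() {
        return Err(format!("expected {} features, got {}", categories.len(), row.len()));
    }
    let n_output: usize = categories.iter().map(Vec::len).sum();
    let mut out = vec![0.0; n_output];
    let mut offset = 0;
    for (j, (&v, cats)) in row.iter().zip(categories).enumerate() {
        // Infinite input is invalid regardless of the unknown policy.
        if v.is_infinite() {
            return Err(format!("feature {j}: infinite value is invalid input"));
        }
        // Exact-equality lookup in the sorted-unique category set.
        match cats.iter().position(|&c| c == v) {
            Some(idx) => out[offset + idx] = 1.0,
            None if policy == UnknownPolicy::Ignore => {} // block stays all-zero
            None => return Err(format!("feature {j}: unknown category {v}")),
        }
        offset += cats.len();
    }
    Ok(out)
}
```

Under `Ignore`, an unknown value in feature 0 leaves that block zeroed while feature 1 still sets its bit.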
type Error = FerroError
Auto Trait Implementations§
impl<F> Freeze for FittedOneHotEncoder<F>
impl<F> RefUnwindSafe for FittedOneHotEncoder<F> where
F: RefUnwindSafe,
impl<F> Send for FittedOneHotEncoder<F> where
F: Send,
impl<F> Sync for FittedOneHotEncoder<F> where
F: Sync,
impl<F> Unpin for FittedOneHotEncoder<F> where
F: Unpin,
impl<F> UnsafeUnpin for FittedOneHotEncoder<F>
impl<F> UnwindSafe for FittedOneHotEncoder<F> where
F: UnwindSafe,
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where
T: Clone,
impl<T> DistributionExt for T where
T: ?Sized,
impl<T, U> Imply<T> for U
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP where
SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.
impl<SS, SP> SupersetOf<SS> for SP where
SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
unsafe fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.