pub struct OrdinalEncoder { /* private fields */ }
An unfitted ordinal encoder.
Calling Fit::fit on an Array2<String> learns, for each column, a
mapping from the unique string categories (sorted lexicographically)
to consecutive integers 0, 1, 2, ..., and returns a
FittedOrdinalEncoder.
Unknown categories at transform time are, by default, rejected
(HandleUnknown::Error). Configuring
with_handle_unknown with
HandleUnknown::UseEncodedValue plus
with_unknown_value instead encodes
unknown categories as the supplied sentinel (which may be f64::NAN),
matching scikit-learn’s OrdinalEncoder(handle_unknown='use_encoded_value').
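The default mapping described above can be illustrated with the standard library alone. This is a minimal sketch, not ferrolearn's implementation: `fit_columns` and `encode` are hypothetical helper names, and rows are plain `Vec<&str>` values rather than an `Array2<String>`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper (not part of ferrolearn): learn one sorted
/// category-to-index map per column from row-major string data.
fn fit_columns(rows: &[Vec<&str>]) -> Vec<BTreeMap<String, usize>> {
    let n_cols = rows.first().map_or(0, |r| r.len());
    (0..n_cols)
        .map(|j| {
            // BTreeSet deduplicates and iterates in lexicographic (byte) order.
            let unique: BTreeSet<&str> = rows.iter().map(|r| r[j]).collect();
            unique
                .into_iter()
                .enumerate()
                .map(|(idx, cat)| (cat.to_string(), idx))
                .collect()
        })
        .collect()
}

/// Encode one value as f64; `None` signals a category not seen during fit.
fn encode(map: &BTreeMap<String, usize>, value: &str) -> Option<f64> {
    map.get(value).map(|&idx| idx as f64)
}
```

Each column gets its own map, so the same string may receive different codes in different columns.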
§Examples
use ferrolearn_preprocess::ordinal_encoder::OrdinalEncoder;
use ferrolearn_core::traits::{Fit, Transform};
use ndarray::Array2;
let enc = OrdinalEncoder::new();
let data = Array2::from_shape_vec(
    (3, 2),
    vec![
        "cat".to_string(), "small".to_string(),
        "dog".to_string(), "large".to_string(),
        "cat".to_string(), "small".to_string(),
    ],
).unwrap();
let fitted = enc.fit(&data, &()).unwrap();
let encoded = fitted.transform(&data).unwrap();
// Output is `Array2<f64>`, matching sklearn's `dtype=np.float64` default.
assert_eq!(encoded[[0, 0]], 0.0); // "cat" is index 0 in col 0
assert_eq!(encoded[[1, 0]], 1.0); // "dog" is index 1 in col 0
Implementations§
impl OrdinalEncoder
pub fn new() -> Self
Create a new OrdinalEncoder with scikit-learn’s defaults
(handle_unknown='error', no unknown_value).
pub fn with_categories(self, categories: Vec<Vec<String>>) -> Self
Set the explicit per-column category lists (categories=[list, ...]).
Each categories[j] is the ordered category set for column j, used as
given at fit time — the order is preserved (NOT re-sorted), so the
assigned ordinal indices follow the supplied order, matching
scikit-learn’s OrdinalEncoder(categories=...)
(sklearn/preprocessing/_encoders.py:114).
At fit time the number of lists must equal the number of input columns,
no list may contain duplicates, and (under the default
handle_unknown='error') every value seen in the data must appear in its
column’s list; otherwise Fit::fit returns an error. See Fit::fit
for the exact validation contract.
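The per-column rules above (order preserved, no duplicates, no unseen values under the default strategy) can be sketched as follows. This is an illustration under assumptions, not the crate's code: `encode_with_explicit` is a hypothetical name, errors are plain `String`s rather than `FerroError`, and only a single column is handled.

```rust
use std::collections::HashSet;

/// Hypothetical validation sketch (not ferrolearn's code): checks one
/// column against its user-supplied category list and returns the codes.
fn encode_with_explicit(
    categories: &[&str],
    column: &[&str],
) -> Result<Vec<f64>, String> {
    // Reject duplicate entries in the supplied list.
    let mut seen = HashSet::new();
    for cat in categories {
        if !seen.insert(*cat) {
            return Err(format!("duplicate category: {}", cat));
        }
    }
    // The ordinal code is the position in the list as given (no sorting).
    column
        .iter()
        .map(|v| {
            categories
                .iter()
                .position(|c| c == v)
                .map(|idx| idx as f64)
                .ok_or_else(|| format!("unknown category during fit: {}", v))
        })
        .collect()
}
```

With the list `["small", "medium", "large"]`, the value `"large"` encodes to 2.0 even though it sorts first lexicographically.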
pub fn categories_param(&self) -> &Categories
Return the configured categories strategy (Categories::Auto or
Categories::Explicit).
Named categories_param to avoid colliding with
FittedOrdinalEncoder::categories, which returns the learned
per-column category lists after fitting.
pub fn with_handle_unknown(self, handle_unknown: HandleUnknown) -> Self
Set the unknown-category strategy (handle_unknown).
With HandleUnknown::UseEncodedValue an unknown_value must also be
supplied via with_unknown_value;
otherwise Fit::fit returns an error (matching scikit-learn’s
validation).
pub fn with_unknown_value(self, unknown_value: f64) -> Self
Set the sentinel written for unknown categories under
HandleUnknown::UseEncodedValue. May be f64::NAN.
Setting this while handle_unknown is HandleUnknown::Error causes
Fit::fit to return an error (matching scikit-learn’s validation).
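The difference between the two strategies at transform time can be sketched with a small stand-in. `UnknownPolicy` and `transform_value` are hypothetical names introduced for illustration; the real crate keeps the strategy (HandleUnknown) and the sentinel (unknown_value) as separate settings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the two strategies described above.
#[derive(Clone, Copy)]
enum UnknownPolicy {
    /// Reject values that were not seen during fit.
    Error,
    /// Write the given sentinel for unseen values (may be NaN).
    UseEncodedValue(f64),
}

/// Encode a single value against a learned category-to-index mapping.
fn transform_value(
    mapping: &HashMap<&str, usize>,
    value: &str,
    policy: UnknownPolicy,
) -> Result<f64, String> {
    match (mapping.get(value), policy) {
        // Known category: emit its ordinal index as f64.
        (Some(&idx), _) => Ok(idx as f64),
        // Unknown category: emit the sentinel or fail, depending on policy.
        (None, UnknownPolicy::UseEncodedValue(sentinel)) => Ok(sentinel),
        (None, UnknownPolicy::Error) => Err(format!("unknown category: {}", value)),
    }
}
```

Because a NaN sentinel never compares equal to itself, callers would check it with `is_nan()` rather than `==`.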
pub fn handle_unknown(&self) -> HandleUnknown
Return the configured unknown-category strategy.
pub fn unknown_value(&self) -> Option<f64>
Return the configured unknown-category sentinel, if any.
pub fn with_min_frequency(self, min_frequency: usize) -> Self
Set the minimum-frequency threshold for infrequent grouping
(min_frequency, integer count).
At fit time a category whose count in the training data is strictly
less than min_frequency is grouped with the other infrequent
categories into a single trailing ordinal index n_frequent for that
feature (the frequent categories keep ordinal indices 0..n_frequent in
their original sorted order), matching scikit-learn’s
OrdinalEncoder(min_frequency=...) integer form
(sklearn/preprocessing/_encoders.py:1289-1297, _identify_infrequent
:295-296 category_count < self.min_frequency).
Unlike crate::OneHotEncoder, the infrequent group collapses to ONE
ordinal index (not a one-hot column), so categories_ is unchanged
(all categories retained) — only the emitted ordinal code is shared.
Scope (R-HONEST-3): only the integer-count form is supported. scikit-learn
also accepts a float min_frequency, interpreted as the fraction
min_frequency * n_samples (_encoders.py:1296-1297, :297-299); the
float-fraction form is not yet implemented here.
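The integer-count grouping rule can be sketched for a single column. This is a simplified illustration, not the crate's implementation; `codes_with_min_frequency` is a hypothetical name, and it returns the code assigned to every retained category.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: map each category of one column to its ordinal
/// code, grouping categories with count < min_frequency into one code.
fn codes_with_min_frequency(
    column: &[&str],
    min_frequency: usize,
) -> BTreeMap<String, usize> {
    // Count occurrences; BTreeMap keeps categories in sorted order.
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for v in column {
        *counts.entry(*v).or_insert(0) += 1;
    }
    // Frequent categories take codes 0..n_frequent in sorted order.
    let n_frequent = counts.values().filter(|&&c| c >= min_frequency).count();
    let mut next = 0;
    counts
        .into_iter()
        .map(|(cat, count)| {
            let code = if count >= min_frequency {
                let c = next;
                next += 1;
                c
            } else {
                // All infrequent categories share the trailing index.
                n_frequent
            };
            (cat.to_string(), code)
        })
        .collect()
}
```

The returned map still lists every category, which mirrors the point above that the category list itself is unchanged and only the emitted code is shared.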
pub fn with_max_categories(self, max_categories: usize) -> Self
Set the maximum number of output ordinal codes per feature for infrequent
grouping (max_categories).
At fit time, if a feature would otherwise produce more than
max_categories distinct ordinal codes, the least-frequent categories
are grouped into the single trailing infrequent index so the number of
codes is at most max_categories (the infrequent group itself counts
toward the limit). Mirrors scikit-learn’s
OrdinalEncoder(max_categories=...)
(sklearn/preprocessing/_encoders.py:1301-1315, _identify_infrequent
:303-315).
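A rough sketch of the max_categories rule for one column follows. `codes_with_max_categories` is a hypothetical name, and the tie-breaking among equally frequent categories (earlier sorted category wins) is an assumption of this sketch, not a documented behavior of ferrolearn or scikit-learn.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: limit one column to at most `max_categories`
/// distinct codes by grouping the least-frequent categories together.
fn codes_with_max_categories(
    column: &[&str],
    max_categories: usize,
) -> BTreeMap<String, usize> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for v in column {
        *counts.entry(*v).or_insert(0) += 1;
    }
    // Categories in sorted order, paired with their counts.
    let cats: Vec<(&str, usize)> = counts.into_iter().collect();
    if cats.len() <= max_categories {
        // Under the limit: plain sorted ordinal codes, no grouping.
        return cats
            .iter()
            .enumerate()
            .map(|(i, (c, _))| (c.to_string(), i))
            .collect();
    }
    // The infrequent group uses one code, so keep max_categories - 1.
    let n_frequent = max_categories.saturating_sub(1);
    let mut by_count: Vec<usize> = (0..cats.len()).collect();
    // Stable sort, descending count (tie-breaking here is an assumption).
    by_count.sort_by(|&a, &b| cats[b].1.cmp(&cats[a].1));
    let mut frequent = vec![false; cats.len()];
    for &i in by_count.iter().take(n_frequent) {
        frequent[i] = true;
    }
    let mut next = 0;
    cats.iter()
        .enumerate()
        .map(|(i, (c, _))| {
            let code = if frequent[i] {
                let k = next;
                next += 1;
                k
            } else {
                n_frequent
            };
            (c.to_string(), code)
        })
        .collect()
}
```

With four categories and a limit of 3, the two most frequent keep codes 0 and 1 and the remaining two share code 2.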
pub fn min_frequency(&self) -> Option<usize>
Return the configured minimum-frequency threshold (min_frequency), or
None if infrequent grouping by frequency is disabled.
pub fn max_categories(&self) -> Option<usize>
Return the configured maximum ordinal-code limit (max_categories), or
None if no limit is imposed.
Trait Implementations§
impl Clone for OrdinalEncoder
fn clone(&self) -> OrdinalEncoder
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for OrdinalEncoder
impl Default for OrdinalEncoder
fn default() -> OrdinalEncoder
impl Fit<ArrayBase<OwnedRepr<String>, Dim<[usize; 2]>>, ()> for OrdinalEncoder
fn fit(
    &self,
    x: &Array2<String>,
    _y: &(),
) -> Result<FittedOrdinalEncoder, FerroError>
Fit the encoder by building per-column category-to-index mappings.
With the default categories='auto' (Categories::Auto), categories
are recorded in lexicographic order in each column, matching
scikit-learn’s OrdinalEncoder.categories_.
With explicit categories (Categories::Explicit, set via
OrdinalEncoder::with_categories), the user-provided lists are used in
the given order (NOT re-sorted), and the ordinal indices follow that
order, mirroring scikit-learn (sklearn/preprocessing/_encoders.py:114).
§Errors
Returns FerroError::InsufficientSamples if the input has zero rows.
Returns FerroError::ShapeMismatch if explicit categories are set but
the number of category lists differs from the number of input columns
(sklearn _encoders.py:85-89 “Shape mismatch: if categories is an array,
it has to be of shape (n_features,).”).
Returns FerroError::InvalidParameter if an explicit category list
contains duplicate elements (sklearn _encoders.py:136-141) or, under
the default HandleUnknown::Error, if a value seen in the data is not
in its column’s explicit list (sklearn _encoders.py:153-160 “Found
unknown categories … during fit”). The latter check is skipped under
HandleUnknown::UseEncodedValue.
Returns FerroError::InvalidParameter for the handle_unknown /
unknown_value validation failures (mirroring scikit-learn’s
TypeError/ValueError at _encoders.py:1473-1526): selecting
HandleUnknown::UseEncodedValue without an unknown_value; setting an
unknown_value while in HandleUnknown::Error mode; or an
unknown_value that collides with an already-used encoding index.
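The three handle_unknown / unknown_value failure cases can be sketched as a single check. `Policy` and `validate` are hypothetical names, errors are plain `String`s rather than `FerroError`, and the collision test (`0 <= value < n_categories` for any column) is an assumption about what "already-used encoding index" means.

```rust
/// Hypothetical stand-in for the configured unknown-category strategy.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Policy {
    Error,
    UseEncodedValue,
}

/// Validate the strategy/sentinel pair against the per-column
/// category counts learned during fit.
fn validate(
    policy: Policy,
    unknown_value: Option<f64>,
    n_categories_per_column: &[usize],
) -> Result<(), String> {
    match (policy, unknown_value) {
        (Policy::Error, None) => Ok(()),
        (Policy::UseEncodedValue, None) => {
            Err("UseEncodedValue requires an unknown_value".to_string())
        }
        (Policy::Error, Some(_)) => {
            Err("unknown_value must not be set in Error mode".to_string())
        }
        (Policy::UseEncodedValue, Some(v)) => {
            // A NaN sentinel fails both comparisons, so it never collides.
            for &n in n_categories_per_column {
                if v >= 0.0 && v < n as f64 {
                    return Err(format!("unknown_value {} is already in use", v));
                }
            }
            Ok(())
        }
    }
}
```

Sentinels such as -1.0 or f64::NAN pass this check, since neither falls inside a column's range of assigned codes.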
type Fitted = FittedOrdinalEncoder
The fitted type returned by fit.
type Error = FerroError
The error type returned by fit.
impl FitTransform<ArrayBase<OwnedRepr<String>, Dim<[usize; 2]>>> for OrdinalEncoder
fn fit_transform(&self, x: &Array2<String>) -> Result<Array2<f64>, FerroError>
Fit the encoder on x and return the encoded output in one step.
§Errors
Returns an error if fitting or transformation fails.
type FitError = FerroError
impl Transform<ArrayBase<OwnedRepr<String>, Dim<[usize; 2]>>> for OrdinalEncoder
Implement Transform on the unfitted encoder to satisfy the
FitTransform: Transform supertrait bound.
Auto Trait Implementations§
impl Freeze for OrdinalEncoder
impl RefUnwindSafe for OrdinalEncoder
impl Send for OrdinalEncoder
impl Sync for OrdinalEncoder
impl Unpin for OrdinalEncoder
impl UnsafeUnpin for OrdinalEncoder
impl UnwindSafe for OrdinalEncoder
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T, U> Imply<T> for U
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its
superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its
superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
unsafe fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.