Struct OrdinalEncoder

pub struct OrdinalEncoder { /* private fields */ }

An unfitted ordinal encoder.

Calling Fit::fit on an Array2<String> learns, for each column, a mapping from the unique string categories (sorted lexicographically under the default categories='auto') to consecutive integers 0, 1, 2, ..., and returns a FittedOrdinalEncoder.

By default, unknown categories at transform time are rejected (HandleUnknown::Error). Calling with_handle_unknown(HandleUnknown::UseEncodedValue) together with with_unknown_value instead encodes unknown categories as the supplied sentinel (which may be f64::NAN), matching scikit-learn’s OrdinalEncoder(handle_unknown='use_encoded_value').
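The per-column mapping and the two unknown-category modes can be modeled with the standard library alone. This is a minimal sketch, not the crate's implementation: it handles one column of &str values instead of an Array2<String>, fit_column and transform_column are hypothetical helpers, and passing None as the sentinel stands in for HandleUnknown::Error.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: learn a category -> ordinal code map for one column.
/// BTreeSet deduplicates and iterates in sorted (byte-wise lexicographic)
/// order, so the enumeration index is the code, as with categories='auto'.
fn fit_column(column: &[&str]) -> HashMap<String, usize> {
    let sorted: BTreeSet<&str> = column.iter().copied().collect();
    sorted
        .into_iter()
        .enumerate()
        .map(|(code, cat)| (cat.to_string(), code))
        .collect()
}

/// Hypothetical helper: encode one column as f64 codes.
/// `unknown_value: None` models HandleUnknown::Error (unknown -> Err);
/// `Some(v)` models HandleUnknown::UseEncodedValue with sentinel `v`.
fn transform_column(
    mapping: &HashMap<String, usize>,
    column: &[&str],
    unknown_value: Option<f64>,
) -> Result<Vec<f64>, String> {
    column
        .iter()
        .map(|&cat| match (mapping.get(cat), unknown_value) {
            (Some(&code), _) => Ok(code as f64),
            (None, Some(sentinel)) => Ok(sentinel),
            (None, None) => Err(format!("unknown category {cat:?}")),
        })
        .collect()
}

fn main() {
    let mapping = fit_column(&["dog", "cat", "cat"]);
    // Sorted order: "cat" -> 0, "dog" -> 1.
    assert_eq!(transform_column(&mapping, &["cat", "dog"], None), Ok(vec![0.0, 1.0]));

    // An unseen category is rejected by default...
    assert!(transform_column(&mapping, &["bird"], None).is_err());
    // ...or replaced by the sentinel (here NaN) when one is configured.
    let encoded = transform_column(&mapping, &["bird", "dog"], Some(f64::NAN)).unwrap();
    assert!(encoded[0].is_nan());
    assert_eq!(encoded[1], 1.0);
}
```

Returning the sentinel as a plain f64 is what allows NaN to be used: the output column stays numeric either way.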

Examples

use ferrolearn_preprocess::ordinal_encoder::OrdinalEncoder;
use ferrolearn_core::traits::{Fit, Transform};
use ndarray::Array2;

let enc = OrdinalEncoder::new();
let data = Array2::from_shape_vec(
    (3, 2),
    vec![
        "cat".to_string(), "small".to_string(),
        "dog".to_string(), "large".to_string(),
        "cat".to_string(), "small".to_string(),
    ],
).unwrap();
let fitted = enc.fit(&data, &()).unwrap();
let encoded = fitted.transform(&data).unwrap();
// Output is `Array2<f64>`, matching sklearn's `dtype=np.float64` default.
assert_eq!(encoded[[0, 0]], 0.0); // "cat" is index 0 in col 0
assert_eq!(encoded[[1, 0]], 1.0); // "dog" is index 1 in col 0

Implementations

impl OrdinalEncoder

pub fn new() -> Self

Create a new OrdinalEncoder with scikit-learn’s defaults (handle_unknown='error', no unknown_value).

pub fn with_categories(self, categories: Vec<Vec<String>>) -> Self

Set the explicit per-column category lists (categories=[list, ...]).

Each categories[j] is the ordered category set for column j and is used as given at fit time: the order is preserved (NOT re-sorted), so the assigned ordinal indices follow the supplied order, matching scikit-learn’s OrdinalEncoder(categories=...) (sklearn/preprocessing/_encoders.py:114).

At fit time the number of lists must equal the number of input columns, no list may contain duplicates, and (under the default handle_unknown='error') every value seen in the data must appear in its column’s list; otherwise Fit::fit returns an error. See Fit::fit for the exact validation contract.
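The explicit-categories contract can be sketched as follows. This is a hypothetical std-only model, not the crate's code: fit_explicit takes each column as a Vec<&str>, the boolean error_on_unknown stands in for handle_unknown, and the error strings are placeholders for FerroError::ShapeMismatch and FerroError::InvalidParameter.

```rust
use std::collections::HashMap;

/// Hypothetical model of the explicit-categories contract at fit time.
/// `error_on_unknown` stands in for handle_unknown == Error.
/// Codes follow the order of each supplied list; nothing is re-sorted.
fn fit_explicit(
    columns: &[Vec<&str>],
    categories: &[Vec<&str>],
    error_on_unknown: bool,
) -> Result<Vec<HashMap<String, usize>>, String> {
    // One category list per input column.
    if categories.len() != columns.len() {
        return Err(format!(
            "shape mismatch: {} category lists for {} columns",
            categories.len(),
            columns.len()
        ));
    }
    let mut mappings = Vec::with_capacity(columns.len());
    for (j, (column, list)) in columns.iter().zip(categories).enumerate() {
        let mut mapping = HashMap::new();
        for (code, &cat) in list.iter().enumerate() {
            // `insert` returns Some(old) when the key was already present.
            if mapping.insert(cat.to_string(), code).is_some() {
                return Err(format!("duplicate category {cat:?} in column {j}"));
            }
        }
        // Every value seen in the data must be listed (error mode only).
        if error_on_unknown {
            if let Some(bad) = column.iter().find(|v| !mapping.contains_key(**v)) {
                return Err(format!("unknown category {bad:?} in column {j} during fit"));
            }
        }
        mappings.push(mapping);
    }
    Ok(mappings)
}

fn main() {
    let columns = vec![vec!["low", "high", "low"]];
    // The supplied order is kept: low -> 0, high -> 1, medium -> 2.
    let m = fit_explicit(&columns, &[vec!["low", "high", "medium"]], true).unwrap();
    assert_eq!(m[0]["low"], 0);
    assert_eq!(m[0]["high"], 1);
    assert_eq!(m[0]["medium"], 2);

    assert!(fit_explicit(&columns, &[vec!["low", "low"]], true).is_err()); // duplicate
    assert!(fit_explicit(&columns, &[vec!["low"]], true).is_err()); // "high" not listed
    assert!(fit_explicit(&columns, &[vec!["low"]], false).is_ok()); // check skipped
    assert!(fit_explicit(&columns, &[], true).is_err()); // shape mismatch
}
```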

pub fn categories_param(&self) -> &Categories

Return the configured categories strategy (Categories::Auto or Categories::Explicit).

Named categories_param to avoid colliding with FittedOrdinalEncoder::categories, which returns the learned per-column category lists after fitting.

pub fn with_handle_unknown(self, handle_unknown: HandleUnknown) -> Self

Set the unknown-category strategy (handle_unknown).

With HandleUnknown::UseEncodedValue an unknown_value must also be supplied via with_unknown_value; otherwise Fit::fit returns an error (matching scikit-learn’s validation).

pub fn with_unknown_value(self, unknown_value: f64) -> Self

Set the sentinel written for unknown categories under HandleUnknown::UseEncodedValue. May be f64::NAN.

Setting this while handle_unknown is HandleUnknown::Error causes Fit::fit to return an error (matching scikit-learn’s validation).
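The handle_unknown / unknown_value rules listed under Fit::fit can be expressed as one validation step. This is a hedged sketch: HandleUnknown here is a local two-variant stand-in for the crate's enum, validate is a hypothetical helper, and it assumes that a sentinel "collides" when it equals one of the integer codes 0..n_categories used by some column (infrequent grouping is ignored). NaN never collides because it compares unequal to every code.

```rust
/// Local stand-in for the crate's HandleUnknown (assumption: only the two
/// variants named in these docs).
#[derive(Clone, Copy, Debug, PartialEq)]
enum HandleUnknown {
    Error,
    UseEncodedValue,
}

/// Hypothetical validator for the three parameter rules.
/// `n_categories[j]` is the number of learned categories in column j.
fn validate(
    handle_unknown: HandleUnknown,
    unknown_value: Option<f64>,
    n_categories: &[usize],
) -> Result<(), String> {
    match (handle_unknown, unknown_value) {
        (HandleUnknown::UseEncodedValue, None) => {
            Err("use_encoded_value requires an unknown_value".into())
        }
        (HandleUnknown::Error, Some(_)) => {
            Err("unknown_value is only valid with use_encoded_value".into())
        }
        (HandleUnknown::UseEncodedValue, Some(v)) => {
            // Collision: the sentinel equals a code already used by a column.
            // NaN == x is always false, so NaN never collides.
            let collides = n_categories
                .iter()
                .any(|&n| (0..n).any(|code| code as f64 == v));
            if collides {
                Err(format!("unknown_value {v} is already used as an encoding"))
            } else {
                Ok(())
            }
        }
        (HandleUnknown::Error, None) => Ok(()),
    }
}

fn main() {
    let n: [usize; 2] = [2, 3]; // two columns with 2 and 3 categories
    assert!(validate(HandleUnknown::Error, None, &n).is_ok());
    assert!(validate(HandleUnknown::UseEncodedValue, None, &n).is_err());
    assert!(validate(HandleUnknown::Error, Some(-1.0), &n).is_err());
    assert!(validate(HandleUnknown::UseEncodedValue, Some(2.0), &n).is_err()); // code 2 in column 1
    assert!(validate(HandleUnknown::UseEncodedValue, Some(-1.0), &n).is_ok());
    assert!(validate(HandleUnknown::UseEncodedValue, Some(f64::NAN), &n).is_ok());
}
```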

pub fn handle_unknown(&self) -> HandleUnknown

Return the configured unknown-category strategy.

pub fn unknown_value(&self) -> Option<f64>

Return the configured unknown-category sentinel, if any.

pub fn with_min_frequency(self, min_frequency: usize) -> Self

Set the minimum-frequency threshold for infrequent grouping (min_frequency, integer count).

At fit time, a category whose count in the training data is strictly less than min_frequency is grouped with the other infrequent categories of that feature into a single trailing ordinal index, n_frequent; the frequent categories keep ordinal indices 0..n_frequent in their original sorted order. This matches the integer form of scikit-learn’s OrdinalEncoder(min_frequency=...) (sklearn/preprocessing/_encoders.py:1289-1297; _identify_infrequent, :295-296: category_count < self.min_frequency).

Unlike crate::OneHotEncoder, where the infrequent group becomes a one-hot column, here the group collapses to ONE ordinal index. categories_ is therefore unchanged (all categories are retained); only the emitted ordinal code is shared.

Scope (R-HONEST-3): only the integer-count form is supported. scikit-learn also accepts a float min_frequency, interpreted as the fraction min_frequency * n_samples (_encoders.py:1296-1297, :297-299); the float-fraction form is not yet implemented here.
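Integer min_frequency grouping for a single column can be sketched like this. The helper group_by_min_frequency is hypothetical and works on &str slices; categories is assumed to already be in code order (sorted under categories='auto'). In this sketch, a listed category that never occurs in the column has count zero and is therefore infrequent whenever min_frequency is at least 1.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of integer min_frequency grouping for one column.
/// `categories` is the column's category list in code order;
/// returns the final ordinal code of each category.
fn group_by_min_frequency(
    categories: &[&str],
    column: &[&str],
    min_frequency: usize,
) -> HashMap<String, usize> {
    // Count occurrences of each value in the training column.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &value in column {
        *counts.entry(value).or_insert(0) += 1;
    }
    // Strictly less than min_frequency => infrequent.
    let is_frequent = |cat: &str| counts.get(cat).copied().unwrap_or(0) >= min_frequency;
    let n_frequent = categories.iter().filter(|&&c| is_frequent(c)).count();

    let mut codes = HashMap::new();
    let mut next = 0;
    for &cat in categories {
        if is_frequent(cat) {
            // Frequent categories keep their relative order: 0..n_frequent.
            codes.insert(cat.to_string(), next);
            next += 1;
        } else {
            // Every infrequent category shares the trailing code n_frequent.
            codes.insert(cat.to_string(), n_frequent);
        }
    }
    codes
}

fn main() {
    let column = ["a", "b", "b", "c", "c", "c", "d"];
    let codes = group_by_min_frequency(&["a", "b", "c", "d"], &column, 2);
    // "a" and "d" appear once (< 2), so they share code 2.
    assert_eq!(codes["b"], 0);
    assert_eq!(codes["c"], 1);
    assert_eq!(codes["a"], 2);
    assert_eq!(codes["d"], 2);
}
```

The returned map still has one entry per category, mirroring the point above that categories_ keeps every category while several of them share an emitted code.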

pub fn with_max_categories(self, max_categories: usize) -> Self

Set the maximum number of output ordinal codes per feature for infrequent grouping (max_categories).

At fit time, if a feature would otherwise produce more than max_categories distinct ordinal codes, the least-frequent categories are grouped into the single trailing infrequent index so the number of codes is at most max_categories (the infrequent group itself counts toward the limit). Mirrors scikit-learn’s OrdinalEncoder(max_categories=...) (sklearn/preprocessing/_encoders.py:1301-1315, _identify_infrequent :303-315).
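The max_categories rule as worded above can be sketched for one column whose per-category training counts are already known. group_by_max_categories is a hypothetical helper, and two details are assumptions of this sketch rather than documented behavior: ties between equal counts favor the earlier category (a stable sort), and grouping is applied only when the number of categories exceeds max_categories.

```rust
/// Hypothetical sketch of max_categories grouping for one column.
/// `counts[i]` is the training count of the category with code i.
/// Returns the final ordinal code of each category.
fn group_by_max_categories(counts: &[usize], max_categories: usize) -> Vec<usize> {
    let n = counts.len();
    if n <= max_categories {
        // Within the limit: every category keeps its own code.
        return (0..n).collect();
    }
    // The infrequent group uses one code, so keep max_categories - 1 categories.
    let keep = max_categories.saturating_sub(1);
    let mut order: Vec<usize> = (0..n).collect();
    // sort_by is stable: equal counts keep the earlier category first.
    order.sort_by(|&a, &b| counts[b].cmp(&counts[a]));
    let mut frequent = vec![false; n];
    for &i in &order[..keep] {
        frequent[i] = true;
    }
    // Infrequent categories share the trailing code `keep` (= n_frequent);
    // frequent ones are renumbered 0..keep in their original order.
    let mut codes = vec![keep; n];
    let mut next = 0;
    for (code, &is_frequent) in codes.iter_mut().zip(&frequent) {
        if is_frequent {
            *code = next;
            next += 1;
        }
    }
    codes
}

fn main() {
    // Training counts for categories a, b, c, d (codes 0..4).
    let counts: [usize; 4] = [5, 1, 3, 1];
    // At most 3 codes: keep the 2 most frequent (a, c); b and d share code 2.
    assert_eq!(group_by_max_categories(&counts, 3), vec![0, 2, 1, 2]);
    // 4 categories fit within a limit of 5, so nothing is grouped.
    assert_eq!(group_by_max_categories(&counts, 5), vec![0, 1, 2, 3]);
}
```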

pub fn min_frequency(&self) -> Option<usize>

Return the configured minimum-frequency threshold (min_frequency), or None if infrequent grouping by frequency is disabled.

pub fn max_categories(&self) -> Option<usize>

Return the configured maximum ordinal-code limit (max_categories), or None if no limit is imposed.

Trait Implementations

impl Clone for OrdinalEncoder

fn clone(&self) -> OrdinalEncoder

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for OrdinalEncoder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for OrdinalEncoder

fn default() -> OrdinalEncoder

Returns the “default value” for a type.

impl Fit<ArrayBase<OwnedRepr<String>, Dim<[usize; 2]>>, ()> for OrdinalEncoder

fn fit(&self, x: &Array2<String>, _y: &()) -> Result<FittedOrdinalEncoder, FerroError>

Fit the encoder by building per-column category-to-index mappings.

With the default categories='auto' (Categories::Auto), categories are recorded in lexicographic order in each column, matching scikit-learn’s OrdinalEncoder.categories_.

With explicit categories (Categories::Explicit, set via OrdinalEncoder::with_categories), the user-provided lists are used in the given order (NOT re-sorted), and the ordinal indices follow that order, mirroring scikit-learn (sklearn/preprocessing/_encoders.py:114).

Errors

Returns FerroError::InsufficientSamples if the input has zero rows.

Returns FerroError::ShapeMismatch if explicit categories are set but the number of category lists differs from the number of input columns (sklearn _encoders.py:85-89 “Shape mismatch: if categories is an array, it has to be of shape (n_features,).”).

Returns FerroError::InvalidParameter if an explicit category list contains duplicate elements (sklearn _encoders.py:136-141), or, under the default HandleUnknown::Error, if a value seen in the data is not in its column’s explicit list (sklearn _encoders.py:153-160 “Found unknown categories … during fit”). The latter check is skipped under HandleUnknown::UseEncodedValue.

Returns FerroError::InvalidParameter for the handle_unknown / unknown_value validation failures (mirroring scikit-learn’s TypeError/ValueError at _encoders.py:1473-1526): selecting HandleUnknown::UseEncodedValue without an unknown_value; setting an unknown_value while in HandleUnknown::Error mode; or an unknown_value that collides with an already-used encoding index.

type Fitted = FittedOrdinalEncoder

The fitted model type returned by fit.

type Error = FerroError

The error type returned by fit.

impl FitTransform<ArrayBase<OwnedRepr<String>, Dim<[usize; 2]>>> for OrdinalEncoder

fn fit_transform(&self, x: &Array2<String>) -> Result<Array2<f64>, FerroError>

Fit the encoder on x and return the encoded output in one step.

Errors

Returns an error if fitting or transformation fails.

type FitError = FerroError

The error type for the combined fit-transform operation.

impl Transform<ArrayBase<OwnedRepr<String>, Dim<[usize; 2]>>> for OrdinalEncoder

Transform is implemented on the unfitted encoder only to satisfy the Transform supertrait bound of FitTransform.

fn transform(&self, _x: &Array2<String>) -> Result<Array2<f64>, FerroError>

Always returns an error — the encoder must be fitted first.

type Output = ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>

The transformed output type.

type Error = FerroError

The error type returned by transform.
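The supertrait relationship can be sketched with simplified stand-in traits. Assumption: the Transform and FitTransform traits below are hypothetical std-only reductions written for illustration (a single column of &str, String errors); they are not ferrolearn_core's actual definitions. Because of the bound trait FitTransform: Transform, the unfitted type must provide a transform method even though, without a learned mapping, it can only fail.

```rust
/// Hypothetical stand-ins; the real traits are generic over input types.
trait Transform {
    fn transform(&self, x: &[&str]) -> Result<Vec<f64>, String>;
}

/// Supertrait bound: implementing FitTransform requires Transform.
trait FitTransform: Transform {
    fn fit_transform(&self, x: &[&str]) -> Result<Vec<f64>, String>;
}

struct Unfitted;

struct Fitted {
    categories: Vec<String>, // sorted and deduplicated
}

impl Unfitted {
    fn fit(&self, x: &[&str]) -> Fitted {
        let mut categories: Vec<String> = x.iter().map(|&s| s.to_string()).collect();
        categories.sort();
        categories.dedup();
        Fitted { categories }
    }
}

impl Transform for Fitted {
    fn transform(&self, x: &[&str]) -> Result<Vec<f64>, String> {
        x.iter()
            .map(|&s| {
                // The code is the category's position in the sorted list.
                self.categories
                    .binary_search_by(|c| c.as_str().cmp(s))
                    .map(|i| i as f64)
                    .map_err(|_| format!("unknown category {s:?}"))
            })
            .collect()
    }
}

// Required only by the supertrait bound: an unfitted encoder has no mapping.
impl Transform for Unfitted {
    fn transform(&self, _x: &[&str]) -> Result<Vec<f64>, String> {
        Err("encoder must be fitted before calling transform".to_string())
    }
}

impl FitTransform for Unfitted {
    fn fit_transform(&self, x: &[&str]) -> Result<Vec<f64>, String> {
        self.fit(x).transform(x)
    }
}

fn main() {
    let enc = Unfitted;
    assert!(enc.transform(&["cat"]).is_err());
    assert_eq!(enc.fit_transform(&["dog", "cat"]), Ok(vec![1.0, 0.0]));
}
```

Keeping the fitted state in a separate type means a successful transform is only possible on a value produced by fit; the unfitted transform exists purely so the trait hierarchy type-checks.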

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> DistributionExt for T
where T: ?Sized,

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of the pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a value with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self.

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).

unsafe fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V