FittedOrdinalEncoder

Struct FittedOrdinalEncoder 

Source
pub struct FittedOrdinalEncoder { /* private fields */ }

A fitted ordinal encoder holding per-column category-to-index mappings.

Created by calling Fit::fit on an OrdinalEncoder.

Implementations§

Source§

impl FittedOrdinalEncoder

Source

pub fn categories(&self) -> &[Vec<String>]

Return the ordered category list for each column.

categories()[j][i] is the category that maps to integer i in column j.
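The indexing convention can be illustrated with a small standalone sketch. It uses only the standard library; `category_for` is a hypothetical helper (not part of this crate) and the nested `Vec` merely stands in for the slice that `categories()` returns.

```rust
/// Hypothetical helper: look up the category that maps to ordinal `i`
/// in column `j`. Returns `None` if either index is out of bounds.
fn category_for(categories: &[Vec<String>], j: usize, i: usize) -> Option<&str> {
    categories.get(j)?.get(i).map(String::as_str)
}

fn main() {
    // Stand-in for the value returned by `categories()`: one list per column.
    let categories = vec![
        vec!["blue".to_string(), "green".to_string(), "red".to_string()],
        vec!["L".to_string(), "M".to_string()],
    ];
    // Column 0, ordinal 2.
    println!("{:?}", category_for(&categories, 0, 2));
    // Column 1 has only two categories, so ordinal 2 has no entry.
    println!("{:?}", category_for(&categories, 1, 2));
}
```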

Source

pub fn infrequent_categories(&self) -> Vec<Vec<String>>

Return the infrequent category values for each feature (infrequent_categories_).

infrequent_categories()[j] is the sorted list of category values from categories[j] that were grouped into the single trailing “infrequent” ordinal code (because their training count fell below min_frequency and/or beyond the max_categories limit). An EMPTY inner Vec means feature j had no infrequent categories (scikit-learn returns None there; an empty list is the representable equivalent). With infrequent grouping disabled every entry is empty. Mirrors scikit-learn’s OrdinalEncoder.infrequent_categories_ (_encoders.py:255-262): category[indices] over _infrequent_indices.
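As a rough illustration of the min_frequency rule only, the sketch below counts the values of one column and returns, in sorted order, those whose count falls below the threshold. `infrequent_by_min_frequency` is a hypothetical standalone helper, not this crate's implementation, and it deliberately ignores the max_categories limit.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: sorted categories of one training column whose
/// count is strictly below `min_frequency`. An empty result means the
/// column has no infrequent categories under this rule.
fn infrequent_by_min_frequency(column: &[&str], min_frequency: usize) -> Vec<String> {
    // BTreeMap keeps keys sorted, so the output is sorted as well.
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for &value in column {
        *counts.entry(value).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter(|&(_, count)| count < min_frequency)
        .map(|(category, _)| category.to_string())
        .collect()
}

fn main() {
    let column = ["a", "a", "a", "b", "c", "c"];
    // "b" appears once, which is below a threshold of 2.
    println!("{:?}", infrequent_by_min_frequency(&column, 2));
    // With a threshold of 1 nothing is infrequent.
    println!("{:?}", infrequent_by_min_frequency(&column, 1));
}
```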

Source

pub fn n_features(&self) -> usize

Return the number of input columns (features).

Source

pub fn n_features_in(&self) -> usize

Return the number of features seen during fit.

Mirrors scikit-learn’s n_features_in_ attribute (set by _validate_data at fit, sklearn/base.py). Equal to n_features; the distinct name matches sklearn’s fitted-attribute surface (REQ-10).

Source

pub fn get_feature_names_out( &self, input_features: Option<&[String]>, ) -> Result<Vec<String>, FerroError>

Return the output feature names, one per input feature.

OrdinalEncoder is a OneToOneFeatureMixin (one output column per input column), so get_feature_names_out returns the INPUT feature names unchanged (sklearn/utils/_set_output / OneToOneFeatureMixin.get_feature_names_out). With input_features = None it returns the default names ["x0", "x1", ...] (_check_feature_names_in); otherwise it returns the supplied names verbatim.

§Errors

Returns FerroError::ShapeMismatch if input_features is Some but its length differs from n_features_in (sklearn raises ValueError("input_features should have length equal to number of features ...")).
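The naming rule and its length check can be sketched without the crate. `feature_names_out` below is a hypothetical free function that uses a plain `String` error in place of FerroError::ShapeMismatch; it shows the described behaviour and is not the method's actual body.

```rust
/// Hypothetical stand-in for the one-to-one naming rule: default names
/// `x0, x1, ...` when no names are given, otherwise the supplied names
/// verbatim, with an error when the length does not match.
fn feature_names_out(
    n_features_in: usize,
    input_features: Option<&[String]>,
) -> Result<Vec<String>, String> {
    match input_features {
        None => Ok((0..n_features_in).map(|i| format!("x{i}")).collect()),
        Some(names) if names.len() == n_features_in => Ok(names.to_vec()),
        Some(names) => Err(format!(
            "expected {n_features_in} feature names, got {}",
            names.len()
        )),
    }
}

fn main() {
    let supplied = vec!["color".to_string(), "size".to_string()];
    println!("{:?}", feature_names_out(2, None));
    println!("{:?}", feature_names_out(2, Some(supplied.as_slice())));
    // One name too many: this is the mismatch case.
    println!("{:?}", feature_names_out(1, Some(supplied.as_slice())));
}
```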

Source

pub fn handle_unknown(&self) -> HandleUnknown

Return the configured unknown-category strategy.

Source

pub fn unknown_value(&self) -> Option<f64>

Return the configured unknown-category sentinel, if any.

Source

pub fn inverse_transform( &self, x: &Array2<f64>, ) -> Result<Array2<String>, FerroError>

Convert ordinal indices back to the original category strings.

This is the inverse of Transform::transform: each f64 cell is read as an ordinal index into the per-column categories_ learned at fit time, and the corresponding category string is returned. Because it reuses the SHIPPED categories_ (REQ-1), inverse_transform(transform(X)) == X for any X whose every category was seen during fit (a bit-exact roundtrip on the default Error-mode encoder). Mirrors scikit-learn’s OrdinalEncoder.inverse_transform (sklearn/preprocessing/_encoders.py:1595), X_tr[:, i] = self.categories_[i][labels].
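The roundtrip property can be seen in a single-column, standard-library sketch. `fit_categories`, `encode`, and `decode` are hypothetical helpers that stand in for fit, transform, and inverse_transform; `decode` assumes it is handed a valid ordinal produced by `encode`.

```rust
/// Hypothetical "fit": the sorted, de-duplicated categories of one column.
fn fit_categories(column: &[&str]) -> Vec<String> {
    let mut cats: Vec<String> = column.iter().map(|s| s.to_string()).collect();
    cats.sort();
    cats.dedup();
    cats
}

/// Hypothetical "transform" for one cell: position in the sorted list as f64.
fn encode(cats: &[String], value: &str) -> Option<f64> {
    cats.binary_search_by(|c| c.as_str().cmp(value))
        .ok()
        .map(|i| i as f64)
}

/// Hypothetical "inverse_transform" for one cell. Only meaningful for a
/// valid ordinal produced by `encode`; no wrapping or validation is done.
fn decode(cats: &[String], code: f64) -> Option<&str> {
    cats.get(code as usize).map(String::as_str)
}

fn main() {
    let column = ["red", "blue", "green", "blue"];
    let cats = fit_categories(&column);
    for value in column {
        let code = encode(&cats, value).expect("value was seen during fit");
        // Decoding the code returns the original string.
        assert_eq!(decode(&cats, code), Some(value));
        println!("{value} -> {code} -> {:?}", decode(&cats, code));
    }
}
```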

§Index contract (faithful to sklearn / numpy)

Mirrors sklearn’s labels.astype("int64") (_encoders.py:1664) followed by numpy fancy indexing categories_[j][labels] (:1679):

  • truncates non-integers toward zero (1.5 → index 1 → that category; 0.7 → 0); Rust’s f64 as i64 matches the C-style cast.
  • wraps small negatives via numpy negative indexing (-1.0 → categories_[j][len-1], the LAST category; -2.0 → len-2), raising only when the wrapped index still falls outside [0, len) (-3.0 with 2 categories → IndexError).
  • errors on an out-of-range positive ordinal (9.0 with 2 categories → sklearn IndexError) and on a non-finite cell (NaN/±inf overflow the astype("int64") cast → sklearn IndexError/ValueError; guarded explicitly because Rust’s f64 as i64 maps NaN to 0 and saturates ±inf to the i64 bounds, which would diverge).

The roundtrip, held-out valid-ordinal, truncation, and negative-wrap paths all match sklearn; out-of-range / non-finite both error.
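The three rules above can be sketched as a single standalone function. `resolve_index` is a hypothetical name and uses a `String` error rather than FerroError; it follows the contract as written here and is not taken from the crate's source.

```rust
/// Hypothetical sketch of the index contract for one cell:
/// reject non-finite values, truncate toward zero, wrap negatives
/// numpy-style, then range-check against `[0, len)`.
fn resolve_index(cell: f64, len: usize) -> Result<usize, String> {
    // Guard first: `NaN as i64` is 0 and infinities saturate in Rust,
    // so without this check they would silently pick a category.
    if !cell.is_finite() {
        return Err(format!("non-finite ordinal {cell}"));
    }
    // `as i64` truncates toward zero (1.5 -> 1, 0.7 -> 0, -0.7 -> 0).
    let truncated = cell as i64;
    // Negative indices count from the end, as in numpy.
    let wrapped = if truncated < 0 {
        truncated + len as i64
    } else {
        truncated
    };
    if wrapped < 0 || wrapped >= len as i64 {
        return Err(format!("ordinal {cell} is out of range for {len} categories"));
    }
    Ok(wrapped as usize)
}

fn main() {
    // Two categories per column in every case below.
    for cell in [1.5, 0.7, -1.0, -2.0, -3.0, 9.0, f64::NAN, f64::INFINITY] {
        println!("{cell:>5} -> {:?}", resolve_index(cell, 2));
    }
}
```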

§use_encoded_value → None (SCOPE LIMITATION, R-HONEST-3)

With HandleUnknown::UseEncodedValue, sklearn maps a cell equal to unknown_value back to None (_encoders.py:1673, X_tr[mask, idx] = None). ferrolearn’s Array2<String> output container cannot represent None (it would require Array2<Option<String>>). The configured unknown_value is itself out of the valid [0, len) range (e.g. -1), so such a cell hits the out-of-range error path: this inverse therefore ERRORS where sklearn returns [[None, ...]]. This is a documented divergence, not a silent wrong-string — the honest behavior is to error rather than fabricate a category. The default Error-mode encoder produces only valid ordinals, so its inverse is COMPLETE and bit-exact.

§Errors

Returns FerroError::InsufficientSamples if the input has zero rows (symmetry with transform’s #2220 guard and sklearn’s check_array).

Returns FerroError::ShapeMismatch if the number of columns does not match the number of features seen during fitting (sklearn’s _encoders.py:1619 “Shape of the passed X data is not correct”).

Returns FerroError::InvalidParameter if any cell is non-finite or, after the truncation and negative wrapping described in the index contract above, still falls outside [0, categories_[j].len()) (sklearn’s IndexError).

Trait Implementations§

Source§

impl Clone for FittedOrdinalEncoder

Source§

fn clone(&self) -> FittedOrdinalEncoder

Returns a duplicate of the value.
1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl Debug for FittedOrdinalEncoder

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl Transform<ArrayBase<OwnedRepr<String>, Dim<[usize; 2]>>> for FittedOrdinalEncoder

Source§

fn transform(&self, x: &Array2<String>) -> Result<Array2<f64>, FerroError>

Transform string categories to ordinal indices, returned as f64.

Each cell is the (lexicographic) category index cast to f64. The ordinal VALUES are unchanged from the integer mapping; only the output container dtype is f64, matching scikit-learn’s OrdinalEncoder(dtype=np.float64) default (sklearn/preprocessing/_encoders.py:1262). A configurable non-float64 output dtype (e.g. int32) is OUT OF SCOPE here — ferrolearn’s output is the fixed sklearn DEFAULT f64; a dtype param is a follow-on design (blocker #1158). f64 exactly represents every integer up to 2^53, so the cast is lossless for any realistic category count.
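A standard-library sketch of the two claims in this paragraph: codes follow lexicographic order of the categories, and an integer-to-f64 cast is exact up to 2^53. `lexicographic_codes` is a hypothetical single-column helper, not the crate's transform.

```rust
/// Hypothetical helper: encode one column as f64 ordinals, where each
/// ordinal is the value's position among the sorted unique categories.
fn lexicographic_codes(column: &[&str]) -> Vec<f64> {
    let mut cats: Vec<&str> = column.to_vec();
    cats.sort_unstable();
    cats.dedup();
    column
        .iter()
        .map(|v| {
            cats.binary_search(v)
                .expect("every value comes from the same column") as f64
        })
        .collect()
}

fn main() {
    // Sorted categories are ["blue", "green", "red"].
    println!("{:?}", lexicographic_codes(&["red", "blue", "green", "blue"]));

    // 2^53 survives the round trip through f64 unchanged ...
    let limit: u64 = 1 << 53;
    assert_eq!(limit as f64 as u64, limit);
    // ... while 2^53 + 1 is not representable and rounds back to 2^53.
    assert_eq!((limit + 1) as f64 as u64, limit);
}
```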

§Errors

Returns FerroError::ShapeMismatch if the number of columns does not match the number of features seen during fitting.

Returns FerroError::InvalidParameter if any category was not seen during fitting AND handle_unknown is HandleUnknown::Error (the default). Under HandleUnknown::UseEncodedValue, unknown categories are instead encoded as the configured unknown_value sentinel (which may be f64::NAN), matching sklearn _encoders.py:1591.
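The two unknown-category behaviours can be contrasted in a small sketch. `UnknownPolicy` and `encode_cell` are hypothetical local definitions (the crate's own HandleUnknown type may be shaped differently), and a `String` stands in for FerroError.

```rust
/// Hypothetical local policy type, used only for this illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum UnknownPolicy {
    /// Reject categories that were not seen during fit.
    Error,
    /// Encode unseen categories as the given sentinel (may be NaN).
    UseEncodedValue(f64),
}

/// Hypothetical per-cell encoder over one column's fitted categories.
fn encode_cell(cats: &[String], value: &str, policy: UnknownPolicy) -> Result<f64, String> {
    match cats.iter().position(|c| c.as_str() == value) {
        Some(i) => Ok(i as f64),
        None => match policy {
            UnknownPolicy::Error => Err(format!("unknown category {value:?}")),
            UnknownPolicy::UseEncodedValue(sentinel) => Ok(sentinel),
        },
    }
}

fn main() {
    let cats = vec!["blue".to_string(), "red".to_string()];
    println!("{:?}", encode_cell(&cats, "red", UnknownPolicy::Error));
    println!("{:?}", encode_cell(&cats, "teal", UnknownPolicy::Error));
    println!("{:?}", encode_cell(&cats, "teal", UnknownPolicy::UseEncodedValue(-1.0)));
}
```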

Source§

type Output = ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>

The transformed output type.
Source§

type Error = FerroError

The error type returned by transform.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> ByRef<T> for T

Source§

fn by_ref(&self) -> &T

Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<T> DistributionExt for T
where T: ?Sized,

Source§

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

Source§

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
Source§

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).
Source§

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.
Source§

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.
Source§

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

Source§

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
Source§

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).
Source§

unsafe fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.
Source§

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V