Struct FittedOrdinalEncoder

pub struct FittedOrdinalEncoder { /* private fields */ }

A fitted ordinal encoder holding per-column category-to-index mappings.

Created by calling Fit::fit on an OrdinalEncoder.

Implementations§

impl FittedOrdinalEncoder

pub fn categories(&self) -> &[Vec<String>]

Return the ordered category list for each column.

categories()[j][i] is the category that maps to integer i in column j.
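The indexing convention can be illustrated with a minimal sketch. It assumes each column's categories are its sorted distinct values (the transform documentation describes the index as lexicographic); `fit_categories` is a hypothetical helper written for this example, not part of ferrolearn's API.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: sorted distinct values for each column of a
/// row-major table. Assumes every row has the same number of columns.
fn fit_categories(rows: &[Vec<&str>]) -> Vec<Vec<String>> {
    let n_cols = rows.first().map_or(0, |r| r.len());
    (0..n_cols)
        .map(|j| {
            // BTreeSet deduplicates and iterates in sorted (byte-wise) order.
            let set: BTreeSet<&str> = rows.iter().map(|r| r[j]).collect();
            set.into_iter().map(String::from).collect()
        })
        .collect()
}

fn main() {
    let rows = vec![
        vec!["red", "small"],
        vec!["blue", "large"],
        vec!["red", "large"],
    ];
    let categories = fit_categories(&rows);
    // categories[j][i] is the category encoded as integer i in column j.
    assert_eq!(categories[0], ["blue", "red"]);
    assert_eq!(categories[1][0], "large");
    println!("{categories:?}");
}
```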

pub fn infrequent_categories(&self) -> Vec<Vec<String>>

Return the infrequent category values for each feature (infrequent_categories_).

infrequent_categories()[j] is the sorted list of category values from categories[j] that were grouped into the single trailing “infrequent” ordinal code (because their training count fell below min_frequency and/or beyond the max_categories limit). An EMPTY inner Vec means feature j had no infrequent categories (scikit-learn returns None there; an empty list is the representable equivalent). With infrequent grouping disabled every entry is empty. Mirrors scikit-learn’s OrdinalEncoder.infrequent_categories_ (_encoders.py:255-262): category[indices] over _infrequent_indices.
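As a rough illustration of the min_frequency rule only (the max_categories limit is not modelled), the sketch below collects the categories whose training count falls below a threshold. `infrequent` is a hypothetical helper, and the count-based threshold is an assumption made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: categories of one column whose training count is
/// below `min_frequency`, returned in sorted order.
fn infrequent(column: &[&str], min_frequency: usize) -> Vec<String> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for &value in column {
        *counts.entry(value).or_insert(0) += 1;
    }
    counts
        .into_iter() // BTreeMap iterates in key order, so the result is sorted
        .filter(|&(_, count)| count < min_frequency)
        .map(|(category, _)| category.to_string())
        .collect()
}

fn main() {
    let column = ["a", "a", "a", "b", "c", "c"];
    // Only "b" (count 1) falls below a threshold of 2.
    assert_eq!(infrequent(&column, 2), ["b"]);
    // With a threshold of 1 nothing is infrequent: the list is empty.
    assert!(infrequent(&column, 1).is_empty());
}
```

An empty result here corresponds to the empty inner Vec described above.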

pub fn n_features(&self) -> usize

Return the number of input columns (features).

pub fn n_features_in(&self) -> usize

Return the number of features seen during fit.

Mirrors scikit-learn’s n_features_in_ attribute (set by _validate_data at fit, sklearn/base.py). Equal to n_features; the distinct name matches sklearn’s fitted-attribute surface (REQ-10).

pub fn get_feature_names_out(&self, input_features: Option<&[String]>) -> Result<Vec<String>, FerroError>

Return the output feature names, one per input feature.

OrdinalEncoder is a OneToOneFeatureMixin (one output column per input column), so get_feature_names_out returns the INPUT feature names unchanged (sklearn/utils/_set_output / OneToOneFeatureMixin.get_feature_names_out): with input_features = None it returns the default names ["x0", "x1", ...] (_check_feature_names_in); otherwise it returns the supplied names verbatim.

§Errors

Returns FerroError::ShapeMismatch if input_features is Some but its length differs from n_features_in (sklearn raises ValueError("input_features should have length equal to number of features ...")).
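The naming rule and its error case can be sketched in a few lines. This is a standalone illustration, not the crate's implementation: `feature_names_out` and `LengthMismatch` are hypothetical stand-ins (the real method returns FerroError::ShapeMismatch).

```rust
/// Hypothetical stand-in for the length-mismatch error.
#[derive(Debug, PartialEq)]
struct LengthMismatch {
    expected: usize,
    actual: usize,
}

/// One output name per input feature: default "x0", "x1", ... when no names
/// are supplied, otherwise the supplied names verbatim.
fn feature_names_out(
    n_features_in: usize,
    input_features: Option<&[String]>,
) -> Result<Vec<String>, LengthMismatch> {
    match input_features {
        None => Ok((0..n_features_in).map(|i| format!("x{i}")).collect()),
        Some(names) if names.len() == n_features_in => Ok(names.to_vec()),
        Some(names) => Err(LengthMismatch {
            expected: n_features_in,
            actual: names.len(),
        }),
    }
}

fn main() {
    assert_eq!(
        feature_names_out(2, None),
        Ok(vec!["x0".to_string(), "x1".to_string()])
    );
    let names = vec!["color".to_string(), "size".to_string()];
    assert_eq!(feature_names_out(2, Some(names.as_slice())), Ok(names.clone()));
    // Two names for three features is rejected.
    assert_eq!(
        feature_names_out(3, Some(names.as_slice())),
        Err(LengthMismatch { expected: 3, actual: 2 })
    );
}
```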

pub fn handle_unknown(&self) -> HandleUnknown

Return the configured unknown-category strategy.

pub fn unknown_value(&self) -> Option<f64>

Return the configured unknown-category sentinel, if any.

pub fn inverse_transform(&self, x: &Array2<f64>) -> Result<Array2<String>, FerroError>

Convert ordinal indices back to the original category strings.

This is the inverse of Transform::transform: each f64 cell is read as an ordinal index into the per-column categories_ learned at fit time, and the corresponding category string is returned. Because it reuses the shipped categories_ (REQ-1), inverse_transform(transform(X)) == X for any X whose categories were all seen during fit (a bit-exact roundtrip on the default Error-mode encoder). Mirrors scikit-learn’s OrdinalEncoder.inverse_transform (sklearn/preprocessing/_encoders.py:1595), X_tr[:, i] = self.categories_[i][labels].
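The roundtrip property can be shown on a single column with plain slices. This is a minimal sketch rather than the crate's code: `encode` and `decode` are hypothetical helpers, and the category list is assumed to be already fitted.

```rust
/// Hypothetical helper: map each value to its index in `categories`, as f64.
/// Returns None if any value was not seen during fit.
fn encode(categories: &[String], column: &[&str]) -> Option<Vec<f64>> {
    column
        .iter()
        .map(|v| {
            categories
                .iter()
                .position(|c| c.as_str() == *v)
                .map(|i| i as f64)
        })
        .collect()
}

/// Hypothetical helper: read each code back as an index into `categories`.
/// Only meant for codes produced by `encode` (valid, in-range ordinals).
fn decode(categories: &[String], codes: &[f64]) -> Vec<String> {
    codes
        .iter()
        .map(|&code| categories[code as usize].clone())
        .collect()
}

fn main() {
    let categories = vec!["blue".to_string(), "green".to_string(), "red".to_string()];
    let column = ["red", "blue", "red", "green"];
    let codes = encode(&categories, &column).expect("every value was seen during fit");
    assert_eq!(codes, [2.0, 0.0, 2.0, 1.0]);
    // Decoding the encoded column recovers the original strings exactly.
    assert_eq!(decode(&categories, &codes), column);
}
```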

§Index contract (faithful to sklearn / numpy)

Mirrors sklearn’s labels.astype("int64") (_encoders.py:1664) followed by numpy fancy indexing categories_[j][labels] (:1679):

- truncates non-integers toward zero (1.5 → index 1 → that category; 0.7 → 0); Rust f64 as i64 matches the C-style cast.
- wraps small negatives via numpy negative indexing (-1.0 → categories_[j][len-1], the LAST category; -2.0 → len-2), raising only when the wrapped index still falls outside [0, len) (-3.0 with 2 categories → IndexError).
- errors on an out-of-range positive ordinal (9.0 with 2 categories → sklearn IndexError) and on a non-finite cell (NaN/±inf overflow the astype("int64") cast → sklearn IndexError/ValueError; guarded explicitly because Rust’s f64 as i64 maps NaN to 0 and saturates ±inf, which would diverge).

The roundtrip, held-out valid-ordinal, truncation, and negative-wrap paths all match sklearn; out-of-range / non-finite both error.
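The three rules above can be condensed into one small function. This is a sketch of the described contract, not the crate's source: `resolve_index` is a hypothetical name, and None stands in for the cases where the text says an error is raised.

```rust
/// Resolve one f64 cell to an index into a column with `len` categories.
/// Returns None where the contract above says an error is raised.
fn resolve_index(cell: f64, len: usize) -> Option<usize> {
    // Guard first: `NaN as i64` is 0 in Rust, which would silently pick
    // the first category instead of failing.
    if !cell.is_finite() {
        return None;
    }
    let truncated = cell as i64; // truncates toward zero
    let len = len as i64;
    // numpy-style negative indexing: -1 is the last element.
    let wrapped = if truncated < 0 { truncated + len } else { truncated };
    if (0..len).contains(&wrapped) {
        Some(wrapped as usize)
    } else {
        None
    }
}

fn main() {
    assert_eq!(resolve_index(1.5, 2), Some(1)); // truncation
    assert_eq!(resolve_index(0.7, 2), Some(0));
    assert_eq!(resolve_index(-1.0, 2), Some(1)); // wraps to the last category
    assert_eq!(resolve_index(-2.0, 2), Some(0));
    assert_eq!(resolve_index(-3.0, 2), None); // still out of range after wrapping
    assert_eq!(resolve_index(9.0, 2), None); // out-of-range positive
    assert_eq!(resolve_index(f64::NAN, 2), None); // non-finite
}
```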

§`use_encoded_value` → `None` (SCOPE LIMITATION, R-HONEST-3)

With HandleUnknown::UseEncodedValue, sklearn maps a cell equal to unknown_value back to None (_encoders.py:1673, X_tr[mask, idx] = None). ferrolearn’s Array2<String> output container cannot represent None (that would require Array2<Option<String>>). The configured unknown_value is itself out of the valid [0, len) range (e.g. -1), so such a cell hits the out-of-range error path: this inverse therefore ERRORS where sklearn returns [[None, ...]]. This is a documented divergence, not a silently wrong string; the honest behavior is to error rather than fabricate a category. The default Error-mode encoder produces only valid ordinals, so its inverse is COMPLETE and bit-exact.

§Errors

Returns FerroError::InsufficientSamples if the input has zero rows (symmetry with transform’s #2220 guard and sklearn’s check_array).

Returns FerroError::ShapeMismatch if the number of columns does not match the number of features seen during fitting (sklearn’s _encoders.py:1619 “Shape of the passed X data is not correct”).

Returns FerroError::InvalidParameter if any cell is non-finite or does not resolve to an index in [0, categories_[j].len()) under the index contract above, that is, after truncation toward zero and negative wrapping (sklearn’s IndexError).

Trait Implementations§

impl Clone for FittedOrdinalEncoder

fn clone(&self) -> FittedOrdinalEncoder

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for FittedOrdinalEncoder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Transform<ArrayBase<OwnedRepr<String>, Dim<[usize; 2]>>> for FittedOrdinalEncoder

fn transform(&self, x: &Array2<String>) -> Result<Array2<f64>, FerroError>

Transform string categories to ordinal indices, returned as f64.

Each cell is the (lexicographic) category index cast to f64. The ordinal VALUES are unchanged from the integer mapping; only the output container dtype is f64, matching scikit-learn’s OrdinalEncoder(dtype=np.float64) default (sklearn/preprocessing/_encoders.py:1262). A configurable non-float64 output dtype (e.g. int32) is OUT OF SCOPE here: ferrolearn’s output is fixed to the sklearn DEFAULT of f64, and a dtype param is a follow-on design (blocker #1158). f64 exactly represents every integer up to 2^53, so the cast is lossless for any realistic category count.
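The 2^53 bound can be checked directly with a round trip through f64. The helper name `roundtrips` is introduced only for this illustration.

```rust
/// True if `n` survives a u64 -> f64 -> u64 round trip unchanged.
fn roundtrips(n: u64) -> bool {
    (n as f64) as u64 == n
}

fn main() {
    let limit: u64 = 1 << 53;
    // Every index a realistic encoder can produce is far below this bound.
    assert!(roundtrips(0));
    assert!(roundtrips(1_000_000));
    assert!(roundtrips(limit - 1));
    assert!(roundtrips(limit));
    // The first integer that f64 cannot represent exactly.
    assert!(!roundtrips(limit + 1));
}
```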

§Errors

Returns FerroError::ShapeMismatch if the number of columns does not match the number of features seen during fitting.

Returns FerroError::InvalidParameter if any category was not seen during fitting AND handle_unknown is HandleUnknown::Error (the default). Under HandleUnknown::UseEncodedValue, unknown categories are instead encoded as the configured unknown_value sentinel (which may be f64::NAN), matching sklearn _encoders.py:1591.
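The two unknown-category strategies can be sketched for a single cell as follows. This is an illustration under assumptions: the `Unknown` enum is a hypothetical stand-in that folds HandleUnknown and unknown_value into one type, the error is a plain String rather than FerroError, and the category list is assumed to be sorted.

```rust
/// Hypothetical stand-in combining the strategy and its sentinel.
#[derive(Clone, Copy)]
enum Unknown {
    Error,
    UseEncodedValue(f64),
}

/// Encode one cell against a sorted category list.
fn encode_cell(categories: &[String], value: &str, unknown: Unknown) -> Result<f64, String> {
    // Binary search is valid because `categories` is assumed sorted.
    match categories.binary_search_by(|c| c.as_str().cmp(value)) {
        Ok(index) => Ok(index as f64),
        Err(_) => match unknown {
            Unknown::Error => Err(format!("unknown category {value:?}")),
            Unknown::UseEncodedValue(sentinel) => Ok(sentinel),
        },
    }
}

fn main() {
    let categories = vec!["blue".to_string(), "green".to_string(), "red".to_string()];
    assert_eq!(encode_cell(&categories, "green", Unknown::Error), Ok(1.0));
    // Default strategy: an unseen category is an error.
    assert!(encode_cell(&categories, "pink", Unknown::Error).is_err());
    // Sentinel strategy: the unseen category becomes the configured value.
    assert_eq!(
        encode_cell(&categories, "pink", Unknown::UseEncodedValue(-1.0)),
        Ok(-1.0)
    );
    // The sentinel may be NaN, which must be tested with is_nan().
    let nan = encode_cell(&categories, "pink", Unknown::UseEncodedValue(f64::NAN));
    assert!(nan.unwrap().is_nan());
}
```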

type Output = ArrayBase<OwnedRepr<f64>, Dim<[usize; 2]>>

The transformed output type.

type Error = FerroError

The error type returned by transform.

Auto Trait Implementations§

impl UnwindSafe for FittedOrdinalEncoder

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> DistributionExt for T
where T: ?Sized,

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
