pub struct FittedOrdinalEncoder { /* private fields */ }
A fitted ordinal encoder holding per-column category-to-index mappings.
Created by calling Fit::fit on an OrdinalEncoder.
Implementations
impl FittedOrdinalEncoder
pub fn categories(&self) -> &[Vec<String>]
Return the ordered category list for each column.
categories()[j][i] is the category that maps to integer i in column j.
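To illustrate the categories()[j][i] layout, the sketch below builds one sorted, deduplicated category list per column using only the standard library. fit_categories is a hypothetical helper, not ferrolearn's fitting code. It assumes string cells and byte-wise lexicographic ordering, which is the order the transform documentation describes.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not ferrolearn API): learn one sorted category list
/// per column, the same shape that categories() exposes.
fn fit_categories(rows: &[Vec<&str>]) -> Vec<Vec<String>> {
    let n_cols = rows.first().map_or(0, |r| r.len());
    (0..n_cols)
        .map(|j| {
            // BTreeSet deduplicates and iterates in lexicographic order.
            let unique: BTreeSet<&str> = rows.iter().map(|r| r[j]).collect();
            unique.into_iter().map(String::from).collect::<Vec<String>>()
        })
        .collect()
}

fn main() {
    let rows = vec![
        vec!["red", "small"],
        vec!["blue", "large"],
        vec!["red", "medium"],
    ];
    let categories = fit_categories(&rows);
    // categories[j][i] is the category encoded as integer i in column j.
    assert_eq!(categories[0], vec!["blue", "red"]);
    assert_eq!(categories[1][2], "small");
    // Reverse lookup: which integer does "medium" map to in column 1?
    let code = categories[1].iter().position(|c| c == "medium");
    assert_eq!(code, Some(1));
}
```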
pub fn infrequent_categories(&self) -> Vec<Vec<String>>
Return the infrequent category values for each feature
(infrequent_categories_).
infrequent_categories()[j] is the sorted list of category values from
categories()[j] that were grouped into the single trailing “infrequent”
ordinal code (because their training count fell below min_frequency
and/or they fell outside the max_categories limit). An EMPTY inner Vec means
feature j had no infrequent categories (scikit-learn returns None
there; an empty list is the representable equivalent). With infrequent
grouping disabled, every entry is empty. Mirrors scikit-learn’s
OrdinalEncoder.infrequent_categories_ (_encoders.py:255-262), which
builds category[indices] from _infrequent_indices, one entry per feature.
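A simplified sketch of how the infrequent list for one column could be derived, assuming min_frequency is an absolute count and that a category is infrequent when its count is strictly below it. The max_categories limit is deliberately not modeled, and infrequent_for_column is a hypothetical name, not ferrolearn API.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: categories whose training count is below
/// `min_frequency` are infrequent. max_categories is NOT modeled here.
fn infrequent_for_column(column: &[&str], min_frequency: usize) -> Vec<String> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for &value in column {
        *counts.entry(value).or_insert(0) += 1;
    }
    // BTreeMap iterates in sorted key order, so the result is already sorted,
    // matching the order of the fitted category list.
    counts
        .into_iter()
        .filter(|&(_, count)| count < min_frequency)
        .map(|(value, _)| value.to_string())
        .collect()
}

fn main() {
    let column = ["a", "a", "a", "b", "c", "c"];
    // "b" appears once, below min_frequency = 2.
    assert_eq!(infrequent_for_column(&column, 2), vec!["b"]);
    // With a threshold of 1 nothing is infrequent: an empty Vec, the
    // stand-in for scikit-learn's None.
    assert!(infrequent_for_column(&column, 1).is_empty());
}
```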
pub fn n_features(&self) -> usize
Return the number of input columns (features).
pub fn n_features_in(&self) -> usize
Return the number of features seen during fit.
Mirrors scikit-learn’s n_features_in_ attribute (set by _validate_data
at fit, sklearn/base.py). Always equal to n_features(); the separate
name matches sklearn’s fitted-attribute surface (REQ-10).
pub fn get_feature_names_out( &self, input_features: Option<&[String]>, ) -> Result<Vec<String>, FerroError>
Return the output feature names, one per input feature.
OrdinalEncoder inherits OneToOneFeatureMixin (one output column per input
column), so get_feature_names_out returns the INPUT feature names
unchanged (sklearn/base.py, OneToOneFeatureMixin.get_feature_names_out):
with input_features = None, the default names ["x0", "x1", ...]
(_check_feature_names_in); otherwise, the supplied names verbatim.
§Errors
Returns FerroError::ShapeMismatch if input_features is Some but its
length differs from n_features_in (sklearn raises
ValueError("input_features should have length equal to number of features ...")).
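The one-to-one naming rule and its length check can be sketched as follows. feature_names_out and NamesError are hypothetical stand-ins (NamesError plays the role of FerroError::ShapeMismatch); the real method reads n_features_in from the fitted encoder rather than taking it as an argument.

```rust
/// Hypothetical stand-in for FerroError::ShapeMismatch.
#[derive(Debug, PartialEq)]
enum NamesError {
    ShapeMismatch { expected: usize, actual: usize },
}

/// One output name per input feature: default "x0", "x1", ... when no names
/// are given, otherwise the supplied names verbatim (after a length check).
fn feature_names_out(
    n_features_in: usize,
    input_features: Option<&[String]>,
) -> Result<Vec<String>, NamesError> {
    match input_features {
        None => Ok((0..n_features_in).map(|i| format!("x{i}")).collect()),
        Some(names) if names.len() != n_features_in => Err(NamesError::ShapeMismatch {
            expected: n_features_in,
            actual: names.len(),
        }),
        Some(names) => Ok(names.to_vec()),
    }
}

fn main() {
    assert_eq!(feature_names_out(2, None).unwrap(), vec!["x0", "x1"]);
    let names = vec!["color".to_string(), "size".to_string()];
    assert_eq!(feature_names_out(2, Some(names.as_slice())).unwrap(), names);
    // Wrong length is rejected rather than truncated or padded.
    assert!(feature_names_out(3, Some(names.as_slice())).is_err());
}
```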
pub fn handle_unknown(&self) -> HandleUnknown
Return the configured unknown-category strategy.
pub fn unknown_value(&self) -> Option<f64>
Return the configured unknown-category sentinel, if any.
pub fn inverse_transform( &self, x: &Array2<f64>, ) -> Result<Array2<String>, FerroError>
Convert ordinal indices back to the original category strings.
This is the inverse of Transform::transform: each f64 cell is read
as an ordinal index into the per-column categories_ learned at fit
time, and the corresponding category string is returned. Because it reuses
the SHIPPED categories_ (REQ-1), inverse_transform(transform(X)) == X holds
for any X whose categories were all seen during fit (a bit-exact roundtrip
on the default Error-mode encoder). Mirrors scikit-learn’s
OrdinalEncoder.inverse_transform (sklearn/preprocessing/_encoders.py:1595),
X_tr[:, i] = self.categories_[i][labels].
§Index contract (faithful to sklearn / numpy)
Mirrors sklearn’s labels.astype("int64") (_encoders.py:1664) followed
by numpy fancy indexing categories_[j][labels] (:1679):
- truncates non-integers toward zero (1.5 → index 1 → that category; 0.7 → 0); Rust’s f64 as i64 matches the C-style cast.
- wraps small negatives via numpy negative indexing (-1.0 → categories_[j][len-1], the LAST category; -2.0 → len-2), raising only when the wrapped index still falls outside [0, len) (-3.0 with 2 categories → IndexError).
- errors on an out-of-range positive ordinal (9.0 with 2 categories → sklearn IndexError) and on a non-finite cell (NaN/±inf overflow the astype("int64") cast → sklearn IndexError/ValueError; guarded explicitly because Rust’s f64 as i64 saturates NaN → 0, which would diverge).
Roundtrip values, held-out valid ordinals, truncation, and negative wrapping all match sklearn; out-of-range and non-finite cells both error.
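A minimal sketch of the per-cell index rule, assuming a single column with a known category count. resolve_index is a hypothetical helper; it returns None where ferrolearn would return an error.

```rust
/// Hypothetical helper: map one f64 cell to a category index using the
/// numpy-style contract (reject non-finite, truncate toward zero, wrap a
/// negative index once, then bounds-check). None means "error".
fn resolve_index(cell: f64, n_categories: usize) -> Option<usize> {
    if !cell.is_finite() {
        // Explicit guard: `NaN as i64` is 0 in Rust, which would silently
        // select the first category instead of erroring.
        return None;
    }
    let truncated = cell as i64; // truncates toward zero; saturates huge values
    let n = n_categories as i64;
    let wrapped = if truncated < 0 { truncated + n } else { truncated };
    if (0..n).contains(&wrapped) {
        Some(wrapped as usize)
    } else {
        None
    }
}

fn main() {
    let categories = ["a", "b"];
    let lookup = |cell: f64| resolve_index(cell, categories.len()).map(|i| categories[i]);
    assert_eq!(lookup(1.5), Some("b")); // truncation
    assert_eq!(lookup(0.7), Some("a"));
    assert_eq!(lookup(-1.0), Some("b")); // wraps to the last category
    assert_eq!(lookup(-2.0), Some("a"));
    assert_eq!(lookup(-3.0), None); // still out of range after wrapping
    assert_eq!(lookup(9.0), None);
    assert_eq!(lookup(f64::NAN), None);
}
```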
§use_encoded_value → None (SCOPE LIMITATION, R-HONEST-3)
With HandleUnknown::UseEncodedValue, sklearn maps a cell equal to
unknown_value back to None (_encoders.py:1673,
X_tr[mask, idx] = None). ferrolearn’s Array2<String> output container
cannot represent None (it would require Array2<Option<String>>).
The configured unknown_value is itself out of the valid [0, len)
range (e.g. -1), so such a cell hits the out-of-range error path: this
inverse therefore ERRORS where sklearn returns [[None, ...]]. This is a
documented divergence, not a silent wrong string: the honest behavior is
to error rather than fabricate a category. The default Error-mode
encoder produces only valid ordinals, so its inverse is COMPLETE and
bit-exact.
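For comparison, here is a hypothetical sketch of what an Option-based output could express: cells equal to the sentinel are masked to None before indexing, in the spirit of scikit-learn's X_tr[mask, idx] = None, and the remaining cells follow the truncate-then-wrap rule from the index contract. inverse_column_with_sentinel is illustrative only; it is not part of ferrolearn, which returns Array2<String>.

```rust
/// Hypothetical single-column inverse with Option output: sentinel cells
/// become None, everything else is resolved to a category or an error.
fn inverse_column_with_sentinel(
    cells: &[f64],
    categories: &[&str],
    unknown_value: f64,
) -> Result<Vec<Option<String>>, String> {
    let n = categories.len() as i64;
    cells
        .iter()
        .map(|&cell| {
            // A NaN sentinel never compares equal to itself, so use is_nan.
            let is_unknown = if unknown_value.is_nan() {
                cell.is_nan()
            } else {
                cell == unknown_value
            };
            if is_unknown {
                return Ok(None); // the sentinel is masked before indexing
            }
            if !cell.is_finite() {
                return Err(format!("non-finite ordinal {cell}"));
            }
            // Truncate toward zero, wrap a negative index once, bounds-check.
            let t = cell as i64;
            let idx = if t < 0 { t + n } else { t };
            if (0..n).contains(&idx) {
                Ok(Some(categories[idx as usize].to_string()))
            } else {
                Err(format!("ordinal {cell} out of range"))
            }
        })
        .collect()
}

fn main() {
    let cats = ["blue", "red"];
    let out = inverse_column_with_sentinel(&[1.0, -1.0, 0.0], &cats, -1.0).unwrap();
    assert_eq!(out, vec![Some("red".to_string()), None, Some("blue".to_string())]);
}
```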
§Errors
Returns FerroError::InsufficientSamples if the input has zero rows
(for symmetry with transform’s #2220 guard and with sklearn’s check_array).
Returns FerroError::ShapeMismatch if the number of columns does not
match the number of features seen during fitting (sklearn’s
_encoders.py:1619 “Shape of the passed X data is not correct”).
Returns FerroError::InvalidParameter if any cell is non-finite, or if its
value, truncated toward zero and negative-wrapped per the index contract
above, falls outside [0, categories_[j].len()) (sklearn’s IndexError).
Trait Implementations
impl Clone for FittedOrdinalEncoder
fn clone(&self) -> FittedOrdinalEncoder
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for FittedOrdinalEncoder
impl Transform<ArrayBase<OwnedRepr<String>, Dim<[usize; 2]>>> for FittedOrdinalEncoder
fn transform(&self, x: &Array2<String>) -> Result<Array2<f64>, FerroError>
Transform string categories to ordinal indices, returned as f64.
Each cell is the (lexicographic) category index cast to f64. The
ordinal VALUES are unchanged from the integer mapping; only the output
container dtype is f64, matching scikit-learn’s
OrdinalEncoder(dtype=np.float64) default
(sklearn/preprocessing/_encoders.py:1262). A configurable non-float64
output dtype (e.g. int32) is OUT OF SCOPE here: ferrolearn’s output is
the fixed sklearn DEFAULT f64, and a dtype param is a follow-on design
(blocker #1158). f64 represents every integer up to 2^53 exactly, so
the cast is lossless for any realistic category count.
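The 2^53 bound can be checked directly: integers up to 2^53 survive a u64 → f64 → u64 round trip, and 2^53 + 1 is the first that does not. roundtrips_exactly is a throwaway helper for this check only.

```rust
/// Returns true if `n` survives a u64 -> f64 -> u64 round trip unchanged.
fn roundtrips_exactly(n: u64) -> bool {
    (n as f64) as u64 == n
}

fn main() {
    let two_pow_53: u64 = 1 << 53;
    // Integers up to 2^53 are exactly representable as f64...
    assert!(roundtrips_exactly(two_pow_53 - 1));
    assert!(roundtrips_exactly(two_pow_53));
    // ...but 2^53 + 1 is not: it rounds (ties-to-even) to 2^53.
    assert!(!roundtrips_exactly(two_pow_53 + 1));
}
```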
§Errors
Returns FerroError::ShapeMismatch if the number of columns does not
match the number of features seen during fitting.
Returns FerroError::InvalidParameter if any category was not seen
during fitting AND handle_unknown is HandleUnknown::Error (the
default). Under HandleUnknown::UseEncodedValue, unknown categories
are instead encoded as the configured unknown_value sentinel (which may
be f64::NAN), matching sklearn _encoders.py:1591.
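A sketch of per-column encoding and the two unknown-category strategies, assuming the category list has already been fitted. UnknownPolicy and transform_column are hypothetical names that only mirror HandleUnknown::Error and HandleUnknown::UseEncodedValue; they are not ferrolearn's types.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the two documented strategies.
#[derive(Clone, Copy, Debug)]
enum UnknownPolicy {
    Error,
    UseEncodedValue(f64),
}

/// Encode one column: each string becomes its index in the fitted category
/// list, cast to f64; unknown strings either error or take the sentinel.
fn transform_column(
    column: &[&str],
    categories: &[&str],
    policy: UnknownPolicy,
) -> Result<Vec<f64>, String> {
    // Build the category -> index map once instead of searching per cell.
    let index: HashMap<&str, usize> =
        categories.iter().enumerate().map(|(i, &c)| (c, i)).collect();
    column
        .iter()
        .map(|&value| match (index.get(value), policy) {
            (Some(&i), _) => Ok(i as f64),
            (None, UnknownPolicy::UseEncodedValue(sentinel)) => Ok(sentinel),
            (None, UnknownPolicy::Error) => Err(format!("unknown category {value:?}")),
        })
        .collect()
}

fn main() {
    let categories = ["blue", "green", "red"];
    let encoded = transform_column(&["red", "blue"], &categories, UnknownPolicy::Error).unwrap();
    assert_eq!(encoded, vec![2.0, 0.0]);
    assert!(transform_column(&["pink"], &categories, UnknownPolicy::Error).is_err());
    let with_sentinel =
        transform_column(&["pink", "green"], &categories, UnknownPolicy::UseEncodedValue(-1.0)).unwrap();
    assert_eq!(with_sentinel, vec![-1.0, 1.0]);
}
```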
type Error = FerroError
Auto Trait Implementations
impl Freeze for FittedOrdinalEncoder
impl RefUnwindSafe for FittedOrdinalEncoder
impl Send for FittedOrdinalEncoder
impl Sync for FittedOrdinalEncoder
impl Unpin for FittedOrdinalEncoder
impl UnsafeUnpin for FittedOrdinalEncoder
impl UnwindSafe for FittedOrdinalEncoder
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T, U> Imply<T> for U
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
unsafe fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.