//! Divergence audit (REQ-5, #1160): `OrdinalEncoder` `handle_unknown='use_encoded_value'`
//! non-integer `unknown_value` validation vs scikit-learn 1.5.2.
//!
//! sklearn `sklearn/preprocessing/_encoders.py:1481-1487` (`OrdinalEncoder.fit`):
//! ```text
//! elif not isinstance(self.unknown_value, numbers.Integral):
//!     raise TypeError(
//!         "unknown_value should be an integer or "
//!         "np.nan when "
//!         "handle_unknown is 'use_encoded_value', "
//!         f"got {self.unknown_value}."
//!     )
//! ```
//! This `isinstance(.., Integral)` guard fires BEFORE the range/collision check
//! (`:1518-1526`), so ANY non-integer, non-nan `unknown_value` (e.g. `1.5`,
//! `2.5`, `-1.5`, even out-of-range `100.5`) raises `TypeError` — regardless of
//! magnitude or whether it would collide with an encoding index.
//!
//! ferrolearn (`ferrolearn-preprocess/src/ordinal_encoder.rs:329-345`) takes an
//! `f64` `unknown_value` and only runs its collision branch when
//! `v.fract() == 0.0`; a non-integer float skips ALL validation and `fit`
//! returns `Ok`. There is no analog of sklearn's `:1481` integrality rejection,
//! so ferrolearn OVER-ACCEPTS (the mirror image of an R-DEV-2 over-rejection).
//!
//! LIVE sklearn 1.5.2 oracle (run from /tmp):
//! ```text
//! $ python3 -c "import numpy as np; from sklearn.preprocessing import OrdinalEncoder
//! for v in [1.5, 2.5, -1.5, 100.5]:
//!     try:
//!         OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=v)\
//!             .fit([['cat'],['dog'],['cat']]); print(repr(v),'OK')
//!     except Exception as e: print(repr(v), type(e).__name__)"
//! -> 1.5 TypeError
//!    2.5 TypeError   # out-of-range [0,2) and still TypeError (Integral check first)
//!    -1.5 TypeError
//!    100.5 TypeError
//! ```
//! Expected: every non-integer non-nan `unknown_value` makes `fit` return `Err`.
//! Actual (ferrolearn): `fit` returns `Ok` for all of them.
//!
//! Tracking: #2221
use Fit;
use ;
use Array2;
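
// A minimal sketch of the integrality guard described in the module docs,
// following the order of sklearn's checks: NaN is accepted, and any other
// non-integer value is rejected before range/collision validation would run.
// ASSUMPTIONS: `check_unknown_value_integral` and its `String` error are
// hypothetical names used for illustration only; they are not ferrolearn's
// API, and an actual fix may use a different error type. Because the value
// is an `f64`, an integral-valued float such as `-1.0` is treated as an
// integer here, which is looser than Python's `numbers.Integral` check.
#[allow(dead_code)]
fn check_unknown_value_integral(v: f64) -> Result<(), String> {
    // sklearn explicitly allows `np.nan`, so let NaN through.
    if v.is_nan() {
        return Ok(());
    }
    // Reject infinities and anything with a fractional part.
    if !v.is_finite() || v.fract() != 0.0 {
        return Err(format!(
            "unknown_value should be an integer or np.nan when \
             handle_unknown is 'use_encoded_value', got {}.",
            v
        ));
    }
    Ok(())
}
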
/// Divergence: ferrolearn's `OrdinalEncoder::fit` diverges from
/// `sklearn/preprocessing/_encoders.py:1481` for a non-integer, non-nan
/// `unknown_value` under `handle_unknown='use_encoded_value'`.
/// sklearn raises `TypeError` ("unknown_value should be an integer or np.nan");
/// ferrolearn returns `Ok` because it has no integrality guard: its
/// range/collision branch runs only when `v.fract() == 0.0`, so a non-integer
/// value bypasses validation entirely.
/// Tracking: #2221