ferrolearn-python 0.3.0

Python bindings for ferrolearn via PyO3

ferrolearn-python

Python bindings for the ferrolearn machine learning framework, built with PyO3. Provides a scikit-learn compatible API backed by Rust for performance.

Validated against scikit-learn 1.8.0 head-to-head with 144 paired measurements across all 54 exposed estimators — see the workspace BENCHMARKS.md for the full report.

Installation

pip install ferrolearn

Or for development from this repo:

cd ferrolearn-python
maturin develop --release

Available estimators (54 total)

Regressors (15)

LinearRegression, Ridge, Lasso, ElasticNet, BayesianRidge, ARDRegression, HuberRegressor, QuantileRegressor, DecisionTreeRegressor, RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor, HistGradientBoostingRegressor, KNeighborsRegressor, KernelRidge.

Classifiers (18)

LogisticRegression, RidgeClassifier, LinearSVC, QuadraticDiscriminantAnalysis, GaussianNB, MultinomialNB, BernoulliNB, ComplementNB, DecisionTreeClassifier, ExtraTreeClassifier, RandomForestClassifier, ExtraTreesClassifier, AdaBoostClassifier, BaggingClassifier, GradientBoostingClassifier, HistGradientBoostingClassifier, KNeighborsClassifier, NearestCentroid.

Clusterers (6)

KMeans, MiniBatchKMeans, DBSCAN, AgglomerativeClustering, Birch, GaussianMixture.

Decomposition / dimensionality reduction (8)

PCA, IncrementalPCA, TruncatedSVD, FastICA, NMF, KernelPCA, SparsePCA, FactorAnalysis.

Preprocessing & kernel approximation (7)

StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler, PowerTransformer, Nystroem, RBFSampler.

Example

from ferrolearn import Ridge, RandomForestClassifier, GaussianMixture
import numpy as np

# Regression
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
model = Ridge(alpha=1.0)
model.fit(X, y)
print(model.predict(X))

# Classification (seeded data so the run is reproducible)
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 5))
y = (X.sum(axis=1) > 0).astype(int)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)
print(clf.score(X, y))

# Clustering
gmm = GaussianMixture(n_components=3, random_state=42)
labels = gmm.fit(X).predict(X)
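Cluster ids are arbitrary: two implementations can find the same partition yet number the clusters differently, so comparing `labels` element-wise is not a fair parity test. A permutation-invariant score such as the adjusted Rand index (the ARI reported for clusterers in the parity table) avoids this. The pure-Python `adjusted_rand_index` below is a hypothetical helper written for illustration; in practice `sklearn.metrics.adjusted_rand_score` computes the same index.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two flat clusterings (hypothetical helper)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two samples: nothing to disagree about.
        return 1.0
    # Pairs of samples that share a cell of the contingency table.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together by each labeling on its own.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both labelings are a single cluster,
        # or both put every sample in its own cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Same partition, different cluster numbering: ARI is still 1.0.
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))
```

Because the index only looks at which samples are grouped together, relabeling clusters never changes the score, while a disagreeing partition is penalized relative to chance.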

All wrappers inherit from scikit-learn's BaseEstimator and the appropriate mixin (RegressorMixin, ClassifierMixin, ClusterMixin, TransformerMixin), so score(), fit_transform(), pipeline composition, and cross_val_score all work out of the box.
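That compatibility rests on scikit-learn's estimator contract: constructor arguments stored unchanged as attributes, `fit` returning `self`, and a mixin supplying `score`. The sketch below reproduces the pattern under stated assumptions: `SketchRidge` and `_ridge_core_fit` are hypothetical names, a NumPy closed-form solve stands in for the compiled Rust call, and this is not ferrolearn's actual wrapper code. It needs NumPy and scikit-learn installed.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def _ridge_core_fit(X, y, alpha):
    """Hypothetical stand-in for a Rust routine: closed-form ridge with intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


# Mixin first, BaseEstimator last, the MRO order scikit-learn recommends.
class SketchRidge(RegressorMixin, BaseEstimator):
    def __init__(self, alpha=1.0):
        # Store parameters unchanged so get_params()/clone() work.
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        self.coef_, self.intercept_ = _ridge_core_fit(X, y, self.alpha)
        self.n_features_in_ = X.shape[1]
        return self  # returning self enables fit(...).predict(...) chaining

    def predict(self, X):
        return np.asarray(X, dtype=np.float64) @ self.coef_ + self.intercept_


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Pipeline composition and cross-validation come from scikit-learn itself;
# score() is RegressorMixin's R².
pipe = make_pipeline(StandardScaler(), SketchRidge(alpha=1.0))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Because the estimator only has to honor this contract, the heavy numerical work can live anywhere, including behind a PyO3 extension, without scikit-learn's tooling noticing the difference.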

Performance & parity

Geomean speedups vs scikit-learn 1.8.0 across the 144-row head-to-head bench:

| Family | n compared | fit geomean | predict/transform geomean | mean Δ score |
|---|---|---|---|---|
| regressor | 43 | 8.21× | 4.39× | -0.0006 R² |
| classifier | 51 | 6.75× | 8.88× | +0.0035 accuracy |
| cluster | 15 | 1.35× | | +0.0000 ARI (perfect parity) |
| decomp | 15 | 5.16× | 4.56× | |
| preprocess | 14 | 9.82× | 2.74× | |
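The table condenses many per-row measurements into one number per family. A minimal sketch of that aggregation, assuming speedup means scikit-learn time divided by ferrolearn time and Δ score means ferrolearn's score minus scikit-learn's (neither convention is spelled out here). The rows are made-up toy values, not benchmark data, and `geomean`/`summarize` are hypothetical helpers, not code from render_head_to_head.py.

```python
import math
from collections import defaultdict

# Made-up toy rows, NOT benchmark data:
# (family, sklearn_fit_seconds, ferrolearn_fit_seconds, sklearn_score, ferrolearn_score)
ROWS = [
    ("regressor", 1.0, 0.5, 0.90, 0.89),
    ("regressor", 2.0, 0.25, 0.80, 0.80),
    ("classifier", 3.0, 1.0, 0.95, 0.96),
]


def geomean(values):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(rows):
    """Return {family: (geomean_speedup, mean_delta_score)}."""
    speedups = defaultdict(list)
    deltas = defaultdict(list)
    for family, t_sklearn, t_ferrolearn, s_sklearn, s_ferrolearn in rows:
        # Assumed convention: speedup > 1 means ferrolearn was faster.
        speedups[family].append(t_sklearn / t_ferrolearn)
        # Assumed sign: a positive delta means ferrolearn scored higher.
        deltas[family].append(s_ferrolearn - s_sklearn)
    return {
        family: (geomean(speedups[family]), sum(deltas[family]) / len(deltas[family]))
        for family in speedups
    }


for family, (speedup, delta) in summarize(ROWS).items():
    print(f"{family:<11} fit {speedup:.2f}x  mean delta {delta:+.4f}")
```

A geometric mean is the usual summary for ratios: a 2× win and a 0.5× loss combine to 1.0× (parity), whereas an arithmetic mean would report 1.25×.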

Reproduce with:

maturin develop --release
python ferrolearn-bench/head_to_head_full.py > h2h.json
python ferrolearn-bench/render_head_to_head.py h2h.json > REPORT.md

License

Licensed under either the Apache License, Version 2.0, or the MIT License, at your option.