Crate pkboost


§PKBoost: Shannon-Guided Gradient Boosting


PKBoost (Performance-Based Knowledge Booster) is an adaptive gradient boosting library built from scratch in Rust, specifically designed for extreme class imbalance and concept drift scenarios.

§Key Features

  • Extreme Imbalance Handling: Outperforms XGBoost/LightGBM on datasets with <5% minority class
  • Drift Detection & Adaptation: Automatically detects concept drift and triggers model adaptation
  • Shannon Entropy Guidance: splits are scored with information-theoretic gain that favors the minority class
  • Auto-Tuning: no manual hyperparameter tuning required; the model auto-configures from the data
  • Multi-Task Support: Binary classification, multi-class, and regression
  • Built-in Metrics: PR-AUC, ROC-AUC, F1, RMSE, R², and more
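
The entropy-guidance idea can be illustrated with a small standalone function (purely illustrative, not part of the pkboost API): a node's Shannon entropy is highest when classes are mixed, so splits that reduce it most tend to isolate the rare class quickly.

```rust
/// Shannon entropy of a binary label set: H = -p*log2(p) - (1-p)*log2(1-p).
/// Near-pure nodes (p close to 0 or 1) have low entropy; a split that
/// maximizes entropy reduction separates the minority class fastest.
fn binary_entropy(labels: &[f64]) -> f64 {
    let n = labels.len() as f64;
    if n == 0.0 {
        return 0.0;
    }
    let p = labels.iter().sum::<f64>() / n; // fraction of positive labels
    if p == 0.0 || p == 1.0 {
        return 0.0; // a pure node carries no uncertainty
    }
    -(p * p.log2() + (1.0 - p) * (1.0 - p).log2())
}

fn main() {
    // A 2% minority class: low absolute entropy, but very sensitive
    // to splits that move the positives into their own partition.
    let mut labels = vec![0.0; 98];
    labels.extend(vec![1.0; 2]);
    println!("entropy = {:.4}", binary_entropy(&labels));
}
```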

§Quick Start

use pkboost::{OptimizedPKBoostShannon, calculate_pr_auc, calculate_roc_auc};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Your data: Vec<Vec<f64>> for features, Vec<f64> for labels (0.0 or 1.0)
    let x_train: Vec<Vec<f64>> = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let y_train: Vec<f64> = vec![0.0, 1.0];
    let x_test: Vec<Vec<f64>> = vec![vec![1.5, 2.5]];
    let y_test: Vec<f64> = vec![0.0];

    // Create model with auto-tuning (recommended)
    let mut model = OptimizedPKBoostShannon::auto(&x_train, &y_train);

    // Train with optional validation set for early stopping
    model.fit(&x_train, &y_train, None, true)?;

    // Predict probabilities
    let predictions = model.predict_proba(&x_test)?;

    // Evaluate
    let pr_auc = calculate_pr_auc(&y_test, &predictions);
    let roc_auc = calculate_roc_auc(&y_test, &predictions);
    println!("PR-AUC: {:.4}, ROC-AUC: {:.4}", pr_auc, roc_auc);

    Ok(())
}

§Multi-Class Classification

use pkboost::MultiClassPKBoost;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let x_train: Vec<Vec<f64>> = vec![/* your data */];
    let y_train: Vec<f64> = vec![0.0, 1.0, 2.0]; // Class labels: 0, 1, 2, ...
    let x_test: Vec<Vec<f64>> = vec![/* test data */];

    // Specify number of classes
    let mut model = MultiClassPKBoost::new(3);
    
    // Train
    model.fit(&x_train, &y_train, None, true)?;

    // Get class probabilities [n_samples, n_classes]
    let probs = model.predict_proba(&x_test)?;
    
    // Or get predicted class indices
    let predictions = model.predict(&x_test)?;

    Ok(())
}

§Regression

use pkboost::{PKBoostRegressor, calculate_rmse, calculate_r2};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let x_train: Vec<Vec<f64>> = vec![/* your data */];
    let y_train: Vec<f64> = vec![/* continuous targets */];
    let x_test: Vec<Vec<f64>> = vec![/* test data */];
    let y_test: Vec<f64> = vec![/* test targets */];

    // Create regressor with auto configuration
    let mut model = PKBoostRegressor::auto(&x_train, &y_train);
    
    // Train
    model.fit(&x_train, &y_train, None, true)?;

    // Predict
    let predictions = model.predict(&x_test)?;

    // Evaluate
    let rmse = calculate_rmse(&y_test, &predictions);
    let r2 = calculate_r2(&y_test, &predictions);
    println!("RMSE: {:.4}, R²: {:.4}", rmse, r2);

    Ok(())
}

§Adaptive Model with Drift Detection

For streaming data or scenarios where data distribution changes over time:

use pkboost::AdversarialLivingBooster;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let x_train: Vec<Vec<f64>> = vec![/* initial training data */];
    let y_train: Vec<f64> = vec![/* initial labels */];

    // Create adaptive model
    let mut model = AdversarialLivingBooster::new(&x_train, &y_train);
    
    // Initial training
    model.fit_initial(&x_train, &y_train, None, true)?;

    // As new data arrives, observe it (model adapts automatically)
    let x_new: Vec<Vec<f64>> = vec![/* new batch */];
    let y_new: Vec<f64> = vec![/* new labels */];
    model.observe_batch(&x_new, &y_new, true)?;

    // Check model state
    println!("Vulnerability score: {:.4}", model.get_vulnerability_score());
    println!("Metamorphosis count: {}", model.get_metamorphosis_count());

    Ok(())
}
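
The drift detector inside AdversarialLivingBooster is internal to the library, but the general kind of check such detectors perform can be sketched with a simple mean-shift z-test; this is an illustrative sketch, not pkboost's actual algorithm.

```rust
/// Flag drift when a new batch's mean for one feature deviates from the
/// training mean by more than `k` standard errors (a one-feature z-test;
/// real detectors track many statistics across all features).
fn mean_shift_drift(train: &[f64], batch: &[f64], k: f64) -> bool {
    let mean = |v: &[f64]| v.iter().sum::<f64>() / v.len() as f64;
    let m = mean(train);
    let var = train.iter().map(|x| (x - m).powi(2)).sum::<f64>() / train.len() as f64;
    let std = var.sqrt().max(1e-12);
    // standard error of the batch mean under the training distribution
    let se = std / (batch.len() as f64).sqrt();
    ((mean(batch) - m) / se).abs() > k
}

fn main() {
    let train: Vec<f64> = (0..1000).map(|i| (i % 10) as f64).collect();
    let stable: Vec<f64> = (0..100).map(|i| (i % 10) as f64).collect();
    let shifted: Vec<f64> = (0..100).map(|i| (i % 10) as f64 + 3.0).collect();
    println!("stable batch drifts:  {}", mean_shift_drift(&train, &stable, 3.0));
    println!("shifted batch drifts: {}", mean_shift_drift(&train, &shifted, 3.0));
}
```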

§Builder Pattern (Advanced Configuration)

For fine-grained control over hyperparameters:

use pkboost::OptimizedPKBoostShannon;

let model = OptimizedPKBoostShannon::builder()
    .n_estimators(200)
    .learning_rate(0.05)
    .max_depth(6)
    .min_samples_split(10)
    .reg_lambda(1.0)
    .gamma(0.1)
    .subsample(0.8)
    .colsample_bytree(0.8)
    .early_stopping_rounds(20)
    .histogram_bins(32)
    .mi_weight(0.1)           // Mutual information weight for imbalance
    .scale_pos_weight(5.0)    // Weight for positive class
    .build();

§Core Types

  • OptimizedPKBoostShannon: Binary classification with Shannon entropy guidance
  • MultiClassPKBoost: Multi-class classification via One-vs-Rest
  • PKBoostRegressor: Regression with MSE, Huber, or Poisson loss
  • AdversarialLivingBooster: Adaptive model with drift detection

§Metrics

  • calculate_pr_auc: Precision-Recall AUC (best for imbalanced data)
  • calculate_roc_auc: Receiver Operating Characteristic AUC
  • calculate_rmse: Root Mean Squared Error
  • calculate_mae: Mean Absolute Error
  • calculate_r2: R² coefficient of determination
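
The regression metrics follow their standard definitions; for reference, a from-scratch version (independent of pkboost's own implementations) looks like this:

```rust
/// Root mean squared error: sqrt(mean((y - y_hat)^2)).
fn rmse(y: &[f64], pred: &[f64]) -> f64 {
    let mse = y.iter().zip(pred).map(|(a, b)| (a - b).powi(2)).sum::<f64>() / y.len() as f64;
    mse.sqrt()
}

/// R² = 1 - SS_res / SS_tot: 1.0 is a perfect fit, 0.0 is no better
/// than always predicting the mean of y.
fn r2(y: &[f64], pred: &[f64]) -> f64 {
    let mean = y.iter().sum::<f64>() / y.len() as f64;
    let ss_res: f64 = y.iter().zip(pred).map(|(a, b)| (a - b).powi(2)).sum();
    let ss_tot: f64 = y.iter().map(|a| (a - mean).powi(2)).sum();
    1.0 - ss_res / ss_tot
}

fn main() {
    let y = [3.0, -0.5, 2.0, 7.0];
    let pred = [2.5, 0.0, 2.0, 8.0];
    println!("RMSE = {:.4}, R² = {:.4}", rmse(&y, &pred), r2(&y, &pred));
}
```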

§Model Serialization

PKBoost models implement serde::Serialize and serde::Deserialize:

use pkboost::OptimizedPKBoostShannon;

// Save model
let model = OptimizedPKBoostShannon::auto(&x_train, &y_train);
let json = serde_json::to_string(&model)?;
std::fs::write("model.json", json)?;

// Load model
let json = std::fs::read_to_string("model.json")?;
let model: OptimizedPKBoostShannon = serde_json::from_str(&json)?;

§When to Use PKBoost

✅ Good fit:

  • Extreme class imbalance (<5% minority class)
  • Fraud detection, anomaly detection, rare event prediction
  • Data that evolves over time (concept drift)
  • When you want good results without hyperparameter tuning

❌ Consider alternatives for:

  • Perfectly balanced datasets (XGBoost may be faster)
  • Very small datasets (<1,000 samples)

§Author

Pushp Kharat - GitHub

§License

This project is licensed under the GPL-3.0 License.

Re-exports§

pub use adversarial::AdversarialEnsemble;
pub use auto_params::auto_params;
pub use auto_params::AutoHyperParams;
pub use auto_params::DataStats;
pub use histogram_builder::OptimizedHistogramBuilder;
pub use huber_loss::HuberLoss;
pub use living_booster::AdversarialLivingBooster;
pub use living_regressor::AdaptiveRegressor;
pub use living_regressor::SystemState;
pub use loss::LossType;
pub use loss::MSELoss;
pub use loss::OptimizedShannonLoss;
pub use loss::PoissonLoss;
pub use metabolism::FeatureMetabolism;
pub use metrics::calculate_pr_auc;
pub use metrics::calculate_roc_auc;
pub use metrics::calculate_shannon_entropy;
pub use model::OptimizedPKBoostShannon;
pub use multiclass::MultiClassPKBoost;
pub use optimized_data::CachedHistogram;
pub use optimized_data::TransposedData;
pub use partitioned_classifier::PartitionConfig;
pub use partitioned_classifier::PartitionMethod;
pub use partitioned_classifier::PartitionedClassifier;
pub use partitioned_classifier::PartitionedClassifierBuilder;
pub use partitioned_classifier::TaskType;
pub use precision::AdaptiveCompute;
pub use precision::PrecisionLevel;
pub use precision::ProgressiveBuffer;
pub use precision::ProgressivePrecision;
pub use regression::calculate_mad;
pub use regression::calculate_mae;
pub use regression::calculate_r2;
pub use regression::calculate_rmse;
pub use regression::detect_outliers;
pub use regression::MSELoss as RegressionMSELoss;
pub use regression::PKBoostRegressor;
pub use regression::RegressionLossType;
pub use tree::HistSplitResult;
pub use tree::OptimizedTreeShannon;
pub use tree::TreeParams;
pub use constants::*;

Modules§

adaptive_parallel
adversarial
auto_params
auto_tuner
constants
fork_parallel
histogram_builder
huber_loss
living_booster
living_regressor
loss
metabolism
metrics
model
multiclass
optimized_data
partitioned_classifier
precision
python_bindings
regression
tree
tree_regression