Module math

Module math contains mathematical utility functions for statistical operations and model evaluation.

This module provides mathematical functions used by machine learning algorithms: impurity measures for decision trees, distance calculations for clustering, statistical measures for evaluation, and activation and loss functions.

§Core Functions

§Decision Tree Mathematics

  • entropy - Calculates the entropy of a label set for information-based splitting
  • gini - Calculates the Gini impurity for CART-based splitting
  • information_gain - Measures information gained from dataset splitting
  • gain_ratio - Normalized information gain, as used by the C4.5 algorithm
  • average_path_length_factor - Calculates the average path length adjustment factor for isolation trees
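
The impurity measures listed above can be sketched in plain Rust. This is an illustration of the textbook formulas, not the crate's implementation: the function names, the `u32` label type, the slice-based signatures, and the base-2 logarithm are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Count how often each class label occurs.
fn class_counts(labels: &[u32]) -> HashMap<u32, usize> {
    let mut counts = HashMap::new();
    for &label in labels {
        *counts.entry(label).or_insert(0) += 1;
    }
    counts
}

/// Shannon entropy in bits: H = -sum(p_i * log2(p_i)). (Hypothetical helper.)
fn entropy_bits(labels: &[u32]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let n = labels.len() as f64;
    class_counts(labels)
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Gini impurity: G = 1 - sum(p_i^2). (Hypothetical helper.)
fn gini_impurity(labels: &[u32]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let n = labels.len() as f64;
    let sum_sq: f64 = class_counts(labels)
        .values()
        .map(|&c| (c as f64 / n).powi(2))
        .sum();
    1.0 - sum_sq
}

/// Information gain of a binary split: parent entropy minus the
/// size-weighted entropy of the two children.
fn information_gain_sketch(parent: &[u32], left: &[u32], right: &[u32]) -> f64 {
    let n = parent.len() as f64;
    if n == 0.0 {
        return 0.0;
    }
    let weighted = (left.len() as f64 / n) * entropy_bits(left)
        + (right.len() as f64 / n) * entropy_bits(right);
    entropy_bits(parent) - weighted
}

/// Gain ratio: information gain divided by the split information
/// (the entropy of the child-size proportions). Returns 0.0 when the
/// split information is zero, i.e. one child holds every sample.
fn gain_ratio_sketch(parent: &[u32], left: &[u32], right: &[u32]) -> f64 {
    let n = parent.len() as f64;
    if n == 0.0 {
        return 0.0;
    }
    let split_info: f64 = [left.len(), right.len()]
        .iter()
        .filter(|&&k| k > 0)
        .map(|&k| {
            let p = k as f64 / n;
            -p * p.log2()
        })
        .sum();
    if split_info == 0.0 {
        0.0
    } else {
        information_gain_sketch(parent, left, right) / split_info
    }
}
```

For the balanced label set `[0, 1, 1, 0]`, the entropy is 1 bit and the Gini impurity is 0.5. Splitting it into `[0, 0]` and `[1, 1]` yields pure children, so the information gain equals the full parent entropy.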

§Distance Calculations

  • squared_euclidean_distance_row - Squared Euclidean distance between two vectors
  • manhattan_distance_row - Manhattan (L1) distance between two vectors
  • minkowski_distance_row - Generalized Minkowski distance with parameter p
  • binary_search_sigma - Finds the appropriate sigma value for a single sample’s distances to achieve target perplexity
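
The three distance measures above follow standard definitions, which the sketch below writes out over plain slices. The function names and `&[f64]` signatures are assumptions for illustration and may differ from the crate's own signatures, which operate on `ndarray` vectors.

```rust
/// Squared Euclidean distance: sum((a_i - b_i)^2).
/// Skipping the square root is cheaper and preserves distance ordering.
fn squared_euclidean(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Manhattan (L1) distance: sum(|a_i - b_i|).
fn manhattan(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Minkowski distance: (sum(|a_i - b_i|^p))^(1/p).
/// p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
fn minkowski(a: &[f64], b: &[f64], p: f64) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    assert!(p >= 1.0, "p must be at least 1");
    let sum: f64 = a.iter().zip(b).map(|(x, y)| (x - y).abs().powf(p)).sum();
    sum.powf(1.0 / p)
}
```

For the vectors `[1.0, 2.0]` and `[4.0, 6.0]`, the coordinate differences are 3 and 4, so the squared Euclidean distance is 25, the Manhattan distance is 7, and the Minkowski distance with p = 2 is 5.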

§Statistical Functions

  • sum_of_square_total - Total variability measurement (SST)
  • sum_of_squared_errors - Sum of squared prediction errors (SSE)
  • variance - Variance of a dataset, i.e. the mean squared deviation from the mean
  • standard_deviation - Population standard deviation calculation
  • average_path_length_factor - Adjustment factor for isolation forest algorithms
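
A minimal sketch of the statistics above, written over plain slices. The names and signatures are hypothetical. The path length factor uses the formula from the original isolation forest paper, c(n) = 2H(n-1) - 2(n-1)/n with H(i) approximated by ln(i) plus the Euler-Mascheroni constant; whether the crate special-cases small n is not stated in this page, so that part is an assumption.

```rust
/// Arithmetic mean; returns 0.0 for an empty slice.
fn mean(values: &[f64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    values.iter().sum::<f64>() / values.len() as f64
}

/// Total sum of squares (SST): sum((v_i - mean)^2).
fn total_sum_of_squares(values: &[f64]) -> f64 {
    let m = mean(values);
    values.iter().map(|v| (v - m).powi(2)).sum()
}

/// Sum of squared errors (SSE): sum((predicted_i - actual_i)^2).
fn squared_error_sum(predicted: &[f64], actual: &[f64]) -> f64 {
    assert_eq!(predicted.len(), actual.len(), "slices must have equal length");
    predicted.iter().zip(actual).map(|(p, a)| (p - a).powi(2)).sum()
}

/// Population variance: SST divided by the number of values.
fn population_variance(values: &[f64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    total_sum_of_squares(values) / values.len() as f64
}

/// Population standard deviation: square root of the population variance.
fn population_std_dev(values: &[f64]) -> f64 {
    population_variance(values).sqrt()
}

/// Average path length factor c(n) for isolation trees
/// (paper formula; returns 0.0 for n <= 1 as an assumption).
fn path_length_factor(n: usize) -> f64 {
    if n <= 1 {
        return 0.0;
    }
    const EULER_GAMMA: f64 = 0.577_215_664_9;
    let n = n as f64;
    2.0 * ((n - 1.0).ln() + EULER_GAMMA) - 2.0 * (n - 1.0) / n
}
```

For the values `[2, 4, 4, 4, 5, 5, 7, 9]` the mean is 5, SST is 32, the population variance is 4, and the standard deviation is 2. SST and SSE combine into the coefficient of determination, R² = 1 - SSE/SST.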

§Activation and Loss Functions

  • sigmoid - Sigmoid activation function for neural networks and logistic regression
  • logistic_loss - Cross-entropy loss for binary classification
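
The sigmoid and the binary cross-entropy loss can be sketched as follows. The names and signatures are hypothetical, and the choice to compute the loss from raw scores (logits) rather than from probabilities is an assumption of this example, made because it avoids taking the logarithm of values that round to 0 or 1.

```rust
/// Logistic sigmoid, 1 / (1 + e^(-z)), evaluated in a form that
/// avoids overflow in exp() for large negative z.
fn sigmoid_stable(z: f64) -> f64 {
    if z >= 0.0 {
        1.0 / (1.0 + (-z).exp())
    } else {
        let e = z.exp();
        e / (1.0 + e)
    }
}

/// Mean binary cross-entropy computed from raw scores (logits).
/// Per sample this equals -[y * ln(s) + (1 - y) * ln(1 - s)] with
/// s = sigmoid(z), rewritten as max(z, 0) - z * y + ln(1 + e^(-|z|)).
fn mean_logistic_loss(logits: &[f64], labels: &[f64]) -> f64 {
    assert_eq!(logits.len(), labels.len(), "slices must have equal length");
    if logits.is_empty() {
        return 0.0;
    }
    let total: f64 = logits
        .iter()
        .zip(labels)
        .map(|(&z, &y)| z.max(0.0) - z * y + (-z.abs()).exp().ln_1p())
        .sum();
    total / logits.len() as f64
}
```

A logit of 0 corresponds to a predicted probability of 0.5, so the loss for a single sample with logit 0 is ln 2 regardless of its label.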

§Example

use rustyml::math::{entropy, gini, sigmoid, squared_euclidean_distance_row};
use ndarray::array;

// Decision tree impurity measures
let labels = array![0.0, 1.0, 1.0, 0.0];
let ent = entropy(&labels);
let gini_val = gini(&labels);

// Distance calculations
let v1 = array![1.0, 2.0];
let v2 = array![4.0, 6.0];
let dist = squared_euclidean_distance_row(&v1, &v2);

// Activation function
let activated = sigmoid(0.5);

§Functions

average_path_length_factor
Calculates the average path length adjustment factor for isolation trees.
binary_search_sigma
Finds the sigma value that matches a target perplexity for distance-derived probabilities.
entropy
Calculates the entropy of a label set.
gain_ratio
Calculates the gain ratio for a dataset split.
gini
Calculates the Gini impurity of a label set.
information_gain
Calculates the information gain when splitting a dataset.
logistic_loss
Calculates the logistic regression loss (log loss).
manhattan_distance_row
Calculates the Manhattan (L1) distance between two vectors.
minkowski_distance_row
Calculates the Minkowski distance between two vectors.
sigmoid
Computes the logistic sigmoid for a scalar input.
squared_euclidean_distance_row
Calculates the squared Euclidean distance between two vectors.
standard_deviation
Calculates the population standard deviation of a set of values.
sum_of_square_total
Calculates the total sum of squares (SST).
sum_of_squared_errors
Calculates the sum of squared errors (SSE).
variance
Calculates the variance of a set of values (the mean squared deviation from their mean).
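
The idea behind binary_search_sigma, matching a target perplexity by bisection, can be sketched as below. This is not the crate's implementation: the function names, the use of squared distances, the Gaussian weighting exp(-d / (2σ²)), the search bounds, and the tolerance are all assumptions for illustration, following the usual t-SNE formulation.

```rust
/// Perplexity of the distribution p_j proportional to exp(-d_j / (2 * sigma^2)),
/// defined as exp(H) with H the Shannon entropy in nats.
fn perplexity(sq_distances: &[f64], sigma: f64) -> f64 {
    // Shift by the smallest distance so the largest weight is exactly 1;
    // this leaves the normalized probabilities unchanged.
    let min_d = sq_distances.iter().cloned().fold(f64::INFINITY, f64::min);
    let weights: Vec<f64> = sq_distances
        .iter()
        .map(|d| (-(d - min_d) / (2.0 * sigma * sigma)).exp())
        .collect();
    let total: f64 = weights.iter().sum();
    let entropy: f64 = weights
        .iter()
        .map(|w| w / total)
        .filter(|&p| p > 0.0)
        .map(|p| -p * p.ln())
        .sum();
    entropy.exp()
}

/// Bisection on sigma: perplexity grows with sigma, so the interval
/// is halved toward the side that contains the target value.
fn search_sigma(sq_distances: &[f64], target_perplexity: f64) -> f64 {
    let (mut lo, mut hi) = (1e-10_f64, 1e10_f64); // assumed search bounds
    let mut sigma = 0.5 * (lo + hi);
    for _ in 0..200 {
        sigma = 0.5 * (lo + hi);
        let diff = perplexity(sq_distances, sigma) - target_perplexity;
        if diff.abs() < 1e-6 {
            break;
        }
        if diff > 0.0 {
            hi = sigma; // distribution too flat: shrink sigma
        } else {
            lo = sigma; // distribution too peaked: grow sigma
        }
    }
    sigma
}
```

The target perplexity must lie strictly between 1 and the number of neighbors for a solution to exist: a very small sigma concentrates all probability on the nearest neighbor (perplexity near 1), while a very large sigma approaches the uniform distribution (perplexity equal to the number of neighbors).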