Module math contains mathematical utility functions for statistical operations and model evaluation.
This module provides comprehensive mathematical functions essential for machine learning algorithms, including impurity measures for decision trees, distance calculations for clustering algorithms, statistical measures for evaluation, and various mathematical utilities for data processing.
§Core Functions
§Decision Tree Mathematics
- entropy - Calculates the entropy of a label set for information-based splitting
- gini - Calculates the Gini impurity for CART-based splitting
- information_gain - Measures information gained from dataset splitting
- gain_ratio - Normalized information gain for C4.5 algorithm
- c - Calculates the average path length adjustment factor for isolation trees
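To make the impurity measures above concrete, here is a minimal standalone sketch of the usual textbook definitions. It is not rustyml's implementation: the function names (`entropy_sketch`, `gini_sketch`, `information_gain_sketch`), the slice-based signatures, and the choice of base-2 logarithms are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Counts how often each distinct label occurs.
/// Labels are f64 (as in the module example) and are keyed by bit pattern
/// so they can be hashed; note that 0.0 and -0.0 would count as different labels.
fn class_counts(labels: &[f64]) -> HashMap<u64, usize> {
    let mut counts = HashMap::new();
    for &label in labels {
        *counts.entry(label.to_bits()).or_insert(0) += 1;
    }
    counts
}

/// Shannon entropy in bits: H = -sum_k p_k * log2(p_k).
/// Assumption: an empty label set has entropy 0.
fn entropy_sketch(labels: &[f64]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let n = labels.len() as f64;
    class_counts(labels)
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Gini impurity: G = 1 - sum_k p_k^2.
/// Assumption: an empty label set has impurity 0.
fn gini_sketch(labels: &[f64]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let n = labels.len() as f64;
    let sum_sq: f64 = class_counts(labels)
        .values()
        .map(|&c| (c as f64 / n).powi(2))
        .sum();
    1.0 - sum_sq
}

/// Information gain of a binary split: parent entropy minus the
/// size-weighted entropy of the two children.
fn information_gain_sketch(parent: &[f64], left: &[f64], right: &[f64]) -> f64 {
    if parent.is_empty() {
        return 0.0;
    }
    let n = parent.len() as f64;
    let weighted_children = (left.len() as f64 / n) * entropy_sketch(left)
        + (right.len() as f64 / n) * entropy_sketch(right);
    entropy_sketch(parent) - weighted_children
}
```

Under these definitions a balanced two-class label set such as `[0.0, 1.0, 1.0, 0.0]` has an entropy of 1 bit and a Gini impurity of 0.5, and a split that separates the classes perfectly gains the full parent entropy.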
§Distance Calculations
- squared_euclidean_distance_row - Squared Euclidean distance between two vectors
- manhattan_distance_row - Manhattan (L1) distance between two vectors
- minkowski_distance_row - Generalized Minkowski distance with parameter p
- binary_search_sigma - Finds the appropriate sigma value for a single sample's distances to achieve target perplexity
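The three distance measures above differ only in how per-coordinate differences are combined. The sketch below shows the standard formulas on plain slices; the function names and signatures are hypothetical and are not taken from rustyml, whose functions operate on ndarray types.

```rust
/// Squared Euclidean distance: sum_i (a_i - b_i)^2.
/// Skipping the square root is cheaper and preserves distance ordering.
fn squared_euclidean(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Manhattan (L1) distance: sum_i |a_i - b_i|.
fn manhattan(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Minkowski distance: (sum_i |a_i - b_i|^p)^(1/p), for p >= 1.
/// p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.
fn minkowski(a: &[f64], b: &[f64], p: f64) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    assert!(p >= 1.0, "p must be at least 1");
    let sum: f64 = a.iter().zip(b).map(|(x, y)| (x - y).abs().powf(p)).sum();
    sum.powf(1.0 / p)
}
```

For the vectors `[1.0, 2.0]` and `[4.0, 6.0]` used in the module example, the coordinate differences are 3 and 4, so the squared Euclidean distance is 25, the Manhattan distance is 7, and the Minkowski distance with p = 2 is 5.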
§Statistical Functions
- sum_of_square_total - Total variability measurement (SST)
- sum_of_squared_errors - Sum of squared prediction errors (SSE)
- variance - Mean squared error or variance of a dataset
- standard_deviation - Population standard deviation calculation
- average_path_length_factor - Adjustment factor for isolation forest algorithms
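As a rough illustration of how these quantities relate, the following sketch computes SST, SSE, population variance, and population standard deviation on plain slices. The names, signatures, and the convention of returning 0 for empty input are assumptions made for this example, not rustyml's API.

```rust
/// Arithmetic mean; returns 0.0 for empty input (an assumption of this sketch).
fn mean(values: &[f64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    values.iter().sum::<f64>() / values.len() as f64
}

/// Total sum of squares: SST = sum_i (y_i - mean(y))^2.
fn sst(values: &[f64]) -> f64 {
    let m = mean(values);
    values.iter().map(|v| (v - m).powi(2)).sum()
}

/// Sum of squared errors: SSE = sum_i (predicted_i - actual_i)^2.
fn sse(predicted: &[f64], actual: &[f64]) -> f64 {
    assert_eq!(predicted.len(), actual.len(), "inputs must have equal length");
    predicted
        .iter()
        .zip(actual)
        .map(|(p, a)| (p - a).powi(2))
        .sum()
}

/// Population variance: SST / n (divides by n, not n - 1).
fn population_variance(values: &[f64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    sst(values) / values.len() as f64
}

/// Population standard deviation: square root of the population variance.
fn population_std(values: &[f64]) -> f64 {
    population_variance(values).sqrt()
}
```

With these pieces, the coefficient of determination for a regression model can be written as `1.0 - sse(&predicted, &actual) / sst(&actual)`, which is one common reason SST and SSE are exposed side by side.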
§Activation and Loss Functions
- sigmoid - Sigmoid activation function for neural networks and logistic regression
- logistic_loss - Cross-entropy loss for binary classification
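The sigmoid and the binary cross-entropy loss are usually used together, so a small sketch of both may help. It assumes the loss receives predicted probabilities and 0/1 labels and averages over samples; whether rustyml's `logistic_loss` takes probabilities or raw scores is not stated here, and the names below are hypothetical.

```rust
/// Logistic sigmoid: 1 / (1 + e^(-z)).
/// Written in two branches so that exp() never overflows for large |z|.
fn sigmoid_sketch(z: f64) -> f64 {
    if z >= 0.0 {
        1.0 / (1.0 + (-z).exp())
    } else {
        let e = z.exp();
        e / (1.0 + e)
    }
}

/// Mean binary cross-entropy:
/// -(1/n) * sum_i [ y_i * ln(p_i) + (1 - y_i) * ln(1 - p_i) ].
/// Assumptions: `probs` are probabilities in [0, 1], `labels` are 0.0 or 1.0,
/// and probabilities are clamped away from 0 and 1 to avoid ln(0).
fn logistic_loss_sketch(probs: &[f64], labels: &[f64]) -> f64 {
    assert_eq!(probs.len(), labels.len(), "inputs must have equal length");
    assert!(!probs.is_empty(), "inputs must not be empty");
    let eps = 1e-15;
    let total: f64 = probs
        .iter()
        .zip(labels)
        .map(|(&p, &y)| {
            let p = p.clamp(eps, 1.0 - eps);
            -(y * p.ln() + (1.0 - y) * (1.0 - p).ln())
        })
        .sum();
    total / probs.len() as f64
}
```

A model that predicts 0.5 for every sample scores a loss of ln 2 (about 0.693) under this definition, which is a convenient baseline when checking whether training is making progress.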
§Example
use rustyml::math::{entropy, gini, sigmoid, squared_euclidean_distance_row};
use ndarray::array;
// Decision tree impurity measures
let labels = array![0.0, 1.0, 1.0, 0.0];
let ent = entropy(&labels);
let gini_val = gini(&labels);
// Distance calculations
let v1 = array![1.0, 2.0];
let v2 = array![4.0, 6.0];
let dist = squared_euclidean_distance_row(&v1, &v2);
// Activation function
let activated = sigmoid(0.5);
Functions§
- average_path_length_factor - Calculates the average path length adjustment factor for isolation trees.
- binary_search_sigma - Finds the sigma value that matches a target perplexity for distance-derived probabilities.
- entropy - Calculates the entropy of a label set.
- gain_ratio - Calculates the gain ratio for a dataset split.
- gini - Calculates the Gini impurity of a label set.
- information_gain - Calculates the information gain when splitting a dataset.
- logistic_loss - Calculates the logistic regression loss (log loss).
- manhattan_distance_row - Calculates the Manhattan (L1) distance between two vectors.
- minkowski_distance_row - Calculates the Minkowski distance between two vectors.
- sigmoid - Computes the logistic sigmoid for a scalar input.
- squared_euclidean_distance_row - Calculates the squared Euclidean distance between two vectors.
- standard_deviation - Calculates the population standard deviation of a set of values.
- sum_of_square_total - Calculates the total sum of squares (SST).
- sum_of_squared_errors - Calculates the sum of squared errors (SSE).
- variance - Calculates the mean squared error (variance) of a set of values.
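For the isolation-tree adjustment factor listed above, the formula commonly given in the isolation forest literature is c(n) = 2H(n - 1) - 2(n - 1)/n, where H(i) is approximated by ln(i) plus the Euler-Mascheroni constant. The sketch below follows that formula; it is an assumption that rustyml's `average_path_length_factor` uses the same approximation and edge-case handling, and the function name here is hypothetical.

```rust
/// Euler-Mascheroni constant, used in the harmonic-number approximation.
const EULER_GAMMA: f64 = 0.577_215_664_901_532_9;

/// Average path length of an unsuccessful search in a binary search tree
/// built from `n` samples, used to normalize isolation-tree path lengths.
/// Edge cases follow the common convention: c(n) = 0 for n <= 1 and c(2) = 1.
fn average_path_length_sketch(n: usize) -> f64 {
    match n {
        0 | 1 => 0.0,
        2 => 1.0,
        _ => {
            let n = n as f64;
            // H(n - 1) is approximated by ln(n - 1) + gamma.
            let harmonic = (n - 1.0).ln() + EULER_GAMMA;
            2.0 * harmonic - 2.0 * (n - 1.0) / n
        }
    }
}
```

An anomaly score is then typically derived as 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length of a sample across the trees, so values of c(n) grow slowly (logarithmically) with the subsample size.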