Module activations


Activation functions for neural networks

This module provides common activation functions for neural networks. Activation functions introduce non-linearity, enabling a network to learn complex patterns and relationships.

§Overview

Activation functions map a neuron's weighted input to its output. Without them, a stack of linear layers collapses into a single linear transformation; the non-linearity is what lets the network learn complex mappings between inputs and outputs (see the sketch below).
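For intuition, composing linear maps always yields another linear map, while inserting even a simple ReLU breaks that collapse. A minimal sketch in plain Rust (independent of this crate's API):

// Two linear maps: f(x) = 2x + 1 and g(x) = 3x - 2.
let f = |x: f64| 2.0 * x + 1.0;
let g = |x: f64| 3.0 * x - 2.0;

// Their composition is still a single linear map: g(f(x)) = 6x + 1.
assert_eq!(g(f(0.0)), 1.0);
assert_eq!(g(f(1.0)), 7.0);

// With a ReLU between them, the slope changes at the kink, so the
// composition is no longer expressible as one linear map.
let relu = |x: f64| x.max(0.0);
assert_eq!(g(relu(f(-1.0))), -2.0); // f(-1) = -1 is clipped to 0
assert_eq!(g(relu(f(1.0))), 7.0);   // f(1) = 3 passes through unchanged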

§Available Activation Functions

  • ReLU (Rectified Linear Unit): Most commonly used, simple and effective
  • Sigmoid: Maps input to (0,1), useful for binary classification output layers
  • Tanh: Maps input to (-1,1), often better than sigmoid for hidden layers
  • Softmax: Converts logits to probability distribution, used in multi-class classification
  • GELU (Gaussian Error Linear Unit): Smooth alternative to ReLU, used in transformers
  • Swish/SiLU: Self-gated activation, often outperforms ReLU
  • Mish: Smooth, non-monotonic activation function
  • Leaky ReLU: Variant of ReLU that allows small negative values
  • ELU (Exponential Linear Unit): Smooth variant of ReLU
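For reference, the scalar formulas behind the activations listed above (standard definitions, shown as a plain-Rust sketch; the crate applies them element-wise over arrays, and tanh is available directly as f64::tanh):

fn sigmoid(x: f64) -> f64 { 1.0 / (1.0 + (-x).exp()) }
fn relu(x: f64) -> f64 { x.max(0.0) }
fn leaky_relu(x: f64, alpha: f64) -> f64 { if x > 0.0 { x } else { alpha * x } }
fn elu(x: f64, alpha: f64) -> f64 { if x > 0.0 { x } else { alpha * (x.exp() - 1.0) } }
fn swish(x: f64) -> f64 { x * sigmoid(x) } // also known as SiLU
fn mish(x: f64) -> f64 { x * (1.0 + x.exp()).ln().tanh() } // x * tanh(softplus(x))
fn gelu(x: f64) -> f64 {
    // Common tanh approximation of GELU
    0.5 * x * (1.0 + ((2.0 / std::f64::consts::PI).sqrt() * (x + 0.044715 * x.powi(3))).tanh())
}
fn softmax(xs: &[f64]) -> Vec<f64> {
    // Subtract the maximum for numerical stability before exponentiating
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

assert_eq!(relu(-2.0), 0.0);
assert!((softmax(&[1.0, 2.0, 3.0]).iter().sum::<f64>() - 1.0).abs() < 1e-12);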

§Examples

§Basic Usage

use scirs2_neural::activations::{Activation, ReLU, Sigmoid, Softmax};
use ndarray::Array;

// Create activation functions
let relu = ReLU::new();
let sigmoid = Sigmoid::new();
let softmax = Softmax::new(1); // Apply softmax along axis 1 (the class axis)

// Create input data
let input = Array::from_vec(vec![-2.0, -1.0, 0.0, 1.0, 2.0])
    .into_dyn();

// Apply ReLU activation
let relu_output = relu.forward(&input)?;
// Output: [0.0, 0.0, 0.0, 1.0, 2.0]

// Apply Sigmoid activation
let sigmoid_output = sigmoid.forward(&input)?;
// Output: [0.119, 0.269, 0.5, 0.731, 0.881] (approximately)

// Softmax is typically used with 2D input (batch_size, num_classes)
let logits = Array::from_shape_vec((1, 3), vec![1.0, 2.0, 3.0])?.into_dyn();
let probabilities = softmax.forward(&logits)?;
// Output: [[0.090, 0.245, 0.665]] (approximately, sums to 1.0)
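Note that the ? operator in these examples requires an enclosing function that returns a compatible Result; doc tests provide one implicitly. To run the snippets in a standalone binary, a wrapper along these lines works (a minimal sketch using only the standard library):

use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // ... example code using `?` goes here ...
    Ok(())
}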

§Using in Forward and Backward Pass

use scirs2_neural::activations::{Activation, ReLU};
use ndarray::Array;

let relu = ReLU::new();
let input = Array::from_vec(vec![-1.0, 0.5, 2.0]).into_dyn();

// Forward pass
let output = relu.forward(&input)?;
println!("ReLU output: {:?}", output);
// Output: [0.0, 0.5, 2.0]

// Backward pass (computing gradients)
let grad_output = Array::from_vec(vec![1.0, 1.0, 1.0]).into_dyn();
let grad_input = relu.backward(&grad_output, &output)?;
println!("ReLU gradient: {:?}", grad_input);
// Output: [0.0, 1.0, 1.0] (gradient is 0 for negative inputs, 1 for positive)
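An analytic gradient like this can be sanity-checked against a central finite difference (a small sketch in plain Rust, independent of this crate's API):

// Compare ReLU's analytic derivative to a numerical estimate.
fn relu(x: f64) -> f64 { x.max(0.0) }

fn numeric_grad(f: fn(f64) -> f64, x: f64, eps: f64) -> f64 {
    (f(x + eps) - f(x - eps)) / (2.0 * eps)
}

for &x in &[-1.0_f64, 0.5, 2.0] {
    let analytic = if x > 0.0 { 1.0 } else { 0.0 };
    let numeric = numeric_grad(relu, x, 1e-6);
    assert!((analytic - numeric).abs() < 1e-4);
}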

§Choosing the Right Activation Function

§For Hidden Layers:

  • ReLU: Default choice, computationally efficient, mitigates the vanishing-gradient problem
  • GELU: Good for transformer architectures
  • Swish: Often outperforms ReLU, especially in deep networks
  • Tanh: When you need outputs centered around zero

§For Output Layers:

  • Sigmoid: Binary classification (single output)
  • Softmax: Multi-class classification (multiple outputs that sum to 1)
  • Linear (no activation): Regression tasks
  • Tanh: When output should be in range (-1, 1)
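The zero-centering contrast between tanh and sigmoid is easy to verify numerically (plain-Rust sketch):

// Over inputs symmetric around zero, tanh outputs average to zero
// while sigmoid outputs average to 0.5 (they are never negative).
let xs = [-2.0_f64, -1.0, 1.0, 2.0];
let tanh_mean = xs.iter().map(|x| x.tanh()).sum::<f64>() / xs.len() as f64;
let sigmoid_mean = xs.iter().map(|x| 1.0 / (1.0 + (-x).exp())).sum::<f64>() / xs.len() as f64;
assert!(tanh_mean.abs() < 1e-9);             // tanh(-x) = -tanh(x)
assert!((sigmoid_mean - 0.5).abs() < 1e-9);  // sigmoid(-x) = 1 - sigmoid(x)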

§Performance Considerations

  • ReLU and Leaky ReLU: Fastest to compute
  • Sigmoid and Tanh: Require expensive exponential operations
  • Softmax: Most expensive, but typically applied only to the output layer
  • GELU and Swish: More expensive than ReLU but can provide better results
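These costs are easy to probe with a rough micro-benchmark (a sketch using std::time::Instant; build with --release, and treat the absolute numbers as indicative only):

use std::time::Instant;

// A large input buffer spanning negative and positive values.
let xs: Vec<f64> = (0..1_000_000).map(|i| i as f64 / 1.0e5 - 5.0).collect();

let t = Instant::now();
let relu_sum: f64 = xs.iter().map(|x| x.max(0.0)).sum();
println!("relu:    {:?} (checksum {relu_sum:.1})", t.elapsed());

let t = Instant::now();
let sigmoid_sum: f64 = xs.iter().map(|x| 1.0 / (1.0 + (-x).exp())).sum();
println!("sigmoid: {:?} (checksum {sigmoid_sum:.1})", t.elapsed());
// The checksums are printed so the optimizer cannot discard the work.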

Structs§

  • ELU: Exponential Linear Unit (ELU) activation function.
  • GELU: Gaussian Error Linear Unit (GELU) activation function.
  • LeakyReLU: Leaky Rectified Linear Unit (LeakyReLU) activation function.
  • Mish: Mish activation function.
  • ReLU: Rectified Linear Unit (ReLU) activation function.
  • Sigmoid: Sigmoid activation function.
  • Softmax: Softmax activation function.
  • Swish: Swish activation function.
  • Tanh: Hyperbolic tangent (tanh) activation function.

Traits§

  • Activation: Trait for activation functions.