Module activations


Activation functions for neural networks

This module provides common activation functions for neural networks. Activation functions introduce non-linearity, enabling a network to learn complex patterns and relationships.

§Overview

Activation functions map a neuron's weighted input to its output. Without them, a stack of linear layers collapses into a single linear transformation; the non-linearity is what lets the network learn complex mappings between inputs and outputs (see the sketch below).
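For intuition, composing linear maps always yields another linear map, while inserting even a simple ReLU breaks that collapse. A minimal sketch in plain Rust (independent of this crate's API):

// Two linear maps: f(x) = 2x + 1 and g(x) = 3x - 2.
let f = |x: f64| 2.0 * x + 1.0;
let g = |x: f64| 3.0 * x - 2.0;

// Their composition is still a single linear map: g(f(x)) = 6x + 1.
assert_eq!(g(f(0.0)), 1.0);
assert_eq!(g(f(1.0)), 7.0);

// With a ReLU between them, the slope changes at the kink, so the
// composition is no longer expressible as one linear map.
let relu = |x: f64| x.max(0.0);
assert_eq!(g(relu(f(-1.0))), -2.0); // f(-1) = -1 is clipped to 0
assert_eq!(g(relu(f(1.0))), 7.0);   // f(1) = 3 passes through unchanged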

§Available Activation Functions

  • ReLU (Rectified Linear Unit): Most commonly used, simple and effective
  • Sigmoid: Maps input to (0,1), useful for binary classification output layers
  • Tanh: Maps input to (-1,1), often better than sigmoid for hidden layers
  • Softmax: Converts logits to probability distribution, used in multi-class classification
  • GELU (Gaussian Error Linear Unit): Smooth alternative to ReLU, used in transformers
  • Swish/SiLU: Self-gated activation, often outperforms ReLU
  • Mish: Smooth, non-monotonic activation function
  • Leaky ReLU: Variant of ReLU that allows small negative values
  • ELU (Exponential Linear Unit): Smooth variant of ReLU
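For reference, the scalar formulas behind the activations listed above (standard definitions, shown as a plain-Rust sketch; the crate applies them element-wise over arrays, and tanh is available directly as f64::tanh):

fn sigmoid(x: f64) -> f64 { 1.0 / (1.0 + (-x).exp()) }
fn relu(x: f64) -> f64 { x.max(0.0) }
fn leaky_relu(x: f64, alpha: f64) -> f64 { if x > 0.0 { x } else { alpha * x } }
fn elu(x: f64, alpha: f64) -> f64 { if x > 0.0 { x } else { alpha * (x.exp() - 1.0) } }
fn swish(x: f64) -> f64 { x * sigmoid(x) } // also known as SiLU
fn mish(x: f64) -> f64 { x * (1.0 + x.exp()).ln().tanh() } // x * tanh(softplus(x))
fn gelu(x: f64) -> f64 {
    // Common tanh approximation of GELU
    0.5 * x * (1.0 + ((2.0 / std::f64::consts::PI).sqrt() * (x + 0.044715 * x.powi(3))).tanh())
}
fn softmax(xs: &[f64]) -> Vec<f64> {
    // Subtract the maximum for numerical stability before exponentiating
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

assert_eq!(relu(-2.0), 0.0);
assert!((softmax(&[1.0, 2.0, 3.0]).iter().sum::<f64>() - 1.0).abs() < 1e-12);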

§Examples

§Basic Usage

use scirs2_neural::activations::{Activation, ReLU, Sigmoid, Softmax};
use ndarray::Array;

// Create activation functions
let relu = ReLU::new();
let sigmoid = Sigmoid::new();
let softmax = Softmax::new(1); // Apply softmax along axis 1 (the class axis)

// Create input data
let input = Array::from_vec(vec![-2.0, -1.0, 0.0, 1.0, 2.0])
    .into_dyn();

// Apply ReLU activation
let relu_output = relu.forward(&input)?;
// Output: [0.0, 0.0, 0.0, 1.0, 2.0]

// Apply Sigmoid activation
let sigmoid_output = sigmoid.forward(&input)?;
// Output: [0.119, 0.269, 0.5, 0.731, 0.881] (approximately)

// Softmax is typically used with 2D input (batch_size, num_classes)
let logits = Array::from_shape_vec((1, 3), vec![1.0, 2.0, 3.0])?.into_dyn();
let probabilities = softmax.forward(&logits)?;
// Output: [[0.090, 0.245, 0.665]] (approximately, sums to 1.0)
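Note that the ? operator in these examples requires an enclosing function that returns a compatible Result; doc tests provide one implicitly. To run the snippets in a standalone binary, a wrapper along these lines works (a minimal sketch using only the standard library):

use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // ... example code using `?` goes here ...
    Ok(())
}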

§Using in Forward and Backward Pass

use scirs2_neural::activations::{Activation, ReLU};
use ndarray::Array;

let relu = ReLU::new();
let input = Array::from_vec(vec![-1.0, 0.5, 2.0]).into_dyn();

// Forward pass
let output = relu.forward(&input)?;
println!("ReLU output: {:?}", output);
// Output: [0.0, 0.5, 2.0]

// Backward pass (computing gradients)
let grad_output = Array::from_vec(vec![1.0, 1.0, 1.0]).into_dyn();
let grad_input = relu.backward(&grad_output, &output)?;
println!("ReLU gradient: {:?}", grad_input);
// Output: [0.0, 1.0, 1.0] (gradient is 0 for negative inputs, 1 for positive)
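An analytic gradient like this can be sanity-checked against a central finite difference (a small sketch in plain Rust, independent of this crate's API):

// Compare ReLU's analytic derivative to a numerical estimate.
fn relu(x: f64) -> f64 { x.max(0.0) }

fn numeric_grad(f: fn(f64) -> f64, x: f64, eps: f64) -> f64 {
    (f(x + eps) - f(x - eps)) / (2.0 * eps)
}

for &x in &[-1.0_f64, 0.5, 2.0] {
    let analytic = if x > 0.0 { 1.0 } else { 0.0 };
    let numeric = numeric_grad(relu, x, 1e-6);
    assert!((analytic - numeric).abs() < 1e-4);
}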

§Choosing the Right Activation Function

§For Hidden Layers:

  • ReLU: Default choice, computationally efficient, mitigates the vanishing-gradient problem
  • GELU: Good for transformer architectures
  • Swish: Often outperforms ReLU, especially in deep networks
  • Tanh: When you need outputs centered around zero

§For Output Layers:

  • Sigmoid: Binary classification (single output)
  • Softmax: Multi-class classification (multiple outputs that sum to 1)
  • Linear (no activation): Regression tasks
  • Tanh: When output should be in range (-1, 1)
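The zero-centering contrast between tanh and sigmoid is easy to verify numerically (plain-Rust sketch):

// Over inputs symmetric around zero, tanh outputs average to zero
// while sigmoid outputs average to 0.5 (they are never negative).
let xs = [-2.0_f64, -1.0, 1.0, 2.0];
let tanh_mean = xs.iter().map(|x| x.tanh()).sum::<f64>() / xs.len() as f64;
let sigmoid_mean = xs.iter().map(|x| 1.0 / (1.0 + (-x).exp())).sum::<f64>() / xs.len() as f64;
assert!(tanh_mean.abs() < 1e-9);             // tanh(-x) = -tanh(x)
assert!((sigmoid_mean - 0.5).abs() < 1e-9);  // sigmoid(-x) = 1 - sigmoid(x)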

§Performance Considerations

  • ReLU and Leaky ReLU: Fastest to compute
  • Sigmoid and Tanh: Require expensive exponential operations
  • Softmax: Most expensive, but typically applied only to the output layer
  • GELU and Swish: More expensive than ReLU but can provide better results
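These costs are easy to probe with a rough micro-benchmark (a sketch using std::time::Instant; build with --release, and treat the absolute numbers as indicative only):

use std::time::Instant;

// A large input buffer spanning negative and positive values.
let xs: Vec<f64> = (0..1_000_000).map(|i| i as f64 / 1.0e5 - 5.0).collect();

let t = Instant::now();
let relu_sum: f64 = xs.iter().map(|x| x.max(0.0)).sum();
println!("relu:    {:?} (checksum {relu_sum:.1})", t.elapsed());

let t = Instant::now();
let sigmoid_sum: f64 = xs.iter().map(|x| 1.0 / (1.0 + (-x).exp())).sum();
println!("sigmoid: {:?} (checksum {sigmoid_sum:.1})", t.elapsed());
// The checksums are printed so the optimizer cannot discard the work.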

Structs§

  • ELU: Exponential Linear Unit (ELU) activation function.
  • GELU: Gaussian Error Linear Unit (GELU) activation function.
  • LeakyReLU: Leaky Rectified Linear Unit (LeakyReLU) activation function.
  • Mish: Mish activation function.
  • ReLU: Rectified Linear Unit (ReLU) activation function.
  • Sigmoid: Sigmoid activation function.
  • Softmax: Softmax activation function.
  • Swish: Swish activation function.
  • Tanh: Hyperbolic tangent (tanh) activation function.

Traits§

  • Activation: Trait for activation functions.