Activation functions for neural networks
This module provides common activation functions used in neural networks. Activation functions introduce non-linearity, enabling networks to learn complex patterns and relationships.
§Overview
Activation functions are mathematical functions that determine how strongly a neuron is activated based on its input. They introduce non-linearity into the network, allowing it to learn complex mappings between inputs and outputs.
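To see why the non-linearity matters: stacking purely linear (affine) layers without an activation in between collapses to a single affine map,
W_2 (W_1 x + b_1) + b_2 = (W_2 W_1) x + (W_2 b_1 + b_2),
so a network without activation functions can only represent linear functions of its input, no matter how many layers it has.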
§Available Activation Functions
- ReLU (Rectified Linear Unit): Most commonly used, simple and effective
- Sigmoid: Maps input to (0,1), useful for binary classification output layers
- Tanh: Maps input to (-1,1), often better than sigmoid for hidden layers
- Softmax: Converts logits to probability distribution, used in multi-class classification
- GELU (Gaussian Error Linear Unit): Smooth alternative to ReLU, used in transformers
- Swish/SiLU: Self-gated activation, often outperforms ReLU
- Mish: Smooth, non-monotonic activation function
- Leaky ReLU: Variant of ReLU that allows small negative values
- ELU (Exponential Linear Unit): Smooth variant of ReLU
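For reference, the elementwise definitions of several of these functions can be sketched as standalone Rust functions (an illustrative sketch only, not the crate's implementation):
fn relu(x: f64) -> f64 { x.max(0.0) }
fn leaky_relu(x: f64, alpha: f64) -> f64 { if x >= 0.0 { x } else { alpha * x } }
fn sigmoid(x: f64) -> f64 { 1.0 / (1.0 + (-x).exp()) }
fn swish(x: f64) -> f64 { x * sigmoid(x) } // also called SiLU
fn elu(x: f64, alpha: f64) -> f64 { if x >= 0.0 { x } else { alpha * (x.exp() - 1.0) } }
fn mish(x: f64) -> f64 { x * (1.0 + x.exp()).ln().tanh() } // x * tanh(softplus(x))
fn gelu(x: f64) -> f64 {
    // common tanh-based approximation of GELU
    0.5 * x * (1.0 + ((2.0 / std::f64::consts::PI).sqrt() * (x + 0.044715 * x.powi(3))).tanh())
}
Softmax differs in that it normalizes a whole vector rather than acting elementwise: softmax(x)_i = exp(x_i) / sum_j exp(x_j).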
§Examples
§Basic Usage
use scirs2_neural::activations::{Activation, ReLU, Sigmoid, Softmax};
use ndarray::Array;
// Create activation functions
let relu = ReLU::new();
let sigmoid = Sigmoid::new();
let softmax = Softmax::new(1); // Apply softmax along axis 1 (the class axis)
// Create input data
let input = Array::from_vec(vec![-2.0, -1.0, 0.0, 1.0, 2.0])
.into_dyn();
// Apply ReLU activation
let relu_output = relu.forward(&input)?;
// Output: [0.0, 0.0, 0.0, 1.0, 2.0]
// Apply Sigmoid activation
let sigmoid_output = sigmoid.forward(&input)?;
// Output: [0.119, 0.269, 0.5, 0.731, 0.881] (approximately)
// For softmax, typically used with 2D input (batch_size, num_classes)
let logits = Array::from_shape_vec((1, 3), vec![1.0, 2.0, 3.0])?.into_dyn();
let probabilities = softmax.forward(&logits)?;
// Output: [[0.090, 0.245, 0.665]] (approximately, sums to 1.0)
§Using in Forward and Backward Pass
use scirs2_neural::activations::{Activation, ReLU};
use ndarray::Array;
let relu = ReLU::new();
let input = Array::from_vec(vec![-1.0, 0.5, 2.0]).into_dyn();
// Forward pass
let output = relu.forward(&input)?;
println!("ReLU output: {:?}", output);
// Output: [0.0, 0.5, 2.0]
// Backward pass (computing gradients)
let grad_output = Array::from_vec(vec![1.0, 1.0, 1.0]).into_dyn();
let grad_input = relu.backward(&grad_output, &output)?;
println!("ReLU gradient: {:?}", grad_input);
// Output: [0.0, 1.0, 1.0] (gradient is 0 for negative inputs, 1 for positive)
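In this example the second argument to backward is the cached forward output; for ReLU that is enough to recover where the gradient is zero, because the output is positive exactly where the input was.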
§Choosing the Right Activation Function
§For Hidden Layers:
- ReLU: Default choice, computationally efficient, helps avoid vanishing gradients
- GELU: Good for transformer architectures
- Swish: Often outperforms ReLU, especially in deep networks
- Tanh: When you need outputs centered around zero
§For Output Layers:
- Sigmoid: Binary classification (single output)
- Softmax: Multi-class classification (multiple outputs that sum to 1)
- Linear (no activation): Regression tasks
- Tanh: When output should be in range (-1, 1)
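As a minimal sketch of the output-layer choices above, using only the constructors shown earlier (loss computation omitted):
use scirs2_neural::activations::{Activation, Sigmoid, Softmax};
use ndarray::Array;
// Binary classification: a single logit squashed to a probability in (0, 1)
let sigmoid = Sigmoid::new();
let binary_logit = Array::from_vec(vec![0.8]).into_dyn();
let p_positive = sigmoid.forward(&binary_logit)?;
// Multi-class classification: one logit per class, normalized to a probability distribution
let softmax = Softmax::new(1); // softmax along the class axis
let class_logits = Array::from_shape_vec((1, 3), vec![0.2, 1.5, -0.3])?.into_dyn();
let class_probs = softmax.forward(&class_logits)?;
// For regression, the raw (linear) output is used directly, with no activation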
§Performance Considerations
- ReLU and Leaky ReLU: Fastest to compute
- Sigmoid and Tanh: Require expensive exponential operations
- Softmax: Most expensive, but typically only used in the output layer
- GELU and Swish: More expensive than ReLU but can provide better results
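Relative costs depend on hardware and tensor size; if it matters for your workload, a rough way to check is to time the forward passes on a representative input (a minimal sketch using only the constructors shown above):
use scirs2_neural::activations::{Activation, ReLU, Sigmoid};
use ndarray::Array;
use std::time::Instant;
let input = Array::from_vec(vec![0.5f64; 1_000_000]).into_dyn();
let relu = ReLU::new();
let start = Instant::now();
let _ = relu.forward(&input)?;
println!("ReLU forward:    {:?}", start.elapsed());
let sigmoid = Sigmoid::new();
let start = Instant::now();
let _ = sigmoid.forward(&input)?;
println!("Sigmoid forward: {:?}", start.elapsed());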
Structs§
- ELU
- Exponential Linear Unit (ELU) activation function.
- GELU
- Gaussian Error Linear Unit (GELU) activation function.
- LeakyReLU
- Leaky Rectified Linear Unit (LeakyReLU) activation function.
- Mish
- Mish activation function.
- ReLU
- Rectified Linear Unit (ReLU) activation function.
- Sigmoid
- Sigmoid activation function.
- Softmax
- Softmax activation function.
- Swish
- Swish activation function.
- Tanh
- Hyperbolic tangent (tanh) activation function.
Traits§
- Activation
- Trait for activation functions.