Module nn

Neural network modules for deep learning.

This module provides PyTorch-compatible neural network building blocks following the API design described in Paszke et al. (2019).

§Architecture

The nn module is organized around the Module trait, which defines the interface for all neural network layers, as the example below shows.

§Example

use aprender::nn::{Module, Linear, ReLU, Sequential};
use aprender::autograd::Tensor;

// Build a simple MLP
let model = Sequential::new()
    .add(Linear::new(784, 256))
    .add(ReLU::new())
    .add(Linear::new(256, 10));

// Forward pass
let x = Tensor::randn(&[32, 784]);  // batch of 32
let output = model.forward(&x);     // [32, 10]

// Get all parameters for optimizer
let params = model.parameters();

§References

  • Paszke, A., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. NeurIPS.
  • Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. AISTATS.
  • He, K., et al. (2015). Delving deep into rectifiers. ICCV.

Re-exports§

pub use functional as F;
pub use gnn::AdjacencyMatrix;
pub use gnn::GATConv;
pub use gnn::GCNConv;
pub use gnn::MessagePassing;
pub use gnn::SAGEAggregation;
pub use gnn::SAGEConv;
pub use loss::BCEWithLogitsLoss;
pub use loss::CrossEntropyLoss;
pub use loss::L1Loss;
pub use loss::MSELoss;
pub use loss::NLLLoss;
pub use loss::Reduction;
pub use loss::SmoothL1Loss;
pub use optim::Adam;
pub use optim::AdamW;
pub use optim::Optimizer;
pub use optim::RMSprop;
pub use optim::SGD;
pub use scheduler::CosineAnnealingLR;
pub use scheduler::ExponentialLR;
pub use scheduler::LRScheduler;
pub use scheduler::LinearWarmup;
pub use scheduler::PlateauMode;
pub use scheduler::ReduceLROnPlateau;
pub use scheduler::StepLR;
pub use scheduler::WarmupCosineScheduler;

Modules§

functional
Functional interface for neural network operations.
generation
Sequence generation and decoding algorithms.
gnn
Graph Neural Network layers for learning on graph-structured data.
loss
Differentiable loss functions for neural network training.
optim
Gradient-based optimizers for neural network training; a combined training-step sketch follows this list.
quantization
Quantization-Aware Training (QAT) module.
scheduler
Learning rate schedulers for training neural networks.
self_supervised
Self-supervised learning pretext tasks.
serialize
Neural network model serialization.
vae
Variational Autoencoder (VAE) module.
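
The loss, optim, and scheduler modules compose into a standard training step. The sketch below is illustrative, not verbatim crate API: only the Sequential, Linear, ReLU, Tensor::randn, forward, and parameters calls follow the documented example above, while the MSELoss, SGD, and StepLR constructors and the zero_grad/backward/step methods are assumptions modeled on the PyTorch design this crate follows.

use aprender::nn::{Linear, MSELoss, Module, ReLU, SGD, Sequential, StepLR};
use aprender::autograd::Tensor;

let model = Sequential::new()
    .add(Linear::new(784, 256))
    .add(ReLU::new())
    .add(Linear::new(256, 10));

let criterion = MSELoss::new();                          // assumed constructor
let mut optimizer = SGD::new(model.parameters(), 0.01);  // assumed (params, lr)
let mut scheduler = StepLR::new(10, 0.1);                // assumed (step_size, gamma)

for _epoch in 0..100 {
    let x = Tensor::randn(&[32, 784]);
    let target = Tensor::randn(&[32, 10]);

    let output = model.forward(&x);
    let loss = criterion.forward(&output, &target);  // assumed loss call

    optimizer.zero_grad();           // assumed: clear accumulated gradients
    loss.backward();                 // assumed: autograd backward pass
    optimizer.step();                // assumed: apply the parameter update
    scheduler.step(&mut optimizer);  // assumed: adjust the learning rate
}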

Structs§

ALiBi
ALiBi (Attention with Linear Biases) (Press et al., 2022).
AlphaDropout
Alpha Dropout for SELU activations.
AvgPool2d
Average Pooling 2D.
BatchNorm1d
Batch Normalization for 1D inputs (Ioffe & Szegedy, 2015).
Bidirectional
Bidirectional RNN wrapper.
Conv1d
1D Convolution layer.
Conv2d
2D Convolution layer.
ConvDimensionNumbers
Fully describes input, kernel, and output data format for a convolution.
DropBlock
DropBlock regularization (Ghiasi et al., 2018).
DropConnect
DropConnect regularization (Wan et al., 2013).
Dropout
Dropout regularization layer.
Dropout2d
2D Dropout (Spatial Dropout).
Flatten
Flatten layer.
GELU
Gaussian Error Linear Unit (GELU) activation.
GRU
Gated Recurrent Unit (GRU) layer.
GlobalAvgPool2d
Global Average Pooling 2D.
GroupNorm
Group Normalization (Wu & He, 2018).
GroupedQueryAttention
Grouped Query Attention (GQA).
InstanceNorm
Instance Normalization.
LSTM
Long Short-Term Memory (LSTM) layer.
LayerNorm
Layer Normalization (Ba et al., 2016).
LeakyReLU
Leaky ReLU activation: LeakyReLU(x) = max(negative_slope * x, x)
Linear
Fully connected layer: y = xW^T + b
LinearAttention
Linear Attention with kernel feature maps.
MaxPool1d
Max Pooling 1D.
MaxPool2d
Max Pooling 2D.
ModuleDict
Dictionary of named modules with string-based access.
ModuleList
List of modules with index-based access.
MultiHeadAttention
Multi-Head Attention (Vaswani et al., 2017).
PositionalEncoding
Sinusoidal Positional Encoding (Vaswani et al., 2017).
RMSNorm
Root Mean Square Layer Normalization (Zhang & Sennrich, 2019).
ReLU
Rectified Linear Unit activation: ReLU(x) = max(0, x)
RotaryPositionEmbedding
Rotary Position Embedding (RoPE) (Su et al., 2021).
Sequential
Sequential container for chaining modules; used in the sketch after this list.
Sigmoid
Sigmoid activation: σ(x) = 1 / (1 + exp(-x))
Softmax
Softmax activation: softmax(x)_i = exp(x_i) / Σ_j exp(x_j)
Tanh
Tanh activation: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
TransformerDecoderLayer
Transformer Decoder Layer.
TransformerEncoderLayer
Transformer Encoder Layer.
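
Activation, normalization, and regularization layers chain through the same Sequential container shown above. In the minimal sketch below, only the Sequential/Linear/ReLU construction follows the documented example verbatim; the LayerNorm::new(dim) and Dropout::new(p) arguments are assumptions modeled on the PyTorch equivalents.

use aprender::nn::{Dropout, LayerNorm, Linear, Module, ReLU, Sequential};
use aprender::autograd::Tensor;

let model = Sequential::new()
    .add(Linear::new(784, 256))
    .add(LayerNorm::new(256))   // assumed arg: normalized feature dimension
    .add(ReLU::new())
    .add(Dropout::new(0.5))     // assumed arg: drop probability
    .add(Linear::new(256, 10));

let x = Tensor::randn(&[32, 784]);
let logits = model.forward(&x);  // [32, 10]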

Enums§

ConvLayout
Data layout for convolution inputs and outputs.
KernelLayout
Kernel (weight) layout for convolution filters.

Traits§

Module
Base trait for all neural network modules; a sketch of a custom implementation follows.
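
Custom layers plug into the containers above by implementing Module. The trait's definition is not reproduced on this page, so the sketch below assumes it requires the forward and parameters methods used in the example at the top; the exact signatures, the parameters return type, and the Tensor addition operator are all assumptions.

use aprender::nn::{Linear, Module};
use aprender::autograd::Tensor;

// A residual wrapper: forward(x) = inner(x) + x.
struct Residual {
    inner: Linear,
}

impl Module for Residual {
    fn forward(&self, x: &Tensor) -> Tensor {
        // Assumed: Tensor implements element-wise addition.
        self.inner.forward(x) + x
    }

    fn parameters(&self) -> Vec<Tensor> {
        // Assumed return type; delegate to the wrapped layer.
        self.inner.parameters()
    }
}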

Functions§

generate_causal_mask
Generate causal (triangular) attention mask.
kaiming_normal
Kaiming normal initialization (He et al., 2015); see the initializer sketch after this list.
kaiming_uniform
Kaiming uniform initialization (He et al., 2015).
xavier_normal
Xavier normal initialization (Glorot & Bengio, 2010).
xavier_uniform
Xavier uniform initialization (Glorot & Bengio, 2010).
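
The initializers support building weight tensors outside the built-in layers. A minimal sketch under assumed signatures: each function is taken here to accept a shape and return a freshly initialized Tensor, though the crate may instead initialize an existing tensor in place.

use aprender::nn::{kaiming_normal, xavier_uniform};

// Xavier/Glorot scales variance by fan_in + fan_out and suits
// tanh/sigmoid layers (Glorot & Bengio, 2010).
let w1 = xavier_uniform(&[256, 784]);  // assumed: shape -> Tensor

// Kaiming/He scales variance by fan_in and suits ReLU layers
// (He et al., 2015).
let w2 = kaiming_normal(&[10, 256]);   // assumed: shape -> Tensor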