PC-RL-Core
A Deliberative Predictive Coding (DPC) reinforcement learning framework implemented entirely in Rust with zero ML framework dependencies.
The actor deliberates before acting by running an iterative free energy minimization loop (predictive coding inference), and a residual echo of that deliberation feeds back into weight updates as a structured micro-regularizer. These two mechanisms form a coupled system: deliberation generates the signal, the signal improves learning, and better learning improves future deliberation.
The library is backend-agnostic: all linear algebra operations are abstracted behind a LinAlg trait, enabling future GPU backends (wgpu, CUDA) without changing the RL logic.
Installation
```toml
[dependencies]
pc-rl-core = "1.2.3"
```
Quick Start
```rust
use pc_rl_core::{MlpCriticConfig, PcActorConfig, PcActorCritic, PcActorCriticConfig};

// Configure the agent (config fields elided; see the crate docs for the full set)
let actor_config = PcActorConfig { /* ... */ };
let critic_config = MlpCriticConfig { /* ... */ };
let config = PcActorCriticConfig { /* actor_config, critic_config, ... */ };
let mut agent = PcActorCritic::new(config)?;

// Training loop: act, collect trajectory steps, learn
let action = agent.act(/* observation */);
// ... execute action in environment, collect TrajectoryStep per timestep ...
let avg_loss = agent.learn(/* trajectory */);

// Evaluation (deterministic)
let action = agent.act(/* observation, deterministic */);
```
Architecture
Core Components
- `PcActor<L: LinAlg>` -- Policy network with predictive coding inference loop, residual skip connections, surprise scoring, and CCA crossover
- `MlpCritic<L: LinAlg>` -- Standard MLP value function with MSE loss backpropagation and CCA crossover
- `PcActorCritic<L: LinAlg>` -- Integrated agent combining actor and critic with surprise-based learning rate scheduling
- `Layer<L: LinAlg>` -- Dense layer with forward, transpose (PC top-down), and backward passes
- `LinAlg` trait -- Backend-agnostic linear algebra interface (32 methods). Default implementation: `CpuLinAlg`
- `GolubKahanSvd` -- O(n^3) SVD via bidiagonalization, used for CCA neuron alignment
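The backend-agnostic design can be sketched as follows. This is a two-method illustration of the trait-based abstraction, not the crate's actual `LinAlg` interface (which has 32 methods); the names `CpuBackend`, `matvec`, `dot`, and `forward` are hypothetical.

```rust
// Sketch: RL code is generic over a linear algebra trait, so a future
// GPU backend only has to implement the trait. Names are illustrative.
trait LinAlg {
    fn matvec(a: &[Vec<f64>], x: &[f64]) -> Vec<f64>;
    fn dot(x: &[f64], y: &[f64]) -> f64;
}

struct CpuBackend;

impl LinAlg for CpuBackend {
    fn matvec(a: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
        a.iter().map(|row| Self::dot(row, x)).collect()
    }
    fn dot(x: &[f64], y: &[f64]) -> f64 {
        x.iter().zip(y).map(|(a, b)| a * b).sum()
    }
}

// RL components stay generic over the backend, like PcActor<L: LinAlg>.
fn forward<L: LinAlg>(weights: &[Vec<f64>], input: &[f64]) -> Vec<f64> {
    L::matvec(weights, input)
}

fn main() {
    let w = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let out = forward::<CpuBackend>(&w, &[1.0, 1.0]);
    assert_eq!(out, vec![3.0, 7.0]);
}
```

Swapping in a GPU backend then only requires a new `impl LinAlg`; the RL logic above it is untouched.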
Key Mechanisms
Predictive Coding Inference: Instead of a single feedforward pass, the actor runs an iterative inference loop where higher layers generate top-down predictions of lower layer states. The prediction error (surprise) between layers drives hidden state updates until convergence.
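The inference loop can be sketched for a single hidden layer as follows. The function name, scalar weights, and fixed step size are simplifications for illustration, not the crate's API.

```rust
// Minimal sketch of predictive coding inference for one hidden layer:
// the hidden state is iteratively updated until the top-down prediction
// of the input matches it, i.e. until surprise is small.
fn pc_inference(input: f64, w_up: f64, w_down: f64, steps: usize, step_size: f64) -> (f64, f64) {
    // Hidden state initialized by a feedforward guess.
    let mut hidden = (w_up * input).tanh();
    let mut surprise = 0.0;
    for _ in 0..steps {
        // Top-down prediction of the lower layer from the hidden state.
        let prediction = w_down * hidden;
        // Prediction error ("surprise") between layers.
        let error = input - prediction;
        surprise = error * error;
        // Gradient step on the hidden state to reduce the error.
        hidden += step_size * w_down * error;
    }
    (hidden, surprise)
}

fn main() {
    let (h, s) = pc_inference(0.5, 1.0, 1.0, 50, 0.1);
    // After enough iterations the top-down prediction approaches the
    // input, so the remaining surprise is small.
    println!("hidden = {h:.4}, surprise = {s:.6}");
    assert!(s < 1e-3);
}
```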
Residual Echo (local_lambda): A small fraction of prediction errors from deliberation is blended into backpropagation gradients: delta = lambda * backprop_grad + (1-lambda) * pc_error. This couples inference and learning into a synergistic system.
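The blend itself is a one-line convex combination; a minimal sketch (function and argument names illustrative):

```rust
// Residual echo: a fraction (1 - lambda) of the prediction error left
// over from deliberation is mixed into the backprop gradient.
fn residual_echo(lambda: f64, backprop_grad: f64, pc_error: f64) -> f64 {
    lambda * backprop_grad + (1.0 - lambda) * pc_error
}

fn main() {
    // lambda = 0.99 corresponds to the 1% PC-error blend from the experiments.
    let delta = residual_echo(0.99, 0.2, -0.5);
    println!("blended delta = {delta}");
    assert!((delta - (0.99 * 0.2 + 0.01 * -0.5)).abs() < 1e-12);
}
```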
Adaptive Surprise Scheduling: A circular buffer of recent surprise scores dynamically calibrates learning rate thresholds. Low surprise reduces LR (familiar states), high surprise boosts LR (novel states). Buffer-mediated damping protects learned representations during environment transitions.
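The scheduling idea can be sketched as below. The threshold multipliers (0.5x/2x the buffer mean) and scaling factors are illustrative assumptions, not the crate's calibration.

```rust
use std::collections::VecDeque;

// Sketch of adaptive surprise scheduling: a fixed-size circular buffer
// of recent surprise scores calibrates the learning rate.
struct SurpriseScheduler {
    buffer: VecDeque<f64>,
    capacity: usize,
    base_lr: f64,
}

impl SurpriseScheduler {
    fn new(capacity: usize, base_lr: f64) -> Self {
        Self { buffer: VecDeque::with_capacity(capacity), capacity, base_lr }
    }

    fn learning_rate(&mut self, surprise: f64) -> f64 {
        if self.buffer.len() == self.capacity {
            self.buffer.pop_front(); // circular behavior: drop the oldest score
        }
        self.buffer.push_back(surprise);
        let mean: f64 = self.buffer.iter().sum::<f64>() / self.buffer.len() as f64;
        if surprise < 0.5 * mean {
            self.base_lr * 0.5 // familiar state: damp updates
        } else if surprise > 2.0 * mean {
            self.base_lr * 2.0 // novel state: learn faster
        } else {
            self.base_lr
        }
    }
}

fn main() {
    let mut sched = SurpriseScheduler::new(8, 0.01);
    for _ in 0..8 {
        sched.learning_rate(1.0); // fill the buffer with a familiar baseline
    }
    // A surprise spike (e.g. an environment transition) boosts the LR,
    // but the buffer mean damps how far and how fast it moves.
    let lr = sched.learning_rate(10.0);
    assert!(lr > 0.01);
}
```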
CCA Crossover: GA-ready crossover operator using Canonical Correlation Analysis to align neurons functionally before blending weights, solving the permutation problem. Supports dimension mismatches, layer count differences, and residual components.
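The permutation problem this solves can be illustrated with a much simpler alignment: two parents may encode the same function with neurons in a different order, so naive averaging destroys structure. The sketch below aligns neurons by greedy cosine-similarity matching before blending; the real operator uses CCA via Golub-Kahan SVD, and all names here are illustrative.

```rust
// Cosine similarity between two neurons' incoming weight vectors.
fn similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (na * nb)
}

// For each neuron (row) in parent A, greedily pick the best-matching
// unused neuron in parent B, then average the aligned rows.
fn aligned_crossover(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let mut used = vec![false; b.len()];
    a.iter()
        .map(|row_a| {
            let (j, _) = b
                .iter()
                .enumerate()
                .filter(|(j, _)| !used[*j])
                .map(|(j, row_b)| (j, similarity(row_a, row_b)))
                .max_by(|x, y| x.1.partial_cmp(&y.1).unwrap())
                .unwrap();
            used[j] = true;
            row_a.iter().zip(&b[j]).map(|(x, y)| 0.5 * (x + y)).collect()
        })
        .collect()
}

fn main() {
    // Parent B is parent A with its two neurons swapped.
    let a = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let b = vec![vec![0.0, 1.0], vec![1.0, 0.0]];
    let child = aligned_crossover(&a, &b);
    // Alignment undoes the permutation, so the child equals the parents;
    // naive row-wise averaging would instead produce all-0.5 rows.
    assert_eq!(child, a);
}
```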
Type Aliases
```rust
type PcActorCpu = PcActor<CpuLinAlg>;
type MlpCriticCpu = MlpCritic<CpuLinAlg>;
type PcActorCriticCpu = PcActorCritic<CpuLinAlg>;
type LayerCpu = Layer<CpuLinAlg>;
```
Project Structure
```
PC-RL-Core/
├── src/
│   ├── linalg/
│   │   ├── mod.rs          # LinAlg trait (32 methods, backend-agnostic)
│   │   ├── cpu.rs          # CpuLinAlg (Vec<f64> + Matrix)
│   │   └── golub_kahan.rs  # Golub-Kahan SVD (O(n^3))
│   ├── activation.rs       # Tanh, ReLU, Sigmoid, ELU, Softsign, Linear
│   ├── error.rs            # PcError crate-wide error type
│   ├── matrix.rs           # Dense matrix, softmax, CCA alignment, Hungarian assignment
│   ├── layer.rs            # Layer<L: LinAlg> with PC top-down support
│   ├── pc_actor.rs         # PcActor<L> with inference loop, residual, crossover
│   ├── mlp_critic.rs       # MlpCritic<L> value function, crossover
│   ├── pc_actor_critic.rs  # PcActorCritic<L> agent, ActivationCache, crossover
│   └── serializer.rs       # JSON persistence (CPU concrete bridge)
├── docs/
│   ├── experiment_analysis.md    # 20 experimental phases, ~3,800 runs
│   └── pc_actor_critic_paper.md  # DPC architecture paper
└── Cargo.toml
```
Research Findings
Validated through 20 experimental phases (~3,800 training runs) on Tic-Tac-Toe (PC-TicTacToe):
- Deliberation is the primary advantage -- PC inference loop adds +2-3 depth levels over equivalent MLP
- Residual echo breaks performance ceilings -- 1% PC error blend (lambda=0.99) is statistically significant (p<0.034)
- Depth-Lambda Scaling Law: `lambda = 1 - 10^(-(L+1))` -- the PC error share must decrease exponentially with network depth
- Lambda and training budget interact -- ultra-low PC error needs more episodes to accumulate its regularization effect
- Adaptive surprise eliminates catastrophic forgetting -- buffer-mediated transition damping protects learned representations during curriculum transitions
- Optimal buffer ratio: 0.3-0.4 x environment transition window -- too small resonates, too large over-damps
- Bounded activations required for PC -- ReLU dies, ELU explodes; tanh and softsign work
- Softsign + residual + projection cooperate -- three mechanisms enable gradient flow in deep networks
- Parameter efficiency -- ~550 actor parameters matching networks 4-330x larger through iterative inference
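The depth-lambda scaling law above is easy to evaluate directly (the helper name is illustrative):

```rust
// lambda = 1 - 10^(-(L+1)) for a network of depth L, so the PC-error
// share (1 - lambda) shrinks by 10x with every added layer.
fn lambda_for_depth(depth: u32) -> f64 {
    1.0 - 10f64.powi(-(depth as i32 + 1))
}

fn main() {
    assert!((lambda_for_depth(1) - 0.99).abs() < 1e-12);   // 1% PC error
    assert!((lambda_for_depth(2) - 0.999).abs() < 1e-12);  // 0.1%
    assert!((lambda_for_depth(3) - 0.9999).abs() < 1e-12); // 0.01%
}
```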
For the complete experimental methodology and statistical analysis, see docs/experiment_analysis.md. For the full architecture description and lessons learned, see docs/pc_actor_critic_paper.md.
Dependencies
- `serde` / `serde_json` -- Serialization
- `rand` -- Random number generation
- `chrono` -- Timestamps
No PyTorch, TensorFlow, or any ML framework. Pure Rust from scratch.
Testing
384 unit tests + 20 doctests, all run via the standard test harness:

```bash
cargo test
```
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.