# PC-RL-Core
[CI](https://github.com/BolivarTech/PC-RL-Core/actions/workflows/ci.yml) · [crates.io](https://crates.io/crates/pc-rl-core) · [docs.rs](https://docs.rs/pc-rl-core) · [Rust](https://www.rust-lang.org) · [License](LICENSE-MIT)
A **Deliberative Predictive Coding (DPC)** reinforcement learning framework implemented entirely in Rust with zero ML framework dependencies.
The actor **deliberates before acting** by running an iterative free energy minimization loop (predictive coding inference), and a **residual echo of that deliberation** feeds back into weight updates as a structured micro-regularizer. These two mechanisms form a coupled system: deliberation generates the signal, the signal improves learning, and better learning improves future deliberation.
The library is **backend-agnostic**: all linear algebra operations are abstracted behind a `LinAlg` trait, enabling future GPU backends (wgpu, CUDA) without changing the RL logic.
## Installation
```toml
[dependencies]
pc-rl-core = "2.0"
```
## Quick Start
```rust
use pc_rl_core::{
    CpuLinAlg, PcActorCritic, PcActorCriticConfig, PcActorConfig, MlpCriticConfig,
    Activation, LayerDef, SelectionMode,
};

// Configure the actor: 9 inputs/outputs, one hidden layer of 27 units
let actor_config = PcActorConfig {
    input_size: 9,
    output_size: 9,
    hidden_layers: vec![LayerDef { size: 27, activation: Activation::Softsign }],
    output_activation: Activation::Linear,
    alpha: 0.03,
    tol: 0.01,
    min_steps: 1,
    max_steps: 5,
    lr_weights: 0.005,
    synchronous: true,
    temperature: 1.0,
    local_lambda: 0.99,
    residual: false,
    rezero_init: 0.001,
};

let critic_config = MlpCriticConfig {
    input_size: 36, // state_dim + latent_dim
    hidden_layers: vec![LayerDef { size: 36, activation: Activation::Softsign }],
    output_activation: Activation::Linear,
    lr: 0.005,
};

let config = PcActorCriticConfig {
    actor: actor_config,
    critic: critic_config,
    gamma: 0.99,
    surprise_low: 0.02,
    surprise_high: 0.15,
    adaptive_surprise: true,
    surprise_buffer_size: 400,
    entropy_coeff: 0.0,
};

let backend = CpuLinAlg::new();
let mut agent = PcActorCritic::new(backend, config, 42)?;

// Training loop: act, collect trajectory steps, learn
let (action, infer_result) = agent.act(&state, &valid_actions, SelectionMode::Training);
// ... execute action in environment, collect a TrajectoryStep per timestep ...
let avg_loss = agent.learn(&trajectory);

// Evaluation (deterministic)
let (action, _) = agent.act(&state, &valid_actions, SelectionMode::Play);
```
## Architecture
### Core Components
- **`PcActor<L: LinAlg>`** -- Policy network with predictive coding inference loop, residual skip connections, surprise scoring, and CCA crossover
- **`MlpCritic<L: LinAlg>`** -- Standard MLP value function with MSE loss backpropagation and CCA crossover
- **`PcActorCritic<L: LinAlg>`** -- Integrated agent combining actor and critic with surprise-based learning rate scheduling
- **`Layer<L: LinAlg>`** -- Dense layer with forward, transpose (PC top-down), and backward passes
- **`LinAlg` trait** -- Backend-agnostic linear algebra interface (31 instance methods). Default implementation: `CpuLinAlg`
- **`GolubKahanSvd`** -- O(n^3) SVD via bidiagonalization, used for CCA neuron alignment
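For illustration only, a backend plugs in by implementing the linear-algebra trait. The sketch below uses invented names (`LinAlgSketch`, `matvec`, `axpy`) and a tiny subset of operations -- the real `LinAlg` trait has 31 methods with its own signatures:

```rust
/// Illustrative subset of a backend-agnostic linear-algebra trait.
/// These names and signatures are assumptions for the sketch, not the crate's API.
trait LinAlgSketch {
    /// Matrix-vector product.
    fn matvec(&self, m: &[Vec<f64>], v: &[f64]) -> Vec<f64>;
    /// y <- a * x + y (scaled accumulate).
    fn axpy(&self, a: f64, x: &[f64], y: &mut [f64]);
}

/// A CPU backend implements the trait with plain loops; a GPU backend
/// would dispatch the same calls to device kernels.
struct CpuSketch;

impl LinAlgSketch for CpuSketch {
    fn matvec(&self, m: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
        m.iter()
            .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum())
            .collect()
    }

    fn axpy(&self, a: f64, x: &[f64], y: &mut [f64]) {
        for (yi, xi) in y.iter_mut().zip(x) {
            *yi += a * xi;
        }
    }
}

fn main() {
    let backend = CpuSketch;
    let y = backend.matvec(&[vec![1.0, 2.0], vec![3.0, 4.0]], &[1.0, 1.0]);
    assert_eq!(y, vec![3.0, 7.0]);
}
```

Because the RL code is generic over the trait, swapping backends is a type-parameter change, not a rewrite.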
### Key Mechanisms
**Predictive Coding Inference**: Instead of a single feedforward pass, the actor runs an iterative inference loop where higher layers generate top-down predictions of lower layer states. The prediction error (surprise) between layers drives hidden state updates until convergence.
**Residual Echo (local_lambda)**: A small fraction of prediction errors from deliberation is blended into backpropagation gradients: `delta = lambda * backprop_grad + (1-lambda) * pc_error`. This couples inference and learning into a synergistic system.
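A minimal sketch of the blend (function name is illustrative, not the crate's internals):

```rust
/// Blend backprop gradients with predictive-coding errors, element-wise.
/// `lambda` close to 1.0 means the PC error contributes only a small echo.
/// Illustrative sketch -- not the crate's internal API.
fn blend_delta(backprop_grad: &[f64], pc_error: &[f64], lambda: f64) -> Vec<f64> {
    backprop_grad
        .iter()
        .zip(pc_error)
        .map(|(g, e)| lambda * g + (1.0 - lambda) * e)
        .collect()
}

fn main() {
    // With lambda = 0.99, the PC error adds only a 1% correction.
    let delta = blend_delta(&[1.0, -2.0], &[0.5, 0.5], 0.99);
    assert!((delta[0] - 0.995).abs() < 1e-12);
    assert!((delta[1] + 1.975).abs() < 1e-12);
}
```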
**Adaptive Surprise Scheduling**: A circular buffer of recent surprise scores dynamically calibrates learning rate thresholds. Low surprise reduces LR (familiar states), high surprise boosts LR (novel states). Buffer-mediated damping protects learned representations during environment transitions.
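A simplified sketch of buffer-calibrated scheduling. The percentile thresholds and scale factors below are invented for illustration; the crate calibrates its `surprise_low`/`surprise_high` thresholds adaptively from the buffer:

```rust
use std::collections::VecDeque;

/// Illustrative surprise-based LR scheduler (not the crate's API).
/// A ring buffer of recent surprise scores sets quartile thresholds;
/// the current surprise then scales the base learning rate.
struct SurpriseScheduler {
    buffer: VecDeque<f64>,
    capacity: usize,
}

impl SurpriseScheduler {
    fn new(capacity: usize) -> Self {
        Self { buffer: VecDeque::with_capacity(capacity), capacity }
    }

    fn lr_scale(&mut self, surprise: f64) -> f64 {
        if self.buffer.len() == self.capacity {
            self.buffer.pop_front(); // drop the oldest score
        }
        self.buffer.push_back(surprise);

        let mut sorted: Vec<f64> = self.buffer.iter().copied().collect();
        sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
        let pct = |p: f64| sorted[((sorted.len() - 1) as f64 * p) as usize];
        let (low, high) = (pct(0.25), pct(0.75));

        if surprise < low {
            0.5 // familiar state: damp learning
        } else if surprise > high {
            2.0 // novel state: boost learning
        } else {
            1.0
        }
    }
}

fn main() {
    let mut sched = SurpriseScheduler::new(400);
    for s in [0.05, 0.06, 0.05, 0.07, 0.06] {
        sched.lr_scale(s);
    }
    // A spike in surprise relative to the buffer boosts the LR.
    assert_eq!(sched.lr_scale(0.5), 2.0);
}
```

Because the thresholds come from the buffer rather than fixed constants, a sudden environment change raises the LR only until the buffer refills with the new regime -- the damping behavior described above.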
**CCA Crossover**: GA-ready crossover operator using Canonical Correlation Analysis to align neurons functionally before blending weights, solving the permutation problem. Supports dimension mismatches, layer count differences, and residual components.
### Type Aliases
```rust
type PcActorCpu = PcActor<CpuLinAlg>;
type MlpCriticCpu = MlpCritic<CpuLinAlg>;
type PcActorCriticCpu = PcActorCritic<CpuLinAlg>;
type LayerCpu = Layer<CpuLinAlg>;
```
## Project Structure
```
PC-RL-Core/
├── src/
│ ├── linalg/
│ │ ├── mod.rs # LinAlg trait (31 methods, backend-agnostic)
│ │ ├── cpu.rs # CpuLinAlg (Vec<f64> + Matrix)
│ │ └── golub_kahan.rs # Golub-Kahan SVD (O(n^3))
│ ├── activation.rs # Tanh, ReLU, Sigmoid, ELU, Softsign, Linear
│ ├── error.rs # PcError crate-wide error type
│ ├── matrix.rs # Dense matrix, softmax, CCA alignment, Hungarian assignment
│ ├── layer.rs # Layer<L: LinAlg> with PC top-down support
│ ├── pc_actor.rs # PcActor<L> with inference loop, residual, crossover
│ ├── mlp_critic.rs # MlpCritic<L> value function, crossover
│ ├── pc_actor_critic.rs # PcActorCritic<L> agent, ActivationCache, crossover
│ └── serializer.rs # JSON persistence (CPU concrete bridge)
├── docs/
│ ├── experiment_analysis.md # 20 experimental phases, ~3,800 runs
│ └── pc_actor_critic_paper.md # DPC architecture paper
└── Cargo.toml
```
## Research Findings
Validated through 20 experimental phases (~3,800 training runs) on Tic-Tac-Toe ([PC-TicTacToe](https://github.com/BolivarTech/PC-TicTacToe)):
- **Deliberation is the primary advantage** -- PC inference loop adds +2-3 depth levels over equivalent MLP
- **Residual echo breaks performance ceilings** -- 1% PC error blend (lambda=0.99) is statistically significant (p<0.034)
- **Depth-Lambda Scaling Law: `lambda = 1 - 10^(-(L+1))`** -- PC error must decrease exponentially with network depth
- **Lambda and training budget interact** -- ultra-low PC error needs more episodes to accumulate its regularization effect
- **Adaptive surprise eliminates catastrophic forgetting** -- buffer-mediated transition damping protects learned representations during curriculum transitions
- **Optimal buffer ratio: 0.3-0.4 x environment transition window** -- too small resonates, too large over-damps
- **Bounded activations required for PC** -- ReLU dies, ELU explodes; tanh and softsign work
- **Softsign + residual + projection cooperate** -- three mechanisms enable gradient flow in deep networks
- **Parameter efficiency** -- an actor with ~550 parameters matches networks 4-330x larger through iterative inference
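The scaling law can be evaluated directly. Assuming `L` counts hidden layers, which is consistent with the Quick Start config (one hidden layer, `local_lambda: 0.99`):

```rust
/// Depth-Lambda Scaling Law from the experiments:
/// lambda = 1 - 10^(-(L+1)), where L is taken here as the hidden-layer count.
fn depth_lambda(hidden_layers: u32) -> f64 {
    1.0 - 10f64.powi(-(hidden_layers as i32 + 1))
}

fn main() {
    // One hidden layer -> lambda = 0.99, i.e. a 1% PC-error echo.
    assert!((depth_lambda(1) - 0.99).abs() < 1e-12);
    // Deeper networks need an exponentially smaller PC-error fraction.
    assert!((depth_lambda(3) - 0.9999).abs() < 1e-12);
}
```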
For the complete experimental methodology and statistical analysis, see [docs/experiment_analysis.md](docs/experiment_analysis.md). For the full architecture description and lessons learned, see [docs/pc_actor_critic_paper.md](docs/pc_actor_critic_paper.md).
## Dependencies
- `serde` / `serde_json` -- Serialization
- `rand` -- Random number generation
- `chrono` -- Timestamps
No PyTorch, TensorFlow, or any ML framework. Pure Rust from scratch.
## Testing
392 unit tests + 20 doctests:
```bash
cargo nextest run
cargo test --doc
cargo clippy --tests -- -D warnings
```
## License
Licensed under either of
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>)
- MIT License ([LICENSE-MIT](LICENSE-MIT) or <http://opensource.org/licenses/MIT>)
at your option.