§ruqu-qarlp: Quantum-Assisted Reinforcement Learning Policy
A Rust implementation of quantum-assisted reinforcement learning using variational quantum circuits as policy networks. This crate provides a complete framework for training quantum RL agents.
§Overview
This crate implements the QARLP (Quantum-Assisted Reinforcement Learning Policy) algorithm, which uses variational quantum circuits (VQCs) to represent policies in reinforcement learning. The key components are:
- Quantum Policy Network: A variational quantum circuit that maps states to action probabilities through parameterized rotation gates.
- Policy Gradient: REINFORCE algorithm with baseline subtraction, using the parameter-shift rule for exact gradient computation on quantum circuits.
- Environment Interface: Generic trait for RL environments, with included implementations of GridWorld and CartPole for testing.
- Training Loop: Complete training infrastructure with checkpointing, logging, and metrics.
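The parameter-shift rule mentioned above can be illustrated without any quantum library. The sketch below is not this crate's API: `expectation_z` and `parameter_shift_grad` are hypothetical names, and the circuit is replaced by its known closed form (the expectation of Z after RY(θ) on |0⟩ is cos θ). It shows that evaluating the same function at θ ± π/2 yields the exact derivative rather than a finite-difference approximation.

```rust
use std::f64::consts::FRAC_PI_2;

/// Expectation value <Z> after applying RY(theta) to |0>, which equals cos(theta).
/// Hypothetical stand-in for executing a real circuit.
fn expectation_z(theta: f64) -> f64 {
    theta.cos()
}

/// Parameter-shift gradient: evaluate the same circuit at theta + pi/2 and
/// theta - pi/2 and take half the difference. This is exact for gates
/// generated by a Pauli operator, such as RX, RY and RZ.
fn parameter_shift_grad(f: impl Fn(f64) -> f64, theta: f64) -> f64 {
    0.5 * (f(theta + FRAC_PI_2) - f(theta - FRAC_PI_2))
}
```

Calling `parameter_shift_grad(expectation_z, theta)` returns `-sin(theta)` up to floating-point error, which matches the analytic derivative of `cos(theta)`. For a circuit with many parameters, the same two-evaluation shift would be applied to each parameter in turn.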
§Architecture
The quantum policy network consists of:
- State Encoding: Classical state vectors are encoded as rotation angles on qubits using RX gates.
- Variational Layers: Parameterized RY and RZ rotations with CNOT entanglement gates form the trainable part of the circuit.
- Measurement: Computational basis measurement probabilities are mapped to action probabilities via softmax.
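The encoding and measurement stages above can be sketched in plain Rust. This is a simplified illustration under stated assumptions, not the crate's implementation: the variational layers and CNOT entanglement are omitted, so the register stays a product state, and the mapping of the first `num_actions` basis states to actions is an assumption. All function names are hypothetical.

```rust
/// Probability of measuring |1> on a qubit prepared as RX(angle)|0>.
fn rx_prob_one(angle: f64) -> f64 {
    (angle / 2.0).sin().powi(2)
}

/// Computational-basis probabilities for a product state in which qubit `q`
/// is prepared as RX(state[q])|0>. Qubit 0 is the least significant bit.
fn basis_probabilities(state: &[f64]) -> Vec<f64> {
    let n = state.len();
    (0..(1usize << n))
        .map(|basis| {
            state
                .iter()
                .enumerate()
                .map(|(q, &angle)| {
                    let p1 = rx_prob_one(angle);
                    if (basis >> q) & 1 == 1 { p1 } else { 1.0 - p1 }
                })
                .product::<f64>()
        })
        .collect()
}

/// Numerically stable softmax over a slice of values.
fn softmax(values: &[f64]) -> Vec<f64> {
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = values.iter().map(|v| (v - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Action probabilities: softmax over the first `num_actions` basis
/// probabilities (assumed mapping; requires num_actions <= 2^n).
fn action_probabilities(state: &[f64], num_actions: usize) -> Vec<f64> {
    let probs = basis_probabilities(state);
    softmax(&probs[..num_actions])
}
```

In a full policy network, the trainable RY/RZ rotations and CNOT gates would act between the encoding and the measurement, so the basis probabilities would come from a statevector simulation instead of this per-qubit product.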
§Example
use ruqu_qarlp::prelude::*;
// Create a quantum policy
let policy_config = PolicyConfig {
    num_qubits: 4,
    num_layers: 2,
    num_actions: 4,
    ..Default::default()
};
let policy = QuantumPolicy::new(policy_config).unwrap();
// Create an environment
let env_config = GridWorldConfig::default();
let env = GridWorld::new(env_config).unwrap();
// Create trainer and train
let trainer_config = TrainerConfig {
    episodes_per_update: 10,
    max_steps_per_episode: 100,
    ..Default::default()
};
let mut trainer = Trainer::new(trainer_config, policy, env).unwrap();
// Train for 100 iterations
let result = trainer.train(100).unwrap();
println!("Final reward: {}", result.final_average_reward);
§Tier 3 Capability (Exploratory)
This crate represents a Tier 3 (Score 69) exploratory quantum RL implementation. The two-week test criteria are:
- Policy gradient update works correctly
- Simple environment shows learning signal
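As a rough illustration of what the first criterion exercises, the following sketch computes a REINFORCE gradient estimate with a mean-return baseline. It is a classical stand-in under stated assumptions, not the crate's `gradient` module: the function names are hypothetical, and the per-step gradients of the log-probability are taken as inputs (in the quantum setting they would be obtained with the parameter-shift rule).

```rust
/// Discounted returns G_t = r_t + gamma * G_{t+1}, computed backwards in time.
fn discounted_returns(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut returns = vec![0.0; rewards.len()];
    let mut running = 0.0;
    for (t, &r) in rewards.iter().enumerate().rev() {
        running = r + gamma * running;
        returns[t] = running;
    }
    returns
}

/// REINFORCE gradient estimate with a mean-return baseline b:
/// the sum over t of (G_t - b) * grad_log_prob_t.
/// `grad_log_probs[t]` holds the gradient of log pi(a_t | s_t) with respect
/// to the policy parameters (hypothetical input layout).
fn reinforce_gradient(grad_log_probs: &[Vec<f64>], returns: &[f64]) -> Vec<f64> {
    if grad_log_probs.is_empty() || returns.is_empty() {
        return Vec::new();
    }
    let baseline = returns.iter().sum::<f64>() / returns.len() as f64;
    let mut grad = vec![0.0; grad_log_probs[0].len()];
    for (g_t, glp) in returns.iter().zip(grad_log_probs) {
        let advantage = g_t - baseline;
        for (acc, d) in grad.iter_mut().zip(glp) {
            *acc += advantage * d;
        }
    }
    grad
}
```

A gradient-ascent step would then add `learning_rate * grad[i]` to each parameter. Subtracting the baseline leaves the expected gradient unchanged while reducing its variance, which matters when each gradient component already costs two circuit evaluations.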
§Features
- parallel: Enables parallel gradient computation using rayon (not yet implemented).
§References
- Schuld, M., & Petruccione, F. (2018). Supervised Learning with Quantum Computers
- Mitarai, K., et al. (2018). Quantum circuit learning
- Jerbi, S., et al. (2021). Parametrized quantum policies for reinforcement learning
Modules§
- buffer: Replay buffer for experience storage and sampling.
- environment: Environment interface for reinforcement learning.
- error: Error types for the QARLP (Quantum-Assisted Reinforcement Learning Policy) crate.
- gradient: Gradient computation for quantum policy optimization.
- policy: Quantum Policy Network implementation.
- prelude: Prelude module for convenient imports.
- training: Training loop for quantum RL policy.