Crate ruqu_qarlp


§ruqu-qarlp: Quantum-Assisted Reinforcement Learning Policy

A Rust implementation of quantum-assisted reinforcement learning that uses variational quantum circuits as policy networks. The crate provides the policy network, gradient computation, an environment interface, and a training loop needed to train quantum RL agents.

§Overview

This crate implements the QARLP (Quantum-Assisted Reinforcement Learning Policy) algorithm, which uses variational quantum circuits (VQCs) to represent policies in reinforcement learning. The key components are:

  • Quantum Policy Network: A variational quantum circuit that maps states to action probabilities through parameterized rotation gates.

  • Policy Gradient: The REINFORCE algorithm with baseline subtraction. Gradients with respect to the circuit parameters are computed with the parameter-shift rule, which is exact rather than a finite-difference approximation.

  • Environment Interface: Generic trait for RL environments, with included implementations of GridWorld and CartPole for testing.

  • Training Loop: Complete training infrastructure with checkpointing, logging, and metrics.
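The parameter-shift rule mentioned above can be illustrated on a single-qubit circuit. The following is a minimal, self-contained sketch and does not use this crate's API: the function names `expectation_z` and `parameter_shift_grad` are hypothetical, and the circuit (one RY rotation applied to |0⟩, followed by a Z measurement) is chosen only because its gradient is known in closed form.

```rust
use std::f64::consts::FRAC_PI_2;

/// Expectation value <Z> after applying RY(theta) to |0>.
/// RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>, so <Z> = cos(theta).
/// (Hypothetical helper, not part of ruqu_qarlp.)
fn expectation_z(theta: f64) -> f64 {
    let p0 = (theta / 2.0).cos().powi(2); // probability of measuring |0>
    let p1 = (theta / 2.0).sin().powi(2); // probability of measuring |1>
    p0 - p1
}

/// Parameter-shift gradient for a rotation gate:
/// df/dtheta = (f(theta + pi/2) - f(theta - pi/2)) / 2.
fn parameter_shift_grad(f: impl Fn(f64) -> f64, theta: f64) -> f64 {
    0.5 * (f(theta + FRAC_PI_2) - f(theta - FRAC_PI_2))
}

fn main() {
    let theta = 0.7_f64;
    let grad = parameter_shift_grad(expectation_z, theta);
    let analytic = -theta.sin(); // d/dtheta cos(theta)
    println!("parameter-shift: {grad:.6}, analytic: {analytic:.6}");
    assert!((grad - analytic).abs() < 1e-12);
}
```

Each gradient component costs two circuit evaluations, so a circuit with many trainable rotation angles needs two evaluations per angle for every update.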

§Architecture

The quantum policy network consists of:

  1. State Encoding: Classical state vectors are encoded as rotation angles on qubits using RX gates.

  2. Variational Layers: Parameterized RY and RZ rotations with CNOT entanglement gates form the trainable part of the circuit.

  3. Measurement: Computational basis measurement probabilities are mapped to action probabilities via softmax.
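The three stages above can be sketched on a single qubit with two actions. This is a simplified illustration under stated assumptions, not the crate's implementation: the names `rx`, `ry`, `softmax`, and `action_probs` are hypothetical, the RZ rotation and CNOT entanglement are omitted because only one qubit is simulated, and the state vector is tracked by hand as `(re, im)` pairs.

```rust
/// Complex amplitude stored as (re, im). Hypothetical helper type.
type Amp = (f64, f64);

/// State encoding: RX(angle) = [[c, -i s], [-i s, c]], c = cos(angle/2), s = sin(angle/2).
fn rx(state: [Amp; 2], angle: f64) -> [Amp; 2] {
    let (c, s) = ((angle / 2.0).cos(), (angle / 2.0).sin());
    let [(a_re, a_im), (b_re, b_im)] = state;
    [
        (c * a_re + s * b_im, c * a_im - s * b_re),
        (s * a_im + c * b_re, c * b_im - s * a_re),
    ]
}

/// Variational layer: RY(angle) = [[c, -s], [s, c]] (real-valued matrix).
fn ry(state: [Amp; 2], angle: f64) -> [Amp; 2] {
    let (c, s) = ((angle / 2.0).cos(), (angle / 2.0).sin());
    let [(a_re, a_im), (b_re, b_im)] = state;
    [
        (c * a_re - s * b_re, c * a_im - s * b_im),
        (s * a_re + c * b_re, s * a_im + c * b_im),
    ]
}

/// Numerically stable softmax over two values.
fn softmax(x: [f64; 2]) -> [f64; 2] {
    let max = x[0].max(x[1]);
    let e = [(x[0] - max).exp(), (x[1] - max).exp()];
    let sum = e[0] + e[1];
    [e[0] / sum, e[1] / sum]
}

/// Encode a scalar state feature, apply one trainable rotation,
/// read out basis-state probabilities, and map them to action probabilities.
fn action_probs(feature: f64, theta: f64) -> [f64; 2] {
    let zero_state = [(1.0, 0.0), (0.0, 0.0)]; // |0>
    let state = ry(rx(zero_state, feature), theta);
    let measured = [
        state[0].0.powi(2) + state[0].1.powi(2),
        state[1].0.powi(2) + state[1].1.powi(2),
    ];
    softmax(measured)
}

fn main() {
    let probs = action_probs(0.3, 1.2);
    println!("action probabilities: {probs:?}");
    assert!((probs[0] + probs[1] - 1.0).abs() < 1e-12);
}
```

Because the softmax is applied to values in [0, 1], the resulting action distribution is never fully deterministic, which keeps some exploration even when the circuit output is sharply peaked.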

§Example

use ruqu_qarlp::prelude::*;

// Create a quantum policy
let policy_config = PolicyConfig {
    num_qubits: 4,
    num_layers: 2,
    num_actions: 4,
    ..Default::default()
};
let policy = QuantumPolicy::new(policy_config).unwrap();

// Create an environment
let env_config = GridWorldConfig::default();
let env = GridWorld::new(env_config).unwrap();

// Create trainer and train
let trainer_config = TrainerConfig {
    episodes_per_update: 10,
    max_steps_per_episode: 100,
    ..Default::default()
};
let mut trainer = Trainer::new(trainer_config, policy, env).unwrap();

// Train for 100 iterations
let result = trainer.train(100).unwrap();
println!("Final reward: {}", result.final_average_reward);

§Tier 3 Capability (Exploratory)

This crate represents a Tier 3 (Score 69) exploratory quantum RL implementation. The two-week test criteria are:

  • Policy gradient update works correctly
  • Simple environment shows learning signal
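As a rough illustration of what these two criteria check, the sketch below runs REINFORCE with a baseline on a two-armed bandit and verifies that the expected reward increases. It is an assumption-laden stand-in, not the crate's test: a classical softmax policy replaces the quantum policy, the gradient is computed in expectation rather than sampled from episodes, and all function names are hypothetical.

```rust
/// Numerically stable softmax over action preferences.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Expected reward of the softmax policy on a bandit with fixed rewards.
fn expected_reward(theta: &[f64], rewards: &[f64]) -> f64 {
    softmax(theta).iter().zip(rewards).map(|(p, r)| p * r).sum::<f64>()
}

/// Expected REINFORCE gradient with baseline b = E[r]:
/// grad_k = sum_a pi_a * (r_a - b) * d log pi_a / d theta_k,
/// where d log pi_a / d theta_k = [a == k] - pi_k for a softmax policy.
fn reinforce_gradient(theta: &[f64], rewards: &[f64]) -> Vec<f64> {
    let probs = softmax(theta);
    let baseline = expected_reward(theta, rewards);
    (0..theta.len())
        .map(|k| {
            probs
                .iter()
                .zip(rewards)
                .enumerate()
                .map(|(a, (p, r))| {
                    let indicator = if a == k { 1.0 } else { 0.0 };
                    p * (r - baseline) * (indicator - probs[k])
                })
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    let rewards = [1.0, 0.0]; // action 0 is the better arm
    let mut theta = vec![0.0, 0.0]; // start from a uniform policy
    let before = expected_reward(&theta, &rewards);
    for _ in 0..50 {
        let grad = reinforce_gradient(&theta, &rewards);
        for (t, g) in theta.iter_mut().zip(&grad) {
            *t += 0.5 * g; // gradient ascent with learning rate 0.5
        }
    }
    let after = expected_reward(&theta, &rewards);
    println!("expected reward: {before:.3} -> {after:.3}");
    assert!(after > before);
}
```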

§Features

  • parallel: Enable parallel gradient computation using rayon (not yet implemented)

§References

  • Schuld, M., & Petruccione, F. (2018). Supervised Learning with Quantum Computers
  • Mitarai, K., et al. (2018). Quantum circuit learning
  • Jerbi, S., et al. (2021). Parametrized quantum policies for reinforcement learning

Modules§

buffer
Replay buffer for experience storage and sampling.
environment
Environment interface for reinforcement learning.
error
Error types for the QARLP (Quantum-Assisted Reinforcement Learning Policy) crate.
gradient
Gradient computation for quantum policy optimization.
policy
Quantum Policy Network implementation.
prelude
Prelude module for convenient imports.
training
Training loop for quantum RL policy.