Reinforcement Learning module
Implements RL algorithms:
- Deep Q-Network (DQN)
- Policy Gradient (REINFORCE)
- Actor-Critic (A2C/A3C)
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)
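Policy-gradient methods such as REINFORCE weight each action's log-probability by the discounted return that followed it. The sketch below shows one common way to compute those returns from a finished episode. The function name `discounted_returns` and the parameter `gamma` are assumptions made for illustration; they are not taken from this module's API.

```rust
/// Hypothetical helper (not part of this module): computes the discounted
/// return G_t = r_t + gamma * G_{t+1} for every step of one episode.
fn discounted_returns(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut returns = vec![0.0; rewards.len()];
    let mut running = 0.0;
    // Walk the episode backwards so each step can reuse the return of the next one.
    for (i, &r) in rewards.iter().enumerate().rev() {
        running = r + gamma * running;
        returns[i] = running;
    }
    returns
}
```

For example, calling `discounted_returns(&[1.0, 1.0, 1.0], 0.5)` yields `[1.75, 1.5, 1.0]`. Implementations often also normalize the returns or subtract a baseline to reduce variance; whether this module does so is not stated here.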
Structs
- ActorCriticAgent - Actor-Critic agent (A2C)
- DQNAgent - Deep Q-Network (DQN) agent
- Experience
- PPOAgent - PPO (Proximal Policy Optimization) agent
- PolicyNetwork - Policy network for policy gradient methods
- QNetwork - Q-Network (simple MLP)
- REINFORCEAgent - REINFORCE (Policy Gradient) agent
- ReplayBuffer - Experience replay buffer for off-policy learning
- ValueNetwork - Value network for critic
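To illustrate the idea behind an experience replay buffer for off-policy learning, here is a minimal sketch of a fixed-capacity buffer that evicts the oldest transition when full and samples uniformly with replacement. Everything in it is an assumption: `Transition`, `SketchReplayBuffer`, their fields, and their methods are hypothetical stand-ins, not the actual `Experience` or `ReplayBuffer` definitions from this module. A small linear congruential generator stands in for a proper random number generator so the example needs only the standard library.

```rust
use std::collections::VecDeque;

/// Hypothetical transition record; the real `Experience` fields are not shown here.
#[derive(Clone, Debug, PartialEq)]
struct Transition {
    state: Vec<f64>,
    action: usize,
    reward: f64,
    next_state: Vec<f64>,
    done: bool,
}

/// Hypothetical fixed-capacity buffer: oldest transitions are dropped first.
struct SketchReplayBuffer {
    capacity: usize,
    items: VecDeque<Transition>,
    rng_state: u64,
}

impl SketchReplayBuffer {
    fn new(capacity: usize, seed: u64) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, items: VecDeque::with_capacity(capacity), rng_state: seed }
    }

    /// Stores a transition, evicting the oldest one if the buffer is full.
    fn push(&mut self, t: Transition) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
        }
        self.items.push_back(t);
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    /// Toy pseudo-random index in `0..bound` (a simple LCG, for illustration only).
    fn next_index(&mut self, bound: usize) -> usize {
        self.rng_state = self
            .rng_state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.rng_state >> 33) as usize) % bound
    }

    /// Samples `batch_size` transitions uniformly with replacement.
    /// Returns an empty batch if nothing has been stored yet.
    fn sample(&mut self, batch_size: usize) -> Vec<Transition> {
        let n = self.items.len();
        let mut batch = Vec::with_capacity(batch_size);
        if n == 0 {
            return batch;
        }
        for _ in 0..batch_size {
            let i = self.next_index(n);
            batch.push(self.items[i].clone());
        }
        batch
    }
}
```

Sampling from stored transitions rather than consuming them in arrival order breaks the correlation between consecutive steps, which is the usual motivation for pairing a replay buffer with an off-policy agent such as DQN.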