//! # Reinforcement Learning (`rl`)
//!
//! Self-contained deep RL implementations for both discrete and continuous action spaces.
//!
//! ## Sub-modules
//!
//! | Module | Description |
//! |--------|-------------|
//! | [`environments`] | [`Environment`] trait + `CartPole`, `PendulumEnv`, `GridWorld` |
//! | [`replay_buffer`] | Fixed-capacity circular replay buffer |
//! | [`policy`] | [`Policy`] trait, [`SimpleNetwork`], [`EpsilonGreedy`], [`BoltzmannPolicy`] |
//! | [`value`] | [`ValueNetwork`], [`QNetwork`], [`DuelingQNetwork`] |
//! | [`dqn`] | DQN / Double-DQN agent with ε-greedy exploration |
//! | [`actor_critic`] | A2C: synchronous Advantage Actor-Critic |
//! | [`ppo`] | Proximal Policy Optimisation (clipped surrogate + GAE) |
//! | [`sac`] | Soft Actor-Critic (maximum-entropy, off-policy) |
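//!
//! ## Example
//!
//! A minimal training-loop sketch for DQN on `CartPole`. The fence is marked
//! `ignore` because the constructor and method names shown (`DqnAgent::new`,
//! `reset`, `step`, `act`, `push`, `train_step`) are illustrative assumptions
//! about this crate's API, not a confirmed interface:
//!
//! ```ignore
//! use rl::{CartPole, DqnAgent, ReplayBuffer};
//!
//! let mut env = CartPole::new();              // hypothetical constructor
//! let mut agent = DqnAgent::new(4, 2);        // state_dim = 4, n_actions = 2
//! let mut buffer = ReplayBuffer::new(10_000); // fixed-capacity circular buffer
//!
//! for _episode in 0..100 {
//!     let mut state = env.reset();
//!     loop {
//!         let action = agent.act(&state);              // ε-greedy action selection
//!         let (next, reward, done) = env.step(action); // advance the environment
//!         buffer.push(state, action, reward, next.clone(), done);
//!         agent.train_step(&buffer);                   // one gradient update
//!         state = next;
//!         if done { break; }
//!     }
//! }
//! ```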
// ── Sub-modules ──────────────────────────────────────────────────────────────
pub mod actor_critic;
pub mod dqn;
pub mod environments;
pub mod policy;
pub mod ppo;
pub mod replay_buffer;
pub mod sac;
pub mod value;

// ── Environments ─────────────────────────────────────────────────────────────
pub use environments::{CartPole, Environment, GridWorld, PendulumEnv};
// ── Policy ───────────────────────────────────────────────────────────────────
pub use policy::{BoltzmannPolicy, EpsilonGreedy, Policy, SimpleNetwork};
// ── Value networks ───────────────────────────────────────────────────────────
pub use value::{DuelingQNetwork, QNetwork, ValueNetwork};
// ── DQN ──────────────────────────────────────────────────────────────────────
pub use dqn::*;
// ── A2C ──────────────────────────────────────────────────────────────────────
pub use actor_critic::*;
// ── PPO ──────────────────────────────────────────────────────────────────────
pub use ppo::*;
// ── SAC ──────────────────────────────────────────────────────────────────────
pub use sac::*;
// ── Replay buffer ────────────────────────────────────────────────────────────
pub use replay_buffer::*;