//! Types and traits for handling batches of transitions in reinforcement learning.
//!
//! This module provides abstractions for working with batches of transitions,
//! which are essential for training reinforcement learning agents. A transition
//! represents a single step in the environment, containing the observation,
//! action, next observation, reward, and termination information.

/// A batch of transitions used for training reinforcement learning agents.
///
/// This trait represents a collection of transitions in the form `(o_t, a_t, o_t+n, r_t, is_terminated_t, is_truncated_t)`,
/// where:
/// - `o_t` is the observation at time step t
/// - `a_t` is the action taken at time step t
/// - `o_t+n` is the observation n steps after t
/// - `r_t` is the reward received after taking action `a_t`
/// - `is_terminated_t` indicates if the episode terminated at this step
/// - `is_truncated_t` indicates if the episode was truncated at this step
///
/// The value of n determines the type of backup:
/// - When n = 1, it represents a standard one-step transition
/// - When n > 1, it represents an n-step transition, which can be used for
///   n-step temporal difference learning
///
/// # Associated Types
///
/// * `ObsBatch` - The type used to store batches of observations
/// * `ActBatch` - The type used to store batches of actions
///
/// # Examples
///
/// A typical use case is in Q-learning, where transitions are used to update
/// the Q-function:
/// ```ignore
/// let (obs, act, next_obs, reward, is_terminated, is_truncated, _, _) = batch.unpack();
/// let target = reward + gamma * (1 - is_terminated) * max_a Q(next_obs, a);
/// ```
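///
/// Concretely, for a batch whose unpacked components are `Vec`s, the
/// one-step target above could be computed element-wise. This is an
/// illustrative sketch, not part of this trait's API: `reward` and
/// `is_terminated` are assumed to be `Vec<f32>` (with termination encoded
/// as a 0.0/1.0 mask), and `max_q` is a hypothetical vector holding
/// `max_a Q(next_obs[i], a)` for each element of the batch:
/// ```ignore
/// let gamma = 0.99_f32;
/// let target: Vec<f32> = reward
///     .iter()
///     .zip(is_terminated.iter())
///     .zip(max_q.iter()) // max_q[i] = max_a Q(next_obs[i], a)
///     .map(|((r, t), q)| r + gamma * (1.0 - t) * q)
///     .collect();
/// ```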