pub trait TransitionBatch {
type ObsBatch;
type ActBatch;
// Required methods
fn unpack(
self,
) -> (Self::ObsBatch, Self::ActBatch, Self::ObsBatch, Vec<f32>, Vec<i8>, Vec<i8>, Option<Vec<usize>>, Option<Vec<f32>>);
fn len(&self) -> usize;
fn obs(&self) -> &Self::ObsBatch;
fn act(&self) -> &Self::ActBatch;
}
Expand description
A batch of transitions used for training reinforcement learning agents.
This trait represents a collection of transitions in the form (o_t, a_t, o_t+n, r_t, is_terminated_t, is_truncated_t)
,
where:
o_t
is the observation at time step ta_t
is the action taken at time step to_t+n
is the observation n steps after tr_t
is the reward received after taking actiona_t
is_terminated_t
indicates if the episode terminated at this stepis_truncated_t
indicates if the episode was truncated at this step
The value of n determines the type of backup:
- When n = 1, it represents a standard one-step transition
- When n > 1, it represents an n-step transition, which can be used for n-step temporal difference learning
§Associated Types
ObsBatch
- The type used to store batches of observationsActBatch
- The type used to store batches of actions
§Examples
A typical use case is in Q-learning, where transitions are used to update the Q-function:
ⓘ
let (obs, act, next_obs, reward, is_terminated, is_truncated, _, _) = batch.unpack();
let target = reward + gamma * (1 - is_terminated) * max_a Q(next_obs, a);
Required Associated Types§
Required Methods§
Sourcefn unpack(
self,
) -> (Self::ObsBatch, Self::ActBatch, Self::ObsBatch, Vec<f32>, Vec<i8>, Vec<i8>, Option<Vec<usize>>, Option<Vec<f32>>)
fn unpack( self, ) -> (Self::ObsBatch, Self::ActBatch, Self::ObsBatch, Vec<f32>, Vec<i8>, Vec<i8>, Option<Vec<usize>>, Option<Vec<f32>>)
Unpacks the batch into its constituent parts.
Returns a tuple containing:
- The batch of observations at time t
- The batch of actions taken at time t
- The batch of observations at time t+n
- The batch of rewards received
- The batch of termination flags
- The batch of truncation flags
- Optional sample indices (used for prioritized experience replay)
- Optional importance weights (used for prioritized experience replay)
§Returns
A tuple containing all components of the transition batch, with optional metadata for prioritized experience replay.
Sourcefn len(&self) -> usize
fn len(&self) -> usize
Returns the number of transitions in the batch.
This is typically used to determine the batch size for optimization steps and to verify that all components of the batch have consistent sizes.