pub trait Agent<E: Env, R: ReplayBufferBase>: Policy<E> {
// Provided methods
fn train(&mut self) { ... }
fn eval(&mut self) { ... }
fn is_train(&self) -> bool { ... }
fn opt(&mut self, buffer: &mut R) { ... }
fn opt_with_record(&mut self, buffer: &mut R) -> Record { ... }
fn save_params(&self, path: &Path) -> Result<Vec<PathBuf>> { ... }
fn load_params(&mut self, path: &Path) -> Result<()> { ... }
fn as_any_ref(&self) -> &dyn Any { ... }
fn as_any_mut(&mut self) -> &mut dyn Any { ... }
}
A trainable policy that can learn from environment interactions.
This trait extends Policy with training capabilities, allowing the policy to:
- Switch between training and evaluation modes
- Perform optimization steps using experience from a replay buffer
- Save and load model parameters
The agent operates in two distinct modes:
- Training mode: The policy may be stochastic to facilitate exploration
- Evaluation mode: The policy is typically deterministic for consistent performance
During training, the agent uses a replay buffer to store and sample experiences, which are then used to update the policy’s parameters through optimization steps.
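A minimal sketch of how these pieces fit together in a training loop, assuming the Agent, Env, and ReplayBufferBase traits are in scope and that transitions are pushed into the buffer elsewhere (experience collection is elided):

fn run_training<E: Env, R: ReplayBufferBase, A: Agent<E, R>>(
    agent: &mut A,
    buffer: &mut R,
    opt_steps: usize,
) {
    agent.train(); // training mode: the policy may act stochastically
    for _ in 0..opt_steps {
        // ... push newly collected transitions into `buffer` here ...
        agent.opt(buffer); // one optimization step on a sampled batch
    }
    agent.eval(); // evaluation mode: the policy is typically deterministic
}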
Provided Methods§
fn train(&mut self)
Switches the agent to training mode.
In training mode, the policy may become stochastic to facilitate exploration. This is typically implemented by enabling noise or randomness in the action selection process.
fn eval(&mut self)
Switches the agent to evaluation mode.
In evaluation mode, the policy typically becomes deterministic to ensure consistent performance. This is often implemented by disabling noise or using the mean action instead of sampling from a distribution.
fn is_train(&self) -> bool
Returns whether the agent is currently in training mode.
This can be used to conditionally enable or disable mode-dependent behavior, or to restore the previous mode after a temporary switch.
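For example, is_train makes it possible to remember and restore the previous mode. A sketch using only the methods of this trait:

fn with_eval<E: Env, R: ReplayBufferBase, A: Agent<E, R>>(
    agent: &mut A,
    f: impl FnOnce(&mut A),
) {
    let was_training = agent.is_train(); // remember the current mode
    agent.eval();                        // run `f` in evaluation mode
    f(&mut *agent);                      // reborrow so `agent` stays usable below
    if was_training {
        agent.train();                   // restore training mode if needed
    }
}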
fn opt(&mut self, buffer: &mut R)
Performs a single optimization step using experiences from the replay buffer.
This method updates the agent’s parameters using a batch of transitions sampled from the provided replay buffer. The specific optimization algorithm (e.g., Q-learning, policy gradient) is determined by the agent’s implementation.
§Arguments
buffer - The replay buffer containing experiences used for training
fn opt_with_record(&mut self, buffer: &mut R) -> Record
Performs an optimization step and returns training metrics.
Similar to opt, but also returns a Record containing training metrics such as loss values, gradients, or other relevant statistics.
§Arguments
buffer - The replay buffer containing experiences used for training
§Returns
A Record containing training metrics and statistics
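A sketch that runs several optimization steps and hands each returned Record to a caller-supplied logging closure (Record is assumed to be in scope; how individual metrics are read out of it depends on the Record API and is not shown here):

fn opt_n_steps<E: Env, R: ReplayBufferBase, A: Agent<E, R>>(
    agent: &mut A,
    buffer: &mut R,
    n: usize,
    mut log: impl FnMut(Record),
) {
    for _ in 0..n {
        let record = agent.opt_with_record(buffer); // metrics for this step
        log(record); // e.g., forward to a recorder or logger of your choice
    }
}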
fn save_params(&self, path: &Path) -> Result<Vec<PathBuf>>
Saves the agent’s parameters to the specified directory.
This method serializes the agent’s current state (e.g., neural network weights, policy parameters) to files in the given directory. The specific format and number of files created depends on the agent’s implementation.
§Arguments
path - The directory where parameters will be saved
§Returns
A vector of paths to the saved parameter files
§Examples
For example, a DQN agent might save two Q-networks (original and target) in separate files within the specified directory.
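A minimal sketch of writing a checkpoint and listing the files that were produced, assuming the Result alias used by this trait is in scope:

fn save_checkpoint<E: Env, R: ReplayBufferBase, A: Agent<E, R>>(
    agent: &A,
    dir: &std::path::Path,
) -> Result<()> {
    let files = agent.save_params(dir)?; // paths of the written files
    for f in &files {
        println!("saved parameters to {}", f.display());
    }
    Ok(())
}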
fn load_params(&mut self, path: &Path) -> Result<()>
Loads the agent’s parameters from the specified directory.
This method deserializes the agent’s state from files in the given directory, restoring the agent to a previously saved state.
§Arguments
path - The directory containing the saved parameter files
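A sketch of the mirror operation, resuming from a checkpoint directory only if it exists (again assuming the trait's Result alias is in scope):

fn maybe_resume<E: Env, R: ReplayBufferBase, A: Agent<E, R>>(
    agent: &mut A,
    dir: &std::path::Path,
) -> Result<()> {
    if dir.exists() {
        agent.load_params(dir)?; // restore parameters saved by save_params
    }
    Ok(())
}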
fn as_any_ref(&self) -> &dyn Any
Returns a reference to the agent as a type-erased Any value.
This method is required for asynchronous training, allowing the agent to be stored in a type-erased container. The returned reference can be downcast to the concrete agent type when needed.
fn as_any_mut(&mut self) -> &mut dyn Any
Returns a mutable reference to the agent as a type-erased Any value.
This method is required for asynchronous training, allowing the agent to be stored in a type-erased container. The returned reference can be downcast to the concrete agent type when needed.
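A sketch of recovering a concrete agent type T from the type-erased references; the function is generic, and a call site would name the concrete type explicitly:

fn inspect_concrete<T: 'static, E: Env, R: ReplayBufferBase, A: Agent<E, R>>(
    agent: &mut A,
) {
    // Immutable downcast: succeeds only if the agent's concrete type is T.
    if let Some(_agent_ref) = agent.as_any_ref().downcast_ref::<T>() {
        // read concrete-type-specific state here
    }
    // Mutable downcast, e.g., for synchronizing parameters during
    // asynchronous training.
    if let Some(_agent_mut) = agent.as_any_mut().downcast_mut::<T>() {
        // mutate concrete-type-specific state here
    }
}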