pub struct PpoConfig {
    pub n_steps: usize,
    pub n_envs: usize,
    pub n_epochs: usize,
    pub batch_size: usize,
    pub learning_rate: f64,
    pub clip_epsilon: f64,
    pub value_loss_coef: f64,
    pub entropy_coef: f64,
    pub gamma: f64,
    pub gae_lambda: f64,
    pub hidden_sizes: Vec<usize>,
    pub max_grad_norm: f64,
}
Hyperparameters for the PPO algorithm.
Fields
n_steps: usize
    Steps collected per environment before each update. Total rollout size = n_steps * n_envs.
n_envs: usize
    Number of parallel environments feeding this agent. Must match the number of envs in your BevyGymPlugin / training loop.
n_epochs: usize
    Number of gradient epochs over each rollout. Typical: 4-10.
batch_size: usize
    Minibatch size for each gradient step. Must divide n_steps * n_envs evenly.
learning_rate: f64
clip_epsilon: f64
    Clipping range for the probability ratio. Typical: 0.1-0.3.
value_loss_coef: f64
    Weight on the value function loss. Typical: 0.5.
entropy_coef: f64
    Weight on the entropy bonus (encourages exploration). Typical: 0.01.
gamma: f64
    Discount factor.
gae_lambda: f64
    GAE smoothing parameter. 1.0 = full Monte Carlo, 0.0 = TD(0).
hidden_sizes: Vec<usize>
    Hidden layer sizes for the shared trunk.
max_grad_norm: f64
    Clip gradient norm to this value. Set to 0.0 to disable.
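The sizing constraints among n_steps, n_envs, n_epochs, and batch_size can be checked before training starts. The sketch below is only an illustration: RolloutSizing and its methods are hypothetical names that mirror four of the fields above, not part of this crate's API, and the crate may perform its own validation differently.

```rust
/// Hypothetical stand-in mirroring the sizing fields of `PpoConfig`.
/// This is NOT the crate's type; it exists only to illustrate the arithmetic.
#[derive(Debug, Clone, PartialEq)]
pub struct RolloutSizing {
    pub n_steps: usize,
    pub n_envs: usize,
    pub n_epochs: usize,
    pub batch_size: usize,
}

impl RolloutSizing {
    /// Total transitions collected per update: n_steps * n_envs.
    pub fn rollout_size(&self) -> usize {
        self.n_steps * self.n_envs
    }

    /// Number of minibatches in one epoch, or an error if `batch_size`
    /// is zero or does not divide the rollout size evenly.
    pub fn minibatches_per_epoch(&self) -> Result<usize, String> {
        let total = self.rollout_size();
        if self.batch_size == 0 || total % self.batch_size != 0 {
            return Err(format!(
                "batch_size {} must be non-zero and divide n_steps * n_envs = {} evenly",
                self.batch_size, total
            ));
        }
        Ok(total / self.batch_size)
    }

    /// Gradient steps performed per rollout: minibatches per epoch * n_epochs.
    pub fn gradient_steps_per_update(&self) -> Result<usize, String> {
        Ok(self.minibatches_per_epoch()? * self.n_epochs)
    }
}
```

With n_steps = 128, n_envs = 8, and batch_size = 256, the rollout holds 1024 transitions, which splits into 4 minibatches per epoch; with n_epochs = 4 that is 16 gradient steps per update. A batch_size of 300 would be rejected because it does not divide 1024.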
Trait Implementations
impl<'de> Deserialize<'de> for PpoConfig
    fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
    where
        __D: Deserializer<'de>,
    Deserialize this value from the given Serde deserializer.
Auto Trait Implementations
impl Freeze for PpoConfig
impl RefUnwindSafe for PpoConfig
impl Send for PpoConfig
impl Sync for PpoConfig
impl Unpin for PpoConfig
impl UnsafeUnpin for PpoConfig
impl UnwindSafe for PpoConfig
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
    fn borrow_mut(&mut self) -> &mut T
    Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
    fn instrument(self, span: Span) -> Instrumented<Self>
    fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
    fn into_either(self, into_left: bool) -> Either<Self, Self>
    Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
    fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
    Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.