pub struct AwacConfig<Q, P>
where
Q: SubModel2<Output = Tensor>,
Q::Config: DeserializeOwned + Serialize + Debug + PartialEq + Clone,
P: SubModel1<Output = (Tensor, Tensor)>,
P::Config: DeserializeOwned + Serialize + OutDim + Debug + PartialEq + Clone,
{
pub actor_config: GaussianActorConfig<P::Config>,
pub critic_config: MultiCriticConfig<Q::Config>,
pub gamma: f64,
pub inv_lambda: f64,
pub tau: f64,
pub min_lstd: f64,
pub max_lstd: f64,
pub n_updates_per_opt: usize,
pub batch_size: usize,
pub critic_loss: CriticLoss,
pub reward_scale: f32,
pub n_critics: usize,
pub exp_adv_max: f64,
pub seed: Option<i64>,
pub device: Option<Device>,
pub adv_softmax: bool,
}
Configuration of Awac.
Fields§
actor_config: GaussianActorConfig<P::Config>
Configuration of the actor model.
critic_config: MultiCriticConfig<Q::Config>
Configuration of the critic model.
gamma: f64
Discount factor.
inv_lambda: f64
The inverse of lambda in the AWAC paper.
tau: f64
Target smoothing coefficient.
This is a real number between 0 and 1. A small value such as 0.001 makes the target network parameters track the critic network parameters very slowly.
Formula: target_params = tau * critic_params + (1.0 - tau) * target_params
min_lstd: f64
Minimum value of the log of the standard deviation of the action distribution.
max_lstd: f64
Maximum value of the log of the standard deviation of the action distribution.
n_updates_per_opt: usize
Number of parameter updates per optimization step.
batch_size: usize
Batch size for training.
critic_loss: CriticLoss
Type of critic loss function.
reward_scale: f32
Scaling factor for rewards.
n_critics: usize
Number of critics used.
exp_adv_max: f64
Maximum of the exponent of the advantage.
seed: Option<i64>
Random seed value (optional).
device: Option<Device>
Device used for the actor and critic models (e.g., CPU or GPU).
adv_softmax: bool
If true, advantage weights are calculated with softmax within each mini-batch.
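The tau formula above, together with the min_lstd and max_lstd bounds, can be sketched in plain Rust. This is a minimal illustration on f64 slices, not the crate's tensor-based implementation: the function names soft_update and clamp_lstd are hypothetical, and reading the two bounds as a clamp on the log standard deviation is an assumption drawn from the field descriptions.

```rust
/// Soft (Polyak) update, following the formula in the `tau` field docs:
/// target_params = tau * critic_params + (1.0 - tau) * target_params.
/// Hypothetical helper operating on plain slices instead of tensors.
fn soft_update(target_params: &mut [f64], critic_params: &[f64], tau: f64) {
    assert_eq!(target_params.len(), critic_params.len());
    for (t, c) in target_params.iter_mut().zip(critic_params) {
        *t = tau * *c + (1.0 - tau) * *t;
    }
}

/// Restricts a log standard deviation to [min_lstd, max_lstd].
/// Assumption: the bounds are applied as a simple clamp.
/// Panics if min_lstd > max_lstd (behavior of f64::clamp).
fn clamp_lstd(lstd: f64, min_lstd: f64, max_lstd: f64) -> f64 {
    lstd.clamp(min_lstd, max_lstd)
}
```

With tau = 0.001, each call to soft_update moves every target parameter only 0.1% of the way toward its critic counterpart, which is why small values make the target network change slowly.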
Implementations§
impl<Q, P> AwacConfig<Q, P>
pub fn n_updates_per_opt(self, v: usize) -> Self
Sets the number of parameter updates per optimization step.
pub fn batch_size(self, v: usize) -> Self
Sets the batch size.
pub fn discount_factor(self, v: f64) -> Self
Sets the discount factor.
pub fn reward_scale(self, v: f32) -> Self
Sets the reward scale.
The scale is applied when computing target values, not to the values reported in logs.
pub fn critic_loss(self, v: CriticLoss) -> Self
Sets the critic loss.
pub fn actor_config(self, actor_config: GaussianActorConfig<P::Config>) -> Self
Sets the configuration of the actor.
pub fn critic_config(self, critic_config: MultiCriticConfig<Q::Config>) -> Self
Sets the configuration of the critic.
pub fn adv_softmax(self, b: bool) -> Self
If true, advantage weights are calculated with softmax within each mini-batch.
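To make the adv_softmax switch concrete, the following sketch contrasts the two weighting schemes on a plain slice of advantages. It is an illustration under stated assumptions rather than the crate's code: the function names are hypothetical, the non-softmax branch assumes weights of the form exp(adv * inv_lambda) capped at exp_adv_max, and the softmax branch ignores exp_adv_max because the page does not say how the two interact.

```rust
/// Assumed non-softmax weights: exp(adv * inv_lambda), capped at exp_adv_max.
fn exp_adv_weights(adv: &[f64], inv_lambda: f64, exp_adv_max: f64) -> Vec<f64> {
    adv.iter()
        .map(|a| (a * inv_lambda).exp().min(exp_adv_max))
        .collect()
}

/// Softmax of adv * inv_lambda over the mini-batch.
/// The maximum is subtracted first for numerical stability.
fn softmax_adv_weights(adv: &[f64], inv_lambda: f64) -> Vec<f64> {
    let scaled: Vec<f64> = adv.iter().map(|a| a * inv_lambda).collect();
    let max = scaled.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scaled.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Hypothetical dispatcher mirroring the `adv_softmax` flag.
fn advantage_weights(
    adv: &[f64],
    inv_lambda: f64,
    exp_adv_max: f64,
    adv_softmax: bool,
) -> Vec<f64> {
    if adv_softmax {
        softmax_adv_weights(adv, inv_lambda)
    } else {
        exp_adv_weights(adv, inv_lambda, exp_adv_max)
    }
}
```

In this sketch the softmax weights always sum to 1 within a mini-batch, whereas the capped exponential weights are bounded per sample but not normalized.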
Trait Implementations§
impl<Q, P> Clone for AwacConfig<Q, P>
impl<Q, P> Debug for AwacConfig<Q, P>
impl<Q, P> Default for AwacConfig<Q, P>
impl<'de, Q, P> Deserialize<'de> for AwacConfig<Q, P>
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
__D: Deserializer<'de>,
impl<Q, P> PartialEq for AwacConfig<Q, P>
impl<Q, P> Serialize for AwacConfig<Q, P>
impl<Q, P> StructuralPartialEq for AwacConfig<Q, P>
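Because the type implements Default and its setters take self and return Self, a configuration is typically built by chaining calls on a default value. The sketch below shows that pattern on a small stand-in type so it compiles without the crate: MiniConfig, its placeholder default values, and example_config are all hypothetical, and only the method names and signatures are taken from this page. Mapping discount_factor onto the gamma field is an inference from the field and method descriptions.

```rust
/// Hypothetical stand-in for AwacConfig with a few of its fields.
#[derive(Clone, Debug, PartialEq)]
struct MiniConfig {
    gamma: f64,
    batch_size: usize,
    n_updates_per_opt: usize,
    adv_softmax: bool,
}

impl Default for MiniConfig {
    fn default() -> Self {
        // Placeholder values, not the real AwacConfig defaults.
        Self {
            gamma: 0.99,
            batch_size: 1,
            n_updates_per_opt: 1,
            adv_softmax: false,
        }
    }
}

impl MiniConfig {
    /// Consuming setter: takes `self` by value and returns the updated value.
    fn discount_factor(mut self, v: f64) -> Self {
        self.gamma = v;
        self
    }

    fn batch_size(mut self, v: usize) -> Self {
        self.batch_size = v;
        self
    }

    fn n_updates_per_opt(mut self, v: usize) -> Self {
        self.n_updates_per_opt = v;
        self
    }

    fn adv_softmax(mut self, b: bool) -> Self {
        self.adv_softmax = b;
        self
    }
}

/// Starts from the defaults and overrides selected fields by chaining.
fn example_config() -> MiniConfig {
    MiniConfig::default()
        .discount_factor(0.95)
        .batch_size(256)
        .adv_softmax(true)
}
```

Fields that are not overridden, such as n_updates_per_opt here, keep their default values. Since each setter consumes the value, a configuration that must be reused should be cloned before further chaining.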
Auto Trait Implementations§
impl<Q, P> Freeze for AwacConfig<Q, P>
impl<Q, P> RefUnwindSafe for AwacConfig<Q, P>
where
<P as SubModel1>::Config: Sized + RefUnwindSafe,
<Q as SubModel2>::Config: Sized + RefUnwindSafe,
impl<Q, P> Send for AwacConfig<Q, P>
impl<Q, P> Sync for AwacConfig<Q, P>
impl<Q, P> Unpin for AwacConfig<Q, P>
impl<Q, P> UnwindSafe for AwacConfig<Q, P>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.