pub struct ValuesOptConfig<MB, OC = AdamConfig> {
    pub state_value_fn_config: MB,
    pub optimizer_config: OC,
    pub advantage_fn: AdvantageFn,
    pub target: StepValueTarget,
    pub opt_steps_per_update: u64,
    pub max_discount_factor: f64,
}

Configuration for ValuesOpt.

Fields

state_value_fn_config: MB

Configuration for the state value function module.

optimizer_config: OC

Configuration for the state value function module optimizer.

advantage_fn: AdvantageFn

Strategy for calculating advantage estimates given a state value function module.

target: StepValueTarget

Strategy for calculating state value target values.

The state value module is updated to minimize its mean-squared-error to these targets.
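As a rough illustration of the objective described above, the following sketch computes the mean squared error between predicted state values and their targets. The function name value_mse and the use of plain f64 slices are assumptions made for this example; the actual module operates on its own tensor types.

```rust
/// Mean squared error between predicted state values and their targets.
///
/// Hypothetical helper for illustration only; not part of the crate's API.
/// Returns `None` if the slices are empty or differ in length.
fn value_mse(predicted: &[f64], targets: &[f64]) -> Option<f64> {
    if predicted.is_empty() || predicted.len() != targets.len() {
        return None;
    }
    // Sum of squared differences over matching (prediction, target) pairs.
    let sum_sq: f64 = predicted
        .iter()
        .zip(targets)
        .map(|(p, t)| (p - t).powi(2))
        .sum();
    Some(sum_sq / predicted.len() as f64)
}
```

For example, predictions [1.0, 2.0] against targets [1.0, 4.0] give a squared error of 0 and 4, so the mean is 2.0.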

opt_steps_per_update: u64

Number of optimization steps per update.

Design Note

This could be called num_epochs by analogy to supervised learning, where an epoch is one pass through the dataset; here the dataset is the experience collected since the last agent update. However, the term “epoch” is used inconsistently in reinforcement learning, sometimes referring to one iteration of the collect-data-then-update-agent loop.
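To make the role of opt_steps_per_update concrete, here is a minimal sketch that takes that many gradient steps on the same batch of targets. It is deliberately simplified: the "value function" is a single constant, and the optimizer is plain gradient descent rather than the configured optimizer (Adam by default). The function name fit_constant_value and the learning_rate parameter are assumptions for this example.

```rust
/// Fit a constant value estimate `v` to `targets` by plain gradient descent
/// on mean squared error, taking `opt_steps_per_update` steps on one batch.
///
/// Hypothetical helper for illustration only; not part of the crate's API.
fn fit_constant_value(
    mut v: f64,
    targets: &[f64],
    opt_steps_per_update: u64,
    learning_rate: f64,
) -> f64 {
    if targets.is_empty() {
        return v;
    }
    let n = targets.len() as f64;
    for _ in 0..opt_steps_per_update {
        // Gradient of mean((v - t)^2) with respect to v is 2 * mean(v - t).
        let grad = 2.0 * targets.iter().map(|t| v - t).sum::<f64>() / n;
        v -= learning_rate * grad;
    }
    v
}
```

With zero steps the estimate is returned unchanged; each additional step over the same batch moves it closer to the mean of the targets, which is the minimizer of the squared error for a constant predictor.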

max_discount_factor: f64

Upper bound on the environment discount factor.

Effectively sets a maximum horizon on the number of steps of future reward considered. Low values bias the value estimates but reduce variance.
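The horizon effect can be sketched numerically. The example below assumes the bound is applied as a simple minimum with the environment discount factor, and uses the common rule of thumb that a discount factor γ corresponds to a horizon of roughly 1 / (1 - γ) steps. Both function names are hypothetical and the exact clamping used by ValuesOpt is not shown in this page.

```rust
/// Discount factor actually used, assuming the upper bound is applied
/// as a simple minimum (an assumption; not confirmed by the source).
fn effective_discount(env_discount: f64, max_discount_factor: f64) -> f64 {
    env_discount.min(max_discount_factor)
}

/// Approximate horizon in steps, 1 / (1 - discount).
/// Infinite when the discount factor is 1 or greater.
fn effective_horizon(discount: f64) -> f64 {
    if discount >= 1.0 {
        f64::INFINITY
    } else {
        1.0 / (1.0 - discount)
    }
}
```

Under these assumptions, an undiscounted environment (discount factor 1.0) with max_discount_factor set to 0.99 would be treated as having a discount of 0.99, for a horizon of about 100 steps.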

Trait Implementations

Clone

Returns a copy of the value, or performs copy-assignment from a source value.

Debug

Formats the value using the given formatter.

Default

Returns the “default value” for the type.

Deserialize

Deserializes this value from the given Serde deserializer.

PartialEq

Tests whether self and other are equal (used by ==) or not equal (used by !=).

Serialize

Serializes this value into the given Serde serializer.
