Struct border_core::core::trainer::Trainer

pub struct Trainer<E: Env, A: Agent<E>> { /* fields omitted */ }

Manages the training process.

Training loop

For training an agent with standard RL algorithms in the library, the agent and environment interact as illustrated in the following diagram:

flowchart TB
    Trainer -. 0. Env::reset .-> Env
    Env --> Obs
    ObsPrev -- 3. Policy::sample --> Policy
    Policy --> Act
    Act -- 4. Env::step --> Env
    Obs --> Step
    Obs -- 1. RefCell::replace --> ObsPrev
    Act --> Step
    ObsPrev -- 2. Agent::push_obs --> ObsPrev'
    Step -- 5. Agent::observe --> Transition
    subgraph Agent
        ObsPrev' --> Transition
        ReplayBuffer -- 6. update policy parameters --- Policy
        Transition --> ReplayBuffer
    end
  0. Call Env::reset for resetting the environment and getting an observation. An episode starts.
  1. Call std::cell::RefCell::replace for placing the observation in PrevObs.
  2. Call Agent::push_obs for placing the observation in PrevObs'.
  3. Call Policy::sample for sampling an action from Policy.
  4. Call Env::step for taking the action, getting a new observation, and creating a Step object.
  5. Call Agent::observe for updating the replay buffer with the new and previous observations.
  6. Call some methods in the agent for updating the policy parameters.
  7. Back to 1.

Actually, Trainer is not responsible for step 6; the Agent does it.
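The following is a minimal sketch of this loop in Rust. The traits and signatures here are simplified, hypothetical stand-ins, not the actual border_core traits: Env::step returns a richer Step object, Policy is a separate trait, and Trainer::train() additionally handles evaluation, recording, and model saving.

```rust
use std::cell::RefCell;

// Simplified, hypothetical stand-ins for the library's Env, Policy and Agent traits.
struct Step<O, A> { obs: O, act: A, reward: f32, done: bool }

trait SimpleEnv {
    type Obs: Clone;
    type Act;
    fn reset(&mut self) -> Self::Obs;                                   // stands in for Env::reset
    fn step(&mut self, act: &Self::Act) -> Step<Self::Obs, Self::Act>;  // stands in for Env::step
}

trait SimpleAgent<E: SimpleEnv> {
    fn push_obs(&mut self, obs: &E::Obs);               // stands in for Agent::push_obs
    fn sample(&mut self, obs: &E::Obs) -> E::Act;       // stands in for Policy::sample
    fn observe(&mut self, step: Step<E::Obs, E::Act>);  // stands in for Agent::observe
}

fn training_loop_sketch<E, A>(env: &mut E, agent: &mut A, max_steps: usize)
where
    E: SimpleEnv,
    A: SimpleAgent<E>,
{
    // 0. Reset the environment; an episode starts.
    let prev_obs = RefCell::new(env.reset());
    for _ in 0..max_steps {
        // 2. Hand the previous observation to the agent (PrevObs -> PrevObs').
        agent.push_obs(&prev_obs.borrow());
        // 3. Sample an action from the policy.
        let act = agent.sample(&prev_obs.borrow());
        // 4. Apply the action and obtain a Step object.
        let step = env.step(&act);
        // 1. Place the new observation in PrevObs for the next iteration,
        //    resetting first if the episode has ended.
        let next_obs = if step.done { env.reset() } else { step.obs.clone() };
        prev_obs.replace(next_obs);
        // 5.-6. Store the transition; the agent updates its policy parameters internally.
        agent.observe(step);
    }
}
```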

Model evaluation and saving

Trainer::train() evaluates the agent being trained at the interval of optimization steps specified by TrainerBuilder::eval_interval(). If the evaluation reward exceeds the maximum observed so far during training, the agent is saved to the directory specified by TrainerBuilder::model_dir().

A trained agent often consists of a number of neural networks, such as an action-value network, its target network, and a policy network. Typically, Agent saves all of these neural networks in a directory.
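As a rough wiring sketch, the configuration above might be set up as follows. Only eval_interval(), model_dir(), and Trainer::train() are taken from this page; the import paths, the default()/build() calls, and their argument types are assumptions.

```rust
// Hypothetical usage sketch; paths and the build() signature are assumptions.
use border_core::core::{Agent, Env, TrainerBuilder};

fn build_and_train<E: Env, A: Agent<E>>(env: E, env_eval: E, agent: A) {
    let mut trainer = TrainerBuilder::default()
        .eval_interval(10_000)        // evaluate every 10,000 optimization steps
        .model_dir("./model")         // the best agent so far is saved here
        .build(env, env_eval, agent); // assumed: training env, evaluation env, agent
    trainer.train();                  // may additionally take a recorder argument (assumption)
}
```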

Implementations

Get the reference to the agent.

Get the reference to the environment.

Get the reference to the environment for evaluation.

Train the agent.

In the training loop, the following values are recorded in the recorder:

  • n_steps - The number of steps of interaction with the environment.
  • n_opts - The number of optimization steps.
  • datetime - Date and time.
  • mean_cum_eval_reward - Mean cumulative reward over evaluation runs.
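The names above are the keys under which the values are written. As an illustration only, a recorder receiving them could look like the sketch below; SimpleRecorder is a hypothetical stand-in, not the library's actual recorder interface.

```rust
// Hypothetical stand-in for a recorder; the real recorder interface differs.
trait SimpleRecorder {
    fn write_scalar(&mut self, name: &str, value: f64);
}

struct StdoutRecorder;

impl SimpleRecorder for StdoutRecorder {
    fn write_scalar(&mut self, name: &str, value: f64) {
        // e.g. "n_steps = 50000", "n_opts = 12500", "mean_cum_eval_reward = 195.3"
        println!("{} = {}", name, value);
    }
}
```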

Auto Trait Implementations

Blanket Implementations
