Crate rurel
Rurel is a flexible, reusable reinforcement learning (Q learning) implementation in Rust.
Implement the State and Agent traits for your process, then create an AgentTrainer and train it.
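For orientation, the two traits have roughly the following shape. This is a sketch inferred from the example below, not the authoritative definitions; see the mdp module for the exact trait bounds and any provided methods:

```rust
use std::hash::Hash;

// Sketch of the State trait: a state knows its own reward and the legal
// actions that can be taken from it. The Eq + Hash + Clone bounds are an
// assumption; consult the mdp module docs for the real definition.
pub trait State: Eq + Hash + Clone {
    type A: Eq + Hash + Clone;
    /// The immediate reward of being in this state.
    fn reward(&self) -> f64;
    /// All actions that can be taken from this state.
    fn actions(&self) -> Vec<Self::A>;
}

// Sketch of the Agent trait: an agent exposes its current state and
// mutates itself when an action is applied to it.
pub trait Agent<S: State> {
    fn current_state(&self) -> &S;
    fn take_action(&mut self, action: &S::A);
}
```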
Basic Example
The following example defines the State as a position on a 21x21 2D matrix that wraps around at the edges. The actions that can be taken are: go up, go down, go left and go right. Positions closer to (10, 10) are assigned a higher reward.

After training, the AgentTrainer will have assigned higher values to actions which move closer to (10, 10).
```rust
use rurel::mdp::{State, Agent};

#[derive(PartialEq, Eq, Hash, Clone)]
struct MyState { x: i32, y: i32 }

#[derive(PartialEq, Eq, Hash, Clone)]
struct MyAction { dx: i32, dy: i32 }

impl State for MyState {
    type A = MyAction;

    fn reward(&self) -> f64 {
        // Negative Euclidean distance to (10, 10)
        -((((10 - self.x).pow(2) + (10 - self.y).pow(2)) as f64).sqrt())
    }

    fn actions(&self) -> Vec<MyAction> {
        vec![
            MyAction { dx: 0, dy: -1 }, // up
            MyAction { dx: 0, dy: 1 },  // down
            MyAction { dx: -1, dy: 0 }, // left
            MyAction { dx: 1, dy: 0 },  // right
        ]
    }
}

struct MyAgent { state: MyState }

impl Agent<MyState> for MyAgent {
    fn current_state(&self) -> &MyState {
        &self.state
    }

    fn take_action(&mut self, action: &MyAction) {
        let MyAction { dx, dy } = *action;
        self.state = MyState {
            x: (((self.state.x + dx) % 21) + 21) % 21, // (x + dx) mod 21
            y: (((self.state.y + dy) % 21) + 21) % 21, // (y + dy) mod 21
        };
    }
}

use rurel::AgentTrainer;
use rurel::strategy::learn::QLearning;
use rurel::strategy::explore::RandomExploration;
use rurel::strategy::terminate::FixedIterations;

let mut trainer = AgentTrainer::new();
let mut agent = MyAgent { state: MyState { x: 0, y: 0 } };
trainer.train(
    &mut agent,
    &QLearning::new(0.2, 0.01, 2.),
    &mut FixedIterations::new(100_000),
    &RandomExploration::new(),
);

// Test to see if it worked
let test_state = MyState { x: 10, y: 9 };
let go_up = MyAction { dx: 0, dy: -1 };
let go_down = MyAction { dx: 0, dy: 1 };

// Going down is better than going up
assert!(trainer.expected_value(&test_state, &go_down)
      > trainer.expected_value(&test_state, &go_up));
```
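The three arguments to QLearning::new above are presumably the learning rate α, the discount factor γ and the initial Q value; check strategy::learn::QLearning for their exact meaning. The standard Q-learning update such a strategy performs is Q(s, a) ← Q(s, a) + α · (r + γ · max_a′ Q(s′, a′) − Q(s, a)).

Beyond asserting on individual values, the learned table can drive a greedy policy. The following is a minimal sketch, not part of rurel's API: best_action is a hypothetical helper built only from State::actions and AgentTrainer::expected_value as used above, assuming expected_value returns Option<f64> (None for unvisited state/action pairs):

```rust
use rurel::AgentTrainer;
use rurel::mdp::State;

// Hypothetical helper (not part of rurel): pick the action with the highest
// learned expected value in `state`, or None if no action has been visited.
fn best_action<S: State>(trainer: &AgentTrainer<S>, state: &S) -> Option<S::A> {
    state
        .actions()
        .into_iter()
        // Keep only actions the trainer has a learned value for.
        .filter_map(|a| trainer.expected_value(state, &a).map(|v| (a, v)))
        // Q values are non-NaN here, so partial_cmp is safe to unwrap.
        .max_by(|(_, v1), (_, v2)| v1.partial_cmp(v2).unwrap())
        .map(|(a, _)| a)
}
```

For the trained example above, best_action(&trainer, &MyState { x: 10, y: 9 }) should then yield the "down" action, MyAction { dx: 0, dy: 1 }.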
Modules
mdp
strategy
Structs
AgentTrainer