//! # InvertedDoublePendulum-v5 (Rapier3D-backed)
//!
//! ## Physics note
//!
//! This env simulates dynamics via Rapier3D, not MuJoCo. Observation shape,
//! action dimensionality, reward structure, and termination conditions match
//! Gymnasium v5 exactly. **Absolute reward values, learned policies, and
//! trained scores will NOT transfer to real Gymnasium/MuJoCo benchmarks
//! without retuning.**
//!
//! ## Layout
//!
//! Cart on a 1D slider plus **two chained poles**, both revolute-y. The agent
//! applies horizontal force to the cart to keep the tip of the upper pole as
//! close as possible to `y_tip = 2` (Gymnasium's convention, which maps to our
//! world-z axis). Structurally, the model is the shipped
//! [`crate::locomotion::inverted_pendulum`] with a second pole chained on top.
//!
//! * Cart: dynamic body, x-only translation, rotations locked.
//! * Pole1: dynamic capsule, revolute-y joint to cart. Mass from collider density.
//! * Pole2: dynamic capsule, revolute-y joint to pole1's top. Mass from density.
//! * Action: `Box(-1, 1, (1,))` — force target, scaled by `gear = [100]`.
//! * Observation (9-dim):
//! `[cart_x, sin θ₁, sin θ₂, cos θ₁, cos θ₂, cart_vx, θ̇₁, θ̇₂, F_ext_x]`.
//! θ₂ is the **relative** elbow angle (pole2 world − pole1 world), wrapped.
//! * Reward:
//! `alive_bonus − 0.01·x_tip² − (y_tip − 2)² − 1e-3·|ω₁| − 5e-3·|ω₂|`,
//! with `alive_bonus = 10.0` while healthy and `0` otherwise.
//! * Termination: `y_tip ≤ 1.0`, or non-finite state.
//! * Truncation: `max_steps = 1000`.
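//!
//! The action scaling, relative-angle wrapping, reward, and termination rules
//! listed above can be sketched as plain functions. This is a minimal
//! illustration, not this crate's implementation: `wrap_angle`, `cart_force`,
//! `is_terminated`, and `reward` are hypothetical names, clamping the action
//! to `[-1, 1]` is an assumption, and the non-finite check here looks only at
//! the tip coordinates rather than the full state.
//!
//! ```rust
//! /// Wrap an angle into `[-π, π]` (hypothetical helper for the relative θ₂).
//! fn wrap_angle(theta: f64) -> f64 {
//!     // atan2(sin θ, cos θ) returns the equivalent angle in [-π, π].
//!     theta.sin().atan2(theta.cos())
//! }
//!
//! /// Horizontal force on the cart: action scaled by `gear = 100`.
//! /// Clamping to the `Box(-1, 1)` bounds is an assumption of this sketch.
//! fn cart_force(action: f64) -> f64 {
//!     const GEAR: f64 = 100.0;
//!     GEAR * action.clamp(-1.0, 1.0)
//! }
//!
//! /// Termination: tip at or below `y_tip = 1.0`, or non-finite tip position.
//! fn is_terminated(x_tip: f64, y_tip: f64) -> bool {
//!     y_tip <= 1.0 || !x_tip.is_finite() || !y_tip.is_finite()
//! }
//!
//! /// Reward: alive bonus minus distance and angular-velocity penalties.
//! fn reward(x_tip: f64, y_tip: f64, omega1: f64, omega2: f64) -> f64 {
//!     // The bonus is 10.0 while healthy and 0.0 once terminated.
//!     let alive_bonus = if is_terminated(x_tip, y_tip) { 0.0 } else { 10.0 };
//!     alive_bonus
//!         - 0.01 * x_tip * x_tip
//!         - (y_tip - 2.0).powi(2)
//!         - 1e-3 * omega1.abs()
//!         - 5e-3 * omega2.abs()
//! }
//! ```
//!
//! With the tip exactly at `(x_tip, y_tip) = (0, 2)` and both poles at rest,
//! every penalty term vanishes and the reward equals the alive bonus.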
//!
//! ## Divergence from Gymnasium
//!
//! * `constraint_force_x` (`obs[8]`) is approximated by reading Rapier's
//! aggregated contact force on pole2 (`Rapier3DBackend::contact_force`).
//! Gymnasium's equivalent, `qfrc_constraint[0]`, is a constraint force
//! expressed in MuJoCo's generalised coordinates. Signs and rough magnitudes
//! follow the same dynamics; absolute values will differ.
//! * `ω₂` is reported as world-frame angular velocity (not relative to pole1),
//! i.e. the body's absolute rate. MuJoCo's `qvel` for the second hinge is
//! instead the rate of the relative joint angle.
pub use InvertedDoublePendulumAction;
pub use InvertedDoublePendulumConfig;
pub use ;
pub use InvertedDoublePendulumObservation;
pub use InvertedDoublePendulumState;