rlx-rl
Flow-map generative policies with Flow Map Q-Guidance (FMQ) and Q-Guided Beam Search (QGBS) on RLX (arXiv:2605.12416).
Reference implementation (JAX): /Users/Shared/q-guided-flow-map-policies — see REFERENCE.md for a module-by-module map.
Design
| Principle | Implementation |
|---|---|
| MLP actor/critic | rlx-ir graphs in graph/ — not rlx-flow |
| CPU + autodiff | Session::new(Device::Cpu) + legalize_broadcast → grad_with_loss |
| No sim bindings | Implement RlEnv; store Transition in ReplayBuffer |
| Optional QGBS at eval | EvalConfig::with_qgbs → Algorithm 2 over CompiledFlowMapAgent |
| Offline ESD + curriculum | flow_curriculum + distillation (mf / lsd / psd) |
Plug in your environment
use ;
let spec = RlSpec ;
let mut trainer = new;
// Offline CFM from demonstrations
trainer.offline_pretrain;
// Online FMQ (no simulator inside RLX)
let mut env = default;
trainer.online_finetune;
// Eval: one-step (default)
let r0 = trainer.eval_rollout;
// Eval: optional QGBS
let eval = with_qgbs;
let r1 = trainer.eval_rollout;
Custom online loop without RlEnv:
let tr: Transition = /* from your stack */;
trainer.online_step_from_transition;
Toy example (feature toy)
Flow map + FMQ
$$ X_{r,t}(a_r \mid s) = a_r + (t-r)\, u_{r,t}(a_r \mid s), \quad a_1 = X_{0,1}(a_0 \mid s) $$
Online FMQ: project $a_1$ along $\nabla_a Q$ inside a trust region to obtain $a_1^*$, then regress $u_{0,1}$ toward $a_1^* - a_0$.
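The sketch below illustrates the flow map and the FMQ regression target numerically. It is not the rlx-rl API: the function names (`flow_map`, `project_with_q_grad`, `fmq_target`, `toy_grad_q`) are hypothetical. It also assumes a single gradient step with step size `step`, an L2-ball trust region of radius `radius`, and a toy quadratic critic. The actual projection used by FMQ may differ.

```rust
/// Flow map X_{r,t}(a_r | s) = a_r + (t - r) * u_{r,t}(a_r | s).
/// The average velocity `u` is passed in; in practice it would come from the actor.
fn flow_map(a_r: &[f64], u: &[f64], r: f64, t: f64) -> Vec<f64> {
    a_r.iter().zip(u).map(|(a, v)| a + (t - r) * v).collect()
}

/// Assumed trust-region projection: take one gradient-ascent step on Q from a_1,
/// rescaling the step so its L2 norm never exceeds `radius`.
fn project_with_q_grad(a1: &[f64], grad_q: &[f64], step: f64, radius: f64) -> Vec<f64> {
    let step_norm = step * grad_q.iter().map(|g| g * g).sum::<f64>().sqrt();
    let scale = if step_norm > radius && step_norm > 0.0 {
        radius / step_norm
    } else {
        1.0
    };
    a1.iter().zip(grad_q).map(|(a, g)| a + scale * step * g).collect()
}

/// Regression target for u_{0,1}: a_1^* - a_0.
fn fmq_target(a1_star: &[f64], a0: &[f64]) -> Vec<f64> {
    a1_star.iter().zip(a0).map(|(x, y)| x - y).collect()
}

/// Toy critic (assumption): Q(a) = -||a - goal||^2, so grad_a Q = -2 (a - goal).
fn toy_grad_q(a: &[f64], goal: &[f64]) -> Vec<f64> {
    a.iter().zip(goal).map(|(x, g)| -2.0 * (x - g)).collect()
}
```

For example, with $a_0 = (0, 0)$ and $u_{0,1} = (1, 0.5)$, `flow_map(&a0, &u, 0.0, 1.0)` gives the one-step action $a_1$. Passing `toy_grad_q(&a1, &goal)` to `project_with_q_grad` gives $a_1^*$, and `fmq_target(&a1_star, &a0)` is the vector $u_{0,1}$ would be regressed toward.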
License
GPL-3.0-only.