//! Random / sampling op family — Phase 4.5 (Category Q).
//!
//! Plan-per-op-family because the random ops have heterogeneous arg
//! shapes:
//!
//! - [`RandomPlan`] — pure-generator ops with no input tensor. It
//! currently wires `Uniform` and `Normal` (f32 / f64 via cuRAND) and
//! `Bernoulli` (Bool output via a cuRAND uniform draw plus a custom
//! threshold kernel).
//!
//! - [`DropoutPlan`] / [`DropoutBackwardPlan`] — dropout takes an input
//! tensor and returns both the output and the saved mask; backward
//! replays the mask against `dy`. f32 / f64 only.
//!
//! Multinomial / Randint / exponential / gamma / quasi-random / stateful
//! RNG replay are reserved for future milestones (see the
//! `~/.claude/plans/warm-prancing-comet.md` Phase 4 deferral list).
//!
//! ## Generator lifetime
//!
//! cuRAND generators are stateful objects that bind to a CUDA stream.
//! Each `*Plan` creates one lazily on first `run` (the generator API
//! requires a live CUDA context, which `select()` cannot rely on having
//! at construction time) and rebinds it via `curandSetStream` on every
//! call so the plan is reusable across streams. The handle is destroyed
//! on `Drop`. cuRAND generators are *not* thread-safe — the plan is
//! `!Sync` by virtue of the `Cell<curandGenerator_t>` it holds.
// Phase 46 — FlashInfer sort-free top-K/top-P/min-P sampling.
// Phase 66 Tier 2 — per-row sampling + speculative-decode verification.
// Phase 66 Tier 2 — bespoke token-penalty logit transform.
pub use ;
pub use ;
// Phase 46 — sort-free sampling re-exports.
pub use ;
// Phase 66 Tier 2 re-exports.
pub use ;
pub use ;