// RLX — versatile ML compiler + runtime.
// Copyright (C) 2026 Eugene Hauptmann, Nataliya Kosmyna.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, version 3.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
//! Command-stream abstraction.
//!
//! Every GPU-shaped backend follows the same pattern: enqueue work, submit, wait.
//! - Metal: `MTLCommandBuffer` + `commit` + `waitUntilCompleted`
//! - CUDA: `cudaStream_t` + `cudaStreamSynchronize`
//! - ROCm: `hipStream_t` + `hipStreamSynchronize`
//! - wgpu: `CommandEncoder.finish()` → `Queue.submit()` → `Device.poll(Wait)`
//! - WASM (single-threaded): no-op (work runs synchronously)
//!
//! Hoisting this into one trait means:
//! - the runtime can drive *any* backend via the same submit-and-wait API
//! - new backends only need a thin command-stream impl
//! - test infrastructure works against the trait, not per-backend types
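//!
//! The submit-and-wait pattern above can be sketched roughly as follows. This
//! is a minimal illustration under stated assumptions, not the definitions
//! used in this module: `CommandStream`, `SyncStream`, `run_all`, and `demo`
//! are hypothetical names chosen for the example, and work items are modelled
//! as boxed closures purely for simplicity.
//!
//! ```rust
//! use std::cell::Cell;
//! use std::rc::Rc;
//!
//! /// Hypothetical trait: one enqueue/submit/wait surface for every backend.
//! trait CommandStream {
//!     /// Record a unit of work on the stream.
//!     fn enqueue(&mut self, work: Box<dyn FnOnce()>);
//!     /// Hand all recorded work to the device.
//!     fn submit(&mut self);
//!     /// Block until all submitted work has finished.
//!     fn wait(&mut self);
//! }
//!
//! /// Hypothetical synchronous stream (host CPU, single-threaded WASM):
//! /// work runs as soon as it is enqueued, so `submit` and `wait` are no-ops.
//! struct SyncStream {
//!     completed: usize,
//! }
//!
//! impl CommandStream for SyncStream {
//!     fn enqueue(&mut self, work: Box<dyn FnOnce()>) {
//!         work(); // run eagerly; nothing is deferred
//!         self.completed += 1;
//!     }
//!     fn submit(&mut self) {} // nothing pending
//!     fn wait(&mut self) {} // nothing in flight
//! }
//!
//! /// Backend-agnostic driver: it only sees the trait object.
//! fn run_all(stream: &mut dyn CommandStream, jobs: Vec<Box<dyn FnOnce()>>) {
//!     for job in jobs {
//!         stream.enqueue(job);
//!     }
//!     stream.submit();
//!     stream.wait();
//! }
//!
//! /// Runs three counter-incrementing jobs through the synchronous stream and
//! /// returns (shared counter value, jobs the stream reports as completed).
//! fn demo() -> (u32, usize) {
//!     let counter = Rc::new(Cell::new(0u32));
//!     let mut stream = SyncStream { completed: 0 };
//!     let jobs: Vec<Box<dyn FnOnce()>> = (0..3)
//!         .map(|_| {
//!             let c = Rc::clone(&counter);
//!             Box::new(move || c.set(c.get() + 1)) as Box<dyn FnOnce()>
//!         })
//!         .collect();
//!     run_all(&mut stream, jobs);
//!     (counter.get(), stream.completed)
//! }
//! ```
//!
//! In this sketch a test can exercise `run_all` against `SyncStream` alone;
//! an asynchronous backend would presumably defer the work until `submit`
//! and block in `wait`, without any change to the driver.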
/// Per-backend command stream.
///
/// Implementations are free to be no-ops on synchronous backends (host CPU,
/// WASM): `submit` runs work eagerly, `wait` returns immediately.
/// Default implementation for synchronous backends — work has already
/// happened by the time `submit` is called.
;