//! Core trait definitions for the kernel dispatch system.
//!
//! [`Kernel`] is a marker trait that associates typed input/output with a
//! compute operation. [`KernelDispatch<K>`] is implemented by each concrete
//! backend device (`CudaDevice`, `VulkanDevice`, `CpuDevice`) **and** by
//! [`DeviceBackend`] itself, so callers never need to name the backend:
//!
//! ```rust,ignore
//! let output = backend.run::<CompareScore>(input)?;
//! ```
//!
//! # Adding a new kernel
//!
//! 1. Create `src/kernels/my_kernel.rs`. In it, define a marker struct with
//!    typed `Input`/`Output` and `impl Kernel for MyKernel`.
//! 2. For CUDA, add `src/backend/cuda/launch/my_kernel.rs`
//!    with `impl KernelDispatch<MyKernel> for CudaDevice`.
//! 3. Add the CPU fallback `impl KernelDispatch<MyKernel> for CpuDevice` in
//!    `src/backend/cpu/launch/my_kernel.rs`.
//! 4. Add `impl KernelDispatch<MyKernel> for DeviceBackend` in
//!    `src/backend/mod.rs` (a match that delegates to the impls above).
//! 5. Register the new kernel in `build.rs` so it gets compiled to PTX.
//!
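//! A minimal, self-contained sketch of steps 1, 3, and 4 might look like the
//! block below. Everything in it is an assumption made for illustration: the
//! `SumKernel` name, the stand-in `GpuError`, `CpuDevice`, and `DeviceBackend`
//! types, and in particular the `dispatch` and `run` signatures, which may
//! differ from the real ones in this crate.
//!
//! ```rust
//! // Hypothetical stand-ins so the sketch compiles on its own. The real crate
//! // provides `Kernel`, `KernelDispatch`, `GpuError`, and the device types.
//! #[derive(Debug)]
//! pub struct GpuError(pub String);
//!
//! pub trait Kernel {
//!     type Input;
//!     type Output;
//! }
//!
//! pub trait KernelDispatch<K: Kernel> {
//!     // Assumed signature: consumes the input and returns the typed output.
//!     fn dispatch(&self, input: K::Input) -> Result<K::Output, GpuError>;
//! }
//!
//! // Step 1: zero-sized marker struct plus typed input/output.
//! pub struct SumKernel;
//!
//! impl Kernel for SumKernel {
//!     type Input = Vec<f32>;
//!     type Output = f32;
//! }
//!
//! // Step 3: CPU fallback implementation.
//! pub struct CpuDevice;
//!
//! impl KernelDispatch<SumKernel> for CpuDevice {
//!     fn dispatch(&self, input: Vec<f32>) -> Result<f32, GpuError> {
//!         if input.is_empty() {
//!             return Err(GpuError("empty input".to_string()));
//!         }
//!         Ok(input.iter().sum())
//!     }
//! }
//!
//! // Step 4: the backend enum delegates to whichever variant is active.
//! pub enum DeviceBackend {
//!     Cpu(CpuDevice),
//! }
//!
//! impl KernelDispatch<SumKernel> for DeviceBackend {
//!     fn dispatch(&self, input: Vec<f32>) -> Result<f32, GpuError> {
//!         match self {
//!             DeviceBackend::Cpu(dev) => dev.dispatch(input),
//!         }
//!     }
//! }
//!
//! impl DeviceBackend {
//!     // Generic entry point so callers can write `backend.run::<SumKernel>(..)`.
//!     pub fn run<K: Kernel>(&self, input: K::Input) -> Result<K::Output, GpuError>
//!     where
//!         Self: KernelDispatch<K>,
//!     {
//!         <Self as KernelDispatch<K>>::dispatch(self, input)
//!     }
//! }
//! ```
//!
//! With these stand-ins, `DeviceBackend::Cpu(CpuDevice).run::<SumKernel>(input)`
//! resolves to the CPU impl at compile time, and a kernel with no
//! `KernelDispatch` impl for `DeviceBackend` fails to type-check.
//!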
//! [`DeviceBackend`]: crate::backend::DeviceBackend

use crate::GpuError;

/// Marker trait that binds typed `Input` and `Output` to a compute operation.
///
/// Implement this on a zero-sized marker struct. The struct itself carries no
/// data; it just names the operation so Rust can resolve the right dispatch.
/// Execute kernel `K` on `self`.
///
/// Implemented by:
/// - Backend devices (`CudaDevice`, `CpuDevice`), where the actual
///   upload / launch / download logic lives.
/// - [`DeviceBackend`], a thin match that delegates to the active variant.
///
/// Callers should go through [`DeviceBackend::run`] rather than calling
/// `dispatch` directly.
///
/// [`DeviceBackend`]: crate::backend::DeviceBackend
/// [`DeviceBackend::run`]: crate::backend::DeviceBackend::run