//! BLAS Level 3 — matrix-matrix operations.
//!
//! This module provides GPU-accelerated Level 3 BLAS routines:
//!
//! | Routine | Operation |
//! |---------|-----------|
//! | [`fn@gemm`] | General matrix multiply: `C = alpha * op(A) * op(B) + beta * C` |
//! | [`fn@symm`] | Symmetric matrix multiply: `C = alpha * A * B + beta * C` |
//! | [`fn@trsm`] | Triangular solve: `op(A) * X = alpha * B` |
//! | [`fn@syrk`] | Symmetric rank-k update: `C = alpha * A * A^T + beta * C` |
//! | [`fn@syr2k`] | Symmetric rank-2k update: `C = alpha * (A * B^T + B * A^T) + beta * C` |
//! | [`fn@trmm`] | Triangular matrix multiply: `B = alpha * op(A) * B` |
//! | [`fn@batched_trsm`] | Batched triangular solve (many small systems) |
//! | [`fn@stream_k_gemm`] | Stream-K GEMM with dynamic load balancing |
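//!
//! To make the GEMM formula in the table concrete, here is a minimal CPU
//! reference sketch. It is not this crate's implementation: the `Op` enum,
//! the `gemm_ref` function, and the row-major storage convention are all
//! assumptions made for illustration.
//!
//! ```rust
//! // Hypothetical operand mode for this sketch (not the crate's own type).
//! #[derive(Clone, Copy)]
//! enum Op {
//!     None,
//!     Transpose,
//! }
//!
//! // Reads element (i, j) of op(M), where M is stored row-major with
//! // `cols` columns in its stored (untransposed) layout.
//! fn at(m: &[f32], cols: usize, op: Op, i: usize, j: usize) -> f32 {
//!     match op {
//!         Op::None => m[i * cols + j],
//!         Op::Transpose => m[j * cols + i],
//!     }
//! }
//!
//! // CPU reference for C = alpha * op(A) * op(B) + beta * C, where
//! // op(A) is m x k, op(B) is k x n, and C is m x n (all row-major).
//! fn gemm_ref(
//!     op_a: Op,
//!     op_b: Op,
//!     m: usize,
//!     n: usize,
//!     k: usize,
//!     alpha: f32,
//!     a: &[f32],
//!     b: &[f32],
//!     beta: f32,
//!     c: &mut [f32],
//! ) {
//!     // Column counts of A and B as stored, before op() is applied.
//!     let a_cols = match op_a { Op::None => k, Op::Transpose => m };
//!     let b_cols = match op_b { Op::None => n, Op::Transpose => k };
//!     for i in 0..m {
//!         for j in 0..n {
//!             let mut acc = 0.0f32;
//!             for p in 0..k {
//!                 acc += at(a, a_cols, op_a, i, p) * at(b, b_cols, op_b, p, j);
//!             }
//!             c[i * n + j] = alpha * acc + beta * c[i * n + j];
//!         }
//!     }
//! }
//! ```
//!
//! A reference like this is mainly useful for checking GPU results on small
//! inputs; it makes no attempt at tiling or vectorization.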
//!
//! The GEMM dispatcher is the core engine: it selects a tile configuration
//! for each problem, generates PTX via
//! [`oxicuda_ptx::templates::gemm::GemmTemplate`], and caches the compiled
//! kernels.
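//!
//! The select-then-cache flow described above can be sketched as follows.
//! This is an illustration under stated assumptions, not the dispatcher's
//! actual code: `TileConfig`, `select_tile`, `KernelCache`, the size
//! thresholds, and the `String` standing in for a compiled kernel are all
//! hypothetical.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! // Hypothetical tile configuration; field names are illustrative only.
//! #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
//! struct TileConfig {
//!     tile_m: u32,
//!     tile_n: u32,
//!     tile_k: u32,
//! }
//!
//! // Toy heuristic (an assumption): larger tiles for larger problems.
//! fn select_tile(m: u32, n: u32) -> TileConfig {
//!     if m >= 128 && n >= 128 {
//!         TileConfig { tile_m: 128, tile_n: 128, tile_k: 32 }
//!     } else {
//!         TileConfig { tile_m: 64, tile_n: 64, tile_k: 16 }
//!     }
//! }
//!
//! // Cache keyed by tile configuration. The `String` value is a placeholder
//! // for whatever handle a compiled kernel would really have.
//! struct KernelCache {
//!     kernels: HashMap<TileConfig, String>,
//!     builds: usize,
//! }
//!
//! impl KernelCache {
//!     fn new() -> Self {
//!         Self { kernels: HashMap::new(), builds: 0 }
//!     }
//!
//!     // Returns the cached kernel, generating it only on the first request.
//!     fn get_or_build(&mut self, cfg: TileConfig) -> &str {
//!         let builds = &mut self.builds;
//!         self.kernels.entry(cfg).or_insert_with(|| {
//!             *builds += 1;
//!             format!("// placeholder kernel {}x{}x{}", cfg.tile_m, cfg.tile_n, cfg.tile_k)
//!         })
//!     }
//! }
//! ```
//!
//! Keying the cache on the configuration means repeated calls with
//! similarly shaped problems reuse one generated kernel rather than paying
//! the generation cost each time.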
pub use batched_trsm;
pub use ;
pub use EpilogueOp;
pub use gemm;
pub use ;
pub use ;
pub use symm;
pub use syr2k;
pub use syrk;
pub use trmm;
pub use trsm;