//! Sparse algorithm contracts for backend consistency
//!
//! This module defines traits that ensure all backends implement the same
//! mathematical algorithms for sparse operations. This guarantees numerical
//! parity across CPU, CUDA, WebGPU, and other backends.
//!
//! # Design Principles
//!
//! 1. **Algorithm-Level Contract**: Each trait method represents a specific algorithm
//! 2. **Backend Parity**: Same algorithm must produce same results (within FP tolerance)
//! 3. **Explicit Contracts**: Missing implementations cause compile errors
//! 4. **Testability**: Easy to verify all backends implement the same algorithm
use crate::DType;
use crate::Result;
use crate::Runtime;
use crate::Tensor;
/// Algorithmic contract for sparse matrix operations
///
/// All backends implementing sparse operations MUST implement this trait
/// using the EXACT SAME ALGORITHMS to ensure numerical parity.
///
/// # Algorithms Defined
///
/// - **ESC SpGEMM**: Exact Symbolic Computation with Hash Accumulation
/// - Phase 1: Symbolic (count NNZ per output row using HashSet)
/// - Phase 2: Numeric (compute values with pre-sized HashMap, filter by zero_tolerance)
///
/// - **Column-Parallel DSMM**: Dense × Sparse Matrix Multiplication
/// - For each column j of sparse matrix B (CSC format)
/// - For each non-zero `B[k,j]` in column j
/// - Compute `C[:,j]` += `A[:,k]` * `B[k,j]`
///
/// # Implementation Requirements
///
/// Backends may differ in:
/// - Parallelization strategy (threads, SIMD, GPU blocks)
/// - Memory access patterns (coalescing, vectorization)
/// - Loop unrolling and compiler optimizations
///
/// Backends MUST match in:
/// - Mathematical formula (same operations in same order)
/// - Accumulator precision (e.g., F32 accumulator for F16 inputs)
/// - Special case handling (NaN, Inf, zero_tolerance filtering)
/// - Result ordering (sorted by column index for SpGEMM)
// ============================================================================
// Helper Functions (Shared across backends)
// ============================================================================
/// Zero tolerance threshold for filtering small values
pub use crate::zero_tolerance;
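The two-phase ESC SpGEMM contract described above can be sketched on raw CSR arrays. This is an illustrative sketch, not this crate's API: the `Csr` struct and `esc_spgemm` function are hypothetical names, and the real implementation operates on the crate's tensor types.

```rust
use std::collections::{HashMap, HashSet};

/// Minimal CSR matrix over f32 (illustrative, not this crate's type).
struct Csr {
    rows: usize,
    cols: usize,
    row_ptr: Vec<usize>,
    col_idx: Vec<usize>,
    values: Vec<f32>,
}

/// ESC SpGEMM sketch: Phase 1 counts NNZ per output row with a HashSet;
/// Phase 2 accumulates values in a pre-sized HashMap, drops entries below
/// `zero_tolerance`, and sorts each row by column index for determinism.
fn esc_spgemm(a: &Csr, b: &Csr, zero_tolerance: f32) -> Csr {
    assert_eq!(a.cols, b.rows, "inner dimensions must match");
    // Phase 1: symbolic — count distinct output columns per row.
    let mut row_nnz = vec![0usize; a.rows];
    for i in 0..a.rows {
        let mut cols_seen = HashSet::new();
        for p in a.row_ptr[i]..a.row_ptr[i + 1] {
            let k = a.col_idx[p];
            for q in b.row_ptr[k]..b.row_ptr[k + 1] {
                cols_seen.insert(b.col_idx[q]);
            }
        }
        row_nnz[i] = cols_seen.len();
    }
    // Phase 2: numeric — accumulate into a HashMap sized from phase 1.
    let mut row_ptr = vec![0usize];
    let (mut col_idx, mut values) = (Vec::new(), Vec::new());
    for i in 0..a.rows {
        let mut acc: HashMap<usize, f32> = HashMap::with_capacity(row_nnz[i]);
        for p in a.row_ptr[i]..a.row_ptr[i + 1] {
            let (k, av) = (a.col_idx[p], a.values[p]);
            for q in b.row_ptr[k]..b.row_ptr[k + 1] {
                *acc.entry(b.col_idx[q]).or_insert(0.0) += av * b.values[q];
            }
        }
        // Filter near-zeros, then sort by column index (required ordering).
        let mut row: Vec<(usize, f32)> = acc
            .into_iter()
            .filter(|&(_, v)| v.abs() > zero_tolerance)
            .collect();
        row.sort_by_key(|&(j, _)| j);
        for (j, v) in row {
            col_idx.push(j);
            values.push(v);
        }
        row_ptr.push(col_idx.len());
    }
    Csr { rows: a.rows, cols: b.cols, row_ptr, col_idx, values }
}
```

Backends parallelize the outer row loop differently (threads, GPU blocks), but the per-row accumulation order, zero filtering, and final column sort must match this shape to preserve parity.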
/// Validate CSR matrix dimensions for SpGEMM
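A minimal sketch of what this check might look like; the function name, tuple-based dimensions, and `String` error are assumptions standing in for the crate's real signature and `Result` type.

```rust
/// Hypothetical sketch: SpGEMM C = A * B requires A.cols == B.rows.
/// Returns the output dimensions (A.rows, B.cols) on success.
fn validate_spgemm_dims(
    a_dims: (usize, usize),
    b_dims: (usize, usize),
) -> Result<(usize, usize), String> {
    let ((m, k_a), (k_b, n)) = (a_dims, b_dims);
    if k_a != k_b {
        return Err(format!(
            "SpGEMM dimension mismatch: A is {}x{}, B is {}x{}",
            m, k_a, k_b, n
        ));
    }
    Ok((m, n))
}
```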
/// Validate Dense × Sparse dimensions for DSMM
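The column-parallel DSMM algorithm from the trait docs can be sketched on a column-major dense slice and a raw CSC matrix; the `Csc` struct and `dsmm` function are illustrative names, not this crate's API.

```rust
/// Minimal CSC matrix over f32 (illustrative, not this crate's type).
struct Csc {
    rows: usize,
    cols: usize,
    col_ptr: Vec<usize>,
    row_idx: Vec<usize>,
    values: Vec<f32>,
}

/// Column-parallel DSMM sketch: C (m x n, column-major) = A (m x k,
/// column-major dense) * B (k x n, CSC). Output column j depends only on
/// column j of B, so columns are the natural parallelism axis.
fn dsmm(a: &[f32], m: usize, k: usize, b: &Csc) -> Vec<f32> {
    assert_eq!(k, b.rows, "A.cols must equal B.rows");
    let mut c = vec![0.0f32; m * b.cols];
    for j in 0..b.cols {
        for p in b.col_ptr[j]..b.col_ptr[j + 1] {
            let (kk, bv) = (b.row_idx[p], b.values[p]);
            // C[:, j] += A[:, kk] * B[kk, j]
            for i in 0..m {
                c[j * m + i] += a[kk * m + i] * bv;
            }
        }
    }
    c
}
```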
/// Validate dtypes match for sparse operations
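A hedged sketch of the dtype check; the local `DType` enum here is a stand-in with only a few variants (the crate's `DType` is richer), and the function name and `String` error are assumptions.

```rust
/// Illustrative dtype enum; the crate's DType has more variants.
#[derive(Debug, PartialEq, Clone, Copy)]
enum DType {
    F16,
    F32,
    F64,
}

/// Hypothetical sketch: both operands of a sparse op must share a dtype.
fn validate_dtypes(lhs: DType, rhs: DType) -> Result<DType, String> {
    if lhs != rhs {
        return Err(format!("dtype mismatch: {:?} vs {:?}", lhs, rhs));
    }
    Ok(lhs)
}
```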