pub struct HyperConnectionPlan<T: Element> { /* private fields */ }
Hyper-Connection forward plan (static-H, bf16 weights, Tier 1).
Formula:
out[b, i, c] = Σ_j M[i, j] * x_expanded[b, j, c] + s_post[i] * y_norm[b, c]
where
M = Sinkhorn-Knopp(softmax_or_exp(H_res)),
s_pre = sigmoid(H_pre), s_post = 2 * sigmoid(H_post),
y_agg[b, c] = Σ_i s_pre[i] * x_expanded[b, i, c],
y_norm = RMSNorm(y_agg).
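The formula above can be sketched as a plain CPU reference, which may help when checking the GPU output by hand. This is a minimal sketch under stated assumptions, not the crate's implementation: every function name is hypothetical, exp is chosen for softmax_or_exp, the Sinkhorn-Knopp iteration count (20) and the RMSNorm epsilon (1e-6) are assumed values, the RMSNorm weight is held as f32 rather than bf16, and x_expanded is assumed to be a row-major [B, n, C] buffer.

```rust
// CPU reference sketch of the hyper-connection forward formula.
// All names are hypothetical; none of this is the crate's API.

fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// Sinkhorn-Knopp on exp(h_res): alternately normalize the rows and columns
/// of an n x n row-major matrix. `iters` is an assumed parameter.
fn sinkhorn_knopp(h_res: &[f32], n: usize, iters: usize) -> Vec<f32> {
    let mut m: Vec<f32> = h_res.iter().map(|v| v.exp()).collect();
    for _ in 0..iters {
        for i in 0..n {
            let s: f32 = m[i * n..(i + 1) * n].iter().sum();
            for j in 0..n {
                m[i * n + j] /= s;
            }
        }
        for j in 0..n {
            let s: f32 = (0..n).map(|i| m[i * n + j]).sum();
            for i in 0..n {
                m[i * n + j] /= s;
            }
        }
    }
    m
}

/// RMSNorm over one channel vector: y * w / sqrt(mean(y^2) + eps).
fn rms_norm(y: &[f32], w: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = y.iter().map(|v| v * v).sum::<f32>() / y.len() as f32;
    let inv = 1.0 / (mean_sq + eps).sqrt();
    y.iter().zip(w).map(|(v, g)| v * inv * g).collect()
}

/// `x` is x_expanded as a row-major [b, n, c] buffer; the result has the same layout.
fn hyper_connection_forward(
    x: &[f32],
    h_res: &[f32],
    h_pre: &[f32],
    h_post: &[f32],
    w: &[f32],
    b: usize,
    n: usize,
    c: usize,
) -> Vec<f32> {
    let m = sinkhorn_knopp(h_res, n, 20); // assumed iteration count
    let s_pre: Vec<f32> = h_pre.iter().map(|&v| sigmoid(v)).collect();
    let s_post: Vec<f32> = h_post.iter().map(|&v| 2.0 * sigmoid(v)).collect();
    let mut out = vec![0.0f32; b * n * c];
    for bi in 0..b {
        let xb = &x[bi * n * c..(bi + 1) * n * c];
        // y_agg[b, c] = sum_i s_pre[i] * x_expanded[b, i, c]
        let mut y_agg = vec![0.0f32; c];
        for i in 0..n {
            for ci in 0..c {
                y_agg[ci] += s_pre[i] * xb[i * c + ci];
            }
        }
        let y_norm = rms_norm(&y_agg, w, 1e-6); // assumed epsilon
        for i in 0..n {
            for ci in 0..c {
                // sum_j M[i, j] * x_expanded[b, j, c]
                let mut acc = 0.0f32;
                for j in 0..n {
                    acc += m[i * n + j] * xb[j * c + ci];
                }
                out[(bi * n + i) * c + ci] = acc + s_post[i] * y_norm[ci];
            }
        }
    }
    out
}
```

With all H parameters set to zero, M is uniform (every entry 1/n), s_pre is 0.5 and s_post is 1.0, which makes a convenient hand-checkable case.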
When to use: replace the bare x + sublayer(x) residual in a
transformer block when training a fresh model. mHC reports
improved training stability and downstream task scores in
DeepSeek-AI’s experiments.
Dtypes: f32 only in Tier 1. The rmsnorm_weight is always
bf16 regardless of T.
State: this plan owns a native MHCLayer* handle with
~B*n*C*sizeof(float) bytes of GPU scratch. Reuse across many
run() calls; construction is heavy.
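To get a feel for the scratch footprint, the ~B*n*C*sizeof(float) estimate can be evaluated directly. The helper below is a hypothetical sketch, not part of the crate, and it assumes sizeof(float) is 4 bytes (f32); the actual allocation made by the native handle may differ from this approximation.

```rust
/// Approximate GPU scratch in bytes, following the ~B*n*C*sizeof(float) estimate.
/// Hypothetical helper for illustration only.
fn approx_scratch_bytes(b: usize, n: usize, c: usize) -> usize {
    b * n * c * std::mem::size_of::<f32>()
}
```

For example, B = 8, n = 4, C = 4096 gives 524288 bytes (512 KiB).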
Implementations
impl<T: Element> HyperConnectionPlan<T>
pub fn select(
    _stream: &Stream,
    desc: &HyperConnectionDescriptor,
    _pref: PlanPreference,
) -> Result<Self>
Construct a plan for the given descriptor. Allocates the
internal MHCLayer scratch on the current CUDA context.
Returns Err(Error::Unsupported) if the mhc feature is off
or the descriptor is outside the Tier-1 SKU matrix.
pub fn can_implement(&self, args: &HyperConnectionArgs<'_, T>) -> Result<()>
Validate args against the descriptor.
pub fn workspace_size(&self) -> usize
Workspace size in bytes. Always zero — internal scratch lives
in the native handle (allocated at select time).
pub fn precision_guarantee(&self) -> PrecisionGuarantee
Numerical guarantees: deterministic and bit-stable on the same hardware (no atomicAdd on the forward path).