write_slice plan — Phase 13.1 trailblazer.
write_slice(dest, source, ranges) -> dest:
dest[start_0..end_0, ..., start_{N-1}..end_{N-1}] = source
Assign semantics, not accumulate: this is what distinguishes
WriteSlicePlan from ScatterAddPlan. Drives the Fuel team's
persistent KV-cache append during autoregressive decoding
(step 9c E.3.3 of their Phase 7.6 integration).
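To make the assign semantics concrete, here is a minimal host-side reference sketch. It is not the crate's API: the function name write_slice_ref and the dense row-major layout are assumptions for illustration. It walks one source element at a time, mirroring the "one thread per source element" generic path, and overwrites dest rather than adding to it.

```rust
/// Hypothetical host-side reference for write_slice semantics (not the
/// crate's API). Layout is assumed row-major and densely packed.
fn write_slice_ref<T: Copy>(
    dest: &mut [T],
    dest_shape: &[usize],
    source: &[T],
    ranges: &[(usize, usize)],
) {
    assert_eq!(dest_shape.len(), ranges.len(), "one range per axis");
    assert_eq!(dest.len(), dest_shape.iter().product::<usize>());

    // The slab (source) extent on each axis is end_i - start_i.
    let src_shape: Vec<usize> = ranges
        .iter()
        .zip(dest_shape)
        .map(|(&(start, end), &dim)| {
            assert!(start <= end && end <= dim, "range out of bounds");
            end - start
        })
        .collect();
    assert_eq!(source.len(), src_shape.iter().product::<usize>());

    // Row-major element strides of dest (innermost stride is 1).
    let rank = dest_shape.len();
    let mut dest_stride = vec![1usize; rank];
    for axis in (0..rank.saturating_sub(1)).rev() {
        dest_stride[axis] = dest_stride[axis + 1] * dest_shape[axis + 1];
    }

    // One iteration per source element, like one GPU thread per element.
    for (src_linear, &value) in source.iter().enumerate() {
        let mut rem = src_linear;
        let mut dest_offset = 0;
        for axis in (0..rank).rev() {
            let coord = rem % src_shape[axis]; // slab coordinate on this axis
            rem /= src_shape[axis];
            dest_offset += (coord + ranges[axis].0) * dest_stride[axis];
        }
        dest[dest_offset] = value; // assign: overwrite, never accumulate
    }
}
```

Applying the same write twice leaves dest unchanged after the first call; a scatter-add would instead double the written values.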
Dtype coverage spans the entire baracuda element bank via
byte-width dispatch (sizeof(T) ∈ {1, 2, 4, 8, 16}), with a
separate nibble-packed kernel for S4 / U4. The bound is
T: DeviceRepr + Copy + 'static (the same as TensorRef), so one
plan covers Element-family, IntElement-family, and
FpElement-family dtypes uniformly.
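Byte-width dispatch means the kernel choice depends only on the element's size, not on whether it is an integer or a float, which is why a single bound can cover every family. A minimal sketch of that selection follows; the CopyKernel enum and select_kernel function are hypothetical names, and the DeviceRepr bound is omitted so the example stays self-contained. Nibble-packed S4 / U4 are not sized by size_of, so they would go through their own kernel rather than this match.

```rust
/// Hypothetical kernel families selected purely by element size.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CopyKernel {
    Bytes1,
    Bytes2,
    Bytes4,
    Bytes8,
    Bytes16,
}

/// Pick a kernel from size_of::<T>() alone: any T of a supported width
/// shares one kernel, whether it is an integer, a float, or a 16-byte pair.
fn select_kernel<T: Copy + 'static>() -> Option<CopyKernel> {
    match std::mem::size_of::<T>() {
        1 => Some(CopyKernel::Bytes1),
        2 => Some(CopyKernel::Bytes2),
        4 => Some(CopyKernel::Bytes4),
        8 => Some(CopyKernel::Bytes8),
        16 => Some(CopyKernel::Bytes16),
        _ => None, // unsupported width
    }
}
```

Because f32 and u32 both map to the 4-byte kernel, a copy kernel only has to move bits of the right width and never interprets them.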
No backward — write_slice is non-differentiable in Fuel’s
autograd model.
§Fast paths
- Full-width minor axes: when ranges[i] == (0, dest_shape[i]) for all
  i > 0, the source maps to one contiguous chunk of dest starting at
  byte offset start_0 * stride[0] * sizeof(T). A single
  cuMemcpyDtoDAsync does the copy. This is the KV-cache append shape
  and the most performance-critical case.
- Whole dest covered: when the source shape equals the dest shape and
  the ranges fully cover dest, a single cuMemcpyDtoDAsync copies the
  whole buffer (a degenerate instance of the first case).
- Otherwise: a generic per-slab-element kernel. One thread per source
  element computes the dest linear offset from its slab coordinate
  shifted by range_start.
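The path selection above can be sketched as a pure function over shapes. This assumes a densely packed row-major dest, so stride[0] in elements is the product of the minor extents; the WritePath enum and classify function are hypothetical names, and the actual cuMemcpyDtoDAsync call (destination pointer plus byte_offset, byte count, stream) is not shown.

```rust
#[derive(Debug, PartialEq, Eq)]
enum WritePath {
    /// One device-to-device copy of the whole dest buffer.
    WholeBuffer { bytes: usize },
    /// One device-to-device copy into a contiguous chunk of dest.
    ContiguousChunk { byte_offset: usize, bytes: usize },
    /// Fall back to the per-slab-element kernel.
    Generic,
}

fn classify(dest_shape: &[usize], ranges: &[(usize, usize)], elem_size: usize) -> WritePath {
    assert!(!dest_shape.is_empty(), "sketch assumes rank >= 1");
    assert_eq!(dest_shape.len(), ranges.len(), "one range per axis");

    // The memcpy paths require every minor axis (i > 0) to be written full-width.
    let minor_full = ranges[1..]
        .iter()
        .zip(&dest_shape[1..])
        .all(|(&range, &dim)| range == (0, dim));
    if !minor_full {
        return WritePath::Generic;
    }

    // Row-major, densely packed: stride[0] (in elements) is the product of minor extents.
    let stride0: usize = dest_shape[1..].iter().product();
    let row_bytes = stride0 * elem_size;
    let (start0, end0) = ranges[0];
    if (start0, end0) == (0, dest_shape[0]) {
        WritePath::WholeBuffer { bytes: dest_shape[0] * row_bytes }
    } else {
        WritePath::ContiguousChunk {
            byte_offset: start0 * row_bytes,
            bytes: (end0 - start0) * row_bytes,
        }
    }
}
```

For example, appending one position into an assumed [16, 8, 64] cache of 2-byte elements with ranges [(5, 6), (0, 8), (0, 64)] yields a 1024-byte copy at byte offset 5120. A rank-1 dest always takes a memcpy path, since it has no minor axes.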
§S4 / U4 constraint
Nibble-packed dtypes pack two elements per u8. To avoid a
read-modify-write on a byte that the written range covers only
halfway, the trailblazer requires that start_{N-1} and end_{N-1}
on the innermost axis be even. An innermost range with an odd
bound returns Error::Unsupported at select time.
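A minimal sketch of that select-time check follows. SelectError and nibble_innermost_bytes are hypothetical stand-ins for the crate's Error::Unsupported path, and the returned span assumes each innermost row begins on a byte boundary. With two elements per byte, even bounds map to the whole-byte span [start / 2, end / 2), so every touched byte is fully overwritten and no neighbouring nibble needs to be preserved.

```rust
/// Hypothetical stand-in for the crate's Error::Unsupported.
#[derive(Debug, PartialEq, Eq)]
enum SelectError {
    Unsupported(String),
}

/// Hypothetical select-time check for S4 / U4 (two elements per u8).
/// On success, returns the innermost-axis byte span relative to the row
/// start, assuming each innermost row begins on a byte boundary.
fn nibble_innermost_bytes(start: usize, end: usize) -> Result<(usize, usize), SelectError> {
    if start % 2 != 0 || end % 2 != 0 {
        // An odd bound would leave a half-written byte that needs read-modify-write.
        return Err(SelectError::Unsupported(format!(
            "S4/U4 innermost range {start}..{end} needs even bounds"
        )));
    }
    Ok((start / 2, end / 2))
}
```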
§Structs
- WriteSliceArgs - Args bundle for a write_slice launch.
- WriteSliceDescriptor - Descriptor for a write_slice op.
- WriteSlicePlan - write_slice plan.