Module feature_concat

ADR-021 K5: GPU feature-axis concat (single-chunk strided copy).

Each invocation copies one [T, src_dim] f32 row-major slab into its slice of the concatenated [T, dst_stride] destination, at column offset dst_offset. Launching once per chunk (with varying dst_offset) builds the full [T, Σ src_dim_i] concatenated tensor — exactly the shape qwen3vl.cpp:186 ggml_concat(ctx0, embeddings, deepstack_features, 0) produces.

Pure copy (no FP arithmetic) → AC-1 byte-identical.
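
As a rough illustration of the indexing described above, here is a CPU-side reference sketch in Rust. It is not the crate's shader or its dispatch API: the helper names `concat_chunk_f32` and `concat_features_f32` are hypothetical, and the sketch assumes dense row-major buffers with `dst_offset + src_dim <= dst_stride`. Something like this could serve as an oracle when checking GPU output for byte equality.

```rust
/// Hypothetical CPU reference for one chunk copy (not part of this module's API).
/// Copies a row-major [n_tokens, src_dim] slab into columns
/// dst_offset..dst_offset + src_dim of a row-major [n_tokens, dst_stride] buffer.
fn concat_chunk_f32(
    src: &[f32],
    dst: &mut [f32],
    n_tokens: usize,
    src_dim: usize,
    dst_stride: usize,
    dst_offset: usize,
) {
    // Assumed preconditions: the slab fits inside one destination row
    // and both buffers are densely packed.
    assert!(dst_offset + src_dim <= dst_stride);
    assert_eq!(src.len(), n_tokens * src_dim);
    assert_eq!(dst.len(), n_tokens * dst_stride);

    for t in 0..n_tokens {
        let src_row = &src[t * src_dim..(t + 1) * src_dim];
        let d = t * dst_stride + dst_offset;
        // Plain copy, no arithmetic, so the f32 bit patterns are preserved.
        dst[d..d + src_dim].copy_from_slice(src_row);
    }
}

/// Hypothetical driver: one call per chunk with an advancing dst_offset,
/// producing the full [n_tokens, sum of src_dim_i] tensor.
/// Each chunk is given as (data, src_dim).
fn concat_features_f32(chunks: &[(&[f32], usize)], n_tokens: usize) -> Vec<f32> {
    let dst_stride: usize = chunks.iter().map(|&(_, dim)| dim).sum();
    let mut dst = vec![0.0f32; n_tokens * dst_stride];
    let mut dst_offset = 0;
    for &(src, src_dim) in chunks {
        concat_chunk_f32(src, &mut dst, n_tokens, src_dim, dst_stride, dst_offset);
        dst_offset += src_dim;
    }
    dst
}
```

In this sketch every chunk shares the same `n_tokens`, and only `dst_offset` changes between calls, which mirrors the launch-once-per-chunk pattern described above.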

Statics

FEATURE_CONCAT_SHADER_SOURCE

Functions

dispatch_feature_concat_f32
Copy one [n_tokens, src_dim] f32 row-major chunk into the [n_tokens, dst_stride] destination at column dst_offset.
register