Module structured

§Structured Pruning

Removes entire structural units (channels, filters, or attention heads) rather than individual weights. Structured pruning zeroes whole rows or columns of a weight matrix, which can then be physically removed to leave a smaller dense matrix. This yields real hardware speedups, unlike unstructured sparsity, which requires special sparse kernels.
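To illustrate what "physically removed" means, the following is a minimal sketch, not this module's API. The helper `compact_rows` is hypothetical, and it assumes a row-major `[n_rows, n_cols]` buffer of `f32` weights in which pruned units are rows that have already been zeroed.

```rust
/// Hypothetical helper (not part of this module): drops every all-zero row
/// from a row-major `[n_rows, n_cols]` matrix and returns the compacted
/// weights together with the indices of the rows that were kept.
fn compact_rows(weights: &[f32], n_rows: usize, n_cols: usize) -> (Vec<f32>, Vec<usize>) {
    assert!(n_cols > 0, "n_cols must be non-zero");
    assert_eq!(weights.len(), n_rows * n_cols, "shape mismatch");

    let mut kept = Vec::new();
    let mut out = Vec::new();
    for (i, row) in weights.chunks_exact(n_cols).enumerate() {
        // A row survives if at least one of its weights is non-zero.
        if row.iter().any(|&w| w != 0.0) {
            kept.push(i);
            out.extend_from_slice(row);
        }
    }
    (out, kept)
}
```

The result is an ordinary dense matrix with fewer rows, so downstream code can keep using standard dense kernels. The returned indices are what a caller would need in order to drop the matching input columns of the following layer.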

§Granularities

| Granularity | Unit removed | Layout assumption |
|-------------|--------------|-------------------|
| Channel | Output channel | [n_out, n_in] row-major |
| Filter | Convolutional filter | [n_filters, filter_size] flat |
| Head | Attention head | [n_heads × head_dim, ...] |
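Under each of these layouts, a structural unit is a contiguous run of values, so L2-norm importance can be computed one chunk at a time. The sketch below is an illustration under that assumption, not the implementation of `StructuredPruner`. The functions `unit_l2_norms` and `lowest_norm_units` and the parameter `unit_len` are hypothetical names.

```rust
/// Hypothetical helper: L2 norm of each structural unit in a row-major buffer
/// in which every unit occupies `unit_len` contiguous values.
fn unit_l2_norms(weights: &[f32], unit_len: usize) -> Vec<f32> {
    assert!(unit_len > 0 && weights.len() % unit_len == 0, "shape mismatch");
    weights
        .chunks_exact(unit_len)
        .map(|unit| unit.iter().map(|w| w * w).sum::<f32>().sqrt())
        .collect()
}

/// Hypothetical helper: indices of the `n_prune` units with the smallest
/// norms, returned in ascending index order.
fn lowest_norm_units(norms: &[f32], n_prune: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..norms.len()).collect();
    // Stable sort by norm; `total_cmp` gives a total order even for NaN.
    order.sort_by(|&a, &b| norms[a].total_cmp(&norms[b]));
    order.truncate(n_prune);
    order.sort_unstable();
    order
}
```

For the Channel layout, `unit_len` would be `n_in`, and for the Filter layout it would be `filter_size`. For the Head layout, assuming the `head_dim` rows of a head are stored contiguously, `unit_len` would be `head_dim` times the length of one row.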

Structs§

StructuredPruner
Removes structural units based on L2 norm importance.

Enums§

PruneGranularity
Structural unit to remove during pruning.