§Structured Pruning
Removes entire structural units (channels, filters, or attention heads) rather than individual weights. Structured pruning zeroes whole rows or columns of a weight matrix, which can then be physically removed. The resulting smaller dense matrices yield real speedups on standard hardware, unlike unstructured sparsity, which requires special sparse kernels.
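To illustrate what "physically removed" means, here is a minimal sketch that drops all-zero rows from a row-major `[n_out, n_in]` buffer. The function name `compact_zero_rows` is hypothetical and is not part of this module's API; it only assumes the weights are stored as a flat `f32` slice.

```rust
/// Drops every all-zero row from a row-major `[n_out, n_in]` matrix.
/// Returns the compacted buffer and the indices of the rows that were kept.
/// Hypothetical helper for illustration, not this module's API.
fn compact_zero_rows(weights: &[f32], n_out: usize, n_in: usize) -> (Vec<f32>, Vec<usize>) {
    assert_eq!(weights.len(), n_out * n_in, "buffer does not match shape");
    let mut kept_rows = Vec::new();
    let mut compacted = Vec::new();
    for row in 0..n_out {
        let slice = &weights[row * n_in..(row + 1) * n_in];
        // A row survives if at least one of its weights is non-zero.
        if slice.iter().any(|&w| w != 0.0) {
            kept_rows.push(row);
            compacted.extend_from_slice(slice);
        }
    }
    (compacted, kept_rows)
}
```

The compacted buffer has shape `[kept_rows.len(), n_in]`, so an ordinary dense matrix multiply can use it directly. The kept indices are returned because the next layer's input dimension would typically need the matching columns removed as well.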
§Granularities
| Granularity | Unit removed | Layout assumption |
|---|---|---|
| Channel | Output channel | [n_out, n_in] row-major |
| Filter | Convolutional filter | [n_filters, filter_size] flat |
| Head | Attention head | [n_heads × head_dim, ...] |
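All three layouts place each structural unit in a contiguous run of values, so a single scoring routine can cover them. The sketch below computes an L2 norm per unit and picks the lowest-scoring units as pruning candidates. Both function names and the `unit_len` parameter are assumptions made for illustration; the module's actual methods are not shown on this page. For the Channel layout `unit_len` would be `n_in`, and for the Filter layout `filter_size`. The Head layout leaves its trailing dimensions unspecified, so the caller would have to supply the full per-head length.

```rust
/// L2 norm of each structural unit in a flat buffer, where every unit
/// occupies `unit_len` contiguous values. Hypothetical helper.
fn unit_l2_norms(weights: &[f32], unit_len: usize) -> Vec<f32> {
    assert!(unit_len > 0 && weights.len() % unit_len == 0);
    weights
        .chunks_exact(unit_len)
        .map(|unit| unit.iter().map(|w| w * w).sum::<f32>().sqrt())
        .collect()
}

/// Indices of the `n_prune` units with the smallest L2 norm,
/// returned in ascending index order. Hypothetical helper.
fn lowest_norm_units(norms: &[f32], n_prune: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..norms.len()).collect();
    // Stable sort by norm; `total_cmp` gives a total order over f32 values.
    order.sort_by(|&a, &b| norms[a].total_cmp(&norms[b]));
    let mut picked: Vec<usize> = order.into_iter().take(n_prune).collect();
    picked.sort_unstable();
    picked
}
```

Scoring and selection are kept separate so that the same norms can be reused with different pruning ratios. Whether the real pruner normalizes scores or breaks ties differently is not stated here.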
Structs§
- StructuredPruner - Removes structural units based on L2 norm importance.
Enums§
- PruneGranularity - Structural unit to remove during pruning.