Memory management for sparse inference.
This module provides weight quantization and neuron caching for efficient memory usage during inference.
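One common way to reduce weight memory is symmetric per-tensor int8 quantization. Each weight is stored as an `i8` plus one shared `f32` scale, so w ≈ scale × q with q in [-127, 127]. The sketch below is an illustration under that assumption only. `Int8Weights`, `quantize`, `dequantize`, and `nbytes` are hypothetical names, not this module's `QuantizedWeights` API, and the module may use a different scheme (for example per-row scales or 4-bit storage).

```rust
/// Hypothetical int8 weight store (NOT this crate's `QuantizedWeights` API).
/// Symmetric per-tensor quantization: w ≈ scale * q, with q in [-127, 127].
pub struct Int8Weights {
    pub data: Vec<i8>,
    pub scale: f32,
}

impl Int8Weights {
    /// Quantize f32 weights, mapping the largest absolute value to ±127.
    pub fn quantize(weights: &[f32]) -> Self {
        let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
        // Avoid dividing by zero when every weight is 0.
        let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
        let data = weights
            .iter()
            .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
            .collect();
        Int8Weights { data, scale }
    }

    /// Reconstruct approximate f32 weights; error per value is at most scale / 2.
    pub fn dequantize(&self) -> Vec<f32> {
        self.data.iter().map(|&q| q as f32 * self.scale).collect()
    }

    /// Bytes used by the quantized values (the single f32 scale is excluded).
    pub fn nbytes(&self) -> usize {
        self.data.len() * std::mem::size_of::<i8>()
    }
}
```

Compared with `f32` storage this cuts the weight payload by 4x, at the cost of a rounding error bounded by half the scale per weight.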
Structs
- CacheStats - Cache statistics.
- NeuronCache - Neuron activation cache for hot/cold management.
- QuantizedWeights - Quantized weight storage for reduced memory usage.
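The hot/cold idea behind the neuron cache can be illustrated with a small frequency-based sketch. It counts how often each neuron activates, periodically keeps the `capacity` most frequently activated neurons as "hot", and records hits and misses. This is an assumption-laden illustration, not the crate's implementation. `HotColdCache`, `HitStats`, `access`, `rebalance`, and `is_hot` are hypothetical names, not the real `NeuronCache` or `CacheStats` API, and the actual module may use a different policy (thresholds, LRU, or predictor-driven placement).

```rust
use std::collections::HashMap;

/// Hypothetical hit/miss counters (NOT this crate's `CacheStats`).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct HitStats {
    pub hits: u64,
    pub misses: u64,
}

impl HitStats {
    /// Fraction of accesses that found the neuron already hot.
    pub fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }
}

/// Hypothetical hot/cold neuron cache (NOT this crate's `NeuronCache`).
/// Keeps at most `capacity` of the most frequently activated neuron IDs hot.
pub struct HotColdCache {
    capacity: usize,
    counts: HashMap<usize, u64>,
    hot: Vec<usize>,
    pub stats: HitStats,
}

impl HotColdCache {
    pub fn new(capacity: usize) -> Self {
        HotColdCache { capacity, counts: HashMap::new(), hot: Vec::new(), stats: HitStats::default() }
    }

    /// Record an activation of `neuron`; returns true if it was already hot.
    pub fn access(&mut self, neuron: usize) -> bool {
        *self.counts.entry(neuron).or_insert(0) += 1;
        let was_hot = self.hot.contains(&neuron);
        if was_hot {
            self.stats.hits += 1;
        } else {
            self.stats.misses += 1;
        }
        was_hot
    }

    /// Recompute the hot set from activation counts (ties go to the lower ID).
    pub fn rebalance(&mut self) {
        let mut ranked: Vec<(usize, u64)> = self.counts.iter().map(|(&n, &c)| (n, c)).collect();
        ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        self.hot = ranked.into_iter().take(self.capacity).map(|(n, _)| n).collect();
    }

    pub fn is_hot(&self, neuron: usize) -> bool {
        self.hot.contains(&neuron)
    }
}
```

In this sketch, hot neurons would be the ones whose weights stay resident in fast memory, while cold neurons could be loaded on demand. The hit rate indicates how well the hot set matches the activation pattern.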