# microcnn

A minimal CNN framework in Rust with INT8 and INT4 quantization.

## Features
- FP32, INT8, and INT4 inference
- Post-training quantization with calibration
- NEON SIMD acceleration (aarch64)
- Multiple convolution algorithms (Naive, Im2col, Winograd, FFT)
- Reference LeNet-5 implementation for MNIST
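
The post-training quantization listed above can be sketched as follows: calibration scans representative activations for the maximum absolute value, which fixes a symmetric per-tensor scale mapping FP32 onto `[-127, 127]`. This is a simplified sketch; microcnn's actual calibration code and function names may differ.

```rust
/// Find a symmetric per-tensor scale from calibration samples
/// (illustrative names; not the crate's actual API).
fn calibrate_scale(samples: &[f32]) -> f32 {
    let max_abs = samples.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    max_abs / 127.0
}

/// Quantize FP32 values to INT8 using the calibrated scale.
fn quantize(values: &[f32], scale: f32) -> Vec<i8> {
    values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Map INT8 values back to (approximate) FP32.
fn dequantize(values: &[i8], scale: f32) -> Vec<f32> {
    values.iter().map(|&q| q as f32 * scale).collect()
}
```

INT4 works the same way with a `[-7, 7]` range, which is why its quantization error (see the per-layer MSE below) is markedly higher.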
## Benchmarks
Tested on LeNet-5 with 1000 MNIST samples (Apple Silicon, NEON SIMD enabled):
| Precision | Inference Time | Memory | Speedup | Memory Savings | Accuracy |
|---|---|---|---|---|---|
| FP32 | 688.4ms | 241 KB | 1.00x | — | 98.7% |
| INT8 | 120.8ms | 61 KB | 5.70x | 75% | 98.7% |
| INT4 | 845.3ms | 31 KB | 0.81x | 87% | 95.3% |
### Per-Layer Performance
| Layer | Type | FP32 Time | INT8 Time | INT4 Time | INT8 MSE | INT4 MSE |
|---|---|---|---|---|---|---|
| 0 | Conv2d | 0.13ms | 0.05ms | 0.23ms | 0.000115 | 0.022301 |
| 1 | ReLU | 0.01ms | 0.00ms | 0.01ms | 0.000080 | 0.017441 |
| 2 | MaxPool2d | 0.01ms | 0.01ms | 0.01ms | 0.000087 | 0.019562 |
| 3 | Conv2d | 0.31ms | 0.04ms | 0.47ms | 0.000822 | 0.213480 |
| 4 | ReLU | 0.00ms | 0.00ms | 0.00ms | 0.000188 | 0.059043 |
| 5 | MaxPool2d | 0.00ms | 0.00ms | 0.00ms | 0.000370 | 0.116998 |
| 6 | Conv2d | 0.19ms | 0.01ms | 0.09ms | 0.000895 | 0.331971 |
| 7 | ReLU | 0.00ms | 0.00ms | 0.00ms | 0.000383 | 0.124720 |
| 8 | Linear | 0.02ms | 0.01ms | 0.02ms | 0.000362 | 0.202737 |
| 9 | ReLU | 0.00ms | 0.00ms | 0.00ms | 0.000174 | 0.096129 |
| 10 | Linear | 0.00ms | 0.00ms | 0.00ms | 0.001202 | 1.060178 |
| 11 | Softmax | 0.00ms | 0.00ms | 0.00ms | 0.000000 | 0.000236 |
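
The MSE columns above measure per-layer quantization error: the mean squared difference between the FP32 reference activations and the dequantized outputs of the quantized layer. A sketch of the metric (not the crate's code):

```rust
/// Mean squared error between FP32 reference activations and the
/// dequantized activations of a quantized layer.
fn mse(reference: &[f32], dequantized: &[f32]) -> f32 {
    assert_eq!(reference.len(), dequantized.len());
    let sum: f32 = reference
        .iter()
        .zip(dequantized)
        .map(|(r, q)| (r - q) * (r - q))
        .sum();
    sum / reference.len() as f32
}
```

Note how the error accumulates through the network: the final Linear layer's INT4 MSE (1.06) is roughly 50x its first-layer value.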
### Convolution Algorithm Comparison (FP32)
| Algorithm | Total Time | Per Image | Speedup | Max Error vs Naive |
|---|---|---|---|---|
| Naive | 685.5ms | 685.5µs | 1.00x | — |
| Im2col | 553.3ms | 553.3µs | 1.24x | 1.86e-7 |
| Winograd | 552.7ms | 552.7µs | 1.24x | 1.86e-7 |
| FFT | 7996.8ms | 7996.8µs | 0.09x | 9.54e-7 |
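
Im2col gets its speedup by unrolling each kernel-sized input patch into a contiguous row, turning convolution into a dense matrix product that is cache- and SIMD-friendly. A minimal single-channel, stride-1, no-padding sketch (illustrative layout, not the crate's internals):

```rust
/// Unroll a h*w single-channel input into one row of k*k values per
/// output position, so convolution becomes a matrix-vector product.
fn im2col(input: &[f32], h: usize, w: usize, k: usize) -> Vec<Vec<f32>> {
    let (oh, ow) = (h - k + 1, w - k + 1);
    let mut cols = Vec::with_capacity(oh * ow);
    for oy in 0..oh {
        for ox in 0..ow {
            let mut patch = Vec::with_capacity(k * k);
            for ky in 0..k {
                for kx in 0..k {
                    patch.push(input[(oy + ky) * w + (ox + kx)]);
                }
            }
            cols.push(patch);
        }
    }
    cols
}
```

The duplicated patch data costs memory but buys sequential access in the inner product, which is why Im2col beats the naive loop above despite doing no less arithmetic. FFT convolution loses badly here because LeNet-5's feature maps and kernels are far too small to amortize the transforms.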
### SIMD Im2col Performance
| Layer | FP32 Im2col | INT8 Im2col | INT8 Speedup |
|---|---|---|---|
| Conv2d #0 | 68.0µs | 48.4µs | 1.40x |
| Conv2d #1 | 105.3µs | 39.4µs | 2.67x |
| Conv2d #2 | 320.5µs | 5.4µs | 59.01x |
| Total | 493.8µs | 93.3µs | 5.29x |
**Key findings:**

- INT8 achieves a 5.70x speedup with no accuracy loss (98.7% for both)
- INT4 reduces memory by 87% with only a 3.4-point accuracy drop
- Conv2d layers benefit most from quantization (up to 59x speedup on Conv2d #2)
- Im2col and Winograd provide a 1.24x speedup over naive convolution
- NEON SIMD delivers the bulk of the INT8 gains: 5.29x overall on the im2col path
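
The INT8 gains come from the convolution inner loop: i8 products widened into i32 accumulators, which NEON executes many lanes at a time via widening multiply-accumulate instructions. A scalar sketch of the arithmetic (intrinsics omitted; function name is illustrative):

```rust
/// Scalar reference for the INT8 inner product: multiply i8 pairs with
/// widening into an i32 accumulator, then rescale to FP32. NEON does
/// the same widening multiply-accumulate across vector lanes.
fn dot_i8(a: &[i8], b: &[i8], scale_a: f32, scale_b: f32) -> f32 {
    let acc: i32 = a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum();
    acc as f32 * scale_a * scale_b
}
```

Because the accumulator stays in integer registers until the very end, only one floating-point rescale is paid per output element.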
📊 View detailed benchmark results
## Quick Start

### Install

```shell
cargo install microcnn
```

Running the above command will globally install the microcnn binary.

### Install as library

Run the following Cargo command in your project directory:

```shell
cargo add microcnn
```

Or add the following line to your Cargo.toml:

```toml
[dependencies]
microcnn = "0.1"
```
## Usage

A minimal example (the snippet was garbled in extraction; module paths and method names below are reconstructed and may differ from the crate's actual API):

```rust
// Reconstructed sketch; consult the crate documentation for exact names.
use microcnn::lenet;

let mut net = lenet::new();
net.load("weights.bin");
```
### Running the Example

Run the bundled example, or copy the example code directly into your main.rs:

```rust
// Reconstructed imports; consult the crate documentation for exact paths.
use microcnn::lenet;
use microcnn::mnist::MNIST;
```

Requires the MNIST data files in `data/`.
## License
MIT