# INT8 Quantization Support: Architecture Review
**Date**: 2025-12-25
**Scope**: FEAGI INT8 hardware support for Hailo-8 and similar accelerators
**Status**: 🟡 PARTIALLY IMPLEMENTED - Foundation exists but NOT production-ready
---
## Executive Summary
**What You Built (Good News ✅):**
1. ✅ Generic `NeuralValue` trait with `INT8Value` implementation
2. ✅ Quantization-aware membrane potential storage
3. ✅ Generic `RustNPU<R, T, B>` where `T: NeuralValue` (can be `f32` or `INT8Value`)
4. ✅ Hailo-8 HAL abstraction layer
5. ✅ Fixed-point leak coefficient (`INT8LeakCoefficient`)
**What's Missing (Bad News ❌):**
1. ❌ **Synaptic contribution calculation ALWAYS uses f32** (lines 54-67, `contribution.rs`)
2. ❌ **No INT8 path for weight×conductance multiplication**
3. ❌ **PSP division uses u8 integer division** (precision loss!)
4. ❌ **No scale factors or quantization parameters**
5. ❌ **Hailo backend is a stub** (doesn't actually call HailoRT)
---
## Detailed Analysis
### 1. Generic Architecture (✅ Correct Design)
```rust:107:111:feagi-core/crates/feagi-npu/burst-engine/src/npu.rs
pub struct RustNPU<
    R: Runtime,
    T: NeuralValue,  // ← Can be f32 OR INT8Value!
    B: crate::backend::ComputeBackend<T, R::NeuronStorage<T>, R::SynapseStorage>,
> {
```
**This is excellent!** You can instantiate:
- `RustNPU<StdRuntime, f32, CPUBackend>` for desktop
- `RustNPU<EmbeddedRuntime, INT8Value, HailoBackend>` for embedded
### 2. INT8Value Implementation (✅ Well-Designed)
```rust:128:179:feagi-core/crates/feagi-npu/neural/src/types/numeric.rs
pub struct INT8Value(pub i8);

impl INT8Value {
    pub const MEMBRANE_MIN: f32 = -100.0;
    pub const MEMBRANE_MAX: f32 = 50.0;
    pub const MEMBRANE_RANGE: f32 = Self::MEMBRANE_MAX - Self::MEMBRANE_MIN;
    pub const SCALE: f32 = 254.0;
    pub const RESOLUTION: f32 = Self::MEMBRANE_RANGE / Self::SCALE;
    // ...
}

impl NeuralValue for INT8Value {
    fn from_f32(value: f32) -> Self { /* quantize */ }
    fn to_f32(self) -> f32 { /* dequantize */ }
    fn saturating_add(self, other: Self) -> Self { /* INT8 */ }
    fn mul_leak(self, leak_coefficient: f32) -> Self { /* converts to f32! */ }
    // ...
}
```
**Analysis:**
- ✅ Quantization range: [-100, 50] → [-127, 127] (i8)
- ✅ Resolution: ~0.59 per i8 step (150 / 254)
- ⚠️ `mul_leak()` converts to f32 internally! (lines 175-178)
- ⚠️ Only works on **CPU with floats**, NOT on pure INT8 hardware!
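For reference, here is a self-contained sketch of the quantize/dequantize mapping these constants imply. The real `from_f32`/`to_f32` bodies are elided above, so treat this as an assumption consistent with the [-100, 50] range and 254-step scale, not the crate's exact code:
```rust
const MEMBRANE_MIN: f32 = -100.0;
const MEMBRANE_MAX: f32 = 50.0;
const MEMBRANE_RANGE: f32 = MEMBRANE_MAX - MEMBRANE_MIN; // 150.0
const SCALE: f32 = 254.0; // number of i8 steps across the range

/// Map [-100.0, 50.0] onto [-127, 127].
fn quantize(value: f32) -> i8 {
    let clamped = value.clamp(MEMBRANE_MIN, MEMBRANE_MAX);
    let normalized = (clamped - MEMBRANE_MIN) / MEMBRANE_RANGE; // 0.0..=1.0
    (normalized * SCALE - 127.0).round() as i8
}

/// Inverse mapping; one i8 step is worth MEMBRANE_RANGE / SCALE ≈ 0.59.
fn dequantize(q: i8) -> f32 {
    ((q as f32 + 127.0) / SCALE) * MEMBRANE_RANGE + MEMBRANE_MIN
}

fn main() {
    assert_eq!(quantize(-100.0), -127);
    assert_eq!(quantize(50.0), 127);
    println!("one step ≈ {:.3}", dequantize(1) - dequantize(0)); // ≈ 0.591
}
```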
### 3. **CRITICAL ISSUE: Synaptic Contribution Always Uses f32**
```rust:54:67:feagi-core/crates/feagi-npu/neural/src/synapse/contribution.rs
pub fn compute_synaptic_contribution(
    weight: u8,
    conductance: u8,
    synapse_type: SynapseType,
) -> f32 {                        // ← ALWAYS RETURNS f32!
    let w = weight as f32;        // ← Converts to float
    let c = conductance as f32;   // ← Converts to float
    let sign = match synapse_type {
        SynapseType::Excitatory => 1.0,
        SynapseType::Inhibitory => -1.0,
    };
    w * c * sign                  // ← Float multiplication
}
```
**This function CANNOT run on Hailo/TPU accelerators!**
Called from:
```rust:226:feagi-core/crates/feagi-npu/burst-engine/src/synaptic_propagation.rs
SynapticContribution(compute_synaptic_contribution(weight, psp, synapse_type));
```
**Problem**: Even if `T=INT8Value` in `RustNPU`, the synaptic contribution is ALWAYS computed as `f32`.
### 4. PSP Division Bug (🚨 CRITICAL for Your Issue)
```rust:208:216:feagi-core/crates/feagi-npu/burst-engine/src/synaptic_propagation.rs
// PSP uniformity = false: PSP is divided among all outgoing synapses
let synapse_count = source_synapse_counts.get(&source_neuron).copied().unwrap_or(1);
if synapse_count > 1 {
    // Divide PSP by number of outgoing synapses
    base_psp / synapse_count as u8   // ← u8 INTEGER DIVISION!
} else {
    base_psp
}
```
**This is YOUR BUG!**
- `base_psp` is `u8` (0-255)
- Division: `1u8 / 10u8 = 0u8` (precision loss!)
- Then: `contribution = weight × 0 = 0.0f32`
**Why it might still fire**: if your genome's `conductance` (and therefore `base_psp`) is much greater than 1, the integer division doesn't truncate all the way to 0.
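A quick standalone check of that truncation (the numbers are illustrative only):
```rust
fn main() {
    // Base PSP of 1 fanned out across 10 synapses: u8 division truncates to 0,
    // so every downstream contribution becomes weight * 0.
    let base_psp: u8 = 1;
    let synapse_count: u8 = 10;
    assert_eq!(base_psp / synapse_count, 0);

    // With a larger conductance-driven PSP, something non-zero survives:
    let base_psp: u8 = 25;
    assert_eq!(base_psp / synapse_count, 2); // truncated from 2.5
}
```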
---
## What's Actually Supported
### Current Support Matrix
| Operation | `f32` on CPU | `INT8Value` on CPU | Hailo-8 hardware |
|---|---|---|---|
| Membrane potential storage | ✅ | ✅ | ❌ |
| Threshold comparison | ✅ | ✅ | ❌ |
| Leak application | ✅ | ⚠️ (converts to f32) | ❌ |
| Synapse weight×conductance | ✅ | ❌ (uses f32) | ❌ |
| PSP division | 🐛 (float) | 🐛 (u8 int) | ❌ |
| FCL accumulation | ✅ | ✅ | ❌ |
| Hailo backend | N/A | N/A | 🟡 (stub only) |
**Legend:**
- ✅ Fully working
- ⚠️ Works but suboptimal (uses floats internally)
- 🐛 Bug exists
- ❌ Not implemented
- 🟡 Placeholder/stub
---
## What You're Missing for True INT8 Hardware Support
### 1. **INT8 Synaptic Contribution**
Need to add:
```rust
// Option A: Generic contribution calculation
pub fn compute_synaptic_contribution_generic<T: NeuralValue>(
    weight: u8,
    conductance: u8,
    synapse_type: SynapseType,
) -> T {
    // For f32: convert and multiply
    // For INT8Value: use fixed-point math or lookup table
}

// Option B: Backend-specific implementation
trait ComputeBackend {
    type Contribution;
    fn compute_contribution(&self, weight: u8, conductance: u8, sign: i8) -> Self::Contribution;
}
```
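To make Option A concrete on the INT8 side, here is a self-contained sketch that keeps the weight×conductance product in integer math and rescales once at the end. The `RESCALE` constant and the free-function shape are assumptions for illustration, not the existing `NeuralValue` API:
```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum SynapseType {
    Excitatory,
    Inhibitory,
}

/// Sketch: weight × conductance computed entirely in i32 (255 × 255 = 65_025,
/// so no overflow), then rescaled into the i8 contribution range.
fn int8_contribution(weight: u8, conductance: u8, synapse_type: SynapseType) -> i8 {
    let sign: i32 = match synapse_type {
        SynapseType::Excitatory => 1,
        SynapseType::Inhibitory => -1,
    };
    let raw = (weight as i32) * (conductance as i32) * sign;
    // Placeholder rescale; a real implementation would derive this from the
    // quantization config so INT8 contributions track the f32 path.
    const RESCALE: i32 = 512;
    (raw / RESCALE).clamp(-127, 127) as i8
}

fn main() {
    assert_eq!(int8_contribution(128, 64, SynapseType::Excitatory), 16);
    assert_eq!(int8_contribution(128, 64, SynapseType::Inhibitory), -16);
}
```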
### 2. **Fixed-Point PSP Division**
```rust
// Instead of: base_psp / synapse_count (loses precision)
// Use scaled division:
let scale = 256u16;
let scaled_psp = (base_psp as u16) * scale; // 1 * 256 = 256
let divided_psp = scaled_psp / (synapse_count as u16); // 256 / 10 = 25
// Keep in u16 for accumulation, OR divide by scale to get u8
```
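Wrapped into a helper (still a sketch; the function name is made up, and rounding is added so 256 / 10 goes to 26 rather than 25):
```rust
/// Divide a u8 PSP across `synapse_count` outgoing synapses without small
/// values collapsing to zero. The result stays in ×256 fixed point so
/// accumulation keeps the fractional part; divide by 256 only at read-out.
fn divide_psp_scaled(base_psp: u8, synapse_count: u16) -> u16 {
    let count = synapse_count.max(1) as u32;
    let scaled = (base_psp as u32) * 256;
    // Add half the divisor before dividing for round-to-nearest.
    ((scaled + count / 2) / count) as u16
}

fn main() {
    assert_eq!(1u8 / 10u8, 0);                // current behaviour
    assert_eq!(divide_psp_scaled(1, 10), 26); // ≈ 0.1 in ×256 fixed point
}
```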
### 3. **Quantization Scale Factors**
```rust
pub struct QuantizationConfig {
    pub weight_scale: f32,     // For host-side quantization
    pub activation_scale: f32, // For host-side quantization
    pub output_scale: f32,     // For dequantization
    pub zero_point: i8,        // Offset for asymmetric quantization
}
```
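For context, this is the standard affine mapping those parameters would drive (textbook quantization formula, not code that exists in the repo):
```rust
struct QuantizationConfig {
    activation_scale: f32,
    zero_point: i8,
}

impl QuantizationConfig {
    /// q = round(x / scale) + zero_point, clamped to the i8 range.
    fn quantize(&self, x: f32) -> i8 {
        let q = (x / self.activation_scale).round() as i32 + self.zero_point as i32;
        q.clamp(-128, 127) as i8
    }

    /// x ≈ (q - zero_point) × scale
    fn dequantize(&self, q: i8) -> f32 {
        (q as i32 - self.zero_point as i32) as f32 * self.activation_scale
    }
}

fn main() {
    let cfg = QuantizationConfig { activation_scale: 0.5, zero_point: -10 };
    let q = cfg.quantize(12.3);          // round(24.6) - 10 = 15
    assert_eq!(q, 15);
    assert_eq!(cfg.dequantize(q), 12.5); // 25 × 0.5, within one step of 12.3
}
```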
### 4. **Hailo Backend Implementation**
```rust
impl ComputeBackend for HailoBackend {
    fn process_burst(&mut self, ...) -> Result<...> {
        // 1. Upload quantized data to Hailo
        hailo.upload_neurons(...)?;
        hailo.upload_synapses(...)?;

        // 2. Run inference (INT8 ops on Hailo cores)
        hailo.process_burst()?;

        // 3. Download results
        hailo.download_neurons(...)?;

        Ok(...)
    }
}
```
### 5. **INT8-Only Leak Calculation**
Currently `mul_leak()` converts to f32 internally (lines 175-178). Need:
```rust
impl NeuralValue for INT8Value {
    fn mul_leak(self, leak_coefficient: f32) -> Self {
        // Option A: Fixed-point multiplication (no floats!)
        let leak_i16 = INT8LeakCoefficient::from_f32(leak_coefficient);
        let potential_i32 = (self.0 as i32) * (leak_i16.0 as i32);
        let result = (potential_i32 / INT8LeakCoefficient::SCALE as i32) as i8;
        Self(result.max(-127))

        // Option B: Lookup table (fastest, but uses memory)
        // LEAK_TABLE[self.0 as usize][leak_index]
    }
}
```
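And a standalone sanity check of the Option A arithmetic, assuming a Q8 scale of 256 for the leak coefficient (the real `INT8LeakCoefficient::SCALE` may differ):
```rust
/// Q8 fixed point: leak coefficient stored as round(leak × 256).
const LEAK_SCALE: i32 = 256;

fn mul_leak_fixed(potential: i8, leak_q8: i32) -> i8 {
    let product = (potential as i32) * leak_q8;
    ((product / LEAK_SCALE) as i8).max(-127)
}

fn main() {
    let leak = 0.9_f32;
    let leak_q8 = (leak * LEAK_SCALE as f32).round() as i32; // 230

    for potential in [-100i8, -1, 0, 1, 64, 127] {
        let fixed = mul_leak_fixed(potential, leak_q8) as f32;
        let float = (potential as f32 * leak).round();
        // Fixed-point result stays within one quantization step of the f32 path.
        assert!((fixed - float).abs() <= 1.0, "{potential}: {fixed} vs {float}");
    }
    println!("fixed-point leak tracks the f32 leak within ±1 step");
}
```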
---
## Answer to "What Am I Missing?"
**You have the FOUNDATION, but not the FULL IMPLEMENTATION:**
1. **Architecture** (generic `RustNPU<R,T,B>`) ✅
2. **INT8Value type** ✅
3. **Quantization** ✅
BUT:
4. **Synaptic contribution** still uses f32 ❌
5. **PSP division** has precision loss 🐛
6. **Hailo backend** is a stub ❌
7. **No scale factors** ❌
8. **Leak uses f32 internally** ⚠️
**Bottom line**: Your code will run on a CPU with `T=INT8Value`, but:
- It still uses **floats for multiplication** (synaptic contrib, leak)
- **PSP division bug** (your current issue!)
- Won't run on **pure INT8 hardware** (Hailo, TPU) without major refactoring
---
## Recommendations
### Immediate (Fix Your Bug):
1. **Fix PSP division** to use scaled arithmetic, or convert to float before dividing (a minimal sketch of the float variant follows below)
2. Check your genome: the `conductance` value is likely >> 1, which is why neurons still fire despite the truncation
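A minimal sketch of the "float before division" variant from point 1 (this is an untested illustration with made-up names, not a drop-in patch for `synaptic_propagation.rs`):
```rust
// Divide in f32 and keep the fractional part, so 1 / 10 becomes 0.1 instead of 0.
fn divide_psp_f32(base_psp: u8, synapse_count: usize) -> f32 {
    base_psp as f32 / synapse_count.max(1) as f32
}

fn main() {
    assert_eq!(divide_psp_f32(1, 10), 0.1);  // previously truncated to 0
    assert_eq!(divide_psp_f32(25, 10), 2.5); // previously truncated to 2
}
```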
### Short-term (INT8-ready):
1. Make `compute_synaptic_contribution` generic over `T: NeuralValue`
2. Add fixed-point PSP division with scale factors
3. Implement INT8-only leak (no f32 conversion)
### Long-term (Hailo support):
1. Implement Hailo FFI bindings
2. Add quantization-aware training pipeline
3. Create backend-specific test suite
4. Add scale factor management
Want me to fix the PSP division bug first, then we can discuss the path to true INT8 hardware support?