amx-sys
Low-level AMX instruction emulation - Hardware-faithful implementation
This crate provides direct bindings to all 23 AMX (Apple Matrix eXtensions) instructions, backed by an emulation that faithfully reproduces Apple silicon behavior.
Features
- ✅ All 23 AMX instructions
- ✅ Complete register file emulation (8 X, 8 Y, 64 Z registers)
- ✅ Full data type support (i8-i64, u8-u64, f32, f64)
- ✅ All element sizes (B8, B16, B32, B64)
- ✅ All shuffle modes (S0, S1, S2, S3)
- ✅ 100% parity with C reference implementation
- ✅ 100 tests (100% pass rate)
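To make the register-file bullet concrete, here is a minimal sketch of how 8 X, 8 Y, and 64 Z registers could be laid out. The register counts come from the feature list above; the 64-byte register width is taken from public reverse-engineering notes on AMX and is an assumption here. The names `RegisterFile`, `load_x`, and `x_as_f32` are hypothetical and are not this crate's API (the crate's real `AmxState` may look different).

```rust
/// Bytes per register (assumed: each X, Y, and Z register holds 64 bytes).
pub const REG_BYTES: usize = 64;

/// Hypothetical register-file layout, for illustration only.
#[derive(Clone)]
pub struct RegisterFile {
    pub x: [[u8; REG_BYTES]; 8],
    pub y: [[u8; REG_BYTES]; 8],
    pub z: [[u8; REG_BYTES]; 64],
}

impl RegisterFile {
    /// All registers zeroed. Plain arrays, so nothing is heap-allocated.
    pub const fn new() -> Self {
        Self {
            x: [[0; REG_BYTES]; 8],
            y: [[0; REG_BYTES]; 8],
            z: [[0; REG_BYTES]; 64],
        }
    }

    /// Copy 64 bytes into X register `index` (the index wraps modulo 8).
    pub fn load_x(&mut self, index: usize, src: &[u8; REG_BYTES]) {
        self.x[index % 8] = *src;
    }

    /// View X register `index` as sixteen little-endian f32 lanes.
    pub fn x_as_f32(&self, index: usize) -> [f32; 16] {
        let mut lanes = [0.0f32; 16];
        for (lane, chunk) in lanes.iter_mut().zip(self.x[index % 8].chunks_exact(4)) {
            *lane = f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        }
        lanes
    }
}
```

Under these assumptions the whole register file is 5,120 bytes (512 for X, 512 for Y, 4,096 for Z), small enough to live on the stack.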
Instructions
Load/Store (8)
- ldx, ldy, ldz, ldzi - Load X, Y, Z registers
- stx, sty, stz, stzi - Store registers
Extract (2)
- extrx - Extract from X register with shuffle
- extry - Extract from Y register with shuffle
Floating-Point (6)
- fma16, fma32, fma64 - Multiply-accumulate
- fms16, fms32, fms64 - Multiply-subtract
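As a rough illustration of the difference between multiply-accumulate and multiply-subtract, the sketch below applies both to a 16x16 grid of f32 accumulators, using the outer product of a 16-lane X vector and a 16-lane Y vector. It assumes a simplified matrix mode and ignores operand encoding, lane masks, and the interleaved Z-row layout of the real hardware. The function names `fma32_outer` and `fms32_outer` are hypothetical and are not this crate's `fma32` signature.

```rust
/// Number of f32 lanes in one 64-byte register.
pub const LANES: usize = 16;

/// z[j][i] += x[i] * y[j], computed with a fused multiply-add per element.
pub fn fma32_outer(z: &mut [[f32; LANES]; LANES], x: &[f32; LANES], y: &[f32; LANES]) {
    for (row, &yj) in z.iter_mut().zip(y.iter()) {
        for (cell, &xi) in row.iter_mut().zip(x.iter()) {
            *cell = xi.mul_add(yj, *cell);
        }
    }
}

/// z[j][i] -= x[i] * y[j], expressed as a fused multiply-add of the negated X lane.
pub fn fms32_outer(z: &mut [[f32; LANES]; LANES], x: &[f32; LANES], y: &[f32; LANES]) {
    for (row, &yj) in z.iter_mut().zip(y.iter()) {
        for (cell, &xi) in row.iter_mut().zip(x.iter()) {
            *cell = (-xi).mul_add(yj, *cell);
        }
    }
}
```

Applying `fma32_outer` and then `fms32_outer` with the same operands returns the accumulators to their starting values whenever the intermediate products are exactly representable.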
Integer (2)
- mac16 - Signed 16-bit multiply-accumulate
- mac16_unsigned - Unsigned multiply-accumulate
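The signed and unsigned variants differ only in how the same 16-bit patterns are interpreted. The sketch below shows this with a lane-wise multiply-accumulate into 32-bit accumulators. The 32-bit accumulator width, the wrapping behavior, and the function names are assumptions made for illustration; they do not describe this crate's `mac16` signature or every hardware mode.

```rust
/// Number of 16-bit lanes in one 64-byte register.
pub const LANES16: usize = 32;

/// Signed variant: each 16-bit pattern is reinterpreted as i16 before widening.
pub fn mac16_signed(acc: &mut [i32; LANES16], x: &[u16; LANES16], y: &[u16; LANES16]) {
    for ((a, &xi), &yi) in acc.iter_mut().zip(x.iter()).zip(y.iter()) {
        let product = i32::from(xi as i16) * i32::from(yi as i16);
        *a = a.wrapping_add(product);
    }
}

/// Unsigned variant: the same bit patterns are widened as u16.
pub fn mac16_unsigned(acc: &mut [u32; LANES16], x: &[u16; LANES16], y: &[u16; LANES16]) {
    for ((a, &xi), &yi) in acc.iter_mut().zip(x.iter()).zip(y.iter()) {
        let product = u32::from(xi) * u32::from(yi);
        *a = a.wrapping_add(product);
    }
}
```

For example, the pattern 0xFFFF multiplied by 2 accumulates -2 in the signed sketch and 131070 in the unsigned one.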
Vector (2)
- vecint - Vector integer operations
- vecfp - Vector floating-point operations
Matrix (2)
- matint - Matrix integer operations
- matfp - Matrix floating-point operations
Lookup Table (1)
- genlut - Generate lookup table
Usage
use AmxState;
use *;
use fma32;
let mut state = new;
// Load data
let data = ;
ldx;
ldy;
ldz;
// Perform FMA
fma32;
// Store result
let result = stx;
Testing
All 23 instructions are thoroughly tested with:
- 100 test cases (100% pass rate)
- Randomized inputs (xoshiro256++ RNG)
- All data types and element sizes
- 1,000+ iterations of stress testing
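For readers unfamiliar with the generator named above, here is a minimal standalone sketch of xoshiro256++ with SplitMix64 seeding, following the public reference algorithm. The type and method names are hypothetical; the crate's test suite may use its own implementation or an external crate instead.

```rust
/// Minimal xoshiro256++ generator (illustrative; not the crate's test harness).
pub struct Xoshiro256PlusPlus {
    s: [u64; 4],
}

impl Xoshiro256PlusPlus {
    /// Build from an explicit 256-bit state (must not be all zeros).
    pub fn from_state(s: [u64; 4]) -> Self {
        Self { s }
    }

    /// Expand a 64-bit seed into a full state using SplitMix64.
    pub fn seed_from_u64(mut seed: u64) -> Self {
        let mut s = [0u64; 4];
        for word in s.iter_mut() {
            seed = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut z = seed;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            *word = z ^ (z >> 31);
        }
        Self { s }
    }

    /// Produce the next 64-bit output and advance the state.
    pub fn next_u64(&mut self) -> u64 {
        let result = self.s[0]
            .wrapping_add(self.s[3])
            .rotate_left(23)
            .wrapping_add(self.s[0]);
        let t = self.s[1] << 17;
        self.s[2] ^= self.s[0];
        self.s[3] ^= self.s[1];
        self.s[1] ^= self.s[2];
        self.s[0] ^= self.s[3];
        self.s[2] ^= t;
        self.s[3] = self.s[3].rotate_left(45);
        result
    }
}
```

Because the generator is fully deterministic for a given seed, a failing randomized case can be reproduced by rerunning with the same seed.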
Run the tests with the standard Cargo command, cargo test.
Benchmarking
Run the comprehensive IO-vs-compute benchmark:
The benchmark reports CSV-style rows with separate io_only_ns, compute_only_ns, and end_to_end_ns timings for:
- Extract shuffle modes S0-S3
- FMA and FMS precisions
- MAC16 signed and unsigned
- VECINT and MATINT for all signed and unsigned element sizes
- VECFP and MATFP for all floating-point precisions
- GENLUT for all supported element sizes
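The sketch below shows one way the three timing columns could be measured and emitted as a CSV-style row. Only the column names io_only_ns, compute_only_ns, and end_to_end_ns come from the description above. The helper names, the use of the median, and the exact row layout are assumptions, not the benchmark's actual code.

```rust
use std::time::Instant;

/// Median wall-clock time in nanoseconds over `samples` runs of `f`
/// (at least one run is always performed).
pub fn median_ns<F: FnMut()>(samples: usize, mut f: F) -> u128 {
    let mut times: Vec<u128> = (0..samples.max(1))
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_nanos()
        })
        .collect();
    times.sort_unstable();
    times[times.len() / 2]
}

/// Format one row as: name,io_only_ns,compute_only_ns,end_to_end_ns
/// (hypothetical column order).
pub fn csv_row(name: &str, io_only_ns: u128, compute_only_ns: u128, end_to_end_ns: u128) -> String {
    format!("{name},{io_only_ns},{compute_only_ns},{end_to_end_ns}")
}
```

Timing the load/store phase, the compute phase, and the full sequence separately makes it possible to see whether a given instruction is dominated by data movement or by arithmetic.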
Useful filters:
AMX_BENCH_FILTER=matfp AMX_BENCH_SAMPLES=3
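A benchmark harness could read these two variables roughly as follows. The variable names come from the line above. Treating the filter as a substring match, the default of 10 samples, and the `BenchConfig` type itself are assumptions made for illustration.

```rust
use std::env;

/// Hypothetical benchmark configuration derived from environment variables.
pub struct BenchConfig {
    pub filter: Option<String>,
    pub samples: usize,
}

impl BenchConfig {
    /// Assumed default when AMX_BENCH_SAMPLES is unset or invalid.
    pub const DEFAULT_SAMPLES: usize = 10;

    /// Build a config from raw string values (kept separate from the
    /// environment so the parsing can be tested directly).
    pub fn from_values(filter: Option<&str>, samples: Option<&str>) -> Self {
        let filter = filter
            .map(str::trim)
            .filter(|f| !f.is_empty())
            .map(str::to_owned);
        let samples = samples
            .and_then(|s| s.trim().parse::<usize>().ok())
            .filter(|&n| n > 0)
            .unwrap_or(Self::DEFAULT_SAMPLES);
        Self { filter, samples }
    }

    /// Read AMX_BENCH_FILTER and AMX_BENCH_SAMPLES from the process environment.
    pub fn from_env() -> Self {
        let filter = env::var("AMX_BENCH_FILTER").ok();
        let samples = env::var("AMX_BENCH_SAMPLES").ok();
        Self::from_values(filter.as_deref(), samples.as_deref())
    }

    /// True when no filter is set or the benchmark name contains the filter.
    pub fn selects(&self, name: &str) -> bool {
        self.filter.as_deref().map_or(true, |f| name.contains(f))
    }
}
```

With the values shown above, this sketch would select only benchmarks whose name contains matfp and run three samples of each.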
Performance
- All operations run in constant time
- Zero heap allocations
- Pure Rust with minimal unsafe code
License
MIT OR Apache-2.0