amx-sys
Low-level AMX instruction emulation - Hardware-faithful implementation
This crate provides direct bindings to all 23 AMX (Apple Matrix eXtensions) instructions with a faithful emulation of the Apple silicon behavior.
Features
- ✅ All 23 AMX instructions
- ✅ Complete register file emulation (8 X, 8 Y, 64 Z registers)
- ✅ Full data type support (i8-i64, u8-u64, f32, f64)
- ✅ All element sizes (B8, B16, B32, B64)
- ✅ All shuffle modes (S0, S1, S2, S3)
- ✅ 100% parity with C reference implementation
- ✅ 100 tests (100% pass rate)
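To make the register file sizes above concrete, here is a minimal sketch of how such a layout could be modelled. The `RegisterFile` type, its field names, and the helper methods are hypothetical and are not taken from this crate's API; the 64-byte register width is an assumption based on the commonly documented Apple silicon AMX layout.

```rust
/// Bytes per register row. Assumption: 64 bytes, as commonly documented
/// for Apple silicon AMX; not confirmed by this crate's docs.
pub const REG_BYTES: usize = 64;

/// Hypothetical register file layout (NOT the crate's actual `AmxState`).
#[derive(Clone)]
pub struct RegisterFile {
    pub x: [[u8; REG_BYTES]; 8],  // 8 X registers
    pub y: [[u8; REG_BYTES]; 8],  // 8 Y registers
    pub z: [[u8; REG_BYTES]; 64], // 64 Z registers (accumulator rows)
}

impl RegisterFile {
    /// Create a zero-initialised register file.
    pub fn new() -> Self {
        Self {
            x: [[0; REG_BYTES]; 8],
            y: [[0; REG_BYTES]; 8],
            z: [[0; REG_BYTES]; 64],
        }
    }

    /// Total size of the emulated state in bytes.
    pub fn size_bytes(&self) -> usize {
        std::mem::size_of::<Self>()
    }

    /// View X register `i` as 16 little-endian f32 lanes.
    pub fn x_as_f32(&self, i: usize) -> [f32; 16] {
        let mut out = [0.0f32; 16];
        for (lane, chunk) in self.x[i].chunks_exact(4).enumerate() {
            out[lane] = f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        }
        out
    }
}
```

Under these assumptions the whole state is a fixed-size value of 5,120 bytes (512 for X, 512 for Y, 4,096 for Z), so it can live on the stack or inside another struct without any allocation.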
Instructions
Load/Store (8)
- ldx, ldy, ldz, ldzi - Load X, Y, Z registers
- stx, sty, stz, stzi - Store registers
Extract (2)
- extrx - Extract from X register with shuffle
- extry - Extract from Y register with shuffle
Floating-Point (6)
- fma16, fma32, fma64 - Multiply-accumulate
- fms16, fms32, fms64 - Multiply-subtract
Integer (2)
- mac16 - Signed 16-bit multiply-accumulate
- mac16_unsigned - Unsigned multiply-accumulate
Vector (2)
- vecint - Vector integer operations
- vecfp - Vector floating-point operations
Matrix (2)
- matint - Matrix integer operations
- matfp - Matrix floating-point operations
Lookup Table (1)
- genlut - Generate lookup table
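As an illustration of what the floating-point multiply-accumulate family computes, the sketch below models a simplified 32-bit outer-product update, where every Z element accumulates the product of one X lane and one Y lane. The function name `fma32_outer`, the 16x16 `f32` grid, and the `subtract` flag are assumptions made for this example; the real instructions take an encoded operand that selects registers, offsets, and modes, none of which is modelled here.

```rust
/// Number of f32 lanes in one register, assuming 64-byte registers.
pub const LANES: usize = 16;

/// Simplified outer-product multiply-accumulate (hypothetical helper,
/// not the crate's `fma32`): z[j][i] += x[i] * y[j].
/// With `subtract` set, the product is negated first, modelling the
/// multiply-subtract variant as z[j][i] -= x[i] * y[j] (an assumption).
pub fn fma32_outer(
    z: &mut [[f32; LANES]; LANES],
    x: &[f32; LANES],
    y: &[f32; LANES],
    subtract: bool,
) {
    for j in 0..LANES {
        for i in 0..LANES {
            let xi = if subtract { -x[i] } else { x[i] };
            // mul_add performs a fused multiply-add with a single rounding.
            z[j][i] = xi.mul_add(y[j], z[j][i]);
        }
    }
}
```

Calling `fma32_outer` repeatedly on the same `z` accumulates successive rank-1 updates, which is the building block of a matrix multiply.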
Usage
use AmxState;
use *;
use fma32;
let mut state = new;
// Load data
let data = ;
ldx;
ldy;
ldz;
// Perform FMA
fma32;
// Store result
let result = stx;
Testing
All 23 instructions are thoroughly tested with:
- 100 test cases (100% pass rate)
- Randomized inputs (xoshiro256++ RNG)
- All data types and element sizes
- 1,000+ iterations of stress testing
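For readers unfamiliar with the generator named above, here is a small self-contained sketch of xoshiro256++ following the public reference algorithm, with SplitMix64 used to expand a single seed. The type and method names are hypothetical, and the crate's own test harness may seed or consume the generator differently.

```rust
/// Minimal xoshiro256++ generator (illustrative; names are hypothetical).
pub struct Xoshiro256PlusPlus {
    s: [u64; 4],
}

impl Xoshiro256PlusPlus {
    /// Build a generator from an explicit state, which must not be all zero.
    pub fn from_state(s: [u64; 4]) -> Self {
        assert!(s != [0; 4], "state must not be all zero");
        Self { s }
    }

    /// Expand one 64-bit seed into the 256-bit state using SplitMix64.
    pub fn seed_from_u64(seed: u64) -> Self {
        let mut sm = seed;
        let mut s = [0u64; 4];
        for slot in s.iter_mut() {
            sm = sm.wrapping_add(0x9e3779b97f4a7c15);
            let mut z = sm;
            z = (z ^ (z >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94d049bb133111eb);
            *slot = z ^ (z >> 31);
        }
        Self { s }
    }

    /// Produce the next 64-bit output and advance the state.
    pub fn next_u64(&mut self) -> u64 {
        let result = self.s[0]
            .wrapping_add(self.s[3])
            .rotate_left(23)
            .wrapping_add(self.s[0]);
        let t = self.s[1] << 17;
        self.s[2] ^= self.s[0];
        self.s[3] ^= self.s[1];
        self.s[1] ^= self.s[2];
        self.s[0] ^= self.s[3];
        self.s[2] ^= t;
        self.s[3] = self.s[3].rotate_left(45);
        result
    }

    /// Fill a buffer (for example a 64-byte register image) with random bytes.
    pub fn fill_bytes(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}
```

Because the sequence depends only on the seed, a failing randomized case can be replayed exactly by reusing the same seed, which is the main reason to prefer a fixed, seedable generator in emulator parity tests.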
Run tests:
cargo test
Benchmarking
Run the comprehensive IO-vs-compute benchmark:
The benchmark reports CSV-style rows with separate io_only_ns, compute_only_ns, and end_to_end_ns timings for:
- Extract shuffle modes S0-S3
- FMA and FMS precisions
- MAC16 signed and unsigned
- VECINT and MATINT for all signed and unsigned element sizes
- VECFP and MATFP for all floating-point precisions
- GENLUT for all supported element sizes
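The three timing columns can be understood as three separate measurements of the same case: the load/store phase alone, the arithmetic alone, and both together. The sketch below shows one way to produce such a row using only the standard library. `BenchRow`, `median_ns`, `bench_case`, the column order, and the use of a median are all assumptions for illustration; the crate's benchmark may measure and format its rows differently.

```rust
use std::time::Instant;

/// One CSV-style result row (hypothetical layout).
pub struct BenchRow {
    pub name: String,
    pub io_only_ns: u128,
    pub compute_only_ns: u128,
    pub end_to_end_ns: u128,
}

impl BenchRow {
    /// Format as `name,io_only_ns,compute_only_ns,end_to_end_ns`.
    pub fn to_csv(&self) -> String {
        format!(
            "{},{},{},{}",
            self.name, self.io_only_ns, self.compute_only_ns, self.end_to_end_ns
        )
    }
}

/// Run `f` `samples` times (at least once) and return the median time in ns.
pub fn median_ns<F: FnMut()>(samples: usize, mut f: F) -> u128 {
    let mut times: Vec<u128> = (0..samples.max(1))
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_nanos()
        })
        .collect();
    times.sort_unstable();
    times[times.len() / 2]
}

/// Time the IO phase, the compute phase, and both back to back.
pub fn bench_case<I: FnMut(), C: FnMut()>(
    name: &str,
    samples: usize,
    mut io: I,
    mut compute: C,
) -> BenchRow {
    let io_only_ns = median_ns(samples, &mut io);
    let compute_only_ns = median_ns(samples, &mut compute);
    let end_to_end_ns = median_ns(samples, || {
        io();
        compute();
    });
    BenchRow {
        name: name.to_string(),
        io_only_ns,
        compute_only_ns,
        end_to_end_ns,
    }
}
```

Comparing `end_to_end_ns` against the sum of the other two columns gives a rough sense of whether a case is dominated by moving data in and out of the registers or by the arithmetic itself.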
Useful filters:
AMX_BENCH_FILTER=matfp AMX_BENCH_SAMPLES=3
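The two environment variables above could be consumed along the following lines. This is a sketch under stated assumptions: `BenchConfig`, `parse_config`, `config_from_env`, the default sample count, and the substring-matching rule are hypothetical and are not taken from the crate's benchmark source.

```rust
use std::env;

/// Assumed fallback when AMX_BENCH_SAMPLES is unset or invalid (hypothetical).
pub const DEFAULT_SAMPLES: usize = 10;

/// Benchmark settings derived from the environment (hypothetical type).
pub struct BenchConfig {
    pub filter: Option<String>,
    pub samples: usize,
}

/// Pure parsing step, kept separate from `env::var` so it is easy to test.
pub fn parse_config(filter: Option<String>, samples: Option<String>) -> BenchConfig {
    BenchConfig {
        // An empty filter string is treated the same as no filter.
        filter: filter.filter(|s| !s.is_empty()),
        // Non-numeric or zero sample counts fall back to the default.
        samples: samples
            .and_then(|s| s.parse::<usize>().ok())
            .filter(|&n| n > 0)
            .unwrap_or(DEFAULT_SAMPLES),
    }
}

/// Read AMX_BENCH_FILTER and AMX_BENCH_SAMPLES from the process environment.
pub fn config_from_env() -> BenchConfig {
    parse_config(
        env::var("AMX_BENCH_FILTER").ok(),
        env::var("AMX_BENCH_SAMPLES").ok(),
    )
}

impl BenchConfig {
    /// Assumption: a case is selected when its name contains the filter text.
    pub fn selects(&self, case_name: &str) -> bool {
        self.filter.as_deref().map_or(true, |f| case_name.contains(f))
    }
}
```

With `AMX_BENCH_FILTER=matfp AMX_BENCH_SAMPLES=3`, this sketch would keep only cases whose names contain `matfp` and take three samples for each.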
Performance
- All operations run in constant time
- Zero heap allocations
- Pure Rust with minimal unsafe code
License
MIT OR Apache-2.0