amx-sys 0.0.1

Low-level AMX (Apple Matrix eXtensions) instruction emulation - hardware-faithful implementation

This crate provides Rust functions for all 23 AMX (Apple Matrix eXtensions) instructions. Each function is a software emulation that faithfully reproduces the behavior of Apple silicon.

Features

  • ✅ All 23 AMX instructions
  • ✅ Complete register file emulation (8 X, 8 Y, 64 Z registers)
  • ✅ Full data type support (i8-i64, u8-u64, f32, f64)
  • ✅ All element sizes (B8, B16, B32, B64)
  • ✅ All shuffle modes (S0, S1, S2, S3)
  • ✅ 100% parity with C reference implementation
  • ✅ 100 tests (100% pass rate)
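
To make the register file and element sizes concrete, here is a minimal sketch of the layout. It assumes every register is 64 bytes wide, matching the 64-byte buffers in the usage example. `RegisterFile` and `ElementSize` are hypothetical names for illustration and are not the crate's own `AmxState` type.

```rust
/// Assumed width of one register in bytes.
const REG_BYTES: usize = 64;

/// Hypothetical stand-in for the emulated register file.
pub struct RegisterFile {
    pub x: [[u8; REG_BYTES]; 8],  // 8 X registers
    pub y: [[u8; REG_BYTES]; 8],  // 8 Y registers
    pub z: [[u8; REG_BYTES]; 64], // 64 Z registers (accumulator rows)
}

impl RegisterFile {
    /// Creates a register file with every byte cleared.
    pub fn new() -> Self {
        RegisterFile {
            x: [[0; REG_BYTES]; 8],
            y: [[0; REG_BYTES]; 8],
            z: [[0; REG_BYTES]; 64],
        }
    }

    /// Total architectural state in bytes.
    pub fn size_in_bytes(&self) -> usize {
        (self.x.len() + self.y.len() + self.z.len()) * REG_BYTES
    }
}

/// Element sizes named after the feature list (B8 through B64).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ElementSize {
    B8,
    B16,
    B32,
    B64,
}

impl ElementSize {
    /// Bytes occupied by one element.
    pub fn bytes(self) -> usize {
        match self {
            ElementSize::B8 => 1,
            ElementSize::B16 => 2,
            ElementSize::B32 => 4,
            ElementSize::B64 => 8,
        }
    }

    /// Number of elements (lanes) that fit in one register.
    pub fn lanes(self) -> usize {
        REG_BYTES / self.bytes()
    }
}
```

Under these assumptions the whole state occupies 80 registers of 64 bytes each, and a register holds 64, 32, 16, or 8 lanes depending on the element size.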

Instructions

Load/Store (8)

  • ldx, ldy, ldz, ldzi - Load X, Y, Z registers
  • stx, sty, stz, stzi - Store registers

Extract (2)

  • extrx - Extract from X register with shuffle
  • extry - Extract from Y register with shuffle

Floating-Point (6)

  • fma16, fma32, fma64 - Multiply-accumulate
  • fms16, fms32, fms64 - Multiply-subtract
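
As a rough illustration of the difference between multiply-accumulate and multiply-subtract, the sketch below works on 16 f32 lanes decoded from a 64-byte register. It is a simplified element-wise model with little-endian lanes, not the crate's implementation; the helper names are hypothetical, and the real instructions also support modes not shown here.

```rust
/// Number of f32 lanes in one 64-byte register.
const LANES: usize = 16;

/// Decodes a 64-byte register into little-endian f32 lanes.
fn decode_f32(reg: &[u8; 64]) -> [f32; LANES] {
    let mut out = [0.0f32; LANES];
    for (lane, chunk) in reg.chunks_exact(4).enumerate() {
        out[lane] = f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    out
}

/// Encodes f32 lanes back into a 64-byte register.
fn encode_f32(lanes: &[f32; LANES]) -> [u8; 64] {
    let mut out = [0u8; 64];
    for (lane, value) in lanes.iter().enumerate() {
        out[lane * 4..lane * 4 + 4].copy_from_slice(&value.to_le_bytes());
    }
    out
}

/// Multiply-accumulate: z[i] = z[i] + x[i] * y[i], fused per lane.
fn fma_lanes(x: &[f32; LANES], y: &[f32; LANES], z: &mut [f32; LANES]) {
    for i in 0..LANES {
        z[i] = x[i].mul_add(y[i], z[i]);
    }
}

/// Multiply-subtract: z[i] = z[i] - x[i] * y[i], fused per lane.
fn fms_lanes(x: &[f32; LANES], y: &[f32; LANES], z: &mut [f32; LANES]) {
    for i in 0..LANES {
        z[i] = (-x[i]).mul_add(y[i], z[i]);
    }
}
```

With x = 2, y = 3, and z = 1 in every lane, `fma_lanes` leaves 7 in each lane and `fms_lanes` leaves -5.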

Integer (2)

  • mac16 - Signed 16-bit multiply-accumulate
  • mac16_unsigned - Unsigned multiply-accumulate
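
The signed and unsigned variants differ only in how the same 16 bits are interpreted before multiplying. The following sketch shows that for a single lane. The function names and the 32-bit accumulator width are assumptions made for illustration and are not taken from the crate.

```rust
/// Signed multiply-accumulate on one lane: the 16-bit patterns are
/// reinterpreted as i16 and sign-extended before multiplying.
fn mac16_signed_lane(acc: i32, x_bits: u16, y_bits: u16) -> i32 {
    let product = (x_bits as i16 as i32) * (y_bits as i16 as i32);
    acc.wrapping_add(product)
}

/// Unsigned multiply-accumulate on one lane: the same bit patterns are
/// zero-extended instead.
fn mac16_unsigned_lane(acc: u32, x_bits: u16, y_bits: u16) -> u32 {
    let product = (x_bits as u32) * (y_bits as u32);
    acc.wrapping_add(product)
}
```

For the bit pattern 0xFFFF multiplied by 2, the signed variant accumulates -2 (0xFFFF is -1 as an i16) while the unsigned variant accumulates 131070.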

Vector (2)

  • vecint - Vector integer operations
  • vecfp - Vector floating-point operations

Matrix (2)

  • matint - Matrix integer operations
  • matfp - Matrix floating-point operations
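
AMX matrix operations are usually described as outer-product accumulation: every element of X is multiplied by every element of Y and added into a grid of accumulators. The sketch below models that for 16 f32 lanes. It is a simplified illustration with a hypothetical function name; the mapping of grid rows onto Z registers and the instruction's other modes are left out.

```rust
/// Number of f32 lanes per operand register.
const N: usize = 16;

/// Accumulates the outer product of x and y: z[j][i] += x[i] * y[j].
fn outer_product_accumulate(x: &[f32; N], y: &[f32; N], z: &mut [[f32; N]; N]) {
    for (j, row) in z.iter_mut().enumerate() {
        for (i, cell) in row.iter_mut().enumerate() {
            *cell += x[i] * y[j];
        }
    }
}
```

Calling the function twice with the same operands doubles every accumulator, which is what makes it suitable as the inner step of a tiled matrix multiply.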

Lookup Table (1)

  • genlut - Generate lookup table
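
The general idea of a table lookup can be sketched in a few lines: each index selects one entry from a small table. This is only a conceptual model with a hypothetical function name. It does not reproduce the index packing, table layout, or modes of the actual genlut instruction.

```rust
/// Replaces each index with the table entry it selects.
/// Returns None if any index falls outside the table.
fn lookup_lanes(indices: &[u8], table: &[f32]) -> Option<Vec<f32>> {
    indices
        .iter()
        .map(|&index| table.get(index as usize).copied())
        .collect()
}
```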

Usage

use amx_sys::registers::AmxState;
use amx_sys::instructions::ldst::*;
use amx_sys::instructions::fma::fma32;

let mut state = AmxState::new();

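// Load 64 bytes into register 0 of X, Y, and Z
let data = [0u8; 64];
ldx(&mut state, 0, &data);
ldy(&mut state, 0, &data);
ldz(&mut state, 0, &data);

// Perform a 32-bit floating-point multiply-accumulate
fma32(&mut state, 0, 0, 0);

// Read back X register 0 (the accumulated result lives in Z; use a Z store such as stz to retrieve it)
let result = stx(&state, 0);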

Testing

All 23 instructions are covered by the test suite, which includes:

  • 100 test cases (100% pass rate)
  • Randomized inputs (xoshiro256++ RNG)
  • All data types and element sizes
  • 1,000+ iterations of stress testing
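
For readers unfamiliar with xoshiro256++, here is a self-contained sketch of the generator and of how it could fill a 64-byte register with reproducible random data. Seeding via SplitMix64 and the `fill_register` helper are assumptions for illustration; the crate's test harness may seed and consume the generator differently.

```rust
/// SplitMix64 step, commonly used to expand one seed into xoshiro state.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// xoshiro256++ generator with 256 bits of state.
pub struct Xoshiro256PlusPlus {
    s: [u64; 4],
}

impl Xoshiro256PlusPlus {
    /// Builds a generator from explicit state (must not be all zeros).
    pub fn from_state(s: [u64; 4]) -> Self {
        Xoshiro256PlusPlus { s: s }
    }

    /// Builds a generator from a single 64-bit seed via SplitMix64.
    pub fn from_seed(seed: u64) -> Self {
        let mut sm = seed;
        let mut s = [0u64; 4];
        for word in s.iter_mut() {
            *word = splitmix64(&mut sm);
        }
        Xoshiro256PlusPlus { s: s }
    }

    /// Returns the next 64-bit output and advances the state.
    pub fn next_u64(&mut self) -> u64 {
        let s = &mut self.s;
        let result = s[0].wrapping_add(s[3]).rotate_left(23).wrapping_add(s[0]);
        let t = s[1] << 17;
        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = s[3].rotate_left(45);
        result
    }

    /// Fills one 64-byte register image with random bytes (hypothetical helper).
    pub fn fill_register(&mut self) -> [u8; 64] {
        let mut reg = [0u8; 64];
        for chunk in reg.chunks_exact_mut(8) {
            chunk.copy_from_slice(&self.next_u64().to_le_bytes());
        }
        reg
    }
}
```

Because the generator is deterministic, a failing randomized test can be reproduced by rerunning it with the same seed.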

Run tests:

cargo test

Benchmarking

Run the comprehensive IO-vs-compute benchmark:

cargo bench -p amx-sys --bench io_vs_compute

The benchmark reports CSV-style rows with separate io_only_ns, compute_only_ns, and end_to_end_ns timings for:

  • Extract shuffle modes S0-S3
  • FMA and FMS precisions
  • MAC16 signed and unsigned
  • VECINT and MATINT for all signed and unsigned element sizes
  • VECFP and MATFP for all floating-point precisions
  • GENLUT for all supported element sizes
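
To post-process the timings, a small parser along these lines could compute how much of the end-to-end time goes to IO. The row layout assumed here (a name followed by the three timing columns, comma-separated) is a guess based on the column names above; check the real benchmark output before relying on it. `BenchRow`, `parse_row`, and `io_share` are hypothetical names.

```rust
/// One benchmark row in the assumed layout:
/// name,io_only_ns,compute_only_ns,end_to_end_ns
#[derive(Debug, PartialEq)]
struct BenchRow {
    name: String,
    io_only_ns: u64,
    compute_only_ns: u64,
    end_to_end_ns: u64,
}

/// Parses one comma-separated row. Returns None for header lines or
/// anything else that does not match the assumed layout.
fn parse_row(line: &str) -> Option<BenchRow> {
    let mut fields = line.split(',').map(str::trim);
    let name = fields.next()?.to_string();
    let io_only_ns = fields.next()?.parse().ok()?;
    let compute_only_ns = fields.next()?.parse().ok()?;
    let end_to_end_ns = fields.next()?.parse().ok()?;
    Some(BenchRow {
        name: name,
        io_only_ns: io_only_ns,
        compute_only_ns: compute_only_ns,
        end_to_end_ns: end_to_end_ns,
    })
}

/// Fraction of the end-to-end time spent on IO, if the total is non-zero.
fn io_share(row: &BenchRow) -> Option<f64> {
    if row.end_to_end_ns == 0 {
        None
    } else {
        Some(row.io_only_ns as f64 / row.end_to_end_ns as f64)
    }
}
```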

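The environment variables AMX_BENCH_FILTER and AMX_BENCH_SAMPLES narrow a run, for example to the matfp benchmarks with 3 samples: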

AMX_BENCH_FILTER=matfp AMX_BENCH_SAMPLES=3 cargo bench -p amx-sys --bench io_vs_compute

Performance

  • All operations run in constant time
  • Zero heap allocations
  • Pure Rust with minimal unsafe code

License

MIT OR Apache-2.0