hardware 0.0.5

A no_std bare-metal hardware abstraction layer — all port I/O, memory, and swap allocations are guarded at runtime. Do not consider this dependency stable before x.1.x.
Documentation

hardware

A no_std Rust crate for bare-metal hardware abstraction. Zero dependencies, no allocator, no standard library — raw syscalls and direct hardware access, with runtime safety guards.

Warning

This crate is safe, but it should be used with caution and should not be considered a stable dependency before x.1.x.

This crate is safe to use on any host — it will not crash, panic, or cause undefined behavior, even when called without setup. However:

  • Do not consider this dependency stable before x.1.x. The public API, module layout, and behavior may change without notice in 0.0.x releases.
  • Use with caution. The crate interacts directly with hardware (port I/O, DMA, MMIO, GPU). Understand what each call does before integrating it into your project.
  • The stress tests push hardware hard (100% RAM, 50% swap, GPU command submissions). Run them on a dev machine, not in production.

Safety guarantees

  • Hardware privilege guard: All port I/O (inb/outb/inl/outl) is gated by an internal AtomicBool (HW_PRIVILEGE). Without privilege enabled, reads return 0xFF and writes are no-ops. No SIGSEGV.
  • Zero expect() / unwrap() in library code — every fallible path returns Option, bool, or degrades gracefully.
  • Guardian: Memory allocations are capped at 80% of total RAM, swap at 50%. DMA and IRQ resources are gated similarly.
  • GPU via DRM: GPU access works without root via /dev/dri/renderD128. PCI scan is skipped when I/O privilege is not available.
  • Zero clippy warnings, zero static mut in library code.
  • Zero #[cfg], zero cfg!(), zero build.rs — all code compiles unconditionally for every target.
  • Zero extern "C", zero #[no_mangle] — no C ABI dependency anywhere, pure Rust.
  • Zero dead stubs — every hardware operation dispatches through injectable function pointers (OnceCopy) or MMIO base addresses (AtomicUsize).
  • Single public API: only pub mod sys is exported. All 35 internal modules are mod (private). External access goes through hardware::sys::*.

Architecture

Architecture-specific implementations are dispatched at runtime through a shim layer (OnceCopy<fn(...)> function pointers registered at init). No conditional compilation, no platform-specific code paths at the type level.

Supported architectures:

  • x86_64 — CPUID, MSR, IO ports, TSC, syscall
  • aarch64 — MIDR, system registers, MMIO, GIC, MMU

Shim pattern

The arch/shim module holds 8 global OnceCopy function pointers for every arch-dependent operation:

  • CPUID: fn(u32, u32) -> Option<(u32, u32, u32, u32)>
  • Read MSR: fn(u32) -> Option<u64>
  • MMIO read 32: fn(usize) -> Option<u32>
  • MMIO write 32: fn(usize, u32) -> bool
  • Read MIDR (aarch64): fn() -> Option<u64>
  • Exit: fn(i32) -> !
  • Mkdir: fn(&[u8], u32) -> i64
  • Scan dir: fn(&[u8], &mut [DirEntry]) -> usize

On first use, init_shims() calls both x86_64 and aarch64 init functions. Each registers its implementations into the shared OnceCopy statics. The raw syscall handler is provided by the caller (test or application) via set_raw_syscall_fn().

Syscall numbers are stored in 31 AtomicI64 statics (including iopl), set once at init from a SyscallNrTable struct. All default to ERR_NOT_IMPLEMENTED (-1) — no Linux-specific magic numbers.

Modules

All modules are private. The sole public API is hardware::sys.

sys — Public API gateway

Re-exports everything the caller needs: syscalls, architecture detection, hardware access, runtime HALs, and all subsystem mirrors. All access to the crate goes through hardware::sys::*.

arch — Architecture abstraction

Shims, runtime arch detection (detect_arch() returns Architecture::{X86_64, AArch64, Unknown}), per-arch implementations for CPUID, MSR, MMIO, syscall, system registers.

syscall — Unified syscall layer

All syscalls go through shim::raw_syscall() which dispatches to the registered handler. 31 syscalls: read, write, openat, close, mmap, munmap, ioctl, sched_yield, nanosleep, clone, exit, wait4, kill, fsync, unlinkat, getdents64, clock_gettime, sched_setaffinity, sched_getaffinity, stat, socket, connect, accept, bind, listen, execve, fcntl, getcwd, rt_sigaction, iopl. Also provides monotonic_ns() and no_std formatting helpers.

cpu — Detection and features

detect_cpu_info() returns vendor, model name, physical/logical cores, threads per core, frequency, L1/L2/L3 cache sizes, HyperThreading flag. Core count detection is vendor-specific (Intel CPUID 0x0B, AMD CPUID 0x80000008 + 0x8000001E, fallback leaf 0x04), then overridden by the OS affinity count via sched_getaffinity (128-byte mask, 1024 CPUs max) to handle multi-socket systems. detect_cores() returns per-thread frequency from sysfs. has_feature("sse") queries CPUID for individual feature flags.

gpu — DRM GPU access

Opens /dev/dri/renderD128 (fallback /dev/dri/card0), identifies driver via DRM_IOCTL_VERSION. Supports radeon, amdgpu, nouveau, i915. For radeon: device ID, VRAM size/usage, shader engines, active CUs, clock speeds, temperature via DRM_IOCTL_RADEON_INFO. GEM buffer allocation/mmap, command submission via DRM_IOCTL_RADEON_CS with auto-detection of VM flag. GPU detection falls back through: sysfs PCI class scan → PCI direct enumeration → VGA status port.

firmware — ACPI, UEFI, SMBIOS, DeviceTree

ACPI: RSDP signature scan in 0xE0000–0x100000, RSDT/XSDT parsing, FADT/MADT/DMAR extraction. UEFI: /sys/firmware/efi/runtime probe, runtime services table, memory map, GOP info. SMBIOS: DMI tables or _SM_ scan, Type 0 (BIOS), Type 4 (CPU), Type 17 (memory modules). DeviceTree: FDT magic 0xD00DFEED, token stream walker, node enumeration, reg/IRQ/compatible extraction.

bus — PCI/PCIe enumeration

Config space via I/O ports 0xCF8/0xCFC. Full 256×32×8 bus scan. BAR size probing. Device classification by PCI class. IRQ line extraction.

memory — Physical, virtual, heap, NUMA

detect_memory_info() via sysinfo syscall. Submodules: frame allocator (phys), virtual address management (virt), cache coherence (cache), slab/buddy/bump allocators (heap), NUMA node awareness (numa).

interrupt — IDT, APIC, GIC

256-entry handler table. Architecture dispatch: x86_64 → PIC/APIC, aarch64 → GIC. Per-vector register(), enable(), disable(), ack().

dma — Ring buffer engine

128-entry descriptor ring with atomic head/tail. submit()/drain() for descriptor management. DmaBuffer via bump allocator. IOMMU-aware submission.

iommu — Intel VT-d / ARM SMMU

IOVA space 0x1_0000_0000–0x2_0000_0000. 64-entry mapping table. Auto-detected from ACPI DMAR or devicetree.

power — DVFS, governors, thermal

CPU frequency from sysfs. Thermal via MSR 0x19C. reboot() via port 0x64, shutdown() via port 0x604.

topology — Socket/core/thread enumeration

Socket count, cores per socket, threads per core. Intel CPUID 0x0B, AMD CPUID 0x80000008 + 0x8000001E, fallback leaf 0x04.

tpu / lpu — Accelerator abstractions

Global singletons via Once. DMA-based data transfer, task submission, IRQ shims.

common — Zero-alloc primitives

OnceCopy<T> (lock-free set-once via CAS), Once<T>, BitField, Registers (32-entry AtomicUsize bank), Volatile, alignment/atomic/barrier/endian helpers.

init — Boot sequence

init() runs 17 phases: shims → config → common → firmware → memory → interrupts → bus → DMA → IOMMU → CPU → security → discovery → timers → accelerators → topology → debug → power.

Other modules

net (ethernet, IPv4, TCP), security (enclaves, isolation, speculation mitigations), thermal, timer (HPET, ARM generic, PIT, clockevent/clocksource), debug (perf counters, tracing), audio, camera, display, input, modem, nfc, sensor, storage, usb.

Tests

cargo test --test detect_all -- --nocapture

11 tests: architecture, CPU (vendor/model/cores/caches/HT), per-core frequencies, topology, system topology, RAM, GPU, PCI device summary, CPU features (SSE/SSE2), power governor, full hardware summary.

cargo test --test stress_sequential -- --nocapture

7 sequential phases with guardian enforcement (attempts 100%, guardian caps):

  1. CPU: fork workers to 100% of cores — guardian caps at 80%
  2. RAM: allocate 100% of available — guardian caps at 80%
  3. Disk I/O: 512 MB write/read, 4 MB chunks, throughput measurement
  4. Cache: L3 thrashing (70% of L3, 10 stride passes)
  5. Context switching: yield workers at 100% — guardian caps at 80%
  6. Swap: allocate 100% of swap — guardian caps at 50%
  7. GPU: DRM device open, VRAM stress (GEM alloc + write + verify), NOP command submission (10,000 batches)

License

MIT