hardware
A no_std Rust crate for bare-metal hardware abstraction. Zero dependencies, no allocator, no standard library — only raw syscalls and direct hardware access.
Warning
Do not run the tests on a production machine. No internal safety mechanisms are implemented yet — the crate performs raw syscalls, direct MMIO access, and hardware register manipulation without any sandboxing or privilege checks. Modifying the stress tests may destabilize or crash your system.
Architecture
All code compiles unconditionally for every target — no #[cfg], no feature gates, no build.rs. Architecture-specific implementations (x86_64 inline asm, aarch64 stubs) are dispatched at runtime through a callback/shim layer (OnceCopy<fn(...)> function pointers registered at init).
Supported architectures:
- x86_64 — fully functional (CPUID, MSR, IO ports, TSC, syscall asm)
- aarch64 — stubs in place (MIDR, system registers, MMIO, GIC), ready for real implementations
Shim pattern
The arch/shim.rs module holds global OnceCopy statics for every arch-dependent operation (CPUID, MSR read, MMIO read/write, raw syscall, exit, MIDR read). On first use, init_shims() calls both x86_64 and aarch64 init functions — the first architecture that successfully claims set_raw_syscall_fn() wins and registers its syscall numbers, exit handler, and all callbacks. The losing arch is skipped entirely — no runtime overhead, no conditional compilation.
Syscall numbers are stored in 17 AtomicI64 statics, set once at init from a SyscallNrTable struct (x86_64: read=0, write=1, open=2, ... vs aarch64: read=63, write=64, open=56, ...).
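The per-arch syscall-number registration can be sketched like this (a minimal std-runnable sketch with three of the 17 numbers; the field and function names here are illustrative, not the crate's actual API):

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Hypothetical sketch: each syscall number lives in an AtomicI64, -1 = unset.
static NR_READ: AtomicI64 = AtomicI64::new(-1);
static NR_WRITE: AtomicI64 = AtomicI64::new(-1);
static NR_OPEN: AtomicI64 = AtomicI64::new(-1);

struct SyscallNrTable { read: i64, write: i64, open: i64 }

// The winning arch's init publishes its table once.
fn register(table: &SyscallNrTable) {
    NR_READ.store(table.read, Ordering::Release);
    NR_WRITE.store(table.write, Ordering::Release);
    NR_OPEN.store(table.open, Ordering::Release);
}

fn nr_write() -> i64 { NR_WRITE.load(Ordering::Acquire) }

fn main() {
    // x86_64 numbers from above; aarch64 init would register 63/64/56 instead.
    register(&SyscallNrTable { read: 0, write: 1, open: 2 });
    assert_eq!(nr_write(), 1);
    println!("write syscall nr = {}", nr_write());
}
```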
Modules
sys — Unified syscall layer
All syscalls go through shim::raw_syscall() which dispatches to the registered arch asm. Implemented syscalls: read, write, open, close, mmap (anon + device), munmap, ioctl, nanosleep, fork, exit, waitpid, kill, fsync, unlink, getdents64, clock_gettime. Also provides monotonic_ns() (nanosecond timer via CLOCK_MONOTONIC) and no_std formatting helpers (fmt_u64 with 20-byte buffer, write_stderr).
cpu — Detection and features
detect_cpu_info() returns a CpuInfo struct containing: architecture, vendor (GenuineIntel/AuthenticAMD), model name (48 bytes from CPUID 0x80000002-4), physical/logical cores, threads per core, frequency (MHz from CPUID 0x16 or sysfs fallback), L1/L2/L3 cache sizes, and HyperThreading flag.
Core count detection is vendor-specific:
- Intel: CPUID leaf 0x0B — SMT count from level 0, core count from level 1
- AMD: CPUID 0x80000008 for total logical cores + 0x8000001E for threads/compute-unit
- Fallback: CPUID leaf 0x04, core count = EAX[31:26] + 1
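The leaf-0x04 fallback reduces to a single bitfield extraction:

```rust
// CPUID leaf 0x04 encodes the maximum number of addressable core IDs in
// EAX bits 31:26; the count is that 6-bit field plus one.
fn cores_from_leaf4_eax(eax: u32) -> u32 {
    ((eax >> 26) & 0x3F) + 1
}

fn main() {
    // e.g. bits 31:26 = 7 encodes 8 addressable core IDs.
    let eax = 7u32 << 26;
    assert_eq!(cores_from_leaf4_eax(eax), 8);
    println!("{}", cores_from_leaf4_eax(eax));
}
```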
detect_cores() returns per-thread frequency via sysfs (/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq).
gpu — DRM/radeon GPU access
Opens /dev/dri/renderD128 (fallback /dev/dri/card0), identifies driver via DRM_IOCTL_VERSION. Supports radeon, amdgpu, nouveau, i915.
For radeon: queries device ID, VRAM size/usage, shader engine count, active CU count, GPU/memory clock, temperature via DRM_IOCTL_RADEON_INFO. GEM buffer allocation (DRM_IOCTL_RADEON_GEM_CREATE) and mmap for command submission.
Command submission via DRM_IOCTL_RADEON_CS with auto-detection of RADEON_CS_USE_VM flag — probe_cs_packet_size() does a binary search from 8192 downward to find the maximum working buffer size, testing with/without VM flag.
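The probe's binary search can be sketched like this (the real code submits a trial CS ioctl per step; here a `works` closure stands in for that kernel round-trip, and the function name is illustrative):

```rust
// Binary-search the largest buffer size in (0, 8192] that submission accepts.
// Invariant: `lo` is known-good (or 0), everything above `hi` is known-bad.
fn probe_max_size(works: impl Fn(u32) -> bool) -> u32 {
    let (mut lo, mut hi) = (0u32, 8192u32);
    while lo < hi {
        let mid = lo + (hi - lo + 1) / 2; // bias upward so the loop terminates
        if works(mid) { lo = mid; } else { hi = mid - 1; }
    }
    lo
}

fn main() {
    // Pretend the kernel rejects anything above 4096 bytes.
    let max = probe_max_size(|sz| sz <= 4096);
    assert_eq!(max, 4096);
    println!("max working CS size = {max}");
}
```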
GPU detection falls back through: sysfs PCI class scan → PCI direct enumeration → VGA status port (0x3DA) for legacy detection.
firmware — ACPI, UEFI, SMBIOS, DeviceTree
ACPI: Finds RSDP by signature scan in 0xE0000–0x100000, validates checksum, reads RSDT (rev < 2) or XSDT (rev >= 2). Parses FADT (SCI interrupt, SMI port, PM timer, reset register, flags), MADT (IOAPIC base address), and DMAR (Intel VT-d base).
UEFI: Probes /sys/firmware/efi/runtime or signature at 0x80000000. Reads runtime services table (get/set time, get/set variable, reset). Parses memory map descriptors and GOP framebuffer info.
SMBIOS: Reads /sys/firmware/dmi/tables/smbios_entry_point or scans 0xF0000–0xFFFFF for _SM_ signature. Parses Type 0 (BIOS info), Type 4 (CPU: socket, family, speed, core count), Type 17 (memory module: locator, size, speed, type).
DeviceTree: Validates FDT magic (0xD00DFEED), parses header, walks token stream (BEGIN_NODE/END_NODE/PROP/NOP/END) with 4-byte alignment. enumerate_nodes() returns up to 128 FdtNode entries with name, depth, and offset. Extracts DtDeviceEntry with reg base/size, IRQ, and compatible string.
bus — PCI/PCIe enumeration
Config space access via I/O ports 0xCF8/0xCFC (x86_64). Scans all 256 buses × 32 devices × 8 functions. BAR size probing writes 0xFFFFFFFF, reads back mask, restores original, calculates size from (!mask)+1. Classifies devices by PCI class (0x01=storage, 0x02=network, 0x03=GPU, 0x04=multimedia, 0x06=bridge, 0x0C=serial bus). Reads IRQ line from config offset 0x3C and registers with interrupt controller.
memory — Physical, virtual, heap, NUMA
detect_memory_info() uses the sysinfo syscall to return total/free/available bytes and swap info. Submodules provide: frame allocator and zone management (phys), virtual address management (virt), cache coherence abstractions (cache), slab/buddy/bump allocators (heap), NUMA node awareness (numa).
interrupt — IDT, APIC, GIC
256-entry handler table ([Option<fn()>; 256]). Architecture dispatch: x86_64 → PIC/APIC, aarch64 → GIC. register(vec, handler) stores function pointers. Controller supports enable()/disable()/ack() per vector.
dma — Ring buffer engine
128-entry descriptor ring buffer with atomic head/tail pointers. submit() enqueues descriptors, drain() dequeues completed ones. DmaBuffer allocated via bump allocator, phys_addr() returns pointer as physical address. If IOMMU is present, submit_buffer() maps through it first.
iommu — Intel VT-d / ARM SMMU
IOVA space from 0x1_0000_0000 to 0x2_0000_0000. Mapping table of 64 entries tracking IOVA↔physical translations. map_dma_buffer() allocates IOVA and stores mapping. translate_iova() does linear scan for reverse lookup. Auto-detected from ACPI DMAR (Intel VT-d) or devicetree (ARM SMMU).
power — DVFS, governors, thermal
Reads current CPU frequency from sysfs (scaling_cur_freq in kHz). Thermal monitoring via MSR 0x19C (x86 thermal status, bits 16-22 for digital readout). reboot() writes 0xFE to keyboard controller port 0x64. shutdown() writes 0x2000 to port 0x604.
topology — Socket/core/thread enumeration
Detects socket count, cores per socket, and threads per core. Uses CPUID 0x0B (Intel extended topology) or 0x80000008 + 0x8000001E (AMD) with fallback to CPUID leaf 0x04.
tpu / lpu — Accelerator abstractions
Global singletons via Once<TpuDevice>/Once<LpuDevice>. Each wraps a base address, initialized flag, and mode register. transfer()/submit_task() allocates DMA buffers, copies data, and submits through the DMA engine. IRQ shims (tpu_irq_shim/lpu_irq_shim) increment atomic counters.
common — Zero-alloc primitives
- OnceCopy<T>: lock-free set-once via CAS state machine (0→1→2)
- Once<T>: same for non-Copy types
- BitField: extract/insert bit ranges with mask_u64(), extract_u32/u64(), insert_u32/u64()
- Registers: 32-entry AtomicUsize bank with auto-increment write counter
- Volatile: read_volatile/write_volatile wrappers
- Alignment, Atomic, Barrier, Endian: standard bare-metal helpers
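The OnceCopy state machine can be sketched with a u64 payload (the real type is generic over Copy; this safe specialization shows the 0→1→2 protocol: only the thread winning the 0→1 CAS gets to store):

```rust
use std::sync::atomic::{AtomicU8, AtomicU64, Ordering};

// States: 0 = empty, 1 = being written, 2 = ready.
struct OnceCopyU64 { state: AtomicU8, value: AtomicU64 }

impl OnceCopyU64 {
    const fn new() -> Self {
        OnceCopyU64 { state: AtomicU8::new(0), value: AtomicU64::new(0) }
    }
    fn set(&self, v: u64) -> bool {
        if self.state
            .compare_exchange(0, 1, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            self.value.store(v, Ordering::Release);
            self.state.store(2, Ordering::Release); // publish
            true
        } else {
            false // another writer already claimed the cell
        }
    }
    fn get(&self) -> Option<u64> {
        (self.state.load(Ordering::Acquire) == 2)
            .then(|| self.value.load(Ordering::Acquire))
    }
}

fn main() {
    static CELL: OnceCopyU64 = OnceCopyU64::new();
    assert!(CELL.set(42));
    assert!(!CELL.set(7)); // second writer loses
    assert_eq!(CELL.get(), Some(42));
}
```

This "first CAS wins" behavior is exactly what lets init_shims() run both arch init functions and keep only the first registration.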
init — Boot sequence
init() runs in order: shims → config (SSE/AVX/NEON detection) → common (endian/alignment/bitfield self-test) → firmware (ACPI/UEFI/SMBIOS/DT) → memory → interrupts → bus (PCI) → DMA → IOMMU → CPU → security → discovery → timers → accelerators (GPU/TPU/LPU) → topology → debug → power → thermal.
Tests
cargo test --test detect_all -- --nocapture
11 tests: architecture, CPU info (vendor/model/cores/caches/HT), per-thread frequencies, topology (sockets/cores), system topology, RAM (total/available), GPU (vendor IDs), PCI device classification, CPU features (SSE/SSE2), power governor, full summary.
cargo test --test stress_sequential -- --nocapture
7 sequential phases: CPU load (70% of threads, 3s busy-loop with LCG), RAM pressure (70% of available, page-stride write/verify), disk I/O (512 MB write+read, 4 MB chunks, throughput in MB/s), L3 cache thrashing (70% of L3, 10 stride passes), context switching (70% workers yielding for 2s), swap pressure (70% of swap, monitor via sysinfo), and recovery.
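The busy-loop kernel of the CPU-load phase can be sketched with an LCG (the multiplier/increment constants below are the common Knuth MMIX pair, an assumption here; the point is a data-dependent chain the optimizer cannot elide):

```rust
// Linear congruential generator as CPU load: each step depends on the last,
// so the loop cannot be vectorized away or constant-folded across iterations.
fn lcg_spin(iterations: u64) -> u64 {
    let mut x: u64 = 0x1234_5678;
    for _ in 0..iterations {
        x = x
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
    }
    x
}

fn main() {
    let a = lcg_spin(1_000_000);
    assert_eq!(a, lcg_spin(1_000_000)); // deterministic
    assert_ne!(a, lcg_spin(999_999));   // full-period LCG: state advances every step
    println!("final state = {a:#x}");
}
```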
License
MIT