hardware 0.0.1

A no_std bare-metal hardware abstraction layer. WARNING: do not modify the tests; no internal safety is implemented yet, and the crate performs raw syscalls and direct hardware access with no sandboxing.
Documentation

hardware

A no_std Rust crate for bare-metal hardware abstraction. Zero dependencies, no allocator, no standard library — only raw syscalls and direct hardware access.

Warning

Do not run the tests on a production machine. No internal safety mechanisms are implemented yet: the crate performs raw syscalls, direct MMIO access, and hardware register manipulation without any sandboxing or privilege checks. If you modify the stress tests, you may destabilize or crash your system.

Architecture

All code compiles unconditionally for every target — no #[cfg], no feature gates, no build.rs. Architecture-specific implementations (x86_64 inline asm, aarch64 stubs) are dispatched at runtime through a callback/shim layer (OnceCopy<fn(...)> function pointers registered at init).

Supported architectures:

  • x86_64 — fully functional (CPUID, MSR, IO ports, TSC, syscall asm)
  • aarch64 — stubs in place (MIDR, system registers, MMIO, GIC), ready for real implementations

Shim pattern

The arch/shim.rs module holds global OnceCopy statics for every arch-dependent operation (CPUID, MSR read, MMIO read/write, raw syscall, exit, MIDR read). On first use, init_shims() calls both x86_64 and aarch64 init functions — the first architecture that successfully claims set_raw_syscall_fn() wins and registers its syscall numbers, exit handler, and all callbacks. The losing arch is skipped entirely — no runtime overhead, no conditional compilation.
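A minimal sketch of the first-claim-wins registration, using std's OnceLock as a stand-in for the crate's OnceCopy. The names (set_raw_syscall_fn, init_shims) mirror the description above, but the signatures and dummy bodies here are illustrative, not the crate's actual API:

```rust
use std::sync::OnceLock;

// Global set-once slot for the raw-syscall entry point (stand-in for OnceCopy<fn>).
static RAW_SYSCALL: OnceLock<fn(u64, u64) -> u64> = OnceLock::new();

fn x86_64_syscall(nr: u64, a0: u64) -> u64 { nr + a0 }  // dummy body for the sketch
fn aarch64_syscall(nr: u64, a0: u64) -> u64 { nr * a0 } // dummy body for the sketch

/// Try to claim the syscall slot; returns true if this arch won.
fn set_raw_syscall_fn(f: fn(u64, u64) -> u64) -> bool {
    RAW_SYSCALL.set(f).is_ok()
}

fn init_shims() {
    // Both init paths run unconditionally; only the first claim sticks.
    set_raw_syscall_fn(x86_64_syscall);
    set_raw_syscall_fn(aarch64_syscall); // loses: slot already claimed
}

fn main() {
    init_shims();
    let f = RAW_SYSCALL.get().expect("shims initialized");
    println!("{}", f(1, 2)); // dispatches through the winning arch's fn pointer
}
```

After the claim, every call site pays only one fn-pointer indirection, which is what makes the runtime dispatch cheap.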

Syscall numbers are stored in 17 AtomicI64 statics, set once at init from a SyscallNrTable struct (x86_64: read=0, write=1, open=2, ... vs aarch64: read=63, write=64, open=56, ...).
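The per-syscall atomics can be sketched like this (three of the 17 slots shown; field and static names are illustrative, with -1 marking "not yet set"):

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// One slot per syscall, -1 until init installs the arch's table.
static NR_READ: AtomicI64 = AtomicI64::new(-1);
static NR_WRITE: AtomicI64 = AtomicI64::new(-1);
static NR_OPEN: AtomicI64 = AtomicI64::new(-1);

struct SyscallNrTable { read: i64, write: i64, open: i64 }

const X86_64: SyscallNrTable = SyscallNrTable { read: 0, write: 1, open: 2 };
const AARCH64: SyscallNrTable = SyscallNrTable { read: 63, write: 64, open: 56 };

/// Copy one arch's numbers into the global slots (done once at init).
fn install(t: &SyscallNrTable) {
    NR_READ.store(t.read, Ordering::Relaxed);
    NR_WRITE.store(t.write, Ordering::Relaxed);
    NR_OPEN.store(t.open, Ordering::Relaxed);
}

fn main() {
    install(&X86_64);
    println!("read = {}", NR_READ.load(Ordering::Relaxed));
}
```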

Modules

sys — Unified syscall layer

All syscalls go through shim::raw_syscall() which dispatches to the registered arch asm. Implemented syscalls: read, write, open, close, mmap (anon + device), munmap, ioctl, nanosleep, fork, exit, waitpid, kill, fsync, unlink, getdents64, clock_gettime. Also provides monotonic_ns() (nanosecond timer via CLOCK_MONOTONIC) and no_std formatting helpers (fmt_u64 with 20-byte buffer, write_stderr).
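An allocation-free fmt_u64 along these lines explains the 20-byte buffer: u64::MAX is 18446744073709551615, exactly 20 digits. This is a sketch of the idea, not the crate's exact signature:

```rust
/// Format a u64 into a caller-provided buffer without allocating.
/// Digits are written back-to-front; the returned &str borrows the buffer.
fn fmt_u64(mut v: u64, buf: &mut [u8; 20]) -> &str {
    let mut i = buf.len();
    loop {
        i -= 1;
        buf[i] = b'0' + (v % 10) as u8;
        v /= 10;
        if v == 0 { break; }
    }
    core::str::from_utf8(&buf[i..]).unwrap()
}

fn main() {
    let mut buf = [0u8; 20];
    println!("{}", fmt_u64(u64::MAX, &mut buf));
}
```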

cpu — Detection and features

detect_cpu_info() returns a CpuInfo struct containing: architecture, vendor (GenuineIntel/AuthenticAMD), model name (48 bytes from CPUID 0x80000002-4), physical/logical cores, threads per core, frequency (MHz from CPUID 0x16 or sysfs fallback), L1/L2/L3 cache sizes, and HyperThreading flag.

Core count detection is vendor-specific:

  • Intel: CPUID leaf 0x0B — SMT count from level 0, core count from level 1
  • AMD: CPUID 0x80000008 for total logical cores + 0x8000001E for threads/compute-unit
  • Fallback: CPUID leaf 0x04 EAX[31:26]+1
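The fallback decode is a plain bit-field extraction; per the Intel SDM, EAX[31:26] of leaf 0x04 holds the maximum addressable core IDs per package minus one (the helper name here is illustrative):

```rust
/// Fallback core count from CPUID leaf 0x04:
/// bits 31:26 of EAX encode (max cores per package - 1).
fn cores_from_leaf4_eax(eax: u32) -> u32 {
    ((eax >> 26) & 0x3F) + 1
}

fn main() {
    // e.g. a value with EAX[31:26] = 7 decodes to 8 cores
    println!("{}", cores_from_leaf4_eax(7 << 26));
}
```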

detect_cores() returns per-thread frequency via sysfs (/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq).

gpu — DRM/radeon GPU access

Opens /dev/dri/renderD128 (fallback /dev/dri/card0), identifies driver via DRM_IOCTL_VERSION. Supports radeon, amdgpu, nouveau, i915.

For radeon: queries device ID, VRAM size/usage, shader engine count, active CU count, GPU/memory clock, temperature via DRM_IOCTL_RADEON_INFO. GEM buffer allocation (DRM_IOCTL_RADEON_GEM_CREATE) and mmap for command submission.

Command submission via DRM_IOCTL_RADEON_CS with auto-detection of RADEON_CS_USE_VM flag — probe_cs_packet_size() does a binary search from 8192 downward to find the maximum working buffer size, testing with/without VM flag.
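The size probe reduces to a classic "largest accepted value" binary search, assuming acceptance is monotonic (if a size works, every smaller size works). A generic sketch, with the ioctl probe abstracted into a predicate:

```rust
/// Binary-search the largest size in 1..=hi accepted by `works`,
/// assuming monotonic acceptance. Returns None if nothing works.
fn probe_max_size(hi: u32, works: impl Fn(u32) -> bool) -> Option<u32> {
    let (mut lo, mut hi) = (1u32, hi);
    let mut best = None;
    while lo <= hi {
        let mid = lo + (hi - lo) / 2;
        if works(mid) {
            best = Some(mid); // mid is accepted; try larger
            lo = mid + 1;
        } else {
            hi = mid - 1;     // mid rejected; try smaller
        }
    }
    best
}

fn main() {
    // In the crate, `works` would submit a test CS ioctl of that size.
    println!("{:?}", probe_max_size(8192, |s| s <= 4096));
}
```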

GPU detection falls back through: sysfs PCI class scan → PCI direct enumeration → VGA status port (0x3DA) for legacy detection.

firmware — ACPI, UEFI, SMBIOS, DeviceTree

ACPI: Finds RSDP by signature scan in 0xE0000–0x100000, validates checksum, reads RSDT (rev < 2) or XSDT (rev >= 2). Parses FADT (SCI interrupt, SMI port, PM timer, reset register, flags), MADT (IOAPIC base address), and DMAR (Intel VT-d base).

UEFI: Probes /sys/firmware/efi/runtime or signature at 0x80000000. Reads runtime services table (get/set time, get/set variable, reset). Parses memory map descriptors and GOP framebuffer info.

SMBIOS: Reads /sys/firmware/dmi/tables/smbios_entry_point or scans 0xF0000–0xFFFFF for _SM_ signature. Parses Type 0 (BIOS info), Type 4 (CPU: socket, family, speed, core count), Type 17 (memory module: locator, size, speed, type).

DeviceTree: Validates FDT magic (0xD00DFEED), parses header, walks token stream (BEGIN_NODE/END_NODE/PROP/NOP/END) with 4-byte alignment. enumerate_nodes() returns up to 128 FdtNode entries with name, depth, and offset. Extracts DtDeviceEntry with reg base/size, IRQ, and compatible string.
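Two of the small pieces above are easy to show concretely: FDT fields are big-endian, and the token stream advances on 4-byte boundaries. Helper names here are illustrative:

```rust
/// FDT header starts with the big-endian magic 0xD00DFEED.
fn fdt_magic_ok(header: &[u8]) -> bool {
    header.len() >= 4
        && u32::from_be_bytes([header[0], header[1], header[2], header[3]]) == 0xD00D_FEED
}

/// Round an offset up to the next 4-byte boundary (token/property alignment).
fn align4(off: usize) -> usize {
    (off + 3) & !3
}

fn main() {
    println!("{}", fdt_magic_ok(&[0xD0, 0x0D, 0xFE, 0xED]));
}
```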

bus — PCI/PCIe enumeration

Config space access via I/O ports 0xCF8/0xCFC (x86_64). Scans all 256 buses × 32 devices × 8 functions. BAR size probing writes 0xFFFFFFFF, reads back mask, restores original, calculates size from (!mask)+1. Classifies devices by PCI class (0x01=storage, 0x02=network, 0x03=GPU, 0x04=multimedia, 0x06=bridge, 0x0C=serial bus). Reads IRQ line from config offset 0x3C and registers with interrupt controller.
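The (!mask)+1 sizing step can be sketched for a 32-bit memory BAR. Masking off the low 4 type/prefetch bits before inverting is the standard PCI procedure; it is assumed here rather than quoted from the crate:

```rust
/// Decode a 32-bit memory BAR's size from the value read back after
/// writing 0xFFFF_FFFF: clear the low flag bits, invert, add 1.
fn mem_bar_size(readback: u32) -> u32 {
    let mask = readback & 0xFFFF_FFF0; // bits 3:0 are type/prefetch flags
    (!mask).wrapping_add(1)
}

fn main() {
    // A 1 MiB BAR reads back 0xFFF0_0000 (plus flag bits).
    println!("{:#x}", mem_bar_size(0xFFF0_0000));
}
```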

memory — Physical, virtual, heap, NUMA

detect_memory_info() uses the sysinfo syscall to return total/free/available bytes and swap info. Submodules provide: frame allocator and zone management (phys), virtual address management (virt), cache coherence abstractions (cache), slab/buddy/bump allocators (heap), NUMA node awareness (numa).

interrupt — IDT, APIC, GIC

256-entry handler table ([Option<fn()>; 256]). Architecture dispatch: x86_64 → PIC/APIC, aarch64 → GIC. register(vec, handler) stores function pointers. Controller supports enable()/disable()/ack() per vector.

dma — Ring buffer engine

128-entry descriptor ring buffer with atomic head/tail pointers. submit() enqueues descriptors, drain() dequeues completed ones. DmaBuffer allocated via bump allocator, phys_addr() returns pointer as physical address. If IOMMU is present, submit_buffer() maps through it first.

iommu — Intel VT-d / ARM SMMU

IOVA space from 0x1_0000_0000 to 0x2_0000_0000. Mapping table of 64 entries tracking IOVA↔physical translations. map_dma_buffer() allocates IOVA and stores mapping. translate_iova() does linear scan for reverse lookup. Auto-detected from ACPI DMAR (Intel VT-d) or devicetree (ARM SMMU).
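The fixed-size mapping table and linear-scan reverse lookup fit in a few lines. This sketch keeps the constants from above but invents the struct layout, so treat the types and method bodies as illustrative:

```rust
const IOVA_BASE: u64 = 0x1_0000_0000;
const MAX_MAPPINGS: usize = 64;

#[derive(Clone, Copy)]
struct Mapping { iova: u64, phys: u64, len: u64 }

struct Iommu {
    next_iova: u64,
    table: [Option<Mapping>; MAX_MAPPINGS],
}

impl Iommu {
    fn new() -> Self {
        Self { next_iova: IOVA_BASE, table: [None; MAX_MAPPINGS] }
    }

    /// Bump-allocate an IOVA range and record the translation.
    fn map_dma_buffer(&mut self, phys: u64, len: u64) -> Option<u64> {
        let slot = self.table.iter_mut().find(|s| s.is_none())?;
        let iova = self.next_iova;
        self.next_iova += len;
        *slot = Some(Mapping { iova, phys, len });
        Some(iova)
    }

    /// Linear scan over the 64 entries for the reverse translation.
    fn translate_iova(&self, iova: u64) -> Option<u64> {
        self.table.iter().flatten()
            .find(|m| iova >= m.iova && iova < m.iova + m.len)
            .map(|m| m.phys + (iova - m.iova))
    }
}

fn main() {
    let mut iommu = Iommu::new();
    let iova = iommu.map_dma_buffer(0x8000, 0x1000).unwrap();
    println!("{:?}", iommu.translate_iova(iova + 0x10));
}
```

With only 64 entries, the O(n) scan is cheaper in code size than any index structure, which suits a zero-alloc crate.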

power — DVFS, governors, thermal

Reads current CPU frequency from sysfs (scaling_cur_freq in kHz). Thermal monitoring via MSR 0x19C (x86 thermal status, bits 16-22 for digital readout). reboot() writes 0xFE to keyboard controller port 0x64. shutdown() writes 0x2000 to port 0x604.

topology — Socket/core/thread enumeration

Detects socket count, cores per socket, and threads per core. Uses CPUID 0x0B (Intel extended topology) or 0x80000008 + 0x8000001E (AMD) with fallback to CPUID leaf 0x04.

tpu / lpu — Accelerator abstractions

Global singletons via Once<TpuDevice>/Once<LpuDevice>. Each wraps a base address, initialized flag, and mode register. transfer()/submit_task() allocates DMA buffers, copies data, and submits through the DMA engine. IRQ shims (tpu_irq_shim/lpu_irq_shim) increment atomic counters.

common — Zero-alloc primitives

  • OnceCopy<T>: Lock-free set-once via CAS state machine (0→1→2)
  • Once<T>: Same for non-Copy types
  • BitField: Extract/insert bit ranges with mask_u64(), extract_u32/u64(), insert_u32/u64()
  • Registers: 32-entry AtomicUsize bank with auto-increment write counter
  • Volatile: read_volatile/write_volatile wrappers
  • Alignment, Atomic, Barrier, Endian: Standard bare-metal helpers
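The BitField helpers can be sketched as free functions. The names match the list above; the exact signatures (inclusive [hi:lo] ranges here) are assumptions:

```rust
/// Mask covering the inclusive bit range [hi:lo] of a u64.
fn mask_u64(hi: u32, lo: u32) -> u64 {
    (!0u64 >> (63 - (hi - lo))) << lo
}

/// Read the field at [hi:lo], right-aligned.
fn extract_u64(v: u64, hi: u32, lo: u32) -> u64 {
    (v & mask_u64(hi, lo)) >> lo
}

/// Write `field` into [hi:lo], leaving the other bits untouched.
fn insert_u64(v: u64, hi: u32, lo: u32, field: u64) -> u64 {
    (v & !mask_u64(hi, lo)) | ((field << lo) & mask_u64(hi, lo))
}

fn main() {
    println!("{:#x}", extract_u64(0xDEAD_BEEF, 15, 8));
}
```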

init — Boot sequence

init() runs in order: shims → config (SSE/AVX/NEON detection) → common (endian/alignment/bitfield self-test) → firmware (ACPI/UEFI/SMBIOS/DT) → memory → interrupts → bus (PCI) → DMA → IOMMU → CPU → security → discovery → timers → accelerators (GPU/TPU/LPU) → topology → debug → power → thermal.

Tests

cargo test --test detect_all -- --nocapture

11 tests: architecture, CPU info (vendor/model/cores/caches/HT), per-thread frequencies, topology (sockets/cores), system topology, RAM (total/available), GPU (vendor IDs), PCI device classification, CPU features (SSE/SSE2), power governor, full summary.

cargo test --test stress_sequential -- --nocapture

7 sequential phases: CPU load (70% of threads, 3s busy-loop with LCG), RAM pressure (70% of available, page-stride write/verify), disk I/O (512 MB write+read, 4 MB chunks, throughput in MB/s), L3 cache thrashing (70% of L3, 10 stride passes), context switching (70% workers yielding for 2s), swap pressure (70% of swap, monitor via sysinfo), and recovery.
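The LCG busy-loop from the CPU-load phase is worth a sketch: an LCG keeps the loop doing real integer work that the optimizer cannot elide, without touching memory. The constants here are Knuth's MMIX parameters, an assumption; the crate may use different ones:

```rust
/// One step of a 64-bit LCG (Knuth's MMIX multiplier/increment).
fn lcg_next(x: u64) -> u64 {
    x.wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407)
}

/// Spin for roughly `duration_ns`, iterating the LCG the whole time.
fn busy_loop_ns(duration_ns: u64) -> u64 {
    let start = std::time::Instant::now();
    let mut x = 0x9E37_79B9_7F4A_7C15u64; // arbitrary seed
    while (start.elapsed().as_nanos() as u64) < duration_ns {
        x = lcg_next(x);
    }
    x // returned so the work is observable and cannot be optimized away
}

fn main() {
    println!("{:#x}", busy_loop_ns(1_000_000)); // ~1 ms of pure CPU load
}
```

The crate's stress test would run one such loop on 70% of the hardware threads for 3 seconds.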

License

MIT