# hardware
A `no_std` Rust crate for bare-metal hardware abstraction. Zero dependencies, no allocator, no standard library — raw syscalls and direct hardware access, with runtime safety guards.
## Warning
> **This crate is safe, but use it with caution and do not treat it as a stable dependency before `x.1.x`.**
This crate is safe to use on any host — it will not crash, panic, or cause undefined behavior even when called without setup. However:
- **Do not consider this dependency stable before `x.1.x`.** The public API, module layout, and behavior may change without notice in `0.0.x` releases.
- **Use with caution.** The crate interacts directly with hardware (port I/O, DMA, MMIO, GPU). Understand what each call does before integrating it into your project.
- The stress tests push hardware hard (100% RAM, 50% swap, GPU command submissions). Run them on a dev machine, not in production.
## Safety guarantees
- **Hardware privilege guard**: All port I/O (`inb`/`outb`/`inl`/`outl`) is gated by an internal `AtomicBool` (`HW_PRIVILEGE`). Without privilege enabled, reads return `0xFF` and writes are no-ops. No SIGSEGV.
- **Zero `expect()` / `unwrap()`** in library code — every fallible path returns `Option`, `bool`, or degrades gracefully.
- **Guardian**: Memory allocations are capped at 80% of total RAM, swap at 50%, CPU at 80%. Each gate performs a dual check: capacity ceiling (`bounded()`) and surge rate limit (sliding-window budget). Usage can come from injected reader callbacks (real hardware) or internal counters. `guardian_snapshot()` exposes all state for monitoring.
- **GPU detection**: OS-agnostic cascade — PCI port I/O scan, sysfs PCI scan (`/sys/bus/pci/devices/`), VGA IO probe, consumer callback (`set_detect_gpu_fn`), native device node probe (`/dev/mali0`, `/dev/kgsl-3d0`, `/dev/pvrsrvkm`, `/dev/dri/renderD128`), Mali kbase ioctl. Library parses raw GPU registers (Mali GPU_ID, Adreno RBBM_CHIP_ID).
- **Zero clippy warnings**, zero `static mut` in library code.
- **Zero `#[cfg]`**, zero `cfg!()`, zero `build.rs` — all code compiles unconditionally for every target.
- **`extern "C"` used only for calling convention** on machine code blobs (syscall, CPUID), zero `#[no_mangle]` — no C library dependency, no foreign function linkage.
- **Zero dead stubs** — every hardware operation dispatches through injectable function pointers (`OnceCopy`) or MMIO base addresses (`AtomicUsize`).
- **Zero hardcoded MMIO** — All AArch64 device initialization queries the Flattened Device Tree (`find_device_by_compatible()`) at runtime. No hardcoded base addresses, region sizes, or IRQ numbers.
- **Single public API**: only `pub mod sys` is exported. All 35 internal modules are `mod` (private). External access goes through `hardware::sys::*`.
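The privilege-guard behavior described above can be sketched as follows. The names `HW_PRIVILEGE` and `inb` mirror the description, but the exact signatures are illustrative, not the crate's real API:

```rust
use core::sync::atomic::{AtomicBool, Ordering};

/// Gate for raw port I/O; flipped on only after privilege is granted.
static HW_PRIVILEGE: AtomicBool = AtomicBool::new(false);

/// Read one byte from an I/O port. Without privilege, return the
/// open-bus value 0xFF instead of touching hardware (no SIGSEGV).
fn inb(port: u16) -> u8 {
    if !HW_PRIVILEGE.load(Ordering::Acquire) {
        return 0xFF;
    }
    unsafe { raw_inb(port) } // real `in al, dx` lives behind the gate
}

// Stand-in for the actual port read so the sketch compiles anywhere.
unsafe fn raw_inb(_port: u16) -> u8 { 0 }
```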
## Architecture
Architecture-specific implementations are dispatched at runtime through a shim layer (`OnceCopy<fn(...)>` function pointers registered at init). No conditional compilation, no platform-specific code paths at the type level.
Supported architectures:
- **x86_64** — CPUID, MSR, IO ports, TSC, syscall
- **aarch64** — MIDR, system registers, MMIO, GIC, MMU
### Shim pattern
The `arch/shim` module holds seven global `OnceCopy` function pointers, one for each arch-dependent operation:
| Shim | Type |
|------|------|
| CPUID | `fn(u32, u32) -> Option<(u32, u32, u32, u32)>` |
| Read MSR | `fn(u32) -> Option<u64>` |
| MMIO read 32 | `fn(usize) -> Option<u32>` |
| MMIO write 32 | `fn(usize, u32) -> bool` |
| Read MIDR (aarch64) | `fn() -> Option<u64>` |
| Mkdir | `fn(&[u8], u32) -> i64` |
| Scan dir | `fn(&[u8], &mut [DirEntry]) -> usize` |
On first use, `init_shims()` calls both x86_64 and aarch64 init functions. Each registers its implementations into the shared `OnceCopy` statics. The raw syscall handler is auto-registered via native machine code blobs (`X86_64_SYSCALL_BLOB` / `AARCH64_SYSCALL_BLOB`) based on `detect_arch()`. `set_raw_syscall_fn()` remains available as an optional override. `arch_exit()` uses `raw_syscall(nr_exit(), code)` directly — no exit callback.
Syscall numbers are stored in 31 `AtomicI64` statics (including `iopl`), auto-configured at init by `auto_set_syscall_nrs()` from the detected architecture. OS constants (AT_FDCWD, SIGCHLD, mmap flags, O_* flags) set by `auto_set_os_constants()`. No external setup needed.
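The per-arch syscall-number auto-configuration can be sketched like this (the `write(2)` numbers are the real Linux values for each architecture; the static and function names are illustrative):

```rust
use core::sync::atomic::{AtomicI64, Ordering};

#[derive(Clone, Copy)]
enum Architecture { X86_64, AArch64, Unknown }

/// `write(2)` syscall number; -1 until configured at init.
static NR_WRITE: AtomicI64 = AtomicI64::new(-1);

/// Fill the syscall-number statics from the detected architecture.
fn auto_set_syscall_nrs(arch: Architecture) {
    let nr = match arch {
        Architecture::X86_64 => 1,   // Linux x86_64: write = 1
        Architecture::AArch64 => 64, // Linux aarch64: write = 64
        Architecture::Unknown => -1,
    };
    NR_WRITE.store(nr, Ordering::Release);
}
```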
## Modules
All modules are private. The sole public API is `hardware::sys`.
### `sys` — Public API gateway
Re-exports everything the caller needs: syscalls, architecture detection, hardware access, runtime HALs, and all subsystem mirrors. All access to the crate goes through `hardware::sys::*`.
### `arch` — Architecture abstraction
Shims, runtime arch detection (`detect_arch()` returns `Architecture::{X86_64, AArch64, Unknown}`), per-arch implementations for CPUID, MSR, MMIO, syscall, system registers.
### `syscall` — Unified syscall layer
All syscalls go through `shim::raw_syscall()` which dispatches to the native machine code blob (auto-detected) or a custom handler registered via `set_raw_syscall_fn()`. 31 syscalls: `read`, `write`, `openat`, `close`, `mmap`, `munmap`, `ioctl`, `sched_yield`, `nanosleep`, `clone`, `exit`, `wait4`, `kill`, `fsync`, `unlinkat`, `getdents64`, `clock_gettime`, `sched_setaffinity`, `sched_getaffinity`, `stat`, `socket`, `connect`, `accept`, `bind`, `listen`, `execve`, `fcntl`, `getcwd`, `rt_sigaction`, `iopl`. Also provides `monotonic_ns()` and `no_std` formatting helpers.
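A `no_std` formatting helper of the kind mentioned above presumably looks something like this digit-by-digit integer formatter (hypothetical name and signature — no allocator, no `core::fmt` machinery):

```rust
/// Format `n` as decimal into `buf`, returning the used tail slice.
/// 20 bytes is enough for u64::MAX (20 decimal digits).
fn fmt_u64(mut n: u64, buf: &mut [u8; 20]) -> &[u8] {
    let mut i = buf.len();
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8; // emit least-significant digit
        n /= 10;
        if n == 0 { break; }
    }
    &buf[i..]
}
```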
### `cpu` — Detection and features
`detect_cpu_info()` returns vendor, model name, physical/logical cores, threads per core, frequency, L1/L2/L3 cache sizes, and a HyperThreading flag. The physical-vs-logical core distinction works natively via the CPUID machine code blob — Intel CPUID 0x0B (SMT + core level), AMD CPUID 0x80000008 + 0x8000001E (thread count + threads per compute unit), with leaf 0x04 as a fallback. The OS affinity count from `sched_getaffinity` (128-byte mask, 1024 CPUs max) overrides the CPUID result when it is higher (e.g. on multi-socket systems). `detect_cores()` returns per-thread frequencies. `has_feature("sse")` queries CPUID for individual feature flags.
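The affinity-count part reduces to counting set bits in the `sched_getaffinity` mask. A minimal sketch (the function name is illustrative):

```rust
/// Count set bits in a sched_getaffinity-style CPU mask
/// (128 bytes = 1024 CPUs): the number of CPUs this task may run on.
fn count_affinity_cpus(mask: &[u8; 128]) -> u32 {
    mask.iter().map(|b| b.count_ones()).sum()
}
```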
### `gpu` — GPU detection and access
GPU detection cascade (tries each method in order until a GPU is found):
1. **PCI port I/O scan** — enumerates PCI bus for display class (0x03) devices (requires iopl/root)
2. **Sysfs PCI scan** — reads `/sys/bus/pci/devices/` class/vendor/device files (no root needed)
3. **VGA IO probe** (x86 only) — reads VGA status port 0x3DA
4. **Detection callback** — `set_detect_gpu_fn()` lets consumers inject a function that returns raw GPU register data (`RawGpuId`)
5. **Native device node probe** — probes `/dev/mali0`, `/dev/kgsl-3d0`, `/dev/pvrsrvkm`, `/dev/dri/renderD128` via `openat` syscall
6. **Mali kbase ioctl** — reads GPU_ID register via `KBASE_IOCTL_VERSION_CHECK` + `KBASE_IOCTL_GET_GPUPROPS`
The library parses the raw hardware register values:
- **Mali**: `parse_mali_gpu_id()` extracts the product ID from the GPU_ID register. `mali_product_name()` maps 28 known Mali GPUs (T620 through G720).
- **Adreno**: `parse_adreno_chip_id()` extracts the chip identity from RBBM_CHIP_ID. `adreno_product_name()` maps 14 known Adreno GPUs.
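The Adreno decode can be sketched as below, assuming the conventional RBBM_CHIP_ID byte layout (core, major, minor, patch, most significant byte first); the function names echo the ones above but the bodies are illustrative:

```rust
/// Split an RBBM_CHIP_ID value into (core, major, minor, patch),
/// assuming one field per byte, most significant first.
fn parse_adreno_chip_id(chip_id: u32) -> (u8, u8, u8, u8) {
    (
        (chip_id >> 24) as u8, // core
        (chip_id >> 16) as u8, // major
        (chip_id >> 8) as u8,  // minor
        chip_id as u8,         // patch
    )
}

/// Marketed model number, e.g. 0x05030002 -> Adreno 530.
fn adreno_model(chip_id: u32) -> u32 {
    let (core, major, minor, _) = parse_adreno_chip_id(chip_id);
    core as u32 * 100 + major as u32 * 10 + minor as u32
}
```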
DRM access (command submission, GEM buffers) is available through the `drm` and `hw` submodules.
### `firmware` — ACPI, UEFI, SMBIOS, DeviceTree
**ACPI**: RSDP signature scan in 0xE0000–0x100000, RSDT/XSDT parsing, FADT/MADT/DMAR extraction.
**UEFI**: `/sys/firmware/efi/runtime` probe, runtime services table, memory map, GOP info.
**SMBIOS**: DMI tables or `_SM_` scan, Type 0 (BIOS), Type 4 (CPU), Type 17 (memory modules).
**DeviceTree**: FDT magic 0xD00DFEED, token stream walker, node enumeration, reg/IRQ/compatible extraction. `find_device_by_compatible(needle)` returns `(base, size, irq)` for any device matching a compatible string — used by all AArch64 lifecycle modules.
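The first step of the DeviceTree walk — recognizing an FDT blob by its big-endian magic — can be sketched as:

```rust
/// FDT blobs start with the big-endian magic 0xD00DFEED.
const FDT_MAGIC: u32 = 0xD00D_FEED;

/// Check whether `blob` looks like a Flattened Device Tree header.
fn is_fdt(blob: &[u8]) -> bool {
    blob.len() >= 4
        && u32::from_be_bytes([blob[0], blob[1], blob[2], blob[3]]) == FDT_MAGIC
}
```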
### `bus` — PCI/PCIe enumeration
Config space via I/O ports 0xCF8/0xCFC. Full 256×32×8 bus scan. BAR size probing. Device classification by PCI class. IRQ line extraction.
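The 0xCF8 selection value is the standard PCI configuration mechanism #1 encoding; a sketch of building it (the helper name is illustrative):

```rust
/// Build the CONFIG_ADDRESS dword written to port 0xCF8 to select a
/// PCI config-space register: enable bit | bus | device | function | offset.
fn pci_config_address(bus: u8, device: u8, function: u8, offset: u8) -> u32 {
    0x8000_0000                           // enable bit
        | (bus as u32) << 16              // 8-bit bus
        | ((device as u32) & 0x1F) << 11  // 5-bit device
        | ((function as u32) & 0x07) << 8 // 3-bit function
        | (offset as u32) & 0xFC          // dword-aligned register offset
}
```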
### `memory` — Physical, virtual, heap, NUMA
`detect_memory_info()` via consumer callback or native `sysinfo` syscall fallback (NR 99 x86_64 / NR 179 aarch64). Submodules: frame allocator (phys), virtual address management (virt), cache coherence (cache), slab/buddy/bump allocators (heap), NUMA node awareness (numa).
### `interrupt` — IDT, APIC, GIC
256-entry handler table. Architecture dispatch: x86_64 → PIC/APIC, aarch64 → GIC. Per-vector `register()`, `enable()`, `disable()`, `ack()`.
### `dma` — Ring buffer engine
128-entry descriptor ring with atomic head/tail. `submit()`/`drain()` for descriptor management. `DmaBuffer` via bump allocator. IOMMU-aware submission.
### `iommu` — Intel VT-d / ARM SMMU
IOVA space 0x1_0000_0000–0x2_0000_0000. 64-entry mapping table. Auto-detected from ACPI DMAR or devicetree.
### `power` — DVFS, governors, thermal
CPU frequency from CPUID brand string or leaf 0x15. Thermal via MSR 0x19C. `reboot()` via port 0x64, `shutdown()` via port 0x604. Governor defaults to `Unknown`; user sets policy via `set_policy()`.
### `topology` — Socket/core/thread enumeration
Socket count, cores per socket, threads per core. Intel CPUID 0x0B, AMD CPUID 0x80000008 + 0x8000001E, fallback leaf 0x04.
### `tpu` / `lpu` — Accelerator abstractions
Global singletons via `Once`. DMA-based data transfer, task submission, IRQ shims. AArch64 initialization discovers MMIO regions from the DeviceTree at runtime.
### `common` — Zero-alloc primitives
`OnceCopy<T>` (lock-free set-once via CAS), `Once<T>`, `BitField`, `Registers` (32-entry `AtomicUsize` bank), `Volatile`, alignment/atomic/barrier/endian helpers.
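A minimal sketch of a lock-free set-once cell in the spirit of `OnceCopy<T>` — a CAS on a state byte claims exclusive write access. This is illustrative, not the crate's real implementation:

```rust
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicU8, Ordering};

/// Set-once cell for Copy values: 0 = empty, 1 = writing, 2 = set.
struct OnceCopy<T: Copy> {
    state: AtomicU8,
    slot: UnsafeCell<Option<T>>,
}

// Safe to share: the slot is written exactly once, behind the CAS.
unsafe impl<T: Copy + Send> Sync for OnceCopy<T> {}

impl<T: Copy> OnceCopy<T> {
    const fn new() -> Self {
        Self { state: AtomicU8::new(0), slot: UnsafeCell::new(None) }
    }

    /// Store `value` if the cell is still empty; return whether we won.
    fn set(&self, value: T) -> bool {
        // CAS empty -> writing claims exclusive write access.
        if self.state.compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed).is_err() {
            return false;
        }
        unsafe { *self.slot.get() = Some(value); }
        self.state.store(2, Ordering::Release); // publish
        true
    }

    fn get(&self) -> Option<T> {
        if self.state.load(Ordering::Acquire) == 2 {
            unsafe { *self.slot.get() }
        } else {
            None
        }
    }
}
```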
### `init` — Boot sequence
`init()` runs 17 phases: shims → config → common → firmware → memory → interrupts → bus → DMA → IOMMU → CPU → security → discovery → timers → accelerators → topology → debug → power.
### Other modules
`net` (ethernet, IPv4, TCP), `security` (enclaves, isolation, speculation mitigations), `thermal`, `timer` (HPET, ARM generic, PIT, clockevent/clocksource), `debug` (perf counters, tracing), `audio`, `camera`, `display`, `input`, `modem`, `nfc`, `sensor`, `storage`, `usb`.
## Tests
```
cargo test --test detect_all -- --nocapture
```
11 tests: architecture, CPU (vendor/model/cores/caches/HT), per-core frequencies, topology, system topology, RAM, GPU, PCI device summary, CPU features (SSE/SSE2), power governor, full hardware summary.
```
cargo test --test stress_sequential -- --nocapture
```
7 sequential phases with guardian enforcement (each phase attempts 100% utilization; the guardian caps it):
1. **CPU**: fork workers to 100% of cores — guardian caps at 80%
2. **RAM**: allocate 100% of available — guardian caps at 80%
3. **Disk I/O**: 512 MB write/read, 4 MB chunks, throughput measurement
4. **Cache**: L3 thrashing (70% of L3, 10 stride passes)
5. **Context switching**: yield workers at 100% — guardian caps at 80%
6. **Swap**: allocate 100% of swap — guardian caps at 50%
7. **GPU**: DRM device open, VRAM stress (GEM alloc + write + verify), NOP command submission (10,000 batches)
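The capacity-ceiling half of the guardian check (the `bounded()` gate from the safety guarantees) reduces to overflow-safe percentage arithmetic; a sketch with illustrative names:

```rust
/// Allow a request only if current usage plus the request stays under
/// the cap (e.g. cap_pct = 80 for RAM, 50 for swap). Widening to u128
/// keeps `total * cap_pct` from overflowing on large totals.
fn bounded(used: u64, request: u64, total: u64, cap_pct: u64) -> bool {
    let ceiling = (total as u128) * (cap_pct as u128) / 100;
    (used as u128) + (request as u128) <= ceiling
}
```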
## License
MIT