# hardware
A `no_std` Rust crate for bare-metal hardware abstraction. Zero dependencies, no allocator, no standard library — raw syscalls and direct hardware access, with runtime safety guards.
## Warning
> **This crate is safe, but use it with caution and do not treat it as a stable dependency before `x.1.x`.**
This crate is safe to run on any host — it will not crash, panic, or cause undefined behavior even when called without setup. However:
- **Do not consider this dependency stable before `x.1.x`.** The public API, module layout, and behavior may change without notice in `0.0.x` releases.
- **Use with caution.** The crate interacts directly with hardware (port I/O, DMA, MMIO, GPU). Understand what each call does before integrating it into your project.
- The stress tests push hardware hard (100% RAM, 50% swap, GPU command submissions). Run them on a dev machine, not in production.
## Safety guarantees
- **Hardware privilege guard**: All port I/O (`inb`/`outb`/`inl`/`outl`) is gated by an internal `AtomicBool` (`HW_PRIVILEGE`). Without privilege enabled, reads return `0xFF` and writes are no-ops. No SIGSEGV.
- **Zero `expect()` / `unwrap()`** in library code — every fallible path returns `Option`, `bool`, or degrades gracefully.
- **Guardian**: Memory allocations are capped at 80% of total RAM, swap at 50%. DMA and IRQ resources are gated similarly.
- **GPU via DRM**: GPU access works without root via `/dev/dri/renderD128`. PCI scan is skipped when I/O privilege is not available.
- **Zero clippy warnings**, zero `static mut` in library code.
- **Zero `#[cfg]`**, zero `cfg!()`, zero `build.rs` — all code compiles unconditionally for every target.
- **`extern "C"` used only for calling convention** on machine code blobs (syscall, CPUID), zero `#[no_mangle]` — no C library dependency, no foreign function linkage.
- **Zero dead stubs** — every hardware operation dispatches through injectable function pointers (`OnceCopy`) or MMIO base addresses (`AtomicUsize`).
- **Single public API**: only `pub mod sys` is exported. All 35 internal modules are `mod` (private). External access goes through `hardware::sys::*`.
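The privilege guard described above can be sketched as follows. This is a minimal sketch, assuming a gating shape like the one documented: the real `HW_PRIVILEGE` static and the surrounding machine-code dispatch live inside the crate and may differ.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

// Assumed shape of the crate's privilege guard (sketch, not the real code).
static HW_PRIVILEGE: AtomicBool = AtomicBool::new(false);

/// Gated port read: without privilege, return the bus idle pattern 0xFF
/// instead of touching the port, so unprivileged callers never fault.
fn inb(_port: u16) -> u8 {
    if !HW_PRIVILEGE.load(Ordering::Acquire) {
        return 0xFF; // safe default, no SIGSEGV
    }
    // Real port I/O (an `in` instruction via a machine-code blob) would go
    // here; this sketch just returns a placeholder value.
    0x00
}

fn enable_hw_privilege() {
    HW_PRIVILEGE.store(true, Ordering::Release);
}
```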
## Architecture
Architecture-specific implementations are dispatched at runtime through a shim layer (`OnceCopy<fn(...)>` function pointers registered at init). No conditional compilation, no platform-specific code paths at the type level.
Supported architectures:
- **x86_64** — CPUID, MSR, IO ports, TSC, syscall
- **aarch64** — MIDR, system registers, MMIO, GIC, MMU
### Shim pattern
The `arch/shim` module holds 8 global `OnceCopy` function pointers for every arch-dependent operation:
| Shim | Type |
|------|------|
| CPUID | `fn(u32, u32) -> Option<(u32, u32, u32, u32)>` |
| Read MSR | `fn(u32) -> Option<u64>` |
| MMIO read 32 | `fn(usize) -> Option<u32>` |
| MMIO write 32 | `fn(usize, u32) -> bool` |
| Read MIDR (aarch64) | `fn() -> Option<u64>` |
| Exit | `fn(i32) -> !` |
| Mkdir | `fn(&[u8], u32) -> i64` |
| Scan dir | `fn(&[u8], &mut [DirEntry]) -> usize` |
On first use, `init_shims()` calls both x86_64 and aarch64 init functions. Each registers its implementations into the shared `OnceCopy` statics. The raw syscall handler is auto-registered via native machine code blobs (`X86_64_SYSCALL_BLOB` / `AARCH64_SYSCALL_BLOB`) based on `detect_arch()`. `set_raw_syscall_fn()` remains available as an optional override.
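The registration-then-dispatch flow can be illustrated with a set-once function-pointer slot. This sketch stores the pointer in an `AtomicUsize`; the crate's actual `OnceCopy<fn(..)>` is assumed to behave similarly but is not reproduced here.

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Function-pointer type for the CPUID shim (signature from the table above).
type CpuidFn = fn(u32, u32) -> Option<(u32, u32, u32, u32)>;

// Set-once shim slot, sketched as a raw pointer value in an `AtomicUsize`.
static CPUID_SHIM: AtomicUsize = AtomicUsize::new(0);

/// First registration wins; later calls are ignored.
fn register_cpuid(f: CpuidFn) {
    let _ = CPUID_SHIM.compare_exchange(
        0,
        f as usize,
        Ordering::AcqRel,
        Ordering::Acquire,
    );
}

/// Dispatch through the shim, degrading to `None` when nothing is registered.
fn cpuid(leaf: u32, subleaf: u32) -> Option<(u32, u32, u32, u32)> {
    match CPUID_SHIM.load(Ordering::Acquire) {
        0 => None,
        p => {
            // SAFETY: the slot is only ever written with a valid `CpuidFn`.
            let f: CpuidFn = unsafe { core::mem::transmute(p) };
            f(leaf, subleaf)
        }
    }
}
```

Unregistered shims degrade gracefully (`None`) rather than crashing, matching the crate's "no dead stubs" rule.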
Syscall numbers are stored in 31 `AtomicI64` statics (including `iopl`), set once at init from a `SyscallNrTable` struct. All default to `ERR_NOT_IMPLEMENTED` (`-1`) — no Linux-specific magic numbers.
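A two-slot sketch of that table (the real crate keeps 31 such statics; the field names on `SyscallNrTable` here are assumptions for illustration):

```rust
use core::sync::atomic::{AtomicI64, Ordering};

const ERR_NOT_IMPLEMENTED: i64 = -1;

// One atomic slot per syscall; everything defaults to "not implemented",
// so no Linux-specific numbers are baked into the crate.
static NR_WRITE: AtomicI64 = AtomicI64::new(ERR_NOT_IMPLEMENTED);
static NR_EXIT: AtomicI64 = AtomicI64::new(ERR_NOT_IMPLEMENTED);

/// Hypothetical stand-in for the crate's `SyscallNrTable`.
struct SyscallNrTable {
    write: i64,
    exit: i64,
}

/// Install the platform's numbers once at init.
fn set_syscall_numbers(t: &SyscallNrTable) {
    NR_WRITE.store(t.write, Ordering::Release);
    NR_EXIT.store(t.exit, Ordering::Release);
}
```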
## Modules
All modules are private. The sole public API is `hardware::sys`.
### `sys` — Public API gateway
Re-exports everything the caller needs: syscalls, architecture detection, hardware access, runtime HALs, and all subsystem mirrors. All access to the crate goes through `hardware::sys::*`.
### `arch` — Architecture abstraction
Shims, runtime arch detection (`detect_arch()` returns `Architecture::{X86_64, AArch64, Unknown}`), per-arch implementations for CPUID, MSR, MMIO, syscall, system registers.
### `syscall` — Unified syscall layer
All syscalls go through `shim::raw_syscall()` which dispatches to the native machine code blob (auto-detected) or a custom handler registered via `set_raw_syscall_fn()`. 31 syscalls: `read`, `write`, `openat`, `close`, `mmap`, `munmap`, `ioctl`, `sched_yield`, `nanosleep`, `clone`, `exit`, `wait4`, `kill`, `fsync`, `unlinkat`, `getdents64`, `clock_gettime`, `sched_setaffinity`, `sched_getaffinity`, `stat`, `socket`, `connect`, `accept`, `bind`, `listen`, `execve`, `fcntl`, `getcwd`, `rt_sigaction`, `iopl`. Also provides `monotonic_ns()` and `no_std` formatting helpers.
### `cpu` — Detection and features
`detect_cpu_info()` returns vendor, model name, physical/logical cores, threads per core, frequency, L1/L2/L3 cache sizes, HyperThreading flag. The physical vs logical core distinction works natively via the CPUID machine code blob — Intel CPUID 0x0B (SMT + core level), AMD CPUID 0x80000008 + 0x8000001E (thread count + threads-per-unit), fallback leaf 0x04. The OS affinity count from `sched_getaffinity` (128-byte mask, up to 1024 CPUs) overrides the CPUID-derived count when it is higher (multi-socket systems). `detect_cores()` returns per-thread frequency. `has_feature("sse")` queries CPUID for individual feature flags.
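The arithmetic behind the physical/logical split can be sketched like this — illustrative only, not the crate's actual CPUID parsing code:

```rust
/// The SMT level of CPUID 0x0B yields threads per core; the core level
/// yields logical CPUs per package. Physical cores follow by division.
fn physical_cores(logical_per_package: u32, threads_per_core: u32) -> u32 {
    if threads_per_core == 0 {
        return logical_per_package; // no SMT info: assume one thread per core
    }
    logical_per_package / threads_per_core
}

/// The OS affinity count overrides the CPUID-derived count when higher,
/// which catches multi-socket machines that CPUID sees as one package.
fn effective_logical_cpus(cpuid_logical: u32, affinity_count: u32) -> u32 {
    cpuid_logical.max(affinity_count)
}
```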
### `gpu` — DRM GPU access
Opens `/dev/dri/renderD128` (fallback `/dev/dri/card0`), identifies driver via `DRM_IOCTL_VERSION`. Supports radeon, amdgpu, nouveau, i915. For radeon: device ID, VRAM size/usage, shader engines, active CUs, clock speeds, temperature via `DRM_IOCTL_RADEON_INFO`. GEM buffer allocation/mmap, command submission via `DRM_IOCTL_RADEON_CS` with auto-detection of VM flag. GPU detection falls back through: sysfs PCI class scan → PCI direct enumeration → VGA status port.
### `firmware` — ACPI, UEFI, SMBIOS, DeviceTree
**ACPI**: RSDP signature scan in 0xE0000–0x100000, RSDT/XSDT parsing, FADT/MADT/DMAR extraction.
**UEFI**: `/sys/firmware/efi/runtime` probe, runtime services table, memory map, GOP info.
**SMBIOS**: DMI tables or `_SM_` scan, Type 0 (BIOS), Type 4 (CPU), Type 17 (memory modules).
**DeviceTree**: FDT magic 0xD00DFEED, token stream walker, node enumeration, reg/IRQ/compatible extraction.
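Two of these firmware probes are simple signature checks, sketched here under the stated formats (the RSDP signature sits on a 16-byte boundary; the FDT magic is big-endian):

```rust
/// Scan a 16-byte-aligned region for the ACPI RSDP signature "RSD PTR ".
fn find_rsdp(region: &[u8]) -> Option<usize> {
    region
        .chunks(16)
        .position(|chunk| chunk.starts_with(b"RSD PTR "))
        .map(|i| i * 16)
}

/// Check the big-endian FDT magic at the start of a flattened devicetree.
fn is_fdt(blob: &[u8]) -> bool {
    blob.len() >= 4
        && u32::from_be_bytes([blob[0], blob[1], blob[2], blob[3]]) == 0xD00D_FEED
}
```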
### `bus` — PCI/PCIe enumeration
Config space via I/O ports 0xCF8/0xCFC. Full 256×32×8 bus scan. BAR size probing. Device classification by PCI class. IRQ line extraction.
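The address dword written to port 0xCF8 follows the standard configuration mechanism #1 layout; a sketch of building it:

```rust
/// Build the dword written to port 0xCF8 to select a config-space register:
/// enable bit 31 | bus[23:16] | device[15:11] | function[10:8] | offset[7:2].
fn pci_config_addr(bus: u8, dev: u8, func: u8, offset: u8) -> u32 {
    0x8000_0000
        | ((bus as u32) << 16)
        | (((dev as u32) & 0x1F) << 11)
        | (((func as u32) & 0x07) << 8)
        | ((offset as u32) & 0xFC) // dword-aligned register offset
}
```

The full scan iterates 256 buses × 32 devices × 8 functions, reading the resulting address's data through port 0xCFC.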
### `memory` — Physical, virtual, heap, NUMA
`detect_memory_info()` via sysinfo syscall. Submodules: frame allocator (phys), virtual address management (virt), cache coherence (cache), slab/buddy/bump allocators (heap), NUMA node awareness (numa).
### `interrupt` — IDT, APIC, GIC
256-entry handler table. Architecture dispatch: x86_64 → PIC/APIC, aarch64 → GIC. Per-vector `register()`, `enable()`, `disable()`, `ack()`.
### `dma` — Ring buffer engine
128-entry descriptor ring with atomic head/tail. `submit()`/`drain()` for descriptor management. `DmaBuffer` via bump allocator. IOMMU-aware submission.
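A minimal single-producer/single-consumer version of such a ring, with atomic head/tail indices (field and method names follow the description above; the descriptor type is simplified to `usize`):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

const RING_SIZE: usize = 128; // power of two, as in the described engine

struct Ring {
    head: AtomicUsize,             // next slot to submit into
    tail: AtomicUsize,             // next slot to drain
    slots: [AtomicUsize; RING_SIZE],
}

impl Ring {
    const fn new() -> Self {
        const EMPTY: AtomicUsize = AtomicUsize::new(0);
        Self {
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
            slots: [EMPTY; RING_SIZE],
        }
    }

    /// Submit a descriptor; returns false when the ring is full.
    fn submit(&self, desc: usize) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head.wrapping_sub(tail) == RING_SIZE {
            return false; // full: never overwrite undrained descriptors
        }
        self.slots[head % RING_SIZE].store(desc, Ordering::Relaxed);
        self.head.store(head.wrapping_add(1), Ordering::Release);
        true
    }

    /// Drain one descriptor; `None` when the ring is empty.
    fn drain(&self) -> Option<usize> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail == head {
            return None;
        }
        let desc = self.slots[tail % RING_SIZE].load(Ordering::Acquire);
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Some(desc)
    }
}
```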
### `iommu` — Intel VT-d / ARM SMMU
IOVA space 0x1_0000_0000–0x2_0000_0000. 64-entry mapping table. Auto-detected from ACPI DMAR or devicetree.
### `power` — DVFS, governors, thermal
CPU frequency from sysfs. Thermal via MSR 0x19C. `reboot()` via port 0x64, `shutdown()` via port 0x604.
### `topology` — Socket/core/thread enumeration
Socket count, cores per socket, threads per core. Intel CPUID 0x0B, AMD CPUID 0x80000008 + 0x8000001E, fallback leaf 0x04.
### `tpu` / `lpu` — Accelerator abstractions
Global singletons via `Once`. DMA-based data transfer, task submission, IRQ shims.
### `common` — Zero-alloc primitives
`OnceCopy<T>` (lock-free set-once via CAS), `Once<T>`, `BitField`, `Registers` (32-entry `AtomicUsize` bank), `Volatile`, alignment/atomic/barrier/endian helpers.
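A sketch of the set-once-via-CAS idea behind `OnceCopy` (the real type is generic over `T: Copy`; this sketch stores a `usize` payload and uses a three-state word so readers never observe a half-published value):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// States: 0 = empty, 1 = being set, 2 = ready.
struct OnceCopyUsize {
    state: AtomicUsize,
    value: AtomicUsize,
}

impl OnceCopyUsize {
    const fn new() -> Self {
        Self {
            state: AtomicUsize::new(0),
            value: AtomicUsize::new(0),
        }
    }

    /// Returns true if this call won the CAS and installed the value.
    fn set(&self, v: usize) -> bool {
        if self
            .state
            .compare_exchange(0, 1, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            self.value.store(v, Ordering::Relaxed);
            self.state.store(2, Ordering::Release); // publish
            true
        } else {
            false
        }
    }

    fn get(&self) -> Option<usize> {
        if self.state.load(Ordering::Acquire) == 2 {
            Some(self.value.load(Ordering::Relaxed))
        } else {
            None
        }
    }
}
```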
### `init` — Boot sequence
`init()` runs 17 phases: shims → config → common → firmware → memory → interrupts → bus → DMA → IOMMU → CPU → security → discovery → timers → accelerators → topology → debug → power.
### Other modules
`net` (ethernet, IPv4, TCP), `security` (enclaves, isolation, speculation mitigations), `thermal`, `timer` (HPET, ARM generic, PIT, clockevent/clocksource), `debug` (perf counters, tracing), `audio`, `camera`, `display`, `input`, `modem`, `nfc`, `sensor`, `storage`, `usb`.
## Tests
```
cargo test --test detect_all -- --nocapture
```
11 tests: architecture, CPU (vendor/model/cores/caches/HT), per-core frequencies, topology, system topology, RAM, GPU, PCI device summary, CPU features (SSE/SSE2), power governor, full hardware summary.
```
cargo test --test stress_sequential -- --nocapture
```
7 sequential phases with guardian enforcement (each phase requests 100% of the resource; the guardian caps the grant):
1. **CPU**: fork workers to 100% of cores — guardian caps at 80%
2. **RAM**: allocate 100% of available — guardian caps at 80%
3. **Disk I/O**: 512 MB write/read, 4 MB chunks, throughput measurement
4. **Cache**: L3 thrashing (70% of L3, 10 stride passes)
5. **Context switching**: yield workers at 100% — guardian caps at 80%
6. **Swap**: allocate 100% of swap — guardian caps at 50%
7. **GPU**: DRM device open, VRAM stress (GEM alloc + write + verify), NOP command submission (10,000 batches)
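The capping behavior the phases exercise reduces to a clamp against a percentage of the total resource. Illustrative only — the crate's actual guardian accounting is richer than this one-liner:

```rust
/// Guardian-style clamp: a request for `requested` bytes (or workers) is
/// granted at most `max_percent` of `total`.
fn guardian_cap(requested: u64, total: u64, max_percent: u64) -> u64 {
    requested.min(total * max_percent / 100)
}
```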
## License
MIT