hardware 0.0.1

A no_std bare-metal hardware abstraction layer — WARNING: do NOT modify tests, no internal safety is implemented yet, raw syscalls and direct hardware access with no sandboxing
# hardware

A `no_std` Rust crate for bare-metal hardware abstraction. Zero dependencies, no allocator, no standard library — only raw syscalls and direct hardware access.

## Warning

**Do not run the tests on a production machine.** No internal safety mechanisms are implemented yet: the crate performs raw syscalls, direct MMIO access, and hardware register manipulation without any sandboxing or privilege checks. Modifying the stress tests may destabilize or crash your system.

## Architecture

All code compiles unconditionally for every target — no `#[cfg]`, no feature gates, no `build.rs`. Architecture-specific implementations (x86_64 inline asm, aarch64 stubs) are dispatched at runtime through a callback/shim layer (`OnceCopy<fn(...)>` function pointers registered at init).

Supported architectures:
- **x86_64** — fully functional (CPUID, MSR, IO ports, TSC, syscall asm)
- **aarch64** — stubs in place (MIDR, system registers, MMIO, GIC), ready for real implementations

### Shim pattern

The `arch/shim.rs` module holds global `OnceCopy` statics for every arch-dependent operation (CPUID, MSR read, MMIO read/write, raw syscall, exit, MIDR read). On first use, `init_shims()` calls both x86_64 and aarch64 init functions — the first architecture that successfully claims `set_raw_syscall_fn()` wins and registers its syscall numbers, exit handler, and all callbacks. The losing arch is skipped entirely — no runtime overhead, no conditional compilation.

Syscall numbers are stored in 17 `AtomicI64` statics, set once at init from a `SyscallNrTable` struct (x86_64: read=0, write=1, open=2, ... vs aarch64: read=63, write=64, open=56, ...).
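The first-claim-wins registration described above can be sketched roughly as follows. This is a simplified illustration, not the crate's code: the field names, the `CLAIMED` flag, `register_syscall_nrs()`, and `nr_read()` are hypothetical, and only three of the 17 numbers are shown. The syscall numbers themselves are the ones quoted above.

```rust
use core::sync::atomic::{AtomicBool, AtomicI64, Ordering};

/// Per-architecture syscall numbers (hypothetical layout, three entries only).
#[derive(Clone, Copy)]
pub struct SyscallNrTable {
    pub read: i64,
    pub write: i64,
    pub open: i64,
}

pub const X86_64_NR: SyscallNrTable = SyscallNrTable { read: 0, write: 1, open: 2 };
pub const AARCH64_NR: SyscallNrTable = SyscallNrTable { read: 63, write: 64, open: 56 };

// Hypothetical stand-in for the claim made through `set_raw_syscall_fn()`.
static CLAIMED: AtomicBool = AtomicBool::new(false);
static NR_READ: AtomicI64 = AtomicI64::new(-1);
static NR_WRITE: AtomicI64 = AtomicI64::new(-1);
static NR_OPEN: AtomicI64 = AtomicI64::new(-1);

/// Stores `table` if no architecture has registered yet.
/// Returns `true` for the winner and `false` for every later caller.
pub fn register_syscall_nrs(table: SyscallNrTable) -> bool {
    if CLAIMED
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_err()
    {
        return false; // another architecture already claimed the slot
    }
    NR_READ.store(table.read, Ordering::Release);
    NR_WRITE.store(table.write, Ordering::Release);
    NR_OPEN.store(table.open, Ordering::Release);
    true
}

/// Returns the registered `read` syscall number, or -1 before registration.
pub fn nr_read() -> i64 {
    NR_READ.load(Ordering::Acquire)
}
```

With this shape, calling `register_syscall_nrs(X86_64_NR)` first and `register_syscall_nrs(AARCH64_NR)` second leaves the x86_64 numbers in place, which mirrors the "losing arch is skipped" behavior.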

## Modules

### `sys` — Unified syscall layer

All syscalls go through `shim::raw_syscall()`, which dispatches to the registered arch asm. Implemented syscalls: `read`, `write`, `open`, `close`, `mmap` (anon + device), `munmap`, `ioctl`, `nanosleep`, `fork`, `exit`, `waitpid`, `kill`, `fsync`, `unlink`, `getdents64`, `clock_gettime`. Also provides `monotonic_ns()` (nanosecond timer via `CLOCK_MONOTONIC`) and `no_std` formatting helpers (`fmt_u64` with 20-byte buffer, `write_stderr`).
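As an illustration of allocation-free formatting, here is one way a `fmt_u64`-style helper could work. The signature is an assumption (the crate's actual one may differ); the 20-byte buffer matches the text because `u64::MAX` has exactly 20 decimal digits.

```rust
/// Writes the decimal form of `value` into the end of `buf` and returns
/// the slice holding the digits. Assumed signature, for illustration only.
pub fn fmt_u64(value: u64, buf: &mut [u8; 20]) -> &[u8] {
    let mut n = value;
    let mut pos = buf.len();
    // Emit digits from least to most significant, filling the buffer backwards.
    loop {
        pos -= 1;
        buf[pos] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    &buf[pos..]
}
```

The returned slice can be handed straight to a `write`-style syscall wrapper without any heap allocation.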

### `cpu` — Detection and features

`detect_cpu_info()` returns a `CpuInfo` struct containing: architecture, vendor (`GenuineIntel`/`AuthenticAMD`), model name (48 bytes from CPUID 0x80000002-4), physical/logical cores, threads per core, frequency (MHz from CPUID 0x16 or sysfs fallback), L1/L2/L3 cache sizes, and HyperThreading flag.

Core count detection is vendor-specific:
- **Intel**: CPUID leaf 0x0B — SMT count from level 0, core count from level 1
- **AMD**: CPUID 0x80000008 for total logical cores + 0x8000001E for threads/compute-unit
- **Fallback**: CPUID leaf 0x04 `EAX[31:26]+1`
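The fallback formula is plain bit-field extraction. A minimal sketch, operating on an already-read EAX value so it runs on any host; `extract_bits()` and `cores_from_leaf4_eax()` are hypothetical names, not the crate's API:

```rust
/// Extracts bits `hi..=lo` (inclusive) from a 32-bit register value.
pub const fn extract_bits(value: u32, hi: u32, lo: u32) -> u32 {
    let width = hi - lo + 1;
    let mask = if width >= 32 { u32::MAX } else { (1u32 << width) - 1 };
    (value >> lo) & mask
}

/// CPUID leaf 0x04: EAX[31:26] holds the maximum number of addressable
/// core IDs per package, minus one.
pub const fn cores_from_leaf4_eax(eax: u32) -> u32 {
    extract_bits(eax, 31, 26) + 1
}
```

For example, an EAX value of `0x1C00_4121` has `0b000111` in its top six bits, so the function reports 8.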

`detect_cores()` returns per-thread frequency via sysfs (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq`).

### `gpu` — DRM/radeon GPU access

Opens `/dev/dri/renderD128` (fallback `/dev/dri/card0`), identifies driver via `DRM_IOCTL_VERSION`. Supports radeon, amdgpu, nouveau, i915.

For radeon: queries device ID, VRAM size/usage, shader engine count, active CU count, GPU/memory clock, and temperature via `DRM_IOCTL_RADEON_INFO`. Also provides GEM buffer allocation (`DRM_IOCTL_RADEON_GEM_CREATE`) and mmap for command submission.

Command submission via `DRM_IOCTL_RADEON_CS` with auto-detection of `RADEON_CS_USE_VM` flag — `probe_cs_packet_size()` does a binary search from 8192 downward to find the maximum working buffer size, testing with/without VM flag.
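The size probe can be pictured as a binary search over a pass/fail test. The sketch below replaces the real ioctl with a closure and assumes that if a size works, every smaller size works too; `max_working_size()` is a hypothetical name and this is not the body of `probe_cs_packet_size()`.

```rust
/// Returns the largest size in `1..=upper` for which `works` returns true,
/// or `None` if no size works. Assumes `works` is monotonic: once a size
/// fails, all larger sizes fail as well.
pub fn max_working_size(upper: u32, mut works: impl FnMut(u32) -> bool) -> Option<u32> {
    let (mut lo, mut hi) = (1u32, upper);
    let mut best = None;
    while lo <= hi {
        let mid = lo + (hi - lo) / 2;
        if works(mid) {
            best = Some(mid);
            // Try larger sizes; stop if `mid` is already the maximum value.
            match mid.checked_add(1) {
                Some(next) => lo = next,
                None => break,
            }
        } else {
            // `mid >= 1` here, so this cannot underflow.
            hi = mid - 1;
        }
    }
    best
}
```

Starting from an upper bound of 8192, this needs at most 14 probes, and it could be run once with and once without the VM flag to compare results.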

GPU detection falls back through: sysfs PCI class scan → PCI direct enumeration → VGA status port (0x3DA) for legacy detection.

### `firmware` — ACPI, UEFI, SMBIOS, DeviceTree

**ACPI**: Finds RSDP by signature scan in 0xE0000–0x100000, validates checksum, reads RSDT (rev < 2) or XSDT (rev >= 2). Parses FADT (SCI interrupt, SMI port, PM timer, reset register, flags), MADT (IOAPIC base address), and DMAR (Intel VT-d base).
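Checksum validation for ACPI structures means summing all covered bytes and checking that the result is zero modulo 256. A minimal sketch for the 20-byte ACPI 1.0 RSDP, working on a byte slice rather than physical memory; the function names are hypothetical:

```rust
/// Returns true when all bytes sum to zero modulo 256.
pub fn acpi_checksum_ok(bytes: &[u8]) -> bool {
    bytes.iter().fold(0u8, |acc, &b| acc.wrapping_add(b)) == 0
}

/// Checks the "RSD PTR " signature and the checksum over the first
/// 20 bytes (the ACPI 1.0 portion of the RSDP).
pub fn rsdp_v1_valid(candidate: &[u8]) -> bool {
    candidate.len() >= 20
        && candidate.starts_with(b"RSD PTR ")
        && acpi_checksum_ok(&candidate[..20])
}
```

A scanner over 0xE0000–0x100000 would apply `rsdp_v1_valid` at each 16-byte-aligned candidate, since the RSDP is placed on a 16-byte boundary.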

**UEFI**: Probes `/sys/firmware/efi/runtime` or signature at 0x80000000. Reads runtime services table (get/set time, get/set variable, reset). Parses memory map descriptors and GOP framebuffer info.

**SMBIOS**: Reads `/sys/firmware/dmi/tables/smbios_entry_point` or scans 0xF0000–0xFFFFF for `_SM_` signature. Parses Type 0 (BIOS info), Type 4 (CPU: socket, family, speed, core count), Type 17 (memory module: locator, size, speed, type).

**DeviceTree**: Validates FDT magic (0xD00DFEED), parses header, walks token stream (BEGIN_NODE/END_NODE/PROP/NOP/END) with 4-byte alignment. `enumerate_nodes()` returns up to 128 `FdtNode` entries with name, depth, and offset. Extracts `DtDeviceEntry` with reg base/size, IRQ, and compatible string.
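To make the header check concrete, here is a small sketch that validates the magic and reads two big-endian header fields from a byte slice. The offsets follow the flattened devicetree header layout (`magic` at 0, `totalsize` at 4, `off_dt_struct` at 8); the function names are hypothetical.

```rust
pub const FDT_MAGIC: u32 = 0xD00D_FEED;

/// Reads a big-endian u32 at `off`, or `None` if the blob is too short.
fn be32(blob: &[u8], off: usize) -> Option<u32> {
    let bytes = blob.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Returns `(totalsize, off_dt_struct)` when the magic matches.
pub fn parse_fdt_header(blob: &[u8]) -> Option<(u32, u32)> {
    if be32(blob, 0)? != FDT_MAGIC {
        return None;
    }
    Some((be32(blob, 4)?, be32(blob, 8)?))
}

/// Rounds an offset up to the next 4-byte boundary, as required between
/// tokens in the structure block.
pub const fn align4(off: usize) -> usize {
    (off + 3) & !3
}
```

A token walker would start at `off_dt_struct`, read one big-endian token at a time, and call `align4` after each node name or property value.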

### `bus` — PCI/PCIe enumeration

Config space access via I/O ports 0xCF8/0xCFC (x86_64). Scans all 256 buses × 32 devices × 8 functions. BAR size probing writes 0xFFFFFFFF, reads back mask, restores original, calculates size from `(!mask)+1`. Classifies devices by PCI class (0x01=storage, 0x02=network, 0x03=GPU, 0x04=multimedia, 0x06=bridge, 0x0C=serial bus). Reads IRQ line from config offset 0x3C and registers with interrupt controller.
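The BAR size arithmetic can be shown in isolation. This sketch assumes a 32-bit memory BAR and takes the value read back after writing 0xFFFFFFFF; the low four bits of a memory BAR are type and prefetch flags, so they are cleared before applying `(!mask)+1`. `mem_bar_size()` is a hypothetical name.

```rust
/// Computes the size in bytes of a 32-bit memory BAR from the value read
/// back after writing 0xFFFF_FFFF. Returns `None` for an unimplemented BAR.
pub fn mem_bar_size(readback: u32) -> Option<u32> {
    // Bits 3:0 of a memory BAR are flags, not address bits.
    let mask = readback & !0xF;
    if mask == 0 {
        return None;
    }
    Some((!mask).wrapping_add(1))
}
```

A readback of `0xFFFF_F000` therefore yields a 4 KiB region. I/O BARs and 64-bit BARs need different masking and are left out here.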

### `memory` — Physical, virtual, heap, NUMA

`detect_memory_info()` uses the `sysinfo` syscall to return total/free/available bytes and swap info. Submodules provide: frame allocator and zone management (phys), virtual address management (virt), cache coherence abstractions (cache), slab/buddy/bump allocators (heap), NUMA node awareness (numa).

### `interrupt` — IDT, APIC, GIC

256-entry handler table (`[Option<fn()>; 256]`). Architecture dispatch: x86_64 → PIC/APIC, aarch64 → GIC. `register(vec, handler)` stores function pointers. Controller supports `enable()`/`disable()`/`ack()` per vector.
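A handler table of this shape might look like the sketch below. It is written as an owned struct rather than a global so that it stays safe Rust; `HandlerTable`, `dispatch()`, `TIMER_TICKS`, and `timer_handler()` are hypothetical names used only for illustration.

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// One optional handler per interrupt vector.
pub struct HandlerTable {
    handlers: [Option<fn()>; 256],
}

impl HandlerTable {
    pub const fn new() -> Self {
        Self { handlers: [None; 256] }
    }

    /// Stores `handler` for `vec` and returns the previous one, if any.
    pub fn register(&mut self, vec: u8, handler: fn()) -> Option<fn()> {
        self.handlers[vec as usize].replace(handler)
    }

    /// Calls the handler for `vec`. Returns false if none is registered.
    pub fn dispatch(&self, vec: u8) -> bool {
        match self.handlers[vec as usize] {
            Some(handler) => {
                handler();
                true
            }
            None => false,
        }
    }
}

/// Example handler: counts how often it has been invoked.
pub static TIMER_TICKS: AtomicUsize = AtomicUsize::new(0);

pub fn timer_handler() {
    TIMER_TICKS.fetch_add(1, Ordering::Relaxed);
}
```

Using `u8` for the vector keeps every index inside the 256-entry array without a bounds check failing at runtime.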

### `dma` — Ring buffer engine

128-entry descriptor ring buffer with atomic head/tail pointers. `submit()` enqueues descriptors, `drain()` dequeues completed ones. `DmaBuffer` allocated via bump allocator, `phys_addr()` returns pointer as physical address. If IOMMU is present, `submit_buffer()` maps through it first.
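A descriptor ring with atomic head and tail indices can be sketched as a single-producer, single-consumer queue. The `Descriptor` fields and the exact semantics of `submit()` and `drain()` here are assumptions; the crate's ring may differ, for example in how completion is tracked.

```rust
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicUsize, Ordering};

pub const RING_SIZE: usize = 128;

/// Hypothetical descriptor layout.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Descriptor {
    pub addr: u64,
    pub len: u32,
}

pub struct DescriptorRing {
    slots: UnsafeCell<[Descriptor; RING_SIZE]>,
    head: AtomicUsize, // next slot the producer writes
    tail: AtomicUsize, // next slot the consumer reads
}

// Safety: sound only with one producer and one consumer; each slot is
// written before `head` is published and read before `tail` is published.
unsafe impl Sync for DescriptorRing {}

impl DescriptorRing {
    pub const fn new() -> Self {
        Self {
            slots: UnsafeCell::new([Descriptor { addr: 0, len: 0 }; RING_SIZE]),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Enqueues a descriptor. Returns false when the ring is full.
    pub fn submit(&self, d: Descriptor) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head.wrapping_sub(tail) == RING_SIZE {
            return false;
        }
        let slots = unsafe { &mut *self.slots.get() };
        slots[head % RING_SIZE] = d;
        self.head.store(head.wrapping_add(1), Ordering::Release);
        true
    }

    /// Dequeues the oldest descriptor, if any.
    pub fn drain(&self) -> Option<Descriptor> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail == head {
            return None;
        }
        let slots = unsafe { &*self.slots.get() };
        let d = slots[tail % RING_SIZE];
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Some(d)
    }
}
```

Because 128 is a power of two, the free-running counters can wrap around `usize::MAX` without the slot index jumping.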

### `iommu` — Intel VT-d / ARM SMMU

IOVA space from 0x1_0000_0000 to 0x2_0000_0000. Mapping table of 64 entries tracking IOVA↔physical translations. `map_dma_buffer()` allocates IOVA and stores mapping. `translate_iova()` does linear scan for reverse lookup. Auto-detected from ACPI DMAR (Intel VT-d) or devicetree (ARM SMMU).
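The fixed-size mapping table could be modeled as below. The IOVA range and the 64-entry limit come from the description above; the bump allocation strategy, the 4 KiB rounding, and the `IovaSpace` and `Mapping` types are assumptions made for the sake of a runnable example.

```rust
pub const IOVA_BASE: u64 = 0x1_0000_0000;
pub const IOVA_END: u64 = 0x2_0000_0000;
pub const MAX_MAPPINGS: usize = 64;

#[derive(Clone, Copy)]
struct Mapping {
    iova: u64,
    phys: u64,
    len: u64,
}

pub struct IovaSpace {
    entries: [Option<Mapping>; MAX_MAPPINGS],
    next_iova: u64,
}

impl IovaSpace {
    pub const fn new() -> Self {
        Self { entries: [None; MAX_MAPPINGS], next_iova: IOVA_BASE }
    }

    /// Reserves an IOVA range for `len` bytes (rounded up to 4 KiB, an
    /// assumed granularity) and records the IOVA to physical mapping.
    pub fn map_dma_buffer(&mut self, phys: u64, len: u64) -> Option<u64> {
        let len = len.checked_add(0xFFF)? & !0xFFF;
        let end = self.next_iova.checked_add(len)?;
        if len == 0 || end > IOVA_END {
            return None;
        }
        let slot = self.entries.iter_mut().find(|e| e.is_none())?;
        let iova = self.next_iova;
        *slot = Some(Mapping { iova, phys, len });
        self.next_iova = end;
        Some(iova)
    }

    /// Linear scan over the table: returns the physical address backing
    /// `iova`, or `None` if no mapping covers it.
    pub fn translate_iova(&self, iova: u64) -> Option<u64> {
        self.entries
            .iter()
            .flatten()
            .find(|m| iova >= m.iova && iova - m.iova < m.len)
            .map(|m| m.phys + (iova - m.iova))
    }
}
```

With only 64 entries, the linear scan in `translate_iova` stays cheap, which is presumably why no tree or hash structure is needed.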

### `power` — DVFS, governors, thermal

Reads current CPU frequency from sysfs (`scaling_cur_freq` in kHz). Thermal monitoring via MSR 0x19C (x86 thermal status, bits 16-22 for digital readout). `reboot()` writes 0xFE to keyboard controller port 0x64. `shutdown()` writes 0x2000 to port 0x604.
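Decoding the thermal MSR is a matter of shifting and masking. The sketch below takes a raw MSR 0x19C value as input. On Intel CPUs the digital readout in bits 22:16 is the number of degrees below TjMax, and bit 31 marks the reading as valid. `thermal_margin()` and `core_temp_c()` are hypothetical names, and TjMax is passed in rather than read from hardware.

```rust
/// Returns the digital readout (degrees C below TjMax) from a raw
/// IA32_THERM_STATUS value, or `None` if the reading-valid bit is clear.
pub const fn thermal_margin(msr_19c: u64) -> Option<u32> {
    if (msr_19c >> 31) & 1 == 0 {
        return None;
    }
    Some(((msr_19c >> 16) & 0x7F) as u32)
}

/// Converts the margin into an absolute temperature given TjMax.
pub const fn core_temp_c(msr_19c: u64, tj_max_c: u32) -> Option<u32> {
    match thermal_margin(msr_19c) {
        Some(margin) => Some(tj_max_c.saturating_sub(margin)),
        None => None,
    }
}
```

For a raw value of `0x801E_0000` and a TjMax of 100 °C, the margin is 30 and the resulting temperature is 70 °C.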

### `topology` — Socket/core/thread enumeration

Detects socket count, cores per socket, and threads per core. Uses CPUID 0x0B (Intel extended topology) or 0x80000008 + 0x8000001E (AMD) with fallback to CPUID leaf 0x04.

### `tpu` / `lpu` — Accelerator abstractions

Global singletons via `Once<TpuDevice>`/`Once<LpuDevice>`. Each wraps a base address, initialized flag, and mode register. `transfer()`/`submit_task()` allocates DMA buffers, copies data, and submits through the DMA engine. IRQ shims (`tpu_irq_shim`/`lpu_irq_shim`) increment atomic counters.

### `common` — Zero-alloc primitives

- **`OnceCopy<T>`**: Lock-free set-once via CAS state machine (0→1→2)
- **`Once<T>`**: Same for non-Copy types
- **`BitField`**: Extract/insert bit ranges with `mask_u64()`, `extract_u32/u64()`, `insert_u32/u64()`
- **`Registers`**: 32-entry `AtomicUsize` bank with auto-increment write counter
- **`Volatile`**: `read_volatile`/`write_volatile` wrappers
- **Alignment, Atomic, Barrier, Endian**: Standard bare-metal helpers
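The set-once state machine behind `OnceCopy<T>` can be sketched as follows: state 0 means empty, 1 means a writer is in progress, and 2 means the value is readable. This is an illustrative reconstruction from the description above, not the crate's source; the method names `set()` and `get()` are assumptions.

```rust
use core::cell::UnsafeCell;
use core::mem::MaybeUninit;
use core::sync::atomic::{AtomicU8, Ordering};

const EMPTY: u8 = 0;
const WRITING: u8 = 1;
const READY: u8 = 2;

pub struct OnceCopy<T: Copy> {
    state: AtomicU8,
    value: UnsafeCell<MaybeUninit<T>>,
}

// Safety: the value is written only by the caller that wins the 0 -> 1
// CAS, and read only after READY has been observed with Acquire ordering.
unsafe impl<T: Copy + Send> Sync for OnceCopy<T> {}

impl<T: Copy> OnceCopy<T> {
    pub const fn new() -> Self {
        Self {
            state: AtomicU8::new(EMPTY),
            value: UnsafeCell::new(MaybeUninit::uninit()),
        }
    }

    /// Stores `v` if the cell is still empty. Returns false otherwise.
    pub fn set(&self, v: T) -> bool {
        if self
            .state
            .compare_exchange(EMPTY, WRITING, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return false;
        }
        // Only the CAS winner reaches this point, so the write is exclusive.
        unsafe { self.value.get().write(MaybeUninit::new(v)) };
        self.state.store(READY, Ordering::Release);
        true
    }

    /// Returns a copy of the value once it has been fully written.
    pub fn get(&self) -> Option<T> {
        if self.state.load(Ordering::Acquire) == READY {
            Some(unsafe { self.value.get().read().assume_init() })
        } else {
            None
        }
    }
}
```

Because `new()` is a `const fn`, a cell like this can live in a `static` and hold a function pointer, which is how the shim layer's `OnceCopy<fn(...)>` statics are described.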

### `init` — Boot sequence

`init()` runs in order: shims → config (SSE/AVX/NEON detection) → common (endian/alignment/bitfield self-test) → firmware (ACPI/UEFI/SMBIOS/DT) → memory → interrupts → bus (PCI) → DMA → IOMMU → CPU → security → discovery → timers → accelerators (GPU/TPU/LPU) → topology → debug → power → thermal.

## Tests

```
cargo test --test detect_all -- --nocapture
```

11 tests: architecture, CPU info (vendor/model/cores/caches/HT), per-thread frequencies, topology (sockets/cores), system topology, RAM (total/available), GPU (vendor IDs), PCI device classification, CPU features (SSE/SSE2), power governor, full summary.

```
cargo test --test stress_sequential -- --nocapture
```

7 sequential phases: CPU load (70% of threads, 3s busy-loop with LCG), RAM pressure (70% of available, page-stride write/verify), disk I/O (512 MB write+read, 4 MB chunks, throughput in MB/s), L3 cache thrashing (70% of L3, 10 stride passes), context switching (70% workers yielding for 2s), swap pressure (70% of swap, monitor via sysinfo), and recovery.

## License

MIT