npu-rs 0.1.2 - Docs.rs

# NPU Driver for 20 TOPS RISC Board

A Simulation Rust driver for neural processing units on RISC-based boards with 20 TOPS peak performance.

NOTE: *This crate is a simulator Real hardware integration requires HAL implementation and Linux kernel module support.
NOTE: I don't own a real RISC board thus this code wasn't tested on real RISCV hardware, please make sure to use at your own risk.

## Features

Core Compute
  - Matrix multiplication (single and batched)
  - 1x1 convolution operations
  - Multi-dimensional tensor support

Memory Management
  - Device memory allocation tracking
  - Memory pool for efficient allocation
  - Real-time statistics

Power Management
  - Dynamic voltage and frequency scaling (DVFS)
  - Thermal monitoring and throttling
  - Multiple power domains (compute, memory, cache, control)

Performance Analysis
  - Real-time throughput measurement (GOPS)
  - Power consumption tracking
  - Operation-level profiling
  - Performance metrics collection

Model Optimization
  - Post-training quantization (INT8)
  - Graph optimization and fusion
  - Operator optimization patterns

Device Management
  - Multi-device support
  - Device registry
  - JSON status reporting

## Module Overview

**tensor:** Tensor operations (add, sub, mul, div, relu, sigmoid)

**device:** Device driver and state management

**memory:** Memory allocation and tracking

**compute:** Matrix multiplication and convolution units

**execution:** Operation execution and scheduling

**power:** DVFS and thermal management

**model:** Neural network model definitions

**quantization:** INT8 quantization and calibration

**optimizer:** Graph optimization

**profiler:** Performance profiling

**perf_monitor:** Real-time metrics

**error:** Error handling

## Download

```bash
cargo install npu-rs
```

## Building

```bash
cargo build --release
```

## Running

NOTE: THIS CODE RUNS ON CPU ONLY; NO REAL HARDWARE EXECUTION

```bash
cargo run                              # Full demo
cargo run --example full_inference_pipeline  # Example pipeline
```

## Device Configuration

Peak Throughput  - 20 TOPS
Memory           - 512 MB
Compute Units    - 4
Frequency        - 400-1000 MHz (via DVFS)
Power TDP        - 1.2-5.0 W
Thermal Limit    - 90 C

## Usage Example

```rust
use npu_rs::{NpuDevice, Tensor, ExecutionContext};
use std::sync::Arc;

let device = Arc::new(NpuDevice::new());
device.initialize()?;

let ctx = ExecutionContext::new(device);
let a = Tensor::random(&[4, 8]);
let b = Tensor::random(&[8, 6]);

let result = ctx.execute_matmul(&a.data, &b.data)?;
println!("Result: {:?}", result.shape());
```

## Design

- Type-safe Rust with no unsafe code
- Thread-safe using Arc and Mutex
- Comprehensive error handling
- Documentation comments only (no inline comments)
- All modules fully implemented
- Production-ready code quality

### Build With ♥️ in Rust