# esp-p4-eth
`#![no_std]` async Ethernet MAC driver for **ESP32-P4** RMII designs, plug-in
compatible with [`embassy-net`](https://crates.io/crates/embassy-net).
> **Status:** ready for **0.1.0** crates.io release as of 2026-04-29.
> End-to-end ping, TCP, and UDP all work on the Waveshare ESP32-P4-ETH dev
> board (IP101GRI PHY) at 100 Mbps full duplex. Both `embassy-time`
> (SYSTIMER) and EMAC RX/TX wakers can run IRQ-driven via the on-chip CLIC —
> the executor genuinely sleeps on `wfi` when there's no work to do.
> Cold-boot and warm-reboot reliability validated 100 % across stress runs
> (30/30 warm reboots on the `embassy_tcp_soak` harness, 5/5 power-cycle
> on the canonical examples). 180 host-side unit tests pass on
> `x86_64-unknown-linux-gnu`. The public API surface (`Ethernet`, `Device`,
> `Runner`, `BoardConfig`, the `pub mod diag` observability atomics)
> is what we expect to commit to for 0.x SemVer.
## Why this exists
As of late April 2026, the official [`esp-hal`](https://github.com/esp-rs/esp-hal)
does not include `soc/esp32p4/` EMAC support — the P4 simply isn't covered by
the upstream HAL crates yet. This repository is a self-contained, hand-rolled
Synopsys DesignWare GMAC driver targeting the P4 silicon directly: clock tree,
IO_MUX + GPIO matrix routing, DMA descriptor rings with cache coherency,
MDIO + IP101 PHY, and an `embassy-net-driver-channel` adaptor that drops into
any `embassy-net` stack.
## Features
- 100 Mbps full duplex RMII, `embassy-net` `Driver` via
`embassy-net-driver-channel`
- Two `embassy-time` driver options on the P4 SYSTIMER:
  - **IRQ-driven** via SYSTIMER alarm 0 → CLIC entry 17, gated by the
    `p4-time-driver-irq` cargo feature. Executor drops to `wfi` between
    deadlines. Drift ~0.002 % over 1 s on the Waveshare board.
  - **Polling** via `time_polling_task`, gated by `p4-time-driver`. Simpler
    but burns one core; kept for comparison and as a `wfi`-free fallback.
- **IRQ-driven EMAC RX/TX wakers** under `p4-time-driver-irq` — DMA
completion interrupts (Synopsys SBD source → CLIC entry 18) wake the
embassy-net rx/tx futures. No `wake_by_ref` busy-spin under that feature.
- IPv4 / ARP / ICMP / TCP / UDP all verified end-to-end against a Windows host
through a consumer router
- All RX frame sizes from 60 up to MTU 1500 bytes round-trip cleanly (single
1472-byte UDP datagrams included)
- `BoardConfig` abstraction — bring your own pin map, ref-clock pad, and
PHY MDIO address
- 180 host-side unit tests; the host build target is `x86_64-unknown-linux-gnu`
## Hardware support
| Board | Status | `BoardConfig` |
|---|---|---|
| Waveshare ESP32-P4-ETH (IP101GRI PHY, 25 MHz XO) | ✅ tested | `BoardConfig::WAVESHARE_P4_ETH` |
| Other P4 + RMII PHY designs | should work | construct your own `BoardConfig` |
### Waveshare ESP32-P4-ETH default pin map
| Signal | GPIO |
|---|---|
| TXD0 | 34 |
| TXD1 | 35 |
| TX_EN | 49 |
| RXD0 | 30 |
| RXD1 | 29 |
| CRS_DV | 28 |
| MDC | 31 |
| MDIO | 52 |
| PHY RESET | 51 (active low) |
| REF_CLK in | 50 |
| PHY MDIO addr | 1 (bus address, not a GPIO) |
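
When bringing up a different board, it can help to write the pin map down as data and sanity-check it before touching hardware. The sketch below mirrors the table above in a hypothetical `RmiiPins` struct. It is not the crate's `BoardConfig`, whose real fields are not shown in this README; only the GPIO numbers come from the table.

```rust
/// Hypothetical mirror of the pin-map table; NOT the crate's `BoardConfig`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RmiiPins {
    pub txd0: u8,
    pub txd1: u8,
    pub tx_en: u8,
    pub rxd0: u8,
    pub rxd1: u8,
    pub crs_dv: u8,
    pub mdc: u8,
    pub mdio: u8,
    pub phy_reset: u8, // active low
    pub ref_clk_in: u8,
}

/// GPIO numbers taken from the Waveshare ESP32-P4-ETH table.
pub const WAVESHARE_PINS: RmiiPins = RmiiPins {
    txd0: 34,
    txd1: 35,
    tx_en: 49,
    rxd0: 30,
    rxd1: 29,
    crs_dv: 28,
    mdc: 31,
    mdio: 52,
    phy_reset: 51,
    ref_clk_in: 50,
};

/// MDIO bus address of the PHY (not a GPIO).
pub const WAVESHARE_PHY_ADDR: u8 = 1;

impl RmiiPins {
    fn as_array(&self) -> [u8; 10] {
        [
            self.txd0, self.txd1, self.tx_en, self.rxd0, self.rxd1,
            self.crs_dv, self.mdc, self.mdio, self.phy_reset, self.ref_clk_in,
        ]
    }

    /// True when no GPIO is assigned to two different signals.
    pub fn all_distinct(&self) -> bool {
        let pins = self.as_array();
        for i in 0..pins.len() {
            for j in (i + 1)..pins.len() {
                if pins[i] == pins[j] {
                    return false;
                }
            }
        }
        true
    }
}
```

A duplicate-pin check like this catches copy-paste mistakes in a custom pin map early; it says nothing about whether the pads are actually RMII-capable on your board.
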
## Quick start (Waveshare ESP32-P4-ETH)
Build a minimal "ping me" example:
```bash
cargo build \
--no-default-features \
--features p4-example,p4-time-driver,embassy-net-icmp \
--example embassy_static_ping
```
Flash via [`espflash`](https://github.com/esp-rs/espflash) into RAM:
```bash
espflash flash --ram --no-stub --chip esp32p4 \
--ignore_app_descriptor --monitor \
target/riscv32imafc-unknown-none-elf/debug/examples/embassy_static_ping
```
The example brings the link up at 100 Mbps full duplex (~2.5 s after reset),
configures itself as `192.168.0.50/24`, and replies to ICMP echo requests.
Adjust `SELF_IP` and `GATEWAY` constants in the example for your subnet.
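
If you change `SELF_IP` and `GATEWAY`, both must sit in the same subnet or the board will never reach the gateway. A minimal host-side sketch of that check, using only `std`; the helper name `same_subnet` and the gateway address `192.168.0.1` used in the usage note are illustrative assumptions, not values from the example source:

```rust
use std::net::Ipv4Addr;

/// Returns true when `a` and `b` share the same network for the given prefix length.
pub fn same_subnet(a: Ipv4Addr, b: Ipv4Addr, prefix_len: u8) -> bool {
    assert!(prefix_len <= 32, "IPv4 prefix length must be 0..=32");
    // A /0 prefix has an empty mask; shifting by 32 would overflow, so special-case it.
    let mask: u32 = if prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - u32::from(prefix_len))
    };
    (u32::from(a) & mask) == (u32::from(b) & mask)
}
```

For the default `192.168.0.50/24`, a gateway of `192.168.0.1` passes this check and `192.168.1.1` does not.
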
For the **IRQ-driven** path (recommended for real workloads — no polling
task burning a core, executor sleeps on `wfi`):
```bash
cargo build \
--no-default-features \
--features p4-example,p4-time-driver-irq,embassy-net-tcp \
--example embassy_tcp_echo_irq
```
This routes SYSTIMER alarms through CLIC entry 17 and EMAC RX/TX completion
interrupts through CLIC entry 18, both dispatched by a single trap entry
(`_p4_eth_trap_entry`) defined in `src/time_driver_irq.rs`.
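
To make the "single trap entry, two sources" idea concrete, here is a host-runnable sketch of how a handler could classify a trap from `mcause`. It is not the code in `src/time_driver_irq.rs`. The CLIC indices 17 and 18 come from this README; the `mcause` layout (bit 31 = interrupt, low 12 bits = cause code) is an assumption based on the RISC-V CLIC convention, and `TrapSource` / `classify_trap` are hypothetical names.

```rust
/// CLIC entries named in this README.
pub const CLIC_IDX_SYSTIMER_ALARM0: u32 = 17;
pub const CLIC_IDX_EMAC_SBD: u32 = 18;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrapSource {
    /// SYSTIMER alarm 0: wake the embassy-time driver.
    SystimerAlarm,
    /// EMAC DMA completion: wake the embassy-net RX/TX futures.
    EmacDma,
    /// Some other interrupt index we do not handle.
    OtherInterrupt(u32),
    /// Synchronous exception (illegal instruction, fault, ...).
    Exception(u32),
}

/// Classify a trap from the raw `mcause` value.
/// ASSUMPTION: bit 31 flags an interrupt and bits 11..0 hold the cause code.
pub fn classify_trap(mcause: u32) -> TrapSource {
    const INTERRUPT_BIT: u32 = 1 << 31;
    const CODE_MASK: u32 = 0xFFF;

    let code = mcause & CODE_MASK;
    if (mcause & INTERRUPT_BIT) == 0 {
        return TrapSource::Exception(code);
    }
    match code {
        CLIC_IDX_SYSTIMER_ALARM0 => TrapSource::SystimerAlarm,
        CLIC_IDX_EMAC_SBD => TrapSource::EmacDma,
        other => TrapSource::OtherInterrupt(other),
    }
}
```

On target, the assembly entry would save registers, read `mcause`, and branch on a classification like this before acknowledging the peripheral.
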
## Mandatory P4-specific build configuration
ESP32-P4 imposes two constraints that the calling crate **must** honour, one from the hardware and one from the build profile:
1. **DMA-shared statics must live below `0x4FF80000`.** The upper 256 KB of
HP SRAM is the L2 cache backing region; bus masters (the EMAC DMA) cannot
read it. Any static that the DMA touches (descriptors, packet buffers,
`StaticDmaResources`) must be placed in a linker section that resolves to
the safe range. See `memory.x` and the
`#[link_section = ".dma_bss"]` annotation in the bundled examples.
2. **Use a workspace `[profile.dev]` with `opt-level = 1`.** With `opt-level = 0`
the debug `.text` overflows the 192 KB safe-DRAM slab; with `opt-level = "s"`
inlining gets aggressive enough that the naked-counter MDIO BUSY-poll loop
completes before the PHY answers and the bus times out.
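
The first constraint is easy to assert at start-up. The sketch below checks that a buffer lies entirely below the `0x4FF80000` limit stated above. The lower bound `0x4FF0_0000` (start of HP SRAM) is an assumption not taken from this README, and `is_dma_safe` is a hypothetical helper, not part of the crate.

```rust
/// ASSUMPTION: HP SRAM starts here; verify against the P4 memory map you target.
pub const HP_SRAM_BASE: u32 = 0x4FF0_0000;
/// From this README: DMA-visible statics must end at or below this address.
pub const DMA_SAFE_END: u32 = 0x4FF8_0000;

/// True when `[addr, addr + len)` lies entirely inside the DMA-safe window.
pub fn is_dma_safe(addr: u32, len: u32) -> bool {
    match addr.checked_add(len) {
        Some(end) => addr >= HP_SRAM_BASE && end <= DMA_SAFE_END,
        None => false, // range wraps around the address space
    }
}

// On target one might call it like this (illustrative only):
//   let addr = core::ptr::addr_of!(RX_BUFFERS) as u32;
//   assert!(is_dma_safe(addr, core::mem::size_of_val(&RX_BUFFERS) as u32));
```

A check like this turns a silent DMA read of stale or unreachable memory into an immediate panic during bring-up.
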
The ready-made `embassy_static_ping`, `embassy_tcp_echo`, and
`embassy_udp_echo` examples already embed both invariants and serve as
templates.
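
The `opt-level` sensitivity comes from bounding the MDIO BUSY poll by an iteration count, whose wall-clock duration depends on how the loop compiles. For illustration only, and not what the driver does today, here is a sketch of the alternative technique: bounding the poll by elapsed timer ticks. `wait_mdio_idle`, `MdioTimeout`, and both closures are hypothetical; on target the closures would read the MDIO BUSY bit and a free-running counter.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct MdioTimeout;

/// Poll `is_busy` until it returns false or `timeout_ticks` have elapsed
/// according to `now_ticks`. The bound is time, not loop iterations, so the
/// result does not depend on how fast the compiled loop spins.
pub fn wait_mdio_idle(
    mut is_busy: impl FnMut() -> bool,
    mut now_ticks: impl FnMut() -> u64,
    timeout_ticks: u64,
) -> Result<(), MdioTimeout> {
    let start = now_ticks();
    loop {
        if !is_busy() {
            return Ok(());
        }
        // wrapping_sub keeps the comparison valid across counter wrap-around.
        if now_ticks().wrapping_sub(start) >= timeout_ticks {
            return Err(MdioTimeout);
        }
        core::hint::spin_loop();
    }
}
```

Because both inputs are closures, the same function can be exercised on the host with fake clocks and fake BUSY bits.
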
## Cargo features
| Feature | Default | Description |
|---|---|---|
| `mock-time` | yes | embassy-time mock driver — required for host tests; mutually exclusive with `p4-time-driver*` |
| `embassy-net-tcp` | yes | enables the `embassy-net/tcp` socket layer |
| `embassy-net-udp` | no | enables the `embassy-net/udp` socket layer |
| `embassy-net-icmp` | yes | enables `embassy-net/auto-icmp-echo-reply` |
| `p4-time-driver` | no | SYSTIMER-backed polling `embassy-time` driver for `riscv32imafc` targets |
| `p4-time-driver-irq` | no | SYSTIMER+CLIC IRQ-driven `embassy-time` driver. Also routes EMAC RX/TX completion IRQs and removes the `wake_by_ref` paths in `eth/mod.rs`. Mutually exclusive with `p4-time-driver` |
| `p4-example` | no | gates `[[example]]` blocks; required by every P4 example |
For a target build, pass `--no-default-features` and then add what you need, e.g.:
```bash
--features p4-example,p4-time-driver,embassy-net-icmp,embassy-net-udp
```
## Examples
The crate ships with four canonical examples that build with their
required features by default. Bring-up scratch examples (`mdio_test`,
`clk_dump`, `systimer_probe`, etc.) live under `examples/dev/` and are gated
behind the `dev-examples` feature so they don't pollute downstream builds.
### Canonical (`examples/`)
| Example | Description |
|---|---|
| `embassy_static_ping` | full embassy-net stack with static IP, ICMP echo reply via the driver (polling time driver) |
| `embassy_dhcp` | embassy-net DHCP client + diagnostic atomics dump (polling time driver) |
| `embassy_tcp_echo_irq` | TCP listener on :7777 echoing bytes back, byte-exact for any size 1..1500, IRQ-driven path |
| `embassy_tcp_soak` | 4 parallel TCP echo listeners on :7780–:7783 with 60-second `stat_task` snapshots and hourly summary; pair with `examples/dev/soak_driver.py` from the host for byte-exact verification |
Build, e.g.:
```bash
cargo build --no-default-features \
--features p4-example,p4-time-driver-irq,embassy-net-tcp \
--example embassy_tcp_echo_irq
```
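
To check the "byte-exact" claim from the host side, a small client that sends a payload and compares the echo is enough. The sketch below uses only `std`; `echo_is_byte_exact` is a hypothetical helper, and the device address in the usage note assumes the echo example uses the same static IP as `embassy_static_ping`, which this README does not confirm.

```rust
use std::io::{Read, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Send `payload` to a TCP echo server and report whether the bytes that
/// come back are identical. Errors (connect failure, timeout) are returned
/// as `Err` rather than being treated as a mismatch.
pub fn echo_is_byte_exact<A: ToSocketAddrs>(addr: A, payload: &[u8]) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    // Fail instead of hanging forever if the device stops answering.
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    stream.set_nodelay(true)?;

    stream.write_all(payload)?;

    let mut echoed = vec![0u8; payload.len()];
    stream.read_exact(&mut echoed)?;
    Ok(echoed.as_slice() == payload)
}
```

Against the board this would be called as, for example, `echo_is_byte_exact("192.168.0.50:7777", &payload)` for each payload length from 1 to 1500. Keep payloads no larger than the device's socket buffers, since the helper writes everything before reading anything back.
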
### Dev / bring-up (`examples/dev/`, requires `dev-examples` feature)
`mdio_test`, `phy_probe`, `clk_dump`, `systimer_probe`, `clic_irq_smoke`,
`embassy_smoke`, `embassy_irq_smoke`, `embassy_time_smoke`, polling
`embassy_tcp_echo` / `embassy_udp_echo`, `embassy_tcp_stress_irq`,
`phy_init_diag` (cold-boot diagnostic, requires `phy-init-debug` feature),
plus `soak_driver.py` host-side companion for `embassy_tcp_soak`.
Build any of them by adding `dev-examples` to the feature list, e.g.:
```bash
cargo build --no-default-features \
--features p4-example,dev-examples \
--example mdio_test
```
## P4 CLIC quirks (relevant if you fork the trap entry)
Four non-obvious facts learned the hard way during IRQ bring-up. They're
already encoded in `src/clic.rs` and `src/time_driver_irq.rs`, but worth
flagging if you're poking the trap path or routing additional IRQs:
1. **`mtvec.MODE` is forced to `11` (CLIC mode) in hardware.** Direct /
Vectored RISC-V modes are not available — writing `addr | 0` reads back
as `addr | 3`.
2. **Trap-entry base = `mtvec & ~0xFF`.** The low 8 bits are MODE/reserved
and get clamped. The asm trap entry must be `.balign 256`.
3. **`INTERRUPT_CORE0_<peripheral>_INT_MAP_REG` expects the CLIC index**
(`cpu_int_line + 16`), not the bare CPU INT line. Writing `1` to map a
peripheral to "CPU line 1" silently no-ops; you must write `17` for it
to land in `CLIC_INT_CTRL_REG[17]`.
4. **Don't enable `AIE` in `DMA_INTEN` under IRQ-driven mode.** The
`RU` bit (RX Buffer Unavailable, sticky) goes high immediately after
`Ethernet::start()` because the descriptor ring is empty; with AIE
enabled the abnormal-summary line storms the trap. The driver programs
`DMA_INTEN = TIE | RIE | NIE` only and leaves abnormal recovery to a
polling `dma_recovery_task`.
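
The arithmetic behind the quirks above fits in a few constant functions. This is a sketch for reference, not the contents of `src/clic.rs`. The `+ 16` offset, the forced `MODE = 11`, and the 256-byte alignment come from the list above; the `DMA_INTEN` bit positions are an assumption based on the usual DesignWare GMAC layout (TIE = bit 0, RIE = bit 6, AIE = bit 15, NIE = bit 16) and should be checked against the register description you work from.

```rust
/// CLIC index for a CPU interrupt line: the first 16 entries are reserved.
pub const fn clic_index(cpu_int_line: u32) -> u32 {
    cpu_int_line + 16
}

/// What `mtvec` reads back after a write: MODE is forced to 0b11 in hardware.
pub const fn mtvec_readback(written: u32) -> u32 {
    written | 0b11
}

/// Address the core actually jumps to: the low 8 bits are clamped.
pub const fn trap_base(mtvec: u32) -> u32 {
    mtvec & !0xFF
}

/// A trap entry is only usable if it is 256-byte aligned (`.balign 256`).
pub const fn is_valid_trap_entry(addr: u32) -> bool {
    addr % 256 == 0
}

// ASSUMED bit positions for the GMAC DMA interrupt-enable register.
pub const DMA_INTEN_TIE: u32 = 1 << 0; // transmit interrupt
pub const DMA_INTEN_RIE: u32 = 1 << 6; // receive interrupt
pub const DMA_INTEN_AIE: u32 = 1 << 15; // abnormal summary (left disabled)
pub const DMA_INTEN_NIE: u32 = 1 << 16; // normal summary

/// Value described in item 4: normal TX/RX interrupts only, no AIE.
pub const DMA_INTEN_IRQ_MODE: u32 = DMA_INTEN_TIE | DMA_INTEN_RIE | DMA_INTEN_NIE;
```

With these, mapping SYSTIMER to CPU line 1 means writing `clic_index(1)`, i.e. `17`, into the peripheral's `INT_MAP_REG`.
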
## Performance & footprint
Numbers below are for `embassy_tcp_echo_irq` built in release mode against
crate version 0.1.0 on the Waveshare ESP32-P4-ETH dev board.
| Metric | Value | Notes |
|---|---|---|
| `.text` (code) | 71.5 KB | release, debuginfo stripped |
| `.rodata` | 12.6 KB | |
| `.bss` (CPU-side) | 31.3 KB | embassy + smoltcp + sockets + diagnostics|
| `.dma_bss` (DMA buffers + descriptors) | 26.1 KB | 8 RX + 8 TX × 1536-byte buffers |
| `.stack` budget | 76.0 KB | embassy task pool reservation |
| **Total runtime RAM** | **~134 KB** | static + stack budget |
| Combined RAM-loaded image | ~218 KB | the whole `--ram --no-stub` payload |
| Idle CPU | **0.024 %** @ 360 MHz | executor sleeps on `wfi` |
| Cold-boot to first link-up | ~3 s | includes 5 M-cycle PHY oscillator wait |
| Sustained TCP echo validated | 4 Mbps × 50 s, byte-clean, RBU=0 | window-limited (2 KB sockets), not driver|
| Cold-boot reliability | 100 % (5/5 power-cycle) | |
| Warm-reboot reliability | 100 % (30/30 stress on soak harness) | after the L2 cache mode init fix |
For an apples-to-apples comparison with the IDF `esp_eth` driver on the
same chip, `MIGRATION_PLAN/ESP32_P4.md` has the canonical IDF baseline
recipe; runtime numbers vs IDF are an open follow-up tracked for 0.2.0.
## Known limitations
- **Throughput ceiling not characterised.** Sustained TCP RX of ~4 Mbps and
TX of ~3 Mbps over 25 s (Waveshare board, 2 KB socket buffers, host
through WSL NAT) round-trip cleanly with `RBU = 0`, 1:1 IRQ-to-frame
ratio, and no descriptor errors. Higher rates and multi-connection
saturation have not been measured yet — the stress example
(`embassy_tcp_stress_irq`) is window-limited by 2 KB sockets, and
`embassy_tcp_soak` has 4 listeners × 4 KB each. A multi-MB/s
characterisation with 32 KB sockets and a direct cable is a planned
follow-up.
- **DMA buffer footprint vs IDF.** The driver currently allocates one
1536-byte buffer per descriptor (8 RX + 8 TX, ~26 KB total). IDF defaults to
20 RX + 10 TX × 512-byte buffers (~16 KB) and chains descriptors for any
frame longer than one buffer. Switching to chained 512-byte buffers would shave
~10 KB of static RAM at the cost of more bookkeeping in `descriptors.rs`.
- **Cache writeback uses `_All` instead of `_Addr`.** The chip ROM
`Cache_WriteBack_Addr` variant returns success but does not actually
flush data to RAM on the `--ram --no-stub` boot path even after
`Cache_Set_L2_Cache_Mode` has been re-applied. The driver therefore
uses `Cache_WriteBack_All` per descriptor — correct, but ~100 µs/frame
more expensive than IDF's per-address writeback. Identifying the
remaining init step IDF runs that makes `_Addr` actually flush is a
planned investigation.
- **DHCP not validated end-to-end.** The driver sends DHCP DISCOVER frames
correctly but the lab consumer router silently drops them. Static IP works.
- **Half-IRQ EMAC error recovery (cosmetic).** Under `p4-time-driver-irq`,
normal RX/TX completions are IRQ-driven but `RU` / abnormal recovery
still lives in the polling `dma_recovery_task`. The stress example shows
recovery never fires at 3–4 Mbps sustained (rbu+=0 over 50 s of
continuous traffic), so this is not a correctness issue at SmartBox-scale
rates — but re-enabling `AIE` with proper debouncing and removing the
polling task is a future cleanup.
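
The buffer-footprint comparison in the list above can be reproduced with a few lines of arithmetic. The sketch counts packet buffers only; descriptor storage and alignment padding are excluded, which is why the totals come out slightly below the ~26 KB and ~16 KB quoted. The 1518-byte frame length (1500-byte MTU plus Ethernet header and FCS) is a standard figure, not one taken from this README, and the function names are illustrative.

```rust
/// Bytes of packet-buffer storage for a ring layout (descriptors excluded).
pub const fn buffer_bytes(rx: usize, tx: usize, buf_len: usize) -> usize {
    (rx + tx) * buf_len
}

/// How many buffers one frame occupies (ceiling division).
pub const fn buffers_per_frame(frame_len: usize, buf_len: usize) -> usize {
    (frame_len + buf_len - 1) / buf_len
}

/// This driver: 8 RX + 8 TX buffers of 1536 bytes each.
pub const DRIVER_BUFFERS: usize = buffer_bytes(8, 8, 1536);
/// IDF defaults as quoted above: 20 RX + 10 TX buffers of 512 bytes each.
pub const IDF_BUFFERS: usize = buffer_bytes(20, 10, 512);
/// Static RAM that chained 512-byte buffers would save, buffers only.
pub const SAVING: usize = DRIVER_BUFFERS - IDF_BUFFERS;
```

The trade-off is visible in `buffers_per_frame`: a full-size frame fits in one 1536-byte buffer but spans three 512-byte buffers, which is the extra descriptor bookkeeping the bullet refers to.
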
## Contributing
Issues and pull requests welcome. The driver is structured for
side-by-side comparison with ESP-IDF baseline register dumps — when adding
a fix, please include a brief explanation of which register/bit changes and
what wire-level test (ping, UDP echo size, TCP echo size) catches a
regression.
## License
Licensed under either of
- Apache License, Version 2.0 ([LICENSE-APACHE](../LICENSE-APACHE))
- MIT license ([LICENSE-MIT](../LICENSE-MIT))
at your option.
Unless you explicitly state otherwise, any contribution intentionally
submitted for inclusion in the work by you, as defined in the Apache-2.0
license, shall be dual licensed as above, without any additional terms or
conditions.