esp-p4-eth 0.1.0

no_std async Ethernet MAC driver for ESP32-P4 RMII designs (embassy-net compatible)

esp-p4-eth

#![no_std] async Ethernet MAC driver for ESP32-P4 RMII designs, plug-in compatible with embassy-net.

Status: ready for 0.1.0 crates.io release as of 2026-04-29. End-to-end ping, TCP, and UDP all work on the Waveshare ESP32-P4-ETH dev board (IP101GRI PHY) at 100 Mbps full duplex. Both embassy-time (SYSTIMER) and EMAC RX/TX wakers can run IRQ-driven via the on-chip CLIC — the executor genuinely sleeps on wfi when there's no work to do. Cold-boot and warm-reboot reliability validated 100 % across stress runs (30/30 warm reboots on the embassy_tcp_soak harness, 5/5 power-cycle on the canonical examples). 180 host-side unit tests pass on x86_64-unknown-linux-gnu. The public API surface (Ethernet, Device, Runner, BoardConfig, the pub mod diag observability atomics) is what we expect to commit to for 0.x SemVer.

Why this exists

As of late April 2026, the official esp-hal does not include soc/esp32p4/ EMAC support — the P4 simply isn't covered by the upstream HAL crates yet. This repository is a self-contained, hand-rolled Synopsys DesignWare GMAC driver targeting the P4 silicon directly: clock tree, IO_MUX + GPIO matrix routing, DMA descriptor rings with cache coherency, MDIO + IP101 PHY, and an embassy-net-driver-channel adaptor that drops into any embassy-net stack.

Features

  • 100 Mbps full duplex RMII, embassy-net Driver via embassy-net-driver-channel
  • Two embassy-time driver options on the P4 SYSTIMER:
    • IRQ-driven via SYSTIMER alarm 0 → CLIC entry 17, gated by the p4-time-driver-irq cargo feature. Executor drops to wfi between deadlines. Drift ~0.002 % over 1 s on the Waveshare board.
    • Polling via time_polling_task, gated by p4-time-driver. Simpler but burns one core; kept for comparison and as a wfi-free fallback.
  • IRQ-driven EMAC RX/TX wakers under p4-time-driver-irq — DMA completion interrupts (Synopsys SBD source → CLIC entry 18) wake the embassy-net rx/tx futures. No wake_by_ref busy-spin under that feature.
  • IPv4 / ARP / ICMP / TCP / UDP all verified end-to-end against a Windows host through a consumer router
  • All RX frame sizes from 60 up to MTU 1500 bytes round-trip cleanly (single 1472-byte UDP datagrams included)
  • BoardConfig abstraction — bring your own pin map, ref-clock pad, and PHY MDIO address
  • 180 host-side unit tests; the host build target is x86_64-unknown-linux-gnu
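
The 1472-byte UDP figure and the 60-byte lower bound in the list above follow from standard header sizes. A minimal sketch of that arithmetic, assuming an IPv4 header without options; the constant and function names are illustrative and not part of the crate:

```rust
/// Ethernet payload limit (MTU) quoted in the feature list.
const MTU: usize = 1500;
/// IPv4 header length, assuming no IP options.
const IPV4_HEADER: usize = 20;
/// UDP header length.
const UDP_HEADER: usize = 8;
/// Minimum Ethernet frame on the wire, including the 4-byte FCS.
const MIN_FRAME_WITH_FCS: usize = 64;
/// Frame check sequence length; the MAC normally strips or appends it.
const FCS: usize = 4;

/// Largest UDP payload that fits in one unfragmented IPv4 packet.
const fn max_udp_payload(mtu: usize) -> usize {
    mtu - IPV4_HEADER - UDP_HEADER
}

/// Smallest frame length seen by software once the FCS is excluded.
const fn min_frame_without_fcs() -> usize {
    MIN_FRAME_WITH_FCS - FCS
}
```

With these values `max_udp_payload(MTU)` is 1472 and `min_frame_without_fcs()` is 60, which matches the sizes the list reports as tested.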

Hardware support

| Board | Status | BoardConfig |
| --- | --- | --- |
| Waveshare ESP32-P4-ETH (IP101GRI PHY, 25 MHz XO) | ✅ tested | BoardConfig::WAVESHARE_P4_ETH |
| Other P4 + RMII PHY designs | should work | construct your own BoardConfig |

Waveshare ESP32-P4-ETH default pin map

| Signal | GPIO |
| --- | --- |
| TXD0 | 34 |
| TXD1 | 35 |
| TX_EN | 49 |
| RXD0 | 30 |
| RXD1 | 29 |
| CRS_DV | 28 |
| MDC | 31 |
| MDIO | 52 |
| PHY RESET | 51 (active low) |
| REF_CLK in | 50 |
| PHY MDIO addr | 1 |
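
For a custom board, it can help to write the pin map down as data and sanity-check it before wiring it into a BoardConfig. The sketch below mirrors the table above in a hypothetical `RmiiPinMap` struct; the real BoardConfig fields are not shown in this README and may look different, so treat every name here as an assumption:

```rust
/// Hypothetical mirror of the pin-map table; not the crate's `BoardConfig`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RmiiPinMap {
    pub txd0: u8,
    pub txd1: u8,
    pub tx_en: u8,
    pub rxd0: u8,
    pub rxd1: u8,
    pub crs_dv: u8,
    pub mdc: u8,
    pub mdio: u8,
    /// PHY reset line, active low.
    pub phy_reset: u8,
    pub ref_clk_in: u8,
    /// MDIO bus address of the PHY (not a GPIO number).
    pub phy_mdio_addr: u8,
}

/// Values copied from the Waveshare ESP32-P4-ETH table.
pub const WAVESHARE_PINS: RmiiPinMap = RmiiPinMap {
    txd0: 34,
    txd1: 35,
    tx_en: 49,
    rxd0: 30,
    rxd1: 29,
    crs_dv: 28,
    mdc: 31,
    mdio: 52,
    phy_reset: 51,
    ref_clk_in: 50,
    phy_mdio_addr: 1,
};

impl RmiiPinMap {
    /// Returns true when no GPIO number is assigned to two signals.
    pub fn gpios_are_unique(&self) -> bool {
        let pins = [
            self.txd0, self.txd1, self.tx_en, self.rxd0, self.rxd1,
            self.crs_dv, self.mdc, self.mdio, self.phy_reset, self.ref_clk_in,
        ];
        for i in 0..pins.len() {
            for j in (i + 1)..pins.len() {
                if pins[i] == pins[j] {
                    return false;
                }
            }
        }
        true
    }
}
```

A duplicate-GPIO check like this catches copy-paste mistakes early, before they show up as a silent link failure on hardware.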

Quick start (Waveshare ESP32-P4-ETH)

Build a minimal "ping me" example:

cargo build \
  --no-default-features \
  --features p4-example,p4-time-driver,embassy-net-icmp \
  --example embassy_static_ping

Flash via espflash into RAM:

espflash flash --ram --no-stub --chip esp32p4 \
  --ignore_app_descriptor --monitor \
  target/riscv32imafc-unknown-none-elf/debug/examples/embassy_static_ping

The example brings the link up at 100 Mbps full duplex (~2.5 s after reset), configures itself as 192.168.0.50/24, and replies to ICMP echo requests. Adjust SELF_IP and GATEWAY constants in the example for your subnet.
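
When adjusting SELF_IP and GATEWAY, both addresses need to sit in the same subnet for the static configuration to route. A small host-side sketch of that check using only the standard library; the gateway value 192.168.0.1 used in the usage note is an assumption, not something taken from the example source:

```rust
use std::net::Ipv4Addr;

/// Returns true when `a` and `b` share the same network for the given prefix length.
fn same_subnet(a: Ipv4Addr, b: Ipv4Addr, prefix_len: u8) -> bool {
    assert!(prefix_len <= 32, "IPv4 prefix length must be 0..=32");
    // A /0 prefix has an all-zero mask; shifting a u32 by 32 bits would overflow.
    let mask: u32 = if prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - u32::from(prefix_len))
    };
    (u32::from(a) & mask) == (u32::from(b) & mask)
}
```

For the default 192.168.0.50/24 address, `same_subnet(Ipv4Addr::new(192, 168, 0, 50), Ipv4Addr::new(192, 168, 0, 1), 24)` holds, while a gateway on 192.168.1.x would fail the check.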

For the IRQ-driven path (recommended for real workloads — no polling task burning a core, executor sleeps on wfi):

cargo build \
  --no-default-features \
  --features p4-example,p4-time-driver-irq,embassy-net-tcp \
  --example embassy_tcp_echo_irq

This routes SYSTIMER alarms through CLIC entry 17 and EMAC RX/TX completion interrupts through CLIC entry 18, both dispatched by a single trap entry (_p4_eth_trap_entry) defined in src/time_driver_irq.rs.
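
To illustrate what a single trap entry serving both sources has to decide, here is a host-runnable sketch of the classification step. It is not the code in src/time_driver_irq.rs. The mcause layout (bit 31 as the interrupt flag, low 12 bits as the CLIC index) is an assumption based on the RISC-V CLIC draft specification, and `TrapSource` and `classify` are illustrative names:

```rust
/// CLIC entries named in this README.
const CLIC_SYSTIMER_ALARM0: u32 = 17;
const CLIC_EMAC_SBD: u32 = 18;

/// Assumed mcause layout in CLIC mode; verify against the P4 reference manual.
const MCAUSE_INTERRUPT: u32 = 1 << 31;
const MCAUSE_CODE_MASK: u32 = 0xFFF;

#[derive(Debug, PartialEq, Eq)]
enum TrapSource {
    /// SYSTIMER alarm 0: wake the embassy-time queue.
    SystimerAlarm,
    /// EMAC DMA completion: wake the RX/TX wakers.
    EmacDma,
    /// Some other interrupt index this sketch does not handle.
    OtherInterrupt(u32),
    /// A synchronous exception rather than an interrupt.
    Exception(u32),
}

/// Decodes an mcause value into the source the handler should service.
fn classify(mcause: u32) -> TrapSource {
    let code = mcause & MCAUSE_CODE_MASK;
    if (mcause & MCAUSE_INTERRUPT) == 0 {
        return TrapSource::Exception(code);
    }
    match code {
        CLIC_SYSTIMER_ALARM0 => TrapSource::SystimerAlarm,
        CLIC_EMAC_SBD => TrapSource::EmacDma,
        other => TrapSource::OtherInterrupt(other),
    }
}
```

Keeping the decode in a pure function makes it easy to unit-test on the host, separately from the assembly trap entry.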

Mandatory P4-specific build configuration

ESP32-P4 has two hardware constraints that the calling crate must honour:

  1. DMA-shared statics must live below 0x4FF80000. The upper 256 KB of HP SRAM is the L2 cache backing region; bus masters (the EMAC DMA) cannot read it. Any static that the DMA touches (descriptors, packet buffers, StaticDmaResources) must be placed in a linker section that resolves to the safe range. See memory.x and the #[link_section = ".dma_bss"] annotation in the bundled examples.
  2. Use a workspace [profile.dev] with opt-level = 1. With opt-level = 0 the debug .text overflows the 192 KB safe-DRAM slab; with opt-level = "s" inlining gets aggressive enough that the naked-counter MDIO BUSY-poll loop completes before the PHY answers and the bus times out.

The ready-made embassy_static_ping, embassy_tcp_echo, and embassy_udp_echo examples already embed both invariants and serve as templates.
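
The first constraint can be expressed as a simple address check. The sketch below is a host-runnable illustration, not the crate's own code: `dma_visible` and `RX_BUFFERS` are hypothetical names, and only the 0x4FF80000 boundary comes from this README.

```rust
/// Start of the L2-cache backing region; DMA-shared data must end at or below it.
const DMA_UNSAFE_START: usize = 0x4FF8_0000;

/// Returns true when `[addr, addr + len)` lies entirely below the cache-backing region.
const fn dma_visible(addr: usize, len: usize) -> bool {
    match addr.checked_add(len) {
        Some(end) => end <= DMA_UNSAFE_START,
        // Address arithmetic overflowed, so the range cannot be valid.
        None => false,
    }
}

// On target, a DMA-shared static would carry the section attribute the
// bundled examples use (buffer name and shape are hypothetical):
//
//     #[link_section = ".dma_bss"]
//     static mut RX_BUFFERS: [[u8; 1536]; 8] = [[0; 1536]; 8];
//
// and an init-time debug assertion could call
// `dma_visible(ptr as usize, size_in_bytes)` on its address.
```

A check like this does not replace the linker placement; it only turns a misplaced static into an early, visible failure instead of a DMA engine that reads stale or unreachable memory.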

Cargo features

| Feature | Default | Purpose |
| --- | --- | --- |
| mock-time | yes | embassy-time mock driver — required for host tests; mutually exclusive with p4-time-driver* |
| embassy-net-tcp | yes | enables the embassy-net/tcp socket layer |
| embassy-net-udp | no | enables the embassy-net/udp socket layer |
| embassy-net-icmp | yes | enables embassy-net/auto-icmp-echo-reply |
| p4-time-driver | no | SYSTIMER-backed polling embassy-time driver for riscv32imafc targets |
| p4-time-driver-irq | no | SYSTIMER+CLIC IRQ-driven embassy-time driver. Also routes EMAC RX/TX completion IRQs and removes the wake_by_ref paths in eth/mod.rs. Mutually exclusive with p4-time-driver |
| p4-example | no | gates [[example]] blocks; required by every P4 example |

For a target build, pass --no-default-features and then add the features you need, e.g.:

--features p4-example,p4-time-driver,embassy-net-icmp,embassy-net-udp
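
Mutually exclusive features are commonly enforced with `compile_error!` guards. The following is a sketch of how a crate could encode the rules from the table; whether esp-p4-eth does it exactly this way is not confirmed here, and `time_backend` is an illustrative helper:

```rust
// Reject combinations that the feature table marks as mutually exclusive.
#[cfg(all(feature = "p4-time-driver", feature = "p4-time-driver-irq"))]
compile_error!("features `p4-time-driver` and `p4-time-driver-irq` are mutually exclusive");

#[cfg(all(
    feature = "mock-time",
    any(feature = "p4-time-driver", feature = "p4-time-driver-irq")
))]
compile_error!("`mock-time` cannot be combined with a P4 time driver; pass --no-default-features");

/// Name of the embassy-time backend selected at compile time.
pub const fn time_backend() -> &'static str {
    if cfg!(feature = "p4-time-driver-irq") {
        "systimer-irq"
    } else if cfg!(feature = "p4-time-driver") {
        "systimer-polling"
    } else if cfg!(feature = "mock-time") {
        "mock"
    } else {
        "none"
    }
}
```

Because mock-time is a default feature, forgetting --no-default-features on a target build would trip the second guard with a readable message rather than a duplicate-symbol link error.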

Examples

The crate ships with four canonical examples that build with their required features by default. Bring-up scratch examples (mdio_test, clk_dump, systimer_probe, etc.) live under examples/dev/ and are gated behind the dev-examples feature so they don't pollute downstream builds.

Canonical (examples/)

| Example | What it shows |
| --- | --- |
| embassy_static_ping | full embassy-net stack with static IP, ICMP echo reply via the driver (polling time driver) |
| embassy_dhcp | embassy-net DHCP client + diagnostic atomics dump (polling time driver) |
| embassy_tcp_echo_irq | TCP listener on :7777 echoing bytes back, byte-exact for any size 1..1500, IRQ-driven path |
| embassy_tcp_soak | 4 parallel TCP echo listeners on :7780–:7783 with 60-second stat_task snapshots and hourly summary; pair with examples/dev/soak_driver.py from the host for byte-exact verification |

Build, e.g.:

cargo build --no-default-features \
  --features p4-example,p4-time-driver-irq,embassy-net-tcp \
  --example embassy_tcp_echo_irq
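
Once embassy_tcp_echo_irq is running, the byte-exact echo claim can be checked from the host. The sketch below is a hypothetical std-only client, not part of the crate and not a port of soak_driver.py; `verify_echo` and the 5-second timeout are assumptions:

```rust
use std::io::{self, Read, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Sends `len` patterned bytes to an echo server and checks they come back unchanged.
fn verify_echo<A: ToSocketAddrs>(addr: A, len: usize) -> io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    // Fail with an error instead of hanging forever if the board stops answering.
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    stream.set_nodelay(true)?;

    // Position-dependent pattern so dropped or reordered bytes are detected.
    let sent: Vec<u8> = (0..len).map(|i| (i % 251) as u8).collect();
    stream.write_all(&sent)?;

    // Read back exactly as many bytes as were sent, then compare.
    let mut received = vec![0u8; len];
    stream.read_exact(&mut received)?;
    Ok(sent == received)
}
```

Against the quick-start static address this would be called as `verify_echo("192.168.0.50:7777", 1500)`; looping `len` over 1..=1500 covers the size range the example table mentions.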

Dev / bring-up (examples/dev/, requires dev-examples feature)

mdio_test, phy_probe, clk_dump, systimer_probe, clic_irq_smoke, embassy_smoke, embassy_irq_smoke, embassy_time_smoke, polling embassy_tcp_echo / embassy_udp_echo, embassy_tcp_stress_irq, phy_init_diag (cold-boot diagnostic, requires phy-init-debug feature), plus soak_driver.py host-side companion for embassy_tcp_soak.

Build any of them by adding dev-examples to the feature list, e.g.:

cargo build --no-default-features \
  --features p4-example,dev-examples \
  --example mdio_test

P4 CLIC quirks (relevant if you fork the trap entry)

Four non-obvious facts learned the hard way during IRQ bring-up. They're already encoded in src/clic.rs and src/time_driver_irq.rs, but worth flagging if you're poking the trap path or routing additional IRQs:

  1. mtvec.MODE is forced to 0b11 (CLIC mode) in hardware. The Direct and Vectored RISC-V modes are not available: writing addr | 0 reads back as addr | 3.
  2. Trap-entry base = mtvec & ~0xFF. The low 8 bits are MODE/reserved and get clamped. The asm trap entry must be .balign 256.
  3. INTERRUPT_CORE0_<peripheral>_INT_MAP_REG expects the CLIC index (cpu_int_line + 16), not the raw CPU interrupt line. Writing 1 to map a peripheral to "CPU line 1" silently no-ops; you must write 17 for it to land in CLIC_INT_CTRL_REG[17].
  4. Don't enable AIE in DMA_INTEN under IRQ-driven mode. The RU bit (RX Buffer Unavailable, sticky) goes high immediately after Ethernet::start() because the descriptor ring is empty; with AIE enabled the abnormal-summary line storms the trap. The driver programs DMA_INTEN = TIE | RIE | NIE only and leaves abnormal recovery to a polling dma_recovery_task.
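
The arithmetic behind these quirks fits in a few constant functions. This is a sketch for illustration rather than an excerpt from src/clic.rs; the function names are made up, and the DMA_INTEN bit positions are assumed from the classic DesignWare GMAC register layout, so check them against the P4 register reference before relying on them:

```rust
/// Offset between a CPU interrupt line and its CLIC index (quirk 3).
const CLIC_INDEX_OFFSET: u32 = 16;

/// Value to write into INTERRUPT_CORE0_<peripheral>_INT_MAP_REG.
const fn clic_index(cpu_int_line: u32) -> u32 {
    cpu_int_line + CLIC_INDEX_OFFSET
}

/// What mtvec reads back as after a write: MODE is forced to 0b11 (quirk 1).
const fn mtvec_readback(written: u32) -> u32 {
    written | 0b11
}

/// Address the hardware uses as trap base: the low 8 bits are ignored (quirk 2).
const fn trap_base(mtvec: u32) -> u32 {
    mtvec & !0xFF
}

// Assumed bit positions (classic DesignWare GMAC DMA interrupt-enable layout).
const DMA_INTEN_TIE: u32 = 1 << 0; // transmit interrupt
const DMA_INTEN_RIE: u32 = 1 << 6; // receive interrupt
const DMA_INTEN_AIE: u32 = 1 << 15; // abnormal interrupt summary
const DMA_INTEN_NIE: u32 = 1 << 16; // normal interrupt summary

/// Mask described in quirk 4: normal TX/RX completions only, AIE left clear.
const DMA_INTEN_IRQ_MODE: u32 = DMA_INTEN_TIE | DMA_INTEN_RIE | DMA_INTEN_NIE;

/// Returns true when a mask would enable the abnormal-summary interrupt.
const fn abnormal_summary_enabled(mask: u32) -> bool {
    (mask & DMA_INTEN_AIE) != 0
}
```

`clic_index(1)` yields 17, matching the SYSTIMER routing above, and a trap entry aligned with `.balign 256` satisfies `trap_base(addr) == addr`.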

Performance & footprint

Numbers below are for embassy_tcp_echo_irq built in release mode against crate version 0.1.0 on the Waveshare ESP32-P4-ETH dev board.

| Metric | Value | Notes |
| --- | --- | --- |
| .text (code) | 71.5 KB | release, debuginfo stripped |
| .rodata | 12.6 KB | |
| .bss (CPU-side) | 31.3 KB | embassy + smoltcp + sockets + diagnostics |
| .dma_bss (DMA buffers + descriptors) | 26.1 KB | 8 RX + 8 TX × 1536-byte buffers |
| .stack budget | 76.0 KB | embassy task pool reservation |
| Total runtime RAM | ~134 KB | static + stack budget |
| Combined RAM-loaded image | ~218 KB | the whole --ram --no-stub payload |
| Idle CPU | 0.024 % @ 360 MHz | executor sleeps on wfi |
| Cold-boot to first link-up | ~3 s | includes 5 M-cycle PHY oscillator wait |
| Sustained TCP echo | validated 4 Mbps × 50 s, byte-clean, RBU=0 | window-limited (2 KB sockets), not driver |
| Cold-boot reliability | 100 % (5/5 power-cycle) | |
| Warm-reboot reliability | 100 % (30/30 stress on soak harness) | after the L2 cache mode init fix |

For an apples-to-apples comparison with the IDF esp_eth driver on the same chip, MIGRATION_PLAN/ESP32_P4.md has the canonical IDF baseline recipe; runtime numbers vs IDF are an open follow-up tracked for 0.2.0.
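
The two totals in the table appear to be plain sums of the per-section figures. A quick sketch of that bookkeeping, working in tenths of a KB to stay in integer arithmetic; the grouping of sections into each total is inferred from the numbers, not stated by the build tooling:

```rust
// Section sizes from the footprint table, in tenths of a KB.
const TEXT: u32 = 715;
const RODATA: u32 = 126;
const BSS: u32 = 313;
const DMA_BSS: u32 = 261;
const STACK: u32 = 760;

/// Static RAM plus the stack budget ("Total runtime RAM").
const fn runtime_ram_tenths_kb() -> u32 {
    BSS + DMA_BSS + STACK
}

/// Code, constants, and runtime RAM together ("Combined RAM-loaded image").
const fn ram_image_tenths_kb() -> u32 {
    TEXT + RODATA + runtime_ram_tenths_kb()
}
```

This gives 133.4 KB and 217.5 KB, which round to the ~134 KB and ~218 KB rows.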

Known limitations

  • Throughput ceiling not characterised. Sustained TCP RX of ~4 Mbps and TX of ~3 Mbps over 25 s (Waveshare board, 2 KB socket buffers, host through WSL NAT) round-trip cleanly with RBU = 0, 1:1 IRQ-to-frame ratio, and no descriptor errors. Higher rates and multi-connection saturation have not been measured yet — the stress example (embassy_tcp_stress_irq) is window-limited by 2 KB sockets, and embassy_tcp_soak has 4 listeners × 4 KB each. A multi-MB/s characterisation with 32 KB sockets and a direct cable is a planned follow-up.
  • DMA buffer footprint vs IDF. The driver currently allocates one 1536-byte buffer per descriptor (8 RX + 8 TX, ~26 KB total including descriptors). IDF defaults to 20 RX + 10 TX × 512-byte buffers with descriptor chaining for jumbo frames (~16 KB). Switching to chained 512-byte buffers would shave ~10 KB of static RAM at the cost of more bookkeeping in descriptors.rs.
  • Cache writeback uses _All instead of _Addr. The chip ROM Cache_WriteBack_Addr variant returns success but does not actually flush data to RAM on the --ram --no-stub boot path even after Cache_Set_L2_Cache_Mode has been re-applied. The driver therefore uses Cache_WriteBack_All per descriptor — correct, but ~100 µs/frame more expensive than IDF's per-address writeback. Identifying the remaining init step IDF runs that makes _Addr actually flush is a planned investigation.
  • DHCP not validated end-to-end. The driver sends DHCP DISCOVER frames correctly but the lab consumer router silently drops them. Static IP works.
  • Half-IRQ EMAC error recovery (cosmetic). Under p4-time-driver-irq, normal RX/TX completions are IRQ-driven but RU / abnormal recovery still lives in the polling dma_recovery_task. The stress example shows recovery never fires at 3–4 Mbps sustained (rbu+=0 over 50 s of continuous traffic), so this is not a correctness issue at SmartBox-scale rates — but re-enabling AIE with proper debouncing and removing the polling task is a future cleanup.
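
The buffer-footprint comparison in the list above can be reproduced with a few lines of arithmetic. A sketch using the ring sizes quoted there; descriptor memory is left out, and the function names are illustrative:

```rust
/// Bytes of packet-buffer memory for a ring layout (descriptors not included).
const fn buffer_bytes(rx: usize, tx: usize, buf_len: usize) -> usize {
    (rx + tx) * buf_len
}

/// This driver: 8 RX + 8 TX descriptors, one 1536-byte buffer each.
const THIS_DRIVER: usize = buffer_bytes(8, 8, 1536);
/// IDF defaults as quoted above: 20 RX + 10 TX, 512-byte buffers.
const IDF_DEFAULT: usize = buffer_bytes(20, 10, 512);

/// Number of chained `buf_len`-byte buffers needed to hold one frame.
const fn buffers_per_frame(frame_len: usize, buf_len: usize) -> usize {
    (frame_len + buf_len - 1) / buf_len
}
```

That is 24,576 bytes against 15,360 bytes of buffers, a difference of 9 KB before descriptor overhead. With 512-byte buffers a full-size 1514-byte frame spans three descriptors, which is where the extra bookkeeping comes from.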

Contributing

Issues and pull requests welcome. The driver is structured for side-by-side comparison with ESP-IDF baseline register dumps — when adding a fix, please include a brief explanation of which register/bit changes and what wire-level test (ping, UDP echo size, TCP echo size) catches a regression.

License

Licensed under either of

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.