Module encoder_worker


Persistent encoder worker thread (ADR-028 iter-380).

Provides a long-lived worker thread for parallel command-buffer encoding, mirroring llama.cpp’s n_cb=2 GCD dispatch_apply pattern (see /opt/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:438+550).

Per the existing comment in forward_decode at lines 4592-4595:

Threaded wait DURING encode: -43 tok/s (thread spawn + Metal cross-thread synchronization overhead on command queue)

That attempt, since falsified, called std::thread::spawn once per token, paying the ~50 µs spawn cost on every decode token. This module amortizes that cost by spawning the worker ONCE, at process start, and then submitting work to it over a crossbeam-style mpsc channel.
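The spawn-once, submit-many pattern can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the crate's implementation: `Worker` and `Job` are hypothetical names, std::sync::mpsc stands in for the crossbeam-style channel, and this `shutdown` drops the Sender and joins the thread, which may differ from what EncoderWorker::shutdown() actually does.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Type-erased unit of work. `Send + 'static` is required because the
/// closure is moved into, and runs on, another thread.
pub type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical minimal stand-in for `EncoderWorker` (not the crate's code).
/// If a `Worker` is dropped without `shutdown()`, its Sender still drops and
/// the loop still ends, but the JoinHandle is dropped, detaching the thread.
pub struct Worker {
    tx: Sender<Job>,
    handle: JoinHandle<()>,
}

impl Worker {
    /// Spawns the OS thread exactly once; all later work reuses it.
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        let handle = thread::spawn(move || {
            // Blocks until a job arrives, then runs jobs one at a time in FIFO order.
            // The iterator ends once every Sender is dropped and the queue is empty.
            for job in rx.iter() {
                job();
            }
        });
        Worker { tx, handle }
    }

    /// Queues a closure: a channel send instead of a thread spawn per task.
    pub fn submit<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.tx
            .send(Box::new(f))
            .expect("worker thread has exited");
    }

    /// Drops the Sender so the worker's loop ends, then waits for the thread.
    pub fn shutdown(self) {
        let Worker { tx, handle } = self;
        drop(tx);
        handle.join().expect("worker thread panicked");
    }
}
```

With this shape, each per-token submit costs one channel send plus at most one heap allocation for the boxed closure, rather than a thread spawn.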

§Usage

use mlx_native::encoder_worker::EncoderWorker;

// At process start (e.g., model load):
let worker = EncoderWorker::spawn();

// Per-token (or per-encoding-task):
let (done_tx, done_rx) = std::sync::mpsc::channel();
worker.submit(move || {
    // ... encode work into a fresh CommandEncoder ...
    done_tx.send(()).ok();
});

// Main thread can do its own work in parallel.

// Eventually wait for worker to finish:
done_rx.recv().expect("worker died");

§Safety / lifetime

The worker thread is detached on EncoderWorker::shutdown() only. The thread holds a Receiver<Closure>; once every Sender clone has been dropped, the iter() loop ends naturally and the thread exits, after which it can be joined.
Closures must be 'static (they cross thread boundaries). Use Arc for shared state.
Closures must be Send. This is enforced at compile time: a std::sync::mpsc::Receiver can be moved into another thread only if its item type is Send.
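The lifetime rules listed above can be checked with a small standalone sketch. The function `run_until_senders_dropped` is hypothetical and uses plain std::sync::mpsc rather than the crate's channel. Shared state reaches each 'static closure through an Arc, and the worker's iter() loop ends only after every Sender clone, including the second one, has been dropped; jobs already queued are still delivered first.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical helper: submits `n_jobs` jobs through two Sender clones and
/// returns how many ran before the worker's loop ended.
pub fn run_until_senders_dropped(n_jobs: usize) -> usize {
    let (tx, rx) = mpsc::channel::<Job>();
    // Moving `rx` into the thread compiles only because Job is Send.
    let worker = thread::spawn(move || {
        for job in rx.iter() {
            job();
        }
    });

    // Shared state goes through an Arc so each 'static closure owns a handle.
    let counter = Arc::new(AtomicUsize::new(0));
    let tx2 = tx.clone();
    for i in 0..n_jobs {
        let counter = Arc::clone(&counter);
        let sender = if i % 2 == 0 { &tx } else { &tx2 };
        sender
            .send(Box::new(move || {
                counter.fetch_add(1, Ordering::SeqCst);
            }))
            .expect("worker thread has exited");
    }

    // Dropping only `tx` would leave the worker blocked waiting for more
    // jobs, because `tx2` keeps the channel open.
    drop(tx);
    drop(tx2);
    worker.join().expect("worker thread panicked");
    // Jobs queued before disconnection were still delivered and run.
    counter.load(Ordering::SeqCst)
}
```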

Structs§

EncoderWorker: A persistent worker thread that executes submitted closures sequentially (in submission order). Designed for command-buffer encoding workloads where the cost of std::thread::spawn per task would dwarf the work.
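Sequential, in-submission-order execution follows from a single consumer draining a FIFO channel. The sketch below (hypothetical `execution_order`, with std::sync::mpsc assumed in place of the crate's channel) records the order in which jobs ran; with one producer and one worker thread, the recorded order matches the send order.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical helper: sends `n` jobs in order and returns the order in
/// which the single worker thread actually executed them.
pub fn execution_order(n: usize) -> Vec<usize> {
    let (tx, rx) = mpsc::channel::<Job>();
    // One consumer that runs each job to completion before taking the next.
    let worker = thread::spawn(move || {
        for job in rx.iter() {
            job();
        }
    });

    let log = Arc::new(Mutex::new(Vec::with_capacity(n)));
    for i in 0..n {
        let log = Arc::clone(&log);
        // Each job appends its submission index when it runs.
        tx.send(Box::new(move || log.lock().unwrap().push(i)))
            .expect("worker thread has exited");
    }

    drop(tx); // closes the channel so the worker's loop can finish
    worker.join().expect("worker thread panicked");
    let order = log.lock().unwrap().clone();
    order
}
```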