Persistent encoder worker thread (ADR-028 iter-380).
Provides a long-lived worker thread for parallel command-buffer encoding,
mirroring llama.cpp’s n_cb=2 GCD dispatch_apply pattern (see
/opt/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:438+550).
Per the existing forward_decode comment at lines 4592-4595:
Threaded wait DURING encode: -43 tok/s (thread spawn + Metal cross-thread synchronization overhead on command queue)
That falsified attempt used per-token std::thread::spawn, paying the
~50 µs spawn cost on every decode token. This module amortizes that cost
by spawning the worker ONCE at process start, then submitting work via a
crossbeam-style mpsc channel.
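The spawn-once pattern described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the module's actual implementation: `SketchWorker`, `Job`, and the method bodies are hypothetical names introduced here.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

// Hypothetical job type: a boxed closure that can be moved to another thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

// Hypothetical stand-in for a persistent worker (not the real EncoderWorker).
pub struct SketchWorker {
    tx: Sender<Job>,
}

impl SketchWorker {
    // Pay the thread-spawn cost once, up front.
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        thread::spawn(move || {
            // Run jobs in submission order until every Sender has been dropped.
            for job in rx.iter() {
                job();
            }
        });
        SketchWorker { tx }
    }

    // Per-task cost is a channel send rather than a thread spawn.
    pub fn submit<F>(&self, f: F)
    where
        F: FnOnce() + Send + 'static,
    {
        self.tx.send(Box::new(f)).expect("worker thread exited");
    }
}
```

Because a single thread drains a single channel, jobs run strictly in the order they were submitted, which is what makes the per-task completion channel in the usage example sufficient for synchronization.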
§Usage
use mlx_native::encoder_worker::EncoderWorker;
// At process start (e.g., model load):
let worker = EncoderWorker::spawn();
// Per-token (or per-encoding-task):
let (done_tx, done_rx) = std::sync::mpsc::channel();
worker.submit(move || {
    // ... encode work into a fresh CommandEncoder ...
    done_tx.send(()).ok();
});
// Main thread can do its own work in parallel.
// Eventually wait for worker to finish:
done_rx.recv().expect("worker died");
§Safety / lifetime
- The worker thread is detached on EncoderWorker::shutdown() only. The thread holds a Receiver<Closure>; when all Sender clones drop, the iter() loop exits naturally and the thread joins.
- Closures must be 'static (they cross thread boundaries). Use Arc for shared state.
- Closures must be Send (Rust’s mpsc::channel enforces this).
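The lifetime rules above can be illustrated with a small standard-library sketch. It assumes nothing about the real `EncoderWorker` internals: `run_jobs_then_close` and `Job` are hypothetical names, and the explicit `join` is only there to make the example deterministic.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

// Hypothetical job type: 'static + Send so it can cross the thread boundary.
type Job = Box<dyn FnOnce() + Send + 'static>;

// Hypothetical helper: sends `n` jobs to a worker, closes the channel,
// and returns how many jobs ran.
pub fn run_jobs_then_close(n: usize) -> usize {
    let (tx, rx) = mpsc::channel::<Job>();
    let worker = thread::spawn(move || {
        // `iter()` blocks for the next job and ends once all Senders are gone.
        for job in rx.iter() {
            job();
        }
    });

    // Shared state goes through Arc: each closure owns its own clone,
    // so no borrowed (non-'static) data is captured.
    let counter = Arc::new(AtomicUsize::new(0));
    for _ in 0..n {
        let c = Arc::clone(&counter);
        tx.send(Box::new(move || {
            c.fetch_add(1, Ordering::SeqCst);
        }))
        .expect("worker exited early");
    }

    // Dropping the last Sender lets the worker loop exit after draining the queue.
    drop(tx);
    worker.join().expect("worker panicked");
    counter.load(Ordering::SeqCst)
}
```

Capturing a plain reference to a local in one of these closures would fail to compile, since the closure would no longer be `'static`; cloning the `Arc` into each closure avoids that.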
Structs§
- EncoderWorker - A persistent worker thread that executes submitted closures sequentially
(in submission order). Designed for command-buffer encoding workloads
where the cost of std::thread::spawn per task would dwarf the work.