Module encoder_worker


Persistent encoder worker thread (ADR-028 iter-380).

Provides a long-lived worker thread for parallel command-buffer encoding, mirroring llama.cpp’s n_cb=2 GCD dispatch_apply pattern (see /opt/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:438+550).

Per the existing forward_decode comment at lines 4592-4595:

Threaded wait DURING encode: -43 tok/s (thread spawn + Metal cross-thread synchronization overhead on command queue)

That falsified attempt used a per-token std::thread::spawn, paying the ~50 µs spawn cost on every decode token. This module amortizes that cost by spawning the worker ONCE at process start, then submitting work via a crossbeam-style mpsc channel.
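The spawn-once, submit-many pattern can be sketched as follows. This is a hypothetical minimal implementation, not the module's actual source: the `Closure` alias and the `Option`-wrapped fields are illustrative choices, but the shape (one long-lived thread draining a channel) matches the description above.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical boxed-task type; any FnOnce that can cross threads.
type Closure = Box<dyn FnOnce() + Send + 'static>;

pub struct EncoderWorker {
    tx: Option<Sender<Closure>>,
    handle: Option<JoinHandle<()>>,
}

impl EncoderWorker {
    /// Spawn the worker ONCE; the ~50 µs thread-spawn cost is paid here,
    /// not on every decode token.
    pub fn spawn() -> Self {
        let (tx, rx) = channel::<Closure>();
        let handle = thread::spawn(move || {
            // Runs each submitted closure in submission order;
            // the loop exits when every Sender clone has been dropped.
            for task in rx.iter() {
                task();
            }
        });
        EncoderWorker { tx: Some(tx), handle: Some(handle) }
    }

    /// Submit a closure; returns immediately so the caller can keep
    /// encoding on the main thread in parallel.
    pub fn submit<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.tx
            .as_ref()
            .expect("worker already shut down")
            .send(Box::new(f))
            .expect("worker died");
    }

    /// Drop the Sender so the worker's iter() loop exits, then join.
    pub fn shutdown(mut self) {
        self.tx.take();
        if let Some(h) = self.handle.take() {
            h.join().expect("worker panicked");
        }
    }
}
```

Note that submission order is preserved because a single thread drains one channel; there is no work stealing or reordering.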

§Usage

use mlx_native::encoder_worker::EncoderWorker;

// At process start (e.g., model load):
let worker = EncoderWorker::spawn();

// Per-token (or per-encoding-task):
let (done_tx, done_rx) = std::sync::mpsc::channel();
worker.submit(move || {
    // ... encode work into a fresh CommandEncoder ...
    done_tx.send(()).ok();
});

// Main thread can do its own work in parallel.

// Eventually wait for worker to finish:
done_rx.recv().expect("worker died");

§Safety / lifetime

  • The worker thread is only torn down via EncoderWorker::shutdown(). The thread holds a Receiver<Closure>; once all Sender clones drop, the iter() loop exits naturally and the thread can be joined.
  • Closures must be 'static (they cross thread boundaries). Use Arc for shared state.
  • Closures must be Send (Rust’s mpsc::channel enforces this).

Structs§

EncoderWorker
A persistent worker thread that executes submitted closures sequentially (in submission order). Designed for command-buffer encoding workloads where the cost of std::thread::spawn per task would dwarf the work.