
Struct EncoderSession

pub struct EncoderSession { /* private fields */ }

Session-level wrapper around a CommandEncoder for one or more logical transformer stages.

See module docs for lifecycle and fence preservation. iter89e2-B scope: multi-stage chaining via MTLSharedEvent, residency delegation surface, and the matching test cohort. Phase 0b-C will broaden label propagation; Phase 2+ will wire this struct into the production forward path.

§Thread safety

EncoderSession is Send because CommandEncoder is Send (the existing unsafe impl at encoder.rs:613-619), String/u64/bool are Send, metal::Device is Send + Sync (foreign_obj_type! at metal-rs 0.33 lib.rs:179), and metal::SharedEvent is Send + Sync for the same reason. It is NOT Sync — exclusive ownership during dispatch encoding is the same contract as the inner CommandEncoder.
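The Send-but-not-Sync shape can be illustrated with a stand-alone toy (NOT the real type — the raw pointer models a wrapped Metal object, and the unsafe impl mirrors the pattern the docs attribute to encoder.rs):

```rust
// Toy stand-in: a raw pointer makes the struct neither Send nor Sync by
// default, like a wrapped Objective-C object.
struct ToySession {
    raw: *mut u8,
}

// Opt in to Send only: exclusive ownership may move across threads, but
// shared references (&ToySession) must not — the contract described above.
unsafe impl Send for ToySession {}
// Deliberately NO `unsafe impl Sync for ToySession`.

fn assert_send<T: Send>() {}

fn main() {
    assert_send::<ToySession>(); // compiles because of the unsafe impl
    let s = ToySession { raw: std::ptr::null_mut() };
    // Moving the whole session to another thread is fine (Send)...
    std::thread::spawn(move || drop(s)).join().unwrap();
    // ...but a hypothetical `assert_sync::<ToySession>()` would not compile.
}
```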

Implementations§

impl EncoderSession

pub fn env_enabled() -> bool

Whether HF2Q_ENCODER_SESSION=1 is set in the process environment.

Public introspection helper for hf2q-side dispatch wrappers that need to choose between the legacy command_encoder() path and the new encoder_session() path. Cached on first read via OnceLock so the per-call cost is a single atomic load.
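The OnceLock caching described above can be sketched as follows (a minimal stand-alone model, not the crate’s actual code):

```rust
use std::sync::OnceLock;

/// Cached read of HF2Q_ENCODER_SESSION: the environment is consulted once;
/// every subsequent call is a single atomic load on the OnceLock.
fn env_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        std::env::var("HF2Q_ENCODER_SESSION")
            .map(|v| v == "1")
            .unwrap_or(false)
    })
}

fn main() {
    let first = env_enabled();
    // The cached value is stable for the life of the process, even if the
    // environment changes after the first read.
    assert_eq!(env_enabled(), first);
    println!("encoder session enabled: {first}");
}
```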

pub fn begin_stage(&mut self, label: &str)

Set the semantic stage label.

The label propagates to MTLCommandBuffer.label and (when an encoder is active) MTLComputeCommandEncoder.label at the next commit_stage / commit_and_wait / fence_stage call, enabling xctrace MST attribution per ADR-015 iter16. Calling begin_stage does NOT itself touch any Metal object — it only stores the string.

Idempotent: calling begin_stage multiple times before commit overwrites the previous label with the latest value, matching the existing apply_labels semantic at encoder.rs:1980-1985 (the last_label field is overwritten on every labeled commit).

pub fn encoder(&mut self) -> &mut CommandEncoder

Borrow the inner CommandEncoder for dispatch encoding.

All dispatch APIs (encode, encode_threadgroups, encode_with_args, dispatch_tracked_*, memory_barrier, start_capture / take_capture, etc.) live on CommandEncoder; EncoderSession adds a stage-aware commit surface on top of them. Use this accessor inside the dispatch loop, then call one of Self::commit_stage / Self::commit_and_wait / Self::fence_stage at the stage boundary.

§Caller contract

Do NOT call inner.commit* methods directly through this borrow. Use the session’s commit surface so the stage label propagates and the drained-latch / fence state stay consistent. Calling the inner commit bypasses both — it is not unsafe (no UB risk), but it leaves the session’s state inconsistent with what has actually been committed.

pub fn commit_stage(&mut self) -> Result<()>

Commit the stage’s command buffer non-blocking (no fence).

Delegates to CommandEncoder::commit_labeled (when a label is set) or CommandEncoder::commit (when not). Both end the persistent compute encoder, flush the residency-set pending staging (flush_residency_pending at encoder.rs:2004), and hand the CB to the GPU without blocking the CPU.

The session enters the Drained state. To chain into another stage on the same session, call Self::reset_for_next_stage — that opens a fresh CB and (if a fence was pending from a prior fence_stage) encodes the matching wait. After commit_stage (no fence), reset_for_next_stage does NOT emit a wait — the CBs are merely sequenced by the Metal queue’s FIFO dispatch order.

§Errors

Returns Ok(()) unconditionally — CommandEncoder::commit and CommandEncoder::commit_labeled are infallible (they hand the CB to Metal without waiting for completion; errors surface only at wait_until_completed). The Result is preserved for symmetry with Self::commit_and_wait and for future-proofing.

pub fn commit_and_wait(&mut self) -> Result<()>

Commit the stage’s command buffer and block until GPU completion.

Delegates to CommandEncoder::commit_and_wait_labeled (when a label is set) or CommandEncoder::commit_and_wait (when not). Required at K-batch boundaries (F7) and at output-head CPU reads (F6). Increments SYNC_COUNT exactly once per call (matches encoder.rs:1845).

The session enters the Drained state with NO fence pending — blocking commit fully drains the GPU, so the next stage (after Self::reset_for_next_stage) needs no wait-event.

§Errors

Returns MlxError::CommandBufferError if the GPU reports an error after wait — propagated from CommandEncoder.

ADR-015 iter94 Task #2 — fail-loud contract. iter93 final-report §“Root-cause hypothesis” point 5 noted that under MLX_UNRETAINED_REFS=1 + HF2Q_ENCODER_SESSION=1 + K>1, the session appeared to silently absorb a MTLCommandBufferStatus::Error and produce deterministic-but-wrong tokens. By code reading, the tail expression self.inner.commit_and_wait() already returns the inner error (commit_and_wait at encoder.rs:1852 explicitly matches on cmd_buf.status()). This reshape converts the implicit propagation into an explicit ? chain so future maintainers cannot accidentally swallow the error by inserting a let _ = inner.commit_and_wait(); or adding fall-through logic between the inner call and the function return. The drained = true latch is set BEFORE the inner call, so a panicking unwind through Drop sees the same drained-state contract.

pub fn fence_stage(&mut self, label: Option<&str>) -> Result<()>

Encode a stage-fence signal on the current CB and commit non-blocking.

This is the D3 multi-stage building block: the prior stage’s final CB-level op is encodeSignalEvent:value: at event_value + 1. That incremented value is stored back into event_value (so the next stage’s encodeWaitForEvent:value: blocks on it) and the CB is committed. The session enters the Fenced (drained-with-fence-pending) state; Self::reset_for_next_stage rotates the inner CB and emits the matching wait.

§Lazy event allocation

On the first call, allocates the per-session MTLSharedEvent via metal::DeviceRef::new_shared_event (metal-0.33.0, src/device.rs:2063). Subsequent calls reuse the same event — the monotonic event_value carries the per-fence identity. This matches the llama.cpp pattern at ggml/src/ggml-metal/ggml-metal-device.m:944-958.

§Label

A Some(value) argument overwrites stage_label and propagates via commit_labeled’s apply_labels chain — same as calling Self::begin_stage before this. None keeps any previously set begin_stage label intact.
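The Some-overwrites / None-preserves rule can be modeled in a few lines (hypothetical field name stage_label, matching the prose above):

```rust
/// Model of fence_stage's label handling: Some(value) overwrites the stored
/// stage label; None leaves any previously set begin_stage label intact.
fn apply_label(stage_label: &mut Option<String>, label: Option<&str>) {
    if let Some(v) = label {
        *stage_label = Some(v.to_string());
    }
}

fn main() {
    let mut stage_label = Some("attention".to_string()); // set via begin_stage
    apply_label(&mut stage_label, None);                 // fence_stage(None)
    assert_eq!(stage_label.as_deref(), Some("attention"));
    apply_label(&mut stage_label, Some("mlp"));          // fence_stage(Some("mlp"))
    assert_eq!(stage_label.as_deref(), Some("mlp"));
}
```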

§Counter semantics

Bumps SYNC_COUNT zero times (non-blocking). Bumps CMD_BUF_COUNT zero times (no new CB allocated here — reset_for_next_stage does that). Increments event_value by exactly 1.

§Errors

Returns Ok(()) unconditionally for the same reason Self::commit_stage does.

pub fn reset_for_next_stage(&mut self) -> Result<()>

Open a fresh command buffer on the same queue and (when a fence is pending) encode the matching wait on the new CB.

This is the second half of the multi-stage chaining primitive. After Self::fence_stage (or Self::commit_stage / Self::commit_and_wait) has put the session in the Fenced (or plain Drained) state, callers invoke this to start the next stage’s CB. The session transitions back to Encoding (no CB or compute encoder is open until the next dispatch lazily opens them).

§Wait-event encoding

If Self::fence_stage was the most recent commit, this method encodes encodeWaitForEvent:value: at event_value on the freshly allocated CB before returning. The new CB’s GPU work blocks until the prior CB’s signal lands at the same value. After Self::commit_stage / Self::commit_and_wait (no fence), no wait is encoded — Metal’s queue-FIFO sequencing is the implicit ordering primitive.

§State machine
Before                  | After
Drained (no fence)      | Encoding (new CB, no wait)
Fenced (fence pending)  | Encoding (new CB, wait encoded)
Encoding (not drained)  | no-op (returns Ok)

The not-drained case is intentionally a no-op rather than a panic: it keeps the session drop-safe under unusual call sequences (e.g. test scaffolding that calls reset speculatively).
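The state table above can be modeled as a stand-alone state machine (a sketch of the documented semantics, not the real implementation — Metal calls are replaced by counters):

```rust
#[derive(Debug, PartialEq)]
enum State { Encoding, Drained, Fenced }

struct ToySession {
    state: State,
    event_value: u64, // last signaled fence value (fence_value)
    wait_value: u64,  // last value a wait was encoded at (wait_value)
    wait_count: u64,  // number of waits actually emitted (wait_count)
}

impl ToySession {
    fn new() -> Self {
        Self { state: State::Encoding, event_value: 0, wait_value: 0, wait_count: 0 }
    }
    fn commit_stage(&mut self) {
        self.state = State::Drained; // non-blocking commit, no fence
    }
    fn fence_stage(&mut self) {
        self.event_value += 1; // encodeSignalEvent:value: at event_value + 1
        self.state = State::Fenced;
    }
    fn reset_for_next_stage(&mut self) {
        match self.state {
            State::Encoding => {} // not drained: intentional no-op
            State::Drained => self.state = State::Encoding, // queue FIFO, no wait
            State::Fenced => {
                // encodeWaitForEvent:value: at event_value on the fresh CB
                self.wait_value = self.event_value;
                self.wait_count += 1;
                self.state = State::Encoding;
            }
        }
    }
}

fn main() {
    let mut s = ToySession::new();
    s.reset_for_next_stage(); // no-op while Encoding
    assert_eq!(s.state, State::Encoding);

    s.fence_stage();          // stage ends with a fence at value 1
    s.reset_for_next_stage(); // next stage opens with a wait at value 1
    assert_eq!((s.wait_value, s.wait_count), (1, 1));

    s.commit_stage();         // plain commit: no fence pending
    s.reset_for_next_stage(); // no wait emitted this time
    assert_eq!(s.wait_count, 1);
}
```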

§Counter semantics

Bumps CMD_BUF_COUNT by exactly 1 (the new CB). Does NOT bump SYNC_COUNT (no commit/wait happens here).

§Errors

Returns Ok(()) unconditionally. Future error paths (e.g. queue-side allocation failure on new_command_buffer) would surface here.

pub fn add_to_residency_set(&self, buffer: &MlxBuffer) -> bool

Add a buffer to the device-level residency set.

Delegates to the inner encoder’s ResidencySet::add_allocation — the same Arc-cloned set shared by the device, this encoder, and every other concurrent encoder (single-set invariant per ADR-019:467). The actual set commit is deferred until the next commit_stage / commit_and_wait / fence_stage, all of which route through flush_residency_pending.

Returns false and is a no-op when the device booted without a residency set (HF2Q_NO_RESIDENCY=1, macOS<15, or MlxError::DeviceNotFound test paths).

§Use case

Caller holds an MlxBuffer not previously registered (e.g. from a pool, slice_view, or external interop) and wants the GPU pages hinted as resident before the stage’s first dispatch. MlxDevice::alloc_buffer already auto-registers — this method is the explicit hook for the residual cases.
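The deferred staging described above can be sketched with toy types (the real set is the shared ResidencySet behind an Arc; buffer identity here is just a u64 handle):

```rust
use std::collections::HashSet;

/// Toy model: additions/removals are staged and only take effect when the
/// session flushes at a commit boundary (commit_stage / commit_and_wait /
/// fence_stage → flush_residency_pending).
struct ToyResidencySet {
    resident: HashSet<u64>,
    pending_add: Vec<u64>,
    pending_remove: Vec<u64>,
}

impl ToyResidencySet {
    fn new() -> Self {
        Self { resident: HashSet::new(), pending_add: Vec::new(), pending_remove: Vec::new() }
    }
    fn add_allocation(&mut self, buf: u64) -> bool {
        self.pending_add.push(buf); // staged, not yet resident
        true
    }
    fn remove_allocation(&mut self, buf: u64) -> bool {
        self.pending_remove.push(buf); // staged removal
        true
    }
    fn flush_residency_pending(&mut self) {
        for b in self.pending_add.drain(..) { self.resident.insert(b); }
        for b in self.pending_remove.drain(..) { self.resident.remove(&b); }
    }
}

fn main() {
    let mut set = ToyResidencySet::new();
    assert!(set.add_allocation(42));
    assert!(!set.resident.contains(&42)); // deferred until the commit boundary
    set.flush_residency_pending();        // commit boundary
    assert!(set.resident.contains(&42));
}
```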

pub fn remove_from_residency_set(&self, buffer: &MlxBuffer) -> bool

Remove a buffer from the device-level residency set.

Mirror of Self::add_to_residency_set. Stages a deferred removeAllocation: that flushes at the next commit boundary. Returns false and no-ops when no residency set is active.

§F2 caveat

Removing a buffer that the in-flight CB still references is the iter58b residency-rescission class. Under retained-refs (default), the CB’s ARC retain keeps the underlying Metal page alive; the residency-set demotion only affects the resident-hint (a perf knob, not a safety knob). Under MLX_UNRETAINED_REFS=1 (NOT enabled in Phase 0b), the caller-owned arena contract is the only structural mitigation.

pub fn is_drained(&self) -> bool

Whether the session has been committed (any commit path).

Test-and-introspection helper. Production code should use the explicit reset_for_next_stage cycle to chain stages rather than polling this field.

pub fn is_fence_pending(&self) -> bool

Whether a fence is pending (most recent commit was fence_stage).

Test-and-introspection helper for verifying the multi-stage state machine. Cleared by the next reset_for_next_stage / commit_stage / commit_and_wait.

pub fn fence_value(&self) -> u64

The current monotonic fence value.

Returns 0 before the first fence_stage; otherwise returns the most recently signaled value. Mirrors the semantics of ggml_metal_event::value — a fence at value N means signal N is in flight (or completed) and any subsequent waiters at N will be unblocked.

pub fn has_event(&self) -> bool

Whether a MTLSharedEvent has been allocated in this session.

Returns false until the first fence_stage; true afterwards. Test helper for verifying lazy-allocation behavior.

pub fn wait_value(&self) -> u64

The most recent value passed to encode_wait_for_event inside Self::reset_for_next_stage.

Returns 0 until the first reset_for_next_stage actually emits a wait (i.e. the prior commit was Self::fence_stage, not Self::commit_stage / Self::commit_and_wait). After a fence_stage(N) followed by reset_for_next_stage(), this MUST equal N — the wait-side scoreboard mirrors the signal-side Self::fence_value.

iter90b §2 H1b proof helper: makes the wait-event encoding observable from a Rust test without xctrace.

§Risk register

Pure read-only introspection. Reads a u64 field updated under &mut self exclusively (no concurrent mutation possible — EncoderSession is !Sync). Does NOT widen F1/F2/F11/F12.

pub fn wait_count(&self) -> u64

Cumulative count of encode_wait_for_event calls actually emitted inside Self::reset_for_next_stage in this session.

Bumped exactly once per reset_for_next_stage call that finds fence_pending == true — i.e. once per “fence + reset” pair. commit_stage / commit_and_wait followed by reset_for_next_stage does NOT bump this (no wait emitted — Metal queue FIFO is the implicit ordering primitive in that case).

For an N-stage chain (N fences + (N-1) resets), this returns N - 1 after the last reset. The Nth (terminal) fence is drained by the caller via metal_command_buffer().wait_until_completed() or by a subsequent commit_and_wait, neither of which emits an additional wait.
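The N - 1 arithmetic can be checked with plain counters (a model of the documented rule, not the real type):

```rust
/// Count the waits an N-stage chain emits under the documented rule:
/// every stage ends with fence_stage, but only the N - 1 interior
/// reset_for_next_stage calls emit a wait; the terminal fence is drained
/// by the caller (wait_until_completed / commit_and_wait).
fn waits_for_chain(n: u64) -> u64 {
    let mut wait_count = 0;
    for stage in 0..n {
        let fence_pending = true; // fence_stage: signal + non-blocking commit
        if stage < n - 1 && fence_pending {
            wait_count += 1; // reset_for_next_stage finds fence_pending == true
        }
    }
    wait_count
}

fn main() {
    assert_eq!(waits_for_chain(4), 3); // N fences, N - 1 waits
    assert_eq!(waits_for_chain(1), 0); // single stage: no chaining wait
}
```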

iter90b §2 H1b proof helper: paired with Self::wait_value to make the wait-event side of the multi-stage chain observable.

§Risk register

Same as Self::wait_value — pure read-only introspection over a u64 field updated under &mut self exclusively. Does NOT widen F1/F2/F11/F12.

pub fn metal_command_buffer(&self) -> &CommandBuffer

Borrow the underlying Metal command buffer.

Mirrors CommandEncoder::metal_command_buffer. Used by label-propagation tests and by callers that need to call wait_until_completed after a non-blocking commit_stage / fence_stage.

Trait Implementations§

impl Drop for EncoderSession

fn drop(&mut self)

Drain the inner CommandEncoder safely on drop.

§F2 residency-rescission preservation (load-bearing)

Drop scenarios across the multi-stage state machine:

  1. Drained (no fence): commit_stage / commit_and_wait already ran. inner.flush_residency_pending() was already called; the GPU has the CB (and may already have completed it under commit_and_wait). CommandEncoder::Drop runs and calls end_active_encoder(), which is a no-op because commit* already ended the encoder. Safe.

  2. Fenced (fence pending): fence_stage already ran. The signal-event has been encoded onto the prior CB and the CB has been submitted non-blocking. The session never opened a new CB (no reset_for_next_stage call), so cmd_buf still points at the FENCED CB. CommandEncoder::Drop runs and end_active_encoder is a no-op (the encoder was ended inside fence_signal_and_commit). The submitted CB executes on the GPU normally — the signal lands, the value is observable to any external waitUntilSignaledValue: consumer (none in iter89e2-B), and the next allocation/CB on the same residency set will see the bumped pending flag flushed at its commit boundary. The fence event itself is dropped with event (an Option); ARC drop releases it.

  3. Encoding (uncommitted) — caller created the session, optionally encoded dispatches, then dropped without calling any commit_*. CommandEncoder::Drop ends the active compute encoder cleanly (encoder.rs:2057-2063). The cmd_buf is dropped without ever being committed — Metal discards the encoded work. No residency-remove is staged because no buffers were registered as freed during this session (the F2 race requires a buffer drop staging a remove that a later flush_pending commits before the in-flight CB finishes; here no commit ever happens). The residency-set’s pending state persists into the next encoder; correct.

  4. Empty — no dispatches encoded. active_encoder is null; CommandEncoder::Drop’s end_active_encoder is a no-op. Safe.

We deliberately do NOT call wait_until_completed here for the committed-but-not-waited case (scenarios 1 with commit_stage or 2 with fence_stage). Under retained-refs mode (default — MLX_UNRETAINED_REFS=0), the in-flight CB holds ARC retains on every bound buffer, so the GPU completes safely after the session drops. Under MLX_UNRETAINED_REFS=1 (NOT enabled in Phase 0b), the caller-owned-arena contract is the only structural mitigation — same as the existing async-commit() path at encoder.rs:2014-2022.

In short: Drop does no extra work; the inner CommandEncoder’s own Drop is the entire safety story. metal::SharedEvent drops via its foreign_obj_type! ARC release.

impl Send for EncoderSession
