pub struct EncoderSession { /* private fields */ }
Session-level wrapper around a CommandEncoder for one or more
logical transformer stages.
See module docs for lifecycle and fence preservation. iter89e2-B scope:
multi-stage chaining via MTLSharedEvent, residency
delegation surface, and the matching test cohort. Phase 0b-C will
broaden label propagation; Phase 2+ will wire this struct into the
production forward path.
§Thread safety
EncoderSession is Send because CommandEncoder is Send (the
existing unsafe impl at encoder.rs:613-619), String/u64/bool
are Send, metal::Device is Send + Sync (foreign_obj_type! at
metal-rs 0.33 lib.rs:179), and metal::SharedEvent is Send + Sync
for the same reason. It is NOT Sync — exclusive ownership during
dispatch encoding is the same contract as the inner CommandEncoder.
§Implementations
impl EncoderSession
pub fn env_enabled() -> bool
Whether HF2Q_ENCODER_SESSION=1 is set in the process environment.
Public introspection helper for hf2q-side dispatch wrappers that
need to choose between the legacy command_encoder() path and the
new encoder_session() path. Cached on first read via OnceLock
so the per-call cost is a single atomic load.
pub fn begin_stage(&mut self, label: &str)
Set the semantic stage label.
The label propagates to MTLCommandBuffer.label and (when an
encoder is active) MTLComputeCommandEncoder.label at the next
commit_stage / commit_and_wait / fence_stage call, enabling
xctrace MST attribution per ADR-015 iter16. Calling begin_stage
does NOT itself touch any Metal object — it only stores the
string.
Last-write-wins: calling begin_stage multiple times before a
commit overwrites the previous label with the latest value,
matching the existing apply_labels semantics at
encoder.rs:1980-1985 (the last_label field is overwritten on
every labeled commit).
pub fn encoder(&mut self) -> &mut CommandEncoder
Borrow the inner CommandEncoder for dispatch encoding.
All dispatch APIs (encode, encode_threadgroups,
encode_with_args, dispatch_tracked_*, memory_barrier,
start_capture / take_capture, etc.) live on
CommandEncoder; EncoderSession adds a stage-aware commit
surface on top of them. Use this accessor inside the dispatch
loop, then call one of Self::commit_stage /
Self::commit_and_wait / Self::fence_stage at the stage
boundary.
§Caller contract
Do NOT call inner.commit* methods directly through this
borrow. Use the session’s commit surface so the stage label
propagates and the drained-latch / fence state stay consistent.
Calling the inner commit bypasses these — it is not unsafe (no
UB risk) but it makes the session state inconsistent with what
it has actually committed.
pub fn commit_stage(&mut self) -> Result<()>
Commit the stage’s command buffer non-blocking (no fence).
Delegates to CommandEncoder::commit_labeled (when a label is
set) or CommandEncoder::commit (when not). Both end the
persistent compute encoder, flush the residency-set pending
staging (flush_residency_pending at encoder.rs:2004), and
hand the CB to the GPU without blocking the CPU.
The session enters the Drained state. To chain into another
stage on the same session, call Self::reset_for_next_stage
— that opens a fresh CB and (if a fence was pending from a prior
fence_stage) encodes the matching wait. After
commit_stage (no fence), reset_for_next_stage does NOT emit
a wait — the CBs are merely sequenced by the Metal queue’s FIFO
dispatch order.
§Errors
Returns Ok(()) unconditionally — CommandEncoder::commit and
CommandEncoder::commit_labeled are infallible (they hand the
CB to Metal without waiting for completion; errors surface only
at wait_until_completed). The Result is preserved for
symmetry with Self::commit_and_wait and for future-proofing.
pub fn commit_and_wait(&mut self) -> Result<()>
Commit the stage’s command buffer and block until GPU completion.
Delegates to CommandEncoder::commit_and_wait_labeled (when a
label is set) or CommandEncoder::commit_and_wait (when not).
Required at K-batch boundaries (F7) and at output-head CPU reads
(F6). Increments SYNC_COUNT exactly once per call (matches
encoder.rs:1845).
The session enters the Drained state with NO fence pending —
blocking commit fully drains the GPU, so the next stage (after
Self::reset_for_next_stage) needs no wait-event.
§Errors
Returns MlxError::CommandBufferError if the GPU reports an
error after wait — propagated from CommandEncoder.
ADR-015 iter94 Task #2 — fail-loud contract. iter93 final-report
§“Root-cause hypothesis” point 5 noted that under
MLX_UNRETAINED_REFS=1 + HF2Q_ENCODER_SESSION=1 + K>1, the
session appeared to silently absorb a MTLCommandBufferStatus::Error
and produce deterministic-but-wrong tokens. By code
reading, the tail-expression self.inner.commit_and_wait()
already returns the inner error (commit_and_wait at
encoder.rs:1852 explicitly matches on cmd_buf.status()). This
re-shape converts the implicit propagation into an explicit ?
chain so future maintainers cannot accidentally swallow the
error by inserting a let _ = inner.commit_and_wait(); or
adding fall-through logic between the inner call and the
function return. The drained = true latch is set BEFORE the
inner call, so a panicking unwind through Drop observes the
same drained-state contract.
pub fn fence_stage(&mut self, label: Option<&str>) -> Result<()>
Encode a stage-fence signal on the current CB and commit non-blocking.
This is the D3 multi-stage building block: the prior stage’s
final CB-level op is encodeSignalEvent:value: at value+1;
value+1 is then stored in event_value (so the next stage’s
encodeWaitForEvent:value: blocks on it) and the CB is committed.
The session enters the Fenced (drained-with-fence-pending)
state; Self::reset_for_next_stage rotates the inner CB and
emits the matching wait.
§Lazy event allocation
On the first call, allocates the per-session
MTLSharedEvent via
metal::DeviceRef::new_shared_event
(/Users/robert/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/metal-0.33.0/src/device.rs:2063).
Subsequent calls reuse the same event — the monotonic
event_value carries the per-fence identity. This matches the
llama.cpp pattern at
/opt/llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m:944-958.
§Label
label’s Some(value) arm overwrites stage_label and
propagates via commit_labeled’s apply_labels chain — same as
calling Self::begin_stage before this. None keeps any
previously-set begin_stage label intact.
§Counter semantics
Bumps SYNC_COUNT zero times (non-blocking). Bumps
CMD_BUF_COUNT zero times (no new CB allocated here —
reset_for_next_stage does that). Increments event_value by
exactly 1.
§Errors
Returns Ok(()) unconditionally for the same reason
Self::commit_stage does.
pub fn reset_for_next_stage(&mut self) -> Result<()>
Open a fresh command buffer on the same queue and (when a fence is pending) encode the matching wait on the new CB.
This is the second half of the multi-stage chaining primitive.
After Self::fence_stage (or Self::commit_stage /
Self::commit_and_wait) has put the session in the Drained
state, callers invoke this to start the next stage’s CB. The
session transitions back to Encoding (no CB or compute encoder
open until the next dispatch lazy-opens them).
§Wait-event encoding
If Self::fence_stage was the most recent commit, this
method encodes encodeWaitForEvent:value: at event_value on the
freshly-allocated CB before returning. The new CB’s GPU work
blocks until the prior CB’s signal lands at the same value.
After Self::commit_stage / Self::commit_and_wait (no
fence), no wait is encoded — Metal’s queue-FIFO sequencing is
the implicit ordering primitive.
§State machine
| Before | After |
|---|---|
| Drained (no fence) | Encoding (new CB, no wait) |
| Fenced (fence pending) | Encoding (new CB, wait encoded) |
| Encoding (not drained) | no-op (returns Ok) |
The not-drained case is intentionally a no-op rather than a panic: it keeps the session drop-safe under unusual call sequences (e.g. test scaffolding that calls reset speculatively).
§Counter semantics
Bumps CMD_BUF_COUNT by exactly 1 (the new CB). Does NOT bump
SYNC_COUNT (no commit/wait happens here).
§Errors
Returns Ok(()) unconditionally. Future error paths (e.g.
queue-side allocation failure on new_command_buffer) would
surface here.
pub fn add_to_residency_set(&self, buffer: &MlxBuffer) -> bool
Add a buffer to the device-level residency set.
Delegates to the inner encoder’s ResidencySet::add_allocation
(the same Arc clone shared by the device, the encoder, and every
other concurrent encoder — single-set invariant per ADR-019:467).
The actual set commit is deferred until the next
commit_stage / commit_and_wait / fence_stage, all of which
route through flush_residency_pending.
Returns false and is a no-op when the device booted without
a residency set (HF2Q_NO_RESIDENCY=1, macOS<15, or
MlxError::DeviceNotFound test paths).
§Use case
Caller holds an MlxBuffer not previously registered (e.g.
from a pool, slice_view, or external interop) and wants the GPU
pages hinted as resident before the stage’s first dispatch.
MlxDevice::alloc_buffer already auto-registers — this method
is the explicit hook for the residual cases.
pub fn remove_from_residency_set(&self, buffer: &MlxBuffer) -> bool
Remove a buffer from the device-level residency set.
Mirror of Self::add_to_residency_set. Stages a deferred
removeAllocation: that flushes at the next commit boundary.
Returns false and no-ops when no residency set is active.
§F2 caveat
Removing a buffer that the in-flight CB still references is the
iter58b residency-rescission class. Under retained-refs
(default), the CB’s ARC retain keeps the underlying Metal page
alive; the residency-set demotion only affects the resident-hint
(a perf knob, not a safety knob). Under MLX_UNRETAINED_REFS=1
(NOT enabled in Phase 0b), the caller-owned arena contract is
the only structural mitigation.
pub fn is_drained(&self) -> bool
Whether the session has been committed (any commit path).
Test-and-introspection helper. Production code should use the
explicit reset_for_next_stage cycle to chain stages rather
than polling this field.
pub fn is_fence_pending(&self) -> bool
Whether a fence is pending (most recent commit was fence_stage).
Test-and-introspection helper for verifying the multi-stage
state machine. Cleared by the next reset_for_next_stage /
commit_stage / commit_and_wait.
pub fn fence_value(&self) -> u64
The current monotonic fence value.
Returns 0 before the first fence_stage; otherwise returns the
most recently signaled value. Mirrors the semantics of
ggml_metal_event::value — a fence at value N means signal
N is in flight (or completed) and any subsequent waiters at
N will be unblocked.
pub fn has_event(&self) -> bool
Whether a MTLSharedEvent has been allocated
in this session.
Returns false until the first fence_stage; true afterwards.
Test helper for verifying lazy-allocation behavior.
pub fn wait_value(&self) -> u64
The most recent value passed to encode_wait_for_event inside
Self::reset_for_next_stage.
Returns 0 until the first reset_for_next_stage actually emits
a wait (i.e. the prior commit was Self::fence_stage, not
Self::commit_stage / Self::commit_and_wait). After a
fence_stage(N) followed by reset_for_next_stage(), this MUST
equal N — the wait-side scoreboard mirrors the signal-side
Self::fence_value.
iter90b §2 H1b proof helper: makes the wait-event encoding observable from a Rust test without xctrace.
§Risk register
Pure read-only introspection. Reads a u64 field updated under
&mut self exclusively (no concurrent mutation possible —
EncoderSession is !Sync). Does NOT widen F1/F2/F11/F12.
pub fn wait_count(&self) -> u64
Cumulative count of encode_wait_for_event calls actually
emitted inside Self::reset_for_next_stage in this session.
Bumped exactly once per reset_for_next_stage call that finds
fence_pending == true — i.e. once per “fence + reset” pair.
commit_stage / commit_and_wait followed by
reset_for_next_stage does NOT bump this (no wait emitted —
Metal queue FIFO is the implicit ordering primitive in that
case).
For an N-stage chain (N fences + (N-1) resets), this returns
N - 1 after the last reset. The Nth (terminal) fence is
drained by the caller via metal_command_buffer().wait_until_completed()
or by a subsequent commit_and_wait, neither of which emits an
additional wait.
iter90b §2 H1b proof helper: paired with Self::wait_value to
make the wait-event side of the multi-stage chain observable.
§Risk register
Same as Self::wait_value — pure read-only introspection over
a u64 field updated under &mut self exclusively. Does NOT
widen F1/F2/F11/F12.
pub fn metal_command_buffer(&self) -> &CommandBuffer
Borrow the underlying Metal command buffer.
Mirrors CommandEncoder::metal_command_buffer. Used by
label-propagation tests and by callers that need to call
wait_until_completed after a non-blocking commit_stage /
fence_stage.
§Trait Implementations
impl Drop for EncoderSession
fn drop(&mut self)
Drain the inner CommandEncoder safely on drop.
§F2 residency-rescission preservation (load-bearing)
Drop scenarios across the multi-stage state machine:
- Drained (no fence) — commit_stage / commit_and_wait already
ran. inner.flush_residency_pending() was already called; the GPU
has the CB (and may already have completed it under
commit_and_wait). CommandEncoder::Drop runs and calls
end_active_encoder(), which is a no-op because commit* already
ended the encoder. Safe.
- Fenced (fence pending) — fence_stage already ran. The
signal-event has been encoded onto the prior CB and the CB has
been submitted non-blocking. The session never opened a new CB
(no reset_for_next_stage call), so cmd_buf still points at the
FENCED CB. CommandEncoder::Drop runs and end_active_encoder()
is a no-op (the encoder was ended inside
fence_signal_and_commit). The submitted CB executes on the GPU
normally — the signal lands, the value is observable to any
external waitUntilSignaledValue: consumer (none in iter89e2-B),
and the next allocation/CB on the same residency set sees the
bumped pending flag flushed at its commit boundary. The fence
event itself is dropped with event (an Option); ARC drop
releases it.
- Encoding (uncommitted) — the caller created the session,
optionally encoded dispatches, then dropped without calling any
commit_*. CommandEncoder::Drop ends the active compute encoder
cleanly (encoder.rs:2057-2063). The cmd_buf is dropped without
ever being committed — Metal discards the encoded work. No
residency-remove is staged because no buffers were registered as
freed during this session (the F2 race requires a buffer drop
staging a remove that a later flush_pending commits before the
in-flight CB finishes; here no commit ever happens). The
residency set’s pending state persists into the next encoder;
correct.
- Empty — no dispatches encoded. active_encoder is null;
CommandEncoder::Drop’s end_active_encoder() is a no-op. Safe.
We deliberately do NOT call wait_until_completed here for the
committed-but-not-waited case (scenarios 1 with commit_stage
or 2 with fence_stage). Under retained-refs mode (default —
MLX_UNRETAINED_REFS=0), the in-flight CB holds ARC retains on
every bound buffer, so the GPU completes safely after the
session drops. Under MLX_UNRETAINED_REFS=1 (NOT enabled in
Phase 0b), the caller-owned-arena contract is the only
structural mitigation — same as the existing async-commit()
path at encoder.rs:2014-2022.
In short: Drop does no extra work; the inner CommandEncoder’s
own Drop is the entire safety story. metal::SharedEvent drops
via its foreign_obj_type! ARC release.