pub struct MoeRouting<B: Backend + ?Sized> {
pub sorted_token_ids: B::Buffer,
pub expert_ids: B::Buffer,
pub num_tokens_past_padded: B::Buffer,
}
Routing buffers consumed by moe_gemm_phase_vllm. The caller holds them
across phase 1 and phase 3 of a single MoE forward pass. All three
fields contain i32 device data, even though they are declared as
B::Buffer (fp16 on CUDA); the backend reinterprets the underlying
device pointer as i32.
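The "i32 in disguise" idea can be illustrated on the host with a minimal sketch. It assumes a 16-bit element type standing in for fp16 and little-endian packing; `HalfBuffer`, `from_i32`, and `as_i32` are hypothetical names, not part of this crate, and the real backend works on device pointers instead of a `Vec`.

```rust
/// Hypothetical host-side stand-in for an fp16-typed buffer.
/// Each element is 16 bits wide, like one fp16 value.
struct HalfBuffer {
    words: Vec<u16>,
}

impl HalfBuffer {
    /// Store i32 values in 16-bit storage; each i32 occupies two elements.
    fn from_i32(values: &[i32]) -> Self {
        let mut words = Vec::with_capacity(values.len() * 2);
        for v in values {
            let b = v.to_le_bytes();
            words.push(u16::from_le_bytes([b[0], b[1]]));
            words.push(u16::from_le_bytes([b[2], b[3]]));
        }
        HalfBuffer { words }
    }

    /// Reinterpret the same storage as i32 values (assumes little-endian packing).
    fn as_i32(&self) -> Vec<i32> {
        self.words
            .chunks_exact(2)
            .map(|pair| {
                let lo = pair[0].to_le_bytes();
                let hi = pair[1].to_le_bytes();
                i32::from_le_bytes([lo[0], lo[1], hi[0], hi[1]])
            })
            .collect()
    }
}
```

In this sketch a buffer holding N routing indices needs 2N fp16-sized elements, so any allocation sized in buffer elements has to account for the factor of two.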
Fields§
§sorted_token_ids: B::Buffer
§expert_ids: B::Buffer
§num_tokens_past_padded: B::Buffer
Auto Trait Implementations§
impl<B> Freeze for MoeRouting<B>
impl<B> RefUnwindSafe for MoeRouting<B>
impl<B> Send for MoeRouting<B>
impl<B> Sync for MoeRouting<B>
impl<B> Unpin for MoeRouting<B>
impl<B> UnsafeUnpin for MoeRouting<B>
impl<B> UnwindSafe for MoeRouting<B>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.