pub struct DequantizePerTokenArgs<'a, TIn: Element, TOut: IntElement> {
    pub input: TensorRef<'a, TOut, 2>,
    pub scale: TensorRef<'a, TIn, 1>,
    pub zero_point: TensorRef<'a, i32, 1>,
    pub output: TensorMut<'a, TIn, 2>,
}
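Argument bundle for the per-token dequantization launch.
The TIn / TOut type parameters mirror the forward (FW) plan to keep the
type vocabulary consistent: TIn is the floating-point (FP) type that the
FW pass consumes (and that the backward (BW) pass and the dequantization
produce); TOut is the integer storage type that the FW pass produces (and
that the dequantization consumes). As a result, the input field here is
typed with TOut and the output field with TIn.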
Fields
input: TensorRef<'a, TOut, 2>
Quantized input [N, D] in int.
scale: TensorRef<'a, TIn, 1>
Per-row scale [N] in FP.
zero_point: TensorRef<'a, i32, 1>
Per-row zero-point [N] in i32.
output: TensorMut<'a, TIn, 2>
Output [N, D] in FP.
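The field shapes above suggest a row-wise affine dequantization. The following is a minimal CPU reference sketch, not this crate's kernel. It assumes the conventional formula output[n][d] = (input[n][d] - zero_point[n]) * scale[n], which this page does not state, and it uses plain row-major slices with i8 and f32 standing in for TOut and TIn. The function name dequantize_per_token_ref and the explicit d parameter are hypothetical.

```rust
/// Reference (CPU) per-token dequantization over row-major slices.
///
/// ASSUMPTION: out[n][d] = (q[n][d] - zero_point[n]) * scale[n].
/// This is the conventional affine scheme, not confirmed by the crate docs.
/// `i8` and `f32` stand in for the generic `TOut` and `TIn` parameters.
pub fn dequantize_per_token_ref(
    input: &[i8],       // quantized input, shape [N, D], row-major
    scale: &[f32],      // per-row scale, shape [N]
    zero_point: &[i32], // per-row zero-point, shape [N]
    output: &mut [f32], // dequantized output, shape [N, D], row-major
    d: usize,           // row width D (hypothetical explicit parameter)
) {
    let n = scale.len();
    assert_eq!(zero_point.len(), n, "zero_point must have one entry per row");
    assert_eq!(input.len(), n * d, "input must hold N * D elements");
    assert_eq!(output.len(), n * d, "output must hold N * D elements");
    if d == 0 {
        return; // nothing to do; also avoids chunking by zero
    }

    let rows = input.chunks_exact(d).zip(output.chunks_exact_mut(d));
    for (row, (q_row, out_row)) in rows.enumerate() {
        let (s, zp) = (scale[row], zero_point[row]);
        for (q, out) in q_row.iter().zip(out_row.iter_mut()) {
            // Subtract in i32 so the shifted value cannot overflow i8.
            *out = (i32::from(*q) - zp) as f32 * s;
        }
    }
}
```

Under these assumptions, every element of a row shares that row's scale and zero-point, which is why scale and zero_point are rank-1 tensors of length N while input and output are rank-2.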
Auto Trait Implementations
impl<'a, TIn, TOut> !UnwindSafe for DequantizePerTokenArgs<'a, TIn, TOut>
impl<'a, TIn, TOut> Freeze for DequantizePerTokenArgs<'a, TIn, TOut>
impl<'a, TIn, TOut> RefUnwindSafe for DequantizePerTokenArgs<'a, TIn, TOut>
where
    TOut: RefUnwindSafe,
    TIn: RefUnwindSafe,
impl<'a, TIn, TOut> Send for DequantizePerTokenArgs<'a, TIn, TOut>
impl<'a, TIn, TOut> Sync for DequantizePerTokenArgs<'a, TIn, TOut>
impl<'a, TIn, TOut> Unpin for DequantizePerTokenArgs<'a, TIn, TOut>
impl<'a, TIn, TOut> UnsafeUnpin for DequantizePerTokenArgs<'a, TIn, TOut>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.