pub struct GptqLinear<B: Backend + BackendQuantMarlin> { /* private fields */ }
GPTQ-format Linear projection, polymorphic over backend.
Holds a boxed backend-specific Linear<B> produced by B::load_gptq.
forward() delegates directly to that boxed implementation.
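The boxed-delegation pattern described here can be sketched as follows. This is a minimal illustration, not the crate's code: the `Linear` trait below is a simplified, hypothetical stand-in (the real `Linear<B>` is generic over the backend and its method signatures are not shown on this page), and `DenseLinear` and `BoxedLinear` are made-up names.

```rust
/// Simplified stand-in for the crate's `Linear<B>` trait (hypothetical signature).
trait Linear {
    fn forward(&self, x: &[f32]) -> Vec<f32>;
}

/// Toy "backend-specific" layer: a dense row-major [out_features, in_features] matrix.
struct DenseLinear {
    weight: Vec<f32>,
    in_features: usize,
    out_features: usize,
}

impl Linear for DenseLinear {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.in_features);
        (0..self.out_features)
            .map(|o| {
                // Dot product of output row `o` with the input vector.
                let row = &self.weight[o * self.in_features..(o + 1) * self.in_features];
                row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>()
            })
            .collect()
    }
}

/// Wrapper in the spirit of `GptqLinear`: owns a boxed layer and forwards every call.
struct BoxedLinear {
    inner: Box<dyn Linear>,
}

impl Linear for BoxedLinear {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        // No extra work here: the boxed layer does everything.
        self.inner.forward(x)
    }
}
```

Because the wrapper only stores a box, callers can treat every backend's layer uniformly while the format-specific work stays inside the boxed value.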
Implementations§
impl<B: Backend + BackendQuantMarlin> GptqLinear<B>
pub fn from_raw(
    qweight: &[i32],
    scales: &[f32],
    qzeros: &[i32],
    g_idx: Option<&[i32]>,
    bias: Option<&[f32]>,
    bits: u32,
    group_size: usize,
    in_features: usize,
    out_features: usize,
) -> Result<Self>
Builds a GptqLinear from raw host-side GPTQ tensors. The backend repacks them into its preferred format once (Marlin tiles on CUDA, dequantized weights on CPU) and returns a boxed Linear; inference then calls the boxed forward.
qweight: [k/8, n] i32 (packed int4)
scales: [k/group_size, n] f32 (converted from f16 by caller)
qzeros: [k/group_size, n/8] i32
g_idx: [k] i32 — optional, only used for desc_act=true
bias: [n] f32 — optional fused bias (Qwen2.5 attention)
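The shapes listed above translate into expected slice lengths that a caller could check before calling from_raw. The sketch below is a hypothetical helper, not part of this crate: it assumes k = in_features, n = out_features, and 4-bit packing (eight values per i32). The nibble order in `unpack_int4` (lowest nibble first) follows the common GPTQ checkpoint convention and is also an assumption here.

```rust
/// Expected element counts for the raw tensors (hypothetical helper type).
/// Assumes 4-bit packing, k = in_features, n = out_features.
#[derive(Debug, PartialEq)]
struct GptqLens {
    qweight: usize, // [k/8, n]
    scales: usize,  // [k/group_size, n]
    qzeros: usize,  // [k/group_size, n/8]
    g_idx: usize,   // [k]
    bias: usize,    // [n]
}

fn expected_lens(
    in_features: usize,
    out_features: usize,
    group_size: usize,
) -> Result<GptqLens, String> {
    const PACK: usize = 8; // 32 bits per word / 4 bits per value
    let (k, n) = (in_features, out_features);
    if group_size == 0 || k % group_size != 0 {
        return Err(format!("in_features {k} is not a multiple of group_size {group_size}"));
    }
    if k % PACK != 0 || n % PACK != 0 {
        return Err(format!("in_features {k} and out_features {n} must be multiples of {PACK}"));
    }
    let groups = k / group_size;
    Ok(GptqLens {
        qweight: (k / PACK) * n,
        scales: groups * n,
        qzeros: groups * (n / PACK),
        g_idx: k,
        bias: n,
    })
}

/// Unpacks eight 4-bit values from one packed word, lowest nibble first (assumed order).
fn unpack_int4(word: i32) -> [u8; 8] {
    let w = word as u32; // reinterpret the bits so the shift is logical, not arithmetic
    let mut out = [0u8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = ((w >> (4 * i)) & 0xF) as u8;
    }
    out
}
```

For in_features = out_features = 4096 and group_size = 128, this gives 2,097,152 packed qweight words, 131,072 scales, and 16,384 packed qzeros words.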
Trait Implementations§
impl<B: Backend + BackendQuantMarlin> Linear<B> for GptqLinear<B>
Auto Trait Implementations§
impl<B> Freeze for GptqLinear<B>
impl<B> !RefUnwindSafe for GptqLinear<B>
impl<B> Send for GptqLinear<B>
impl<B> Sync for GptqLinear<B>
impl<B> Unpin for GptqLinear<B>
impl<B> UnsafeUnpin for GptqLinear<B>
impl<B> !UnwindSafe for GptqLinear<B>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.