pub struct GptqLinear<B: Backend> { /* private fields */ }
Implementations
impl<B: Backend> GptqLinear<B>
pub fn from_raw(
    qweight: &[i32],
    scales: &[f32],
    qzeros: &[i32],
    g_idx: Option<&[i32]>,
    bits: u32,
    group_size: usize,
    in_features: usize,
    out_features: usize,
) -> Result<Self>
Builds the layer from raw host-side GPTQ tensors. The backend repacks them into its preferred format once, at construction time; inference then uses only the repacked store.
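Expected layouts, with k = in_features and n = out_features (the factor of 8 reflects eight 4-bit values per i32):
qweight: [k/8, n] i32 (packed int4)
scales: [k/group_size, n] f32 (converted from f16 by the caller)
qzeros: [k/group_size, n/8] i32 (packed int4)
g_idx: [k] i32, optional; only used when desc_act=true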
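To make these layouts concrete, here is a CPU reference sketch that dequantizes the raw tensors into a dense row-major [k, n] f32 matrix using w = (q - z) * scale. It assumes bits = 4, the common GPTQ packing in which the lowest nibble holds the first element, and stored zeros used as-is (some exporters store zero - 1 and expect the loader to add 1 back). The function name dequantize_int4 is hypothetical and not part of this crate; a backend's actual repacking may look quite different.

```rust
/// Hypothetical CPU reference: dequantize GPTQ int4 tensors to dense [k, n] f32.
/// Assumes bits == 4, low-nibble-first packing, and stored zeros used as-is.
fn dequantize_int4(
    qweight: &[i32],       // [k/8, n], eight input rows packed per i32
    scales: &[f32],        // [k/group_size, n]
    qzeros: &[i32],        // [k/group_size, n/8], eight output columns packed per i32
    g_idx: Option<&[i32]>, // [k], present when desc_act reorders rows into groups
    group_size: usize,
    k: usize,
    n: usize,
) -> Vec<f32> {
    let mut w = vec![0.0f32; k * n];
    for row in 0..k {
        // Group of this input row: explicit g_idx if given, else consecutive blocks.
        let g = match g_idx {
            Some(idx) => idx[row] as usize,
            None => row / group_size,
        };
        for col in 0..n {
            // qweight packs along k: nibble (row % 8) of word [row / 8, col].
            let packed = qweight[(row / 8) * n + col] as u32;
            let q = (packed >> (4 * (row % 8))) & 0xF;
            // qzeros packs along n: nibble (col % 8) of word [g, col / 8].
            let zpacked = qzeros[g * (n / 8) + col / 8] as u32;
            let z = (zpacked >> (4 * (col % 8))) & 0xF;
            let s = scales[g * n + col];
            w[row * n + col] = (q as f32 - z as f32) * s;
        }
    }
    w
}
```

With desc_act=false, g_idx[row] equals row / group_size, so passing it or omitting it should give the same result.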
pub fn from_store(
    store: B::GptqStore,
    in_features: usize,
    out_features: usize,
) -> Self
Constructs the layer directly from a pre-built backend store (useful in tests, for example).
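The split between from_raw and from_store can be sketched as follows: from_raw validates shapes, calls a one-time backend repack hook, and hands the result to from_store. Everything here is a simplified, hypothetical stand-in: the real Backend trait, its repack method, and its error type are not shown in this documentation, and only qweight is modelled. TransposedCpu is a toy backend whose "preferred format" is simply qweight transposed to [n, k/8].

```rust
// Hypothetical, simplified stand-in for the crate's `Backend` trait.
trait Backend {
    type GptqStore;
    // Hypothetical repack hook: called once per layer at load time.
    fn repack_gptq(qweight: &[i32], in_features: usize, out_features: usize) -> Self::GptqStore;
}

#[allow(dead_code)]
struct GptqLinear<B: Backend> {
    store: B::GptqStore,
    in_features: usize,
    out_features: usize,
}

impl<B: Backend> GptqLinear<B> {
    fn from_raw(qweight: &[i32], in_features: usize, out_features: usize) -> Result<Self, String> {
        // int4 packing stores eight input rows per i32.
        let expected = (in_features / 8) * out_features;
        if qweight.len() != expected {
            return Err(format!("qweight has {} elements, expected {}", qweight.len(), expected));
        }
        // Repack exactly once; inference only ever touches the store.
        let store = B::repack_gptq(qweight, in_features, out_features);
        Ok(Self::from_store(store, in_features, out_features))
    }

    fn from_store(store: B::GptqStore, in_features: usize, out_features: usize) -> Self {
        Self { store, in_features, out_features }
    }

    fn store(&self) -> &B::GptqStore {
        &self.store
    }
}

// Toy backend: its "preferred format" is qweight transposed from [k/8, n] to [n, k/8].
struct TransposedCpu;

impl Backend for TransposedCpu {
    type GptqStore = Vec<i32>;
    fn repack_gptq(qweight: &[i32], in_features: usize, out_features: usize) -> Vec<i32> {
        let rows = in_features / 8;
        let mut out = vec![0i32; rows * out_features];
        for r in 0..rows {
            for c in 0..out_features {
                out[c * rows + r] = qweight[r * out_features + c];
            }
        }
        out
    }
}
```

Keeping the store as an associated type lets each backend choose its own layout while GptqLinear stays generic over it.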
pub fn with_bias(self, bias: &[f32]) -> Self
Attaches a bias vector ([out_features] f32 on the host, uploaded to the backend).
Qwen2.5 and Llama variants that use biased linear layers require this.
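A bias enters the layer output as y = xW + b, broadcast over the batch. The dense sketch below mirrors the consuming, builder-style signature of with_bias; DenseLinear and forward are hypothetical names, and the weight is assumed to be already dequantized into row-major [in_features, out_features].

```rust
// Hypothetical dense stand-in, used only to show how an attached bias enters the output.
struct DenseLinear {
    weight: Vec<f32>, // row-major [in_features, out_features]
    bias: Option<Vec<f32>>,
    in_features: usize,
    out_features: usize,
}

impl DenseLinear {
    fn new(weight: Vec<f32>, in_features: usize, out_features: usize) -> Self {
        assert_eq!(weight.len(), in_features * out_features);
        Self { weight, bias: None, in_features, out_features }
    }

    // Builder style, like `with_bias(self, bias: &[f32]) -> Self`: copies the host slice.
    fn with_bias(mut self, bias: &[f32]) -> Self {
        assert_eq!(bias.len(), self.out_features, "bias must have out_features entries");
        self.bias = Some(bias.to_vec());
        self
    }

    // Hypothetical forward: y[m, n] = sum_k x[m, k] * w[k, n] + b[n], x row-major [batch, in_features].
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len() % self.in_features, 0);
        let batch = x.len() / self.in_features;
        let mut y = vec![0.0f32; batch * self.out_features];
        for m in 0..batch {
            for n in 0..self.out_features {
                // Start from the bias (or zero when none is attached).
                let mut acc = self.bias.as_ref().map_or(0.0, |b| b[n]);
                for k in 0..self.in_features {
                    acc += x[m * self.in_features + k] * self.weight[k * self.out_features + n];
                }
                y[m * self.out_features + n] = acc;
            }
        }
        y
    }
}
```

Seeding the accumulator with b[n] is equivalent to adding the bias after the matrix product. Because with_bias takes self by value, it chains directly onto a constructor, as in GptqLinear::from_raw(...)?.with_bias(&bias).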
pub fn store(&self) -> &B::GptqStore
Trait Implementations
impl<B: Backend> Linear<B> for GptqLinear<B>
Auto Trait Implementations
impl<B> Freeze for GptqLinear<B>
impl<B> RefUnwindSafe for GptqLinear<B>
impl<B> Send for GptqLinear<B>
impl<B> Sync for GptqLinear<B>
impl<B> Unpin for GptqLinear<B>
impl<B> UnsafeUnpin for GptqLinear<B>
impl<B> UnwindSafe for GptqLinear<B>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.