pub struct CpuGptqLinear {
pub weight_f32: Vec<f32>,
pub bias: Option<Vec<f32>>,
pub in_features: usize,
pub out_features: usize,
}
CPU GPTQ Linear: holds dequantized fp32 weights [out_features, in_features]
row-major, optional bias [out_features], dispatches via CpuBackend::gemm.
Dequantization happens once, in BackendQuantMarlin::load_gptq, so
inference is just a regular f32 GEMM.
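To make the layout concrete, here is a minimal sketch of the computation such a layer would perform once its weights are dequantized. It assumes the usual linear-layer convention, `out = input * W^T + bias`, with `input` shaped `[m, in_features]` and `out` shaped `[m, out_features]`. The `DenseF32` type, its plain-slice `forward` signature, and the `demo` helper are hypothetical stand-ins for illustration; they are not the crate's `CpuBackend::gemm` or `Linear` API, and the naive triple loop is not how an optimized GEMM would be written.

```rust
/// Hypothetical stand-in for a dequantized linear layer (not the crate's type).
struct DenseF32 {
    /// Weights, shape [out_features, in_features], row-major.
    weight: Vec<f32>,
    /// Optional bias, shape [out_features].
    bias: Option<Vec<f32>>,
    in_features: usize,
    out_features: usize,
}

impl DenseF32 {
    /// Computes out[m, out_features] = input[m, in_features] * W^T (+ bias).
    /// Assumed semantics; a naive reference loop, not an optimized GEMM.
    fn forward(&self, input: &[f32], out: &mut [f32], m: usize) {
        assert_eq!(input.len(), m * self.in_features);
        assert_eq!(out.len(), m * self.out_features);
        for row in 0..m {
            let x = &input[row * self.in_features..(row + 1) * self.in_features];
            for o in 0..self.out_features {
                // Row `o` of the weight matrix is contiguous because of row-major storage.
                let w = &self.weight[o * self.in_features..(o + 1) * self.in_features];
                let mut acc: f32 = x.iter().zip(w).map(|(a, b)| a * b).sum();
                if let Some(b) = &self.bias {
                    acc += b[o];
                }
                out[row * self.out_features + o] = acc;
            }
        }
    }
}

/// Small worked example: 2 input rows, 3 input features, 2 output features.
fn demo() -> Vec<f32> {
    let layer = DenseF32 {
        weight: vec![1.0, 0.0, 2.0, 0.0, 1.0, -1.0], // 2 x 3
        bias: Some(vec![0.5, -0.5]),
        in_features: 3,
        out_features: 2,
    };
    let input = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // m = 2
    let mut out = vec![0.0; 4];
    layer.forward(&input, &mut out, 2);
    out
}
```

Because each output feature reads one contiguous weight row, the `[out_features, in_features]` row-major layout keeps the inner dot product sequential in memory.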
Fields§
weight_f32: Vec<f32>
bias: Option<Vec<f32>>
in_features: usize
out_features: usize
Trait Implementations§
impl Linear<CpuBackend> for CpuGptqLinear
fn in_features(&self) -> usize
fn out_features(&self) -> usize
fn forward(
    &self,
    ctx: &mut <CpuBackend as Backend>::Context,
    input: &<CpuBackend as Backend>::Buffer,
    out: &mut <CpuBackend as Backend>::Buffer,
    m: usize,
)
Append GEMM work onto ctx. Caller flushes the context when results
must be materialised.
Auto Trait Implementations§
impl Freeze for CpuGptqLinear
impl RefUnwindSafe for CpuGptqLinear
impl Send for CpuGptqLinear
impl Sync for CpuGptqLinear
impl Unpin for CpuGptqLinear
impl UnsafeUnpin for CpuGptqLinear
impl UnwindSafe for CpuGptqLinear
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.