pub struct Linear4Bit {
    pub weight_packed: Tensor,
    pub scales: Tensor,
    pub bias: Option<Tensor>,
    pub group_size: usize,
    pub in_features: usize,
    pub out_features: usize,
}
4-bit quantized linear layer.
Stores weights in 4-bit format with group-wise scaling to maintain accuracy while reducing memory usage.
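To make the memory claim concrete, the following is a rough sketch of the storage arithmetic. It assumes `n_groups = in_features / group_size` (the page only states the `[out_dim, n_groups]` shape), ignores the optional bias, and uses hypothetical helper names `packed_bytes` and `fp16_bytes` that are not part of this crate.

```rust
/// Approximate bytes needed for the packed U8 weights plus the F16 scales.
/// Assumption: in_features is even and a multiple of group_size.
pub fn packed_bytes(out_features: usize, in_features: usize, group_size: usize) -> usize {
    // Two 4-bit weights share one byte.
    let weight_bytes = out_features * (in_features / 2);
    // One F16 scale (2 bytes) per group, per output row.
    let n_groups = in_features / group_size;
    let scale_bytes = out_features * n_groups * 2;
    weight_bytes + scale_bytes
}

/// Bytes needed for the same weight matrix stored densely as F16.
pub fn fp16_bytes(out_features: usize, in_features: usize) -> usize {
    out_features * in_features * 2
}
```

For a 4096 x 4096 layer with `group_size = 128`, this gives 8,650,752 bytes packed versus 33,554,432 bytes in F16, a bit under a 4x reduction.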
Fields§
weight_packed: Tensor
Packed weights: [out_dim, in_dim/2] as U8. Each byte stores two 4-bit weights.
scales: Tensor
Per-group scale factors: [out_dim, n_groups] as F16.
bias: Option<Tensor>
Optional bias: [out_dim]
group_size: usize
Group size for quantization (typically 64 or 128)
in_features: usize
Input feature dimension
out_features: usize
Output feature dimension
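The `[out_dim, in_dim/2]` shape follows from storing two 4-bit values per byte. A minimal sketch of that packing is shown here. The nibble order (first value in the low nibble) is an assumption of this example, since the page does not specify it, and `pack_nibbles` / `unpack_nibbles` are hypothetical helpers, not functions of this crate.

```rust
/// Packs pairs of 4-bit values (0..=15) into bytes.
/// Assumption: the first value of each pair goes into the low nibble.
pub fn pack_nibbles(values: &[u8]) -> Vec<u8> {
    assert!(values.len() % 2 == 0, "need an even number of 4-bit values");
    values
        .chunks_exact(2)
        .map(|pair| (pair[0] & 0x0F) | ((pair[1] & 0x0F) << 4))
        .collect()
}

/// Inverse of `pack_nibbles`: expands each byte into two 4-bit values.
pub fn unpack_nibbles(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &byte in packed {
        out.push(byte & 0x0F); // low nibble first
        out.push(byte >> 4); // then high nibble
    }
    out
}
```

A row of `in_dim` quantized values therefore occupies `in_dim / 2` bytes, which is why `in_dim` needs to be even under this layout.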
Implementations§
impl Linear4Bit
pub fn new(
    weight_packed: Tensor,
    scales: Tensor,
    bias: Option<Tensor>,
    group_size: usize,
    in_features: usize,
    out_features: usize,
) -> Result<Self>
Create a new Linear4Bit layer with pre-computed packed weights.
§Arguments
weight_packed: Packed weights [out_dim, in_dim/2] as U8
scales: Per-group scales [out_dim, n_groups] as F16
bias: Optional bias [out_dim]
group_size: Size of each quantization group
in_features: Input dimension
out_features: Output dimension
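The argument shapes are tied together by `in_features`, `out_features`, and `group_size`. The sketch below derives the shapes a caller might check before calling `new`. It is an illustration only: `expected_shapes` is a hypothetical helper, and the divisibility requirements are assumptions of this example rather than documented behaviour of the constructor.

```rust
/// Returns the expected `(weight_packed, scales)` shapes for a layer.
/// Assumptions: in_features must be even (two weights per byte) and a
/// multiple of group_size, so that n_groups = in_features / group_size.
pub fn expected_shapes(
    in_features: usize,
    out_features: usize,
    group_size: usize,
) -> Result<([usize; 2], [usize; 2]), String> {
    if group_size == 0 {
        return Err("group_size must be non-zero".to_string());
    }
    if in_features % 2 != 0 {
        return Err(format!("in_features ({in_features}) must be even"));
    }
    if in_features % group_size != 0 {
        return Err(format!(
            "in_features ({in_features}) is not a multiple of group_size ({group_size})"
        ));
    }
    Ok((
        [out_features, in_features / 2],
        [out_features, in_features / group_size],
    ))
}
```

Comparing these against the actual tensor dimensions before construction can surface a mismatched `group_size` early.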
pub fn load_4bit(vb: &VarBuilder<'_>, prefix: &str) -> Result<Self>
Load 4-bit quantized weights from VarBuilder.
§Arguments
vb: VarBuilder for loading weights
prefix: Prefix for weight names (e.g., "layers.0.mlp.gate_proj")
§Expected files
{prefix}.weight_4bit: Packed weights [out_dim, in_dim/2] as U8
{prefix}.scales_4bit: Per-group scales [out_dim, n_groups] as F16
{prefix}.bias (optional): Bias [out_dim]
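The naming scheme above can be sketched as a small helper that expands a prefix into the three tensor names. `tensor_names` is a hypothetical function written for illustration; how `load_4bit` builds these names internally is not shown on this page.

```rust
/// Expands a layer prefix into the tensor names listed under "Expected files".
/// Order: packed weights, scales, optional bias.
pub fn tensor_names(prefix: &str) -> [String; 3] {
    [
        format!("{prefix}.weight_4bit"),
        format!("{prefix}.scales_4bit"),
        format!("{prefix}.bias"), // may be absent from the checkpoint
    ]
}
```

With the prefix `layers.0.mlp.gate_proj`, the first entry is `layers.0.mlp.gate_proj.weight_4bit`.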
pub fn load_direct(
    tensors: &HashMap<String, Tensor>,
    prefix: &str,
    in_dim: usize,
    out_dim: usize,
    _group_size: usize,
    _symmetric: bool,
    device: &Device,
) -> Result<Self>
Load directly from a pre-loaded tensor HashMap (bypasses VarBuilder).
§Arguments
tensors: Pre-loaded tensor HashMap
prefix: Prefix for weight names (e.g., "layers.0.mlp.gate_proj")
in_dim: Input dimension
out_dim: Output dimension
_group_size: Group size for quantization (use 128 as default)
_symmetric: Whether quantization is symmetric (currently unused)
device: Target device
pub fn forward_unpack(&self, input: &Tensor) -> Result<Tensor>
Forward pass with weight unpacking (legacy path).
This method unpacks weights to FP32 before matmul.
Use forward() for better performance with fused GEMM.
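As a rough illustration of what an unpack-then-multiply path involves, the sketch below dequantizes one packed row to `f32` and takes its dot product with an input vector. Several details are assumptions of this example and are not confirmed by this page: symmetric quantization with a fixed offset of 8, low-nibble-first ordering, and scales held as `f32` instead of F16. `dequantize_row` and `forward_unpack_row` are hypothetical names.

```rust
/// Dequantizes one packed weight row to f32.
/// Assumptions: value = (q - 8) * scale, low nibble first,
/// one scale per `group_size` consecutive weights.
pub fn dequantize_row(packed: &[u8], scales: &[f32], group_size: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for (byte_idx, &byte) in packed.iter().enumerate() {
        for k in 0..2 {
            let q = if k == 0 { byte & 0x0F } else { byte >> 4 };
            let col = byte_idx * 2 + k;
            let scale = scales[col / group_size];
            out.push((q as f32 - 8.0) * scale);
        }
    }
    out
}

/// Computes one output element: dot(dequantized row, input).
pub fn forward_unpack_row(
    packed: &[u8],
    scales: &[f32],
    group_size: usize,
    input: &[f32],
) -> f32 {
    let weights = dequantize_row(packed, scales, group_size);
    assert_eq!(weights.len(), input.len(), "input length must match in_features");
    weights.iter().zip(input).map(|(w, x)| w * x).sum()
}
```

Materializing the full `f32` row costs eight times the packed storage for that row, which is one reason a fused kernel that dequantizes inside the GEMM can be faster.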
Trait Implementations§
impl Clone for Linear4Bit
fn clone(&self) -> Linear4Bit
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for Linear4Bit
impl !RefUnwindSafe for Linear4Bit
impl Send for Linear4Bit
impl Sync for Linear4Bit
impl Unpin for Linear4Bit
impl !UnwindSafe for Linear4Bit
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.