pub struct QuantizedLinear { /* private fields */ }
A linear layer with quantized base weights and trainable LoRA adapters.
§Dequantization Modes
- On-the-fly (default): dequantizes during each forward pass; saves memory.
- Cached (opt-in via cache_dequantized): dequantizes once; faster inference.

For training, always use on-the-fly mode (the default) to save memory. For inference, consider enabling caching for a ~30% speedup.
Implementations§
impl QuantizedLinear
pub fn from_weight(
    weight: &Tensor,
    bias: Option<Tensor>,
    config: &QLoraConfig,
    device: &Device,
) -> Result<QuantizedLinear, QLoraError>
Create a new quantized linear layer from existing weights.
Uses on-the-fly dequantization by default (memory-optimal).
Set config.cache_dequantized = true for inference speedup.
§Arguments
- weight - Full-precision weight tensor to quantize
- bias - Optional bias tensor (kept in full precision)
- config - QLoRA configuration
- device - Device for computation
§Errors
Returns an error if the weight tensor has an invalid shape or quantization fails.
pub fn from_weight_with_varbuilder(
    weight: &Tensor,
    bias: Option<Tensor>,
    config: &QLoraConfig,
    vb: VarBuilderArgs<'_, Box<dyn SimpleBackend + '_>>,
) -> Result<QuantizedLinear, QLoraError>
Create a quantized linear layer with trainable LoRA weights registered via VarBuilder.
This constructor ensures LoRA A/B weights are tracked for gradient computation.
Use this for training; use from_weight for inference.
§Arguments
- weight - Full-precision weight tensor to quantize
- bias - Optional bias tensor (kept in full precision)
- config - QLoRA configuration
- vb - VarBuilder backed by a VarMap for gradient tracking
§Errors
Returns an error if the weight tensor has an invalid shape or quantization fails.
pub fn new(
    in_features: usize,
    out_features: usize,
    config: &QLoraConfig,
    device: &Device,
) -> Result<QuantizedLinear, QLoraError>
Create a new quantized linear layer with zero-initialized quantized weights.
Primarily for testing; use from_weight for actual models.
§Errors
Returns an error if tensor creation or quantization fails.
pub fn forward(&self, input: &Tensor) -> Result<Tensor, QLoraError>
Forward pass through the quantized linear layer.
Computes: output = x @ W_q^T + x @ (B @ A)^T * scaling + bias
Uses on-the-fly dequantization unless cache_dequantized was enabled.
§Errors
Returns an error if tensor operations fail.
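The arithmetic behind the forward formula can be illustrated without the crate at all. The sketch below writes out output = x @ W^T + x @ (B @ A)^T * scaling with plain nested Vecs instead of Tensors; the matmul/transpose helpers and all concrete values (rank, scaling, weights) are hypothetical and exist only for illustration.

```rust
// Illustrative only: the LoRA math from `forward`, with plain nested Vecs.
// Shapes follow `lora_weights`: w is [out, in], a is [r, in], b is [out, r].

fn matmul(x: &[Vec<f32>], y: &[Vec<f32>]) -> Vec<Vec<f32>> {
    // (m x k) @ (k x n) -> (m x n)
    let (m, k, n) = (x.len(), y.len(), y[0].len());
    let mut out = vec![vec![0.0; n]; m];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                out[i][j] += x[i][p] * y[p][j];
            }
        }
    }
    out
}

fn transpose(x: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let (m, n) = (x.len(), x[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..m {
        for j in 0..n {
            out[j][i] = x[i][j];
        }
    }
    out
}

fn main() {
    // Hypothetical sizes: in_features = 2, out_features = 2, r = 1, scaling = 2.0.
    let x = vec![vec![1.0, 2.0]];                 // input, [1, in]
    let w = vec![vec![1.0, 0.0], vec![0.0, 1.0]]; // dequantized base weight, [out, in]
    let a = vec![vec![1.0, 1.0]];                 // lora_a, [r, in]
    let b = vec![vec![0.5], vec![0.5]];           // lora_b, [out, r]
    let scaling = 2.0;

    // output = x @ W^T + x @ (B @ A)^T * scaling  (bias omitted)
    let base = matmul(&x, &transpose(&w));
    let delta = matmul(&x, &transpose(&matmul(&b, &a)));
    let out: Vec<f32> = base[0]
        .iter()
        .zip(&delta[0])
        .map(|(base_val, d)| base_val + d * scaling)
        .collect();
    println!("{out:?}"); // prints [4.0, 5.0]
}
```

Note that because B @ A is computed first, the delta never materializes a second [out, in] copy of the activations; the real layer can likewise apply the two small factors in sequence.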
pub fn enable_weight_caching(&mut self) -> Result<(), QLoraError>
Enable weight caching for faster inference.
Call this after loading a trained model for inference. Not recommended for training (wastes memory).
§Errors
Returns an error if dequantization fails.
pub fn disable_weight_caching(&mut self)
Disable weight caching to save memory.
pub fn is_weight_cached(&self) -> bool
Check if weight caching is enabled.
pub fn config(&self) -> &QLoraConfig
Get the QLoRA configuration used to create this layer.
pub fn lora_weights(&self) -> (&Tensor, &Tensor)
Get the LoRA A and B weight tensors.
Returns (lora_a, lora_b) where:
- lora_a has shape [r, in_features]
- lora_b has shape [out_features, r]
pub fn num_trainable_parameters(&self) -> usize
Get the number of trainable parameters (LoRA only).
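Given the lora_weights shapes above, only the two low-rank factors receive gradients, so the count works out to r * (in_features + out_features). A minimal sketch (the helper name and the 4096/16 sizes are hypothetical):

```rust
// Illustrative: the count implied by the `lora_weights` shapes.
// lora_a is [r, in_features] and lora_b is [out_features, r], so only
// r * in_features + out_features * r parameters are trainable.
fn lora_trainable_params(in_features: usize, out_features: usize, r: usize) -> usize {
    r * in_features + out_features * r
}

fn main() {
    // e.g. a hypothetical 4096x4096 projection with rank 16:
    let full = 4096 * 4096;
    let lora = lora_trainable_params(4096, 4096, 16);
    // 131_072 trainable vs 16_777_216 frozen base parameters (~0.78%).
    println!("full: {full}, lora: {lora}");
}
```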
pub fn memory_bytes(&self) -> usize
Get total memory usage in bytes.
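As a back-of-envelope check on what this number might look like: assuming (hypothetically, since the docs do not specify the format) a 4-bit packed base weight with f32 LoRA factors and f32 bias, the dominant term is the quantized base. The helper below is an assumption-laden sketch, not the crate's actual accounting, which may also count quantization scales and zero-points.

```rust
// Rough estimate only. Assumes 4-bit packed base weights (2 values per byte,
// even element count), f32 lora_a/lora_b, and an optional f32 bias. The real
// `memory_bytes` may include quantization scales/zero-points and differ.
fn approx_layer_bytes(in_f: usize, out_f: usize, r: usize, has_bias: bool) -> usize {
    let base = out_f * in_f / 2;                     // 4-bit packed base weight
    let lora = 4 * (r * in_f + out_f * r);           // f32 lora_a + lora_b
    let bias = if has_bias { 4 * out_f } else { 0 }; // f32 bias
    base + lora + bias
}

fn main() {
    // Hypothetical 4096x4096 layer, rank 16, with bias:
    println!("{} bytes", approx_layer_bytes(4096, 4096, 16, true));
}
```

Under these assumptions the LoRA factors add only ~6% on top of the quantized base, which is why enabling cache_dequantized (a full f32 copy of the base) is the memory-relevant decision, not the adapters.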
Auto Trait Implementations§
impl Freeze for QuantizedLinear
impl !RefUnwindSafe for QuantizedLinear
impl Send for QuantizedLinear
impl Sync for QuantizedLinear
impl Unpin for QuantizedLinear
impl !UnwindSafe for QuantizedLinear
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.