pub struct QuantizedLinear { /* private fields */ }
A linear layer with quantized base weights and trainable LoRA adapters.
§Dequantization Modes
- On-the-fly (default): dequantizes during each forward pass; saves memory.
- Cached (opt-in via cache_dequantized): dequantizes once; faster inference.

For training, always use on-the-fly mode (the default) to save memory. For inference, consider enabling caching for a ~30% speedup.
Implementations§
impl QuantizedLinear
pub fn from_weight(
    weight: &Tensor,
    bias: Option<Tensor>,
    config: &QLoraConfig,
    device: &Device,
) -> Result<QuantizedLinear, QLoraError>
Create a new quantized linear layer from existing weights.
Uses on-the-fly dequantization by default (memory-optimal).
Set config.cache_dequantized = true for inference speedup.
§Arguments
- weight - Full-precision weight tensor to quantize
- bias - Optional bias tensor (kept in full precision)
- config - QLoRA configuration
- device - Device for computation
§Errors
Returns an error if the weight tensor has an invalid shape or quantization fails.
pub fn from_weight_with_varbuilder(
    weight: &Tensor,
    bias: Option<Tensor>,
    config: &QLoraConfig,
    vb: VarBuilderArgs<'_, Box<dyn SimpleBackend + '_>>,
) -> Result<QuantizedLinear, QLoraError>
Create a quantized linear layer with trainable LoRA weights registered via VarBuilder.
This constructor ensures LoRA A/B weights are tracked for gradient computation.
Use this for training; use from_weight for inference.
§Arguments
- weight - Full-precision weight tensor to quantize
- bias - Optional bias tensor (kept in full precision)
- config - QLoRA configuration
- vb - VarBuilder backed by a VarMap for gradient tracking
§Errors
Returns an error if the weight tensor has an invalid shape or quantization fails.
pub fn new(
    in_features: usize,
    out_features: usize,
    config: &QLoraConfig,
    device: &Device,
) -> Result<QuantizedLinear, QLoraError>
Create a new quantized linear layer with zero-initialized quantized weights.
Primarily for testing; use from_weight for actual models.
§Errors
Returns an error if tensor creation or quantization fails.
pub fn forward(&self, input: &Tensor) -> Result<Tensor, QLoraError>
Forward pass through the quantized linear layer.
Computes: output = x @ W_q^T + x @ (B @ A)^T * scaling + bias
Uses on-the-fly dequantization unless cache_dequantized was enabled.
§Errors
Returns an error if tensor operations fail.
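The arithmetic behind the forward formula can be illustrated without the crate at all. The sketch below writes out output = x @ W^T + x @ (B @ A)^T * scaling with plain nested Vecs instead of Tensors; the matmul/transpose helpers and all concrete values (rank, scaling, weights) are hypothetical and exist only for illustration.

```rust
// Illustrative only: the LoRA math from `forward`, with plain nested Vecs.
// Shapes follow `lora_weights`: w is [out, in], a is [r, in], b is [out, r].

fn matmul(x: &[Vec<f32>], y: &[Vec<f32>]) -> Vec<Vec<f32>> {
    // (m x k) @ (k x n) -> (m x n)
    let (m, k, n) = (x.len(), y.len(), y[0].len());
    let mut out = vec![vec![0.0; n]; m];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                out[i][j] += x[i][p] * y[p][j];
            }
        }
    }
    out
}

fn transpose(x: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let (m, n) = (x.len(), x[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..m {
        for j in 0..n {
            out[j][i] = x[i][j];
        }
    }
    out
}

fn main() {
    // Hypothetical sizes: in_features = 2, out_features = 2, r = 1, scaling = 2.0.
    let x = vec![vec![1.0, 2.0]];                 // input, [1, in]
    let w = vec![vec![1.0, 0.0], vec![0.0, 1.0]]; // dequantized base weight, [out, in]
    let a = vec![vec![1.0, 1.0]];                 // lora_a, [r, in]
    let b = vec![vec![0.5], vec![0.5]];           // lora_b, [out, r]
    let scaling = 2.0;

    // output = x @ W^T + x @ (B @ A)^T * scaling  (bias omitted)
    let base = matmul(&x, &transpose(&w));
    let delta = matmul(&x, &transpose(&matmul(&b, &a)));
    let out: Vec<f32> = base[0]
        .iter()
        .zip(&delta[0])
        .map(|(base_val, d)| base_val + d * scaling)
        .collect();
    println!("{out:?}"); // prints [4.0, 5.0]
}
```

Note that because B @ A is computed first, the delta never materializes a second [out, in] copy of the activations; the real layer can likewise apply the two small factors in sequence.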
pub fn enable_weight_caching(&mut self) -> Result<(), QLoraError>
Enable weight caching for faster inference.
Call this after loading a trained model for inference. Not recommended for training (wastes memory).
§Errors
Returns an error if dequantization fails.
pub fn disable_weight_caching(&mut self)
Disable weight caching to save memory.
pub fn is_weight_cached(&self) -> bool
Check if weight caching is enabled.
pub fn config(&self) -> &QLoraConfig
Get the QLoRA configuration used to create this layer.
pub fn lora_weights(&self) -> (&Tensor, &Tensor)
Get the LoRA A and B weight tensors.
Returns (lora_a, lora_b) where:
- lora_a has shape [r, in_features]
- lora_b has shape [out_features, r]
pub fn num_trainable_parameters(&self) -> usize
Get the number of trainable parameters (LoRA only).
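Given the lora_weights shapes above, only the two low-rank factors receive gradients, so the count works out to r * (in_features + out_features). A minimal sketch (the helper name and the 4096/16 sizes are hypothetical):

```rust
// Illustrative: the count implied by the `lora_weights` shapes.
// lora_a is [r, in_features] and lora_b is [out_features, r], so only
// r * in_features + out_features * r parameters are trainable.
fn lora_trainable_params(in_features: usize, out_features: usize, r: usize) -> usize {
    r * in_features + out_features * r
}

fn main() {
    // e.g. a hypothetical 4096x4096 projection with rank 16:
    let full = 4096 * 4096;
    let lora = lora_trainable_params(4096, 4096, 16);
    // 131_072 trainable vs 16_777_216 frozen base parameters (~0.78%).
    println!("full: {full}, lora: {lora}");
}
```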
pub fn memory_bytes(&self) -> usize
Get total memory usage in bytes.
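As a back-of-envelope check on what this number might look like: assuming (hypothetically, since the docs do not specify the format) a 4-bit packed base weight with f32 LoRA factors and f32 bias, the dominant term is the quantized base. The helper below is an assumption-laden sketch, not the crate's actual accounting, which may also count quantization scales and zero-points.

```rust
// Rough estimate only. Assumes 4-bit packed base weights (2 values per byte,
// even element count), f32 lora_a/lora_b, and an optional f32 bias. The real
// `memory_bytes` may include quantization scales/zero-points and differ.
fn approx_layer_bytes(in_f: usize, out_f: usize, r: usize, has_bias: bool) -> usize {
    let base = out_f * in_f / 2;                     // 4-bit packed base weight
    let lora = 4 * (r * in_f + out_f * r);           // f32 lora_a + lora_b
    let bias = if has_bias { 4 * out_f } else { 0 }; // f32 bias
    base + lora + bias
}

fn main() {
    // Hypothetical 4096x4096 layer, rank 16, with bias:
    println!("{} bytes", approx_layer_bytes(4096, 4096, 16, true));
}
```

Under these assumptions the LoRA factors add only ~6% on top of the quantized base, which is why enabling cache_dequantized (a full f32 copy of the base) is the memory-relevant decision, not the adapters.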
Auto Trait Implementations§
impl Freeze for QuantizedLinear
impl !RefUnwindSafe for QuantizedLinear
impl Send for QuantizedLinear
impl Sync for QuantizedLinear
impl Unpin for QuantizedLinear
impl !UnwindSafe for QuantizedLinear
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.