Phi2 model implementation with quantization support.
Phi2 is a 2.7B parameter language model built on a scaled-up Transformer decoder architecture. This implementation supports quantization to reduce memory footprint and compute cost.
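The core idea behind quantized inference can be sketched as follows. This is a minimal illustration of a symmetric per-tensor 8-bit scheme, not the crate's actual quantization kernels or storage format; the function names are hypothetical:

```rust
// Sketch only: weights are stored as i8 plus a per-tensor f32 scale,
// and dequantized back to f32 when needed.
fn quantize_q8(weights: &[f32]) -> (Vec<i8>, f32) {
    // Scale so the largest magnitude maps to 127.
    let max = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max == 0.0 { 1.0 } else { max / 127.0 };
    let q = weights.iter().map(|w| (w / scale).round() as i8).collect();
    (q, scale)
}

fn dequantize_q8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = vec![0.5, -1.27, 0.0, 1.27];
    let (q, scale) = quantize_q8(&w);
    let back = dequantize_q8(&q, scale);
    // Round-trip error is bounded by half a quantization step.
    for (a, b) in w.iter().zip(back.iter()) {
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
}
```

Storing `i8` values instead of `f32` cuts weight memory roughly 4x, at the price of the rounding error bounded above.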
Key characteristics:
- Parallel attention and feed-forward blocks sharing a single layer norm
- Partial rotary positional embeddings (RoPE applied to only a fraction of each head's dimensions)
- Support for 8-bit quantization
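The partial rotary embedding listed above can be sketched as follows. This is an illustrative standalone function, not the crate's API; the name `apply_partial_rope` and the parameter choices are assumptions:

```rust
// Sketch of partial RoPE: rotate only the first `rot_dim` dimensions of a
// head vector; dimensions beyond `rot_dim` pass through unchanged.
fn apply_partial_rope(x: &mut [f32], rot_dim: usize, pos: usize, theta: f32) {
    // Each pair (x[2i], x[2i+1]) inside the rotary slice is rotated by an
    // angle that depends on the token position and the pair index.
    for i in 0..rot_dim / 2 {
        let freq = 1.0 / theta.powf(2.0 * i as f32 / rot_dim as f32);
        let angle = pos as f32 * freq;
        let (sin, cos) = angle.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
    // Dimensions >= rot_dim are untouched: the "partial" part.
}

fn main() {
    let mut head = vec![1.0f32; 8];
    apply_partial_rope(&mut head, 4, 3, 10_000.0);
    // The non-rotary tail of the head is unchanged.
    assert_eq!(&head[4..], &[1.0, 1.0, 1.0, 1.0]);
    // Rotation preserves the norm of each rotated pair.
    let n = (head[0] * head[0] + head[1] * head[1]).sqrt();
    assert!((n - 2f32.sqrt()).abs() < 1e-5);
}
```

Because rotation is norm-preserving, positional information is injected into queries and keys without rescaling them.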
References: