
Module quantized_phi


Phi2 model implementation with quantization support.

Phi2 is a 2.7B parameter language model using a scaled-up Transformer decoder architecture. This implementation provides quantization support for reduced memory and compute usage.
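As a rough intuition for what quantization buys here, the sketch below shows a simple symmetric 8-bit scheme in plain Rust: each f32 value is mapped to an i8 plus a shared scale, quartering the storage per weight. This is an illustrative assumption, not the crate's actual block-quantization format (the function names `quantize_q8`/`dequantize_q8` are hypothetical).

```rust
// Hedged sketch: symmetric per-tensor 8-bit quantization (assumption;
// the real implementation uses GGML-style block formats).
fn quantize_q8(xs: &[f32]) -> (Vec<i8>, f32) {
    // Scale so the largest magnitude maps to 127.
    let max_abs = xs.iter().fold(0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let qs = xs.iter().map(|x| (x / scale).round() as i8).collect();
    (qs, scale)
}

fn dequantize_q8(qs: &[i8], scale: f32) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let xs = vec![0.1f32, -0.5, 0.9, 0.0];
    let (qs, scale) = quantize_q8(&xs);
    let ys = dequantize_q8(&qs, scale);
    // Round-trip error is bounded by half a quantization step.
    for (x, y) in xs.iter().zip(ys.iter()) {
        assert!((x - y).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale}, qs = {qs:?}");
}
```

The memory saving is 4x on the quantized values themselves (i8 vs. f32), at the cost of a bounded rounding error per element.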

Key characteristics:

  • Partial attention with learned mixing to reduce quadratic costs
  • Layer reuse for improved inference efficiency
  • Linear transformations with scalar mixing
  • Rotary positional embeddings (RoPE)
  • Support for 8-bit quantization

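To make the RoPE characteristic concrete, a minimal stdlib-only sketch follows: each (even, odd) feature pair of a query/key vector is rotated by an angle that depends on the token position and a per-pair frequency. The `rope` function and the base `theta = 10_000.0` are illustrative assumptions; the model's actual configuration and head layout may differ.

```rust
// Hedged sketch of rotary positional embeddings (RoPE): rotate each
// (even, odd) feature pair by a position- and frequency-dependent angle.
// theta = 10_000.0 is the common base; the model config may differ.
fn rope(x: &[f32], pos: usize, theta: f32) -> Vec<f32> {
    let dim = x.len();
    let mut out = vec![0f32; dim];
    for i in (0..dim).step_by(2) {
        // Per-pair frequency, decreasing with the feature index.
        let freq = theta.powf(-(i as f32) / dim as f32);
        let angle = pos as f32 * freq;
        let (sin, cos) = angle.sin_cos();
        out[i] = x[i] * cos - x[i + 1] * sin;
        out[i + 1] = x[i] * sin + x[i + 1] * cos;
    }
    out
}

fn main() {
    let x = vec![1.0f32, 0.0, 0.0, 1.0];
    // Position 0 leaves the vector unchanged (all angles are zero).
    assert_eq!(rope(&x, 0, 10_000.0), x);
    // Rotations preserve the vector's norm.
    let y = rope(&x, 7, 10_000.0);
    let norm: f32 = y.iter().map(|v| v * v).sum::<f32>().sqrt();
    assert!((norm - 2f32.sqrt()).abs() < 1e-5);
    println!("{y:?}");
}
```

Because only the rotation angle encodes position, the dot product between two rotated vectors depends on their relative offset, which is the property attention exploits.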
Structs§

ModelWeights

Constants§

MAX_SEQ_LEN