//! CPU inference implementation for OwnedQuantizedModel
//!
//! This module contains the forward pass and generation logic,
//! extracted from the monolithic implementation for better testability.
//!
//! ## Submodules
//!
//! - `forward/`: Forward pass methods (forward, forward_cached, forward_batch)
//! - `attention.rs`: Attention computation (apply_rope, causal_attention)
//! - `matmul.rs`: Quantized matrix operations (fused_matmul, qkv_matmul)
//! - `generation.rs`: Token generation (generate, sample_topk)
//! - `cached.rs`: Cached model wrappers for GPU inference
pub
// Re-export cached model types for external use
pub use ;
// PMAT-395: Re-export encoder-decoder types
pub use EncoderOutput;
// The impl extensions for OwnedQuantizedModel need no re-export here:
// the actual impl blocks live in each submodule.