pub fn create_isq_thread_pool(ty: Option<IsqType>) -> (ThreadPool, usize)
Create a rayon thread pool for parallel immediate ISQ.
Returns (pool, num_threads) so callers can log the thread count.
Thread count is based on the quantization type:
- GGML types (Q2K-Q8K) and F8E4M3: rayon::current_num_threads() (CPU quantization)
- HQQ/AFQ: 1 thread (GPU quantization, serialized by QuantizeOntoGuard)
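
The selection rule above can be sketched with the standard library alone. This is an illustrative sketch, not the crate's implementation: QuantFamily and isq_thread_count are hypothetical names, std::thread::available_parallelism() stands in for rayon::current_num_threads(), and the behavior for a None quantization type is not described here, so the sketch takes a concrete family instead of an Option.

```rust
use std::thread;

/// Hypothetical stand-in for the quantization families listed above.
/// The real IsqType enum has its own variants; these names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum QuantFamily {
    /// GGML types (Q2K-Q8K), quantized on the CPU.
    Ggml,
    /// F8E4M3, quantized on the CPU.
    F8E4M3,
    /// HQQ, quantized on the GPU.
    Hqq,
    /// AFQ, quantized on the GPU.
    Afq,
}

/// Pick a thread count for the given family.
///
/// CPU-bound families use all available parallelism. Here that is
/// approximated with `available_parallelism()` as a substitute for
/// `rayon::current_num_threads()`. GPU-bound families get a single
/// thread, since their work is serialized anyway.
pub fn isq_thread_count(family: QuantFamily) -> usize {
    match family {
        QuantFamily::Ggml | QuantFamily::F8E4M3 => thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1), // fall back to one thread if the query fails
        QuantFamily::Hqq | QuantFamily::Afq => 1,
    }
}
```

A caller would then build a pool with that many threads and return the count alongside it, which is what lets the (pool, num_threads) pair be logged without querying the pool again.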