GPU Inference Showcase with PMAT verification (PAR-040)
PMAT-verified benchmark harness for the Qwen2.5-Coder showcase. It targets >2x performance vs competitors via:
- trueno GPU PTX generation (persistent kernels, megakernels)
- trueno SIMD (AVX2/AVX-512/NEON)
- trueno-zram KV cache compression
- renacer GPU kernel profiling
§Performance Targets (Point 41)
| Engine | Target | Mechanism |
|---|---|---|
| APR GGUF | >2x llama.cpp | Phase 2 GPU optimizations |
| APR .apr | >2x Ollama | Native format + ZRAM |
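As a rough illustration of what a ">2x" target means in practice, the sketch below compares a measured throughput against a baseline throughput and checks the ratio against a threshold. The function names `speedup` and `meets_target` and the throughput figures are hypothetical and are not part of this module's API; the real harness may measure and compare differently.

```rust
/// Hypothetical helper (not part of this crate): ratio of candidate
/// throughput to baseline throughput, both in tokens per second.
/// Returns `None` when either input is not finite or the baseline is not positive.
fn speedup(candidate_tps: f64, baseline_tps: f64) -> Option<f64> {
    if !candidate_tps.is_finite() || !baseline_tps.is_finite() || baseline_tps <= 0.0 {
        return None;
    }
    Some(candidate_tps / baseline_tps)
}

/// Hypothetical check: the table states the target as a strict ">2x",
/// so a ratio exactly equal to `target` does not pass.
fn meets_target(candidate_tps: f64, baseline_tps: f64, target: f64) -> bool {
    matches!(speedup(candidate_tps, baseline_tps), Some(s) if s > target)
}

fn main() {
    // Made-up throughput numbers, for illustration only.
    let candidate = 250.0;
    let baseline = 100.0;
    println!("speedup: {:?}", speedup(candidate, baseline));
    println!("meets >2x target: {}", meets_target(candidate, baseline, 2.0));
}
```

Using a strict comparison keeps the check aligned with the ">" in the table; a harness that tolerates measurement noise might instead compare against a confidence bound.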
§Usage
# Run full showcase benchmark
cargo run --example showcase_benchmark --features cuda
# Run with profiling
renacer trace -- cargo run --example showcase_benchmark --features cuda
Modules§
- profiler - Stub module when renacer is not available
- zram - Stub module when trueno-zram-core is not available
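Stub modules like these are commonly built with conditional compilation: the same module path resolves to a real implementation when an optional dependency is enabled and to a no-op version otherwise. The sketch below shows that general pattern only. The feature name `renacer` and the functions `is_available` and `record` are assumptions made for illustration; the actual feature flags and stub contents of this crate are not shown on this page.

```rust
// Variant compiled when the (assumed) `renacer` feature is enabled.
// A real build would forward these calls to the optional dependency.
#[cfg(feature = "renacer")]
mod profiler {
    pub fn is_available() -> bool {
        true
    }
    pub fn record(_label: &str) {
        // Would hand the label to the real profiler here.
    }
}

// Stub variant compiled when the feature is off: same signatures, no work.
#[cfg(not(feature = "renacer"))]
mod profiler {
    pub fn is_available() -> bool {
        false
    }
    pub fn record(_label: &str) {}
}

fn main() {
    // Callers use one path regardless of which variant was compiled in.
    profiler::record("decode_step");
    println!("profiler available: {}", profiler::is_available());
}
```

Because both variants expose identical signatures, calling code needs no `cfg` attributes of its own, which is the main appeal of the stub approach.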
Structs§
- BenchmarkResult - Single benchmark run result
- BenchmarkStats - Aggregated benchmark statistics
- ComponentTiming - Component timing for profiling
- PmatVerification - PMAT verification result
- ProfilingCollector - Profiling collector for GPU kernel analysis
- ShowcaseConfig - Showcase benchmark configuration
- ShowcaseRunner - Main showcase benchmark runner
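To make the relationship between per-run results and aggregated statistics concrete, here is a minimal sketch of how a list of per-run measurements could be reduced to summary values. The `Summary` type and `summarize` function are hypothetical stand-ins; they are not the fields or methods of `BenchmarkResult` or `BenchmarkStats`, which this page does not document.

```rust
/// Hypothetical summary type for illustration; not the crate's `BenchmarkStats`.
#[derive(Debug, PartialEq)]
struct Summary {
    runs: usize,
    mean: f64,
    min: f64,
    max: f64,
    std_dev: f64,
}

/// Aggregates per-run measurements (for example tokens per second).
/// Returns `None` for an empty slice, since no statistics are defined.
fn summarize(samples: &[f64]) -> Option<Summary> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Population variance: divides by n rather than n - 1.
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some(Summary {
        runs: samples.len(),
        mean,
        min,
        max,
        std_dev: variance.sqrt(),
    })
}

fn main() {
    // Made-up measurements, for illustration only.
    let samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
    println!("{:?}", summarize(&samples));
}
```

The sketch uses the population standard deviation; a harness that treats runs as a sample of a larger distribution would divide by n - 1 instead.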