Module showcase

GPU Inference Showcase with PMAT verification (PAR-040)

PMAT-verified benchmark harness for the Qwen2.5-Coder showcase. It targets more than 2x the performance of llama.cpp and Ollama (see Performance Targets) using:

  • trueno GPU PTX generation (persistent kernels, megakernels)
  • trueno SIMD (AVX2/AVX-512/NEON)
  • trueno-zram KV cache compression
  • renacer GPU kernel profiling

§Performance Targets (Point 41)

| Engine   | Target        | Mechanism                 |
|----------|---------------|---------------------------|
| APR GGUF | >2x llama.cpp | Phase 2 GPU optimizations |
| APR .apr | >2x Ollama    | Native format + ZRAM      |
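
The ">2x" targets above amount to a ratio check between two measured throughputs. The following is a minimal sketch of such a check, not code from this module: the function names `speedup` and `meets_target` are hypothetical, and the use of tokens per second as the compared metric is an assumption, since this page does not state which metric the targets refer to.

```rust
/// Hypothetical helper (not part of the showcase module): ratio of the APR
/// throughput to a baseline engine's throughput.
/// Assumption: both values are tokens per second from comparable runs.
/// Returns `None` when the inputs cannot produce a meaningful ratio.
pub fn speedup(apr_tok_per_s: f64, baseline_tok_per_s: f64) -> Option<f64> {
    if apr_tok_per_s.is_finite() && baseline_tok_per_s.is_finite() && baseline_tok_per_s > 0.0 {
        Some(apr_tok_per_s / baseline_tok_per_s)
    } else {
        None
    }
}

/// True only when the measured speedup strictly exceeds `target`
/// (for the table above, `target` would be 2.0).
pub fn meets_target(apr_tok_per_s: f64, baseline_tok_per_s: f64, target: f64) -> bool {
    matches!(speedup(apr_tok_per_s, baseline_tok_per_s), Some(s) if s > target)
}
```

With these definitions, `meets_target(300.0, 100.0, 2.0)` holds, while a run that is exactly 2x the baseline does not, because the target is written as a strict ">2x".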

§Usage

# Run full showcase benchmark
cargo run --example showcase_benchmark --features cuda

# Run with profiling
renacer trace -- cargo run --example showcase_benchmark --features cuda

§Modules

profiler
Stub module used when renacer is not available
zram
Stub module used when trueno-zram-core is not available
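
Stub modules like these are commonly built with conditional compilation, so that callers compile whether or not the optional dependency is present. The sketch below shows that general pattern only. The feature name `renacer`, the type `KernelProfiler`, and its methods are all hypothetical; the actual contents of `profiler` and `zram` are not shown on this page.

```rust
// Assumption: an optional Cargo feature named "renacer" gates the real
// profiler. When it is disabled, this no-op stub is compiled instead.
// A real crate would pair this with a `#[cfg(feature = "renacer")]` module
// exposing the same names, backed by the actual dependency.
#[cfg(not(feature = "renacer"))]
pub mod profiler {
    /// No-op stand-in that keeps call sites compiling without the dependency.
    #[derive(Debug, Default)]
    pub struct KernelProfiler;

    impl KernelProfiler {
        pub fn new() -> Self {
            KernelProfiler
        }

        /// Runs the closure and returns its result without recording anything.
        pub fn trace<T>(&self, _label: &str, f: impl FnOnce() -> T) -> T {
            f()
        }

        /// The stub never collects data.
        pub fn is_enabled(&self) -> bool {
            false
        }
    }
}
```

Because the stub mirrors the shape of the real API, code such as `profiler::KernelProfiler::new().trace("matmul", || work())` needs no `cfg` attributes of its own.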

§Structs

BenchmarkResult
Single benchmark run result
BenchmarkStats
Aggregated benchmark statistics
ComponentTiming
Component timing for profiling
PmatVerification
PMAT verification result
ProfilingCollector
Profiling collector for GPU kernel analysis
ShowcaseConfig
Showcase benchmark configuration
ShowcaseRunner
Main showcase benchmark runner
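
To illustrate how single-run results might roll up into aggregated statistics, here is a minimal sketch. It does not use the structs listed above, whose fields are not shown on this page. `RunSample`, `RunSummary`, and `aggregate` are hypothetical stand-ins for the roles of BenchmarkResult and BenchmarkStats, and the choice of mean, median, and population standard deviation over tokens per second is an assumption.

```rust
/// Hypothetical stand-in for a single benchmark run.
/// Assumption: `elapsed_secs` is greater than zero.
#[derive(Debug, Clone, Copy)]
pub struct RunSample {
    pub tokens: u64,
    pub elapsed_secs: f64,
}

impl RunSample {
    pub fn tokens_per_sec(&self) -> f64 {
        self.tokens as f64 / self.elapsed_secs
    }
}

/// Hypothetical stand-in for aggregated statistics (all in tokens per second).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RunSummary {
    pub runs: usize,
    pub mean: f64,
    pub median: f64,
    pub std_dev: f64,
}

/// Aggregates per-run throughput; returns `None` for an empty slice.
pub fn aggregate(samples: &[RunSample]) -> Option<RunSummary> {
    if samples.is_empty() {
        return None;
    }
    let mut rates: Vec<f64> = samples.iter().map(RunSample::tokens_per_sec).collect();
    rates.sort_by(|a, b| a.total_cmp(b));

    let n = rates.len();
    let mean = rates.iter().sum::<f64>() / n as f64;
    let median = if n % 2 == 1 {
        rates[n / 2]
    } else {
        (rates[n / 2 - 1] + rates[n / 2]) / 2.0
    };
    // Population variance: divide by n, not n - 1.
    let variance = rates.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n as f64;

    Some(RunSummary { runs: n, mean, median, std_dev: variance.sqrt() })
}
```

Reporting the median alongside the mean is a common choice for benchmark harnesses, since a single slow warm-up run can skew the mean noticeably.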