Trueno: Multi-Target High-Performance Compute Library
Trueno (Spanish: “thunder”) provides unified, high-performance compute primitives across three execution targets:
- CPU SIMD: x86 (SSE2/AVX/AVX2/AVX-512), ARM (NEON), WASM (SIMD128)
- GPU: Vulkan/Metal/DX12/WebGPU via wgpu
- WebAssembly: portable SIMD128 for browser/edge deployment
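One practical difference between these targets is *when* the SIMD family is chosen. wasm32 has no runtime CPU-feature probing, so SIMD128 is normally enabled at build time (`-C target-feature=+simd128`). Native x86 builds usually probe wider instruction sets at runtime. The helper below is a hypothetical sketch, not part of Trueno, that reports the compile-time family using only `cfg!`:

```rust
/// Hypothetical helper (not part of Trueno): reports which SIMD family
/// this binary was compiled for, using compile-time `cfg!` checks only.
fn compiled_simd_family() -> &'static str {
    if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        // WebAssembly cannot probe CPU features at runtime, so SIMD128
        // has to be enabled when the module is built.
        "wasm-simd128"
    } else if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        // NEON enabled for this aarch64 build.
        "neon"
    } else if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
        // x86 family: wider sets (AVX/AVX2/AVX-512) are usually probed
        // at runtime rather than fixed at compile time.
        "x86-simd"
    } else {
        "scalar"
    }
}
```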
Design Principles
- Write once, optimize everywhere: a single algorithm, multiple backends
- Runtime dispatch: auto-select the best implementation based on detected CPU features
- Zero unsafe in public API: safety is enforced via the type system, with unsafe code isolated in backends
- Benchmarked performance: every optimization must demonstrate a ≥10% speedup
- Extreme TDD: >90% test coverage, mutation testing, and property-based tests
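The "runtime dispatch" and "unsafe isolated in backends" principles can be combined in a single pattern. A safe public function validates its inputs and probes for AVX2 with the standard library's `is_x86_feature_detected!`. Only after that check does it call an `unsafe` `#[target_feature(enable = "avx2")]` kernel, and it falls back to scalar code otherwise. This is a minimal sketch using only `std`. The names `add_f32`, `add_f32_avx2`, and `add_f32_scalar` are hypothetical and do not describe Trueno's internals.

```rust
/// Safe public entry point: element-wise `out[i] = a[i] + b[i]`.
/// Hypothetical sketch; callers never touch `unsafe`.
pub fn add_f32(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len(), "length mismatch");

    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above,
            // and all three lengths were checked by the assert.
            unsafe { add_f32_avx2(a, b, out) };
            return;
        }
    }
    add_f32_scalar(a, b, out);
}

/// Portable fallback, also used for the SIMD tail.
fn add_f32_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// AVX2 kernel: 8 lanes per iteration. Callers must guarantee AVX2
/// support and equal slice lengths.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_f32_avx2(a: &[f32], b: &[f32], out: &mut [f32]) {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};
    let chunks = a.len() / 8;
    for i in 0..chunks {
        let off = i * 8;
        // SAFETY: off + 8 <= len for all three slices.
        unsafe {
            let va = _mm256_loadu_ps(a.as_ptr().add(off));
            let vb = _mm256_loadu_ps(b.as_ptr().add(off));
            _mm256_storeu_ps(out.as_mut_ptr().add(off), _mm256_add_ps(va, vb));
        }
    }
    // Handle the remaining len % 8 elements with scalar code.
    let tail = chunks * 8;
    add_f32_scalar(&a[tail..], &b[tail..], &mut out[tail..]);
}
```

Keeping the `unsafe fn` private means the only way to reach it is through the runtime check. That is what lets the public API stay free of `unsafe`.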
Quick Start
```rust
use trueno::Vector;

fn main() {
    let a = Vector::from_slice(&[1.0, 2.0, 3.0, 4.0]);
    let b = Vector::from_slice(&[5.0, 6.0, 7.0, 8.0]);

    // Auto-selects the best available backend (e.g. AVX2, GPU, or WASM)
    let result = a.add(&b).unwrap();
    assert_eq!(result.as_slice(), &[6.0, 8.0, 10.0, 12.0]);
}
```
Re-exports
```rust
pub use eigen::SymmetricEigen;
pub use error::Result;
pub use error::TruenoError;
pub use hash::hash_bytes;
pub use hash::hash_key;
pub use hash::hash_keys_batch;
pub use hash::hash_keys_batch_with_backend;
pub use matrix::Matrix;
pub use monitor::cuda_monitor_available;
pub use monitor::GpuBackend;
pub use monitor::GpuClockMetrics;
pub use monitor::GpuDeviceInfo;
pub use monitor::GpuMemoryMetrics;
pub use monitor::GpuMetrics;
pub use monitor::GpuMonitor;
pub use monitor::GpuPcieMetrics;
pub use monitor::GpuPowerMetrics;
pub use monitor::GpuThermalMetrics;
pub use monitor::GpuUtilization;
pub use monitor::GpuVendor;
pub use monitor::MonitorConfig;
pub use monitor::MonitorError;
pub use vector::Vector;
pub use brick::fnv1a_f32_checksum;
pub use brick::AddOp;
pub use brick::AssertionResult;
pub use brick::AttentionOp;
pub use brick::BlockQ5K;
pub use brick::BlockQ6K;
pub use brick::BrickBottleneck;
pub use brick::BrickCategory;
pub use brick::BrickError;
pub use brick::BrickId;
pub use brick::BrickIdTimer;
pub use brick::BrickLayer;
pub use brick::BrickProfiler;
pub use brick::BrickSample;
pub use brick::BrickStats;
pub use brick::BrickTimer;
pub use brick::BrickVerification;
pub use brick::ByteBudget;
pub use brick::CategoryStats;
pub use brick::ComputeAssertion;
pub use brick::ComputeBackend;
pub use brick::ComputeBrick;
pub use brick::ComputeOp;
pub use brick::DivergenceInfo;
pub use brick::DotOp;
pub use brick::DotQ5KOp;
pub use brick::DotQ6KOp;
pub use brick::EdgeType;
pub use brick::ExecutionEdge;
pub use brick::ExecutionGraph;
pub use brick::ExecutionNode;
pub use brick::ExecutionNodeId;
pub use brick::FusedGateUpOp;
pub use brick::FusedGateUpWeights;
pub use brick::FusedQKVOp;
pub use brick::FusedQKVWeights;
pub use brick::KernelChecksum;
pub use brick::MatmulOp;
pub use brick::PtxRegistry;
pub use brick::SoftmaxOp;
pub use brick::SyncMode;
pub use brick::TileLevel;
pub use brick::TileStats;
pub use brick::TileTimer;
pub use brick::TokenBudget;
pub use brick::TokenResult;
pub use hardware::default_hardware_path;
pub use hardware::Bottleneck;
pub use hardware::CpuCapability;
pub use hardware::GpuBackend as HardwareGpuBackend;
pub use hardware::GpuCapability;
pub use hardware::HardwareCapability;
pub use hardware::RooflineParams;
pub use hardware::SimdWidth;
pub use tuner::BottleneckClass;
pub use tuner::BottleneckPrediction;
pub use tuner::BrickTuner;
pub use tuner::ConceptDriftStatus;
pub use tuner::ExperimentSuggestion;
pub use tuner::FeatureExtractor;
pub use tuner::KernelClassifier;
pub use tuner::KernelRecommendation;
pub use tuner::KernelType;
pub use tuner::QuantType;
pub use tuner::RunConfig;
pub use tuner::ThroughputPrediction;
pub use tuner::ThroughputRegressor;
pub use tuner::TrainingSample;
pub use tuner::TrainingStats;
pub use tuner::TunerDataCollector;
pub use tuner::TunerError;
pub use tuner::TunerFeatures;
pub use tuner::TunerRecommendation;
pub use tuner::UserFeedback;
pub use tiling::optimal_prefetch_distance;
pub use tiling::pack_a_index;
pub use tiling::pack_b_index;
pub use tiling::swizzle_index;
pub use tiling::PackingLayout;
pub use tiling::PrefetchLocality;
pub use tiling::TcbGeometry;
pub use tiling::TcbIndexCalculator;
pub use tiling::TcbLevel;
pub use tiling::TiledQ4KMatvec;
pub use tiling::TilingBackend;
pub use tiling::TilingConfig;
pub use tiling::TilingError;
pub use tiling::TilingStats;
pub use tiling::Q4K_SUPERBLOCK_BYTES;
pub use tiling::Q4K_SUPERBLOCK_SIZE;
```
Modules
- backends: Backend implementations for different SIMD instruction sets
- blis: BLIS-style matrix multiplication
- brick: ComputeBrick, token-centric compute units
- chaos: Chaos engineering configuration
- eigen: Eigendecomposition for symmetric matrices
- error: Error types for Trueno operations
- hardware: Hardware capability detection (PMAT-447)
- hash: SIMD-optimized hash functions for key-value store operations
- matrix: Matrix operations for Trueno
- monitor: GPU monitoring, tracing, and visualization (TRUENO-SPEC-010)
- simulation: Simulation testing framework (TRUENO-SPEC-012)
- tiling: Tiling Compute Blocks (TCB), work partitioning for high-performance kernels
- tuner: ML-based ComputeBrick tuner
- vector: Vector type with multi-backend support
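The `blis` and `tiling` modules are only named here. The sketch below is a rough illustration of the cache-blocking idea behind BLIS-style GEMM. It computes C += A·B in row-major order and walks MC×KC blocks of A against KC×NC blocks of B, using the BLIS outer loop order (jc over N, pc over K, ic over M). The function name `matmul_blocked` and the block sizes are assumptions chosen for illustration, not Trueno's tuned values. Production BLIS-style code also packs panels into contiguous buffers and uses a register-blocked micro-kernel, and both steps are omitted here.

```rust
// Illustrative block sizes (hypothetical; real values are tuned per cache level).
const MC: usize = 64; // rows of A per block
const KC: usize = 128; // shared dimension per block
const NC: usize = 256; // columns of B per block

/// C (m x n) += A (m x k) * B (k x n), all row-major. Hypothetical sketch.
pub fn matmul_blocked(a: &[f32], b: &[f32], c: &mut [f32], m: usize, k: usize, n: usize) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for jc in (0..n).step_by(NC) {
        let nb = NC.min(n - jc);
        for pc in (0..k).step_by(KC) {
            let kb = KC.min(k - pc);
            for ic in (0..m).step_by(MC) {
                let mb = MC.min(m - ic);
                // Inner kernel over one MC x KC block of A and KC x NC block of B.
                for i in ic..ic + mb {
                    for p in pc..pc + kb {
                        let a_ip = a[i * k + p];
                        let b_row = &b[p * n + jc..p * n + jc + nb];
                        let c_row = &mut c[i * n + jc..i * n + jc + nb];
                        // Contiguous row update: friendly to auto-vectorization.
                        for (cv, bv) in c_row.iter_mut().zip(b_row) {
                            *cv += a_ip * bv;
                        }
                    }
                }
            }
        }
    }
}
```

The blocking keeps each KC×NC slice of B hot in cache while it is reused across MC rows of A. The innermost loop runs over contiguous memory, which is the property SIMD backends exploit.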
Macros
- dispatch_binary_op: Macro to dispatch binary operations to the appropriate backend
- dispatch_reduction: Macro to dispatch reduction operations (returning f32)
- dispatch_unary_op: Macro to dispatch unary operations (a -> result)
- time_brick: Macro for convenient brick timing with automatic sync
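The dispatch macros are listed without their definitions. One way a binary-op dispatch macro can work is to route a single call site to per-backend implementations with `macro_rules!`. In the sketch below, the macro name `dispatch_binary_op_sketch`, the `Backend` enum, and the `scalar`/`simd` modules are all hypothetical. The `simd` module is a plain-Rust stand-in that processes 4-element chunks rather than calling real intrinsics.

```rust
/// Hypothetical backend selector for this sketch only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Scalar,
    Simd,
}

mod scalar {
    pub fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
        for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
            *o = x + y;
        }
    }
    pub fn mul(a: &[f32], b: &[f32], out: &mut [f32]) {
        for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
            *o = x * y;
        }
    }
}

mod simd {
    // Stand-in: 4-lane chunks in plain Rust, mimicking a SIMD kernel's shape.
    pub fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
        for ((o, x), y) in out.chunks_mut(4).zip(a.chunks(4)).zip(b.chunks(4)) {
            for i in 0..o.len() {
                o[i] = x[i] + y[i];
            }
        }
    }
    pub fn mul(a: &[f32], b: &[f32], out: &mut [f32]) {
        for ((o, x), y) in out.chunks_mut(4).zip(a.chunks(4)).zip(b.chunks(4)) {
            for i in 0..o.len() {
                o[i] = x[i] * y[i];
            }
        }
    }
}

/// Expands to a `match` that calls `<backend module>::$op(a, b, out)`.
macro_rules! dispatch_binary_op_sketch {
    ($backend:expr, $op:ident, $a:expr, $b:expr, $out:expr) => {
        match $backend {
            Backend::Scalar => scalar::$op($a, $b, $out),
            Backend::Simd => simd::$op($a, $b, $out),
        }
    };
}
```

The macro keeps each operation's public method a one-liner while every backend module provides a function with the same name and signature.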
Enums
- Backend: Backend execution target
- OpComplexity: Operation complexity for GPU dispatch eligibility
- OperationType: Operation type for SIMD backend selection
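`OpComplexity` is described as gating GPU dispatch eligibility. The sketch below shows a gate of that kind. The variants `Low`/`Medium`/`High`, the `Target` enum, the `choose_target` function, and the size thresholds are all illustrative assumptions rather than Trueno's actual definitions or values. Under this policy, cheap memory-bound ops stay on the CPU, while compute-heavy ops move to the GPU once inputs are large enough to amortize transfer and launch overhead.

```rust
/// Hypothetical complexity classes for this sketch only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OpComplexity {
    Low,    // e.g. element-wise add: memory-bound
    Medium, // e.g. reductions
    High,   // e.g. matrix multiplication
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Target {
    CpuSimd,
    Gpu,
}

/// Pick a target from op complexity and input length.
/// Thresholds are illustrative, not measured values.
fn choose_target(complexity: OpComplexity, len: usize, gpu_available: bool) -> Target {
    let min_len = match complexity {
        OpComplexity::Low => None, // never worth the PCIe round trip
        OpComplexity::Medium => Some(1_000_000),
        OpComplexity::High => Some(10_000),
    };
    match min_len {
        Some(t) if gpu_available && len >= t => Target::Gpu,
        _ => Target::CpuSimd,
    }
}
```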
Functions
- select_backend_for_operation: Select the optimal backend for a specific operation type
- select_best_available_backend: Select the best available backend for the current platform
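On the CPU side, "best available backend" selection can be as simple as probing features from widest to narrowest at runtime and returning the first match. `CpuTier` and `best_available_tier` are hypothetical names for this sketch. The detection macros `std::arch::is_x86_feature_detected!` and `std::arch::is_aarch64_feature_detected!` are from the standard library.

```rust
/// Hypothetical CPU tiers for this sketch only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CpuTier {
    Scalar,
    Sse2,
    Avx,
    Avx2,
    Avx512,
    Neon,
}

/// Probe from widest to narrowest; the first supported tier wins.
fn best_available_tier() -> CpuTier {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return CpuTier::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return CpuTier::Avx2;
        }
        if std::arch::is_x86_feature_detected!("avx") {
            return CpuTier::Avx;
        }
        if std::arch::is_x86_feature_detected!("sse2") {
            return CpuTier::Sse2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return CpuTier::Neon;
        }
    }
    // Any other platform, or no detected SIMD support.
    CpuTier::Scalar
}
```

A real selector would typically cache this result (for example in a `std::sync::OnceLock`), because feature probing does not change during a process's lifetime.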