Crate trueno

Trueno: Multi-Target High-Performance Compute Library

Trueno (Spanish: “thunder”) provides unified, high-performance compute primitives across three execution targets:

  1. CPU SIMD - x86 (SSE2/AVX/AVX2/AVX-512), ARM (NEON), WASM (SIMD128)
  2. GPU - Vulkan/Metal/DX12/WebGPU via wgpu
  3. WebAssembly - Portable SIMD128 for browser/edge deployment

§Design Principles

  • Write once, optimize everywhere: Single algorithm, multiple backends
  • Runtime dispatch: Auto-select best implementation based on CPU features
  • Zero unsafe in public API: Safety via type system, unsafe isolated in backends
  • Benchmarked performance: Every optimization must demonstrate a speedup of at least 10% in benchmarks
  • Extreme TDD: >90% test coverage, mutation testing, property-based tests
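The "runtime dispatch" and "zero unsafe in public API" principles can be illustrated with a minimal, standard-library-only sketch. This is not Trueno's implementation: the function names `add`, `add_avx`, and `add_scalar` and the `String` error type are assumptions made for illustration. The safe entry point checks the CPU at runtime and only then enters the `unsafe` SIMD path.

```rust
/// Scalar fallback (hypothetical helper): works on every target.
fn add_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// AVX path (hypothetical helper): 8 lanes per iteration, scalar tail.
/// Callers must verify AVX support and equal slice lengths first.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_avx(a: &[f32], b: &[f32], out: &mut [f32]) {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};
    let full = a.len() / 8 * 8;
    for off in (0..full).step_by(8) {
        // SAFETY: off + 8 <= len for all three slices, so each access is in bounds.
        unsafe {
            let va = _mm256_loadu_ps(a.as_ptr().add(off));
            let vb = _mm256_loadu_ps(b.as_ptr().add(off));
            _mm256_storeu_ps(out.as_mut_ptr().add(off), _mm256_add_ps(va, vb));
        }
    }
    // Remaining 0..=7 elements go through the scalar loop.
    add_scalar(&a[full..], &b[full..], &mut out[full..]);
}

/// Safe public entry point: validates input, then picks a path at runtime.
pub fn add(a: &[f32], b: &[f32]) -> Result<Vec<f32>, String> {
    if a.len() != b.len() {
        return Err(format!("length mismatch: {} vs {}", a.len(), b.len()));
    }
    let mut out = vec![0.0f32; a.len()];
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx") {
            // SAFETY: AVX was detected above and all slices have equal length.
            unsafe { add_avx(a, b, &mut out) };
            return Ok(out);
        }
    }
    add_scalar(a, b, &mut out);
    Ok(out)
}
```

Callers only ever see the safe `add` function; on targets other than x86_64, or on CPUs without AVX, the same call falls through to the scalar loop.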

§Quick Start

use trueno::Vector;

let a = Vector::from_slice(&[1.0, 2.0, 3.0, 4.0]);
let b = Vector::from_slice(&[5.0, 6.0, 7.0, 8.0]);

// Auto-selects the best available backend (for example AVX2, GPU, or WASM SIMD128)
let result = a.add(&b).unwrap();
assert_eq!(result.as_slice(), &[6.0, 8.0, 10.0, 12.0]);

Re-exports§

pub use eigen::SymmetricEigen;
pub use error::Result;
pub use error::TruenoError;
pub use hash::hash_bytes;
pub use hash::hash_key;
pub use hash::hash_keys_batch;
pub use hash::hash_keys_batch_with_backend;
pub use matrix::Matrix;
pub use monitor::cuda_monitor_available;
pub use monitor::GpuBackend;
pub use monitor::GpuClockMetrics;
pub use monitor::GpuDeviceInfo;
pub use monitor::GpuMemoryMetrics;
pub use monitor::GpuMetrics;
pub use monitor::GpuMonitor;
pub use monitor::GpuPcieMetrics;
pub use monitor::GpuPowerMetrics;
pub use monitor::GpuThermalMetrics;
pub use monitor::GpuUtilization;
pub use monitor::GpuVendor;
pub use monitor::MonitorConfig;
pub use monitor::MonitorError;
pub use vector::Vector;
pub use brick::fnv1a_f32_checksum;
pub use brick::AddOp;
pub use brick::AssertionResult;
pub use brick::AttentionOp;
pub use brick::BlockQ5K;
pub use brick::BlockQ6K;
pub use brick::BrickBottleneck;
pub use brick::BrickCategory;
pub use brick::BrickError;
pub use brick::BrickId;
pub use brick::BrickIdTimer;
pub use brick::BrickLayer;
pub use brick::BrickProfiler;
pub use brick::BrickSample;
pub use brick::BrickStats;
pub use brick::BrickTimer;
pub use brick::BrickVerification;
pub use brick::ByteBudget;
pub use brick::CategoryStats;
pub use brick::ComputeAssertion;
pub use brick::ComputeBackend;
pub use brick::ComputeBrick;
pub use brick::ComputeOp;
pub use brick::DivergenceInfo;
pub use brick::DotOp;
pub use brick::DotQ5KOp;
pub use brick::DotQ6KOp;
pub use brick::EdgeType;
pub use brick::ExecutionEdge;
pub use brick::ExecutionGraph;
pub use brick::ExecutionNode;
pub use brick::ExecutionNodeId;
pub use brick::FusedGateUpOp;
pub use brick::FusedGateUpWeights;
pub use brick::FusedQKVOp;
pub use brick::FusedQKVWeights;
pub use brick::KernelChecksum;
pub use brick::MatmulOp;
pub use brick::PtxRegistry;
pub use brick::SoftmaxOp;
pub use brick::SyncMode;
pub use brick::TileLevel;
pub use brick::TileStats;
pub use brick::TileTimer;
pub use brick::TokenBudget;
pub use brick::TokenResult;
pub use hardware::default_hardware_path;
pub use hardware::Bottleneck;
pub use hardware::CpuCapability;
pub use hardware::GpuBackend as HardwareGpuBackend;
pub use hardware::GpuCapability;
pub use hardware::HardwareCapability;
pub use hardware::RooflineParams;
pub use hardware::SimdWidth;
pub use tuner::BottleneckClass;
pub use tuner::BottleneckPrediction;
pub use tuner::BrickTuner;
pub use tuner::ConceptDriftStatus;
pub use tuner::ExperimentSuggestion;
pub use tuner::FeatureExtractor;
pub use tuner::KernelClassifier;
pub use tuner::KernelRecommendation;
pub use tuner::KernelType;
pub use tuner::QuantType;
pub use tuner::RunConfig;
pub use tuner::ThroughputPrediction;
pub use tuner::ThroughputRegressor;
pub use tuner::TrainingSample;
pub use tuner::TrainingStats;
pub use tuner::TunerDataCollector;
pub use tuner::TunerError;
pub use tuner::TunerFeatures;
pub use tuner::TunerRecommendation;
pub use tuner::UserFeedback;
pub use tiling::optimal_prefetch_distance;
pub use tiling::pack_a_index;
pub use tiling::pack_b_index;
pub use tiling::swizzle_index;
pub use tiling::PackingLayout;
pub use tiling::PrefetchLocality;
pub use tiling::TcbGeometry;
pub use tiling::TcbIndexCalculator;
pub use tiling::TcbLevel;
pub use tiling::TiledQ4KMatvec;
pub use tiling::TilingBackend;
pub use tiling::TilingConfig;
pub use tiling::TilingError;
pub use tiling::TilingStats;
pub use tiling::Q4K_SUPERBLOCK_BYTES;
pub use tiling::Q4K_SUPERBLOCK_SIZE;

Modules§

backends
Backend implementations for different SIMD instruction sets
blis
BLIS-Style Matrix Multiplication
brick
ComputeBrick: Token-Centric Compute Units
chaos
Chaos Engineering Configuration
eigen
Eigendecomposition for symmetric matrices
error
Error types for Trueno operations
hardware
Hardware Capability Detection (PMAT-447)
hash
SIMD-optimized hash functions for key-value store operations.
matrix
Matrix operations for Trueno
monitor
GPU Monitoring, Tracing, and Visualization (TRUENO-SPEC-010)
simulation
Simulation Testing Framework (TRUENO-SPEC-012)
tiling
Tiling Compute Blocks (TCB) - Work Partitioning for High-Performance Kernels
tuner
ML-Based ComputeBrick Tuner
vector
Vector type with multi-backend support

Macros§

dispatch_binary_op
Macro to dispatch binary operations to appropriate backend
dispatch_reduction
Macro to dispatch reduction operations (return f32)
dispatch_unary_op
Macro to dispatch unary operations (a -> result)
time_brick
Macro for convenient brick timing with automatic sync.
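As a rough idea of what a dispatch macro of this kind can look like, here is a self-contained sketch. It does not reproduce the signature or expansion of Trueno's `dispatch_binary_op`; the macro `dispatch_binary_sketch!`, the enum `SketchBackend`, and the types `ScalarImpl` and `UnrolledImpl` are all hypothetical names invented for this example. The macro takes the operation name as an identifier and forwards the call to whichever implementation type matches the backend value.

```rust
/// Hypothetical backend tag (not trueno's `Backend` enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchBackend {
    Scalar,
    Unrolled,
}

/// Hypothetical implementation types, one per backend.
struct ScalarImpl;
struct UnrolledImpl;

impl ScalarImpl {
    fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
        for i in 0..out.len() {
            out[i] = a[i] + b[i];
        }
    }
    fn mul(a: &[f32], b: &[f32], out: &mut [f32]) {
        for i in 0..out.len() {
            out[i] = a[i] * b[i];
        }
    }
}

impl UnrolledImpl {
    // Stand-in for a SIMD backend: handles 4 elements per step, then the tail.
    fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
        let full = out.len() / 4 * 4;
        for i in (0..full).step_by(4) {
            for k in i..i + 4 {
                out[k] = a[k] + b[k];
            }
        }
        ScalarImpl::add(&a[full..], &b[full..], &mut out[full..]);
    }
    fn mul(a: &[f32], b: &[f32], out: &mut [f32]) {
        let full = out.len() / 4 * 4;
        for i in (0..full).step_by(4) {
            for k in i..i + 4 {
                out[k] = a[k] * b[k];
            }
        }
        ScalarImpl::mul(&a[full..], &b[full..], &mut out[full..]);
    }
}

/// Forward `$op(a, b, out)` to the implementation matching `$backend`.
macro_rules! dispatch_binary_sketch {
    ($backend:expr, $op:ident, $a:expr, $b:expr, $out:expr) => {
        match $backend {
            SketchBackend::Scalar => ScalarImpl::$op($a, $b, $out),
            SketchBackend::Unrolled => UnrolledImpl::$op($a, $b, $out),
        }
    };
}

fn add(backend: SketchBackend, a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let mut out = vec![0.0; a.len()];
    dispatch_binary_sketch!(backend, add, a, b, &mut out);
    out
}

fn mul(backend: SketchBackend, a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let mut out = vec![0.0; a.len()];
    dispatch_binary_sketch!(backend, mul, a, b, &mut out);
    out
}
```

The benefit of this pattern is that adding a backend means adding one `match` arm in the macro rather than editing every operation wrapper.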

Enums§

Backend
Backend execution target
OpComplexity
Operation complexity for GPU dispatch eligibility
OperationType
Operation type for SIMD backend selection

Functions§

select_backend_for_operation
Select the optimal backend for a specific operation type
select_best_available_backend
Select the best available backend for the current platform
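The two selection functions above suggest a split between "widest backend the platform offers" and "backend that suits this kind of operation". The sketch below shows one way such a split could work. It is not Trueno's logic: the types `SketchBackend` and `SketchOp`, the functions `select_best` and `select_for_operation`, and especially the policy of capping memory-bound operations at 4 lanes are assumptions chosen purely for illustration.

```rust
/// Hypothetical backend tag (not trueno's `Backend` enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchBackend {
    Scalar,
    Sse2,
    Avx2,
}

impl SketchBackend {
    /// Number of f32 lanes processed per instruction.
    fn lanes(self) -> usize {
        match self {
            SketchBackend::Scalar => 1,
            SketchBackend::Sse2 => 4,
            SketchBackend::Avx2 => 8,
        }
    }
}

/// Hypothetical operation classes (not trueno's `OperationType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchOp {
    /// Limited by memory bandwidth, e.g. element-wise add.
    MemoryBound,
    /// Limited by arithmetic throughput, e.g. dot product.
    ComputeBound,
}

/// Pick the widest backend among those reported as available.
fn select_best(available: &[SketchBackend]) -> SketchBackend {
    available
        .iter()
        .copied()
        .max_by_key(|b| b.lanes())
        .unwrap_or(SketchBackend::Scalar)
}

/// Pick a backend for one operation class.
/// ASSUMPTION: memory-bound work is capped at 4 lanes in this sketch.
fn select_for_operation(op: SketchOp, available: &[SketchBackend]) -> SketchBackend {
    match op {
        SketchOp::ComputeBound => select_best(available),
        SketchOp::MemoryBound => available
            .iter()
            .copied()
            .filter(|b| b.lanes() <= 4)
            .max_by_key(|b| b.lanes())
            .unwrap_or(SketchBackend::Scalar),
    }
}
```

Keeping the selection a pure function of an explicit capability list makes the policy easy to unit test without depending on the machine running the tests.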