trueno has moved to aprender-compute.
This crate re-exports aprender-compute for backward compatibility.
New code should depend on aprender-compute directly.
Modules§
- activations
- Canonical scalar activation functions.
- backends
- Backend implementations for different SIMD instruction sets
- blis
- BLIS-Style Matrix Multiplication
- brick
- ComputeBrick: Token-Centric Compute Units
- chaos
- Chaos Engineering Configuration
- contracts
- GH-279: Kernel-Level Contracts for the Sovereign AI Stack
- eigen
- Eigendecomposition for symmetric matrices
- error
- Error types for Trueno operations
- hardware
- Hardware Capability Detection (PMAT-447)
- hash
- SIMD-optimized hash functions for key-value store operations.
- inference
- End-to-end LLM inference engine.
- matrix
- Matrix operations for Trueno
- monitor
- GPU Monitoring, Tracing, and Visualization (TRUENO-SPEC-010)
- simulation
- Simulation Testing Framework (TRUENO-SPEC-012)
- tiling
- Tiling Compute Blocks (TCB) - Work Partitioning for High-Performance Kernels
- tuner
- ML-Based ComputeBrick Tuner
- vector
- Vector type with multi-backend support
Macros§
- dispatch_binary_op
- Macro to dispatch binary operations to appropriate backend
- dispatch_reduction
- Macro to dispatch reduction operations (return f32)
- dispatch_unary_op
- Macro to dispatch unary operations (a -> result)
- time_brick
- Macro for convenient brick timing with automatic sync.
Structs§
- AddOp
- Element-wise add operation.
- AssertionResult
- Result of a single assertion check.
- AttentionOp
- Scaled dot-product attention operation.
- BlockQ5K
- Q5_K block format (5-bit with super-blocks).
- BlockQ6K
- Q6_K block format (6-bit with super-blocks).
- BottleneckPrediction
- Bottleneck prediction result
- BrickIdTimer
- Timer handle returned by start_brick() (PAR-200 fast path).
- BrickLayer
- A layer of compute bricks that execute sequentially. Throughput ceiling = min(component throughputs).
- BrickProfiler
- Per-brick profiler using pure Rust timing.
- BrickSample
- Individual brick timing sample. Pure Rust timing using std::time::Instant.
- BrickStats
- Accumulated per-brick statistics.
- BrickTimer
- Timer handle returned by start() (legacy string-based API).
- BrickTuner
- ML-based ComputeBrick tuner ensemble.
- BrickVerification
- Verification result from ComputeBrick.
- ByteBudget
- Performance budget for byte-oriented operations (compression, I/O). Use this for trueno-zram, disk I/O, network throughput, etc.
- CategoryStats
- Aggregated statistics for a brick category.
- ComputeBrick
- Self-verifying, token-centric compute unit. Bundles: operation + assertions + budget + verification
- ConceptDriftStatus
- Concept drift detection result
- CpuCapability
- CPU capabilities
- DivergenceInfo
- Information about a detected divergence between CPU and GPU.
- DotOp
- Dot product operation.
- DotQ5KOp
- Q5_K dot product operation.
- DotQ6KOp
- Q6_K dot product operation.
- ExecutionEdge
- An edge in the execution graph.
- ExecutionGraph
- Execution path graph for tracking brick → kernel → PTX relationships.
- ExecutionNodeId
- Node ID in the execution graph.
- FeatureExtractor
- Extracts features from BrickProfiler and runtime configuration.
- FusedGateUpOp
- Fused Gate+Up FFN projection with SiLU activation.
- FusedGateUpWeights
- Weights for fused gate+up FFN projection
- FusedQKVOp
- Fused Q/K/V projection operation for transformer attention.
- FusedQKVWeights
- Weights for fused QKV projection
- GpuCapability
- GPU capabilities
- GpuClockMetrics
- GPU clock metrics
- GpuDeviceInfo
- GPU device information (TRUENO-SPEC-010)
- GpuMemoryMetrics
- GPU memory metrics
- GpuMetrics
- Complete GPU metrics snapshot
- GpuMonitor
- GPU Monitor for real-time metrics collection (TRUENO-SPEC-010)
- GpuPcieMetrics
- GPU PCIe metrics
- GpuPowerMetrics
- GPU power metrics
- GpuThermalMetrics
- GPU thermal metrics
- GpuUtilization
- GPU utilization metrics
- HardwareCapability
- Complete hardware capability profile
- KernelChecksum
- Kernel checksum for divergence detection.
- KernelClassifier
- Kernel classifier using simple rule-based logic.
- KernelRecommendation
- Kernel recommendation result
- MatmulOp
- Matrix multiplication operation.
- Matrix
- A 2D matrix with row-major storage
- MonitorConfig
- Configuration for GPU monitoring
- PtxRegistry
- PTX kernel registry for execution graph correlation.
- RooflineParams
- Roofline model parameters
- RunConfig
- Runtime configuration for feature extraction
- SoftmaxOp
- Softmax operation.
- SymmetricEigen
- Symmetric matrix eigendecomposition
- TcbGeometry
- Dimensions for a Tiling Compute Block
- TcbIndexCalculator
- Index calculator for hierarchical tiling
- ThroughputPrediction
- Throughput prediction result
- ThroughputRegressor
- Simple linear regression model for throughput prediction.
- TileStats
- Tile-level profiling statistics.
- TileTimer
- Timer handle for tile-level profiling.
- TiledQ4KMatvec
- Tiled Q4_K MatVec executor
- TilingConfig
- Complete tiling configuration for a kernel
- TilingStats
- Statistics for a tiled operation
- TokenBudget
- Performance budget expressed in token terms. Aligns compute costs with LLM inference metrics.
- TokenResult
- Result of ComputeBrick execution with token metrics.
- TrainingSample
- Training sample for the tuner
- TrainingStats
- Training statistics summary
- TunerDataCollector
- Training data collector with online learning support (T-TUNER-005, GitHub #82)
- TunerFeatures
- Feature vector for ML-based kernel tuning.
- TunerRecommendation
- Combined tuner recommendation
- Vector
- High-performance vector with multi-backend support
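The BrickLayer entry above states that a layer's throughput ceiling is the minimum of its component throughputs. A minimal sketch of that rule follows; the function name and the choice of f64 tokens per second are illustrative assumptions and are not taken from the crate's API.

```rust
/// Hypothetical helper (not part of trueno or aprender-compute).
/// For components that execute sequentially, the layer cannot run faster
/// than its slowest component, so the ceiling is the minimum throughput.
/// Returns `None` when the layer has no components.
fn layer_throughput_ceiling(component_tokens_per_sec: &[f64]) -> Option<f64> {
    component_tokens_per_sec.iter().copied().reduce(f64::min)
}
```

With component throughputs of 1200, 450, and 900 tokens per second, the sketch reports 450 as the ceiling, so speeding up any other component would not raise it.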
Enums§
- Backend
- Backend execution target
- Bottleneck
- Workload bottleneck classification
- BottleneckClass
- Bottleneck classification for ML model.
- BrickBottleneck
- Bottleneck classification for roofline analysis (PMAT-451)
- BrickCategory
- Category for hierarchical aggregation of brick statistics.
- BrickError
- Errors from ComputeBrick execution. Tells you exactly what failed (Jidoka: stop and signal).
- BrickId
- Well-known brick types for O(1) lookup on hot path.
- ComputeAssertion
- Type of assertion for compute verification.
- ComputeBackend
- Execution backend for compute operations. This is the brick-specific backend enum with additional GPU backends.
- EdgeType
- Edge types in execution graph.
- ExecutionNode
- Execution graph node types.
- ExperimentSuggestion
- Suggested experiment to improve performance
- GpuBackend
- GPU compute backend
- GpuVendor
- GPU vendor identifier based on PCI vendor ID
- HardwareGpuBackend
- GPU compute backend
- KernelType
- Kernel type for feature encoding.
- MonitorError
- Errors from GPU monitoring operations
- OpComplexity
- Operation complexity for GPU dispatch eligibility
- OperationType
- Operation type for SIMD backend selection
- PackingLayout
- Memory layout for packed matrices
- PrefetchLocality
- Prefetch locality hint
- QuantType
- Quantization type for feature encoding.
- SimdWidth
- SIMD instruction set width
- SyncMode
- Synchronization mode for GPU profiling.
- TcbLevel
- Tiling hierarchy level
- TileLevel
- Tile hierarchy level for profiling.
- TilingBackend
- Backend target for tiling configuration
- TilingError
- Tiling configuration errors
- TruenoError
- Errors that can occur during Trueno operations
- TunerError
- Tuner error type
- UserFeedback
- User feedback on a recommendation
Constants§
- Q4K_SUPERBLOCK_BYTES
- Q4K_SUPERBLOCK_SIZE
- Q4_K superblock constants (per GGML specification)
Traits§
- ComputeOp
- Trait for compute operations that can be wrapped in a ComputeBrick.
Functions§
- cuda_monitor_available
- Check if CUDA monitoring is available (stub when feature disabled)
- default_hardware_path
- Default hardware.toml path
- f16_to_f32
- f16 → f32 conversion (IEEE 754 half-precision).
- f32_to_f16
- f32 → f16 conversion (IEEE 754 half-precision).
- fnv1a_f32_checksum
- FNV-1a hash of f32 slice (first 64 elements for efficiency).
- gelu_scalar
- GELU (Gaussian Error Linear Unit) activation.
- hash_bytes
- Hash raw bytes to u64.
- hash_key
- Hash a single key to u64.
- hash_keys_batch
- Hash multiple keys in batch (SIMD-optimized).
- hash_keys_batch_with_backend
- Hash multiple keys with explicit backend selection.
- optimal_prefetch_distance
- Calculate optimal prefetch distance based on tile geometry and cache level
- pack_a_index
- Calculate packed index for panel-major A layout
- pack_b_index
- Calculate packed index for panel-major B layout
- relu_scalar
- ReLU (Rectified Linear Unit) activation.
- select_backend_for_operation
- Select the optimal backend for a specific operation type
- select_best_available_backend
- Select the best available backend for the current platform
- sigmoid_scalar
- Sigmoid activation: σ(x) = 1 / (1 + exp(-x)).
- silu_scalar
- SiLU (Sigmoid Linear Unit) / Swish activation: x * σ(x).
- swizzle_index
- Apply XOR swizzling for shared memory bank conflict avoidance
- tanh_scalar
- Tanh activation.
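The sigmoid and SiLU entries above give their formulas explicitly, and ReLU has a standard definition. The following is a standalone sketch of those three formulas on f32 values. The `_sketch` function names are hypothetical and the code is not the crate's implementation; the signatures of the real `*_scalar` functions are not shown on this page. GELU is left out because the listing does not say whether the exact or the tanh-approximated form is used.

```rust
/// Sigmoid: σ(x) = 1 / (1 + exp(-x)). Hypothetical name, illustration only.
fn sigmoid_sketch(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// SiLU / Swish: x * σ(x), built directly on the sigmoid above.
fn silu_sketch(x: f32) -> f32 {
    x * sigmoid_sketch(x)
}

/// ReLU: max(0, x).
fn relu_sketch(x: f32) -> f32 {
    x.max(0.0)
}
```

At x = 0 the sigmoid sketch evaluates to 0.5 and the SiLU sketch to 0, which makes a quick sanity check when comparing against another implementation.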
Type Aliases§
- Result
- Result type for Trueno operations