Crate trueno

trueno has moved to aprender-compute.

This crate re-exports aprender-compute for backward compatibility. New code should depend on aprender-compute directly.
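A crate that has moved is usually kept alive as a thin facade whose `lib.rs` glob re-exports the new crate (`pub use aprender_compute::*;`, since Cargo turns the hyphen in `aprender-compute` into an underscore). That trueno's facade looks exactly like this is an assumption; the page only says it re-exports aprender-compute. The self-contained sketch below simulates both crates as modules so it compiles on its own, and `relu_scalar` here is a local stand-in, not the real function.

```rust
// Hypothetical stand-in for the aprender-compute crate (not its real contents).
mod aprender_compute {
    /// Illustrative item; the real crate exposes many more.
    pub fn relu_scalar(x: f32) -> f32 {
        if x > 0.0 { x } else { 0.0 }
    }
}

// Compatibility facade: re-export everything public under the old name,
// so existing `trueno::...` paths keep resolving after the move.
mod trueno {
    pub use crate::aprender_compute::*;
}

fn main() {
    // Legacy call site, unchanged.
    let old_path = trueno::relu_scalar(-1.5);
    // New code names the new crate directly.
    let new_path = aprender_compute::relu_scalar(2.0);
    println!("{} {}", old_path, new_path);
}
```

Because a glob re-export exposes the same items under both paths, old `trueno::` imports keep compiling, while new code can name aprender-compute directly and drop the shim dependency.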

Modules

activations
Canonical scalar activation functions.
backends
Backend implementations for different SIMD instruction sets
blis
BLIS-Style Matrix Multiplication
brick
ComputeBrick: Token-Centric Compute Units
chaos
Chaos Engineering Configuration
contracts
GH-279: Kernel-Level Contracts for the Sovereign AI Stack
eigen
Eigendecomposition for symmetric matrices
error
Error types for Trueno operations
hardware
Hardware Capability Detection (PMAT-447)
hash
SIMD-optimized hash functions for key-value store operations.
inference
End-to-end LLM inference engine.
matrix
Matrix operations for Trueno
monitor
GPU Monitoring, Tracing, and Visualization (TRUENO-SPEC-010)
simulation
Simulation Testing Framework (TRUENO-SPEC-012)
tiling
Tiling Compute Blocks (TCB) - Work Partitioning for High-Performance Kernels
tuner
ML-Based ComputeBrick Tuner
vector
Vector type with multi-backend support
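The blis and tiling modules are about partitioning a matrix multiply into blocks that stay resident in cache. As a generic illustration of that idea (plain loop tiling over row-major storage, without the panel packing or register micro-kernels a real BLIS-style kernel adds, and not trueno's implementation), the sketch below computes the same product naively and in `tile`×`tile` blocks. `matmul_naive`, `matmul_tiled` and the tile size are hypothetical.

```rust
/// Naive reference: C = A (m×k) · B (k×n), all row-major.
fn matmul_naive(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert!(a.len() == m * k && b.len() == k * n);
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let aip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
    c
}

/// Tiled version: the iteration space is cut into tile×tile blocks so each
/// block of A, B and C is reused while it is still in cache. Edge tiles are
/// clamped with `min`, so m, k and n need not be multiples of `tile`.
fn matmul_tiled(a: &[f32], b: &[f32], m: usize, k: usize, n: usize, tile: usize) -> Vec<f32> {
    assert!(tile > 0 && a.len() == m * k && b.len() == k * n);
    let mut c = vec![0.0f32; m * n];
    for i0 in (0..m).step_by(tile) {
        for p0 in (0..k).step_by(tile) {
            for j0 in (0..n).step_by(tile) {
                // Work on one block: rows i0.., reduction slice p0.., cols j0..
                for i in i0..(i0 + tile).min(m) {
                    for p in p0..(p0 + tile).min(k) {
                        let aip = a[i * k + p];
                        for j in j0..(j0 + tile).min(n) {
                            c[i * n + j] += aip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}

fn main() {
    let (m, k, n) = (5, 7, 3);
    let a: Vec<f32> = (0..m * k).map(|i| (i % 5) as f32).collect();
    let b: Vec<f32> = (0..k * n).map(|i| (i % 3) as f32 - 1.0).collect();
    let naive = matmul_naive(&a, &b, m, k, n);
    let tiled = matmul_tiled(&a, &b, m, k, n, 2);
    assert_eq!(naive, tiled);
    println!("{:?}", tiled);
}
```

For each output element this tiled loop still accumulates over `p` in ascending order, so the two results are bit-identical; tiling schemes that reorder the reduction can differ in the last bits.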

Macros

dispatch_binary_op
Macro to dispatch binary operations to appropriate backend
dispatch_reduction
Macro to dispatch reduction operations (returning f32)
dispatch_unary_op
Macro to dispatch unary operations (a -> result)
time_brick
Macro for convenient brick timing with automatic sync.
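The dispatch macros are documented only as routing an operation to the appropriate backend. A minimal sketch of that pattern, assuming a match on a backend enum where each arm calls the same-named function in a per-backend module: the `Backend` enum, the `scalar` and `unrolled` modules, and this macro's argument order are all hypothetical, and the 4-way unrolled loop merely stands in for a real SIMD kernel.

```rust
/// Hypothetical backend selector (not trueno's `Backend`).
#[derive(Clone, Copy, Debug)]
enum Backend {
    Scalar,
    Unrolled,
}

mod scalar {
    /// Reference element-wise add.
    pub fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
        for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
            *o = x + y;
        }
    }
}

mod unrolled {
    /// Stand-in for a SIMD kernel: 4 lanes per step, then a scalar tail.
    pub fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
        assert!(a.len() == out.len() && b.len() == out.len());
        let n = out.len();
        let body = n - n % 4;
        for i in (0..body).step_by(4) {
            out[i] = a[i] + b[i];
            out[i + 1] = a[i + 1] + b[i + 1];
            out[i + 2] = a[i + 2] + b[i + 2];
            out[i + 3] = a[i + 3] + b[i + 3];
        }
        for i in body..n {
            out[i] = a[i] + b[i];
        }
    }
}

/// Route `$op` to the module matching the selected backend.
macro_rules! dispatch_binary_op {
    ($backend:expr, $op:ident, $a:expr, $b:expr, $out:expr) => {
        match $backend {
            Backend::Scalar => scalar::$op($a, $b, $out),
            Backend::Unrolled => unrolled::$op($a, $b, $out),
        }
    };
}

fn main() {
    let a: Vec<f32> = (0..10).map(|i| i as f32).collect();
    let b = vec![1.0f32; 10];
    let mut out_scalar = vec![0.0f32; 10];
    let mut out_unrolled = vec![0.0f32; 10];
    dispatch_binary_op!(Backend::Scalar, add, &a, &b, &mut out_scalar);
    dispatch_binary_op!(Backend::Unrolled, add, &a, &b, &mut out_unrolled);
    assert_eq!(out_scalar, out_unrolled);
    println!("{:?}", out_scalar);
}
```

Keeping the match inside a macro lets every operation reuse one dispatch table instead of repeating it per function; adding a backend means adding one arm.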

Structs

AddOp
Element-wise add operation.
AssertionResult
Result of a single assertion check.
AttentionOp
Scaled dot-product attention operation.
BlockQ5K
Q5_K block format (5-bit with super-blocks).
BlockQ6K
Q6_K block format (6-bit with super-blocks).
BottleneckPrediction
Bottleneck prediction result
BrickIdTimer
Timer handle returned by start_brick() (PAR-200 fast path).
BrickLayer
A layer of compute bricks that execute sequentially. Throughput ceiling = min(component throughputs).
BrickProfiler
Per-brick profiler using pure Rust timing.
BrickSample
Individual brick timing sample. Pure Rust timing using std::time::Instant.
BrickStats
Accumulated per-brick statistics.
BrickTimer
Timer handle returned by start() (legacy string-based API).
BrickTuner
ML-based ComputeBrick tuner ensemble.
BrickVerification
Verification result from ComputeBrick.
ByteBudget
Performance budget for byte-oriented operations (compression, I/O). Use this for trueno-zram, disk I/O, network throughput, etc.
CategoryStats
Aggregated statistics for a brick category.
ComputeBrick
Self-verifying, token-centric compute unit. Bundles an operation with assertions, a budget, and verification.
ConceptDriftStatus
Concept drift detection result
CpuCapability
CPU capabilities
DivergenceInfo
Information about a detected divergence between CPU and GPU.
DotOp
Dot product operation.
DotQ5KOp
Q5_K dot product operation.
DotQ6KOp
Q6_K dot product operation.
ExecutionEdge
An edge in the execution graph.
ExecutionGraph
Execution path graph for tracking brick → kernel → PTX relationships.
ExecutionNodeId
Node ID in the execution graph.
FeatureExtractor
Extracts features from BrickProfiler and runtime configuration.
FusedGateUpOp
Fused Gate+Up FFN projection with SiLU activation.
FusedGateUpWeights
Weights for fused gate+up FFN projection
FusedQKVOp
Fused Q/K/V projection operation for transformer attention.
FusedQKVWeights
Weights for fused QKV projection
GpuCapability
GPU capabilities
GpuClockMetrics
GPU clock metrics
GpuDeviceInfo
GPU device information (TRUENO-SPEC-010)
GpuMemoryMetrics
GPU memory metrics
GpuMetrics
Complete GPU metrics snapshot
GpuMonitor
GPU Monitor for real-time metrics collection (TRUENO-SPEC-010)
GpuPcieMetrics
GPU PCIe metrics
GpuPowerMetrics
GPU power metrics
GpuThermalMetrics
GPU thermal metrics
GpuUtilization
GPU utilization metrics
HardwareCapability
Complete hardware capability profile
KernelChecksum
Kernel checksum for divergence detection.
KernelClassifier
Kernel classifier using simple rule-based logic.
KernelRecommendation
Kernel recommendation result
MatmulOp
Matrix multiplication operation.
Matrix
A 2D matrix with row-major storage
MonitorConfig
Configuration for GPU monitoring
PtxRegistry
PTX kernel registry for execution graph correlation.
RooflineParams
Roofline model parameters
RunConfig
Runtime configuration for feature extraction
SoftmaxOp
Softmax operation.
SymmetricEigen
Symmetric matrix eigendecomposition
TcbGeometry
Dimensions for a Tiling Compute Block
TcbIndexCalculator
Index calculator for hierarchical tiling
ThroughputPrediction
Throughput prediction result
ThroughputRegressor
Simple linear regression model for throughput prediction.
TileStats
Tile-level profiling statistics.
TileTimer
Timer handle for tile-level profiling.
TiledQ4KMatvec
Tiled Q4_K MatVec executor
TilingConfig
Complete tiling configuration for a kernel
TilingStats
Statistics for a tiled operation
TokenBudget
Performance budget expressed in token terms. Aligns compute costs with LLM inference metrics.
TokenResult
Result of ComputeBrick execution with token metrics.
TrainingSample
Training sample for the tuner
TrainingStats
Training statistics summary
TunerDataCollector
Training data collector with online learning support (T-TUNER-005, GitHub #82)
TunerFeatures
Feature vector for ML-based kernel tuning.
TunerRecommendation
Combined tuner recommendation
Vector
High-performance vector with multi-backend support
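KernelChecksum, DivergenceInfo and `fnv1a_f32_checksum` together suggest a two-step check: a cheap hash to detect that CPU and GPU outputs differ, then a scan to find where. The sketch below implements 64-bit FNV-1a with its standard offset basis and prime; hashing each f32 through its little-endian bytes, the tolerance scan, and all function names are assumptions of this sketch rather than trueno's API.

```rust
/// FNV-1a 64-bit parameters (from the public FNV specification).
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a over a byte stream: XOR each byte in, then multiply by the prime.
fn fnv1a64<I: IntoIterator<Item = u8>>(bytes: I) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Checksum of at most `limit` leading f32 values, hashed via their
/// little-endian byte patterns (byte order is an assumption of this sketch).
fn f32_prefix_checksum(values: &[f32], limit: usize) -> u64 {
    fnv1a64(values.iter().take(limit).flat_map(|v| v.to_le_bytes()))
}

/// Locate the first element where two outputs differ by more than `tol`.
/// Written as `!(diff <= tol)` so a NaN difference also counts as divergent.
fn first_divergence(cpu: &[f32], gpu: &[f32], tol: f32) -> Option<(usize, f32, f32)> {
    cpu.iter()
        .zip(gpu)
        .position(|(c, g)| !((c - g).abs() <= tol))
        .map(|i| (i, cpu[i], gpu[i]))
}

fn main() {
    let cpu: Vec<f32> = (0..128).map(|i| i as f32 * 0.5).collect();
    let mut gpu = cpu.clone();
    gpu[100] += 1e-3; // perturb an element beyond the 64-element prefix
    let same = f32_prefix_checksum(&cpu, 64) == f32_prefix_checksum(&gpu, 64);
    println!("prefix checksums equal: {}", same); // true: index 100 is not hashed
    println!("{:?}", first_divergence(&cpu, &gpu, 1e-4));
}
```

Hashing only a prefix (the crate documents the first 64 elements) keeps the check cheap but, as the example shows, misses divergence past that prefix. Because the hash sees exact bit patterns, any rounding difference changes it, which is why a tolerance-based scan is the natural second step.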

Enums

Backend
Backend execution target
Bottleneck
Workload bottleneck classification
BottleneckClass
Bottleneck classification for ML model.
BrickBottleneck
Bottleneck classification for roofline analysis (PMAT-451)
BrickCategory
Category for hierarchical aggregation of brick statistics.
BrickError
Errors from ComputeBrick execution. Tells you exactly what failed (Jidoka: stop and signal).
BrickId
Well-known brick types for O(1) lookup on hot path.
ComputeAssertion
Type of assertion for compute verification.
ComputeBackend
Execution backend for compute operations. This is the brick-specific backend enum with additional GPU backends.
EdgeType
Edge types in execution graph.
ExecutionNode
Execution graph node types.
ExperimentSuggestion
Suggested experiment to improve performance
GpuBackend
GPU compute backend
GpuVendor
GPU vendor identifier based on PCI vendor ID
HardwareGpuBackend
GPU compute backend
KernelType
Kernel type for feature encoding.
MonitorError
Errors from GPU monitoring operations
OpComplexity
Operation complexity for GPU dispatch eligibility
OperationType
Operation type for SIMD backend selection
PackingLayout
Memory layout for packed matrices
PrefetchLocality
Prefetch locality hint
QuantType
Quantization type for feature encoding.
SimdWidth
SIMD instruction set width
SyncMode
Synchronization mode for GPU profiling.
TcbLevel
Tiling hierarchy level
TileLevel
Tile hierarchy level for profiling.
TilingBackend
Backend target for tiling configuration
TilingError
Tiling configuration errors
TruenoError
Errors that can occur during Trueno operations
TunerError
Tuner error type
UserFeedback
User feedback on a recommendation
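BrickBottleneck (roofline analysis) and RooflineParams refer to the roofline model: attainable throughput is the lesser of peak compute and arithmetic intensity (FLOPs per byte moved) times peak memory bandwidth, and a kernel whose intensity falls below the ridge point is memory-bound. A minimal sketch, with a hypothetical `Roofline` struct and `Bound` enum and illustrative hardware numbers rather than measured ones:

```rust
/// Which roof limits a kernel (hypothetical; compare `BrickBottleneck`).
#[derive(Debug, PartialEq)]
enum Bound {
    Memory,
    Compute,
}

/// Hypothetical machine parameters in the spirit of `RooflineParams`.
struct Roofline {
    peak_flops: f64,     // FLOP/s
    peak_bandwidth: f64, // bytes/s
}

impl Roofline {
    /// Arithmetic intensity (FLOP/byte) where the two roofs meet.
    fn ridge_point(&self) -> f64 {
        self.peak_flops / self.peak_bandwidth
    }

    /// Attainable throughput = min(peak compute, intensity × bandwidth).
    fn attainable_flops(&self, intensity: f64) -> f64 {
        self.peak_flops.min(intensity * self.peak_bandwidth)
    }

    fn classify(&self, intensity: f64) -> Bound {
        if intensity < self.ridge_point() {
            Bound::Memory
        } else {
            Bound::Compute
        }
    }
}

fn main() {
    // Illustrative numbers only: 1 TFLOP/s compute, 100 GB/s memory.
    let machine = Roofline { peak_flops: 1e12, peak_bandwidth: 1e11 };
    // f32 dot product of length n: 2n FLOPs over 8n bytes read.
    let dot_ai = 2.0 / 8.0;
    // n×n f32 matmul, n = 1024: 2n³ FLOPs over 3n² × 4 bytes (idealized: each matrix moved once).
    let n = 1024.0_f64;
    let matmul_ai = (2.0 * n * n * n) / (12.0 * n * n);
    for (name, ai) in [("dot", dot_ai), ("matmul", matmul_ai)] {
        println!(
            "{}: AI={:.2} FLOP/B, {:?}-bound, attainable {:.3e} FLOP/s",
            name,
            ai,
            machine.classify(ai),
            machine.attainable_flops(ai)
        );
    }
}
```

With these numbers the ridge point is 10 FLOP/byte: the dot product (0.25 FLOP/byte) is memory-bound at 2.5e10 FLOP/s, while the idealized 1024×1024 matmul (about 171 FLOP/byte) reaches the compute roof.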

Constants

Q4K_SUPERBLOCK_BYTES
Q4K_SUPERBLOCK_SIZE
Q4_K superblock constants (per GGML specification)
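The page does not list the values of the Q4_K constants. In GGML's `block_q4_K` layout, a superblock holds QK_K = 256 weights stored as two f16 scale factors (`d` and `dmin`), 12 bytes of packed 6-bit sub-block scales and mins (8 sub-blocks × 2 × 6 bits), and 128 bytes of 4-bit quants, 144 bytes in total. The sketch below derives those numbers; that trueno's `Q4K_SUPERBLOCK_SIZE` and `Q4K_SUPERBLOCK_BYTES` equal 256 and 144 is an assumption, and the helper `q4k_storage_bytes` is hypothetical.

```rust
/// Weights per Q4_K superblock (GGML's QK_K).
const QK_K: usize = 256;
/// Bytes of packed 6-bit scales and mins for 8 sub-blocks (GGML's K_SCALE_SIZE).
const K_SCALE_SIZE: usize = 12;
/// Bytes per superblock: f16 `d` + f16 `dmin` + scales + 4-bit quants (2 per byte).
const Q4K_BLOCK_BYTES: usize = 2 + 2 + K_SCALE_SIZE + QK_K / 2;

/// Storage for `n_elements` weights, or None if not a whole number of superblocks.
fn q4k_storage_bytes(n_elements: usize) -> Option<usize> {
    if n_elements % QK_K != 0 {
        None
    } else {
        Some(n_elements / QK_K * Q4K_BLOCK_BYTES)
    }
}

fn main() {
    let bits_per_weight = (Q4K_BLOCK_BYTES * 8) as f64 / QK_K as f64;
    println!("{} bytes per superblock, {} bits per weight", Q4K_BLOCK_BYTES, bits_per_weight);
    // A 4096×4096 weight matrix.
    println!("{:?}", q4k_storage_bytes(4096 * 4096));
}
```

144 bytes for 256 weights works out to 4.5 bits per weight, so a 4096×4096 matrix takes 9,437,184 bytes, versus 67,108,864 bytes (64 MiB) in f32.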

Traits

ComputeOp
Trait for compute operations that can be wrapped in a ComputeBrick.
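ComputeBrick is described as bundling an operation with assertions, a budget and verification, and ComputeOp as the trait an operation implements in order to be wrapped. The sketch below shows only the verification half of that idea: `Op`, `Check`, `Failure`, `Brick` and `Softmax` are hypothetical names, not trueno's `ComputeOp`, `ComputeAssertion` or `BrickError`, and budget and token accounting are omitted.

```rust
/// Hypothetical minimal counterpart of `ComputeOp`: a named, pure kernel.
trait Op {
    fn name(&self) -> &'static str;
    fn execute(&self, input: &[f32]) -> Vec<f32>;
}

/// Hypothetical output checks, loosely in the spirit of `ComputeAssertion`.
enum Check {
    AllFinite,
    Len(usize),
    InRange(f32, f32),
}

/// Which check failed and where.
#[derive(Debug, PartialEq)]
enum Failure {
    NonFinite { op: &'static str, index: usize },
    WrongLen { op: &'static str, expected: usize, got: usize },
    OutOfRange { op: &'static str, index: usize },
}

/// An op bundled with the checks its output must pass.
struct Brick<O: Op> {
    op: O,
    checks: Vec<Check>,
}

impl<O: Op> Brick<O> {
    /// Run the op, then verify; the first failed check stops and signals.
    fn run(&self, input: &[f32]) -> Result<Vec<f32>, Failure> {
        let out = self.op.execute(input);
        let op = self.op.name();
        for check in &self.checks {
            match *check {
                Check::AllFinite => {
                    if let Some(index) = out.iter().position(|x| !x.is_finite()) {
                        return Err(Failure::NonFinite { op, index });
                    }
                }
                Check::Len(expected) => {
                    if out.len() != expected {
                        return Err(Failure::WrongLen { op, expected, got: out.len() });
                    }
                }
                Check::InRange(lo, hi) => {
                    if let Some(index) = out.iter().position(|&x| x < lo || x > hi) {
                        return Err(Failure::OutOfRange { op, index });
                    }
                }
            }
        }
        Ok(out)
    }
}

/// Example op: softmax, subtracting the max before exp for stability.
struct Softmax;

impl Op for Softmax {
    fn name(&self) -> &'static str {
        "softmax"
    }
    fn execute(&self, input: &[f32]) -> Vec<f32> {
        let max = input.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = input.iter().map(|x| (x - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        exps.iter().map(|e| e / sum).collect()
    }
}

fn main() {
    let checks = vec![Check::AllFinite, Check::Len(3), Check::InRange(0.0, 1.0)];
    let brick = Brick { op: Softmax, checks };
    println!("{:?}", brick.run(&[1.0, 2.0, 3.0]));
    println!("{:?}", brick.run(&[f32::NAN, 1.0, 2.0]));
}
```

Returning an error on the first failed check, instead of passing a suspect tensor downstream, mirrors the "stop and signal" behavior that the BrickError description attributes to the real type.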

Functions

cuda_monitor_available
Check whether CUDA monitoring is available (stub implementation when the feature is disabled)
default_hardware_path
Default hardware.toml path
f16_to_f32
f16 → f32 conversion (IEEE 754 half-precision).
f32_to_f16
f32 → f16 conversion (IEEE 754 half-precision).
fnv1a_f32_checksum
FNV-1a hash of an f32 slice (only the first 64 elements are hashed, for efficiency).
gelu_scalar
GELU (Gaussian Error Linear Unit) activation.
hash_bytes
Hash raw bytes to u64.
hash_key
Hash a single key to u64.
hash_keys_batch
Hash multiple keys in batch (SIMD-optimized).
hash_keys_batch_with_backend
Hash multiple keys with explicit backend selection.
optimal_prefetch_distance
Calculate optimal prefetch distance based on tile geometry and cache level
pack_a_index
Calculate packed index for panel-major A layout
pack_b_index
Calculate packed index for panel-major B layout
relu_scalar
ReLU (Rectified Linear Unit) activation.
select_backend_for_operation
Select the optimal backend for a specific operation type
select_best_available_backend
Select the best available backend for the current platform
sigmoid_scalar
Sigmoid activation: σ(x) = 1 / (1 + exp(-x)).
silu_scalar
SiLU (Sigmoid Linear Unit) / Swish activation: x * σ(x).
swizzle_index
Apply XOR swizzling for shared memory bank conflict avoidance
tanh_scalar
Tanh activation.
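The scalar activations above follow standard formulas. A plain-Rust sketch of them: the function names are local, not trueno's, and GELU is written with the common tanh approximation because Rust's standard library has no `erf`; whether `gelu_scalar` uses this approximation or the exact erf form is not stated on this page.

```rust
/// ReLU: max(x, 0). Note `f32::max` returns 0.0 for a NaN input.
fn relu(x: f32) -> f32 {
    x.max(0.0)
}

/// Sigmoid: 1 / (1 + e^(-x)). For very negative x, exp overflows to +inf
/// and the result saturates to 0.0 rather than producing NaN.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// SiLU / Swish: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x * sigmoid(x)
}

/// GELU, tanh approximation:
/// 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3))).
fn gelu_tanh(x: f32) -> f32 {
    const SQRT_2_OVER_PI: f32 = 0.797_884_56;
    0.5 * x * (1.0 + (SQRT_2_OVER_PI * (x + 0.044_715 * x * x * x)).tanh())
}

fn main() {
    for x in [-3.0f32, -1.0, 0.0, 1.0, 3.0] {
        println!(
            "x={:5.1} relu={:.4} sigmoid={:.4} silu={:.4} gelu={:.4} tanh={:.4}",
            x,
            relu(x),
            sigmoid(x),
            silu(x),
            gelu_tanh(x),
            x.tanh()
        );
    }
}
```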
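`f16_to_f32` converts IEEE 754 half precision (1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits) to f32. Every f16 value is exactly representable in f32, so decoding is pure bit manipulation. A sketch of that decode follows; the name `f16_bits_to_f32` is hypothetical, and there is no claim that trueno implements it this way (it may use hardware conversion instructions). The reverse direction is omitted because it additionally needs round-to-nearest-even plus overflow and underflow handling.

```rust
/// Decode IEEE 754 binary16 bits into f32 (exact for every input).
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = u32::from(h >> 15) << 31;
    let exp = u32::from((h >> 10) & 0x1f);
    let mant = u32::from(h & 0x3ff);

    let bits = match (exp, mant) {
        // Signed zero.
        (0, 0) => sign,
        // Subnormal: value = mant * 2^-24. Shift until the implicit bit
        // (bit 10) appears, lowering the exponent once per shift, then drop it.
        (0, _) => {
            let mut e: i32 = -14;
            let mut m = mant;
            while m & 0x400 == 0 {
                m <<= 1;
                e -= 1;
            }
            sign | (((e + 127) as u32) << 23) | ((m & 0x3ff) << 13)
        }
        // Infinity.
        (0x1f, 0) => sign | 0x7f80_0000,
        // NaN: keep the payload, shifted into the wider mantissa.
        (0x1f, _) => sign | 0x7f80_0000 | (mant << 13),
        // Normal: rebias the exponent from 15 to 127 and widen the mantissa.
        _ => sign | ((exp + 127 - 15) << 23) | (mant << 13),
    };
    f32::from_bits(bits)
}

fn main() {
    for bits in [0x3c00u16, 0x3555, 0x0001, 0x7bff, 0xfc00] {
        println!("{:#06x} -> {:e}", bits, f16_bits_to_f32(bits));
    }
}
```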

Type Aliases

Result
Result type for Trueno operations
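The `Result` alias follows the usual Rust pattern of fixing the error type once per crate, presumably as `std::result::Result<T, TruenoError>` (the page does not show the definition). A self-contained sketch of the pattern: `SketchError` and its variants and the `dot` and `norm_squared_plus_one` functions are hypothetical stand-ins, not TruenoError's real variants.

```rust
use std::fmt;

/// Hypothetical error enum standing in for `TruenoError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    SizeMismatch { expected: usize, actual: usize },
    EmptyInput,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::SizeMismatch { expected, actual } => {
                write!(f, "size mismatch: expected {}, got {}", expected, actual)
            }
            SketchError::EmptyInput => write!(f, "empty input"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Crate-local alias: callers write `Result<T>` instead of repeating the error type.
type Result<T> = std::result::Result<T, SketchError>;

fn dot(a: &[f32], b: &[f32]) -> Result<f32> {
    if a.len() != b.len() {
        return Err(SketchError::SizeMismatch { expected: a.len(), actual: b.len() });
    }
    if a.is_empty() {
        return Err(SketchError::EmptyInput);
    }
    Ok(a.iter().zip(b).map(|(x, y)| x * y).sum())
}

fn norm_squared_plus_one(a: &[f32]) -> Result<f32> {
    let d = dot(a, a)?; // `?` propagates SketchError unchanged
    Ok(d + 1.0)
}

fn main() {
    println!("{:?}", dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]));
    match norm_squared_plus_one(&[]) {
        Ok(v) => println!("{}", v),
        Err(e) => println!("error: {}", e),
    }
}
```

The `?` in `norm_squared_plus_one` needs no error conversion because both functions share one error type through the alias.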