Crate trueno


trueno has moved to aprender-compute.

This crate re-exports aprender-compute for backward compatibility. New code should depend on aprender-compute directly.

Modules

activations: Canonical scalar activation functions.
backends: Backend implementations for different SIMD instruction sets
blis: BLIS-Style Matrix Multiplication
brick: ComputeBrick: Token-Centric Compute Units
chaos: Chaos Engineering Configuration
contracts: GH-279: Kernel-Level Contracts for the Sovereign AI Stack
eigen: Eigendecomposition for symmetric matrices
error: Error types for Trueno operations
hardware: Hardware Capability Detection (PMAT-447)
hash: SIMD-optimized hash functions for key-value store operations.
inference: End-to-end LLM inference engine.
matrix: Matrix operations for Trueno
monitor: GPU Monitoring, Tracing, and Visualization (TRUENO-SPEC-010)
simulation: Simulation Testing Framework (TRUENO-SPEC-012)
tiling: Tiling Compute Blocks (TCB) - Work Partitioning for High-Performance Kernels
tuner: ML-Based ComputeBrick Tuner
vector: Vector type with multi-backend support
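The blis module, together with the pack_a_index and pack_b_index functions listed under Functions, refers to BLIS-style panel-major packing. The sketch below is a rough illustration of that layout, not trueno's own implementation:

- A row-major m×k matrix A is copied into row panels of height MR.
- Offsets are computed for B packed into column panels of width NR.
- Each panel stores its k slices contiguously, so a micro-kernel reads both operands with unit stride.

The helper names and the exact index formula are assumptions, not trueno's signatures. The sketch also skips the zero padding a real packer needs when m is not a multiple of MR.

```rust
/// Hypothetical: offset of A[i][p] (A is m x k) in a buffer packed into
/// row panels of height `mr`. Inside a panel, the `mr` values of column p
/// are stored contiguously, followed by those of column p + 1.
fn pack_a_offset(i: usize, p: usize, k: usize, mr: usize) -> usize {
    let panel = i / mr;
    panel * mr * k + p * mr + i % mr
}

/// Hypothetical: offset of B[p][j] (B is k x n) in a buffer packed into
/// column panels of width `nr`. Inside a panel, the `nr` values of row p
/// are stored contiguously.
fn pack_b_offset(p: usize, j: usize, k: usize, nr: usize) -> usize {
    let panel = j / nr;
    panel * nr * k + p * nr + j % nr
}

/// Packs a row-major m x k matrix into the panel-major A layout.
/// Assumes m is a multiple of mr (this sketch does no edge padding).
fn pack_a(a: &[f32], m: usize, k: usize, mr: usize) -> Vec<f32> {
    assert_eq!(m % mr, 0, "sketch assumes m is a multiple of mr");
    let mut packed = vec![0.0; m * k];
    for i in 0..m {
        for p in 0..k {
            packed[pack_a_offset(i, p, k, mr)] = a[i * k + p];
        }
    }
    packed
}
```

Take the 4×3 matrix holding 0..12 and pack it with MR = 2. The first panel interleaves rows 0 and 1 column by column, giving 0, 3, 1, 4, 2, 5. The second panel does the same for rows 2 and 3, giving 6, 9, 7, 10, 8, 11.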

Macros

dispatch_binary_op: Macro to dispatch binary operations to appropriate backend
dispatch_reduction: Macro to dispatch reduction operations (returning f32)
dispatch_unary_op: Macro to dispatch unary operations (a -> result)
time_brick: Macro for convenient brick timing with automatic sync.
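The dispatch macros route a single call to a backend-specific kernel. The minimal sketch below shows that pattern. It assumes a hypothetical two-variant Backend enum and hypothetical scalar/simd modules. trueno's real macros, enum variants and kernels differ, and the "simd" module here only mimics lane-wise processing with plain loops.

```rust
/// Hypothetical backend selector; not trueno's Backend enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Backend {
    Scalar,
    Simd,
}

mod scalar {
    pub fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
        for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
            *o = x + y;
        }
    }
    pub fn sum(a: &[f32]) -> f32 {
        a.iter().sum()
    }
}

/// Stand-in for intrinsics-based kernels: processes 4 lanes per step.
mod simd {
    pub fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
        for ((o, x), y) in out.chunks_mut(4).zip(a.chunks(4)).zip(b.chunks(4)) {
            for i in 0..o.len() {
                o[i] = x[i] + y[i];
            }
        }
    }
    pub fn sum(a: &[f32]) -> f32 {
        let mut lanes = [0.0f32; 4];
        let chunks = a.chunks_exact(4);
        let tail = chunks.remainder();
        for c in chunks {
            for i in 0..4 {
                lanes[i] += c[i];
            }
        }
        lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
    }
}

/// Binary op: the same `$op` name is resolved in the selected backend module.
macro_rules! dispatch_binary_op {
    ($backend:expr, $op:ident, $a:expr, $b:expr, $out:expr) => {
        match $backend {
            Backend::Scalar => scalar::$op($a, $b, $out),
            Backend::Simd => simd::$op($a, $b, $out),
        }
    };
}

/// Reduction: returns the f32 produced by whichever backend ran.
macro_rules! dispatch_reduction {
    ($backend:expr, $op:ident, $a:expr) => {
        match $backend {
            Backend::Scalar => scalar::$op($a),
            Backend::Simd => simd::$op($a),
        }
    };
}
```

The backend is a runtime value, so a caller can pick it once (for example, after CPU feature detection) and every dispatched operation follows that choice. The scalar arm doubles as a reference to check the fast path against.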

Structs

AddOp: Element-wise add operation.
AssertionResult: Result of a single assertion check.
AttentionOp: Scaled dot-product attention operation.
BlockQ5K: Q5_K block format (5-bit with super-blocks).
BlockQ6K: Q6_K block format (6-bit with super-blocks).
BottleneckPrediction: Bottleneck prediction result
BrickIdTimer: Timer handle returned by start_brick() (PAR-200 fast path).
BrickLayer: A layer of compute bricks that execute sequentially. Throughput ceiling = min(component throughputs).
BrickProfiler: Per-brick profiler using pure Rust timing.
BrickSample: Individual brick timing sample. Pure Rust timing using std::time::Instant.
BrickStats: Accumulated per-brick statistics.
BrickTimer: Timer handle returned by start() (legacy string-based API).
BrickTuner: ML-based ComputeBrick tuner ensemble.
BrickVerification: Verification result from ComputeBrick.
ByteBudget: Performance budget for byte-oriented operations (compression, I/O). Use this for trueno-zram, disk I/O, network throughput, etc.
CategoryStats: Aggregated statistics for a brick category.
ComputeBrick: Self-verifying, token-centric compute unit. Bundles: operation + assertions + budget + verification
ConceptDriftStatus: Concept drift detection result
CpuCapability: CPU capabilities
DivergenceInfo: Information about a detected divergence between CPU and GPU.
DotOp: Dot product operation.
DotQ5KOp: Q5_K dot product operation.
DotQ6KOp: Q6_K dot product operation.
ExecutionEdge: An edge in the execution graph.
ExecutionGraph: Execution path graph for tracking brick → kernel → PTX relationships.
ExecutionNodeId: Node ID in the execution graph.
FeatureExtractor: Extracts features from BrickProfiler and runtime configuration.
FusedGateUpOp: Fused Gate+Up FFN projection with SiLU activation.
FusedGateUpWeights: Weights for fused gate+up FFN projection
FusedQKVOp: Fused Q/K/V projection operation for transformer attention.
FusedQKVWeights: Weights for fused QKV projection
GpuCapability: GPU capabilities
GpuClockMetrics: GPU clock metrics
GpuDeviceInfo: GPU device information (TRUENO-SPEC-010)
GpuMemoryMetrics: GPU memory metrics
GpuMetrics: Complete GPU metrics snapshot
GpuMonitor: GPU Monitor for real-time metrics collection (TRUENO-SPEC-010)
GpuPcieMetrics: GPU PCIe metrics
GpuPowerMetrics: GPU power metrics
GpuThermalMetrics: GPU thermal metrics
GpuUtilization: GPU utilization metrics
HardwareCapability: Complete hardware capability profile
KernelChecksum: Kernel checksum for divergence detection.
KernelClassifier: Kernel classifier using simple rule-based logic.
KernelRecommendation: Kernel recommendation result
MatmulOp: Matrix multiplication operation.
Matrix: A 2D matrix with row-major storage
MonitorConfig: Configuration for GPU monitoring
PtxRegistry: PTX kernel registry for execution graph correlation.
RooflineParams: Roofline model parameters
RunConfig: Runtime configuration for feature extraction
SoftmaxOp: Softmax operation.
SymmetricEigen: Symmetric matrix eigendecomposition
TcbGeometry: Dimensions for a Tiling Compute Block
TcbIndexCalculator: Index calculator for hierarchical tiling
ThroughputPrediction: Throughput prediction result
ThroughputRegressor: Simple linear regression model for throughput prediction.
TileStats: Tile-level profiling statistics.
TileTimer: Timer handle for tile-level profiling.
TiledQ4KMatvec: Tiled Q4_K MatVec executor
TilingConfig: Complete tiling configuration for a kernel
TilingStats: Statistics for a tiled operation
TokenBudget: Performance budget expressed in token terms. Aligns compute costs with LLM inference metrics.
TokenResult: Result of ComputeBrick execution with token metrics.
TrainingSample: Training sample for the tuner
TrainingStats: Training statistics summary
TunerDataCollector: Training data collector with online learning support (T-TUNER-005, GitHub #82)
TunerFeatures: Feature vector for ML-based kernel tuning.
TunerRecommendation: Combined tuner recommendation
Vector: High-performance vector with multi-backend support
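KernelChecksum and DivergenceInfo, together with fnv1a_f32_checksum, suggest a two-step check for CPU/GPU divergence:

1. Compare cheap checksums first.
2. If they differ, scan element-wise to locate the mismatch.

The sketch below is a hypothetical version of that idea. Hashing the little-endian bytes of each f32 with 64-bit FNV-1a is an assumption; trueno may feed data to the hash differently. The 64-element prefix mirrors the fnv1a_f32_checksum description.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;
const PREFIX_LEN: usize = 64;

/// Hypothetical: 64-bit FNV-1a over the little-endian bytes of the first
/// PREFIX_LEN floats. Bit-exact, so a 1-ulp difference changes the input.
fn fnv1a_prefix_checksum(data: &[f32]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for x in data.iter().take(PREFIX_LEN) {
        for byte in x.to_bits().to_le_bytes() {
            hash ^= u64::from(byte); // FNV-1a: XOR the byte in first...
            hash = hash.wrapping_mul(FNV_PRIME); // ...then multiply
        }
    }
    hash
}

/// Index of the first element whose CPU and GPU values differ by more than
/// `tol`. The comparison is negated so a NaN on either side counts as divergence.
fn first_divergence(cpu: &[f32], gpu: &[f32], tol: f32) -> Option<usize> {
    cpu.iter().zip(gpu).position(|(c, g)| !((c - g).abs() <= tol))
}
```

Only a prefix is hashed, so a divergence past element 64 leaves the checksum unchanged. The element-wise scan is still needed to catch and locate those cases.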

Enums

Backend: Backend execution target
Bottleneck: Workload bottleneck classification
BottleneckClass: Bottleneck classification for ML model.
BrickBottleneck: Bottleneck classification for roofline analysis (PMAT-451)
BrickCategory: Category for hierarchical aggregation of brick statistics.
BrickError: Errors from ComputeBrick execution. Tells you exactly what failed (Jidoka: stop and signal).
BrickId: Well-known brick types for O(1) lookup on hot path.
ComputeAssertion: Type of assertion for compute verification.
ComputeBackend: Execution backend for compute operations. This is the brick-specific backend enum with additional GPU backends.
EdgeType: Edge types in execution graph.
ExecutionNode: Execution graph node types.
ExperimentSuggestion: Suggested experiment to improve performance
GpuBackend: GPU compute backend
GpuVendor: GPU vendor identifier based on PCI vendor ID
HardwareGpuBackend: GPU compute backend
KernelType: Kernel type for feature encoding.
MonitorError: Errors from GPU monitoring operations
OpComplexity: Operation complexity for GPU dispatch eligibility
OperationType: Operation type for SIMD backend selection
PackingLayout: Memory layout for packed matrices
PrefetchLocality: Prefetch locality hint
QuantType: Quantization type for feature encoding.
SimdWidth: SIMD instruction set width
SyncMode: Synchronization mode for GPU profiling.
TcbLevel: Tiling hierarchy level
TileLevel: Tile hierarchy level for profiling.
TilingBackend: Backend target for tiling configuration
TilingError: Tiling configuration errors
TruenoError: Errors that can occur during Trueno operations
TunerError: Tuner error type
UserFeedback: User feedback on a recommendation
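BrickBottleneck and RooflineParams refer to the roofline model. In that model:

- Attainable throughput is min(peak compute, arithmetic intensity × memory bandwidth).
- A kernel is memory-bound when its intensity falls below the ridge point, peak compute / bandwidth.

A small sketch follows. The hardware numbers are made up, and the struct and enum are hypothetical stand-ins, not trueno's types.

```rust
/// Hypothetical stand-in for a bottleneck class.
#[derive(Debug, PartialEq)]
enum Bound {
    Memory,
    Compute,
}

/// Hypothetical stand-in for roofline parameters.
struct Roofline {
    peak_gflops: f64, // peak compute, GFLOP/s
    peak_gbps: f64,   // peak memory bandwidth, GB/s
}

impl Roofline {
    /// Arithmetic intensity (FLOP/byte) at which the two roofs meet.
    fn ridge_point(&self) -> f64 {
        self.peak_gflops / self.peak_gbps
    }

    /// Best achievable GFLOP/s for a kernel with the given intensity.
    fn attainable_gflops(&self, intensity: f64) -> f64 {
        (intensity * self.peak_gbps).min(self.peak_gflops)
    }

    /// Memory-bound below the ridge point, compute-bound at or above it.
    fn classify(&self, flops: f64, bytes: f64) -> Bound {
        if flops / bytes < self.ridge_point() {
            Bound::Memory
        } else {
            Bound::Compute
        }
    }
}
```

With 1000 GFLOP/s and 100 GB/s the ridge point is 10 FLOP/byte. An f32 dot product does 2n FLOPs over 8n bytes, an intensity of 0.25, so it is memory-bound at about 25 GFLOP/s. A 1024×1024 f32 matmul that reads each operand once has an intensity of about 171 and is compute-bound.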

Constants

Q4K_SUPERBLOCK_BYTES
Q4K_SUPERBLOCK_SIZE: Q4_K superblock constants (per GGML specification)
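The Q4_K constants are described as following the GGML specification. The sketch below reproduces the byte arithmetic of GGML's block_q4_K superblock, which holds:

- 256 weights;
- two f16 scale values (d and dmin);
- 12 bytes of packed 6-bit sub-block scales and mins;
- 128 bytes of 4-bit quants.

The constant names are hypothetical. The actual values of Q4K_SUPERBLOCK_BYTES and Q4K_SUPERBLOCK_SIZE should be checked against the crate rather than assumed from this sketch.

```rust
// Assumed to mirror GGML's block_q4_K layout; names are hypothetical.
const QK_K: usize = 256; // weights per superblock
const K_SCALE_SIZE: usize = 12; // 8 sub-blocks x (6-bit scale + 6-bit min) = 96 bits
const F16_BYTES: usize = 2; // d and dmin are stored as f16
const Q4K_BLOCK_BYTES: usize = 2 * F16_BYTES + K_SCALE_SIZE + QK_K / 2; // quants: two per byte

/// Effective storage cost per weight, including scale overhead.
fn q4k_bits_per_weight() -> f64 {
    (Q4K_BLOCK_BYTES * 8) as f64 / QK_K as f64
}

/// Bytes needed to store `n_weights` in Q4_K (a partial superblock still costs a full one).
fn q4k_storage_bytes(n_weights: usize) -> usize {
    n_weights.div_ceil(QK_K) * Q4K_BLOCK_BYTES
}
```

Under this layout a 4096×4096 weight matrix occupies 65,536 superblocks of 144 bytes, or 9,437,184 bytes. That is 4.5 bits per weight rather than a flat 4.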

Traits

ComputeOp: Trait for compute operations that can be wrapped in a ComputeBrick.
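ComputeBrick is described as bundling an operation with assertions, a budget and verification. It also stops and signals on failure rather than passing bad data on. The sketch below illustrates that pattern only. The Op trait, Scale op, Brick struct and BrickFailure enum are all hypothetical and do not reproduce the signatures of trueno's ComputeOp, ComputeBrick or BrickError.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an operation a brick can wrap.
trait Op {
    fn run(&self, input: &[f32]) -> Vec<f32>;
}

/// Example op: multiply every element by a constant.
struct Scale(f32);

impl Op for Scale {
    fn run(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * self.0).collect()
    }
}

/// Why a brick refused to hand back its result.
#[derive(Debug)]
enum BrickFailure {
    Assertion(&'static str),
    OverBudget { elapsed: Duration, budget: Duration },
}

/// Bundles an op with post-conditions and a latency budget.
struct Brick<O: Op> {
    op: O,
    budget: Duration,
}

impl<O: Op> Brick<O> {
    fn execute(&self, input: &[f32]) -> Result<Vec<f32>, BrickFailure> {
        let start = Instant::now();
        let out = self.op.run(input);
        let elapsed = start.elapsed();
        // Assertions: element-wise op must preserve length and stay finite.
        if out.len() != input.len() {
            return Err(BrickFailure::Assertion("output length differs from input"));
        }
        if !out.iter().all(|x| x.is_finite()) {
            return Err(BrickFailure::Assertion("output contains NaN or infinity"));
        }
        // Budget: a correct but too-slow result is also reported as a failure.
        if elapsed > self.budget {
            return Err(BrickFailure::OverBudget { elapsed, budget: self.budget });
        }
        Ok(out)
    }
}
```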

Functions

cuda_monitor_available: Check whether CUDA monitoring is available (a stub when the feature is disabled)
default_hardware_path: Default hardware.toml path
f16_to_f32: f16 → f32 conversion (IEEE 754 half-precision).
f32_to_f16: f32 → f16 conversion (IEEE 754 half-precision).
fnv1a_f32_checksum: FNV-1a hash of an f32 slice (only the first 64 elements are hashed, for efficiency).
gelu_scalar: GELU (Gaussian Error Linear Unit) activation.
hash_bytes: Hash raw bytes to u64.
hash_key: Hash a single key to u64.
hash_keys_batch: Hash multiple keys in batch (SIMD-optimized).
hash_keys_batch_with_backend: Hash multiple keys with explicit backend selection.
optimal_prefetch_distance: Calculate optimal prefetch distance based on tile geometry and cache level
pack_a_index: Calculate packed index for panel-major A layout
pack_b_index: Calculate packed index for panel-major B layout
relu_scalar: ReLU (Rectified Linear Unit) activation.
select_backend_for_operation: Select the optimal backend for a specific operation type
select_best_available_backend: Select the best available backend for the current platform
sigmoid_scalar: Sigmoid activation: σ(x) = 1 / (1 + exp(-x)).
silu_scalar: SiLU (Sigmoid Linear Unit) / Swish activation: x * σ(x).
swizzle_index: Apply XOR swizzling for shared memory bank conflict avoidance
tanh_scalar: Tanh activation.

Type Aliases

Result: Result type for Trueno operations
