Crate ruvector_sparse_inference

§Sparse Inference Engine for RuVector

A PowerInfer-style engine that exploits activation locality for efficient neural network inference on edge devices.

This crate provides sparse inference for large language models using adaptive neuron prediction and low-bit quantization.

§Key Features

  • Activation Locality: Exploits power-law distribution of neuron activations
  • Low-Rank Prediction: Fast neuron selection via P·Q matrix factorization (sketched after this list)
  • Sparse FFN: Only compute active neurons, skip cold ones
  • SIMD Optimization: AVX2, SSE4.1, NEON, and WASM SIMD support
  • GGUF Support: Full compatibility with quantized Llama models
  • Hot/Cold Caching: Intelligent neuron weight management
  • π Integration: Structural constants for calibration, drift detection, and chaos seeding
  • Precision Lanes: 3/5/7-bit layered quantization with graduation policies
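
The prediction step is cheap by construction: rather than evaluating the full FFN to discover which neurons fire, a low-rank factorization scores them first. The following is a minimal sketch of that idea, assuming a plain Vec-based weight layout; the function name `predict_active` is illustrative, not this crate's `LowRankPredictor` API.

// Score each FFN neuron with x · P · Q and keep those above `threshold`.
// P is d_model x r and Q is r x d_ff with r much smaller than d_ff, so the
// scoring pass is far cheaper than computing the full FFN.
fn predict_active(
    x: &[f32],          // input activations, length d_model
    p: &[Vec<f32>],     // P: d_model rows, each of length r
    q: &[Vec<f32>],     // Q: r rows, each of length d_ff
    threshold: f32,
) -> Vec<usize> {
    let r = q.len();
    let d_ff = q[0].len();

    // h = x · P (length r)
    let mut h = vec![0.0f32; r];
    for (xi, p_row) in x.iter().zip(p) {
        for (hk, pk) in h.iter_mut().zip(p_row) {
            *hk += xi * pk;
        }
    }

    // score_j = h · Q[:, j]; keep only the predicted-hot neurons
    (0..d_ff)
        .filter(|&j| h.iter().zip(q).map(|(hk, q_row)| hk * q_row[j]).sum::<f32>() > threshold)
        .collect()
}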

§Performance Targets

  • LFM2 350M: ~5-10ms per sentence (2.5x speedup)
  • Llama 7B: 50-100ms per token (5-10x speedup)
  • Memory: 1.5-2x reduction via weight offloading and quantized weight storage (see the quantizer sketch after this list)
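
Much of the memory saving comes from storing weights in low-bit form. As a rough illustration of what a 5-bit precision lane could look like, here is a minimal symmetric per-block quantizer; the signed range [-15, 15], the per-block f32 scale, and storing one i8 per code (rather than bit-packing) are assumptions made for clarity, not the actual `Quantizer5Bit` implementation.

// Minimal symmetric 5-bit quantizer sketch (assumed conventions; not the
// crate's `Quantizer5Bit`). Signed 5-bit codes span [-15, 15] and each
// block shares one f32 scale. A real lane would bit-pack the codes instead
// of spending a full i8 per value.
fn quantize_5bit(block: &[f32]) -> (f32, Vec<i8>) {
    let max_abs = block.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs > 0.0 { max_abs / 15.0 } else { 1.0 };
    let codes = block
        .iter()
        .map(|v| (v / scale).round().clamp(-15.0, 15.0) as i8)
        .collect();
    (scale, codes)
}

fn dequantize_5bit(scale: f32, codes: &[i8]) -> Vec<f32> {
    codes.iter().map(|&c| f32::from(c) * scale).collect()
}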

§π Integration

π is irrational, non-repeating, and structure-rich. This makes it ideal for:

  • Calibration: π-derived constants avoid power-of-2 resonance artifacts
  • Drift Detection: Quantization honesty signals using π transforms
  • Angular Embeddings: Hyperspherical projections with π phase encoding
  • Chaos Seeding: Deterministic pseudo-randomness without RNG state (sketched after this list)
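
The chaos-seeding idea rests on a classical fact: because π is irrational, the fractional parts of n·π form an equidistributed (Weyl) sequence in [0, 1). Below is a minimal sketch of that technique; the name `pi_jitter` is illustrative, and the crate's `PiChaos` and `DeterministicJitter` types may use a different construction.

// Deterministic jitter from the Weyl sequence n·π mod 1. The same index
// always yields the same value, so no RNG state is carried around; this is
// useful for reproducible dithering. Note that f64 precision degrades for
// very large n, which a real implementation would account for.
fn pi_jitter(n: u32) -> f32 {
    ((u64::from(n) + 1) as f64 * std::f64::consts::PI).fract() as f32
}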

§Example

use ruvector_sparse_inference::{PiContext, PrecisionLane, SparseInferenceEngine};

// Create sparse inference engine
// (the `?` operator assumes an enclosing function that returns `Result`)
let engine = SparseInferenceEngine::new_sparse(512, 2048, 0.1)?;

// Use π context for calibration
let pi_ctx = PiContext::new(PrecisionLane::Bit5);
let input_value = 0.25f32;
let calibrated = pi_ctx.calibrate(input_value);

// Run inference
let input = vec![0.1f32; 512];
let output = engine.infer(&input)?;

Re-exports§

pub use config::SparsityConfig;
pub use config::ActivationType;
pub use config::CacheConfig;
pub use config::ModelConfig;
pub use config::CacheStrategy;
pub use error::SparseInferenceError;
pub use error::Result;
pub use predictor::Predictor;
pub use predictor::LowRankPredictor;
pub use sparse::SparseFfn;
pub use sparse::FeedForward;
pub use memory::QuantizedWeights;
pub use memory::NeuronCache;
pub use model::GgufParser;
pub use model::ModelInput;
pub use model::ModelOutput;
pub use model::InferenceConfig;
pub use model::ModelRunner;
pub use model::LlamaModel;
pub use model::ModelMetadata;
pub use integration::SparseEmbeddingProvider;
pub use integration::SparseInferenceBackend;
pub use precision::PrecisionLane;
pub use precision::LaneConfig;
pub use precision::GraduationPolicy;
pub use precision::GraduationDecision;
pub use precision::Quantizer3Bit;
pub use precision::Quantizer5Bit;
pub use precision::Quantizer7Bit;
pub use precision::LaneTelemetry;
pub use pi::PiContext;
pub use pi::PiCalibration;
pub use pi::DriftDetector;
pub use pi::DriftReport;
pub use pi::QuantizationHonesty;
pub use pi::AngularEmbedding;
pub use pi::PhaseEncoder;
pub use pi::HypersphericalProjection;
pub use pi::PiChaos;
pub use pi::DeterministicJitter;
pub use pi::PiScheduler;
pub use pi::PI_SCALE_3BIT;
pub use pi::PI_SCALE_5BIT;
pub use pi::PI_SCALE_7BIT;

Modules§

backend
Backend abstraction for hardware-specific optimizations
config
Configuration structures for sparse inference.
error
Error types for the sparse inference engine.
integration
Integration modules for Ruvector and RuvLLM ecosystems
memory
Memory management for sparse inference.
model
Model loading and inference infrastructure
ops
Basic neural network operations
pi
π (Pi) Integration Module - Structural Constants for Low-Precision Systems
precision
Precision Lanes Module - Layered Quantization for Sparse Inference
predictor
Activation predictor module.
sparse
Sparse computation module.

Structs§

SparseInferenceEngine
Sparse inference engine that coordinates prediction and computation (see the sketch below)
SparsityStats
Statistics about sparsity during inference
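
To make "coordinates prediction and computation" concrete, here is an illustrative sketch of the sparse FFN step the engine drives: only the weight rows of the predictor-selected neurons are touched. The dense Vec weight layout and the ReLU activation are assumptions; the real `SparseFfn` works over quantized, cached weights and the configured `ActivationType`.

// Illustrative sparse FFN pass (assumed layout; not the crate's `SparseFfn`).
// Only the neurons chosen by the predictor are computed; cold neurons are
// skipped entirely, which is where the speedup comes from.
fn sparse_ffn(
    x: &[f32],            // input, length d_model
    active: &[usize],     // hot neuron indices from the predictor
    w_up: &[Vec<f32>],    // up projection: d_ff rows of length d_model
    w_down: &[Vec<f32>],  // down projection, one row per neuron: d_ff rows of length d_model
    out: &mut [f32],      // output, length d_model
) {
    out.fill(0.0);
    for &j in active {
        // pre-activation for neuron j: dot(x, w_up[j])
        let a: f32 = x.iter().zip(&w_up[j]).map(|(xi, wi)| xi * wi).sum();
        let h = a.max(0.0); // ReLU assumed; the configured ActivationType may differ
        // accumulate the neuron's contribution into the output
        for (o, wd) in out.iter_mut().zip(&w_down[j]) {
            *o += h * wd;
        }
    }
}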