Crate ruvector_sparse_inference

§Sparse Inference Engine for RuVector

A PowerInfer-style engine that exploits activation locality for efficient neural network inference on edge devices.

This crate provides sparse inference for large language models using adaptive neuron prediction and low-bit quantization.

§Key Features

  • Activation Locality: Exploits power-law distribution of neuron activations
  • Low-Rank Prediction: Fast neuron selection via P·Q matrix factorization (sketched after this list)
  • Sparse FFN: Only compute active neurons, skip cold ones
  • SIMD Optimization: AVX2, SSE4.1, NEON, and WASM SIMD support
  • GGUF Support: Full compatibility with quantized Llama models
  • Hot/Cold Caching: Intelligent neuron weight management
  • π Integration: Structural constants for calibration, drift detection, and chaos seeding
  • Precision Lanes: 3/5/7-bit layered quantization with graduation policies
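
The prediction step is cheap by construction: rather than evaluating the full FFN to discover which neurons fire, a low-rank factorization scores them first. The following is a minimal sketch of that idea, assuming a plain Vec-based weight layout; the function name `predict_active` is illustrative, not this crate's `LowRankPredictor` API.

// Score each FFN neuron with x · P · Q and keep those above `threshold`.
// P is d_model x r and Q is r x d_ff with r much smaller than d_ff, so the
// scoring pass is far cheaper than computing the full FFN.
fn predict_active(
    x: &[f32],          // input activations, length d_model
    p: &[Vec<f32>],     // P: d_model rows, each of length r
    q: &[Vec<f32>],     // Q: r rows, each of length d_ff
    threshold: f32,
) -> Vec<usize> {
    let r = q.len();
    let d_ff = q[0].len();

    // h = x · P (length r)
    let mut h = vec![0.0f32; r];
    for (xi, p_row) in x.iter().zip(p) {
        for (hk, pk) in h.iter_mut().zip(p_row) {
            *hk += xi * pk;
        }
    }

    // score_j = h · Q[:, j]; keep only the predicted-hot neurons
    (0..d_ff)
        .filter(|&j| h.iter().zip(q).map(|(hk, q_row)| hk * q_row[j]).sum::<f32>() > threshold)
        .collect()
}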

§Performance Targets

  • LFM2 350M: ~5-10ms per sentence (2.5x speedup)
  • Llama 7B: 50-100ms per token (5-10x speedup)
  • Memory: 1.5-2x reduction via weight offloading and quantized weight storage (see the quantizer sketch after this list)
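
Much of the memory saving comes from storing weights in low-bit form. As a rough illustration of what a 5-bit precision lane could look like, here is a minimal symmetric per-block quantizer; the signed range [-15, 15], the per-block f32 scale, and storing one i8 per code (rather than bit-packing) are assumptions made for clarity, not the actual `Quantizer5Bit` implementation.

// Minimal symmetric 5-bit quantizer sketch (assumed conventions; not the
// crate's `Quantizer5Bit`). Signed 5-bit codes span [-15, 15] and each
// block shares one f32 scale. A real lane would bit-pack the codes instead
// of spending a full i8 per value.
fn quantize_5bit(block: &[f32]) -> (f32, Vec<i8>) {
    let max_abs = block.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs > 0.0 { max_abs / 15.0 } else { 1.0 };
    let codes = block
        .iter()
        .map(|v| (v / scale).round().clamp(-15.0, 15.0) as i8)
        .collect();
    (scale, codes)
}

fn dequantize_5bit(scale: f32, codes: &[i8]) -> Vec<f32> {
    codes.iter().map(|&c| f32::from(c) * scale).collect()
}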

§π Integration

π is irrational, non-repeating, and structure-rich. This makes it ideal for:

  • Calibration: π-derived constants avoid power-of-2 resonance artifacts
  • Drift Detection: Quantization honesty signals using π transforms
  • Angular Embeddings: Hyperspherical projections with π phase encoding
  • Chaos Seeding: Deterministic pseudo-randomness without RNG state (sketched after this list)
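
The chaos-seeding idea rests on a classical fact: because π is irrational, the fractional parts of n·π form an equidistributed (Weyl) sequence in [0, 1). Below is a minimal sketch of that technique; the name `pi_jitter` is illustrative, and the crate's `PiChaos` and `DeterministicJitter` types may use a different construction.

// Deterministic jitter from the Weyl sequence n·π mod 1. The same index
// always yields the same value, so no RNG state is carried around; this is
// useful for reproducible dithering. Note that f64 precision degrades for
// very large n, which a real implementation would account for.
fn pi_jitter(n: u32) -> f32 {
    ((u64::from(n) + 1) as f64 * std::f64::consts::PI).fract() as f32
}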

§Example

use ruvector_sparse_inference::{PiContext, PrecisionLane, SparseInferenceEngine};

// Create sparse inference engine
// (the `?` operator assumes an enclosing function that returns `Result`)
let engine = SparseInferenceEngine::new_sparse(512, 2048, 0.1)?;

// Use π context for calibration
let pi_ctx = PiContext::new(PrecisionLane::Bit5);
let input_value = 0.25f32;
let calibrated = pi_ctx.calibrate(input_value);

// Run inference
let input = vec![0.1f32; 512];
let output = engine.infer(&input)?;

Re-exports§

pub use config::SparsityConfig;
pub use config::ActivationType;
pub use config::CacheConfig;
pub use config::ModelConfig;
pub use config::CacheStrategy;
pub use error::SparseInferenceError;
pub use error::Result;
pub use predictor::Predictor;
pub use predictor::LowRankPredictor;
pub use sparse::SparseFfn;
pub use sparse::FeedForward;
pub use memory::QuantizedWeights;
pub use memory::NeuronCache;
pub use model::GgufParser;
pub use model::ModelInput;
pub use model::ModelOutput;
pub use model::InferenceConfig;
pub use model::ModelRunner;
pub use model::LlamaModel;
pub use model::ModelMetadata;
pub use integration::SparseEmbeddingProvider;
pub use integration::SparseInferenceBackend;
pub use precision::PrecisionLane;
pub use precision::LaneConfig;
pub use precision::GraduationPolicy;
pub use precision::GraduationDecision;
pub use precision::Quantizer3Bit;
pub use precision::Quantizer5Bit;
pub use precision::Quantizer7Bit;
pub use precision::LaneTelemetry;
pub use pi::PiContext;
pub use pi::PiCalibration;
pub use pi::DriftDetector;
pub use pi::DriftReport;
pub use pi::QuantizationHonesty;
pub use pi::AngularEmbedding;
pub use pi::PhaseEncoder;
pub use pi::HypersphericalProjection;
pub use pi::PiChaos;
pub use pi::DeterministicJitter;
pub use pi::PiScheduler;
pub use pi::PI_SCALE_3BIT;
pub use pi::PI_SCALE_5BIT;
pub use pi::PI_SCALE_7BIT;

Modules§

backend
Backend abstraction for hardware-specific optimizations
config
Configuration structures for sparse inference.
error
Error types for the sparse inference engine.
integration
Integration modules for Ruvector and RuvLLM ecosystems
memory
Memory management for sparse inference.
model
Model loading and inference infrastructure
ops
Basic neural network operations
pi
π (Pi) Integration Module - Structural Constants for Low-Precision Systems
precision
Precision Lanes Module - Layered Quantization for Sparse Inference
predictor
Activation predictor module.
sparse
Sparse computation module.

Structs§

SparseInferenceEngine
Sparse inference engine that coordinates prediction and computation (see the sketch below)
SparsityStats
Statistics about sparsity during inference
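
To make "coordinates prediction and computation" concrete, here is an illustrative sketch of the sparse FFN step the engine drives: only the weight rows of the predictor-selected neurons are touched. The dense Vec weight layout and the ReLU activation are assumptions; the real `SparseFfn` works over quantized, cached weights and the configured `ActivationType`.

// Illustrative sparse FFN pass (assumed layout; not the crate's `SparseFfn`).
// Only the neurons chosen by the predictor are computed; cold neurons are
// skipped entirely, which is where the speedup comes from.
fn sparse_ffn(
    x: &[f32],            // input, length d_model
    active: &[usize],     // hot neuron indices from the predictor
    w_up: &[Vec<f32>],    // up projection: d_ff rows of length d_model
    w_down: &[Vec<f32>],  // down projection, one row per neuron: d_ff rows of length d_model
    out: &mut [f32],      // output, length d_model
) {
    out.fill(0.0);
    for &j in active {
        // pre-activation for neuron j: dot(x, w_up[j])
        let a: f32 = x.iter().zip(&w_up[j]).map(|(xi, wi)| xi * wi).sum();
        let h = a.max(0.0); // ReLU assumed; the configured ActivationType may differ
        // accumulate the neuron's contribution into the output
        for (o, wd) in out.iter_mut().zip(&w_down[j]) {
            *o += h * wd;
        }
    }
}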