ruvector-solver 2.0.3

Sublinear-time solver for RuVector: O(log n) to O(√n) algorithms for sparse linear systems, PageRank, and spectral methods
Documentation

ruvector-solver

Crates.io docs.rs License: MIT Tests

Sublinear-time sparse solvers for RuVector — O(log n) to O(sqrt(n)) algorithms that power graph analytics, spectral methods, and AI coherence.

Dense Solvers (nalgebra) ruvector-solver
Complexity O(n^3) O(nnz * log n) to O(log n)
Memory O(n^2) dense O(nnz) sparse CSR
SIMD Partial AVX2 8-wide + fused kernels
Algorithms LU, QR 7 specialized + auto router
WASM No Full wasm-bindgen bindings
PageRank Not supported 3 sublinear algorithms

All solvers operate on a shared CSR (Compressed Sparse Row) matrix representation and expose a uniform SolverEngine trait for seamless algorithm swapping and automatic routing.

Algorithms

Algorithm Module Complexity Applicable to
Jacobi-preconditioned Neumann series neumann O(nnz * log(1/eps)) Diagonally dominant Ax = b
Conjugate Gradient (Hestenes-Stiefel) cg O(nnz * sqrt(kappa)) Symmetric positive-definite Ax = b
Forward Push (Andersen-Chung-Lang) forward_push O(1/epsilon) Single-source Personalized PageRank
Backward Push backward_push O(1/epsilon) Reverse relevance / target-centric PPR
Hybrid Random Walk random_walk O(sqrt(n)/epsilon) Large-graph PPR with push initialisation
TRUE (JL + sparsification + Neumann) true_solver O(nnz * log n) Batch linear systems with shared A
BMSSP Multigrid (V-cycle + Jacobi) bmssp O(n log n) Ill-conditioned / graph Laplacian systems

Quick Start

Add the crate to your Cargo.toml:

[dependencies]
ruvector-solver = "0.1"

Solve a diagonally dominant 3x3 system with the Neumann solver:

use ruvector_solver::types::CsrMatrix;
use ruvector_solver::neumann::NeumannSolver;

// Build a diagonally dominant 3x3 matrix in COO format
//     [ 2.0  -0.5   0.0]
// A = [-0.5   2.0  -0.5]
//     [ 0.0  -0.5   2.0]
let a = CsrMatrix::<f32>::from_coo(3, 3, vec![
    (0, 0, 2.0_f32), (0, 1, -0.5),
    (1, 0, -0.5),    (1, 1, 2.0),  (1, 2, -0.5),
    (2, 1, -0.5),    (2, 2, 2.0),
]);
let b = vec![1.0_f32, 0.0, 1.0];

let solver = NeumannSolver::new(1e-6, 500);
let result = solver.solve(&a, &b).unwrap();

println!("solution: {:?}", result.solution);
println!("iterations: {}", result.iterations);
println!("residual:   {:.2e}", result.residual_norm);
assert!(result.residual_norm < 1e-4);

Use the SolverEngine trait for algorithm-agnostic code:

use ruvector_solver::types::{ComputeBudget, CsrMatrix};
use ruvector_solver::traits::SolverEngine;
use ruvector_solver::neumann::NeumannSolver;

let engine: Box<dyn SolverEngine> = Box::new(NeumannSolver::new(1e-6, 500));
let a = CsrMatrix::<f64>::from_coo(3, 3, vec![
    (0, 0, 2.0), (0, 1, -0.5),
    (1, 0, -0.5), (1, 1, 2.0), (1, 2, -0.5),
    (2, 1, -0.5), (2, 2, 2.0),
]);
let b = vec![1.0_f64, 0.0, 1.0];
let budget = ComputeBudget::default();

let result = engine.solve(&a, &b, &budget).unwrap();
assert!(result.residual_norm < 1e-4);

Feature Flags

Feature Default Description
neumann Yes Jacobi-preconditioned Neumann series solver
cg Yes Conjugate Gradient (Hestenes-Stiefel) solver
forward-push Yes Forward push for single-source PPR
backward-push No Backward push for reverse relevance computation
hybrid-random-walk No Hybrid random walk with push initialisation (enables getrandom)
true-solver No TRUE batch solver (implies neumann)
bmssp No BMSSP multigrid solver (V-cycle with Jacobi smoothing)
all-algorithms No Enables every algorithm above
simd No AVX2 SIMD-accelerated SpMV (x86_64 only)
wasm No WebAssembly target support
parallel No Multi-threaded SpMV and solver loops (enables rayon, crossbeam)
full No All algorithms + parallel + nalgebra-backend

Enable all algorithms:

[dependencies]
ruvector-solver = { version = "0.1", features = ["all-algorithms"] }

Performance Optimisations

Bounds-check-free SpMV (spmv_unchecked)

The inner SpMV loop is the single hottest path in every iterative solver. The spmv_unchecked method on CsrMatrix<f32> and CsrMatrix<f64> uses raw pointers to eliminate per-element bounds checks, relying on a one-time CSR structural validation (validation::validate_csr_matrix) performed before entering the solve loop.

Fused residual + norm computation (fused_residual_norm_sq)

Standard implementations compute the residual r = b - Ax and its squared norm ||r||^2 in separate passes (SpMV, vector subtraction, dot product -- three full memory traversals). fused_residual_norm_sq computes both in a single pass, reducing memory traffic by roughly 3x per iteration.

AVX2 8-wide SIMD SpMV

When the simd feature is enabled on x86_64, spmv_simd dispatches to an AVX2 kernel that processes 8 f32 values per instruction using _mm256 intrinsics with a horizontal sum reduction at the end of each row. Falls back to a portable scalar loop on other architectures.

4-wide unrolled Jacobi update

The Neumann iteration's update step x[j] += d_inv[j] * r[j] is manually unrolled 4-wide for instruction-level parallelism, with a scalar remainder loop for dimensions not divisible by 4.

Arena allocator

SolverArena provides bump allocation for per-solve scratch buffers. All temporary vectors are allocated from a single contiguous backing buffer and reclaimed in O(1) via arena.reset(), eliminating heap allocation overhead inside the iteration loop.

Architecture

                          +-------------------+
                          |   SolverRouter    |
                          | (algorithm select)|
                          +--------+----------+
                                   |
            +----------+-----------+-----------+----------+
            |          |           |           |          |
       +----v---+ +----v---+ +----v------+ +--v----+ +---v----+
       |Neumann | |   CG   | |ForwardPush| | TRUE  | | BMSSP  |
       |Solver  | | Solver | |  Solver   | |Solver | |Solver  |
       +----+---+ +----+---+ +-----+-----+ +--+----+ +---+----+
            |          |            |          |          |
            +-----+----+-----+-----+-----+----+-----+---+
                  |          |           |           |
             +----v---+ +---v----+ +----v----+ +----v-----+
             |types.rs| |simd.rs | |arena.rs | |budget.rs |
             |CsrMatrix| |AVX2   | |Bump     | |ComputeBudget|
             +--------+ |SpMV   | |Alloc    | |enforcement|
                         +-------+ +--------+ +----------+
                  |          |           |
             +----v---+ +---v------+ +--v---------+
             |traits.rs| |validate.rs| |error.rs    |
             |SolverEngine| |CSR check| |SolverError |
             +--------+ +---------+ +-----------+

The SolverRouter analyses the matrix SparsityProfile and QueryType to select the optimal algorithm. When the selected algorithm fails, SolverOrchestrator::solve_with_fallback tries a deterministic fallback chain: selected -> CG -> Dense.

API Overview

Core types (types.rs)

Type Description
CsrMatrix<T> Compressed Sparse Row matrix with spmv, spmv_unchecked, from_coo, transpose
SolverResult Solution vector, iteration count, residual norm, wall time, convergence history
ComputeBudget Maximum time, max iterations, target tolerance
Algorithm Enum of all solver algorithms (Neumann, CG, ForwardPush, ...)
SparsityProfile Matrix structural analysis (density, diagonal dominance, spectral radius estimate)
QueryType What the caller wants to solve (LinearSystem, PageRankSingle, Batch, ...)
ComplexityEstimate Predicted flops, iterations, memory, and complexity class

Traits (traits.rs)

Trait Description
SolverEngine Core trait: solve(matrix, rhs, budget) -> SolverResult
SparseLaplacianSolver Extension for graph Laplacian systems and effective resistance
SublinearPageRank Extension for sublinear PPR: ppr(matrix, source, alpha, epsilon)

Error hierarchy (error.rs)

Error Cause
SolverError::NonConvergence Iteration budget exhausted without reaching tolerance
SolverError::NumericalInstability NaN/Inf or residual growth > 2x detected
SolverError::SpectralRadiusExceeded Spectral radius >= 1.0 (Neumann series would diverge)
SolverError::BudgetExhausted Wall-clock time limit exceeded
SolverError::InvalidInput Dimension mismatch, non-finite values, index out of bounds
SolverError::BackendError Backend-specific failure (nalgebra, BLAS)

Infrastructure modules

Module Description
router.rs SolverRouter for automatic algorithm selection; SolverOrchestrator with fallback
simd.rs AVX2-accelerated SpMV with runtime feature detection
validation.rs CSR structural validation (index bounds, monotonic row_ptr, NaN/Inf)
arena.rs SolverArena bump allocator for zero per-iteration heap allocation
budget.rs ComputeBudget enforcement during solve
audit.rs Audit logging for solver invocations
events.rs Event system for solver lifecycle hooks

Testing

The crate includes 177 tests (138 unit tests + 39 integration/doctests):

# Run all tests
cargo test -p ruvector-solver

# Run tests with all algorithms enabled
cargo test -p ruvector-solver --features all-algorithms

# Run a specific test module
cargo test -p ruvector-solver -- neumann::tests

Benchmarks

Five Criterion benchmark groups are provided:

# Run all benchmarks
cargo bench -p ruvector-solver

# Run a specific benchmark
cargo bench -p ruvector-solver --bench solver_neumann
Benchmark Description
solver_baseline Baseline SpMV and vector operations
solver_neumann Neumann solver convergence on tridiagonal systems
solver_cg Conjugate Gradient on SPD matrices
solver_push Forward/backward push on graph adjacency matrices
solver_e2e End-to-end solve through the router with algorithm selection

Step 1: Build a CSR matrix

use ruvector_solver::types::CsrMatrix;

// 4x4 tridiagonal system (diagonally dominant)
let a = CsrMatrix::<f32>::from_coo(4, 4, vec![
    (0, 0, 3.0), (0, 1, -1.0),
    (1, 0, -1.0), (1, 1, 3.0), (1, 2, -1.0),
    (2, 1, -1.0), (2, 2, 3.0), (2, 3, -1.0),
    (3, 2, -1.0), (3, 3, 3.0),
]);
let b = vec![2.0f32, 1.0, 1.0, 2.0];

Step 2: Choose a solver

use ruvector_solver::neumann::NeumannSolver;

let solver = NeumannSolver::new(1e-6, 500);
let result = solver.solve(&a, &b).unwrap();

println!("Solution:   {:?}", result.solution);
println!("Iterations: {}", result.iterations);
println!("Residual:   {:.2e}", result.residual_norm);

Step 3: Use the automatic router

use ruvector_solver::router::{SolverRouter, QueryType};
use ruvector_solver::types::{CsrMatrix, ComputeBudget};

let a64 = CsrMatrix::<f64>::from_coo(4, 4, vec![/* ... */]);
let b64 = vec![2.0, 1.0, 1.0, 2.0];
let budget = ComputeBudget::default();

let router = SolverRouter::new();
let (algo, result) = router.solve(&a64, &b64, &budget, QueryType::LinearSystem).unwrap();
println!("Router selected: {:?}", algo);

Step 4: Validate input

use ruvector_solver::validation::validate_csr_matrix;

let errors = validate_csr_matrix(&a);
assert!(errors.is_empty(), "CSR validation failed: {:?}", errors);

Step 5: Benchmark

cargo bench -p ruvector-solver --bench solver_neumann
cargo bench -p ruvector-solver --bench solver_e2e
use ruvector_solver::forward_push::ForwardPushSolver;
use ruvector_solver::types::CsrMatrix;

// Build adjacency matrix for a small graph
let adj = CsrMatrix::<f32>::from_coo(4, 4, vec![
    (0, 1, 1.0), (1, 0, 1.0),
    (1, 2, 1.0), (2, 1, 1.0),
    (2, 3, 1.0), (3, 2, 1.0),
    (0, 3, 1.0), (3, 0, 1.0),
]);

let solver = ForwardPushSolver::new(0.85, 1e-6);  // alpha=0.85
let ppr = solver.ppr(&adj, 0);  // PPR from node 0

println!("PPR scores: {:?}", ppr);

Related Crates

Crate Role
ruvector-attn-mincut Min-cut gating using graph solvers
ruvector-coherence Coherence metrics for attention comparison
ruvector-profiler Benchmarking memory, power, latency

Minimum Supported Rust Version

Rust 1.77 or later.

License

Licensed under the MIT License.