kryst
High-performance Krylov subspace and preconditioned iterative solvers for dense and sparse linear systems, with advanced preconditioning strategies and automated parameter optimization.
Features
Iterative Solvers
- Krylov Methods: CG, PCG, GMRES, FGMRES, BiCGStab, CGS, QMR, TFQMR, MINRES, CGNR
- Direct Methods: LU and QR factorization via PREONLY solver type
- Parallel Support: Shared-memory (Rayon) and distributed-memory (MPI) parallelism
Preconditioners
Basic Preconditioners
- Jacobi: Diagonal scaling preconditioner
- Block Jacobi: Block-wise diagonal preconditioning
- SOR/SSOR: Successive Over-Relaxation methods
- None: No preconditioning (identity)
Incomplete Factorizations
- ILU(0): Zero fill-in incomplete LU factorization
- ILU(k): Incomplete LU with k levels of fill-in
- ILUT: Threshold-based incomplete LU factorization
- ILUTP: ILUT with partial pivoting
- ILUP: Incomplete LU with partial pivoting
Advanced Preconditioners
- Chebyshev: Enhanced polynomial preconditioning with eigenvalue estimation
- AMG: Algebraic Multigrid with configurable smoothing parameters
- ASM: Additive Schwarz Method (domain decomposition)
- Approximate Inverse: SPAI-type approximate inverse preconditioners
Composite Preconditioning
- PC-Chaining: Sequential application of multiple preconditioners via the `pc_chain` option
- Enhanced Chebyshev: Matrix-aware polynomial preconditioning with automatic eigenvalue estimation
- Smoothed AMG: Configurable pre- and post-smoothing parameters (`amg_nu_pre`, `amg_nu_post`)
MPI support notes
- MPI-local (per-rank): AMG, Chebyshev, SOR/SSOR, ILU/ILUT/ILUP/ILUTP, Approximate Inverse, LU/QR (dense-direct). These operate on the local block of a distributed matrix.
- Distributed-capable: ASM over `DistCsrOp`, Block Jacobi on `DistCsrOp`, and `SuperLU_DIST` (when the `superlu_dist` feature is enabled).
- PC-Chaining works on MPI, but each preconditioner in the chain keeps its local vs distributed behavior.
Monitoring & Automation
- Iteration Monitoring: Real-time convergence tracking with `IterationMonitor`
- Parameter Tuning: Automated optimization with `ParameterTuner` and grid search
- Data Export: CSV output for convergence analysis with `enable_csv_logging()`
- Performance Metrics: Comprehensive timing and convergence rate analysis
Scalar Modes
- Real (default): Builds without extra features keep all public APIs monomorphic on `f64`.
- Complex (`--features complex`): Internals promote Kryst's scalar alias `S` to `num_complex::Complex64` while the Matrix Market tooling converts boundary data to and from complex storage.

`S` is the internal scalar alias and `R` is its real partner. In real builds `S = R = f64`; in complex builds `S = Complex64` and `R = f64`.
Cargo Features
| Feature | Enables | Notes |
|---|---|---|
| `mpi` | MPI communication backend | Requires MPI installed; examples run via `mpirun` |
| `complex` | Complex scalar `S` | Classical and pipelined GMRES/FGMRES variants are supported |
| `backend-faer` | Dense/CSR backends and most PCs | Default feature |
| backend flags | Direct solvers / matrix backends | e.g. `superlu_dist` (where available) |
Cargo feature summary
- `mpi` – enable distributed-memory execution via the `mpi` crate. Optional and independent from Rayon.
- `rayon` – turn on shared-memory parallel kernels. Combine with `-ksp_threads` to size the worker pool.
- `complex` – lift internal kernels to `Complex64` while keeping the public API monomorphic on `f64` inputs.
- `logging` – route internal tracing to the `log` facade for integration with env_logger or similar backends.
- `backend-faer` + `rayon` + `mpi` – supported for distributed runs with parallel local kernels; see `docs/matrix_features.md` for the expected feature combinations and matrix capabilities.
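In a downstream crate, opting in might look like the following `Cargo.toml` entry (the feature list here is illustrative; pick only what you need):

```toml
[dependencies]
kryst = { version = "1.0", features = ["rayon", "mpi"] }
```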
Latency-aware solver knobs
The Krylov drivers expose command-line options to balance global reductions
against additional local work. The most common flags mirror PETSc's -ksp_*
options and can be combined with the deterministic reduction feature for
reproducible CI runs.
| Flag | Default | Effect |
|---|---|---|
| `-ksp_cg_variant <classic\|pipelined>` | `classic` | Select the classic or pipelined (communication-hiding) CG variant. |
| `-ksp_reproducible` | `false` | Enable deterministic reductions (rank-ordered MPI sums and fixed-order local kernels). |
| `-ksp_threads <N>` | unset | Request `N` Rayon workers (requires `--features rayon`). Ignored in builds without Rayon. |
| `-ksp_gmres_variant <classical\|pipelined\|sstep[:s]>` | | Select the GMRES variant: classical, pipelined, or s-step with block size `s`. |
| `-ksp_residual_replacement <iters>` | `50` | Force periodic residual recomputation in pipelined CG to control drift (0 disables). |
| `-ksp_trust_region <radius>` | unset | Enable CG trust-region safeguarding with the provided radius. |
| `-ksp_reorthog <never\|ifneeded\|always>` | | Control reorthogonalization during GMRES orthogonalization. |
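For instance, a latency-sensitive MPI run might combine the pipelined CG variant with deterministic reductions (the binary name and rank count below are placeholders):

```bash
mpirun -np 8 ./your_solver \
  -ksp_type cg -ksp_cg_variant pipelined -ksp_residual_replacement 50 \
  -ksp_reproducible -ksp_threads 1
```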
Rayon tuning
Recommended settings for local kernels:
- `-ksp_threads <N>` selects the Rayon worker count used by Kryst kernels (shared-memory only).
- `KRYST_PAR_CUTOFF=<rows>` controls the minimum CSR row count before parallel SpMV is used (default `4096`); raise it if you see parallel overhead on small problems.
Legacy -ksp_cg_pipelined remains available as an alias for
-ksp_cg_variant pipelined. For bit-for-bit reproducibility, combine
-ksp_reproducible with -ksp_threads 1. When Rayon is enabled with more
than one worker, runs remain deterministic for a fixed thread count but may
differ across thread-count configurations.
Reproducible reductions
When -ksp_reproducible is enabled the solver switches to rank-ordered MPI
reductions and fixed-order local kernels. This guarantees bit-for-bit equality
between runs that use the same communicator size and Rayon thread count. For
strict reproducibility we recommend pinning Rayon to a single thread via
-ksp_threads 1 (or the RAYON_NUM_THREADS environment variable); otherwise,
results remain deterministic for the configured thread count but may differ
between thread-count configurations.
Reproducibility recipe (MPI + Rayon)
Use this configuration when validating deterministic reductions:
```bash
RAYON_NUM_THREADS=1
```
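Filled out with an MPI launcher and one of the shipped examples, the recipe might look like this sketch (rank count and example target are illustrative):

```bash
RAYON_NUM_THREADS=1 mpirun -np 4 \
  cargo run --release --features "mpi rayon" --example mpi_poisson_block_jacobi_ilu -- \
  -ksp_reproducible -ksp_threads 1
```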
Each solver also records the number of global reductions performed in
SolveStats::counters.num_global_reductions, making it easy to assert expected
latency costs in automated tests.
Hybrid MPI + Rayon scaling recipes
Use these rules of thumb when combining MPI ranks with Rayon threads:
- Throughput-oriented runs: allocate threads per rank so that `(MPI ranks) × (threads per rank)` matches physical cores. Start with `-ksp_threads 2-4` per rank and adjust based on local cache behavior and kernel mix (SpMV vs. ILU/ASM work).
- Reproducibility-oriented runs: keep `-ksp_reproducible` enabled and fix the thread count per rank (`-ksp_threads 1` or `RAYON_NUM_THREADS=1`). Results remain deterministic for a fixed communicator size and thread count.
Example hybrid runs:
```bash
# (./your_solver is a placeholder for your own MPI-enabled binary)

# Throughput-oriented: 4 ranks × 4 threads (16 cores total)
RAYON_NUM_THREADS=4 mpirun -np 4 ./your_solver -ksp_threads 4

# Reproducible: 4 ranks × 1 thread
RAYON_NUM_THREADS=1 mpirun -np 4 ./your_solver -ksp_threads 1 -ksp_reproducible
```
For performance studies across MPI-only, Rayon-only, and hybrid builds, run the
mpi_rayon_suite benchmark via cargo bench (see scripts/bench_mpi_rayon.sh)
to compare ILU and ASM preconditioner workloads on small/medium/large matrices.
Architecture
- PETSc-style API: Unified KSP context for runtime solver selection
- Command-line Options: Complete options database with 50+ parameters
- Trait-based Design: Extensible for custom matrices and preconditioners
- Memory Efficiency: In-place operations and configurable workspace management
- High Performance: Optimized inner kernels with SIMD and parallelization
- Matrix-Free Operators: Shell matrices for callback-based MatVec operations
- Setup Reuse: Two-phase API with preconditioner and workspace recycling
- CSR utilities: zero-copy `row_ptr`/`col_idx`/`values` access and sparse kernels (spgemm, CSR Galerkin triple product); see the SpMV sketch below
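As an illustration of the zero-copy CSR layout these utilities expose, the sketch below multiplies a CSR matrix by a vector directly from the `row_ptr`/`col_idx`/`values` slices (plain Rust, independent of kryst's own types):

```rust
/// y = A * x for a CSR matrix described by (row_ptr, col_idx, values).
/// Row i owns values[row_ptr[i]..row_ptr[i + 1]] with matching column indices.
fn csr_spmv(row_ptr: &[usize], col_idx: &[usize], values: &[f64], x: &[f64], y: &mut [f64]) {
    for (i, yi) in y.iter_mut().enumerate() {
        let (start, end) = (row_ptr[i], row_ptr[i + 1]);
        *yi = col_idx[start..end]
            .iter()
            .zip(&values[start..end])
            .map(|(&j, &v)| v * x[j])
            .sum();
    }
}
```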
Installation
Add to your Cargo.toml:
```toml
[dependencies]
kryst = "1.0"
```
Feature Flags
```toml
[features]
default = []                    # Opt in to exactly the features you need
rayon = ["dep:rayon", "dep:num_cpus"]
mpi = ["dep:mpi"]
logging = ["dep:log"]
complex = ["dep:num-complex"]
simd = []                       # Auto-tuned std::simd sparse mat-vec kernels
# …plus an x86_64-specific feature for optional gather/prefetch micro-tuning
```
Enabling the simd feature activates the runtime SpMV planner, which selects
between the scalar CSR baseline, a gather-based SIMD kernel, and a SELL-C-σ
kernel. Plans are built once per matrix (e.g., during AMG setup) and cached for
deterministic, allocation-free application time.
Quick Start
Basic Usage with KspContext (Recommended)
```rust
// NOTE: module paths, enum variants, and call signatures below are assumptions
// made for illustration; consult the kryst API docs for the exact names.
use kryst::prelude::*;
use kryst::matrix::DenseOp;
use faer::Mat;
use std::sync::Arc;

// Create a 100×100 test system (tridiagonal SPD matrix)
let n = 100;
let mat: Mat<f64> = Mat::from_fn(n, n, |i, j| match i as i64 - j as i64 {
    0 => 2.0,
    1 | -1 => -1.0,
    _ => 0.0,
});
let a = Arc::new(DenseOp::new(mat));
let rhs = vec![1.0; n];
let mut solution = vec![0.0; n];

// Configure solver and preconditioner
let mut ksp = KspContext::new();
ksp.set_type(SolverType::Gmres)?
    .set_pc_type(PcType::Jacobi)?
    .set_operators(a.clone());
ksp.rtol = 1e-8;
ksp.maxits = 1000;

// Setup once, then solve
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;
println!("{stats:?}");
```
Explicit Setup and Reuse
Reuse factorization and workspace across multiple solves by calling setup() once:
```rust
// (solver/PC variants and the solve signature are illustrative)
let mut ksp = KspContext::new();
ksp.set_type(SolverType::Gmres)?
    .set_pc_type(PcType::Ilu)?
    .set_operators(a.clone());
ksp.setup()?; // perform factorization and allocate workspace

for rhs in rhs_set.iter() {
    // Reuse the factorization and preallocated workspace for every right-hand side.
    let stats = ksp.solve(rhs, &mut solution)?;
}
```
Advanced Features: Composite Preconditioning
```rust
// Module paths, option field types, and call signatures are assumptions.
use kryst::context::KspContext;
use kryst::options::{KspOptions, PcOptions};

let mut ksp_opts = KspOptions::default();
ksp_opts.ksp_type = Some("gmres".into());

let mut pc_opts = PcOptions::default();
pc_opts.pc_chain = Some("jacobi,chebyshev".into());
pc_opts.chebyshev_degree = Some(4);

let mut ksp = KspContext::new();
ksp.set_from_options(&ksp_opts, &pc_opts)?
    .set_operators(a.clone());
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;
```
Enhanced AMG with Smoothing
```rust
// Module paths, enum variants, and how pc_opts is attached are assumptions.
use kryst::context::KspContext;
use kryst::pc::PcType;
use kryst::options::PcOptions;

let mut pc_opts = PcOptions::default();
pc_opts.amg_levels = Some(4);
pc_opts.amg_strength_threshold = Some(0.25);
pc_opts.amg_nu_pre = Some(2);  // Pre-smoothing steps
pc_opts.amg_nu_post = Some(2); // Post-smoothing steps

let mut ksp = KspContext::new();
ksp.set_type(SolverType::Cg)?
    .set_pc_type(PcType::Amg)?
    .set_operators(a.clone());
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;
```
Iteration Monitoring and Analysis
```rust
// Module paths, argument lists, and the tune_parameters signature are assumptions.
use kryst::monitor::IterationMonitor;
use kryst::tuning::ParameterTuner;
use kryst::solver::SolverType;
use kryst::pc::PcType;
use std::time::Duration;

// Monitor convergence behavior
let mut monitor = IterationMonitor::new();
// In practice, integrate monitor with solver iteration callbacks

// Automated parameter tuning
let mut tuner = ParameterTuner::new();
tuner.set_solver_types(vec![SolverType::Cg, SolverType::Gmres]);
tuner.set_pc_types(vec![PcType::Jacobi, PcType::Amg]);
tuner.set_tolerances(vec![1e-6, 1e-8]);
tuner.set_max_config_time(Duration::from_secs(10));

let (best_config, results) = tuner.tune_parameters(&a, &rhs).unwrap();
println!("best configuration: {best_config:?}");
```
Command-line Interface (PETSc-style)
```rust
// Module paths and the parse/apply signatures are assumptions.
use kryst::options::parse_all_options;
use kryst::context::KspContext;

// Parse command-line options
let args: Vec<String> = std::env::args().collect();
let (ksp_opts, pc_opts) = parse_all_options(&args)?;

// Configure from options
let mut ksp = KspContext::new();
ksp.set_from_all_options(&ksp_opts, &pc_opts)?
    .set_operators(a.clone());
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;
```
Run your program with PETSc-style options:
```bash
# (./your_program is a placeholder for your own binary)

# Basic solver configuration
./your_program -ksp_type gmres -ksp_rtol 1e-8 -pc_type ilu

# Direct solvers
./your_program -ksp_type preonly -pc_type lu

# Advanced preconditioning
./your_program -ksp_type cg -pc_type amg -amg_levels 5 -amg_nu_pre 2 -amg_nu_post 2

# Show all available options
./your_program -help   # exact help flag may differ
```
Supported Command-line Options
KSP (Krylov Solver) Options
- `-ksp_type <solver>` - Solver type: `cg`, `pcg`, `gmres`, `fgmres`, `bicgstab`, `cgs`, `qmr`, `tfqmr`, `minres`, `cgnr`, `preonly`
- `-ksp_rtol <float>` - Relative convergence tolerance (default: 1e-5)
- `-ksp_atol <float>` - Absolute convergence tolerance (default: 1e-50)
- `-ksp_dtol <float>` - Divergence tolerance (default: 1e5)
- `-ksp_max_it <int>` - Maximum number of iterations (default: 10000)
- `-ksp_gmres_restart <int>` - GMRES restart parameter (default: 50)
- `-ksp_pc_side <side>` - Preconditioning side: `left`, `right`, `symmetric`
- `-ksp_reproducible` - Enable deterministic reductions; forces rank-ordered MPI sums and stable intra-rank chunking.
PC (Preconditioner) Options
Basic Preconditioner Options
- `-pc_type <pc>` - Preconditioner type: `jacobi`, `blockjacobi`, `sor`, `none`
Incomplete Factorization Options
- `-pc_type <pc>` - ILU variants: `ilu0`, `ilu`, `ilut`, `ilutp`, `ilup`
- `-pc_ilu_levels <int>` - ILU fill levels (default: 0)
- `-pc_ilut_drop_tol <float>` - ILUT drop tolerance (default: 1e-3)
- `-pc_ilut_max_fill <int>` - ILUT maximum fill per row (default: 10)
Enhanced Preconditioner Options
- `-pc_type chebyshev` - Enhanced Chebyshev with eigenvalue estimation
- `-chebyshev_degree <int>` - Polynomial degree (default: 3)
- `-pc_type amg` - Algebraic multigrid with smoothing control
- `-amg_levels <int>` - Number of AMG levels (default: 4)
- `-amg_strength_threshold <float>` - Strong connection threshold (default: 0.25)
- `-amg_nu_pre <int>` - Pre-smoothing steps (default: 1)
- `-amg_nu_post <int>` - Post-smoothing steps (default: 1)
AMG CLI knobs
- `-pc_amg` - shorthand alias for `-pc_type amg`.
- `-pc_amg_coarsen <rs|hmis|pmis|falgout>` - Coarsening strategy (maps to `AMGConfig::coarsen_type`).
- `-pc_amg_interp <classical|direct|multipass|extended|standard>` - Interpolation/extended-smoothing variant.
- `-pc_amg_smoother <jacobi|gs|gsr|sgs|hgs|l1jacobi|chebyshev>` - Smoother applied on each level.
- `-pc_amg_smoother_steps <int>` and `-pc_amg_smoother_omega <float>` control smoothing sweeps/relaxation weight.
- `-pc_amg_truncation_factor <float>` / `-pc_amg_interp_maxnnz <int>` trim interpolation fill.
- `-pc_amg_rap_truncation_factor <float>` / `-pc_amg_rap_truncation_abs <float>` / `-pc_amg_rap_maxnnz <int>` prune RAP entries.
- `-pc_amg_keep_transpose <bool>` / `-pc_amg_keep_pivot_in_rap <bool>` control symmetry-preserving entries.
- `-pc_amg_require_spd <bool>` / `-pc_amg_print_setup <bool>` control SPD enforcement and setup printing.
Example AMG invocation:
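A representative command using the knobs above (program name and parameter values are illustrative):

```bash
./your_solver -ksp_type cg -pc_amg \
  -pc_amg_coarsen hmis -pc_amg_interp extended \
  -pc_amg_smoother l1jacobi -pc_amg_smoother_steps 2 \
  -pc_amg_truncation_factor 0.2
```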
Composite Preconditioning Options
- `-pc_chain <string>` - Sequential preconditioner chain (e.g., "jacobi,chebyshev")
- `-pc_type asm` - Additive Schwarz Method
- `-pc_type approxinv` - Approximate inverse preconditioner
ILU preconditioners
-pc_type ilu selects Kryst's HYPRE-inspired incomplete LU family (Ilu). -pc_type ilut/-pc_type ilutp run the lighter-weight row-filter ILUT or pivoting ILUTP preconditioners, while
-pc_type blockjacobi with -pc_local <ilu|ilut|ilutp> wraps a local ILU variant inside MPI
block-Jacobi. Setting -pc_type ilu with -pc_ilu_type ilut runs the canonical ILU threshold
factorization; Ilu::create_specialized may route that variant to crate::preconditioner::ilut::Ilut
for simplicity/efficiency.
| CLI flag | Config field | Notes |
|---|---|---|
| `-pc_ilu_type <ilu0\|milu0\|iluk\|ilut>` | ILU variant selection | Chooses the variant used by the canonical `Ilu` preconditioner. |
| `-pc_ilu_level_of_fill <int>` | `IluConfig::level_of_fill` | Controls level-of-fill for ILUK (typical 0–5). |
| `-pc_ilu_max_fill_per_row <int>` | `IluConfig::max_fill_per_row` | Per-row fill cap for ILUK/ILUT; 10–50 keeps memory bounded. |
| `-pc_ilu_offdiag_drop_tolerance <float>` | `IluConfig::offdiag_drop_tolerance` | Drop entries outside LU blocks. |
| `-pc_ilu_schur_drop_tolerance <float>` | `IluConfig::schur_drop_tolerance` | For future Schur complements (currently dormant). |
| `-pc_ilu_triangular_solve <exact\|jacobi\|gauss_seidel>` | Triangular solve strategy | Selects exact or iterative (Jacobi/Gauss-Seidel) triangular solves. |
| `-pc_ilu_lower_jacobi_iters <int>` / `-pc_ilu_upper_jacobi_iters <int>` | Jacobi iteration counts | Only used when the triangular solve is iterative. |
| `-pc_ilu_tolerance <float>` / `-pc_ilu_max_iterations <int>` | Iterative solve controls | Defaults 1e-6 & 1; iterative delivers residual-based refinement. |
| `-pc_ilu_parallel_factorization` / `-pc_ilu_parallel_trisolve` / `-pc_ilu_parallel_chunk_size <int>` | `IluConfig::enable_parallel_*`, `parallel_chunk_size` | Enable experimental rayon paths; chunk size typically 16–256. |
| `-pc_ilut_drop_tol <float>` | `IluConfig::drop_tolerance` (row-filter ILUT) | Simple heuristic ILUT drop threshold (1e-3–1e-6). |
| `-pc_ilut_max_fill <int>` | `IluConfig::max_fill_per_row` (row-filter ILUT) | Limits kept entries per row (10–100). |
| `-pc_ilut_perm_tol <float>` | Pivot tolerance for row-filter ILUT | Not used by canonical `Ilu` but available for the lightweight ILUT preconditioner. |
| `-pc_ilutp_max_fill <int>` / `-pc_ilutp_drop_tol <float>` / `-pc_ilutp_perm_tol <float>` | `Ilutp` parameters | Controls density, drop tolerance, and pivoting aggressiveness for ILUTP. |
Environment variables mirror the flags: KRYST_PC_ILU_TYPE, KRYST_PC_ILU_LEVEL_OF_FILL, KRYST_PC_ILU_MAX_FILL_PER_ROW, KRYST_PC_ILU_OFFDIAG_DROP_TOL, KRYST_PC_ILU_SCHUR_DROP_TOL, KRYST_PC_ILU_TRI_SOLVE, KRYST_PC_ILU_LOWER_JACOBI_ITERS, KRYST_PC_ILU_UPPER_JACOBI_ITERS, KRYST_PC_ILU_PARALLEL_FACTORIZATION, KRYST_PC_ILU_PARALLEL_TRISOLVE, KRYST_PC_ILU_PARALLEL_CHUNK_SIZE, plus KRYST_PC_ILUT_DROP_TOL, KRYST_PC_ILUT_MAX_FILL, KRYST_PC_ILUT_PERM_TOL, KRYST_PC_ILUTP_MAX_FILL, KRYST_PC_ILUTP_DROP_TOL, and KRYST_PC_ILUTP_PERM_TOL. Command-line flags override environment variables, which in turn override the built-in defaults.
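For example (values are illustrative), an environment variable can set a baseline that a later flag overrides:

```bash
# Environment sets the ILUT drop tolerance and fill for every run in this shell
export KRYST_PC_ILUT_DROP_TOL=1e-4
export KRYST_PC_ILUT_MAX_FILL=30

# An explicit flag still wins over the environment variable
./your_solver -pc_type ilut -pc_ilut_drop_tol 1e-5
```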
Examples
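The invocations below are illustrative reconstructions: the example targets are the ones shipped in `examples/`, while the flags and rank counts are assumptions.

```bash
cargo run --release --example poisson_spd_ilu0_vs_jacobi
cargo run --release --example poisson_spd_ilu0_vs_jacobi -- -pc_type ilut -pc_ilut_drop_tol 1e-4 -pc_ilut_max_fill 30
cargo run --release --example convection_diffusion_ilutp
mpirun -np 4 cargo run --release --features mpi --example mpi_poisson_block_jacobi_ilu
```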
The first line compares Jacobi vs ILU(0) on examples/poisson_spd_ilu0_vs_jacobi.rs; the second
shows ILUT tuning. The third line mirrors the convection–diffusion ILUTP demo
(examples/convection_diffusion_ilutp.rs), and the last line is the MPI block-Jacobi + ILU(0)
toy from examples/mpi_poisson_block_jacobi_ilu.rs.
Direct Solver Options
- `-pc_type lu` - Direct LU factorization via SuperLU
- `-pc_type qr` - Direct QR factorization
Domain Decomposition Options
- `-asm_overlap <int>` - ASM subdomain overlap (default: 1)
- `-asm_type <type>` - ASM variant: `restrict`, `interpolate`, `basic`
Usage Examples
```bash
# (./your_program is a placeholder for your own binary)

# Enhanced Chebyshev preconditioning
./your_program -ksp_type cg -pc_type chebyshev -chebyshev_degree 5

# AMG with custom smoothing
./your_program -ksp_type cg -pc_type amg -amg_levels 5 -amg_nu_pre 2 -amg_nu_post 2

# Composite preconditioning (PC-chaining)
./your_program -ksp_type gmres -pc_chain "jacobi,chebyshev"

# High-accuracy direct solve
./your_program -ksp_type preonly -pc_type lu

# BiCGStab with threshold ILU
./your_program -ksp_type bicgstab -pc_type ilut -pc_ilut_drop_tol 1e-4

# GMRES with additive Schwarz
./your_program -ksp_type gmres -pc_type asm -asm_overlap 2
```
Monitoring and Automation
Iteration Monitoring
Track solver convergence with real-time monitoring:
```rust
// Module paths, the callback signature, and statistics field names are assumptions.
use kryst::monitor::IterationMonitor;
use kryst::context::KspContext;
use kryst::pc::PcType;
use std::sync::{Arc, Mutex};
use std::time::Duration;

// Create and configure monitor
let mut monitor = IterationMonitor::new();
monitor.enable_csv_logging("convergence.csv").unwrap();

// Configure solver with monitoring callback
let monitor_ref = Arc::new(Mutex::new(monitor));
let monitor_clone = Arc::clone(&monitor_ref);

let mut ksp = KspContext::new();
ksp.set_type(SolverType::Gmres)?
    .set_pc_type(PcType::Ilu)?
    .set_operators(a.clone());

// Add monitoring callback
ksp.add_monitor(move |iteration, residual| {
    monitor_clone
        .lock()
        .unwrap()
        .record_iteration(iteration, residual, Duration::ZERO);
});

// Solve with monitoring
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut solution)?;

// Analyze convergence
if let Ok(mon) = monitor_ref.lock() {
    println!("{:?}", mon.get_statistics());
}
```
Automated Parameter Tuning
Optimize solver/preconditioner combinations automatically:
```rust
// Module paths, argument lists, and result field names are assumptions.
use kryst::tuning::ParameterTuner;
use kryst::solver::SolverType;
use kryst::pc::PcType;
use std::time::Duration;

let mut tuner = ParameterTuner::new();

// Configure search space
tuner.set_solver_types(vec![SolverType::Gmres, SolverType::Bicgstab])
    .set_pc_types(vec![PcType::Jacobi, PcType::Ilu, PcType::Amg])
    .set_tolerances(vec![1e-6, 1e-8])
    .set_max_config_time(Duration::from_secs(30));

// Add PC-chain configurations for composite preconditioning
tuner.add_pc_chains(vec!["jacobi,chebyshev".into(), "asm,amg".into()]);

// Run automated tuning
let (best_config, results) = tuner.tune_parameters(&a, &rhs).unwrap();
println!("best solver: {:?}", best_config.solver_type);
println!("best preconditioner: {:?}", best_config.pc_type);
println!("iterations: {}", best_config.iterations);
println!("solve time: {:?}", best_config.solve_time);
if let Some(chain) = &best_config.pc_chain {
    println!("pc chain: {chain}");
}

// Export results for further analysis
tuner.export_results("tuning_results.csv").unwrap();
let summary = tuner.get_summary();
println!("{summary}");
```
Advanced Monitoring Features
```rust
// Module paths, method arguments, and statistics field names are assumptions.
use kryst::monitor::IterationMonitor;
use std::time::Duration;

let mut monitor = IterationMonitor::new();
monitor.start_solve();

// Record some iterations (iteration index, residual norm, elapsed time)
monitor.record_iteration(1, 1.0e-1, Duration::from_millis(4));
monitor.record_iteration(2, 3.2e-3, Duration::from_millis(4));
monitor.record_iteration(3, 8.7e-6, Duration::from_millis(4));

// Mark convergence
monitor.mark_converged();

// Get detailed statistics
let stats = monitor.get_statistics();
println!("iterations: {}", stats.total_iterations);
println!("final residual: {:e}", stats.final_residual);
println!("total time: {:?}", stats.total_time);
println!("avg time per iteration: {:?}", stats.avg_iteration_time);
println!("convergence rate: {:.3}", stats.convergence_rate);

// Check recent convergence behavior
if let Some(rate) = monitor.recent_convergence_rate(5) {
    println!("recent convergence rate: {rate:.3}");
}

// Set up real-time monitoring callbacks
let mut ksp = KspContext::new();
ksp.add_monitor(|iteration, residual| println!("iter {iteration}: {residual:e}"));
```
Profiling and Performance Analysis
Enable detailed timing and performance information:
```toml
[dependencies]
kryst = { version = "1.0", features = ["logging"] }
```
Run with environment variables for detailed profiling:
```bash
# Trace-level logging shows detailed stage timing
RUST_LOG=trace

# Debug-level shows major operations
RUST_LOG=debug

# Info-level shows high-level progress
RUST_LOG=info
```
Profiling output includes:
- KSPSetup: Preconditioner setup and workspace allocation timing
- KSPSolve: Complete solve time breakdown
- PCSetup: Individual preconditioner setup timing
- WorkspaceAllocation: Memory allocation timing
- MatVec: Matrix-vector product timing
- PCApply: Preconditioner application timing
Solver Algorithms
Krylov Methods
- CG: Conjugate Gradient for symmetric positive definite systems
- PCG: Preconditioned Conjugate Gradient
- GMRES: Generalized Minimal Residual with restart
- FGMRES: Flexible GMRES for variable preconditioning
- BiCGStab: BiConjugate Gradient Stabilized for nonsymmetric systems
- CGS: Conjugate Gradient Squared
- QMR: Quasi-Minimal Residual method
- TFQMR: Transpose-Free QMR
- MINRES: Minimal Residual for symmetric indefinite systems
- CGNR: Conjugate Gradient on the Normal Equations
Direct Methods
- PREONLY: Single-step direct solve using LU or QR factorization
- Supports both `-pc_type lu` and `-pc_type qr`
- Ideal for well-conditioned systems where direct methods are preferred
Preconditioner Details
Basic Preconditioners
- Jacobi: Diagonal scaling `M⁻¹ = diag(A)⁻¹` (see the standalone sketch after this list)
- Block Jacobi: Block-wise diagonal preconditioning with configurable block sizes
- SOR/SSOR: Successive Over-Relaxation with configurable relaxation parameter
- None: Identity preconditioning (no preconditioning)
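To make the Jacobi entry above concrete, here is a minimal standalone sketch of diagonal scaling (plain Rust, not kryst's internal implementation):

```rust
/// Apply the Jacobi preconditioner z = M⁻¹ r with M = diag(A),
/// given the diagonal entries of A.
fn jacobi_apply(diag: &[f64], r: &[f64], z: &mut [f64]) {
    for ((zi, &ri), &di) in z.iter_mut().zip(r).zip(diag) {
        // Guard tiny diagonals so near-singular rows do not blow up.
        *zi = if di.abs() > f64::EPSILON { ri / di } else { ri };
    }
}
```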
Incomplete Factorizations
- ILU(0): Zero fill-in incomplete LU factorization
- ILU(k): Incomplete LU with k levels of fill-in
- ILUT: ILU with threshold-based dropping strategy
- ILUTP: ILUT with partial pivoting for numerical stability
- ILUP: Incomplete LU with partial pivoting
Advanced Preconditioners
Enhanced Chebyshev
Enhanced polynomial preconditioning implementation based on eigenvalue estimation:
```rust
// Module paths and enum variant names are assumptions.
use kryst::pc::{Chebyshev, PcType};
use kryst::options::PcOptions;

// Enhanced Chebyshev with automatic eigenvalue estimation
let mut pc_opts = PcOptions::default();
pc_opts.chebyshev_degree = Some(5); // Higher degree for better approximation
ksp.set_pc_type(PcType::Chebyshev)?;
```
Features:
- Matrix-aware: Automatic eigenvalue bound estimation using power iteration
- Configurable Degree: Polynomial degree optimization (default: 3, range: 1-20)
- Storage Efficient: Reuses matrix storage for eigenvalue computation
- Robust: Handles near-singular matrices with adaptive bounds
Enhanced AMG
Advanced Algebraic Multigrid with configurable smoothing:
```rust
// Module paths and enum variant names are assumptions.
use kryst::pc::{Amg, PcType};
use kryst::options::PcOptions;

// Enhanced AMG with smoothing control
let mut pc_opts = PcOptions::default();
pc_opts.amg_levels = Some(4);                // Multigrid levels
pc_opts.amg_strength_threshold = Some(0.25); // Strong connection threshold
pc_opts.amg_nu_pre = Some(2);                // Pre-smoothing steps
pc_opts.amg_nu_post = Some(2);               // Post-smoothing steps
ksp.set_pc_type(PcType::Amg)?;
```
Features:
- Smoothed Multigrid: Configurable pre- and post-smoothing parameters
- Adaptive Coarsening: Automatic grid hierarchy construction based on strength
- Strength Threshold: Customizable strong connection criteria (default: 0.25)
- Flexible Smoothing: Separate control of pre/post smoothing iterations
Composite Preconditioning
PC-chaining allows sequential application of multiple preconditioners:
```rust
// Module path, option field types, and the set_from_options signature are
// assumptions; the chain strings are illustrative.
use kryst::options::{KspOptions, PcOptions};

let ksp_opts = KspOptions::default();

// Example 1: Jacobi + Chebyshev combination
let mut pc_opts = PcOptions::default();
pc_opts.pc_chain = Some("jacobi,chebyshev".into());
pc_opts.chebyshev_degree = Some(4);
ksp.set_from_options(&ksp_opts, &pc_opts)?;

// Example 2: Multi-stage preconditioning
let mut pc_opts = PcOptions::default();
pc_opts.pc_chain = Some("jacobi,sor,chebyshev".into());
ksp.set_from_options(&ksp_opts, &pc_opts)?;

// Example 3: Domain decomposition + multigrid
let mut pc_opts = PcOptions::default();
pc_opts.pc_chain = Some("asm,amg".into());
pc_opts.amg_nu_pre = Some(2);
ksp.set_from_options(&ksp_opts, &pc_opts)?;
```
Features:
- Flexible Combinations: Mix any preconditioner types in sequence
- Automatic Setup: Transparent handling of composite preconditioner construction
- Parameter Inheritance: Specialized parameters apply to respective stages
- Performance Tuning: Optimize combinations via `ParameterTuner`
Domain Decomposition
- ASM: Additive Schwarz Method with configurable overlap
- Approximate Inverse: SPAI-type sparse approximate inverse
Performance Features
Parallelization
- Shared Memory: Rayon-based parallel execution for matrix operations and preconditioner application
- Distributed Memory: MPI support for distributed linear algebra operations (via mpi feature)
- SIMD Optimization: Leverages hardware acceleration through optimized inner kernels via faer
- Parallel Preconditioners: Thread-safe preconditioner application with work stealing
Memory Management
- In-place Operations: Minimizes memory allocations during iteration
- Workspace Reuse: Preallocated workspace vectors for Krylov methods
- Block Operations: Efficient cache usage through blocked algorithms
- Sparse Patterns: Memory-efficient storage for sparse matrices and preconditioners
Algorithm Optimizations
- Eigenvalue Estimation: Fast power iteration for Chebyshev eigenvalue bounds
- Adaptive Restart: GMRES restart optimization based on convergence behavior
- Early Termination: Configurable stopping criteria with multiple tolerance options
- Matrix Preprocessing: Reordering and scaling for improved conditioning
Matrix Support
Dense Matrices
- Full support via `faer::Mat<T>` integration
- Optimized BLAS-level operations
- Support for f32, f64 precision
- Efficient dense matrix-vector products
Sparse Matrices
- Custom CSR format implementation
- Efficient sparse matrix-vector products
- Pattern-based optimization for preconditioners
- Memory-efficient storage with configurable sparsity patterns
Matrix-Free Methods
- Trait-based `MatVec` interface for custom matrix implementations
- Support for implicit matrix representations
- Easy integration of matrix-free operators
- Efficient for PDE discretizations and other structured problems
Examples and Demonstrations
The library includes comprehensive demonstration programs:
Basic Usage Examples
```bash
# Options and CLI interface demonstration
# Direct solver usage
# Matrix market file demonstration
```
Advanced Feature Examples
```bash
# Convergence behavior analysis
# Iteration monitoring demonstration
# HYPRE-style ILU demonstration
# MPI parallel examples (requires MPI)
```
Note: Matrix Market example files (*.mtx) are excluded from the published crate to stay within size limits. The matrix_market_demo example will auto-generate test data if example files are not found.
Command-line Examples
```bash
# (./your_program is a placeholder for your own binary)

# Enhanced Chebyshev preconditioning
./your_program -ksp_type cg -pc_type chebyshev -chebyshev_degree 6

# AMG with custom smoothing parameters
./your_program -ksp_type cg -pc_type amg -amg_strength_threshold 0.3 -amg_nu_pre 2 -amg_nu_post 2

# Composite preconditioning with PC-chaining
./your_program -ksp_type fgmres -pc_chain "jacobi,chebyshev"

# High-precision direct solve
./your_program -ksp_type preonly -pc_type lu

# Complex preconditioner combinations
./your_program -ksp_type gmres -pc_chain "asm,amg" -asm_overlap 2 -amg_nu_pre 2
```
Benchmarks and Performance
Performance benchmarks are available via:
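For example:

```bash
cargo bench
# MPI/Rayon comparison suite (see scripts/bench_mpi_rayon.sh)
bash scripts/bench_mpi_rayon.sh
```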
Benchmark categories include:
- Solver Comparison: GMRES vs BiCGStab vs CG performance on various problems
- Preconditioner Effectiveness: Impact of different preconditioners on convergence
- Direct vs Iterative: Performance comparison for different problem sizes
- Parallel Scaling: Shared-memory (Rayon) and distributed-memory (MPI) performance
- Phase III Features: PC-chaining and enhanced preconditioning performance
- Memory Usage: Workspace allocation and memory efficiency analysis
Sample benchmark results (varies by system and problem):
```text
solver_comparison/gmres      time: 45.2 ms  (convergence: 23 iterations)
solver_comparison/bicgstab   time: 38.7 ms  (convergence: 31 iterations)
solver_comparison/cg         time: 22.1 ms  (convergence: 18 iterations)
pc_effectiveness/jacobi      time: 156 ms   (convergence: 89 iterations)
pc_effectiveness/amg         time: 67.3 ms  (convergence: 12 iterations)
pc_chaining/jacobi+cheby     time: 43.8 ms  (convergence: 15 iterations)
```
Custom Extensions
Custom Solvers
Implement the crate's solver trait to plug in your own Krylov method; the built-in solvers and the API documentation are the best references for the trait definition.
Custom Preconditioners
Implement the crate's preconditioner trait to supply a custom preconditioner; again, start from the built-in preconditioners and the API documentation.
Matrix-Free Operators
```rust
// Module paths, the MatVec trait shape, and enum variants are assumptions;
// `Laplacian1d` stands in for your own operator type implementing MatVec.
use kryst::core::{MatVec, KError};

// Usage with KspContext
use std::sync::Arc;

let laplacian = Arc::new(Laplacian1d::new(100));
let mut ksp = KspContext::new();
ksp.set_type(SolverType::Cg)?
    .set_pc_type(PcType::None)?
    .set_operators(laplacian.clone());

// Can use the matrix-free operator directly
let rhs = vec![1.0; 100];
let mut sol = vec![0.0; 100];
ksp.setup()?;
let stats = ksp.solve(&rhs, &mut sol)?;
```
Documentation and Resources
- API Documentation - Complete API reference with examples
- Repository - Source code, issues, and discussions
- Examples Directory - Comprehensive demonstration programs
- Benchmarks - Performance comparison suite
- Phase III/IV Summary - Advanced preconditioning and automation features
Mathematical References
- Saad, Y. (2003). Iterative Methods for Sparse Linear Systems, 2nd Edition. SIAM.
- Barrett, R. et al. (1994). Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM.
- Trefethen, L.N. & Bau, D. (1997). Numerical Linear Algebra. SIAM.
- Briggs, W.L., Henson, V.E. & McCormick, S.F. (2000). A Multigrid Tutorial, 2nd Edition. SIAM.
Software References
- PETSc Documentation: https://petsc.org/release/documentation/
- Trilinos Documentation: https://trilinos.github.io/
Testing and Validation
Run the comprehensive test suite:
```bash
# All tests
cargo test

# Specific test categories (filter by test name)
cargo test solver

# Integration tests
cargo test --tests

# With specific features
cargo test --features "rayon mpi"

# Performance testing
cargo bench
```
MPI/Rayon targeted matrix tests
The matrix feature matrix and MPI/Rayon test plan live in
docs/matrix_features.md. Use them to validate communicator reductions,
distributed SpMV/halo exchange, and Rayon-local kernels for
backend-faer + mpi + rayon builds.
Minimal MPI CI recipe
Use the following steps as a minimal MPI validation recipe (local or CI):
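A sketch of such a recipe, assuming an Ubuntu-based runner and the feature names documented above (package names, rank count, and the example target are illustrative):

```bash
# Install an MPI implementation
sudo apt-get update && sudo apt-get install -y libopenmpi-dev openmpi-bin

# Build the library and examples with MPI (and optionally Rayon) enabled
cargo build --release --features "mpi rayon" --examples

# Run the test suite and one distributed example under a small communicator
cargo test --features "mpi rayon"
mpirun -np 2 target/release/examples/mpi_poisson_block_jacobi_ilu
```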
Test Coverage
- Unit Tests: 200+ individual component tests across solvers, preconditioners, and utilities
- Integration Tests: End-to-end validation including monitor integration and parameter tuning
- Options Tests: CLI parsing and configuration validation
- Feature Tests: Advanced functionality validation (PC-chaining, monitoring, tuning)
- Performance Tests: Benchmark validation and regression testing
Migration Guide
From Version 0.x to 1.0
New Features:
- Enhanced Chebyshev preconditioner with eigenvalue estimation
- AMG with configurable pre/post smoothing parameters
- PC-chaining for composite preconditioning
- Iteration monitoring and automated parameter tuning
- Expanded CLI options (50+ parameters)
Breaking Changes:
- None! Version 1.0 maintains full backward compatibility
Recommended Upgrades:
```rust
// Old approach
ksp.set_pc_type(PcType::Chebyshev)?;

// Enhanced approach (optional): tune the polynomial degree through PcOptions
// (exact wiring of pc_opts into the context depends on the options API)
let mut pc_opts = PcOptions::default();
pc_opts.chebyshev_degree = Some(5);
ksp.set_pc_type(PcType::Chebyshev)?;
```
New Monitoring Capabilities:
```rust
// Add iteration monitoring (module paths and callback signature are assumptions)
use kryst::monitor::IterationMonitor;
let mut monitor = IterationMonitor::new();
ksp.add_monitor(|iteration, residual| println!("iter {iteration}: {residual:e}"));

// Add automated parameter tuning
use kryst::tuning::ParameterTuner;
let mut tuner = ParameterTuner::new();
let (best_config, results) = tuner.tune_parameters(&a, &rhs).unwrap();
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Setup
- Clone the repository.
- Install Rust (stable toolchain recommended).
- Optional: install MPI for distributed features (Ubuntu/Debian or macOS packages).
- Run tests and benchmarks (see the command sketch below).
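The corresponding commands might look like the sketch below (repository URL and platform package managers are placeholders):

```bash
# 1. Clone the repository
git clone https://github.com/<org>/kryst.git && cd kryst

# 2. Install Rust (stable toolchain)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable

# 3. Optional: install MPI for distributed features
sudo apt-get install -y libopenmpi-dev openmpi-bin   # Ubuntu/Debian
brew install open-mpi                                 # macOS

# 4. Run tests and benchmarks
cargo test
cargo bench
```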
Developer scripts
- `scripts/ci_checks.sh` – runs `cargo fmt --all -- --check`, `cargo clippy --all-targets --all-features`, and `cargo test --all-features`.
- `scripts/ub_paranoia.sh` – executes ASan-enabled tests on the nightly toolchain for the buffer pool and dot engines.
- `scripts/miri_reduction.sh` – runs the same focused suite under `cargo miri` (nightly) to catch UB in the unsafe utilities.
Areas for Contribution
High Priority
- GPU Acceleration: CUDA/OpenCL backends for matrix operations
- Additional Solvers: LOBPCG, IDR(s), BiCGStab(l) variants
- Matrix Formats: Coordinate (COO), block sparse (BSR) formats
- Performance: SIMD optimizations, better cache utilization
Medium Priority
- Multigrid Variants: Classical AMG, smoothed aggregation
- Eigenvalue Solvers: Integration with Krylov eigenvalue methods
- Nonlinear Solvers: Newton-Krylov, JFNK methods
- Adaptive Methods: Adaptive restart, dynamic tolerance adjustment
Lower Priority
- Complex Arithmetic: Complex-valued linear systems support
- Mixed Precision: fp16/fp32/fp64 combinations for accuracy/performance tradeoffs
- Advanced I/O: HDF5, NetCDF matrix I/O support
- Visualization: Integration with plotting libraries for convergence analysis
Code Style and Standards
- Follow Rust standard formatting: `cargo fmt`
- Ensure clippy compliance: `cargo clippy`
- Add comprehensive tests for new features
- Include benchmark tests for performance-critical code
- Document public APIs with examples
- Follow semantic versioning for releases
Pull Request Process
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes and add tests
- Ensure all tests pass: `cargo test`
- Run formatting and linting: `cargo fmt && cargo clippy`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request with a clear description
kryst provides a comprehensive, high-performance linear algebra toolkit for the Rust ecosystem, with particular focus on iterative methods for large-scale scientific computing applications. The library combines the mathematical rigor of established numerical libraries like PETSc with the safety and performance characteristics of Rust, making it ideal for research, scientific computing, and production applications requiring robust linear system solvers.