single-algebra 🧮
A high-performance linear algebra library optimized for sparse matrices and dimensionality reduction algorithms. Designed for machine learning, data analysis, and scientific computing applications where efficiency with sparse data is crucial.
Features 🚀
- Sparse Matrix Operations: Efficient CSR/CSC matrix implementations with comprehensive operations
- Advanced PCA: Multiple PCA variants including standard and masked sparse PCA
- Flexible SVD: Support for both Lanczos and randomized SVD algorithms
- Feature Masking: Selective analysis of feature subsets for targeted dimensionality reduction
- Parallel Processing: Multi-threaded operations using Rayon for large datasets
- Memory Efficient: Optimized for large, sparse datasets that would be too big to hold in memory in dense form
- Type Generic: Supports both `f32` and `f64` numeric types
- Utilities: Data preprocessing with normalization and logarithmic transformations
Core Modules 📊
Sparse Matrix Operations
- CSR/CSC Formats: Comprehensive sparse matrix support with efficient storage
- Matrix Arithmetic: Sum operations, column statistics, and element-wise operations
- Memory Optimization: Designed for large, high-dimensional sparse datasets
Dimensionality Reduction ⬇️
- Sparse PCA: Principal Component Analysis optimized for sparse CSR matrices
- Masked Sparse PCA: PCA with feature masking for selective analysis
- SVD Algorithms: Choice between Lanczos (exact) and randomized (fast) SVD methods
- Variance Analysis: Explained variance ratios and cumulative variance calculations
- Feature Importance: Component loading analysis for feature interpretation
Data Preprocessing
- Normalization: Row and column normalization utilities
- Log Transformations: Log1P transformations for numerical stability
- Centering: Optional data centering for PCA and other algorithms
Installation
Add this to your Cargo.toml:
```toml
[dependencies]
single-algebra = "0.8.6"
```
Usage Examples
Sparse PCA with Builder Pattern
```rust
// NOTE: module paths, type parameters, and argument values in this
// snippet are illustrative reconstructions; check the crate docs for
// the exact signatures.
use single_algebra::sparse::CsrMatrix;
use single_algebra::dimred::{SparsePCABuilder, SVDMethod};
use single_algebra::dimred::PowerIterationNormalizer;

// Create or load your sparse matrix (samples × features)
let sparse_matrix: CsrMatrix<f64> = create_your_sparse_matrix();

// Build PCA with customized parameters
let mut pca = SparsePCABuilder::new()
    .n_components(50)
    .center(true)
    .verbose(true)
    .svd_method(SVDMethod::Random {
        n_oversamples: 10,
        n_power_iterations: 7,
        normalizer: PowerIterationNormalizer::QR,
    })
    .build();

// Fit and transform data
let transformed = pca.fit_transform(&sparse_matrix).unwrap();

// Analyze results
let explained_variance_ratio = pca.explained_variance_ratio().unwrap();
let cumulative_variance = pca.cumulative_explained_variance_ratio().unwrap();
let feature_importance = pca.feature_importances().unwrap();
```
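The cumulative ratios make it easy to pick the smallest number of components that reaches a target coverage. A plain-Rust sketch, assuming `cumulative_variance` is a `Vec<f64>` as above:

```rust
// Sketch: smallest k whose cumulative explained variance reaches 90%.
// Assumes `cumulative_variance` is a Vec<f64> in component order.
let k = cumulative_variance
    .iter()
    .position(|&v| v >= 0.90)
    .map(|i| i + 1)
    .unwrap_or(cumulative_variance.len());
println!("{k} components explain at least 90% of the variance");
```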
Masked Sparse PCA for Feature Subset Analysis
```rust
// Paths, type names, and argument values are illustrative, as above.
use single_algebra::dimred::{MaskedSparsePCABuilder, SVDMethod};

// Create a feature mask (true = include, false = exclude)
let feature_mask = vec![true, false, true, true, false, true]; // Include features 0, 2, 3, 5

// Build masked PCA
let mut masked_pca = MaskedSparsePCABuilder::new()
    .n_components(2)
    .mask(feature_mask)
    .center(true)
    .verbose(false)
    .svd_method(SVDMethod::Lanczos)
    .build();

// Perform PCA on selected features only
let transformed = masked_pca.fit_transform(&sparse_matrix).unwrap();
```
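Masks do not have to be hand-written; they can be derived from per-feature statistics. A plain-Rust sketch using the column sums produced in the matrix-operations example below (the zero threshold is an arbitrary illustration):

```rust
// Sketch: keep only features with any signal, i.e. a non-zero column sum.
// `col_sums` as produced by `sum_col` in the next example.
let feature_mask: Vec<bool> = col_sums.iter().map(|&s| s > 0.0).collect();
```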
Sparse Matrix Operations
```rust
// Paths and type parameters are illustrative.
use single_algebra::sparse::{CooMatrix, CsrMatrix};
use single_algebra::sparse::MatrixSum;

// Create a sparse matrix
let mut coo: CooMatrix<f64> = CooMatrix::new(n_rows, n_cols);
// ... populate with data ...
let csr: CsrMatrix<f64> = (&coo).into();

// Efficient column operations
let col_sums: Vec<f64> = csr.sum_col().unwrap();
let col_squared_sums: Vec<f64> = csr.sum_col_squared().unwrap();
```
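Column statistics such as means follow directly from these sums. A sketch, assuming the matrix exposes an `nrows()` accessor (an assumption, not a confirmed API):

```rust
// Sketch: per-column means from the sums above; `nrows()` is assumed.
let n = csr.nrows() as f64;
let col_means: Vec<f64> = col_sums.iter().map(|&s| s / n).collect();
```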
Data Preprocessing
```rust
// The trait names and path below are illustrative; check the docs for
// the exact preprocessing traits to import.
use single_algebra::{Normalize, Log1P};

// Apply preprocessing transformations
let normalized_data = your_data.normalize()?;
let log_transformed = your_data.log1p()?;
```
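The reason `log1p` is preferred over a naive `ln(1 + x)` is floating-point precision: for very small `x`, `1.0 + x` rounds to exactly `1.0` and the value is lost. A minimal std-Rust demonstration:

```rust
// For x below f64 machine epsilon, (1.0 + x) rounds to exactly 1.0,
// so the naive form returns 0.0 while ln_1p keeps the small value.
let x = 1e-17_f64;
assert_eq!((1.0 + x).ln(), 0.0); // naive form loses x entirely
assert!(x.ln_1p() > 0.0);        // log1p preserves it
```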
Algorithm Selection Guide
When to Use Each PCA Variant
- SparsePCA: For standard dimensionality reduction on sparse matrices
- MaskedSparsePCA: When you need to analyze specific feature subsets or handle missing data patterns
SVD Method Selection
- Lanczos: More accurate, deterministic results. Best for smaller problems or when precision is critical
- Randomized: Faster computation, especially for large matrices. Configurable accuracy vs. speed trade-off
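In the builder API from the examples above, this choice is a single call (variant and field names are illustrative, as before):

```rust
// Exact, deterministic SVD — best for smaller problems
let exact = SparsePCABuilder::new()
    .n_components(20)
    .svd_method(SVDMethod::Lanczos)
    .build();

// Randomized SVD — faster on large matrices; accuracy is tunable via
// oversampling and power iterations (field names are illustrative)
let fast = SparsePCABuilder::new()
    .n_components(20)
    .svd_method(SVDMethod::Random {
        n_oversamples: 10,
        n_power_iterations: 4,
        normalizer: PowerIterationNormalizer::QR,
    })
    .build();
```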
Performance Optimization
- Use sparse matrices (CSR format) for datasets with >90% zero values (see the density sketch after this list)
- Enable verbose mode to monitor performance and convergence
- For very large datasets, consider using randomized SVD with appropriate oversampling
- Parallel processing is automatically utilized for transformation operations
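A quick way to check the first point is to compute the density from the stored non-zero count. A plain-Rust sketch; `nnz`, `n_rows`, and `n_cols` are whatever your loader reports:

```rust
// Sketch: decide whether CSR pays off. >90% zeros means density < 0.10.
fn density(nnz: usize, n_rows: usize, n_cols: usize) -> f64 {
    nnz as f64 / (n_rows as f64 * n_cols as f64)
}

let use_sparse = density(nnz, n_rows, n_cols) < 0.10;
```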
Planned Features 🚧
- t-SNE: t-Distributed Stochastic Neighbor Embedding for non-linear visualization
- UMAP: Uniform Manifold Approximation and Projection for manifold learning
- Additional similarity measures: More distance metrics and similarity functions
- Batch processing: Enhanced support for processing data in chunks
Performance Focus
This library is specifically optimized for:
- Large sparse datasets (text analysis, genomics, recommendation systems)
- Memory-constrained environments
- High-dimensional data requiring dimensionality reduction
- Scientific computing workflows requiring numerical precision
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the BSD 3-Clause License - see the LICENSE.md file for details.
Acknowledgments
- The LAPACK integration is built upon the `nalgebra-lapack` crate
- Some components are inspired by scikit-learn's implementations
- The Faer backend leverages the high-performance `faer` crate