sparsers 0.1.1

sparse-rs: sparse matrix computation written in rust
Documentation


version note

  • v0.1.1
    • ExecutionPolicy added to common.rs.
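The note above mentions an `ExecutionPolicy` added to `common.rs`. The actual definition lives in the crate; conceptually, such a policy usually selects between a sequential and a parallel kernel at dispatch time. A purely hypothetical sketch of the idea (the names and shape here are illustrative, not sparsers' real API):

```rust
// Hypothetical sketch only: what an execution-policy switch for kernel
// dispatch could look like. Not the actual sparsers definition.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ExecutionPolicy {
    Sequential, // single-threaded kernel
    Parallel,   // multi-threaded kernel (e.g. via rayon)
}

// A caller would thread the policy down to the kernel and branch on it.
fn run_kernel(policy: ExecutionPolicy, rows: usize) -> String {
    match policy {
        ExecutionPolicy::Sequential => format!("sequential over {rows} rows"),
        ExecutionPolicy::Parallel => format!("parallel over {rows} rows"),
    }
}

fn main() {
    println!("{}", run_kernel(ExecutionPolicy::Parallel, 1024));
}
```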

Why reinvent another wheel?

  1. As we know, there are great frameworks for matrix multiplication, including with sparse matrices. As a Deep Learning Engineer who has lived and breathed PyTorch for years, I recognize it as a great framework. However, we've all been there: wrestling with torch-sparse or torch-scatter dependency hell during deployment, or hitting performance walls in CPU-bound sparse operations. Perhaps one day, when the industry fully embraces Rust for ML (who knows? lol), this kind of work could be a "building block" for rustaceans.
  2. We're truly living in an era of so-called "Agentic Engineering." I co-work with AI to write code in my day-to-day work. It's efficient, but something was missing: the feeling of "handmade". I missed the deep dives into documentation, the meditative struggle with the borrow checker, and the rewarding "click" when rust-analyzer finally clears those red lines. To me, this project, sparsers, isn't just about performance. It's a kind of nostalgia for the vanishing craftsmanship.

Features

  • spmm: sparse × dense matrix multiplication. see also
  • sddmm: multiplies two dense matrices X1 and X2, then elementwise-multiplies the result with the sparse matrix A at its nonzero locations, as in dgl
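For intuition, the two kernels can be written as dense reference computations: spmm is `A @ B` with `A` sparse, and sddmm is `C[i][j] = A[i][j] * (X1 @ X2)[i][j]`, evaluated only where `A` is nonzero. A minimal std-only Rust sketch of the sddmm semantics (a reference illustration, independent of the sparsers API):

```rust
// Dense reference for SDDMM: out[i][j] = a[i][j] * (x1 @ x2)[i][j],
// computed only at the nonzero positions of `a`.
fn sddmm_dense(a: &[Vec<f64>], x1: &[Vec<f64>], x2: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (m, n) = (a.len(), a[0].len());
    let k = x1[0].len(); // inner dimension: x1 is m×k, x2 is k×n
    let mut out = vec![vec![0.0; n]; m];
    for i in 0..m {
        for j in 0..n {
            if a[i][j] != 0.0 {
                // dot product of row i of x1 with column j of x2
                let dot: f64 = (0..k).map(|t| x1[i][t] * x2[t][j]).sum();
                out[i][j] = a[i][j] * dot;
            }
        }
    }
    out
}

fn main() {
    let a = vec![vec![0.0, 3.0], vec![4.0, 0.0]];
    let x1 = vec![vec![1.0], vec![2.0]];
    let x2 = vec![vec![5.0, 6.0]];
    // (x1 @ x2) = [[5, 6], [10, 12]], then masked and scaled by a's nonzeros
    println!("{:?}", sddmm_dense(&a, &x1, &x2)); // [[0.0, 18.0], [40.0, 0.0]]
}
```

A real kernel would of course iterate only over the stored nonzeros of a CSR matrix instead of scanning the full dense mask; this sketch just pins down what the output should be.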

naive benchmark

machine spec

Component        Specification
CPU              AMD Ryzen 5 7600 (Zen 4 Architecture)
Cores / Threads  6 Cores / 12 Threads
L3 Cache         32 MiB (Unified)

sparsers and torch (num_threads=12)

Implementation  runtime
torch           572.08 ms
sparsers        146.48 ms


  import torch
  import numpy as np
  import torch.utils.benchmark as benchmark
  from scipy.sparse import load_npz

  def setup_data(npz_path, feature_dim=128):
      sparse_matrix = load_npz(npz_path)
      
      crow_indices = torch.from_numpy(sparse_matrix.indptr).to(torch.int64)
      col_indices = torch.from_numpy(sparse_matrix.indices).to(torch.int64)
      values = torch.from_numpy(sparse_matrix.data).to(torch.float64)
      
      csr_tensor = torch.sparse_csr_tensor(
          crow_indices, col_indices, values, 
          size=sparse_matrix.shape, dtype=torch.float64
      )
      
      dense_b = torch.ones((sparse_matrix.shape[1], feature_dim), dtype=torch.float64)
      
      return csr_tensor, dense_b

  def bench_pytorch_spmm(csr, dense_b):
      return torch.mm(csr, dense_b)

  if __name__ == "__main__":
      npz_path = "/home/seom/workspace/sparsers/data/roadNet-CA.csr.npz"
      csr, dense_b = setup_data(npz_path)

      t0 = benchmark.Timer(
          stmt='bench_pytorch_spmm(csr, dense_b)',
          setup='from __main__ import bench_pytorch_spmm',
          globals={'csr': csr, 'dense_b': dense_b},
          num_threads=1,
          label='SpMM Performance',
          sub_label='PyTorch Native CSR',
          description='roadNet-CA'
      )

      
      t1 = benchmark.Timer(
          stmt='bench_pytorch_spmm(csr, dense_b)',
          setup='from __main__ import bench_pytorch_spmm',
          globals={'csr': csr, 'dense_b': dense_b},
          num_threads=12, 
          label='SpMM Performance',
          sub_label='PyTorch Native CSR (Multi-thread)',
          description='roadNet-CA'
      )

      print(t0.blocked_autorange())
      print(t1.blocked_autorange())
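The Python script above measures torch's native CSR SpMM. On the Rust side, the kernel being compared boils down to a CSR row-gather loop: each stored nonzero scales one row of the dense matrix into the output. A minimal std-only sketch of that algorithm, with a coarse `std::time::Instant` timer (not sparsers' actual implementation, and not a substitute for a proper benchmark harness):

```rust
use std::time::Instant;

// CSR × dense SpMM reference: for each row i, accumulate value * dense-row
// for every stored nonzero (indices[p], data[p]) in that row.
fn spmm_csr(
    indptr: &[usize],
    indices: &[usize],
    data: &[f64],
    dense: &[Vec<f64>],
) -> Vec<Vec<f64>> {
    let cols = dense[0].len();
    let mut out = vec![vec![0.0; cols]; indptr.len() - 1];
    for i in 0..indptr.len() - 1 {
        for p in indptr[i]..indptr[i + 1] {
            let (j, v) = (indices[p], data[p]);
            for c in 0..cols {
                out[i][c] += v * dense[j][c];
            }
        }
    }
    out
}

fn main() {
    // 2x3 sparse matrix [[0, 2, 0], [1, 0, 3]] in CSR form.
    let (indptr, indices, data) = (vec![0, 1, 3], vec![1, 0, 2], vec![2.0, 1.0, 3.0]);
    let dense = vec![vec![1.0, 1.0], vec![2.0, 2.0], vec![3.0, 3.0]];
    let t = Instant::now();
    let out = spmm_csr(&indptr, &indices, &data, &dense);
    println!("{:?} computed in {:?}", out, t.elapsed()); // [[4, 4], [10, 10]]
}
```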

simple example with ndarray

you can find more examples at examples/

use ndarray::{Array2, array};
use sparsers::{sddmm_ndarray, spmm_ndarray};

fn main() {
    // Example: SpMM using CSR matrix converted from 10 * 10 ndarray
    let sparse_array = array![
        [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 5.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ];

    let dense_matrix = array![
        // Dense matrix (10x2)
        [1.0, 2.0],
        [3.0, 4.0],
        [5.0, 6.0],
        [7.0, 8.0],
        [9.0, 10.0],
        [11.0, 12.0],
        [13.0, 14.0],
        [15.0, 16.0],
        [17.0, 18.0],
        [19.0, 20.0],
    ];
    let result = spmm_ndarray::<u32, f32>(sparse_array.view(), dense_matrix.view());
    println!("Result of SpMM:\n{:#?}", result);

    // Example: SDDMM using CSR matrix converted from 5 * 5 ndarray
    let sparse_array = array![
        [0.0, 0.0, 3.0, 0.0, 0.0],
        [4.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 5.0, 6.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 7.0],
        [0.0, 0.0, 0.0, 8.0, 0.0],
    ];

    let dense_matrix_d1 = array![
        // Dense matrix d1 (5x1)
        [1.0],
        [2.0],
        [3.0],
        [4.0],
        [5.0],
    ];
    let dense_matrix_d2 = array![
        // Dense matrix d2 (1x5)
        [1.0, 2.0, 3.0, 4.0, 5.0],
    ];

    let result = sddmm_ndarray::<u32, f32>(
        sparse_array.view(),
        dense_matrix_d1.view(),
        dense_matrix_d2.view(),
    );
    println!("Result of SDDMM:\n{:#?}", result);
}
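The examples above convert a dense ndarray to CSR internally. To make that conversion concrete, here is a std-only sketch that builds the three CSR arrays (row pointers, column indices, values) from a dense matrix, using the 5×5 SDDMM matrix above as input. This is a reference illustration, not sparsers' internal code:

```rust
// Convert a dense matrix to CSR form: (indptr, indices, data).
// indptr[i]..indptr[i+1] delimits the nonzeros of row i.
fn dense_to_csr(dense: &[Vec<f32>]) -> (Vec<u32>, Vec<u32>, Vec<f32>) {
    let mut indptr = vec![0u32];
    let mut indices = Vec::new();
    let mut data = Vec::new();
    for row in dense {
        for (j, &v) in row.iter().enumerate() {
            if v != 0.0 {
                indices.push(j as u32); // column of this nonzero
                data.push(v);           // its value
            }
        }
        indptr.push(indices.len() as u32); // running nonzero count closes the row
    }
    (indptr, indices, data)
}

fn main() {
    // The 5x5 sparse matrix from the SDDMM example above.
    let dense = vec![
        vec![0.0, 0.0, 3.0, 0.0, 0.0],
        vec![4.0, 0.0, 0.0, 0.0, 0.0],
        vec![0.0, 5.0, 6.0, 0.0, 0.0],
        vec![0.0, 0.0, 0.0, 0.0, 7.0],
        vec![0.0, 0.0, 0.0, 8.0, 0.0],
    ];
    let (indptr, indices, data) = dense_to_csr(&dense);
    println!("indptr  = {:?}", indptr);  // [0, 1, 2, 4, 5, 6]
    println!("indices = {:?}", indices); // [2, 0, 1, 2, 4, 3]
    println!("data    = {:?}", data);    // [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
}
```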

things to do

  • add more ops to the kernel: transpose (CSR to CSC), broadcasting element-wise ops, etc.
  • python binding (pyo3)
  • comprehensive benchmarks