libsvm-rs 0.2.0

A pure Rust reimplementation of the classic LIBSVM library, targeting numerical equivalence and model-file compatibility.

Status: Early development (February 2026). Core implementation not yet started.

What is LIBSVM?

LIBSVM is one of the most widely cited machine learning libraries ever created:

  • Authors: Chih-Chung Chang and Chih-Jen Lin (National Taiwan University).
  • First release: ~2000, still actively maintained (v3.37, December 2025).
  • Citations: >53,000 (Google Scholar) for the original paper.
  • Core functionality: Efficient training and inference for Support Vector Machines (SVMs).
    • Classification: C-SVC, ν-SVC
    • Regression: ε-SVR, ν-SVR
    • Distribution estimation / novelty detection: one-class SVM
  • Key features:
    • Multiple kernels: linear, polynomial, RBF (Gaussian), sigmoid, precomputed.
    • Probability estimates (via Platt scaling).
    • Cross-validation and parameter selection helpers.
    • Simple text-based model format for interoperability.
    • CLI tools: svm-train, svm-predict, svm-scale.
  • Strengths: Battle-tested SMO (Sequential Minimal Optimization) solver, excellent performance on sparse/high-dimensional data (text classification, bioinformatics, sensor data), compact codebase (~3,300 LOC core).
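
To make the sparse representation concrete: LIBSVM stores each instance as (index, value) pairs with ascending indices, and kernels are computed directly on that form. A minimal Rust sketch of the RBF kernel over such pairs (illustrative only, not the final libsvm-rs API; the original C++ computes the same quantity via dot products):

/// RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) over sparse vectors stored
/// as (index, value) pairs with indices sorted ascending (LIBSVM's convention).
pub fn rbf_kernel(x: &[(i32, f64)], z: &[(i32, f64)], gamma: f64) -> f64 {
    let (mut i, mut j, mut sq_dist) = (0, 0, 0.0);
    while i < x.len() && j < z.len() {
        let ((xi, xv), (zj, zv)) = (x[i], z[j]);
        if xi == zj {
            let d = xv - zv;
            sq_dist += d * d;
            i += 1;
            j += 1;
        } else if xi < zj {
            sq_dist += xv * xv;
            i += 1;
        } else {
            sq_dist += zv * zv;
            j += 1;
        }
    }
    // Indices present in only one vector contribute their full squared value.
    sq_dist += x[i..].iter().map(|&(_, v)| v * v).sum::<f64>();
    sq_dist += z[j..].iter().map(|&(_, v)| v * v).sum::<f64>();
    (-gamma * sq_dist).exp()
}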

Why a Pure Rust Port?

Existing Rust options for SVMs don't provide full LIBSVM-compatible training:

Option      Type                   Pros                                   Cons
libsvm      FFI bindings to C++    Full feature parity                    Stale (last updated 2022), requires native build
linfa-svm   Pure Rust (linfa)      Modern API, active                     Different algorithms/heuristics, not compatible
smartcore   Pure Rust              Good coverage, active                  Approximate solver, not LIBSVM-equivalent
ffsvm       Pure Rust              LIBSVM model loading, fast inference   Prediction only; no training

This project aims to fill the gap by providing:

  • Numerical equivalence with LIBSVM (same predictions and model files on benchmark datasets, within floating-point tolerance).
  • Full memory/thread safety via Rust's ownership model — no undefined behavior in sparse data handling.
  • Zero C/C++ dependencies at runtime (pure Rust, no native linkage).
  • Fearless concurrency (e.g., parallel cross-validation with Rayon).
  • Easy deployment: single binary, WebAssembly support for browser inference.
  • Modern ergonomics while preserving compatibility (builders, iterators, Result-based error handling).

Ideal for:

  • Reproducible research needing LIBSVM-compatible results.
  • Embedded/lightweight ML (WASM, edge devices).
  • Rust data/ML pipelines without native build headaches.

A Note on Numerical Equivalence

We target numerical equivalence, not bitwise identity. Floating-point results across different compilers (GCC vs LLVM) and languages are not guaranteed to be identical due to operation reordering, FMA instructions, and intermediate precision differences; even two C++ builds of LIBSVM compiled with different compilers or flags can disagree in the last bits.

In practice, this means:

  • Identical predicted labels on benchmark datasets.
  • Probabilities within ~1e-8 tolerance.
  • Model files interoperable with original LIBSVM (loadable by either implementation).
  • Same support vectors selected (barring degenerate tie-breaking cases).
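
Concretely, the integration tests could enforce these rules along the lines of the following sketch (the Prediction layout of one label plus per-class probabilities is an assumption, not a committed API):

/// One prediction: the predicted label plus per-class probability estimates.
type Prediction = (f64, Vec<f64>);

/// Labels must match exactly; probabilities only within `tol` (e.g. 1e-8).
fn assert_equivalent(ours: &[Prediction], reference: &[Prediction], tol: f64) {
    assert_eq!(ours.len(), reference.len(), "prediction counts differ");
    for ((label_a, probs_a), (label_b, probs_b)) in ours.iter().zip(reference) {
        assert_eq!(label_a, label_b, "predicted labels must match exactly");
        assert_eq!(probs_a.len(), probs_b.len(), "class counts differ");
        for (p, q) in probs_a.iter().zip(probs_b) {
            assert!((p - q).abs() <= tol, "probability differs by more than {tol}");
        }
    }
}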

Goals

  1. Compatibility

    • Pass all official LIBSVM test scenarios.
    • Equivalent output (predictions, probabilities, model files) on standard datasets (heart_scale, a9a, etc.).
    • Model files readable by both this library and original LIBSVM.
  2. Safety

    • Safe Rust wherever possible (unsafe only where heavily justified and tested).
    • Comprehensive error handling (thiserror).
    • Graceful handling of malformed input.
  3. Performance

    • Target: match original C++ speed after optimization (initial port may be 10–20% slower).
    • Optional Rayon parallelism for cross-validation and grid search.
  4. Extras (Post-MVP)

    • PyO3 bindings for Python drop-in replacement.
    • WASM examples.
    • Optional dense matrix support via ndarray.
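
As a sketch of what goal 3's optional Rayon parallelism could look like, here is a parallel grid search; the evaluate closure stands in for k-fold cross-validation, whose exact signature in libsvm-rs is not settled yet:

use rayon::prelude::*;

/// Evaluate every (C, gamma) pair in parallel and return (accuracy, C, gamma)
/// for the best one. `evaluate` is a stand-in for cross-validation.
pub fn grid_search<F>(cs: &[f64], gammas: &[f64], evaluate: F) -> (f64, f64, f64)
where
    F: Fn(f64, f64) -> f64 + Sync,
{
    let grid: Vec<(f64, f64)> = cs
        .iter()
        .flat_map(|&c| gammas.iter().map(move |&g| (c, g)))
        .collect();

    grid.par_iter()
        .map(|&(c, g)| (evaluate(c, g), c, g))
        .reduce(
            || (f64::NEG_INFINITY, f64::NAN, f64::NAN),
            |a, b| if a.0 >= b.0 { a } else { b },
        )
}

Each (C, gamma) evaluation is independent, so this parallelizes cleanly without touching the single-model training path.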

Features Roadmap

  • Core data structures (SvmNode, SvmProblem, SvmParameter, SvmModel)
  • All kernels (linear, polynomial, RBF, sigmoid, precomputed)
  • Kernel cache
  • Full SMO solver (C-SVC, ν-SVC, ε-SVR, ν-SVR, one-class)
  • Shrinking heuristic
  • Probability estimates (Platt scaling)
  • Cross-validation (parallel optional)
  • Model save/load (exact LIBSVM text format)
  • CLI tools: svm-train-rs, svm-predict-rs, svm-scale-rs
  • Comprehensive test suite with reference outputs
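
For reference, the save/load item above targets LIBSVM's own plain-text model layout; an abridged example for a two-class RBF C-SVC model looks like this (values are illustrative, and the optional probA/probB lines of probability models are omitted):

svm_type c_svc
kernel_type rbf
gamma 0.5
nr_class 2
total_sv 3
rho 0.424462
label 1 -1
nr_sv 2 1
SV
1 1:0.166667 2:0.5 3:-0.333333
0.5 1:-0.25 3:0.75
-1 2:0.125 4:1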

Installation

# Cargo.toml — when published
[dependencies]
libsvm-rs = "0.2.0"

Until published:

cargo add libsvm-rs --git https://github.com/ricardofrantz/libsvm-rs

Usage Example

use libsvm_rs::{SvmParameter, SvmType, KernelType, Trainer, Predictor};

let mut param = SvmParameter::default();
param.svm_type = SvmType::CSvc;
param.kernel_type = KernelType::Rbf;
param.gamma = 0.5;
param.c = 1.0;

let problem = /* load your SvmProblem */;
let model = Trainer::train(&problem, &param)?;

let nodes = /* your test instance as Vec<SvmNode> */;
let prediction = Predictor::predict(&model, &nodes);
println!("Predicted label: {}", prediction);

See examples/ for full demos (once implemented).

Development Plan

Project Structure

src/
  lib.rs
  types.rs      # SvmNode, SvmProblem, SvmParameter, SvmModel
  kernel.rs     # kernel functions + cache
  solver.rs     # core SMO
  cache.rs      # LRU kernel cache
  io.rs         # model/problem parsing (LIBSVM text format)
  bin/
    train.rs
    predict.rs
    scale.rs
tests/
  integration/
examples/
benches/
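
A rough sketch of what types.rs might contain, mirroring LIBSVM's C structs (svm_node, svm_problem, svm_parameter); the names and exact fields here are assumptions, not the final API:

/// SVM formulations supported by LIBSVM.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SvmType { CSvc, NuSvc, EpsilonSvr, NuSvr, OneClass }

/// Kernel functions supported by LIBSVM.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum KernelType { Linear, Polynomial, Rbf, Sigmoid, Precomputed }

/// One sparse feature: 1-based index plus value (LIBSVM's `svm_node`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SvmNode {
    pub index: i32,
    pub value: f64,
}

/// A training set: one label and one sparse vector per instance
/// (LIBSVM's `svm_problem`, with owned Vecs instead of raw pointers).
#[derive(Clone, Debug, Default)]
pub struct SvmProblem {
    pub labels: Vec<f64>,
    pub instances: Vec<Vec<SvmNode>>,
}

/// Training parameters (a subset of LIBSVM's `svm_parameter`).
#[derive(Clone, Debug)]
pub struct SvmParameter {
    pub svm_type: SvmType,
    pub kernel_type: KernelType,
    pub gamma: f64,
    pub c: f64,
    pub eps: f64,           // SMO stopping tolerance
    pub cache_size_mb: f64, // kernel cache size in megabytes
}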

Phases

Phase   Description                                                      Estimated Effort
0       Repository setup, CI, dependencies                               1–2 days
1       Data structures & I/O (parsing, model format)                    1–2 weeks
2       Kernels, cache & prediction (load pre-trained models, verify)    1–2 weeks
3       Core SMO solver (all SVM types)                                  6–12 weeks
4       Probability estimates, shrinking, cross-validation               2–4 weeks
5       CLI tools (svm-train-rs, svm-predict-rs, svm-scale-rs)           1–2 weeks
6       Testing & validation (reference outputs, fuzzing, benchmarks)    Ongoing
7       Documentation, polish, publish to crates.io                      1–2 weeks

Total estimated effort: 3–6 months.

Phase 3 is the bulk of the work — the SMO solver in svm.cpp is ~1,000 lines of subtle numerical code with heuristics (working set selection, shrinking, cache management). Translating C++ manual memory management to Rust ownership patterns, plus verifying numerical correctness across all SVM types, is the primary challenge.
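
To give a flavour of that subtlety, here is a simplified sketch of first-order working-set selection (the maximal-violating-pair rule); the actual solver layers a second-order criterion, shrinking, and kernel caching on top of this:

/// Pick the maximal violating pair (i, j) for the SMO sub-problem, or None if
/// the current alphas are optimal within `eps`. `grad[t]` is the gradient of
/// the dual objective, `y[t]` is +1.0 or -1.0, and `c` is the upper bound.
fn select_working_set(
    y: &[f64],
    alpha: &[f64],
    grad: &[f64],
    c: f64,
    eps: f64,
) -> Option<(usize, usize)> {
    // t can move "up" if increasing y[t]*alpha[t] is feasible, "down" likewise.
    let in_up = |t: usize| (y[t] > 0.0 && alpha[t] < c) || (y[t] < 0.0 && alpha[t] > 0.0);
    let in_low = |t: usize| (y[t] > 0.0 && alpha[t] > 0.0) || (y[t] < 0.0 && alpha[t] < c);

    let (mut i, mut g_max) = (None, f64::NEG_INFINITY);
    let (mut j, mut g_min) = (None, f64::INFINITY);
    for t in 0..y.len() {
        let v = -y[t] * grad[t];
        if in_up(t) && v > g_max {
            i = Some(t);
            g_max = v;
        }
        if in_low(t) && v < g_min {
            j = Some(t);
            g_min = v;
        }
    }
    match (i, j) {
        (Some(i), Some(j)) if g_max - g_min >= eps => Some((i, j)),
        _ => None, // KKT conditions satisfied within eps: stop.
    }
}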

Key References

  • Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27, 2011.
  • Rong-En Fan, Pai-Hsuen Chen, and Chih-Jen Lin. Working Set Selection Using Second Order Information for Training Support Vector Machines. Journal of Machine Learning Research, 6:1889–1918, 2005.
  • LIBSVM homepage and implementation notes: https://www.csie.ntu.edu.tw/~cjlin/libsvm/

Testing Strategy

  1. Run original LIBSVM on benchmark datasets → save all outputs as reference.
  2. Integration tests compare against reference:
    • Exact label matches.
    • Probabilities within tolerance (float-cmp with ε ≈ 1e-8).
    • Model file compatibility (load in both directions).
  3. Include regression suite from official LIBSVM tools/ subdirectory.
  4. Fuzz with cargo-fuzz on input parsing.
  5. Benchmark with criterion against original C++ implementation.
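
For step 1, reference outputs can be generated with the original LIBSVM command-line tools, for example (these are standard svm-train/svm-predict options; the exact parameter grid is up to the test suite):

# Train an RBF C-SVC with probability estimates on heart_scale,
# then predict on the same file and keep both outputs as references.
./svm-train -s 0 -t 2 -c 1 -g 0.5 -b 1 heart_scale heart_scale.model
./svm-predict -b 1 heart_scale heart_scale.model heart_scale.reference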

Contributing

Contributions welcome! Especially:

  • Translating specific solver components.
  • Adding dataset-based tests.
  • Performance improvements (preserving numerical behavior).

Open an issue first for major changes.

License

BSD-3-Clause (same as original LIBSVM) for maximum compatibility.

Acknowledgments

  • Original LIBSVM by Chih-Chung Chang and Chih-Jen Lin.
  • Existing Rust ML ecosystem (linfa, smartcore, ffsvm) for prior art.