libsvm-rs 0.3.0

Pure Rust reimplementation of LIBSVM — SVM training and prediction

libsvm-rs

A pure Rust reimplementation of the classic LIBSVM library, targeting numerical equivalence and model-file compatibility.

Status: Active development (February 2026). Phases 0–3 complete (types, I/O, kernels, cache, prediction, full SMO solver). Training works for all 5 SVM types.

What is LIBSVM?

LIBSVM is one of the most widely cited machine learning libraries ever created:

  • Authors: Chih-Chung Chang and Chih-Jen Lin (National Taiwan University).
  • First release: ~2000, still actively maintained (v3.37, December 2025).
  • Citations: >53,000 (Google Scholar) for the original paper.
  • Core functionality: Efficient training and inference for Support Vector Machines (SVMs).
    • Classification: C-SVC, ν-SVC
    • Regression: ε-SVR, ν-SVR
    • Distribution estimation / novelty detection: one-class SVM
  • Key features:
    • Multiple kernels: linear, polynomial, RBF (Gaussian), sigmoid, precomputed.
    • Probability estimates (via Platt scaling).
    • Cross-validation and parameter selection helpers.
    • Simple text-based model format for interoperability.
    • CLI tools: svm-train, svm-predict, svm-scale.
  • Strengths: Battle-tested SMO (Sequential Minimal Optimization) solver, excellent performance on sparse/high-dimensional data (text classification, bioinformatics, sensor data), compact codebase (~3,300 LOC core).
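
The kernels listed above can be sketched on sparse vectors as follows. This is a minimal illustration, not the crate's code: the `Node` type and all function names are hypothetical stand-ins, and the formulas are the standard LIBSVM definitions (linear, polynomial, RBF, sigmoid). Both inputs are assumed to be sorted by ascending feature index.

```rust
use std::cmp::Ordering;

/// One non-zero feature of a sparse vector (hypothetical stand-in type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Node {
    pub index: u32,
    pub value: f64,
}

/// Dot product of two sparse vectors sorted by ascending index.
pub fn dot(x: &[Node], y: &[Node]) -> f64 {
    let (mut i, mut j, mut sum) = (0, 0, 0.0);
    while i < x.len() && j < y.len() {
        match x[i].index.cmp(&y[j].index) {
            Ordering::Equal => {
                sum += x[i].value * y[j].value;
                i += 1;
                j += 1;
            }
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
        }
    }
    sum
}

/// Squared Euclidean distance; a feature missing on one side counts as 0.
pub fn squared_distance(x: &[Node], y: &[Node]) -> f64 {
    let (mut i, mut j, mut sum) = (0, 0, 0.0);
    while i < x.len() && j < y.len() {
        match x[i].index.cmp(&y[j].index) {
            Ordering::Equal => {
                let d = x[i].value - y[j].value;
                sum += d * d;
                i += 1;
                j += 1;
            }
            Ordering::Less => {
                sum += x[i].value * x[i].value;
                i += 1;
            }
            Ordering::Greater => {
                sum += y[j].value * y[j].value;
                j += 1;
            }
        }
    }
    // Add whatever is left over in the longer vector.
    sum += x[i..].iter().map(|n| n.value * n.value).sum::<f64>();
    sum += y[j..].iter().map(|n| n.value * n.value).sum::<f64>();
    sum
}

/// K(x, y) = x . y
pub fn linear(x: &[Node], y: &[Node]) -> f64 {
    dot(x, y)
}

/// K(x, y) = (gamma * x . y + coef0)^degree
pub fn polynomial(x: &[Node], y: &[Node], gamma: f64, coef0: f64, degree: i32) -> f64 {
    (gamma * dot(x, y) + coef0).powi(degree)
}

/// K(x, y) = exp(-gamma * ||x - y||^2)
pub fn rbf(x: &[Node], y: &[Node], gamma: f64) -> f64 {
    (-gamma * squared_distance(x, y)).exp()
}

/// K(x, y) = tanh(gamma * x . y + coef0)
pub fn sigmoid(x: &[Node], y: &[Node], gamma: f64, coef0: f64) -> f64 {
    (gamma * dot(x, y) + coef0).tanh()
}
```

The merge-style walk touches only non-zero entries, which is what makes sparse, high-dimensional data cheap to handle. The precomputed kernel is omitted because it is a table lookup rather than a formula.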

Why a Pure Rust Port?

Existing Rust options for SVMs don't provide full LIBSVM-compatible training:

| Option | Type | Pros | Cons |
|---|---|---|---|
| libsvm | FFI bindings to C++ | Full feature parity | Stale (last updated 2022), requires native build |
| linfa-svm | Pure Rust (linfa) | Modern API, active | Different algorithms/heuristics, not compatible |
| smartcore | Pure Rust | Good coverage, active | Approximate solver, not LIBSVM-equivalent |
| ffsvm | Pure Rust | LIBSVM model loading, fast inference | Prediction only, no training |

This project aims to fill the gap by providing:

  • Numerical equivalence with LIBSVM (same predictions and model files on benchmark datasets, within floating-point tolerance).
  • Full memory/thread safety via Rust's ownership model — no undefined behavior in sparse data handling.
  • Zero C/C++ dependencies at runtime (pure Rust, no native linkage).
  • Fearless concurrency (e.g., parallel cross-validation with Rayon).
  • Easy deployment: single binary, WebAssembly support for browser inference.
  • Modern ergonomics while preserving compatibility (builders, iterators, Result-based error handling).

Ideal for:

  • Reproducible research needing LIBSVM-compatible results.
  • Embedded/lightweight ML (WASM, edge devices).
  • Rust data/ML pipelines without native build headaches.

A Note on Numerical Equivalence

We target numerical equivalence, not bitwise identity. Floating-point results across different compilers (GCC vs LLVM) and languages are not guaranteed to be identical due to operation reordering, FMA instructions, and intermediate precision differences. The same variation occurs within C++ itself: one code base can give slightly different results under different compilers or optimization flags.

In practice, this means:

  • Identical predicted labels on benchmark datasets.
  • Probabilities within ~1e-8 tolerance.
  • Model files interoperable with original LIBSVM (loadable by either implementation).
  • Same support vectors selected (barring degenerate tie-breaking cases).
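
To make the distinction concrete, here is a small sketch showing that merely reordering a sum changes the last bits of the result, while a tolerance check still treats the two values as equal. The helper names are hypothetical, and the absolute-difference test is an assumption about how such a comparison could be written, not the project's actual test code.

```rust
/// True when `a` and `b` differ by at most `tol` (absolute difference).
/// Returns false if either value is NaN.
pub fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    (a - b).abs() <= tol
}

/// Sum the values left to right.
pub fn sum_forward(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

/// Sum the same values right to left.
pub fn sum_backward(xs: &[f64]) -> f64 {
    xs.iter().rev().sum()
}
```

For `[0.1, 0.2, 0.3]` the two sums are not bitwise equal, yet `approx_eq` with a tolerance of `1e-8` accepts them. A kernel evaluation or gradient update accumulates many such sums, so small discrepancies of this kind are expected between implementations.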

Goals

  1. Compatibility

    • Pass all official LIBSVM test scenarios.
    • Equivalent output (predictions, probabilities, model files) on standard datasets (heart_scale, a9a, etc.).
    • Model files readable by both this library and original LIBSVM.
  2. Safety

    • 100% safe Rust where possible (no unsafe unless heavily justified and tested).
    • Comprehensive error handling (thiserror).
    • Graceful handling of malformed input.
  3. Performance

    • Target: match original C++ speed after optimization (initial port may be 10–20% slower).
    • Optional Rayon parallelism for cross-validation and grid search.
  4. Extras (Post-MVP)

    • PyO3 bindings for Python drop-in replacement.
    • WASM examples.
    • Optional dense matrix support via ndarray.
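
As an illustration of why cross-validation parallelizes well, the sketch below splits sample indices into folds and scores each fold on its own thread. It uses `std::thread::scope` from the standard library rather than Rayon so that it stays dependency-free. The function names are hypothetical, and the shuffling and class stratification that a real cross-validation performs are left out.

```rust
/// Split indices 0..n into k contiguous folds of near-equal size.
/// Returns one (train_indices, test_indices) pair per fold.
pub fn k_fold_indices(n: usize, k: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    assert!(k >= 2 && k <= n, "need 2 <= k <= n");
    (0..k)
        .map(|fold| {
            // Fold boundaries at fold * n / k spread the remainder evenly.
            let start = fold * n / k;
            let end = (fold + 1) * n / k;
            let test: Vec<usize> = (start..end).collect();
            let train: Vec<usize> = (0..start).chain(end..n).collect();
            (train, test)
        })
        .collect()
}

/// Run `eval(train, test)` for every fold, one scoped thread per fold.
/// Scores are returned in fold order.
pub fn evaluate_folds<F>(n: usize, k: usize, eval: F) -> Vec<f64>
where
    F: Fn(&[usize], &[usize]) -> f64 + Sync,
{
    let folds = k_fold_indices(n, k);
    let eval = &eval;
    std::thread::scope(|s| {
        let handles: Vec<_> = folds
            .iter()
            .map(|(train, test)| s.spawn(move || eval(train.as_slice(), test.as_slice())))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("fold thread panicked"))
            .collect()
    })
}
```

Each fold only reads shared data, so the borrow checker accepts the sharing without locks. In practice `eval` would train on the `train` indices and return an accuracy or error on `test`.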

Features Roadmap

  • Core data structures (SvmNode, SvmProblem, SvmParameter, SvmModel)
  • All kernels (linear, polynomial, RBF, sigmoid, precomputed)
  • Kernel cache (O(1) LRU)
  • Model save/load (exact LIBSVM text format, byte-exact roundtrip)
  • Prediction (verified zero mismatches against C svm-predict)
  • Full SMO solver (C-SVC, ν-SVC, ε-SVR, ν-SVR, one-class)
  • Shrinking heuristic
  • WSS3 working-set selection (Fan et al., JMLR 2005)
  • QMatrix implementations (SvcQ, OneClassQ, SvrQ)
  • Probability estimates (Platt scaling)
  • Cross-validation (parallel optional)
  • CLI tools: svm-train-rs, svm-predict-rs, svm-scale-rs
  • Comprehensive test suite with reference outputs
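
For readers unfamiliar with the LIBSVM text format for training data, each line has the shape `<label> <index>:<value> <index>:<value> ...` with indices in ascending order. The following is a minimal parser sketch for one such line. The `Instance` and `ParseError` types and the `parse_line` function are hypothetical and are not the crate's `io` API.

```rust
/// One parsed training example (hypothetical stand-in type).
#[derive(Debug, PartialEq)]
pub struct Instance {
    pub label: f64,
    pub features: Vec<(u32, f64)>,
}

/// Reasons a line can be rejected.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingLabel,
    BadLabel(String),
    BadFeature(String),
    UnsortedIndex(u32),
}

/// Parse a single `<label> <index>:<value> ...` line.
pub fn parse_line(line: &str) -> Result<Instance, ParseError> {
    let mut tokens = line.split_whitespace();

    let label_tok = tokens.next().ok_or(ParseError::MissingLabel)?;
    let label: f64 = label_tok
        .parse()
        .map_err(|_| ParseError::BadLabel(label_tok.to_string()))?;

    let mut features = Vec::new();
    let mut prev: Option<u32> = None;
    for tok in tokens {
        let bad = || ParseError::BadFeature(tok.to_string());
        let (idx, val) = tok.split_once(':').ok_or_else(bad)?;
        let index: u32 = idx.parse().map_err(|_| bad())?;
        let value: f64 = val.parse().map_err(|_| bad())?;

        // Indices must be strictly ascending within a line.
        if prev.map_or(false, |p| index <= p) {
            return Err(ParseError::UnsortedIndex(index));
        }
        prev = Some(index);
        features.push((index, value));
    }
    Ok(Instance { label, features })
}
```

Returning a typed error instead of panicking is what allows malformed input to be handled gracefully and makes the parser a natural fuzzing target.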

Installation

# Cargo.toml — when published
[dependencies]
libsvm-rs = "0.3.0"

Until published:

cargo add libsvm-rs --git https://github.com/ricardofrantz/libsvm-rs

Usage Example

use libsvm_rs::io::{load_problem, save_model};
use libsvm_rs::train::svm_train;
use libsvm_rs::predict::predict;
use libsvm_rs::{SvmParameter, SvmType, KernelType};

fn main() {
    // Load training data (LIBSVM text format)
    let problem = load_problem("heart_scale").unwrap();

    // Set parameters
    let param = SvmParameter {
        svm_type: SvmType::CSvc,
        kernel_type: KernelType::Rbf,
        gamma: 1.0 / 13.0, // 1/num_features (heart_scale has 13 features)
        c: 1.0,
        ..Default::default()
    };

    // Train
    let model = svm_train(&problem, &param);

    // Predict the label of the first training instance
    let label = predict(&model, &problem.instances[0]);
    println!("Predicted label: {}", label);

    // Save model (loadable by original LIBSVM)
    save_model("heart_scale.model", &model).unwrap();
}

See examples/ for full demos (once implemented).
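
The `gamma: 1.0 / 13.0` value in the example hard-codes the feature count of heart_scale. A sketch of deriving it from the data is shown below, assuming the `1/num_features` convention where `num_features` is the largest feature index seen. The `default_gamma` helper and the plain `(index, value)` representation are hypothetical, not part of the crate.

```rust
/// Default gamma as 1 / (largest feature index in the data set).
/// Returns None when there are no features to derive it from.
pub fn default_gamma(instances: &[Vec<(u32, f64)>]) -> Option<f64> {
    let max_index = instances
        .iter()
        .flat_map(|x| x.iter().map(|&(index, _)| index))
        .max()?;
    if max_index == 0 {
        None
    } else {
        Some(1.0 / f64::from(max_index))
    }
}
```

Using the largest index rather than the number of non-zero entries matters for sparse data, where a row may omit most of its features.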

Development Plan

Project Structure

src/
  lib.rs
  types.rs      # SvmNode, SvmProblem, SvmParameter, SvmModel
  kernel.rs     # kernel functions + cache
  solver.rs     # core SMO
  cache.rs      # LRU kernel cache
  io.rs         # model/problem parsing (LIBSVM text format)
  bin/
    train.rs
    predict.rs
    scale.rs
tests/
  integration/
examples/
benches/

Phases

| Phase | Description | Estimated Effort |
|---|---|---|
| 0 | Repository setup, CI, dependencies | 1–2 days |
| 1 | Data structures & I/O (parsing, model format) | 1–2 weeks |
| 2 | Kernels, cache & prediction (load pre-trained models, verify) | 1–2 weeks |
| 3 | Core SMO solver (all SVM types) | 6–12 weeks |
| 4 | Probability estimates, shrinking, cross-validation | 2–4 weeks |
| 5 | CLI tools (svm-train-rs, svm-predict-rs, svm-scale-rs) | 1–2 weeks |
| 6 | Testing & validation (reference outputs, fuzzing, benchmarks) | Ongoing |
| 7 | Documentation, polish, publish to crates.io | 1–2 weeks |

Total estimated effort: 3–6 months.

Phase 3 is the bulk of the work — the SMO solver in svm.cpp is ~1,000 lines of subtle numerical code with heuristics (working set selection, shrinking, cache management). Translating C++ manual memory management to Rust ownership patterns, plus verifying numerical correctness across all SVM types, is the primary challenge.

Testing Strategy

  1. Run original LIBSVM on benchmark datasets → save all outputs as reference.
  2. Integration tests compare against reference:
    • Exact label matches.
    • Probabilities within tolerance (float-cmp with ε ≈ 1e-8).
    • Model file compatibility (load in both directions).
  3. Include regression suite from official LIBSVM tools/ subdirectory.
  4. Fuzz with cargo-fuzz on input parsing.
  5. Benchmark with criterion against original C++ implementation.
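
Step 2 of the strategy above could be expressed as a small comparison helper like the one below. This is a sketch under assumptions: the `Comparison` type and both functions are hypothetical, the inputs are assumed to be already parsed into flat slices, and a plain absolute difference is used in place of the float-cmp crate.

```rust
/// Summary of how one run differs from the reference run.
#[derive(Debug, PartialEq)]
pub struct Comparison {
    pub label_mismatches: usize,
    pub max_prob_diff: f64,
}

/// Compare predicted labels and probabilities against reference outputs.
pub fn compare_to_reference(
    labels: &[f64],
    ref_labels: &[f64],
    probs: &[f64],
    ref_probs: &[f64],
) -> Comparison {
    assert_eq!(labels.len(), ref_labels.len(), "label count differs");
    assert_eq!(probs.len(), ref_probs.len(), "probability count differs");

    // Labels must match exactly, so count every difference.
    let label_mismatches = labels
        .iter()
        .zip(ref_labels)
        .filter(|(a, b)| a != b)
        .count();

    // Probabilities only need to agree within a tolerance, so track the worst case.
    let max_prob_diff = probs
        .iter()
        .zip(ref_probs)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0, f64::max);

    Comparison { label_mismatches, max_prob_diff }
}

/// A run passes when every label matches and no probability is off by more than `tol`.
pub fn passes(c: &Comparison, tol: f64) -> bool {
    c.label_mismatches == 0 && c.max_prob_diff <= tol
}
```

Reporting the mismatch count and the worst deviation, rather than a bare pass/fail, makes a failing integration test easier to diagnose.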

Contributing

Contributions welcome! Especially:

  • Translating specific solver components.
  • Adding dataset-based tests.
  • Performance improvements (preserving numerical behavior).

Open an issue first for major changes.

Changelog

v0.3.0 (February 2026) — SMO Solver

  • Full SMO solver for all 5 SVM types: C-SVC, ν-SVC, one-class, ε-SVR, ν-SVR
  • WSS3 working-set selection (second-order heuristic from Fan et al., JMLR 2005)
  • Shrinking heuristic with gradient reconstruction
  • QMatrix trait with three implementations: SvcQ, OneClassQ, SvrQ
  • svm_train function producing SvmModel compatible with C LIBSVM
  • Multiclass support via one-vs-one with class grouping and sv_coef assembly
  • Cache::swap_index bug fix — added column swap loop (critical for shrinking correctness)
  • Kernel refactor: Vec<&[SvmNode]> for swappable data point references
  • 50 tests (12 new), verified against C LIBSVM reference outputs
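
The one-vs-one scheme mentioned in this release can be illustrated with a short voting sketch. It assumes the pairwise decision values are ordered (0,1), (0,2), ..., (1,2), ... and that ties go to the lowest class index, which mirrors the behavior of the original LIBSVM. The `ovo_vote` function is hypothetical and is not the crate's API.

```rust
/// Pick the winning class index from one-vs-one decision values.
/// `dec_values` holds one value per class pair (i, j) with i < j,
/// in the order (0,1), (0,2), ..., (1,2), ...
pub fn ovo_vote(nr_class: usize, dec_values: &[f64]) -> usize {
    assert!(nr_class >= 2, "need at least two classes");
    assert_eq!(
        dec_values.len(),
        nr_class * (nr_class - 1) / 2,
        "one decision value per class pair"
    );

    let mut votes = vec![0usize; nr_class];
    let mut p = 0;
    for i in 0..nr_class {
        for j in (i + 1)..nr_class {
            // A positive value is a vote for class i, otherwise for class j.
            if dec_values[p] > 0.0 {
                votes[i] += 1;
            } else {
                votes[j] += 1;
            }
            p += 1;
        }
    }

    // Strict comparison keeps the first maximum, so ties go to the lowest index.
    let mut best = 0;
    for (class, &count) in votes.iter().enumerate() {
        if count > votes[best] {
            best = class;
        }
    }
    best
}
```

The returned value is an index into the model's label table, not the label itself. With three classes and decision values `[1.0, -1.0, -1.0]`, class 2 collects two votes and wins.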

v0.2.0 (February 2026) — Prediction & I/O

  • Core types: SvmNode, SvmProblem, SvmParameter, SvmModel
  • All 5 kernel functions (linear, polynomial, RBF, sigmoid, precomputed)
  • LRU kernel cache
  • Model and problem I/O (LIBSVM text format, byte-exact roundtrip)
  • Prediction (zero mismatches against C svm-predict on heart_scale)
  • Parameter validation with ν-SVC feasibility check
  • 38 tests

v0.1.0 (February 2026) — Initial Release

  • Repository setup, workspace layout, CI

License

BSD-3-Clause (same as original LIBSVM) for maximum compatibility.

Acknowledgments

  • Original LIBSVM by Chih-Chung Chang and Chih-Jen Lin.
  • Existing Rust ML ecosystem (linfa, smartcore, ffsvm) for prior art.