libsvm-rs 0.3.0

Pure Rust reimplementation of LIBSVM — SVM training and prediction

libsvm-rs

A pure Rust reimplementation of the classic LIBSVM library, targeting numerical equivalence and model-file compatibility.

Status: Active development (February 2026). Phases 0–3 complete (types, I/O, kernels, cache, prediction, full SMO solver). Training works for all 5 SVM types.

What is LIBSVM?

LIBSVM is one of the most widely cited machine learning libraries ever created:

Authors: Chih-Chung Chang and Chih-Jen Lin (National Taiwan University).
First release: ~2000, still actively maintained (v3.37, December 2025).
Citations: >53,000 (Google Scholar) for the original paper.
Core functionality: Efficient training and inference for Support Vector Machines (SVMs).
- Classification: C-SVC, ν-SVC
- Regression: ε-SVR, ν-SVR
- Distribution estimation / novelty detection: one-class SVM
Key features:
- Multiple kernels: linear, polynomial, RBF (Gaussian), sigmoid, precomputed.
- Probability estimates (via Platt scaling).
- Cross-validation and parameter selection helpers.
- Simple text-based model format for interoperability.
- CLI tools: svm-train, svm-predict, svm-scale.
Strengths: Battle-tested SMO (Sequential Minimal Optimization) solver, excellent performance on sparse/high-dimensional data (text classification, bioinformatics, sensor data), compact codebase (~3,300 LOC core).
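
The four non-precomputed kernels listed above have standard closed forms, and on sparse data they all reduce to a dot product over index-sorted feature lists. The following is a minimal sketch of those formulas, not this crate's code: the `Node` type and all function names are hypothetical and chosen only for illustration.

```rust
/// One non-zero feature: an index plus its value.
/// Vectors are slices of `Node` sorted by ascending index.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Node {
    pub index: usize,
    pub value: f64,
}

/// Sparse dot product: walk both index-sorted slices in lockstep
/// and multiply only where the indices match.
pub fn dot(x: &[Node], y: &[Node]) -> f64 {
    let (mut i, mut j, mut sum) = (0, 0, 0.0);
    while i < x.len() && j < y.len() {
        if x[i].index == y[j].index {
            sum += x[i].value * y[j].value;
            i += 1;
            j += 1;
        } else if x[i].index < y[j].index {
            i += 1;
        } else {
            j += 1;
        }
    }
    sum
}

/// Squared Euclidean distance via ||x - y||^2 = x.x + y.y - 2 x.y.
pub fn squared_distance(x: &[Node], y: &[Node]) -> f64 {
    dot(x, x) + dot(y, y) - 2.0 * dot(x, y)
}

/// Linear kernel: x.y
pub fn linear_kernel(x: &[Node], y: &[Node]) -> f64 {
    dot(x, y)
}

/// Polynomial kernel: (gamma * x.y + coef0)^degree
pub fn polynomial_kernel(x: &[Node], y: &[Node], gamma: f64, coef0: f64, degree: i32) -> f64 {
    (gamma * dot(x, y) + coef0).powi(degree)
}

/// RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)
pub fn rbf_kernel(x: &[Node], y: &[Node], gamma: f64) -> f64 {
    (-gamma * squared_distance(x, y)).exp()
}

/// Sigmoid kernel: tanh(gamma * x.y + coef0)
pub fn sigmoid_kernel(x: &[Node], y: &[Node], gamma: f64, coef0: f64) -> f64 {
    (gamma * dot(x, y) + coef0).tanh()
}
```

Features missing from one vector contribute zero to the dot product, which is why the merge loop can skip them without materializing a dense vector.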

Why a Pure Rust Port?

Existing Rust options for SVMs don't provide full LIBSVM-compatible training:

Option	Type	Pros	Cons
libsvm	FFI bindings to C++	Full feature parity	Stale (last updated 2022), requires native build
linfa-svm	Pure Rust (linfa)	Modern API, active	Different algorithms/heuristics, not compatible
smartcore	Pure Rust	Good coverage, active	Approximate solver, not LIBSVM-equivalent
ffsvm	Pure Rust	LIBSVM model loading, fast inference	Prediction only — no training

This project aims to fill the gap by providing:

Numerical equivalence with LIBSVM (same predictions and model files on benchmark datasets, within floating-point tolerance).
Full memory/thread safety via Rust's ownership model — no undefined behavior in sparse data handling.
Zero C/C++ dependencies at runtime (pure Rust, no native linkage).
Fearless concurrency (e.g., parallel cross-validation with Rayon).
Easy deployment: single binary, WebAssembly support for browser inference.
Modern ergonomics while preserving compatibility (builders, iterators, Result-based error handling).

Ideal for:

Reproducible research needing LIBSVM-compatible results.
Embedded/lightweight ML (WASM, edge devices).
Rust data/ML pipelines without native build headaches.

A Note on Numerical Equivalence

We target numerical equivalence, not bitwise identity. Floating-point results across different compilers (GCC vs LLVM) and languages are not guaranteed to be identical due to operation reordering, FMA instructions, and intermediate precision differences. This is an open problem even within C++ itself.

In practice, this means:

Identical predicted labels on benchmark datasets.
Probabilities within ~1e-8 tolerance.
Model files interoperable with original LIBSVM (loadable by either implementation).
Same support vectors selected (barring degenerate tie-breaking cases).
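
The effect of operation reordering is easy to reproduce in isolation. The sketch below is purely illustrative and not taken from this crate: it sums the same values in two different orders, which is the kind of difference a compiler or a hand-ported loop can introduce.

```rust
/// Sum left to right: ((x0 + x1) + x2) + ...
pub fn sum_forward(xs: &[f64]) -> f64 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

/// Sum right to left: ((xn + xn-1) + xn-2) + ...
/// Mathematically identical to `sum_forward`, but floating-point
/// addition is not associative, so the rounded result can differ.
pub fn sum_backward(xs: &[f64]) -> f64 {
    xs.iter().rev().fold(0.0, |acc, &x| acc + x)
}
```

For the input `[0.1, 0.2, 0.3]` the two functions return values that are not bitwise equal, yet they agree to far better than the 1e-8 tolerance quoted above. That is the distinction between numerical equivalence and bitwise identity.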

Goals

Compatibility
- Pass all official LIBSVM test scenarios.
- Equivalent output (predictions, probabilities, model files) on standard datasets (heart_scale, a9a, etc.).
- Model files readable by both this library and original LIBSVM.
Safety
- 100% safe Rust where possible (no unsafe unless heavily justified and tested).
- Comprehensive error handling (thiserror).
- Graceful handling of malformed input.
Performance
- Target: match original C++ speed after optimization (initial port may be 10–20% slower).
- Optional Rayon parallelism for cross-validation and grid search.
Extras (Post-MVP)
- PyO3 bindings for Python drop-in replacement.
- WASM examples.
- Optional dense matrix support via ndarray.
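
Cross-validation parallelizes naturally because each fold trains and evaluates independently. The following sketch shows the idea under stated assumptions: it uses scoped threads from the standard library rather than Rayon so that it has no dependencies, it splits the data into contiguous folds without the shuffling a real implementation would normally do first, and `fold_ranges`, `cross_validate`, and the `evaluate` callback are hypothetical names, not this crate's API.

```rust
use std::ops::Range;
use std::thread;

/// Split indices 0..n into k contiguous folds whose sizes differ by at most one.
pub fn fold_ranges(n: usize, k: usize) -> Vec<Range<usize>> {
    (0..k).map(|f| (f * n / k)..((f + 1) * n / k)).collect()
}

/// Call `evaluate(train_indices, test_indices)` once per fold, each on its
/// own scoped thread, and return the per-fold scores in fold order.
pub fn cross_validate<F>(n: usize, k: usize, evaluate: F) -> Vec<f64>
where
    F: Fn(&[usize], &[usize]) -> f64 + Sync,
{
    let folds = fold_ranges(n, k);
    let evaluate = &evaluate;
    thread::scope(|s| {
        // Spawn every fold first so they run concurrently.
        let handles: Vec<_> = folds
            .iter()
            .map(|test| {
                s.spawn(move || {
                    let test_idx: Vec<usize> = test.clone().collect();
                    let train_idx: Vec<usize> =
                        (0..n).filter(|i| !test.contains(i)).collect();
                    evaluate(&train_idx, &test_idx)
                })
            })
            .collect();
        // Then join in order, so results line up with fold numbers.
        handles
            .into_iter()
            .map(|h| h.join().expect("fold thread panicked"))
            .collect()
    })
}
```

The `Sync` bound on the callback is what lets the compiler verify that sharing it across threads is safe. With Rayon, the spawn-and-join pair would typically collapse into a parallel iterator over the folds.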

Features Roadmap

Core data structures (SvmNode, SvmProblem, SvmParameter, SvmModel)
All kernels (linear, polynomial, RBF, sigmoid, precomputed)
Kernel cache (O(1) LRU)
Model save/load (exact LIBSVM text format, byte-exact roundtrip)
Prediction (verified zero mismatches against C svm-predict)
Full SMO solver (C-SVC, ν-SVC, ε-SVR, ν-SVR, one-class)
Shrinking heuristic
WSS3 working-set selection (Fan et al., JMLR 2005)
QMatrix implementations (SvcQ, OneClassQ, SvrQ)
Probability estimates (Platt scaling)
Cross-validation (parallel optional)
CLI tools: svm-train-rs, svm-predict-rs, svm-scale-rs
Comprehensive test suite with reference outputs
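
As an illustration of the probability-estimates item: Platt scaling maps a decision value f to a probability through a fitted sigmoid, P(y = 1 | f) = 1 / (1 + exp(A·f + B)). The sketch below covers only the evaluation step and assumes the coefficients A and B have already been fitted. It branches on the sign of the exponent so that `exp` never overflows, the same trick used in the original C++ code. The function name mirrors LIBSVM's internal helper and is not necessarily this crate's API.

```rust
/// Evaluate the Platt sigmoid 1 / (1 + exp(a * f + b)) without overflow.
///
/// `decision_value` is the raw SVM output f; `a` and `b` are the fitted
/// sigmoid coefficients (assumed to be given).
pub fn sigmoid_predict(decision_value: f64, a: f64, b: f64) -> f64 {
    let f_ab = decision_value * a + b;
    if f_ab >= 0.0 {
        // exp(-f_ab) <= 1 here, so it cannot overflow.
        let e = (-f_ab).exp();
        e / (1.0 + e)
    } else {
        // exp(f_ab) < 1 here, so it cannot overflow either.
        1.0 / (1.0 + f_ab.exp())
    }
}
```

Both branches compute the same mathematical value. They differ only in which form keeps the argument of `exp` non-positive.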

Installation

# Cargo.toml — when published
[dependencies]
libsvm-rs = "0.3.0"

Until published:

cargo add libsvm-rs --git https://github.com/ricardofrantz/libsvm-rs

Usage Example

use libsvm_rs::io::{load_problem, save_model};
use libsvm_rs::predict::predict;
use libsvm_rs::train::svm_train;
use libsvm_rs::{KernelType, SvmParameter, SvmType};

fn main() {
    // Load training data (LIBSVM sparse text format)
    let problem = load_problem("heart_scale").unwrap();

    // Set parameters; fields not listed keep their defaults
    let param = SvmParameter {
        svm_type: SvmType::CSvc,
        kernel_type: KernelType::Rbf,
        gamma: 1.0 / 13.0, // 1 / num_features (heart_scale has 13 features)
        c: 1.0,
        ..Default::default()
    };

    // Train
    let model = svm_train(&problem, &param);

    // Predict the label of the first training instance
    let label = predict(&model, &problem.instances[0]);
    println!("Predicted label: {}", label);

    // Save model (loadable by original LIBSVM)
    save_model("heart_scale.model", &model).unwrap();
}

See examples/ for full demos (once implemented).
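
The heart_scale file loaded in the example uses LIBSVM's sparse text format: one instance per line, written as a label followed by `index:value` pairs in ascending index order. The following is a minimal sketch of parsing a single such line. `parse_line` is a hypothetical helper written for illustration and says nothing about how this crate's `load_problem` is implemented.

```rust
/// Parse one line of the form `<label> <index>:<value> <index>:<value> ...`.
/// Returns the label and the sparse feature list, or a description of
/// the first problem found.
pub fn parse_line(line: &str) -> Result<(f64, Vec<(usize, f64)>), String> {
    let mut tokens = line.split_whitespace();

    let label = tokens
        .next()
        .ok_or("empty line")?
        .parse::<f64>()
        .map_err(|e| format!("bad label: {e}"))?;

    let mut features: Vec<(usize, f64)> = Vec::new();
    for tok in tokens {
        let (idx, val) = tok
            .split_once(':')
            .ok_or_else(|| format!("missing ':' in {tok:?}"))?;
        let idx: usize = idx
            .parse()
            .map_err(|e| format!("bad index {idx:?}: {e}"))?;
        let val: f64 = val
            .parse()
            .map_err(|e| format!("bad value {val:?}: {e}"))?;

        // The format requires strictly ascending feature indices.
        if let Some(&(prev, _)) = features.last() {
            if idx <= prev {
                return Err(format!("index {idx} is not greater than {prev}"));
            }
        }
        features.push((idx, val));
    }
    Ok((label, features))
}
```

Returning a `Result` rather than panicking matches the goal of handling malformed input gracefully. A production parser would likely use a dedicated error type instead of `String`.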

Development Plan

Project Structure

src/
  lib.rs
  types.rs      # SvmNode, SvmProblem, SvmParameter, SvmModel
  kernel.rs     # kernel functions + cache
  solver.rs     # core SMO
  cache.rs      # LRU kernel cache
  io.rs         # model/problem parsing (LIBSVM text format)
  bin/
    train.rs
    predict.rs
    scale.rs
tests/
  integration/
examples/
benches/

Phases

Phase	Description	Estimated Effort
0	Repository setup, CI, dependencies	1–2 days
1	Data structures & I/O (parsing, model format)	1–2 weeks
2	Kernels, cache & prediction (load pre-trained models, verify)	1–2 weeks
3	Core SMO solver (all SVM types)	6–12 weeks
4	Probability estimates, shrinking, cross-validation	2–4 weeks
5	CLI tools (svm-train-rs, svm-predict-rs, svm-scale-rs)	1–2 weeks
6	Testing & validation (reference outputs, fuzzing, benchmarks)	Ongoing
7	Documentation, polish, publish to crates.io	1–2 weeks

Total estimated effort: 3–6 months.

Phase 3 is the bulk of the work — the SMO solver in svm.cpp is ~1,000 lines of subtle numerical code with heuristics (working set selection, shrinking, cache management). Translating C++ manual memory management to Rust ownership patterns, plus verifying numerical correctness across all SVM types, is the primary challenge.
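
To give a flavor of that numerical code, here is a simplified sketch of second-order working-set selection (WSS3, Fan et al., JMLR 2005) for the C-SVC dual. It is written from the published description, not copied from this crate or from svm.cpp. It assumes a dense kernel matrix `k`, a single box constraint `c`, and labels in {+1, -1}, and it ignores shrinking and caching. The function name and signature are hypothetical.

```rust
/// Replacement for non-positive curvature, as in the WSS3 paper.
const TAU: f64 = 1e-12;

/// Pick the pair (i, j) to optimize next, or `None` if the current
/// `alpha` already satisfies the stopping tolerance `eps`.
///
/// `grad[t]` is the gradient of the dual objective with respect to alpha[t].
pub fn select_working_set(
    k: &[Vec<f64>],
    y: &[f64],
    alpha: &[f64],
    grad: &[f64],
    c: f64,
    eps: f64,
) -> Option<(usize, usize)> {
    let n = y.len();
    // I_up: alpha[t] can still move in the direction +y[t].
    let in_up = |t: usize| (y[t] > 0.0 && alpha[t] < c) || (y[t] < 0.0 && alpha[t] > 0.0);
    // I_low: alpha[t] can still move in the direction -y[t].
    let in_low = |t: usize| (y[t] > 0.0 && alpha[t] > 0.0) || (y[t] < 0.0 && alpha[t] < c);

    // i maximizes -y_t * grad_t over I_up (first-order choice).
    let mut i = None;
    let mut g_max = f64::NEG_INFINITY;
    for t in 0..n {
        let v = -y[t] * grad[t];
        if in_up(t) && v >= g_max {
            g_max = v;
            i = Some(t);
        }
    }
    let i = i?;

    // j minimizes -b^2 / a over violating t in I_low (second-order choice),
    // where b = g_max - (-y_t * grad_t) and a = K_ii + K_tt - 2 K_it.
    let mut j = None;
    let mut g_min = f64::INFINITY;
    let mut best = f64::INFINITY;
    for t in 0..n {
        if !in_low(t) {
            continue;
        }
        let v = -y[t] * grad[t];
        g_min = g_min.min(v);
        let b = g_max - v;
        if b > 0.0 {
            let mut a = k[i][i] + k[t][t] - 2.0 * k[i][t];
            if a <= 0.0 {
                a = TAU;
            }
            let gain = -(b * b) / a;
            if gain <= best {
                best = gain;
                j = Some(t);
            }
        }
    }

    // Stop when the maximal violation is below the tolerance.
    if g_max - g_min < eps {
        return None;
    }
    j.map(|j| (i, j))
}
```

Even in this reduced form the subtleties are visible: tie-breaking through `>=` and `<=` affects which support vectors are chosen, and the `TAU` guard changes behavior for non-PSD kernels. These are the details a port has to reproduce to stay numerically equivalent.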

Key References

svm.h — API and struct definitions
svm.cpp — Core implementation (~3,300 LOC)
LIBSVM datasets — Benchmark data

Testing Strategy

Run original LIBSVM on benchmark datasets → save all outputs as reference.
Integration tests compare against reference:
- Exact label matches.
- Probabilities within tolerance (float-cmp with ε ≈ 1e-8).
- Model file compatibility (load in both directions).
Include regression suite from official LIBSVM tools/ subdirectory.
Fuzz with cargo-fuzz on input parsing.
Benchmark with criterion against original C++ implementation.
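
The comparison rule above (exact labels, probabilities within tolerance) can be sketched as a small helper. This version uses only the standard library instead of float-cmp. The `Prediction` struct and `first_mismatch` function are hypothetical names for illustration, not part of this crate's test suite.

```rust
/// One prediction row: the predicted label plus per-class probabilities.
#[derive(Debug, Clone, PartialEq)]
pub struct Prediction {
    pub label: f64,
    pub probabilities: Vec<f64>,
}

/// Compare our output against the reference row by row.
/// Labels must match exactly; probabilities must agree within `tol`.
/// Returns a description of the first mismatch, or `None` if all rows agree.
pub fn first_mismatch(reference: &[Prediction], ours: &[Prediction], tol: f64) -> Option<String> {
    if reference.len() != ours.len() {
        return Some(format!(
            "row count differs: expected {}, got {}",
            reference.len(),
            ours.len()
        ));
    }
    for (row, (r, o)) in reference.iter().zip(ours).enumerate() {
        if r.label != o.label {
            return Some(format!("row {row}: label {} != expected {}", o.label, r.label));
        }
        if r.probabilities.len() != o.probabilities.len() {
            return Some(format!("row {row}: probability count differs"));
        }
        for (class, (p, q)) in r.probabilities.iter().zip(&o.probabilities).enumerate() {
            // Written as a negated `<=` so that NaN counts as a mismatch.
            if !((p - q).abs() <= tol) {
                return Some(format!("row {row}, class {class}: {q} vs expected {p}"));
            }
        }
    }
    None
}
```

Reporting the first mismatch with its row and class index, rather than a bare boolean, makes a failing integration test much easier to trace back to a specific instance in the dataset.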

Contributing

Contributions welcome! Especially:

Translating specific solver components.
Adding dataset-based tests.
Performance improvements (preserving numerical behavior).

Open an issue first for major changes.

Changelog

v0.3.0 (February 2026) — SMO Solver

Full SMO solver for all 5 SVM types: C-SVC, ν-SVC, one-class, ε-SVR, ν-SVR
WSS3 working-set selection (second-order heuristic from Fan et al., JMLR 2005)
Shrinking heuristic with gradient reconstruction
QMatrix trait with three implementations: SvcQ, OneClassQ, SvrQ
svm_train function producing SvmModel compatible with C LIBSVM
Multiclass support via one-vs-one with class grouping and sv_coef assembly
Cache::swap_index bug fix — added column swap loop (critical for shrinking correctness)
Kernel refactor — Vec<&[SvmNode]> for swappable data point references
50 tests (12 new), verified against C LIBSVM reference outputs

v0.2.0 (February 2026) — Prediction & I/O

Core types: SvmNode, SvmProblem, SvmParameter, SvmModel
All 5 kernel functions (linear, polynomial, RBF, sigmoid, precomputed)
LRU kernel cache
Model and problem I/O (LIBSVM text format, byte-exact roundtrip)
Prediction (zero mismatches against C svm-predict on heart_scale)
Parameter validation with ν-SVC feasibility check
38 tests

v0.1.0 (February 2026) — Initial Release

Repository setup, workspace layout, CI

License

BSD-3-Clause (same as original LIBSVM) for maximum compatibility.

Acknowledgments

Original LIBSVM by Chih-Chung Chang and Chih-Jen Lin.
Existing Rust ML ecosystem (linfa, smartcore, ffsvm) for prior art.