libsvm-rs
A pure Rust reimplementation of the classic LIBSVM library, targeting numerical equivalence and model-file compatibility.
Status: Active development (February 2026). Phases 0–3 complete (types, I/O, kernels, cache, prediction, full SMO solver). Training works for all 5 SVM types.
What is LIBSVM?
LIBSVM is one of the most widely cited machine learning libraries ever created:
- Authors: Chih-Chung Chang and Chih-Jen Lin (National Taiwan University).
- First release: ~2000, still actively maintained (v3.37, December 2025).
- Citations: >53,000 (Google Scholar) for the original paper.
- Core functionality: Efficient training and inference for Support Vector Machines (SVMs).
  - Classification: C-SVC, ν-SVC
  - Regression: ε-SVR, ν-SVR
  - Distribution estimation / novelty detection: one-class SVM
- Key features:
  - Multiple kernels: linear, polynomial, RBF (Gaussian), sigmoid, precomputed.
  - Probability estimates (via Platt scaling).
  - Cross-validation and parameter selection helpers.
  - Simple text-based model format for interoperability.
  - CLI tools: svm-train, svm-predict, svm-scale.
- Strengths: Battle-tested SMO (Sequential Minimal Optimization) solver, excellent performance on sparse/high-dimensional data (text classification, bioinformatics, sensor data), compact codebase (~3,300 LOC core).
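To make the kernel list above concrete, here is a minimal sketch of the four analytic kernels evaluated on sparse vectors. The `Feature` type and all function names are hypothetical and chosen for illustration; they are not this crate's API. The sketch assumes feature indices are sorted in ascending order.

```rust
use std::cmp::Ordering;

/// One non-zero entry of a sparse vector.
/// Hypothetical type for illustration, not the crate's `SvmNode`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Feature {
    pub index: u32,
    pub value: f64,
}

/// Dot product of two sparse vectors whose indices are sorted ascending.
/// Walks both slices in lockstep and multiplies only matching indices.
pub fn dot(x: &[Feature], y: &[Feature]) -> f64 {
    let (mut i, mut j, mut sum) = (0usize, 0usize, 0.0f64);
    while i < x.len() && j < y.len() {
        match x[i].index.cmp(&y[j].index) {
            Ordering::Equal => {
                sum += x[i].value * y[j].value;
                i += 1;
                j += 1;
            }
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
        }
    }
    sum
}

/// Squared Euclidean distance, expanded as x.x - 2 x.y + y.y.
pub fn squared_distance(x: &[Feature], y: &[Feature]) -> f64 {
    dot(x, x) - 2.0 * dot(x, y) + dot(y, y)
}

/// Linear kernel: x.y
pub fn linear(x: &[Feature], y: &[Feature]) -> f64 {
    dot(x, y)
}

/// Polynomial kernel: (gamma * x.y + coef0)^degree
pub fn polynomial(x: &[Feature], y: &[Feature], gamma: f64, coef0: f64, degree: i32) -> f64 {
    (gamma * dot(x, y) + coef0).powi(degree)
}

/// RBF (Gaussian) kernel: exp(-gamma * |x - y|^2)
pub fn rbf(x: &[Feature], y: &[Feature], gamma: f64) -> f64 {
    (-gamma * squared_distance(x, y)).exp()
}

/// Sigmoid kernel: tanh(gamma * x.y + coef0)
pub fn sigmoid(x: &[Feature], y: &[Feature], gamma: f64, coef0: f64) -> f64 {
    (gamma * dot(x, y) + coef0).tanh()
}
```

The precomputed kernel is omitted because it is a table lookup rather than a formula. Because only non-zero entries are stored and visited, the cost of each evaluation scales with the number of non-zeros rather than with the feature dimension.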
Why a Pure Rust Port?
Existing Rust options for SVMs don't provide full LIBSVM-compatible training:
| Option | Type | Pros | Cons |
|---|---|---|---|
| libsvm | FFI bindings to C++ | Full feature parity | Stale (last updated 2022), requires native build |
| linfa-svm | Pure Rust (linfa) | Modern API, active | Different algorithms/heuristics, not compatible |
| smartcore | Pure Rust | Good coverage, active | Approximate solver, not LIBSVM-equivalent |
| ffsvm | Pure Rust | LIBSVM model loading, fast inference | Prediction only — no training |
This project aims to fill the gap by providing:
- Numerical equivalence with LIBSVM (same predictions and model files on benchmark datasets, within floating-point tolerance).
- Full memory/thread safety via Rust's ownership model — no undefined behavior in sparse data handling.
- Zero C/C++ dependencies at runtime (pure Rust, no native linkage).
- Fearless concurrency (e.g., parallel cross-validation with Rayon).
- Easy deployment: single binary, WebAssembly support for browser inference.
- Modern ergonomics while preserving compatibility (builders, iterators, Result-based error handling).
Ideal for:
- Reproducible research needing LIBSVM-compatible results.
- Embedded/lightweight ML (WASM, edge devices).
- Rust data/ML pipelines without native build headaches.
A Note on Numerical Equivalence
We target numerical equivalence, not bitwise identity. Floating-point results across different compilers (GCC vs LLVM) and languages are not guaranteed to be identical due to operation reordering, FMA instructions, and intermediate precision differences. This is an open problem even within C++ itself.
In practice, this means:
- Identical predicted labels on benchmark datasets.
- Probabilities within ~1e-8 tolerance.
- Model files interoperable with original LIBSVM (loadable by either implementation).
- Same support vectors selected (barring degenerate tie-breaking cases).
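The first two criteria above can be sketched as a small comparison helper: labels must match exactly, while probabilities only need to agree within an absolute tolerance. The `Prediction` struct and both function names are hypothetical, introduced only for this illustration.

```rust
/// One prediction as produced by either implementation.
/// Hypothetical struct for illustration, not a type from this crate.
pub struct Prediction {
    pub label: f64,
    pub probabilities: Vec<f64>,
}

/// True when `a` and `b` differ by at most `tol` in absolute terms.
/// Any comparison involving NaN evaluates to false.
pub fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    (a - b).abs() <= tol
}

/// Numerical equivalence in the sense described above:
/// identical labels, probabilities within `tol` of each other.
pub fn equivalent(ours: &[Prediction], reference: &[Prediction], tol: f64) -> bool {
    ours.len() == reference.len()
        && ours.iter().zip(reference).all(|(a, b)| {
            a.label == b.label
                && a.probabilities.len() == b.probabilities.len()
                && a
                    .probabilities
                    .iter()
                    .zip(&b.probabilities)
                    .all(|(p, q)| approx_eq(*p, *q, tol))
        })
}
```

Calling `equivalent(&ours, &reference, 1e-8)` would then express the tolerance quoted above. An absolute tolerance is a reasonable fit here because probabilities are bounded to the interval [0, 1].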
Goals
- Compatibility
  - Pass all official LIBSVM test scenarios.
  - Equivalent output (predictions, probabilities, model files) on standard datasets (heart_scale, a9a, etc.).
  - Model files readable by both this library and original LIBSVM.
- Safety
  - 100% safe Rust where possible (no unsafe unless heavily justified and tested).
  - Comprehensive error handling (thiserror).
  - Graceful handling of malformed input.
- Performance
  - Target: match original C++ speed after optimization (initial port may be 10–20% slower).
  - Optional Rayon parallelism for cross-validation and grid search.
- Extras (Post-MVP)
  - PyO3 bindings for Python drop-in replacement.
  - WASM examples.
  - Optional dense matrix support via ndarray.
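As an illustration of the safety goals above, the following sketch parses one line of the LIBSVM sparse text format (`label index:value index:value ...`) and reports malformed input through a `Result` instead of panicking. `ParseError` and `parse_line` are hypothetical names, not this crate's API. The sketch uses only the standard library where the project itself names thiserror, and it assumes indices must be strictly ascending.

```rust
use std::fmt;

/// Reasons a data line can be rejected. Hypothetical error type for illustration.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingLabel,
    BadLabel(String),
    BadFeature(String),
    IndicesNotAscending { previous: u32, found: u32 },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::MissingLabel => write!(f, "line has no label"),
            ParseError::BadLabel(t) => write!(f, "invalid label: {t:?}"),
            ParseError::BadFeature(t) => write!(f, "invalid feature token: {t:?}"),
            ParseError::IndicesNotAscending { previous, found } => {
                write!(f, "index {found} does not follow {previous} in ascending order")
            }
        }
    }
}

impl std::error::Error for ParseError {}

fn bad_feature(token: &str) -> ParseError {
    ParseError::BadFeature(token.to_string())
}

/// Parses `label index:value ...` into a label and its sparse features.
pub fn parse_line(line: &str) -> Result<(f64, Vec<(u32, f64)>), ParseError> {
    let mut tokens = line.split_whitespace();
    let label_token = tokens.next().ok_or(ParseError::MissingLabel)?;
    let label: f64 = label_token
        .parse()
        .map_err(|_| ParseError::BadLabel(label_token.to_string()))?;

    let mut features = Vec::new();
    let mut previous: Option<u32> = None;
    for token in tokens {
        let (index, value) = token.split_once(':').ok_or_else(|| bad_feature(token))?;
        let index: u32 = index.parse().map_err(|_| bad_feature(token))?;
        let value: f64 = value.parse().map_err(|_| bad_feature(token))?;
        if let Some(p) = previous {
            if index <= p {
                return Err(ParseError::IndicesNotAscending { previous: p, found: index });
            }
        }
        previous = Some(index);
        features.push((index, value));
    }
    Ok((label, features))
}
```

Returning a dedicated error enum lets callers distinguish a truncated line from an out-of-order index, which is also what a fuzzing harness would want to assert on.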
Features Roadmap
- Core data structures (SvmNode, SvmProblem, SvmParameter, SvmModel)
- All kernels (linear, polynomial, RBF, sigmoid, precomputed)
- Kernel cache (O(1) LRU)
- Model save/load (exact LIBSVM text format, byte-exact roundtrip)
- Prediction (verified zero mismatches against C svm-predict)
- Full SMO solver (C-SVC, ν-SVC, ε-SVR, ν-SVR, one-class)
- Shrinking heuristic
- WSS3 working-set selection (Fan et al., JMLR 2005)
- QMatrix implementations (SvcQ, OneClassQ, SvrQ)
- Probability estimates (Platt scaling)
- Cross-validation (parallel optional)
- CLI tools: svm-train-rs, svm-predict-rs, svm-scale-rs
- Comprehensive test suite with reference outputs
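For the probability-estimates item above, the core of Platt scaling is a sigmoid that maps a decision value f to 1 / (1 + exp(A·f + B)). The sketch below evaluates it in a numerically stable way by choosing the branch whose exponent is non-positive. The function name and signature are assumptions for illustration, and fitting the parameters A and B is not shown.

```rust
/// Maps an SVM decision value to a probability using fitted Platt parameters
/// `a` and `b`: P = 1 / (1 + exp(a * f + b)).
/// Hypothetical helper for illustration, not this crate's API.
pub fn sigmoid_predict(decision_value: f64, a: f64, b: f64) -> f64 {
    let f_apb = decision_value * a + b;
    if f_apb >= 0.0 {
        // Rewrite as exp(-z) / (1 + exp(-z)) so exp never overflows.
        let e = (-f_apb).exp();
        e / (1.0 + e)
    } else {
        // Here exp(z) is below 1, so the direct form is safe.
        1.0 / (1.0 + f_apb.exp())
    }
}
```

With a negative `a`, larger decision values map to probabilities closer to 1, and extreme inputs saturate at 0 or 1 rather than producing NaN.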
Installation
# Cargo.toml (once published)
[dependencies]
libsvm-rs = "0.3.0"
Until published:
Usage Example
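// Module paths, argument lists, and field names below are assumed; check the crate docs for the exact API.
use std::path::Path;
use libsvm_rs::io::{load_problem, save_model};
use libsvm_rs::train::svm_train;
use libsvm_rs::predict::predict;
use libsvm_rs::SvmParameter;
// Load training data
let problem = load_problem(Path::new("heart_scale")).unwrap();
// Set parameters (override individual fields before the defaults as needed)
let param = SvmParameter { ..Default::default() };
// Train
let model = svm_train(&problem, &param);
// Predict the label of the first training instance
let label = predict(&model, &problem.instances[0]);
println!("Predicted label: {label}");
// Save model (loadable by original LIBSVM)
save_model(Path::new("heart_scale.model"), &model).unwrap();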
See examples/ for full demos (once implemented).
Development Plan
Project Structure
src/
lib.rs
types.rs # SvmNode, SvmProblem, SvmParameter, SvmModel
kernel.rs # kernel functions + cache
solver.rs # core SMO
cache.rs # LRU kernel cache
io.rs # model/problem parsing (LIBSVM text format)
bin/
train.rs
predict.rs
scale.rs
tests/
integration/
examples/
benches/
Phases
| Phase | Description | Estimated Effort |
|---|---|---|
| 0 | Repository setup, CI, dependencies | 1–2 days |
| 1 | Data structures & I/O (parsing, model format) | 1–2 weeks |
| 2 | Kernels, cache & prediction (load pre-trained models, verify) | 1–2 weeks |
| 3 | Core SMO solver (all SVM types) | 6–12 weeks |
| 4 | Probability estimates, shrinking, cross-validation | 2–4 weeks |
| 5 | CLI tools (svm-train-rs, svm-predict-rs, svm-scale-rs) | 1–2 weeks |
| 6 | Testing & validation (reference outputs, fuzzing, benchmarks) | Ongoing |
| 7 | Documentation, polish, publish to crates.io | 1–2 weeks |
Total estimated effort: 3–6 months.
Phase 3 is the bulk of the work — the SMO solver in svm.cpp is ~1,000 lines of subtle numerical code with heuristics (working set selection, shrinking, cache management). Translating C++ manual memory management to Rust ownership patterns, plus verifying numerical correctness across all SVM types, is the primary challenge.
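To give a flavor of the working-set selection mentioned above, here is a sketch of the simpler first-order "maximal violating pair" rule for the C-SVC dual. This is not the second-order WSS3 rule the project implements, and the function name and signature are hypothetical. It assumes labels are ±1, `grad` holds the gradient of the dual objective, and `c` is the upper bound on each alpha.

```rust
/// Picks the pair (i, j) that most violates the KKT conditions, or returns
/// `None` when the violation is at most `eps` (the usual stopping test).
/// First-order rule shown for illustration; WSS3 refines the choice of `j`.
pub fn select_working_set(
    y: &[i8],
    alpha: &[f64],
    grad: &[f64],
    c: f64,
    eps: f64,
) -> Option<(usize, usize)> {
    let mut best_up: Option<(usize, f64)> = None; // max of -y_t * G_t over I_up
    let mut best_low: Option<(usize, f64)> = None; // min of -y_t * G_t over I_low

    for t in 0..y.len() {
        let score = -f64::from(y[t]) * grad[t];
        // I_up: alpha_t can still move in the +y_t direction.
        let in_up = (y[t] > 0 && alpha[t] < c) || (y[t] < 0 && alpha[t] > 0.0);
        // I_low: alpha_t can still move in the -y_t direction.
        let in_low = (y[t] > 0 && alpha[t] > 0.0) || (y[t] < 0 && alpha[t] < c);

        if in_up && best_up.map_or(true, |(_, m)| score > m) {
            best_up = Some((t, score));
        }
        if in_low && best_low.map_or(true, |(_, m)| score < m) {
            best_low = Some((t, score));
        }
    }

    match (best_up, best_low) {
        (Some((i, m_up)), Some((j, m_low))) if m_up - m_low > eps => Some((i, j)),
        _ => None,
    }
}
```

A full solver would call this in a loop, solve the two-variable subproblem for the returned pair, update the gradient, and stop once `None` is returned. The subtlety described above lives mostly in those update steps and in keeping the cache and shrinking state consistent with them.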
Key References
- svm.h — API and struct definitions
- svm.cpp — Core implementation (~3,300 LOC)
- LIBSVM datasets — Benchmark data
Testing Strategy
- Run original LIBSVM on benchmark datasets → save all outputs as reference.
- Integration tests compare against reference:
  - Exact label matches.
  - Probabilities within tolerance (float-cmp with ε ≈ 1e-8).
  - Model file compatibility (load in both directions).
- Include regression suite from official LIBSVM tools/ subdirectory.
- Fuzz with cargo-fuzz on input parsing.
- Benchmark with criterion against original C++ implementation.
Contributing
Contributions welcome! Especially:
- Translating specific solver components.
- Adding dataset-based tests.
- Performance improvements (preserving numerical behavior).
Open an issue first for major changes.
Changelog
v0.3.0 (February 2026) — SMO Solver
- Full SMO solver for all 5 SVM types: C-SVC, ν-SVC, one-class, ε-SVR, ν-SVR
- WSS3 working-set selection (second-order heuristic from Fan et al., JMLR 2005)
- Shrinking heuristic with gradient reconstruction
- QMatrix trait with three implementations: SvcQ, OneClassQ, SvrQ
- svm_train function producing SvmModel compatible with C LIBSVM
- Multiclass support via one-vs-one with class grouping and sv_coef assembly
- Cache::swap_index bug fix: added column swap loop (critical for shrinking correctness)
- Kernel refactor: Vec<&[SvmNode]> for swappable data point references
- 50 tests (12 new), verified against C LIBSVM reference outputs
v0.2.0 (February 2026) — Prediction & I/O
- Core types: SvmNode, SvmProblem, SvmParameter, SvmModel
- All 5 kernel functions (linear, polynomial, RBF, sigmoid, precomputed)
- LRU kernel cache
- Model and problem I/O (LIBSVM text format, byte-exact roundtrip)
- Prediction (zero mismatches against C svm-predict on heart_scale)
- Parameter validation with ν-SVC feasibility check
- 38 tests
v0.1.0 (February 2026) — Initial Release
- Repository setup, workspace layout, CI
License
BSD-3-Clause (same as original LIBSVM) for maximum compatibility.