libsvm-rs
A pure Rust port of LIBSVM with matching model format, matching CLI behavior targets, and a reproducible Rust-vs-C verification pipeline.
Project Status (2026-02-11)
- Upstream target is pinned to LIBSVM v337 (LIBSVM_VERSION=337) via reference/libsvm_upstream_lock.json.
- Differential verification (quick scope): 45 total, 45 pass, 0 warn, 0 fail, 0 skip.
- Differential verification (full scope, generated 2026-02-11T13:56:06Z): 250 total, 236 pass, 4 warn, 0 fail, 10 skip.
- Active tolerance policy is differential-v3 (reference/tolerance_policy.md).
- All 10 current skips are nu_svc on the synthetic gen_binary_imbalanced.scale family, where both C and Rust fail identically.
- Coverage gate currently passes:
  - libsvm-rs line: 93.19%
  - libsvm-rs function: 92.86%
  - workspace line: 88.77%
- Benchmark infrastructure exists, but the current report was generated with a low sample count and should be rerun with higher repetitions before drawing performance conclusions.
- Revised security audit reports strong baseline posture and zero RustSec findings (SECURITY_AUDIT.md).
Current reports:
- reference/differential_report.md
- reference/tolerance_policy.md
- reference/coverage_report.md
- reference/benchmark_report.md
- SECURITY_AUDIT.md
What This Repository Is
This repository contains:
- A Rust library crate implementing LIBSVM-compatible training and prediction.
- Rust CLI binaries matching the svm-train, svm-predict, and svm-scale workflows.
- Verification scripts that run Rust and upstream C on the same data/parameters and produce machine-readable artifacts under reference/.
Goal: trust-grade parity evidence against upstream LIBSVM, not marketing claims.
What Was Implemented in This Revision
- Upstream lock and CI validation:
  - reference/libsvm_upstream_lock.json
  - scripts/check_libsvm_reference_lock.sh
  - CI gate in .github/workflows/ci.yml
- Pinned upstream build and provenance:
  - scripts/setup_reference_libsvm.sh
  - reference/reference_provenance.json
  - reference/reference_build_report.md
- Deterministic synthetic differential datasets:
  - scripts/generate_differential_datasets.py
  - data/generated/*
  - reference/dataset_manifest.json
- Differential suite harness with JSON + markdown outputs:
  - scripts/run_differential_suite.py
  - reference/differential_results.json
  - reference/differential_report.md
- Differential tolerance policy and warning governance:
  - reference/tolerance_policy.md
  - policy id differential-v3
  - targeted warning guardrail for housing_scale_s3_t2_tuned (epsilon-SVR, RBF, tuned)
- Security hardening and revised audit evidence:
  - SECURITY_AUDIT.md
  - negative feature index hardening in svm-scale-rs
  - RustSec dependency audit clean
Upstream Compatibility Target and Versioning Policy
Parity target is locked to an upstream C release independent of crate semver.
- Crate version (libsvm-rs): Rust semver for API/package lifecycle.
- Parity target (reference/libsvm_upstream_lock.json): exact upstream URL/tag/commit/version used for verification.
This allows stable Rust release management while keeping parity claims auditable against a specific upstream commit.
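The lock-versus-upstream check described above can be sketched roughly as follows. This is an illustration only, not the logic of scripts/check_libsvm_reference_lock.sh: the lock key name `libsvm_version` is a hypothetical field chosen for the example, and the header text is a stand-in for the upstream `svm.h`, which defines `LIBSVM_VERSION` as an integer macro.

```python
import re


def header_version(svm_h_text: str) -> int:
    """Extract the integer from a `#define LIBSVM_VERSION <n>` line."""
    match = re.search(
        r"^\s*#\s*define\s+LIBSVM_VERSION\s+(\d+)", svm_h_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("LIBSVM_VERSION not found in header text")
    return int(match.group(1))


def lock_matches_header(lock: dict, svm_h_text: str) -> bool:
    """Compare the pinned version in the lock with the version in the header.

    "libsvm_version" is a hypothetical key name used for illustration.
    """
    return int(lock["libsvm_version"]) == header_version(svm_h_text)


# Stand-in data; the real lock file and header live in the repository.
lock = {"tag": "v337", "libsvm_version": 337}
header = "#ifndef _LIBSVM_H\n#define _LIBSVM_H\n#define LIBSVM_VERSION 337\n"
print(lock_matches_header(lock, header))
```

Keeping the check independent of the crate version is the point: a crate release can bump its semver without touching the lock, and a lock change is visible as its own diff.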
How To Verify This Port End-to-End
Run from repository root.
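- Validate lock consistency:
    bash scripts/check_libsvm_reference_lock.sh
- Build pinned upstream reference and generate provenance:
    bash scripts/setup_reference_libsvm.sh
- Run differential verification:
    # canonical matrix
    DIFF_SCOPE=quick python3 scripts/run_differential_suite.py
    # expanded matrix (canonical + generated + tuned)
    DIFF_SCOPE=full python3 scripts/run_differential_suite.py
    # strict-only (disable targeted SVR warning downgrade)
    DIFF_ENABLE_TARGETED_SVR_WARN=0 DIFF_SCOPE=full python3 scripts/run_differential_suite.py
    # sensitivity study (global non-prob scalar relative tolerance override)
    DIFF_NONPROB_REL_TOL=2e-5 DIFF_SCOPE=full python3 scripts/run_differential_suite.py
- Run coverage gate:
    bash scripts/check_coverage_thresholds.sh
- Run Rust-vs-C benchmarks:
    # Example stronger sampling than the default
    BENCH_WARMUP=3 BENCH_RUNS=30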
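The environment variables used in the differential runs above could be collected into a single configuration object along these lines. This is a hedged sketch, not the harness's actual code: the `DiffConfig` type, the `load_diff_config` helper, and the defaults (scope `quick`, targeted warning enabled, no tolerance override) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DiffConfig:
    scope: str                        # "quick" or "full"
    targeted_svr_warn: bool           # targeted epsilon-SVR warning downgrade
    nonprob_rel_tol: Optional[float]  # global override, None if unset


def load_diff_config(env: Mapping[str, str]) -> DiffConfig:
    """Read the DIFF_* knobs from an environment-like mapping."""
    scope = env.get("DIFF_SCOPE", "quick")  # assumed default
    if scope not in ("quick", "full"):
        raise ValueError(f"unsupported DIFF_SCOPE: {scope!r}")
    # Assumed convention: anything other than "0" leaves the downgrade on.
    targeted = env.get("DIFF_ENABLE_TARGETED_SVR_WARN", "1") != "0"
    raw_tol = env.get("DIFF_NONPROB_REL_TOL")
    tol = float(raw_tol) if raw_tol is not None else None
    return DiffConfig(scope=scope, targeted_svr_warn=targeted, nonprob_rel_tol=tol)


# Equivalent of: DIFF_NONPROB_REL_TOL=2e-5 DIFF_SCOPE=full <command>
print(load_diff_config({"DIFF_SCOPE": "full", "DIFF_NONPROB_REL_TOL": "2e-5"}))
```

In a real script the mapping passed in would be `os.environ`; taking it as a parameter keeps the parsing testable without touching the process environment.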
Verification Artifact Map
- Lock and provenance:
  - reference/libsvm_upstream_lock.json
  - reference/reference_provenance.json
  - reference/reference_build_report.md
- Differential results:
  - reference/differential_results.json
  - reference/differential_report.md
  - reference/tolerance_policy.md
- Coverage:
  - reference/coverage_report.md
- Performance:
  - reference/benchmark_results.json
  - reference/benchmark_report.md
- Security:
  - SECURITY_AUDIT.md
How To Read Differential Results
- pass: no parity issues detected for the case under configured tolerances.
- warn: non-fatal differences detected under explicit policy rules. In current runs these are limited to:
  - one targeted epsilon-SVR near-parity drift case (housing_scale_s3_t2_tuned) with extra cross-predict checks
  - probability metadata drift
  - rho-only near-equivalence drift
  - one-class near-boundary label drift
- fail: deterministic parity break (for example label mismatch or model header mismatch outside thresholds).
- skip: combo not executed, usually because training failed in both implementations for that combo.
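The four statuses above can be read as a simple precedence order. The sketch below is one way to express that ordering; the function name, its inputs, and the exact precedence (skip before fail before warn) are assumptions for illustration, and the authoritative rules are in reference/tolerance_policy.md.

```python
from typing import Sequence


def classify_case(
    executed: bool,
    hard_mismatch: bool,
    policy_warnings: Sequence[str],
) -> str:
    """Map a case outcome to pass / warn / fail / skip.

    executed:        False when the combo was not run (e.g. training failed
                     in both implementations).
    hard_mismatch:   True for a deterministic parity break such as a label
                     mismatch or a header mismatch outside thresholds.
    policy_warnings: names of non-fatal differences allowed by policy.
    """
    if not executed:
        return "skip"
    if hard_mismatch:
        return "fail"
    if policy_warnings:
        return "warn"
    return "pass"


print(classify_case(True, False, []))                  # clean case
print(classify_case(True, False, ["rho_only_drift"]))  # policy warning
print(classify_case(False, False, []))                 # not executed
```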
Current Caveats Before Any "Full Parity" Claim
As of 2026-02-11 full-scope run:
- 0 hard failures under default differential-v3 policy.
- 4 warnings remain:
  - targeted epsilon-SVR near-parity drift (housing_scale_s3_t2_tuned)
  - one probability header drift case (probA)
  - one rho-only near-equivalence drift case
  - one one-class near-boundary label drift case
- 10 skips need classification as invalid-combo vs genuine missing coverage.
- Benchmarks should be rerun with stronger statistical settings before claiming outperformance.
Current honest claim: no hard differential failures under the documented default policy, with a small set of explicitly justified warnings. This is strong parity evidence, but not bitwise identity across all modes.
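As a small worked check of the numbers behind that claim, the sketch below tallies per-case statuses and tests the "no hard failures" condition. The list of statuses is synthesized from the full-scope counts reported in this README; the helper names are hypothetical, and the layout of reference/differential_results.json is not assumed here.

```python
from collections import Counter
from typing import Dict, Iterable

STATUSES = ("pass", "warn", "fail", "skip")


def summarize(statuses: Iterable[str]) -> Dict[str, int]:
    """Count each status and add a total, rejecting unknown status names."""
    counts = Counter(statuses)
    unknown = sorted(set(counts) - set(STATUSES))
    if unknown:
        raise ValueError(f"unknown statuses: {unknown}")
    summary = {name: counts[name] for name in STATUSES}
    summary["total"] = sum(counts.values())
    return summary


def no_hard_failures(summary: Dict[str, int]) -> bool:
    """The claim holds only when the fail count is exactly zero."""
    return summary["fail"] == 0


# Full-scope run from 2026-02-11: 236 pass, 4 warn, 0 fail, 10 skip.
full_run = ["pass"] * 236 + ["warn"] * 4 + ["skip"] * 10
summary = summarize(full_run)
print(summary)
print(no_hard_failures(summary))
```

Warnings and skips are counted separately on purpose, so a clean "no hard failures" result never hides them.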
Developer Notes: What Works Well
These areas are currently in good shape and can be treated as stable unless new evidence appears:
- End-to-end Rust-vs-C differential harness is reproducible and version-locked.
- Canonical matrix (quick) is fully clean (45/45 pass).
- Full matrix (full) has no hard failures under default policy.
- Predictor path parity is strong; residual drift currently comes from training-side numerics, not prediction command behavior.
- Parser and CLI hardening include explicit feature-index bounds checks, with security audit evidence in SECURITY_AUDIT.md.
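To make the feature-index bounds checks mentioned above concrete, here is a minimal sketch of validating one line of LIBSVM sparse text (`label index:value ...`). It is written in Python for illustration and is not the crate's Rust parser: the `max_index` limit is a hypothetical bound, and the sketch assumes the non-precomputed format, where indices start at 1 and must be ascending.

```python
from typing import List, Tuple


def parse_libsvm_line(
    line: str, max_index: int = 10_000_000
) -> Tuple[float, List[Tuple[int, float]]]:
    """Parse `label idx:val ...`, rejecting out-of-range or unordered indices."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features: List[Tuple[int, float]] = []
    previous = 0
    for token in tokens[1:]:
        index_text, sep, value_text = token.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in feature token {token!r}")
        index = int(index_text)
        # Bounds check: negative, zero, and oversized indices are rejected.
        if index < 1 or index > max_index:
            raise ValueError(f"feature index out of range: {index}")
        if index <= previous:
            raise ValueError("feature indices must be strictly ascending")
        features.append((index, float(value_text)))
        previous = index
    return label, features


print(parse_libsvm_line("+1 1:0.5 3:-1"))
```

Rejecting a bad index at parse time means a malformed file produces an error message rather than an out-of-bounds access or a huge allocation later on.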
Developer Notes: What Is Not Perfect Yet
These are known non-perfect areas and should not be hidden in release notes or parity claims:
- housing_scale_s3_t2_tuned (epsilon-SVR, RBF, tuned) is a targeted warning, not a pass.
- One generated regression case has probability header probA drift.
- One generated extreme-scale case has rho-only near-equivalence drift.
- One generated one-class case has near-boundary label drift.
- Ten generated nu_svc imbalanced cases are skipped because both implementations fail training.
Current warning case IDs (from latest full run):
- housing_scale_s3_t2_tuned
- gen_regression_sparse_scale_s4_t3_tuned
- gen_extreme_scale_scale_s0_t1_default
- gen_extreme_scale_scale_s2_t1_default
Current skip case family:
- gen_binary_imbalanced_scale_s1_* and gen_binary_imbalanced_scale_precomputed_s1_t4_* (default + tuned), all with reason: both C and Rust training failed.
Targeted Warning Policy (Why It Exists)
The targeted epsilon-SVR warning for housing_scale_s3_t2_tuned is intentionally narrow and guarded:
- applies to one case ID only
- max non-prob drift bounds must hold (max_rel <= 6e-5, max_abs <= 6e-4)
- model drift bounds must hold (rho_rel <= 1e-5, max sv_coef abs diff <= 4e-3)
- cross-predict parity must hold in both directions:
  - Rust predictor on C model matches C predictor on C model
  - C predictor on Rust model matches Rust predictor on Rust model
This can be disabled for strict-only runs:
DIFF_ENABLE_TARGETED_SVR_WARN=0 DIFF_SCOPE=full python3 scripts/run_differential_suite.py
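The guard conditions listed in this section can be written as a single predicate. The thresholds and the case ID below come from this README; the `DriftMetrics` type, its field names, and the function name are hypothetical, and the harness may structure the check differently.

```python
from dataclasses import dataclass

TARGET_CASE_ID = "housing_scale_s3_t2_tuned"


@dataclass(frozen=True)
class DriftMetrics:
    max_rel: float               # max relative drift over non-prob scalars
    max_abs: float               # max absolute drift over non-prob scalars
    rho_rel: float               # relative drift in rho
    max_sv_coef_abs_diff: float  # max absolute sv_coef difference
    rust_on_c_matches: bool      # Rust predictor on C model == C on C model
    c_on_rust_matches: bool      # C predictor on Rust model == Rust on Rust


def targeted_warning_applies(
    case_id: str, m: DriftMetrics, enabled: bool = True
) -> bool:
    """True only when every guard holds; otherwise the case is not downgraded."""
    return (
        enabled  # False corresponds to DIFF_ENABLE_TARGETED_SVR_WARN=0
        and case_id == TARGET_CASE_ID
        and m.max_rel <= 6e-5
        and m.max_abs <= 6e-4
        and m.rho_rel <= 1e-5
        and m.max_sv_coef_abs_diff <= 4e-3
        and m.rust_on_c_matches
        and m.c_on_rust_matches
    )


# Illustrative values only, not measurements from an actual run.
metrics = DriftMetrics(5e-5, 5e-4, 5e-6, 3e-3, True, True)
print(targeted_warning_applies(TARGET_CASE_ID, metrics))
print(targeted_warning_applies(TARGET_CASE_ID, metrics, enabled=False))
```

Because every condition is joined with `and`, loosening the policy requires editing a visible threshold rather than adding a new escape path.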
Release Claim Checklist For Developers
Before stating parity/security status publicly, run and verify:
- bash scripts/check_libsvm_reference_lock.sh
- bash scripts/setup_reference_libsvm.sh
- DIFF_SCOPE=quick python3 scripts/run_differential_suite.py
- DIFF_SCOPE=full python3 scripts/run_differential_suite.py
- bash scripts/check_coverage_thresholds.sh
- cargo audit
And then confirm these artifacts are current:
- reference/differential_results.json
- reference/differential_report.md
- reference/tolerance_policy.md
- reference/coverage_report.md
- SECURITY_AUDIT.md
Features
- All 5 SVM types (-s 0..4)
- All kernels including precomputed (-t 0..4)
- Model I/O compatible with LIBSVM text format
- Probability mode (-b 1) implemented
- Cross-validation support
- CLI tools:
  - svm-train-rs
  - svm-predict-rs
  - svm-scale-rs
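The -s and -t codes above follow the upstream LIBSVM numbering, and the case IDs in the differential reports appear to embed them (for example housing_scale_s3_t2_tuned is described as epsilon-SVR with an RBF kernel). The sketch below decodes such an ID. The `<dataset>_s<N>_t<N>_<default|tuned>` pattern is inferred from the IDs quoted in this README, not taken from the harness, so treat it as an assumption.

```python
import re
from typing import Tuple

# Upstream LIBSVM numbering for svm-train's -s and -t options.
SVM_TYPES = {0: "c_svc", 1: "nu_svc", 2: "one_class", 3: "epsilon_svr", 4: "nu_svr"}
KERNELS = {0: "linear", 1: "polynomial", 2: "rbf", 3: "sigmoid", 4: "precomputed"}

# Assumed case ID layout: <dataset>_s<svm type>_t<kernel>_<profile>
CASE_ID = re.compile(
    r"^(?P<dataset>.+)_s(?P<s>[0-4])_t(?P<t>[0-4])_(?P<profile>default|tuned)$"
)


def decode_case_id(case_id: str) -> Tuple[str, str, str, str]:
    """Split a case ID into (dataset, svm type, kernel, profile)."""
    match = CASE_ID.match(case_id)
    if match is None:
        raise ValueError(f"unrecognized case ID: {case_id!r}")
    return (
        match["dataset"],
        SVM_TYPES[int(match["s"])],
        KERNELS[int(match["t"])],
        match["profile"],
    )


print(decode_case_id("housing_scale_s3_t2_tuned"))
print(decode_case_id("gen_regression_sparse_scale_s4_t3_tuned"))
```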
Installation
[dependencies]
libsvm-rs = "0.5"
Quick Start
use ;
use predict;
use svm_train;
use ;
use Path;
let problem = load_problem.unwrap;
let mut param = default;
param.kernel_type = Rbf;
param.gamma = 1.0 / 13.0;
let model = svm_train;
let label = predict;
println!;
save_model.unwrap;
CLI Usage
# train
# predict
# scale
License
BSD-3-Clause, same family as original LIBSVM. See LICENSE.