# libsvm-rs

FFI-free, Rust-native, LIBSVM-compatible SVM training and prediction.
libsvm-rs is a pure Rust implementation of
LIBSVM for projects that want LIBSVM
model/data compatibility without linking to the C/C++ library. It supports the
same SVM families, kernels, sparse text format, model files, and command-line
workflow as upstream LIBSVM, while keeping the implementation auditable from
Rust.
The goal is numerical equivalence and operational compatibility, not bitwise identity. The included verification pipeline compares this implementation against a pinned upstream LIBSVM reference across classification, regression, one-class, probability, and precomputed-kernel cases.
## Why libsvm-rs?
| If you need... | libsvm-rs gives you... |
|---|---|
| Rust-native deployment | No C runtime or libsvm-sys2 FFI dependency |
| LIBSVM interoperability | Reads and writes LIBSVM sparse problem files and model files |
| Familiar command-line tools | svm-train-rs, svm-predict-rs, and svm-scale-rs |
| Auditable implementation | Solver, kernels, prediction, probability, and I/O are implemented in Rust |
| Parity evidence | Reproducible differential tests against pinned upstream LIBSVM |
| Small dependency surface | One runtime dependency: thiserror |
## How is this different from the `libsvm` crate?
The existing libsvm crate provides
high-level Rust bindings to the C LIBSVM library through libsvm-sys2.
That is the right choice if you explicitly want the original C implementation
from Rust.
libsvm-rs is different: it reimplements the LIBSVM algorithms and file formats
in Rust. Use it when you want Rust-native deployment, easier cross-compilation,
or an implementation that can be inspected and tested without crossing an FFI
boundary.
## Performance Snapshot
Native CLI benchmarks compare Rust binaries (svm-*-rs) against the vendored C
LIBSVM reference on the same datasets and parameters. Ratios below 1.0 mean
Rust was faster in the measured run.
| Operation | Cases | Rust/C median ratio | Result |
|---|---|---|---|
| `predict` | 40 | 0.804 | Rust faster on median |
| `predict_probability` | 30 | 0.834 | Rust faster on median |
| `train` | 40 | 0.916 | Rust slightly faster on median |
| `train_probability` | 30 | 1.025 | Rust slightly slower on median |
Rust is not uniformly faster than C LIBSVM. The included native benchmarks show
strong prediction performance and comparable training performance, with some
slower probability-training cases. See
reference/benchmark_report.md and
examples/comparison_summary.json.
## Parity Status
Current full differential suite against pinned upstream LIBSVM v337:
| Scope | Pass | Warn | Fail | Skip |
|---|---|---|---|---|
| 250 configurations | 236 | 4 | 0 | 10 |
The warnings are documented numerical near-parity cases, not prediction-logic
failures. See reference/differential_report.md
and reference/tolerance_policy.md.
## Security Considerations

libsvm-rs treats `.libsvm` problem files and `.model` files as untrusted text input by default.

- Default loaders enforce byte, line-length, support-vector, class-count, and feature-index caps through `LoadOptions`.
- Problem files reject malformed `index:value` tokens, non-ascending feature indices, over-limit feature indices, oversized input, oversized lines, and embedded NUL bytes.
- Model files additionally validate header consistency before support-vector allocation: `nr_class`, `total_sv`, `rho`, `label`, `nr_sv`, `probA`, `probB`, probability-density marks, and precomputed-kernel rows must agree.
- Public loader paths return structured errors for malformed text input within the configured caps; they should not panic on adversarial files.
The loaders do not prove that a model is statistically meaningful, was trained
from a specific dataset, is safe to use for a regulated decision, or is
cryptographically authentic. Constant-time arithmetic, side-channel resistance,
sandboxed execution, and model signing are out of scope for this crate. Use
`LoadOptions::trusted_input()` only for files whose source and size are already
controlled.
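A hedged usage sketch: `LoadOptions` and `LoadOptions::trusted_input()` are the names documented above, while the with-options loader entry point `load_problem_with` shown here is hypothetical.

```rust
use libsvm_rs::{load_problem_with, LoadOptions}; // load_problem_with is hypothetical
use std::path::Path;

fn main() {
    // Untrusted input: keep the default caps (bytes, line length,
    // support vectors, class count, feature indices).
    let opts = LoadOptions::default();
    match load_problem_with(Path::new("untrusted.libsvm"), &opts) {
        Ok(problem) => println!("loaded {} instances", problem.instances.len()),
        Err(err) => eprintln!("rejected: {err}"), // structured error, no panic
    }

    // Only for files whose source and size are already controlled:
    let _relaxed = LoadOptions::trusted_input();
}
```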
## When to Use It
- You are building a Rust service or CLI and want SVM training/prediction without a C build or runtime dependency.
- You already have LIBSVM-format data or models and want to keep that workflow.
- You need C-SVC, nu-SVC, one-class SVM, epsilon-SVR, or nu-SVR with standard LIBSVM kernels.
- You want a small, inspectable Rust implementation for scientific or embedded deployment work.
## When Not to Use It

- If you need exact bit-for-bit identity with upstream LIBSVM, use upstream LIBSVM directly.
- If you only need Python workflows, scikit-learn already wraps LIBSVM for `SVC`, `SVR`, and `OneClassSVM`.
- For very large linear problems, prefer `liblinear`, `LinearSVC`, SGD-style methods, or kernel approximations.
- If you need GPU training or online/incremental updates, this crate does not provide them.
## Features

- All 5 SVM types: C-SVC, nu-SVC, one-class SVM, epsilon-SVR, nu-SVR (`-s 0..4`)
- All 5 kernel types: linear, polynomial, RBF, sigmoid, precomputed (`-t 0..4`)
- Model file compatibility: reads and writes LIBSVM text format with `%.17g` precision, so models trained with the C library can be loaded in Rust and vice versa
- Probability estimation (`-b 1`): Platt scaling for binary classification, pairwise coupling for multiclass, Laplace-corrected predictions for regression, density-based marks for one-class
- Cross-validation: stratified k-fold for classification (preserves class proportions), simple k-fold for regression/one-class
- CLI tools: `svm-train-rs`, `svm-predict-rs`, `svm-scale-rs` — drop-in replacements matching upstream flag syntax
- FFI-free implementation: no C runtime dependency and no `libsvm-sys2` wrapper layer
- Minimal dependencies: one runtime dependency (`thiserror`); `rayon` is feature-gated for future parallel cross-validation
- Verification pipeline: upstream parity locked to LIBSVM v337 with 250-configuration differential testing across all SVM types, kernel types, datasets, and parameter variations
## Installation

### As a library

```toml
[dependencies]
libsvm-rs = "0.8"
```
### CLI tools

```bash
# Clone and build from source (CLIs are workspace members, not published separately)
cargo build --release
# Binaries are in target/release/
```
## Quick Start

### Library API

```rust
// Import paths assume the crate's top-level re-exports (see lib.rs).
use libsvm_rs::{load_problem, load_model, save_model};
use libsvm_rs::{SvmParameter, SvmType, KernelType};
use libsvm_rs::svm_train;
use libsvm_rs::{predict, predict_values};
use std::path::Path;

// Load training data in LIBSVM sparse format
let problem = load_problem(Path::new("data/heart_scale")).unwrap();

// Configure parameters (defaults match LIBSVM)
let mut param = SvmParameter::default();
param.svm_type = SvmType::CSvc;
param.kernel_type = KernelType::Rbf;
param.gamma = 1.0 / 13.0; // 1/num_features
param.c = 1.0;

// Train
let model = svm_train(&problem, &param);

// Predict a single instance
let label = predict(&model, &problem.instances[0]);
println!("predicted label: {label}");

// Get decision values (useful for ranking or custom thresholds)
let mut dec_values = vec![0.0; 1]; // binary classification: one pairwise value
let label = predict_values(&model, &problem.instances[0], &mut dec_values);
println!("label: {label}, decision value: {}", dec_values[0]);

// Save/load models (compatible with C LIBSVM format)
save_model(Path::new("heart_scale.model"), &model).unwrap();
let loaded = load_model(Path::new("heart_scale.model")).unwrap();
```
### Cross-Validation

```rust
use libsvm_rs::svm_cross_validation;

let targets = svm_cross_validation(&problem, &param, 5); // 5-fold CV
let accuracy = targets
    .iter()
    .zip(problem.labels.iter())
    .filter(|(predicted, actual)| predicted == actual)
    .count() as f64
    / problem.labels.len() as f64;
println!("CV accuracy: {:.2}%", accuracy * 100.0);
```
### Probability Estimation

```rust
use libsvm_rs::predict_probability;

// Enable probability estimation during training
let mut param = SvmParameter::default();
param.probability = true;
param.kernel_type = KernelType::Rbf;
param.gamma = 1.0 / 13.0;

let model = svm_train(&problem, &param);

if let Some((label, probs)) = predict_probability(&model, &problem.instances[0]) {
    println!("label: {label}, class probabilities: {probs:?}");
}
```
## Extended Examples

For structured, runnable example suites see:

- `examples/README.md` — index of all examples
- `examples/basics/` — minimal starter examples
- `examples/api/` — persistence, CV/grid-search, Iris workflow
- `examples/integrations/` — prediction server + wasm inference integrations
- `examples/scientific/` — benchmark-heavy Rust-vs-C++ demos
## CLI Usage

The CLI tools accept the same flags as upstream LIBSVM:
### svm-train-rs

Illustrative invocations using upstream LIBSVM's flag syntax:

```bash
# Default C-SVC with RBF kernel
svm-train-rs data/heart_scale

# nu-SVC with linear kernel, 5-fold cross-validation
svm-train-rs -s 1 -t 0 -v 5 data/heart_scale

# epsilon-SVR with RBF, custom C and gamma
svm-train-rs -s 3 -c 10 -g 0.1 data/housing_scale

# With probability estimation
svm-train-rs -b 1 data/heart_scale

# With class weights for imbalanced data
svm-train-rs -w1 2 -w-1 1 data/heart_scale

# Quiet mode (suppress training progress)
svm-train-rs -q data/heart_scale
```
### svm-predict-rs

```bash
# Standard prediction
svm-predict-rs data/heart_scale heart_scale.model predictions.txt

# With probability estimates
svm-predict-rs -b 1 data/heart_scale heart_scale.model predictions.txt

# Quiet mode
svm-predict-rs -q data/heart_scale heart_scale.model predictions.txt
```
### svm-scale-rs

```bash
# Scale features to [0, 1]
svm-scale-rs -l 0 -u 1 data/heart_scale > heart_scale.01

# Scale to [-1, 1] (default)
svm-scale-rs data/heart_scale > heart_scale.scaled

# Save scaling parameters for later use on test data
svm-scale-rs -s scale.params train.libsvm > train.scaled
svm-scale-rs -r scale.params test.libsvm > test.scaled

# Scale specific feature range
svm-scale-rs -l -0.5 -u 0.5 data/heart_scale > heart_scale.custom
```
## Verification Pipeline

### Overview
Upstream parity is locked to LIBSVM v337 (December 2025) via reference/libsvm_upstream_lock.json. The differential verification suite builds the upstream C binary from a pinned Git commit, runs both implementations on identical datasets with identical parameters, and compares:
- Prediction labels: exact match for classification, tolerance-bounded for regression
- Decision values: relative and absolute tolerance checks
- Model structure: number of SVs, rho values, sv_coef values
- Probability outputs: probA/probB parameters, probability predictions
### Running Verification

```bash
# 1. Validate that the upstream lock file is consistent
# 2. Build the pinned upstream C reference binary and record provenance
#    (both steps use the helper scripts in scripts/)

# 3. Run differential verification
# Quick scope: 45 configs (canonical datasets × SVM types × kernel types)
python3 scripts/run_differential_suite.py

# Full scope: 250 configs (canonical + generated + tuned parameters)
DIFF_SCOPE=full python3 scripts/run_differential_suite.py

# Strict mode: disable the targeted SVR warning downgrade
DIFF_ENABLE_TARGETED_SVR_WARN=0 DIFF_SCOPE=full python3 scripts/run_differential_suite.py

# Sensitivity study: override the global non-probability relative tolerance
DIFF_NONPROB_REL_TOL=2e-5 DIFF_SCOPE=full python3 scripts/run_differential_suite.py

# 4. Run coverage gate (checks line and function coverage thresholds)
# 5. Run Rust-vs-C performance benchmarks, e.g. with
BENCH_WARMUP=3 BENCH_RUNS=30
```
### Understanding Differential Results

| Verdict | Meaning |
|---|---|
| `pass` | No parity issues detected under configured tolerances |
| `warn` | Non-fatal differences detected under explicit, documented policy rules |
| `fail` | Deterministic parity break — label mismatch or model divergence outside thresholds |
| `skip` | Configuration not executed (usually because training fails in both implementations) |
Current status (full scope, 250 configs): 236 pass, 4 warn, 0 fail, 10 skip.
The 4 warnings are:
- `housing_scale_s3_t2_tuned` — epsilon-SVR near-parity training drift (bounded, cross-predict verified)
- `gen_regression_sparse_scale_s4_t3_tuned` — probability header (probA) drift
- `gen_extreme_scale_scale_s0_t1_default` — rho-only near-equivalence drift
- `gen_extreme_scale_scale_s2_t1_default` — one-class near-boundary label drift
All 10 skips are `nu_svc` on synthetic `gen_binary_imbalanced.scale`, where both C and Rust fail training identically.
The active tolerance policy is differential-v3 (documented in reference/tolerance_policy.md).
### Targeted Warning Policy

The epsilon-SVR warning for `housing_scale_s3_t2_tuned` has an intentionally narrow guard:

- Applies to one case ID only
- Non-probability drift bounds must hold: `max_rel <= 6e-5`, `max_abs <= 6e-4`
- Model drift bounds must hold: `rho_rel <= 1e-5`, max sv_coef abs diff `<= 4e-3`
- Cross-predict parity must hold in both directions:
  - Rust predictor on C model matches C predictor on C model
  - C predictor on Rust model matches Rust predictor on Rust model
This confirms the drift comes from training numerics, not prediction logic.
### Verification Artifacts
| Category | Files |
|---|---|
| Lock & provenance | reference/libsvm_upstream_lock.json, reference/reference_provenance.json, reference/reference_build_report.md |
| Differential | reference/differential_results.json, reference/differential_report.md |
| Tolerance | reference/tolerance_policy.md |
| Coverage | reference/coverage_report.md |
| Performance | reference/benchmark_results.json, reference/benchmark_report.md |
| Datasets | reference/dataset_manifest.json, data/generated/ |
| Security | deny.toml, crates/libsvm/fuzz/README.md, CHANGELOG.md |
### Rust vs C++ Timing Figure
Global timing comparison figure (train + predict, per-case ratios, and ratio distributions):

Statistical summary companion:
examples/comparison_summary.json
To regenerate performance data with stronger statistical confidence before plotting:
```bash
BENCH_WARMUP=3 BENCH_RUNS=20
```
### Parity Claim
No hard differential failures under the documented default policy, with a small set of explicitly justified warnings. This is strong parity evidence, but not bitwise identity across all modes. Residual drift comes from training-side numerics (floating-point accumulation order, shrinking heuristic timing), not from prediction logic.
## Architecture

```text
crates/libsvm/src/
  lib.rs               — Module declarations and re-exports
  types.rs             — SvmNode, SvmProblem, SvmParameter, SvmModel, enums
  error.rs             — SvmError enum (thiserror)
  io.rs                — Problem/model file I/O (LIBSVM text format)
  kernel.rs            — Kernel evaluation (dot, powi, k_function, Kernel struct)
  cache.rs             — LRU kernel cache (Qfloat = f32)
  qmatrix.rs           — QMatrix trait + SvcQ, OneClassQ, SvrQ implementations
  solver.rs            — SMO solver (Standard + Nu variants, WSS3, shrinking)
  train.rs             — svm_train, solve dispatchers, class grouping
  predict.rs           — predict, predict_values, predict_probability
  probability.rs       — Platt scaling, multiclass probability, CV-based estimation
  cross_validation.rs  — svm_cross_validation (stratified + simple)
  metrics.rs           — accuracy_percentage, regression_metrics (MSE + R²)
  util.rs              — Shared helpers (group_classes, parse_feature_index, PRNG)

bins/
  svm-train-rs/    — CLI matching C svm-train (all flags including -wi class weights)
  svm-predict-rs/  — CLI matching C svm-predict (-b probability, -q quiet)
  svm-scale-rs/    — CLI matching C svm-scale (3-pass, save/restore params)
```
## Workspace Layout

The project uses a Cargo workspace:

- `crates/libsvm/` — the library crate (`libsvm-rs` on crates.io)
- `bins/svm-train-rs/`, `bins/svm-predict-rs/`, `bins/svm-scale-rs/` — CLI binaries
- `vendor/libsvm/` — upstream C source (for reference and vendored tests)
- `data/` — test datasets (heart_scale, iris.scale, housing_scale, generated synthetics)
- `reference/` — verification artifacts (differential results, provenance, benchmarks)
- `scripts/` — verification and testing scripts
## Module Details

### types.rs — Core Data Types

Types matching LIBSVM's `svm.h`:

| Type | Description | C++ equivalent |
|---|---|---|
| `SvmType` | Enum: CSvc, NuSvc, OneClass, EpsilonSvr, NuSvr | `svm_type` int constants |
| `KernelType` | Enum: Linear, Polynomial, Rbf, Sigmoid, Precomputed | `kernel_type` int constants |
| `SvmNode` | Sparse feature `{index: i32, value: f64}` | `struct svm_node` |
| `SvmProblem` | Training data: `labels: Vec<f64>`, `instances: Vec<Vec<SvmNode>>` | `struct svm_problem` |
| `SvmParameter` | All training parameters (defaults match LIBSVM) | `struct svm_parameter` |
| `SvmModel` | Trained model with SVs, coefficients, rho | `struct svm_model` |
Key design choice: SvmNode.index is i32 (not usize) to match LIBSVM's C int and preserve file format compatibility. Sentinel nodes (index == -1) used in C to terminate instance arrays are not stored — Rust's Vec::len() tracks instance length instead.
SvmParameter::default() produces the same defaults as LIBSVM: svm_type = CSvc, kernel_type = Rbf, degree = 3, gamma = 0 (auto-detected as 1/num_features), coef0 = 0, c = 1, nu = 0.5, eps = 0.001, cache_size = 100 MB, shrinking = true.
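A small construction example (assuming the crate's top-level re-exports and the public fields listed above):

```rust
use libsvm_rs::{SvmNode, SvmProblem};

// The sparse instance "1:0.5 3:-1.2": indices are 1-based and ascending;
// absent features are implicitly zero, and no -1 sentinel node is stored.
let x = vec![
    SvmNode { index: 1, value: 0.5 },
    SvmNode { index: 3, value: -1.2 },
];

// A two-instance binary problem.
let problem = SvmProblem {
    labels: vec![1.0, -1.0],
    instances: vec![x, vec![SvmNode { index: 2, value: 0.9 }]],
};
assert_eq!(problem.instances[0].len(), 2); // Vec::len() replaces the C sentinel
```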
### kernel.rs — Kernel Evaluation

Two evaluation paths exist because the training kernel needs swappable data references for shrinking, while prediction only needs stateless evaluation:

- `k_function(x, y, param)` — Standalone stateless evaluation for prediction. Operates on sparse `&[SvmNode]` slices. Handles all 5 kernel types including precomputed (looks up `x[0].value` as the precomputed kernel value).
- `Kernel` struct — Training evaluator. Stores `Vec<&'a [SvmNode]>` (borrowed slices into the original problem data) so the solver can swap data point references during shrinking without copying data. The C++ achieves the same via `const_cast` on a `const svm_node **x` pointer array.
Supported kernels:
| Type | Formula |
|---|---|
| Linear | dot(x, y) |
| Polynomial | (γ·dot(x,y) + coef0)^degree |
| RBF | exp(-γ·‖x-y‖²) |
| Sigmoid | tanh(γ·dot(x,y) + coef0) |
| Precomputed | x[j].value (user supplies full kernel matrix) |
RBF optimization: `x_square[i] = dot(x[i], x[i])` is precomputed once during `Kernel::new()`, so the RBF kernel becomes:

```
K(i,j) = exp(-γ * (x_sq[i] + x_sq[j] - 2·dot(x[i], x[j])))
```
This avoids recomputing ‖x_i‖² on every kernel evaluation — a significant win since the kernel is evaluated millions of times during training.
Sparse dot product: The dot() function walks two sparse vectors simultaneously using a merge-join pattern on sorted indices, skipping non-overlapping features in O(nnz_x + nnz_y) time.
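A minimal standalone sketch of that merge-join (the crate's `dot()` follows the same pattern; `SvmNode` is redeclared here to keep the snippet self-contained):

```rust
struct SvmNode {
    index: i32,
    value: f64,
}

// Merge-join dot product over two index-sorted sparse vectors.
// Each step advances at least one cursor, so it runs in O(nnz_x + nnz_y).
fn dot(x: &[SvmNode], y: &[SvmNode]) -> f64 {
    let (mut i, mut j, mut sum) = (0, 0, 0.0);
    while i < x.len() && j < y.len() {
        match x[i].index.cmp(&y[j].index) {
            std::cmp::Ordering::Equal => {
                sum += x[i].value * y[j].value;
                i += 1;
                j += 1;
            }
            std::cmp::Ordering::Less => i += 1,    // feature only in x: contributes 0
            std::cmp::Ordering::Greater => j += 1, // feature only in y: contributes 0
        }
    }
    sum
}
```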
### cache.rs — LRU Kernel Cache

LRU cache storing rows of the Q matrix as Qfloat (f32) values. Using f32 instead of f64 halves memory usage while providing sufficient precision for the cached kernel values (the solver accumulates gradients in f64).

Data structure: Index-based circular doubly-linked list (no unsafe). Each training instance gets a cache node. Node l (where l = number of instances) is the sentinel head of the LRU list. Nodes not currently in the LRU list have prev == NONE (usize::MAX).

Cache operations:

- `get_data(index, len)`: Returns `(&mut [Qfloat], start)` where `start` is how many elements were already cached. The caller (QMatrix) fills `data[start..len]` with fresh kernel evaluations. If the row isn't cached, allocates space (evicting LRU rows if needed) and returns `start = 0`.
- `swap_index(i, j)`: Critical for solver shrinking. Three steps:
  1. Swap row data and LRU positions for rows i and j
  2. Iterate all cached rows and swap column entries at positions i,j
  3. If a row covers the lower index but not the higher, evict it ("give up")
The column-swap loop in step 2 is essential for correctness — without it, the solver reads stale kernel values after shrinking swaps indices. This was a subtle bug during development that caused silent numerical drift.
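The caller-side contract of `get_data` can be mocked like this (illustrative single-row stand-in; the real cache manages many rows with LRU eviction):

```rust
type Qfloat = f32;

// Single-row stand-in for the documented get_data contract.
struct MockCache {
    row: Vec<Qfloat>,
    cached_len: usize,
}

impl MockCache {
    // Returns the row buffer plus `start`, the count of already-valid
    // leading entries; the caller fills data[start..len].
    fn get_data(&mut self, len: usize) -> (&mut [Qfloat], usize) {
        let start = self.cached_len.min(len);
        if self.row.len() < len {
            self.row.resize(len, 0.0);
        }
        self.cached_len = self.cached_len.max(len);
        (&mut self.row[..len], start)
    }
}

fn main() {
    let mut cache = MockCache { row: Vec::new(), cached_len: 0 };
    let kernel = |j: usize| j as Qfloat * 0.5; // stand-in kernel values

    let (data, start) = cache.get_data(4);
    for j in start..4 {
        data[j] = kernel(j); // only the uncached tail is recomputed
    }
    let (_, start) = cache.get_data(4);
    assert_eq!(start, 4); // second request: everything already cached
}
```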
### qmatrix.rs — Q Matrix Implementations

The `QMatrix` trait bridges the Kernel and Solver, abstracting how kernel values are computed and cached for different SVM formulations.

Why `&mut self` on `get_q`: The C++ uses const methods with mutable cache fields. Rust doesn't allow this pattern with `&self` — the cache needs mutation. The Solver therefore owns `Box<dyn QMatrix>` and copies Q row data into its own buffers before the borrow ends, avoiding lifetime conflicts.
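In trait form, the bridge looks roughly like this (method names follow the C++ `QMatrix`; the crate's exact signatures may differ):

```rust
type Qfloat = f32;

// Hedged sketch of the QMatrix abstraction described above.
trait QMatrix {
    /// Q-matrix row i over the first `len` active columns.
    /// Takes `&mut self` because serving a row may touch the LRU cache.
    fn get_q(&mut self, i: usize, len: usize) -> &[Qfloat];
    /// Diagonal entries Q[i][i], kept separately for working-set selection.
    fn get_qd(&self) -> &[f64];
    /// Keep cached rows and columns consistent with solver shrinking swaps.
    fn swap_index(&mut self, i: usize, j: usize);
}
```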
#### SvcQ — Classification Q Matrix

For C-SVC and nu-SVC:

- `Q[i][j] = y[i] * y[j] * K(x[i], x[j])` stored as `Qfloat` (f32)
- `QD[i] = K(x[i], x[i])` — always 1.0 for RBF kernel
- `swap_index` swaps cache rows, kernel data references, label signs, and diagonal entries
#### OneClassQ — One-Class Q Matrix

For one-class SVM:

- `Q[i][j] = K(x[i], x[j])` — no label scaling (all samples are treated as positive)
- Same cache and swap logic as SvcQ, minus the label array
#### SvrQ — Regression Q Matrix

For epsilon-SVR and nu-SVR. The dual formulation has 2l variables (l = number of training samples):

- Variables `0..l` correspond to `α_i` with `sign[k] = +1`, `index[k] = k`
- Variables `l..2l` correspond to `α*_i` with `sign[k+l] = -1`, `index[k+l] = k`
The kernel cache stores only l rows of actual kernel values (since K(x[i], x[j]) doesn't depend on whether we're looking at α or α*). get_q fetches the real kernel row, then reorders and applies signs:
```
buf[j] = sign[i] * sign[j] * data[index[j]]
```
Double buffer: Two buffers are needed because the solver requests Q_i and Q_j simultaneously during the alpha update step. If both wrote to the same buffer, the second get_q call would overwrite the first. swap_index only swaps sign, index, and qd — the kernel cache maps to real data indices via index[].
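A self-contained sketch of that reordering (simplified; in the crate the raw row comes from the kernel cache):

```rust
type Qfloat = f32;

// Reorder a cached kernel row of the l real samples into the 2l-variable
// SVR layout: buf[j] = sign[i] * sign[j] * K(x[index[i]], x[index[j]]).
fn svr_q_row(buf: &mut [Qfloat], data: &[Qfloat], sign: &[i8], index: &[usize], i: usize) {
    for j in 0..buf.len() {
        buf[j] = (sign[i] * sign[j]) as Qfloat * data[index[j]];
    }
}

fn main() {
    // l = 2 samples, so 2l = 4 dual variables; `data` is the kernel row
    // for the real sample behind variable i = 0.
    let sign = [1i8, 1, -1, -1];
    let index = [0usize, 1, 0, 1];
    let data = [1.0f32, 0.25]; // K(x_0, x_0), K(x_0, x_1)
    let mut buf = [0.0f32; 4];
    svr_q_row(&mut buf, &data, &sign, &index, 0);
    assert_eq!(buf, [1.0, 0.25, -1.0, -0.25]);
}
```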
### solver.rs — SMO Solver
The computational core of the library, translating Solver and Solver_NU from svm.cpp (lines 362–1265, ~900 lines of C++).
Solver variant: `SolverVariant::Standard` vs `SolverVariant::Nu`. 95% of the code is shared — only 4 methods differ:

| Method | Standard | Nu |
|---|---|---|
| `select_working_set` | Single I_up/I_low partition | Separate pos/neg class selection |
| `calculate_rho` | Average over free variables | (r1 - r2) / 2 from class-separated free vars |
| `do_shrinking` | Single Gmax bound | Separate pos/neg Gmax bounds |
| `be_shrunk` | Check against global bound | Check against class-specific bound |
Key constants:

| Constant | Value | Purpose |
|---|---|---|
| `TAU` | 1e-12 | Floor for quad_coef to prevent division by zero in WSS |
| `INF` | `f64::INFINITY` | Initial gradient bounds |
| `EPS` | From parameter | KKT violation tolerance (default 0.001) |
#### Initialization

- Copy input arrays (`alpha`, `y`, `p`) into owned vectors
- Compute `alpha_status` for each variable: `LowerBound` (α=0), `UpperBound` (α=C), or `Free`
- Initialize gradient: `G[i] = p[i] + Σ_j(alpha[j] * Q[i][j])` for all non-lower-bound j
- Initialize `G_bar`: `G_bar[i] = Σ_j(C_j * Q[i][j])` for all upper-bound j
G_bar accelerates gradient reconstruction during unshrinking — when a variable transitions from upper-bound to free, G_bar provides the full gradient contribution without re-scanning all variables.
#### Main Loop

```text
while iter < max_iter:
    if counter expires: do_shrinking()
    (i, j) = select_working_set()      // returns None if optimal
    if None:
        reconstruct_gradient()         // unshrink all variables
        active_size = l                // restore full problem
        (i, j) = select_working_set()  // retry with all variables
        if None: break                 // truly optimal
    update_alpha_pair(i, j)            // analytic 2-variable solve
```
- `max_iter`: `max(10_000_000, 100*l)` — same as C++
- Shrink counter: `min(l, 1000)` iterations between shrinking passes
#### Working Set Selection (WSS3)

Implements the WSS3 heuristic from Fan, Chen, and Lin (2005).

Standard variant — two-pass selection:

1. Find i: maximizes `-y_i * G[i]` among I_up (variables that can increase)
2. Find j: among I_low (variables that can decrease), minimizes the second-order approximation of objective decrease: `-(grad_diff²) / quad_coef` where `quad_coef = QD[i] + QD[j] - 2·y_i·y_j·Q_ij`
Nu variant: Partitions variables into positive and negative classes. Finds i_p/j_p (best positive pair) and i_n/j_n (best negative pair), then selects the pair with larger violation.
Returns None when the maximum KKT violation Gmax_i + Gmax_j < eps, indicating optimality.
#### Alpha Update

Analytic solution of the 2-variable sub-problem with box constraints [0, C]:

- Different labels (`y[i] ≠ y[j]`): constraint line `alpha[i] - alpha[j] = constant`
- Same labels (`y[i] = y[j]`): constraint line `alpha[i] + alpha[j] = constant`
Both branches clip alpha to the feasible box and update gradients for all active variables:
```
G[k] += Q_i[k] * Δα_i + Q_j[k] * Δα_j    // for all k in the active set
```
G_bar is updated incrementally when alpha_status changes (transition to/from upper bound).
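Spelled out, the unclipped Newton step on each branch is (standard LIBSVM algebra; both alphas are then clipped to the box and Δα propagated as above):

```text
y_i ≠ y_j:   quad_coef = QD[i] + QD[j] + 2·Q_ij      (floored to TAU if ≤ 0)
             δ = (-G[i] - G[j]) / quad_coef
             α_i += δ,  α_j += δ

y_i = y_j:   quad_coef = QD[i] + QD[j] - 2·Q_ij      (floored to TAU if ≤ 0)
             δ = (G[i] - G[j]) / quad_coef
             α_i -= δ,  α_j += δ
```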
#### Shrinking

Variables firmly at their bounds that are unlikely to change in subsequent iterations get "shrunk" — swapped to the end of the active set so they're excluded from WSS and gradient updates:

- `be_shrunk(i)` checks if `-y_i * G[i]` exceeds the current violation bound by a sufficient margin
- Active-from-back swap finds the first non-shrinkable variable from the end
Unshrink trigger: When Gmax1 + Gmax2 <= eps * 10 (approaching convergence), the solver reconstructs the full gradient for all variables and resets active_size = l. This ensures the final optimality check considers all variables.
Gradient reconstruction: Uses G_bar to efficiently compute the full gradient contribution from upper-bound variables without re-evaluating kernel values.
### train.rs — Training Pipeline

Orchestrates the solver for all SVM types.

#### Solve Dispatchers
| Function | SVM Type | Solver Variant | Key Setup |
|---|---|---|---|
| `solve_c_svc` | C-SVC | Standard | `p[i] = -1` for all i; after solving, `alpha[i] *= y[i]` to get signed coefficients |
| `solve_nu_svc` | nu-SVC | Nu | Alpha initialized proportionally from nu budget; after solving, rescale by 1/r |
| `solve_one_class` | One-class | Standard | `alpha = [1,1,...,frac,0,...,0]` where `nu*l` alphas are set to 1.0 |
| `solve_epsilon_svr` | epsilon-SVR | Standard | 2l variables; `p[i] = ε - y_i`, `p[i+l] = ε + y_i` |
| `solve_nu_svr` | nu-SVR | Nu | 2l variables; alpha initialized uniformly at `C·nu·l / 2` |
Each dispatcher sets up the dual problem (initial alpha values, gradient vector p, bounds C_i), calls Solver::solve(), then extracts the solution (nonzero alphas become support vector coefficients, rho becomes the bias).
#### Class Grouping

`svm_group_classes` builds:

- `label[]`: unique class labels in first-occurrence order (with -1/+1 swap to match C++ convention)
- `count[]`: number of samples per class
- `start[]`: cumulative start index for each class in the permuted array
- `perm[]`: permutation mapping grouped indices back to original dataset indices
This grouping enables efficient one-vs-one pair extraction without copying data.
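Worked example (by hand, from the definitions above):

```text
labels = [2, 1, 2, 3, 1]        original order

label  = [2, 1, 3]              first-occurrence order
count  = [2, 2, 1]              samples per class
start  = [0, 2, 4]              class offsets in the grouped array
perm   = [0, 2, 1, 4, 3]        grouped position -> original index
```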
#### svm_train — Main Entry Point

Two branches:

1. Regression / one-class: Single call to `svm_train_one()`, extract nonzero alphas as support vectors.
2. Classification (one-vs-one):
   1. Group classes via `svm_group_classes`
   2. Compute weighted C per class (base C × class weight)
   3. For each of k*(k-1)/2 class pairs (i,j):
      - Build sub-problem with only classes i and j
      - Train binary classifier
      - Mark nonzero support vectors
   4. Assemble `sv_coef` matrix: `sv_coef[j-1][nz_start[i]..] = class_i coefficients`
Gamma auto-detection: If gamma == 0 (default), sets gamma = 1/max_feature_index where max_feature_index is the highest feature index seen in the training data.
Probability training: When `param.probability == true`, training invokes cross-validation internally:

- Binary classification: 5-fold CV to collect decision values, then fits a sigmoid (Platt scaling) via `sigmoid_train`
- Multiclass: pairwise probability estimation via `svm_binary_svc_probability`
- Regression: 5-fold CV to estimate sigma for Laplace correction
- One-class: 5-fold CV to estimate the positive density fraction
### predict.rs — Prediction

#### predict_values

Core prediction function for all SVM types:

- Classification (C-SVC, nu-SVC): Computes k*(k-1)/2 pairwise decision values `dec_value[p] = Σ sv_coef · K(x, sv) - rho[p]`, then runs one-vs-one voting. Returns the class with the most votes (ties broken by label ordering; see the voting sketch below).
- Regression (epsilon-SVR, nu-SVR): `prediction = Σ sv_coef[i] * K(x, sv[i]) - rho[0]`
- One-class: Returns `+1` if `Σ sv_coef[i] * K(x, sv[i]) - rho[0] > 0`, else `-1`.
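A minimal standalone sketch of the one-vs-one vote (the decision values in `main` are invented):

```rust
// One-vs-one voting over k*(k-1)/2 pairwise decision values,
// enumerated in pair order (0,1), (0,2), ..., (k-2,k-1).
fn ovo_vote(k: usize, dec_values: &[f64]) -> usize {
    let mut votes = vec![0usize; k];
    let mut p = 0;
    for i in 0..k {
        for j in (i + 1)..k {
            if dec_values[p] > 0.0 { votes[i] += 1; } else { votes[j] += 1; }
            p += 1;
        }
    }
    // First maximum wins: "ties broken by label ordering".
    let mut winner = 0;
    for c in 1..k {
        if votes[c] > votes[winner] { winner = c; }
    }
    winner
}

fn main() {
    // k = 3 classes, pairs (0,1), (0,2), (1,2): votes come out 1-1-1,
    // so the tie resolves to the first class in label order.
    assert_eq!(ovo_vote(3, &[0.7, -0.4, 1.1]), 0);
}
```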
#### predict_probability

Wraps `predict_values` with probability calibration:

- Binary classification: Applies Platt sigmoid `P(y=1|f) = 1 / (1 + exp(A·f + B))` using probA/probB from the model.
- Multiclass: Computes pairwise probabilities via Platt sigmoid, then solves the coupled probability system via `multiclass_probability()` (iterative optimization matching LIBSVM's algorithm).
- Regression: Returns Laplace-corrected probability `P = 1 / (2·sigma) · exp(-|y-f|/sigma)`.
- One-class: Returns a density-based probability mark.
### probability.rs — Probability Estimation

#### sigmoid_train / sigmoid_predict
Fits a sigmoid P(y=1|f) = 1 / (1 + exp(A·f + B)) to decision values using the algorithm from Platt (2000) with improvements from Lin, Lin, and Weng (2007). Uses Newton's method with backtracking line search to minimize the negative log-likelihood.
#### multiclass_probability

Given k*(k-1)/2 pairwise probabilities r[i][j], solves for class probabilities p[1]..p[k] that satisfy:

```
p[i] = Σ_{j≠i} r[i][j] · p[j] / (r[i][j] · p[j] + r[j][i] · p[i])
```
Uses an iterative method (100 max iterations, convergence at 1e-5) matching LIBSVM's multiclass_probability().
#### svm_binary_svc_probability
Runs internal 5-fold cross-validation, collects decision values for all training instances, then calls sigmoid_train to fit A and B parameters. Uses a deterministic PRNG seeded from LIBSVM's C rand() equivalent for fold shuffling.
### cross_validation.rs — Cross-Validation

`svm_cross_validation(problem, param, nr_fold)` splits data, trains on k-1 folds, predicts on the held-out fold:
- Classification: Stratified folding — each fold has approximately the same class proportions as the full dataset. Uses the same deterministic shuffling as C LIBSVM.
- Regression / one-class: Simple sequential splitting.
Returns a Vec<f64> of predicted values aligned with the original problem ordering, so the caller can compute accuracy (classification) or MSE (regression).
### io.rs — File I/O

#### Problem Files

LIBSVM sparse format: `label index:value index:value ...` per line. Features must have ascending indices. Missing features are implicitly zero (sparse representation).

For precomputed kernels: `label 0:sample_id 1:K(x,x1) 2:K(x,x2) ...` where index 0 holds the 1-based sample identifier.
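Two illustrative lines (values invented for the example):

```text
# ordinary sparse instance: label +1, features 3 and 7 set, all others implicitly 0
+1 3:0.5 7:-1.2

# precomputed-kernel row for sample 2 of 3: index 0 carries the 1-based sample id
-1 0:2 1:0.80 2:1.00 3:0.53
```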
#### Model Files

Two sections:

1. Header: Key-value pairs (`svm_type`, `kernel_type`, `nr_class`, `total_sv`, `rho`, `label`, `nr_sv`, optionally `probA`, `probB`)
2. SV section: After the `SV` marker, one line per support vector: `coef1 [coef2 ...] index:value index:value ...`
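An abbreviated example of the resulting shape (illustrative values, SV lines truncated):

```text
svm_type c_svc
kernel_type rbf
gamma 0.076923076923076927
nr_class 2
total_sv 132
rho 0.42438151008632154
label 1 -1
nr_sv 64 68
SV
1 1:0.70833000000000002 2:1 5:-0.33333000000000002
0.5120938 1:-0.54166999999999998 2:-1 4:1
```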
Float formatting uses %.17g-equivalent precision for model files (ensuring round-trip fidelity) and %g-equivalent for problem files. The Gfmt struct replicates C's %g format: dynamically chooses fixed vs scientific notation, strips trailing zeros, handles edge cases like -0 formatting.
## Design Decisions

| Decision | Rationale |
|---|---|
| `SvmNode.index` is `i32` | Matches C `int` for format compatibility; sentinel -1 not stored |
| `Qfloat = f32` | Matches original, halves cache memory; gradients accumulated in f64 |
| Ownership replaces manual memory | No `free_sv` flag; `SvmModel` owns its support vectors |
| `Kernel.x` is `Vec<&[SvmNode]>` | Swappable references during shrinking without data copies |
| Solver uses enum variant, not traits | 95% shared code; cleaner than trait objects for 4 method overrides |
| `QMatrix::get_q` takes `&mut self` | Replaces C++ const method + mutable cache fields |
| `%.17g` float formatting | Bit-exact model file compatibility with C LIBSVM |
| Minimal runtime deps | Only `thiserror`; `rayon` feature-gated for opt-in parallelism |
| Manual CLI arg parsing | No `clap` dependency; handles `-wi` composite flags matching C behavior |
| Deterministic PRNG | Replicates C's `rand()` for identical fold shuffling in CV/probability |
## Numerical Equivalence Notes

- Gradient accumulation: Uses f64 for gradients (matching C++), f32 for cached Q values
- Quad_coef floor: `TAU = 1e-12` prevents division by zero in working set selection
- Alpha comparison: `alpha[i].abs() > 0.0` for SV detection (exact zero check, matching C++)
- Class label cast: `labels[i] as i32` matches C's `(int)prob->y[i]` truncation behavior
- Float formatting: `Gfmt` struct produces identical output to C's `printf("%g", x)` for all tested values
- Target: numerical equivalence within ~1e-8 relative tolerance, not bitwise identity
- Known sources of drift: floating-point accumulation order differences between compilers, shrinking heuristic timing, and gradient reconstruction precision
## Test Coverage
| Category | Tests | Description |
|---|---|---|
| Cache | 7 | LRU eviction, extend, swap with column updates |
| Kernel | 8 | All kernel types, struct/standalone agreement, sparse dot product |
| QMatrix | 4 | SvcQ sign/symmetry, OneClassQ, SvrQ double buffer |
| I/O | 8 | Problem parsing, model roundtrip, C format compatibility |
| Types | 8 | Parameter validation, nu-SVC feasibility checks |
| Predict | 3 | Heart_scale accuracy, C svm-predict output comparison |
| Train | 6 | C-SVC, multiclass, nu-SVC, one-class, epsilon-SVR, nu-SVR |
| Probability | 6 | Sigmoid fitting, binary/multiclass/regression probability |
| Cross-validation | 5 | Stratified CV, classification accuracy, regression MSE |
| Property | 7 | Proptest-based invariant checks (random params/data) |
| Metrics | 6 | Regression MSE/R², classification accuracy, edge cases |
| Util | 10 | group_classes, parse_feature_index, shuffle_range |
| CLI integration | 21 | Train/predict/scale end-to-end, flag permutation fuzzing |
| Differential | 250 | Full Rust-vs-C comparison matrix (via external suite) |
| Unit/integration | ~124 | Approximate total of in-repo unit and integration tests |
Coverage metrics: 93.19% line coverage, 92.86% function coverage (library crate).
## Dependencies

- Runtime: `thiserror` (error derive macros)
- Optional: `rayon` (feature-gated, for future parallel cross-validation)
- Dev: `float-cmp` (approximate float comparison), `criterion` (benchmarks), `proptest` (property-based testing)
## Known Limitations
- Not bitwise identical: Numerical results match within ~1e-8 but are not bit-for-bit identical to C LIBSVM due to floating-point accumulation order differences.
- No GPU support: All computation is CPU-based.
- No incremental/online learning: Full retraining required for new data (same as upstream LIBSVM).
- Precomputed kernels require full matrix: The full n×n kernel matrix must be provided in memory.
## Contributing

1. Fork the repository
2. Create a feature branch
3. Run the test suite: `cargo test --workspace`
4. Run the differential suite to verify parity: `python3 scripts/run_differential_suite.py`
5. Submit a pull request
## License
BSD-3-Clause, same family as original LIBSVM. See LICENSE.
## References
- Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27.
- Platt, J.C. (2000). Probabilities for SV machines. In Advances in Large Margin Classifiers, MIT Press.
- Lin, H.-T., Lin, C.-J., and Weng, R.C. (2007). A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267–276.
- Fan, R.-E., Chen, P.-H., and Lin, C.-J. (2005). Working set selection using second order information for training support vector machines. JMLR, 6:1889–1918.