# libsvm-rs

FFI-free, Rust-native, LIBSVM-compatible SVM training and prediction.
libsvm-rs is a pure Rust implementation of
LIBSVM for projects that want LIBSVM
model/data compatibility without linking to the C/C++ library. It supports the
same SVM families, kernels, sparse text format, model files, and command-line
workflow as upstream LIBSVM, while keeping the implementation auditable from
Rust.
The goal is numerical equivalence and operational compatibility, not bitwise identity. The included verification pipeline compares this implementation against a pinned upstream LIBSVM reference across classification, regression, one-class, probability, and precomputed-kernel cases.
## Why libsvm-rs?
| If you need... | libsvm-rs gives you... |
|---|---|
| Rust-native deployment | No C runtime or libsvm-sys2 FFI dependency |
| LIBSVM interoperability | Reads and writes LIBSVM sparse problem files and model files |
| Familiar command-line tools | svm-train-rs, svm-predict-rs, and svm-scale-rs |
| Auditable implementation | Solver, kernels, prediction, probability, and I/O are implemented in Rust |
| Parity evidence | Reproducible differential tests against pinned upstream LIBSVM |
| Small dependency surface | One runtime dependency: thiserror |
## How is this different from the `libsvm` crate?
The existing libsvm crate provides
high-level Rust bindings to the C LIBSVM library through libsvm-sys2.
That is the right choice if you explicitly want the original C implementation
from Rust.
libsvm-rs is different: it reimplements the LIBSVM algorithms and file formats
in Rust. Use it when you want Rust-native deployment, easier cross-compilation,
or an implementation that can be inspected and tested without crossing an FFI
boundary.
## Performance Snapshot
Native CLI benchmarks compare Rust binaries (svm-*-rs) against the vendored C
LIBSVM reference on the same datasets and parameters. Ratios below 1.0 mean
Rust was faster in the measured run.
| Operation | Cases | Rust/C median ratio | Result |
|---|---|---|---|
| `predict` | 40 | 0.804 | Rust faster on median |
| `predict_probability` | 30 | 0.834 | Rust faster on median |
| `train` | 40 | 0.916 | Rust slightly faster on median |
| `train_probability` | 30 | 1.025 | Rust slightly slower on median |
Rust is not uniformly faster than C LIBSVM. The included native benchmarks show
strong prediction performance and comparable training performance, with some
slower probability-training cases. See
reference/benchmark_report.md and
examples/comparison_summary.json.
## Parity Status
Current full differential suite against pinned upstream LIBSVM v337:
| Scope | Pass | Warn | Fail | Skip |
|---|---|---|---|---|
| 250 configurations | 236 | 4 | 0 | 10 |
The warnings are documented numerical near-parity cases, not prediction-logic
failures. See reference/differential_report.md
and reference/tolerance_policy.md.
## Security Considerations

libsvm-rs treats `.libsvm` problem files and `.model` files as untrusted text input by default.

- Default loaders enforce byte, line-length, support-vector, class-count, and feature-index caps through `LoadOptions`.
- Problem files reject malformed `index:value` tokens, non-ascending feature indices, over-limit feature indices, oversized input, oversized lines, and embedded NUL bytes.
- Model files additionally validate header consistency before support-vector allocation: `nr_class`, `total_sv`, `rho`, `label`, `nr_sv`, `probA`, `probB`, probability-density marks, and precomputed-kernel rows must agree.
- Public loader paths return structured errors for malformed text input within the configured caps; they should not panic on adversarial files.
The loaders do not prove that a model is statistically meaningful, was trained
from a specific dataset, is safe to use for a regulated decision, or is
cryptographically authentic. Constant-time arithmetic, side-channel resistance,
sandboxed execution, and model signing are out of scope for this crate. Use
`LoadOptions::trusted_input()` only for files whose source and size are already
controlled.
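A hedged usage sketch: `LoadOptions` and `LoadOptions::trusted_input()` are the names documented above, while the with-options loader entry point `load_problem_with` shown here is hypothetical.

```rust
use libsvm_rs::{load_problem_with, LoadOptions}; // load_problem_with is hypothetical
use std::path::Path;

fn main() {
    // Untrusted input: keep the default caps (bytes, line length,
    // support vectors, class count, feature indices).
    let opts = LoadOptions::default();
    match load_problem_with(Path::new("untrusted.libsvm"), &opts) {
        Ok(problem) => println!("loaded {} instances", problem.instances.len()),
        Err(err) => eprintln!("rejected: {err}"), // structured error, no panic
    }

    // Only for files whose source and size are already controlled:
    let _relaxed = LoadOptions::trusted_input();
}
```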
## When to Use It
- You are building a Rust service or CLI and want SVM training/prediction without a C build or runtime dependency.
- You already have LIBSVM-format data or models and want to keep that workflow.
- You need C-SVC, nu-SVC, one-class SVM, epsilon-SVR, or nu-SVR with standard LIBSVM kernels.
- You want a small, inspectable Rust implementation for scientific or embedded deployment work.
## When Not to Use It

- If you need exact bit-for-bit identity with upstream LIBSVM, use upstream LIBSVM directly.
- If you only need Python workflows, scikit-learn already wraps LIBSVM for `SVC`, `SVR`, and `OneClassSVM`.
- For very large linear problems, prefer `liblinear`, `LinearSVC`, SGD-style methods, or kernel approximations.
- If you need GPU training or online/incremental updates, this crate does not provide them.
## Features

- All 5 SVM types: C-SVC, nu-SVC, one-class SVM, epsilon-SVR, nu-SVR (`-s 0..4`)
- All 5 kernel types: linear, polynomial, RBF, sigmoid, precomputed (`-t 0..4`)
- Model file compatibility: reads and writes LIBSVM text format with `%.17g` precision, so models trained with the C library can be loaded in Rust and vice versa
- Probability estimation (`-b 1`): Platt scaling for binary classification, pairwise coupling for multiclass, Laplace-corrected predictions for regression, density-based marks for one-class
- Cross-validation: stratified k-fold for classification (preserves class proportions), simple k-fold for regression/one-class
- CLI tools: `svm-train-rs`, `svm-predict-rs`, `svm-scale-rs` — drop-in replacements matching upstream flag syntax
- FFI-free implementation: no C runtime dependency and no `libsvm-sys2` wrapper layer
- Minimal dependencies: one runtime dependency (`thiserror`); `rayon` is feature-gated for future parallel cross-validation
- Verification pipeline: upstream parity locked to LIBSVM v337 with 250-configuration differential testing across all SVM types, kernel types, datasets, and parameter variations
## Installation

### As a library

```toml
[dependencies]
libsvm-rs = "0.8"
```
### CLI tools

```bash
# Clone and build from source (CLIs are workspace members, not published separately)
cargo build --release
# Binaries are in target/release/
```
## Quick Start

### Library API

```rust
// Import paths assume the crate's top-level re-exports (see lib.rs).
use libsvm_rs::{load_problem, load_model, save_model};
use libsvm_rs::{SvmParameter, SvmType, KernelType};
use libsvm_rs::svm_train;
use libsvm_rs::{predict, predict_values};
use std::path::Path;

// Load training data in LIBSVM sparse format
let problem = load_problem(Path::new("data/heart_scale")).unwrap();

// Configure parameters (defaults match LIBSVM)
let mut param = SvmParameter::default();
param.svm_type = SvmType::CSvc;
param.kernel_type = KernelType::Rbf;
param.gamma = 1.0 / 13.0; // 1/num_features
param.c = 1.0;

// Train
let model = svm_train(&problem, &param);

// Predict a single instance
let label = predict(&model, &problem.instances[0]);
println!("predicted label: {label}");

// Get decision values (useful for ranking or custom thresholds)
let mut dec_values = vec![0.0; 1]; // binary classification: one pairwise value
let label = predict_values(&model, &problem.instances[0], &mut dec_values);
println!("label: {label}, decision value: {}", dec_values[0]);

// Save/load models (compatible with C LIBSVM format)
save_model(Path::new("heart_scale.model"), &model).unwrap();
let loaded = load_model(Path::new("heart_scale.model")).unwrap();
```
### Cross-Validation

```rust
use libsvm_rs::svm_cross_validation;

let targets = svm_cross_validation(&problem, &param, 5); // 5-fold CV
let accuracy = targets
    .iter()
    .zip(problem.labels.iter())
    .filter(|(predicted, actual)| predicted == actual)
    .count() as f64
    / problem.labels.len() as f64;
println!("CV accuracy: {:.2}%", accuracy * 100.0);
```
### Probability Estimation

```rust
use libsvm_rs::predict_probability;

// Enable probability estimation during training
let mut param = SvmParameter::default();
param.probability = true;
param.kernel_type = KernelType::Rbf;
param.gamma = 1.0 / 13.0;

let model = svm_train(&problem, &param);

if let Some((label, probs)) = predict_probability(&model, &problem.instances[0]) {
    println!("label: {label}, class probabilities: {probs:?}");
}
```
## Extended Examples

For structured, runnable example suites see:

- `examples/README.md` — index of all examples
- `examples/basics/` — minimal starter examples
- `examples/api/` — persistence, CV/grid-search, Iris workflow
- `examples/integrations/` — prediction server + wasm inference integrations
- `examples/scientific/` — benchmark-heavy Rust-vs-C++ demos
## CLI Usage

The CLI tools accept the same flags as upstream LIBSVM:
### svm-train-rs

Illustrative invocations using upstream LIBSVM's flag syntax:

```bash
# Default C-SVC with RBF kernel
svm-train-rs data/heart_scale

# nu-SVC with linear kernel, 5-fold cross-validation
svm-train-rs -s 1 -t 0 -v 5 data/heart_scale

# epsilon-SVR with RBF, custom C and gamma
svm-train-rs -s 3 -c 10 -g 0.1 data/housing_scale

# With probability estimation
svm-train-rs -b 1 data/heart_scale

# With class weights for imbalanced data
svm-train-rs -w1 2 -w-1 1 data/heart_scale

# Quiet mode (suppress training progress)
svm-train-rs -q data/heart_scale
```
### svm-predict-rs

```bash
# Standard prediction
svm-predict-rs data/heart_scale heart_scale.model predictions.txt

# With probability estimates
svm-predict-rs -b 1 data/heart_scale heart_scale.model predictions.txt

# Quiet mode
svm-predict-rs -q data/heart_scale heart_scale.model predictions.txt
```
### svm-scale-rs

```bash
# Scale features to [0, 1]
svm-scale-rs -l 0 -u 1 data/heart_scale > heart_scale.01

# Scale to [-1, 1] (default)
svm-scale-rs data/heart_scale > heart_scale.scaled

# Save scaling parameters for later use on test data
svm-scale-rs -s scale.params train.libsvm > train.scaled
svm-scale-rs -r scale.params test.libsvm > test.scaled

# Scale specific feature range
svm-scale-rs -l -0.5 -u 0.5 data/heart_scale > heart_scale.custom
```
## Verification Pipeline

### Overview
Upstream parity is locked to LIBSVM v337 (December 2025) via reference/libsvm_upstream_lock.json. The differential verification suite builds the upstream C binary from a pinned Git commit, runs both implementations on identical datasets with identical parameters, and compares:
- Prediction labels: exact match for classification, tolerance-bounded for regression
- Decision values: relative and absolute tolerance checks
- Model structure: number of SVs, rho values, sv_coef values
- Probability outputs: probA/probB parameters, probability predictions
### Running Verification

```bash
# 1. Validate that the upstream lock file is consistent
# 2. Build the pinned upstream C reference binary and record provenance
#    (both steps use the helper scripts in scripts/)

# 3. Run differential verification
# Quick scope: 45 configs (canonical datasets × SVM types × kernel types)
python3 scripts/run_differential_suite.py

# Full scope: 250 configs (canonical + generated + tuned parameters)
DIFF_SCOPE=full python3 scripts/run_differential_suite.py

# Strict mode: disable the targeted SVR warning downgrade
DIFF_ENABLE_TARGETED_SVR_WARN=0 DIFF_SCOPE=full python3 scripts/run_differential_suite.py

# Sensitivity study: override the global non-probability relative tolerance
DIFF_NONPROB_REL_TOL=2e-5 DIFF_SCOPE=full python3 scripts/run_differential_suite.py

# 4. Run coverage gate (checks line and function coverage thresholds)
# 5. Run Rust-vs-C performance benchmarks, e.g. with
BENCH_WARMUP=3 BENCH_RUNS=30
```
### Understanding Differential Results

| Verdict | Meaning |
|---|---|
| `pass` | No parity issues detected under configured tolerances |
| `warn` | Non-fatal differences detected under explicit, documented policy rules |
| `fail` | Deterministic parity break — label mismatch or model divergence outside thresholds |
| `skip` | Configuration not executed (usually because training fails in both implementations) |
Current status (full scope, 250 configs): 236 pass, 4 warn, 0 fail, 10 skip.
The 4 warnings are:
- `housing_scale_s3_t2_tuned` — epsilon-SVR near-parity training drift (bounded, cross-predict verified)
- `gen_regression_sparse_scale_s4_t3_tuned` — probability header (probA) drift
- `gen_extreme_scale_scale_s0_t1_default` — rho-only near-equivalence drift
- `gen_extreme_scale_scale_s2_t1_default` — one-class near-boundary label drift
All 10 skips are `nu_svc` on synthetic `gen_binary_imbalanced.scale`, where both C and Rust fail training identically.
The active tolerance policy is differential-v3 (documented in reference/tolerance_policy.md).
### Targeted Warning Policy

The epsilon-SVR warning for `housing_scale_s3_t2_tuned` has an intentionally narrow guard:

- Applies to one case ID only
- Non-probability drift bounds must hold: `max_rel <= 6e-5`, `max_abs <= 6e-4`
- Model drift bounds must hold: `rho_rel <= 1e-5`, max sv_coef abs diff `<= 4e-3`
- Cross-predict parity must hold in both directions:
  - Rust predictor on C model matches C predictor on C model
  - C predictor on Rust model matches Rust predictor on Rust model
This confirms the drift comes from training numerics, not prediction logic.
### Verification Artifacts
| Category | Files |
|---|---|
| Lock & provenance | reference/libsvm_upstream_lock.json, reference/reference_provenance.json, reference/reference_build_report.md |
| Differential | reference/differential_results.json, reference/differential_report.md |
| Tolerance | reference/tolerance_policy.md |
| Coverage | reference/coverage_report.md |
| Performance | reference/benchmark_results.json, reference/benchmark_report.md |
| Datasets | reference/dataset_manifest.json, data/generated/ |
| Security | deny.toml, crates/libsvm/fuzz/README.md, CHANGELOG.md |
### Rust vs C++ Timing Figure
Global timing comparison figure (train + predict, per-case ratios, and ratio distributions):

Statistical summary companion:
examples/comparison_summary.json
To regenerate performance data with stronger statistical confidence before plotting:
```bash
BENCH_WARMUP=3 BENCH_RUNS=20
```
### Parity Claim
No hard differential failures under the documented default policy, with a small set of explicitly justified warnings. This is strong parity evidence, but not bitwise identity across all modes. Residual drift comes from training-side numerics (floating-point accumulation order, shrinking heuristic timing), not from prediction logic.
## Architecture

```text
crates/libsvm/src/
  lib.rs               — Module declarations and re-exports
  types.rs             — SvmNode, SvmProblem, SvmParameter, SvmModel, enums
  error.rs             — SvmError enum (thiserror)
  io.rs                — Problem/model file I/O (LIBSVM text format)
  kernel.rs            — Kernel evaluation (dot, powi, k_function, Kernel struct)
  cache.rs             — LRU kernel cache (Qfloat = f32)
  qmatrix.rs           — QMatrix trait + SvcQ, OneClassQ, SvrQ implementations
  solver.rs            — SMO solver (Standard + Nu variants, WSS3, shrinking)
  train.rs             — svm_train, solve dispatchers, class grouping
  predict.rs           — predict, predict_values, predict_probability
  probability.rs       — Platt scaling, multiclass probability, CV-based estimation
  cross_validation.rs  — svm_cross_validation (stratified + simple)
  metrics.rs           — accuracy_percentage, regression_metrics (MSE + R²)
  util.rs              — Shared helpers (group_classes, parse_feature_index, PRNG)

bins/
  svm-train-rs/    — CLI matching C svm-train (all flags including -wi class weights)
  svm-predict-rs/  — CLI matching C svm-predict (-b probability, -q quiet)
  svm-scale-rs/    — CLI matching C svm-scale (3-pass, save/restore params)
```
## Workspace Layout

The project uses a Cargo workspace:

- `crates/libsvm/` — the library crate (`libsvm-rs` on crates.io)
- `bins/svm-train-rs/`, `bins/svm-predict-rs/`, `bins/svm-scale-rs/` — CLI binaries
- `vendor/libsvm/` — upstream C source (for reference and vendored tests)
- `data/` — test datasets (heart_scale, iris.scale, housing_scale, generated synthetics)
- `reference/` — verification artifacts (differential results, provenance, benchmarks)
- `scripts/` — verification and testing scripts
## Module Details

### types.rs — Core Data Types

Types matching LIBSVM's `svm.h`:

| Type | Description | C++ equivalent |
|---|---|---|
| `SvmType` | Enum: CSvc, NuSvc, OneClass, EpsilonSvr, NuSvr | `svm_type` int constants |
| `KernelType` | Enum: Linear, Polynomial, Rbf, Sigmoid, Precomputed | `kernel_type` int constants |
| `SvmNode` | Sparse feature `{index: i32, value: f64}` | `struct svm_node` |
| `SvmProblem` | Training data: `labels: Vec<f64>`, `instances: Vec<Vec<SvmNode>>` | `struct svm_problem` |
| `SvmParameter` | All training parameters (defaults match LIBSVM) | `struct svm_parameter` |
| `SvmModel` | Trained model with SVs, coefficients, rho | `struct svm_model` |
Key design choice: SvmNode.index is i32 (not usize) to match LIBSVM's C int and preserve file format compatibility. Sentinel nodes (index == -1) used in C to terminate instance arrays are not stored — Rust's Vec::len() tracks instance length instead.
SvmParameter::default() produces the same defaults as LIBSVM: svm_type = CSvc, kernel_type = Rbf, degree = 3, gamma = 0 (auto-detected as 1/num_features), coef0 = 0, c = 1, nu = 0.5, eps = 0.001, cache_size = 100 MB, shrinking = true.
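A small construction example (assuming the crate's top-level re-exports and the public fields listed above):

```rust
use libsvm_rs::{SvmNode, SvmProblem};

// The sparse instance "1:0.5 3:-1.2": indices are 1-based and ascending;
// absent features are implicitly zero, and no -1 sentinel node is stored.
let x = vec![
    SvmNode { index: 1, value: 0.5 },
    SvmNode { index: 3, value: -1.2 },
];

// A two-instance binary problem.
let problem = SvmProblem {
    labels: vec![1.0, -1.0],
    instances: vec![x, vec![SvmNode { index: 2, value: 0.9 }]],
};
assert_eq!(problem.instances[0].len(), 2); // Vec::len() replaces the C sentinel
```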
### kernel.rs — Kernel Evaluation

Two evaluation paths exist because the training kernel needs swappable data references for shrinking, while prediction only needs stateless evaluation:

- `k_function(x, y, param)` — Standalone stateless evaluation for prediction. Operates on sparse `&[SvmNode]` slices. Handles all 5 kernel types including precomputed (looks up `x[0].value` as the precomputed kernel value).
- `Kernel` struct — Training evaluator. Stores `Vec<&'a [SvmNode]>` (borrowed slices into the original problem data) so the solver can swap data point references during shrinking without copying data. The C++ achieves the same via `const_cast` on a `const svm_node **x` pointer array.
Supported kernels:
| Type | Formula |
|---|---|
| Linear | dot(x, y) |
| Polynomial | (γ·dot(x,y) + coef0)^degree |
| RBF | exp(-γ·‖x-y‖²) |
| Sigmoid | tanh(γ·dot(x,y) + coef0) |
| Precomputed | x[j].value (user supplies full kernel matrix) |
RBF optimization: `x_square[i] = dot(x[i], x[i])` is precomputed once during `Kernel::new()`, so the RBF kernel becomes:

```
K(i,j) = exp(-γ * (x_sq[i] + x_sq[j] - 2·dot(x[i], x[j])))
```
This avoids recomputing ‖x_i‖² on every kernel evaluation — a significant win since the kernel is evaluated millions of times during training.
Sparse dot product: The dot() function walks two sparse vectors simultaneously using a merge-join pattern on sorted indices, skipping non-overlapping features in O(nnz_x + nnz_y) time.
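A minimal standalone sketch of that merge-join (the crate's `dot()` follows the same pattern; `SvmNode` is redeclared here to keep the snippet self-contained):

```rust
struct SvmNode {
    index: i32,
    value: f64,
}

// Merge-join dot product over two index-sorted sparse vectors.
// Each step advances at least one cursor, so it runs in O(nnz_x + nnz_y).
fn dot(x: &[SvmNode], y: &[SvmNode]) -> f64 {
    let (mut i, mut j, mut sum) = (0, 0, 0.0);
    while i < x.len() && j < y.len() {
        match x[i].index.cmp(&y[j].index) {
            std::cmp::Ordering::Equal => {
                sum += x[i].value * y[j].value;
                i += 1;
                j += 1;
            }
            std::cmp::Ordering::Less => i += 1,    // feature only in x: contributes 0
            std::cmp::Ordering::Greater => j += 1, // feature only in y: contributes 0
        }
    }
    sum
}
```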
### cache.rs — LRU Kernel Cache

LRU cache storing rows of the Q matrix as Qfloat (f32) values. Using f32 instead of f64 halves memory usage while providing sufficient precision for the cached kernel values (the solver accumulates gradients in f64).

Data structure: Index-based circular doubly-linked list (no unsafe). Each training instance gets a cache node. Node l (where l = number of instances) is the sentinel head of the LRU list. Nodes not currently in the LRU list have prev == NONE (usize::MAX).

Cache operations:

- `get_data(index, len)`: Returns `(&mut [Qfloat], start)` where `start` is how many elements were already cached. The caller (QMatrix) fills `data[start..len]` with fresh kernel evaluations. If the row isn't cached, allocates space (evicting LRU rows if needed) and returns `start = 0`.
- `swap_index(i, j)`: Critical for solver shrinking. Three steps:
  1. Swap row data and LRU positions for rows i and j
  2. Iterate all cached rows and swap column entries at positions i,j
  3. If a row covers the lower index but not the higher, evict it ("give up")
The column-swap loop in step 2 is essential for correctness — without it, the solver reads stale kernel values after shrinking swaps indices. This was a subtle bug during development that caused silent numerical drift.
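The caller-side contract of `get_data` can be mocked like this (illustrative single-row stand-in; the real cache manages many rows with LRU eviction):

```rust
type Qfloat = f32;

// Single-row stand-in for the documented get_data contract.
struct MockCache {
    row: Vec<Qfloat>,
    cached_len: usize,
}

impl MockCache {
    // Returns the row buffer plus `start`, the count of already-valid
    // leading entries; the caller fills data[start..len].
    fn get_data(&mut self, len: usize) -> (&mut [Qfloat], usize) {
        let start = self.cached_len.min(len);
        if self.row.len() < len {
            self.row.resize(len, 0.0);
        }
        self.cached_len = self.cached_len.max(len);
        (&mut self.row[..len], start)
    }
}

fn main() {
    let mut cache = MockCache { row: Vec::new(), cached_len: 0 };
    let kernel = |j: usize| j as Qfloat * 0.5; // stand-in kernel values

    let (data, start) = cache.get_data(4);
    for j in start..4 {
        data[j] = kernel(j); // only the uncached tail is recomputed
    }
    let (_, start) = cache.get_data(4);
    assert_eq!(start, 4); // second request: everything already cached
}
```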
### qmatrix.rs — Q Matrix Implementations

The `QMatrix` trait bridges the Kernel and Solver, abstracting how kernel values are computed and cached for different SVM formulations.

Why `&mut self` on `get_q`: The C++ uses const methods with mutable cache fields. Rust doesn't allow this pattern with `&self` — the cache needs mutation. The Solver therefore owns `Box<dyn QMatrix>` and copies Q row data into its own buffers before the borrow ends, avoiding lifetime conflicts.
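In trait form, the bridge looks roughly like this (method names follow the C++ `QMatrix`; the crate's exact signatures may differ):

```rust
type Qfloat = f32;

// Hedged sketch of the QMatrix abstraction described above.
trait QMatrix {
    /// Q-matrix row i over the first `len` active columns.
    /// Takes `&mut self` because serving a row may touch the LRU cache.
    fn get_q(&mut self, i: usize, len: usize) -> &[Qfloat];
    /// Diagonal entries Q[i][i], kept separately for working-set selection.
    fn get_qd(&self) -> &[f64];
    /// Keep cached rows and columns consistent with solver shrinking swaps.
    fn swap_index(&mut self, i: usize, j: usize);
}
```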
#### SvcQ — Classification Q Matrix

For C-SVC and nu-SVC:

- `Q[i][j] = y[i] * y[j] * K(x[i], x[j])` stored as `Qfloat` (f32)
- `QD[i] = K(x[i], x[i])` — always 1.0 for RBF kernel
- `swap_index` swaps cache rows, kernel data references, label signs, and diagonal entries
#### OneClassQ — One-Class Q Matrix

For one-class SVM:

- `Q[i][j] = K(x[i], x[j])` — no label scaling (all samples are treated as positive)
- Same cache and swap logic as SvcQ, minus the label array
#### SvrQ — Regression Q Matrix

For epsilon-SVR and nu-SVR. The dual formulation has 2l variables (l = number of training samples):

- Variables `0..l` correspond to `α_i` with `sign[k] = +1`, `index[k] = k`
- Variables `l..2l` correspond to `α*_i` with `sign[k+l] = -1`, `index[k+l] = k`
The kernel cache stores only l rows of actual kernel values (since K(x[i], x[j]) doesn't depend on whether we're looking at α or α*). get_q fetches the real kernel row, then reorders and applies signs:
```
buf[j] = sign[i] * sign[j] * data[index[j]]
```
Double buffer: Two buffers are needed because the solver requests Q_i and Q_j simultaneously during the alpha update step. If both wrote to the same buffer, the second get_q call would overwrite the first. swap_index only swaps sign, index, and qd — the kernel cache maps to real data indices via index[].
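A self-contained sketch of that reordering (simplified; in the crate the raw row comes from the kernel cache):

```rust
type Qfloat = f32;

// Reorder a cached kernel row of the l real samples into the 2l-variable
// SVR layout: buf[j] = sign[i] * sign[j] * K(x[index[i]], x[index[j]]).
fn svr_q_row(buf: &mut [Qfloat], data: &[Qfloat], sign: &[i8], index: &[usize], i: usize) {
    for j in 0..buf.len() {
        buf[j] = (sign[i] * sign[j]) as Qfloat * data[index[j]];
    }
}

fn main() {
    // l = 2 samples, so 2l = 4 dual variables; `data` is the kernel row
    // for the real sample behind variable i = 0.
    let sign = [1i8, 1, -1, -1];
    let index = [0usize, 1, 0, 1];
    let data = [1.0f32, 0.25]; // K(x_0, x_0), K(x_0, x_1)
    let mut buf = [0.0f32; 4];
    svr_q_row(&mut buf, &data, &sign, &index, 0);
    assert_eq!(buf, [1.0, 0.25, -1.0, -0.25]);
}
```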
### solver.rs — SMO Solver
The computational core of the library, translating Solver and Solver_NU from svm.cpp (lines 362–1265, ~900 lines of C++).
Solver variant: `SolverVariant::Standard` vs `SolverVariant::Nu`. 95% of the code is shared — only 4 methods differ:

| Method | Standard | Nu |
|---|---|---|
| `select_working_set` | Single I_up/I_low partition | Separate pos/neg class selection |
| `calculate_rho` | Average over free variables | (r1 - r2) / 2 from class-separated free vars |
| `do_shrinking` | Single Gmax bound | Separate pos/neg Gmax bounds |
| `be_shrunk` | Check against global bound | Check against class-specific bound |
Key constants:

| Constant | Value | Purpose |
|---|---|---|
| `TAU` | 1e-12 | Floor for quad_coef to prevent division by zero in WSS |
| `INF` | `f64::INFINITY` | Initial gradient bounds |
| `EPS` | From parameter | KKT violation tolerance (default 0.001) |
#### Initialization

- Copy input arrays (`alpha`, `y`, `p`) into owned vectors
- Compute `alpha_status` for each variable: `LowerBound` (α=0), `UpperBound` (α=C), or `Free`
- Initialize gradient: `G[i] = p[i] + Σ_j(alpha[j] * Q[i][j])` for all non-lower-bound j
- Initialize `G_bar`: `G_bar[i] = Σ_j(C_j * Q[i][j])` for all upper-bound j
G_bar accelerates gradient reconstruction during unshrinking — when a variable transitions from upper-bound to free, G_bar provides the full gradient contribution without re-scanning all variables.
#### Main Loop

```text
while iter < max_iter:
    if counter expires: do_shrinking()
    (i, j) = select_working_set()      // returns None if optimal
    if None:
        reconstruct_gradient()         // unshrink all variables
        active_size = l                // restore full problem
        (i, j) = select_working_set()  // retry with all variables
        if None: break                 // truly optimal
    update_alpha_pair(i, j)            // analytic 2-variable solve
```
- `max_iter`: `max(10_000_000, 100*l)` — same as C++
- Shrink counter: `min(l, 1000)` iterations between shrinking passes
#### Working Set Selection (WSS3)

Implements the WSS3 heuristic from Fan, Chen, and Lin (2005).

Standard variant — two-pass selection:

1. Find i: maximizes `-y_i * G[i]` among I_up (variables that can increase)
2. Find j: among I_low (variables that can decrease), minimizes the second-order approximation of objective decrease: `-(grad_diff²) / quad_coef` where `quad_coef = QD[i] + QD[j] - 2·y_i·y_j·Q_ij`
Nu variant: Partitions variables into positive and negative classes. Finds i_p/j_p (best positive pair) and i_n/j_n (best negative pair), then selects the pair with larger violation.
Returns None when the maximum KKT violation Gmax_i + Gmax_j < eps, indicating optimality.
#### Alpha Update

Analytic solution of the 2-variable sub-problem with box constraints [0, C]:

- Different labels (`y[i] ≠ y[j]`): constraint line `alpha[i] - alpha[j] = constant`
- Same labels (`y[i] = y[j]`): constraint line `alpha[i] + alpha[j] = constant`
Both branches clip alpha to the feasible box and update gradients for all active variables:
```
G[k] += Q_i[k] * Δα_i + Q_j[k] * Δα_j    // for all k in the active set
```
G_bar is updated incrementally when alpha_status changes (transition to/from upper bound).
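Spelled out, the unclipped Newton step on each branch is (standard LIBSVM algebra; both alphas are then clipped to the box and Δα propagated as above):

```text
y_i ≠ y_j:   quad_coef = QD[i] + QD[j] + 2·Q_ij      (floored to TAU if ≤ 0)
             δ = (-G[i] - G[j]) / quad_coef
             α_i += δ,  α_j += δ

y_i = y_j:   quad_coef = QD[i] + QD[j] - 2·Q_ij      (floored to TAU if ≤ 0)
             δ = (G[i] - G[j]) / quad_coef
             α_i -= δ,  α_j += δ
```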
#### Shrinking

Variables firmly at their bounds that are unlikely to change in subsequent iterations get "shrunk" — swapped to the end of the active set so they're excluded from WSS and gradient updates:

- `be_shrunk(i)` checks if `-y_i * G[i]` exceeds the current violation bound by a sufficient margin
- Active-from-back swap finds the first non-shrinkable variable from the end
Unshrink trigger: When Gmax1 + Gmax2 <= eps * 10 (approaching convergence), the solver reconstructs the full gradient for all variables and resets active_size = l. This ensures the final optimality check considers all variables.
Gradient reconstruction: Uses G_bar to efficiently compute the full gradient contribution from upper-bound variables without re-evaluating kernel values.
### train.rs — Training Pipeline

Orchestrates the solver for all SVM types.

#### Solve Dispatchers
| Function | SVM Type | Solver Variant | Key Setup |
|---|---|---|---|
| `solve_c_svc` | C-SVC | Standard | `p[i] = -1` for all i; after solving, `alpha[i] *= y[i]` to get signed coefficients |
| `solve_nu_svc` | nu-SVC | Nu | Alpha initialized proportionally from nu budget; after solving, rescale by 1/r |
| `solve_one_class` | One-class | Standard | `alpha = [1,1,...,frac,0,...,0]` where `nu*l` alphas are set to 1.0 |
| `solve_epsilon_svr` | epsilon-SVR | Standard | 2l variables; `p[i] = ε - y_i`, `p[i+l] = ε + y_i` |
| `solve_nu_svr` | nu-SVR | Nu | 2l variables; alpha initialized uniformly at `C·nu·l / 2` |
Each dispatcher sets up the dual problem (initial alpha values, gradient vector p, bounds C_i), calls Solver::solve(), then extracts the solution (nonzero alphas become support vector coefficients, rho becomes the bias).
#### Class Grouping

`svm_group_classes` builds:

- `label[]`: unique class labels in first-occurrence order (with -1/+1 swap to match C++ convention)
- `count[]`: number of samples per class
- `start[]`: cumulative start index for each class in the permuted array
- `perm[]`: permutation mapping grouped indices back to original dataset indices
This grouping enables efficient one-vs-one pair extraction without copying data.
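Worked example (by hand, from the definitions above):

```text
labels = [2, 1, 2, 3, 1]        original order

label  = [2, 1, 3]              first-occurrence order
count  = [2, 2, 1]              samples per class
start  = [0, 2, 4]              class offsets in the grouped array
perm   = [0, 2, 1, 4, 3]        grouped position -> original index
```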
#### svm_train — Main Entry Point

Two branches:

1. Regression / one-class: Single call to `svm_train_one()`, extract nonzero alphas as support vectors.
2. Classification (one-vs-one):
   1. Group classes via `svm_group_classes`
   2. Compute weighted C per class (base C × class weight)
   3. For each of k*(k-1)/2 class pairs (i,j):
      - Build sub-problem with only classes i and j
      - Train binary classifier
      - Mark nonzero support vectors
   4. Assemble `sv_coef` matrix: `sv_coef[j-1][nz_start[i]..] = class_i coefficients`
Gamma auto-detection: If gamma == 0 (default), sets gamma = 1/max_feature_index where max_feature_index is the highest feature index seen in the training data.
Probability training: When `param.probability == true`, training invokes cross-validation internally:

- Binary classification: 5-fold CV to collect decision values, then fits a sigmoid (Platt scaling) via `sigmoid_train`
- Multiclass: pairwise probability estimation via `svm_binary_svc_probability`
- Regression: 5-fold CV to estimate sigma for Laplace correction
- One-class: 5-fold CV to estimate the positive density fraction
### predict.rs — Prediction

#### predict_values

Core prediction function for all SVM types:

- Classification (C-SVC, nu-SVC): Computes k*(k-1)/2 pairwise decision values `dec_value[p] = Σ sv_coef · K(x, sv) - rho[p]`, then runs one-vs-one voting. Returns the class with the most votes (ties broken by label ordering; see the voting sketch below).
- Regression (epsilon-SVR, nu-SVR): `prediction = Σ sv_coef[i] * K(x, sv[i]) - rho[0]`
- One-class: Returns `+1` if `Σ sv_coef[i] * K(x, sv[i]) - rho[0] > 0`, else `-1`.
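A minimal standalone sketch of the one-vs-one vote (the decision values in `main` are invented):

```rust
// One-vs-one voting over k*(k-1)/2 pairwise decision values,
// enumerated in pair order (0,1), (0,2), ..., (k-2,k-1).
fn ovo_vote(k: usize, dec_values: &[f64]) -> usize {
    let mut votes = vec![0usize; k];
    let mut p = 0;
    for i in 0..k {
        for j in (i + 1)..k {
            if dec_values[p] > 0.0 { votes[i] += 1; } else { votes[j] += 1; }
            p += 1;
        }
    }
    // First maximum wins: "ties broken by label ordering".
    let mut winner = 0;
    for c in 1..k {
        if votes[c] > votes[winner] { winner = c; }
    }
    winner
}

fn main() {
    // k = 3 classes, pairs (0,1), (0,2), (1,2): votes come out 1-1-1,
    // so the tie resolves to the first class in label order.
    assert_eq!(ovo_vote(3, &[0.7, -0.4, 1.1]), 0);
}
```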
#### predict_probability

Wraps `predict_values` with probability calibration:

- Binary classification: Applies Platt sigmoid `P(y=1|f) = 1 / (1 + exp(A·f + B))` using probA/probB from the model.
- Multiclass: Computes pairwise probabilities via Platt sigmoid, then solves the coupled probability system via `multiclass_probability()` (iterative optimization matching LIBSVM's algorithm).
- Regression: Returns Laplace-corrected probability `P = 1 / (2·sigma) · exp(-|y-f|/sigma)`.
- One-class: Returns a density-based probability mark.
### probability.rs — Probability Estimation

#### sigmoid_train / sigmoid_predict
Fits a sigmoid P(y=1|f) = 1 / (1 + exp(A·f + B)) to decision values using the algorithm from Platt (2000) with improvements from Lin, Lin, and Weng (2007). Uses Newton's method with backtracking line search to minimize the negative log-likelihood.
#### multiclass_probability

Given k*(k-1)/2 pairwise probabilities r[i][j], solves for class probabilities p[1]..p[k] that satisfy:

```
p[i] = Σ_{j≠i} r[i][j] · p[j] / (r[i][j] · p[j] + r[j][i] · p[i])
```
Uses an iterative method (100 max iterations, convergence at 1e-5) matching LIBSVM's multiclass_probability().
#### svm_binary_svc_probability
Runs internal 5-fold cross-validation, collects decision values for all training instances, then calls sigmoid_train to fit A and B parameters. Uses a deterministic PRNG seeded from LIBSVM's C rand() equivalent for fold shuffling.
### cross_validation.rs — Cross-Validation

`svm_cross_validation(problem, param, nr_fold)` splits data, trains on k-1 folds, predicts on the held-out fold:
- Classification: Stratified folding — each fold has approximately the same class proportions as the full dataset. Uses the same deterministic shuffling as C LIBSVM.
- Regression / one-class: Simple sequential splitting.
Returns a Vec<f64> of predicted values aligned with the original problem ordering, so the caller can compute accuracy (classification) or MSE (regression).
### io.rs — File I/O

#### Problem Files

LIBSVM sparse format: `label index:value index:value ...` per line. Features must have ascending indices. Missing features are implicitly zero (sparse representation).

For precomputed kernels: `label 0:sample_id 1:K(x,x1) 2:K(x,x2) ...` where index 0 holds the 1-based sample identifier.
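Two illustrative lines (values invented for the example):

```text
# ordinary sparse instance: label +1, features 3 and 7 set, all others implicitly 0
+1 3:0.5 7:-1.2

# precomputed-kernel row for sample 2 of 3: index 0 carries the 1-based sample id
-1 0:2 1:0.80 2:1.00 3:0.53
```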
#### Model Files

Two sections:

1. Header: Key-value pairs (`svm_type`, `kernel_type`, `nr_class`, `total_sv`, `rho`, `label`, `nr_sv`, optionally `probA`, `probB`)
2. SV section: After the `SV` marker, one line per support vector: `coef1 [coef2 ...] index:value index:value ...`
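An abbreviated example of the resulting shape (illustrative values, SV lines truncated):

```text
svm_type c_svc
kernel_type rbf
gamma 0.076923076923076927
nr_class 2
total_sv 132
rho 0.42438151008632154
label 1 -1
nr_sv 64 68
SV
1 1:0.70833000000000002 2:1 5:-0.33333000000000002
0.5120938 1:-0.54166999999999998 2:-1 4:1
```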
Float formatting uses %.17g-equivalent precision for model files (ensuring round-trip fidelity) and %g-equivalent for problem files. The Gfmt struct replicates C's %g format: dynamically chooses fixed vs scientific notation, strips trailing zeros, handles edge cases like -0 formatting.
## Design Decisions

| Decision | Rationale |
|---|---|
| `SvmNode.index` is `i32` | Matches C `int` for format compatibility; sentinel -1 not stored |
| `Qfloat = f32` | Matches original, halves cache memory; gradients accumulated in f64 |
| Ownership replaces manual memory | No `free_sv` flag; `SvmModel` owns its support vectors |
| `Kernel.x` is `Vec<&[SvmNode]>` | Swappable references during shrinking without data copies |
| Solver uses enum variant, not traits | 95% shared code; cleaner than trait objects for 4 method overrides |
| `QMatrix::get_q` takes `&mut self` | Replaces C++ const method + mutable cache fields |
| `%.17g` float formatting | Bit-exact model file compatibility with C LIBSVM |
| Minimal runtime deps | Only `thiserror`; `rayon` feature-gated for opt-in parallelism |
| Manual CLI arg parsing | No `clap` dependency; handles `-wi` composite flags matching C behavior |
| Deterministic PRNG | Replicates C's `rand()` for identical fold shuffling in CV/probability |
## Numerical Equivalence Notes

- Gradient accumulation: Uses f64 for gradients (matching C++), f32 for cached Q values
- Quad_coef floor: `TAU = 1e-12` prevents division by zero in working set selection
- Alpha comparison: `alpha[i].abs() > 0.0` for SV detection (exact zero check, matching C++)
- Class label cast: `labels[i] as i32` matches C's `(int)prob->y[i]` truncation behavior
- Float formatting: `Gfmt` struct produces identical output to C's `printf("%g", x)` for all tested values
- Target: numerical equivalence within ~1e-8 relative tolerance, not bitwise identity
- Known sources of drift: floating-point accumulation order differences between compilers, shrinking heuristic timing, and gradient reconstruction precision
## Test Coverage
| Category | Tests | Description |
|---|---|---|
| Cache | 7 | LRU eviction, extend, swap with column updates |
| Kernel | 8 | All kernel types, struct/standalone agreement, sparse dot product |
| QMatrix | 4 | SvcQ sign/symmetry, OneClassQ, SvrQ double buffer |
| I/O | 8 | Problem parsing, model roundtrip, C format compatibility |
| Types | 8 | Parameter validation, nu-SVC feasibility checks |
| Predict | 3 | Heart_scale accuracy, C svm-predict output comparison |
| Train | 6 | C-SVC, multiclass, nu-SVC, one-class, epsilon-SVR, nu-SVR |
| Probability | 6 | Sigmoid fitting, binary/multiclass/regression probability |
| Cross-validation | 5 | Stratified CV, classification accuracy, regression MSE |
| Property | 7 | Proptest-based invariant checks (random params/data) |
| Metrics | 6 | Regression MSE/R², classification accuracy, edge cases |
| Util | 10 | group_classes, parse_feature_index, shuffle_range |
| CLI integration | 21 | Train/predict/scale end-to-end, flag permutation fuzzing |
| Differential | 250 | Full Rust-vs-C comparison matrix (via external suite) |
| Unit/integration | ~124 | Approximate total of in-repo unit and integration tests |
Coverage metrics: 93.19% line coverage, 92.86% function coverage (library crate).
## Dependencies

- Runtime: `thiserror` (error derive macros)
- Optional: `rayon` (feature-gated, for future parallel cross-validation)
- Dev: `float-cmp` (approximate float comparison), `criterion` (benchmarks), `proptest` (property-based testing)
## Known Limitations
- Not bitwise identical: Numerical results match within ~1e-8 but are not bit-for-bit identical to C LIBSVM due to floating-point accumulation order differences.
- No GPU support: All computation is CPU-based.
- No incremental/online learning: Full retraining required for new data (same as upstream LIBSVM).
- Precomputed kernels require full matrix: The full n×n kernel matrix must be provided in memory.
## Contributing

1. Fork the repository
2. Create a feature branch
3. Run the test suite: `cargo test --workspace`
4. Run the differential suite to verify parity: `python3 scripts/run_differential_suite.py`
5. Submit a pull request
## License
BSD-3-Clause, same family as original LIBSVM. See LICENSE.
## References
- Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27.
- Platt, J.C. (2000). Probabilities for SV machines. In Advances in Large Margin Classifiers, MIT Press.
- Lin, H.-T., Lin, C.-J., and Weng, R.C. (2007). A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267–276.
- Fan, R.-E., Chen, P.-H., and Lin, C.-J. (2005). Working set selection using second order information for training support vector machines. JMLR, 6:1889–1918.