libsvm-rs 0.7.0

Pure Rust reimplementation of LIBSVM — SVM training and prediction

libsvm-rs is a pure Rust implementation of LIBSVM (v337, December 2025) focused on scientific reproducibility and upstream parity. It preserves LIBSVM model format and CLI behavior, ships reproducible Rust-vs-C/C++ comparison pipelines (including figures and benchmark summaries), and targets numerical equivalence (typically around 1e-8 tolerance) rather than bitwise identity against the upstream reference implementation.

Features

  • All 5 SVM types: C-SVC, nu-SVC, one-class SVM, epsilon-SVR, nu-SVR (-s 0..4)
  • All 5 kernel types: linear, polynomial, RBF, sigmoid, precomputed (-t 0..4)
  • Model file compatibility: reads and writes LIBSVM text format with %.17g precision, so models trained with the C library can be loaded in Rust and vice versa
  • Probability estimation (-b 1): Platt scaling for binary classification, pairwise coupling for multiclass, Laplace-corrected predictions for regression, density-based marks for one-class
  • Cross-validation: stratified k-fold for classification (preserves class proportions), simple k-fold for regression/one-class
  • CLI tools: svm-train-rs, svm-predict-rs, svm-scale-rs — drop-in replacements matching upstream flag syntax
  • Minimal dependencies: zero runtime deps beyond thiserror; rayon is feature-gated for future parallel cross-validation
  • Verification pipeline: upstream parity locked to LIBSVM v337 with 250-configuration differential testing across all SVM types, kernel types, datasets, and parameter variations

Installation

As a library

[dependencies]
libsvm-rs = "0.7"

CLI tools

# Clone and build from source (CLIs are workspace members, not published separately)
git clone https://github.com/ricardofrantz/libsvm-rs.git
cd libsvm-rs
cargo build --release -p svm-train-rs -p svm-predict-rs -p svm-scale-rs
# Binaries are in target/release/

Quick Start

Library API

use libsvm_rs::io::{load_problem, save_model, load_model};
use libsvm_rs::predict::{predict, predict_values};
use libsvm_rs::train::svm_train;
use libsvm_rs::{KernelType, SvmParameter, SvmType};
use std::path::Path;

// Load training data in LIBSVM sparse format
let problem = load_problem(Path::new("data/heart_scale")).unwrap();

// Configure parameters (defaults match LIBSVM)
let mut param = SvmParameter::default();
param.svm_type = SvmType::CSvc;
param.kernel_type = KernelType::Rbf;
param.gamma = 1.0 / 13.0;  // 1/num_features
param.c = 1.0;

// Train
let model = svm_train(&problem, &param);

// Predict a single instance
let label = predict(&model, &problem.instances[0]);
println!("predicted label: {label}");

// Get decision values (useful for ranking or custom thresholds)
let mut dec_values = vec![0.0f64; model.nr_class * (model.nr_class - 1) / 2];
let label = predict_values(&model, &problem.instances[0], &mut dec_values);
println!("decision values: {dec_values:?}");

// Save/load models (compatible with C LIBSVM format)
save_model(Path::new("heart_scale.model"), &model).unwrap();
let loaded = load_model(Path::new("heart_scale.model")).unwrap();

Cross-Validation

use libsvm_rs::cross_validation::svm_cross_validation;

let targets = svm_cross_validation(&problem, &param, 5);  // 5-fold CV
let accuracy = targets.iter().zip(&problem.labels)
    .filter(|(pred, actual)| (*pred - *actual).abs() < 1e-10)
    .count() as f64 / problem.labels.len() as f64;
println!("CV accuracy: {:.2}%", accuracy * 100.0);

Probability Estimation

use libsvm_rs::predict::predict_probability;

// Enable probability estimation during training
let mut param = SvmParameter::default();
param.probability = true;
param.kernel_type = KernelType::Rbf;
param.gamma = 1.0 / 13.0;

let model = svm_train(&problem, &param);
if let Some((label, probs)) = predict_probability(&model, &problem.instances[0]) {
    println!("label: {label}, class probabilities: {probs:?}");
}

Extended Examples

For structured, runnable example suites see:

  • examples/README.md — index of all examples
  • examples/basics/ — minimal starter examples
  • examples/api/ — persistence, CV/grid-search, Iris workflow
  • examples/integrations/ — prediction server + wasm inference integrations
  • examples/scientific/ — benchmark-heavy Rust-vs-C++ demos

CLI Usage

The CLI tools accept the same flags as upstream LIBSVM:

svm-train-rs

# Default C-SVC with RBF kernel
svm-train-rs data/heart_scale

# nu-SVC with linear kernel, 5-fold cross-validation
svm-train-rs -s 1 -t 0 -v 5 data/heart_scale

# epsilon-SVR with RBF, custom C and gamma
svm-train-rs -s 3 -t 2 -c 10 -g 0.1 data/heart_scale

# With probability estimation
svm-train-rs -b 1 data/heart_scale

# With class weights for imbalanced data
svm-train-rs -w1 2.0 -w-1 0.5 data/heart_scale

# Quiet mode (suppress training progress)
svm-train-rs -q data/heart_scale

svm-predict-rs

# Standard prediction
svm-predict-rs test_data model_file output_file

# With probability estimates
svm-predict-rs -b 1 test_data model_file output_file

# Quiet mode
svm-predict-rs -q test_data model_file output_file

svm-scale-rs

# Scale features to [0, 1]
svm-scale-rs -l 0 -u 1 data/heart_scale > scaled.txt

# Scale to [-1, 1] (default)
svm-scale-rs data/heart_scale > scaled.txt

# Save scaling parameters for later use on test data
svm-scale-rs -s range.txt data/heart_scale > train_scaled.txt
svm-scale-rs -r range.txt data/test_data > test_scaled.txt

# Scale specific feature range
svm-scale-rs -l 0 -u 1 -y -1 1 data/heart_scale > scaled.txt

Verification Pipeline

Overview

Upstream parity is locked to LIBSVM v337 (December 2025) via reference/libsvm_upstream_lock.json. The differential verification suite builds the upstream C binary from a pinned Git commit, runs both implementations on identical datasets with identical parameters, and compares:

  • Prediction labels: exact match for classification, tolerance-bounded for regression
  • Decision values: relative and absolute tolerance checks
  • Model structure: number of SVs, rho values, sv_coef values
  • Probability outputs: probA/probB parameters, probability predictions

Running Verification

# 1. Validate that the upstream lock file is consistent
bash scripts/check_libsvm_reference_lock.sh

# 2. Build the pinned upstream C reference binary and record provenance
bash scripts/setup_reference_libsvm.sh

# 3. Run differential verification
#    Quick scope: 45 configs (canonical datasets × SVM types × kernel types)
python3 scripts/run_differential_suite.py

#    Full scope: 250 configs (canonical + generated + tuned parameters)
DIFF_SCOPE=full python3 scripts/run_differential_suite.py

#    Strict mode: disable the targeted SVR warning downgrade
DIFF_ENABLE_TARGETED_SVR_WARN=0 DIFF_SCOPE=full python3 scripts/run_differential_suite.py

#    Sensitivity study: override the global non-probability relative tolerance
DIFF_NONPROB_REL_TOL=2e-5 DIFF_SCOPE=full python3 scripts/run_differential_suite.py

# 4. Run coverage gate (checks line and function coverage thresholds)
bash scripts/check_coverage_thresholds.sh

# 5. Run Rust-vs-C performance benchmarks
BENCH_WARMUP=3 BENCH_RUNS=30 python3 scripts/benchmark_compare.py

Understanding Differential Results

Each configuration receives one of four verdicts:

  • pass: No parity issues detected under configured tolerances
  • warn: Non-fatal differences detected under explicit, documented policy rules
  • fail: Deterministic parity break — label mismatch or model divergence outside thresholds
  • skip: Configuration not executed (usually because training fails in both implementations)

Current status (full scope, 250 configs): 236 pass, 4 warn, 0 fail, 10 skip.

The 4 warnings are:

  1. housing_scale_s3_t2_tuned — epsilon-SVR near-parity training drift (bounded, cross-predict verified)
  2. gen_regression_sparse_scale_s4_t3_tuned — probability header (probA) drift
  3. gen_extreme_scale_scale_s0_t1_default — rho-only near-equivalence drift
  4. gen_extreme_scale_scale_s2_t1_default — one-class near-boundary label drift

All 10 skips are nu_svc on synthetic gen_binary_imbalanced.scale where both C and Rust fail training identically.

The active tolerance policy is differential-v3 (documented in reference/tolerance_policy.md).

Targeted Warning Policy

The epsilon-SVR warning for housing_scale_s3_t2_tuned has an intentionally narrow guard:

  • Applies to one case ID only
  • Non-probability drift bounds must hold: max_rel <= 6e-5, max_abs <= 6e-4
  • Model drift bounds must hold: rho_rel <= 1e-5, max sv_coef abs diff <= 4e-3
  • Cross-predict parity must hold in both directions:
    • Rust predictor on C model matches C predictor on C model
    • C predictor on Rust model matches Rust predictor on Rust model

This confirms the drift comes from training numerics, not prediction logic.

Verification Artifacts

  • Lock & provenance: reference/libsvm_upstream_lock.json, reference/reference_provenance.json, reference/reference_build_report.md
  • Differential: reference/differential_results.json, reference/differential_report.md
  • Tolerance: reference/tolerance_policy.md
  • Coverage: reference/coverage_report.md
  • Performance: reference/benchmark_results.json, reference/benchmark_report.md
  • Datasets: reference/dataset_manifest.json, data/generated/
  • Security: SECURITY_AUDIT.md

Rust vs C++ Timing Figure

Global timing comparison figure (train + predict, per-case ratios, and ratio distributions):

[Figure: Rust vs C++ timing comparison (examples/comparison.png)]

Statistical summary companion:

  • examples/comparison_summary.json

To regenerate performance data with stronger statistical confidence before plotting:

BENCH_WARMUP=3 BENCH_RUNS=20 python3 scripts/benchmark_compare.py
python3 examples/common/make_comparison_figure.py --root . --out examples/comparison.png --summary examples/comparison_summary.json

Parity Claim

The suite reports no hard differential failures under the documented default policy, plus a small set of explicitly justified warnings. This is strong parity evidence, but not bitwise identity across all modes. Residual drift comes from training-side numerics (floating-point accumulation order, shrinking heuristic timing), not from prediction logic.

Architecture

crates/libsvm/src/
  lib.rs              — Module declarations and re-exports
  types.rs            — SvmNode, SvmProblem, SvmParameter, SvmModel, enums
  error.rs            — SvmError enum (thiserror)
  io.rs               — Problem/model file I/O (LIBSVM text format)
  kernel.rs           — Kernel evaluation (dot, powi, k_function, Kernel struct)
  cache.rs            — LRU kernel cache (Qfloat = f32)
  qmatrix.rs          — QMatrix trait + SvcQ, OneClassQ, SvrQ implementations
  solver.rs           — SMO solver (Standard + Nu variants, WSS3, shrinking)
  train.rs            — svm_train, solve dispatchers, class grouping
  predict.rs          — predict, predict_values, predict_probability
  probability.rs      — Platt scaling, multiclass probability, CV-based estimation
  cross_validation.rs — svm_cross_validation (stratified + simple)

bins/
  svm-train-rs/       — CLI matching C svm-train (all flags including -wi class weights)
  svm-predict-rs/     — CLI matching C svm-predict (-b probability, -q quiet)
  svm-scale-rs/       — CLI matching C svm-scale (3-pass, save/restore params)

Workspace Layout

The project uses a Cargo workspace:

  • crates/libsvm/ — the library crate (libsvm-rs on crates.io)
  • bins/svm-train-rs/, bins/svm-predict-rs/, bins/svm-scale-rs/ — CLI binaries
  • vendor/libsvm/ — upstream C source (for reference and vendored tests)
  • data/ — test datasets (heart_scale, iris.scale, housing_scale, generated synthetics)
  • reference/ — verification artifacts (differential results, provenance, benchmarks)
  • scripts/ — verification and testing scripts

Module Details

types.rs — Core Data Types

Types matching LIBSVM's svm.h:

  • SvmType: enum with CSvc, NuSvc, OneClass, EpsilonSvr, NuSvr variants (C++: svm_type int constants)
  • KernelType: enum with Linear, Polynomial, Rbf, Sigmoid, Precomputed variants (C++: kernel_type int constants)
  • SvmNode: sparse feature {index: i32, value: f64} (C++: struct svm_node)
  • SvmProblem: training data with labels: Vec<f64> and instances: Vec<Vec<SvmNode>> (C++: struct svm_problem)
  • SvmParameter: all training parameters, with defaults matching LIBSVM (C++: struct svm_parameter)
  • SvmModel: trained model with SVs, coefficients, and rho (C++: struct svm_model)

Key design choice: SvmNode.index is i32 (not usize) to match LIBSVM's C int and preserve file format compatibility. Sentinel nodes (index == -1) used in C to terminate instance arrays are not stored — Rust's Vec::len() tracks instance length instead.

SvmParameter::default() produces the same defaults as LIBSVM: svm_type = CSvc, kernel_type = Rbf, degree = 3, gamma = 0 (auto-detected as 1/num_features), coef0 = 0, c = 1, nu = 0.5, eps = 0.001, cache_size = 100 MB, shrinking = true.

kernel.rs — Kernel Evaluation

Two evaluation paths exist because the training kernel needs swappable data references for shrinking, while prediction only needs stateless evaluation:

  1. k_function(x, y, param) — Standalone stateless evaluation for prediction. Operates on sparse &[SvmNode] slices. Handles all 5 kernel types including precomputed, where the second argument's sample id (stored at index 0) selects the entry in the first argument's precomputed kernel row (usage sketch after this list).

  2. Kernel struct — Training evaluator. Stores Vec<&'a [SvmNode]> (borrowed slices into the original problem data) so the solver can swap data point references during shrinking without copying data. The C++ achieves the same via const_cast on a const svm_node **x pointer array.
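
A usage sketch for the first, stateless path, reusing the Quick Start setup (the libsvm_rs::kernel module path is an assumption based on the layout shown later):

use libsvm_rs::io::load_problem;
use libsvm_rs::kernel::k_function; // module path assumed
use libsvm_rs::SvmParameter;
use std::path::Path;

let problem = load_problem(Path::new("data/heart_scale")).unwrap();
let mut param = SvmParameter::default(); // RBF kernel by default
param.gamma = 1.0 / 13.0;
let k = k_function(&problem.instances[0], &problem.instances[1], &param);
println!("K(x0, x1) = {k}");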

Supported kernels:

  • Linear: dot(x, y)
  • Polynomial: (γ·dot(x,y) + coef0)^degree
  • RBF: exp(-γ·‖x-y‖²)
  • Sigmoid: tanh(γ·dot(x,y) + coef0)
  • Precomputed: x[j].value (user supplies the full kernel matrix)

RBF optimization: x_square[i] = dot(x[i], x[i]) is precomputed once during Kernel::new(), so the RBF kernel becomes:

K(i,j) = exp(-γ * (x_sq[i] + x_sq[j] - 2·dot(x[i], x[j])))

This avoids recomputing ‖x_i‖² on every kernel evaluation — a significant win since the kernel is evaluated millions of times during training.

Sparse dot product: The dot() function walks two sparse vectors simultaneously using a merge-join pattern on sorted indices, skipping non-overlapping features in O(nnz_x + nnz_y) time.
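
A minimal sketch of that merge-join, assuming index-sorted inputs (illustrative code, not the crate's exact implementation):

use libsvm_rs::SvmNode; // root re-export assumed

fn sparse_dot(x: &[SvmNode], y: &[SvmNode]) -> f64 {
    let (mut i, mut j, mut sum) = (0, 0, 0.0);
    while i < x.len() && j < y.len() {
        if x[i].index == y[j].index {
            sum += x[i].value * y[j].value; // overlapping feature
            i += 1;
            j += 1;
        } else if x[i].index < y[j].index {
            i += 1; // feature present only in x contributes zero
        } else {
            j += 1; // feature present only in y contributes zero
        }
    }
    sum
}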

cache.rs — LRU Kernel Cache

LRU cache storing rows of the Q matrix as Qfloat (f32) values. Using f32 instead of f64 halves memory usage while providing sufficient precision for the cached kernel values (the solver accumulates gradients in f64).

Data structure: Index-based circular doubly-linked list (no unsafe). Each training instance gets a cache node. Node l (where l = number of instances) is the sentinel head of the LRU list. Nodes not currently in the LRU list have prev == NONE (usize::MAX).
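
As a sketch, each list node might look like the following (field names are illustrative, not the crate's exact layout):

const NONE: usize = usize::MAX;

struct CacheNode {
    prev: usize,    // NONE when the node is not currently in the LRU list
    next: usize,
    data: Vec<f32>, // cached prefix of this instance's Q row (Qfloat values)
}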

Cache operations:

  • get_data(index, len): Returns (&mut [Qfloat], start) where start is how many elements were already cached. The caller (QMatrix) fills data[start..len] with fresh kernel evaluations. If the row isn't cached, allocates space (evicting LRU rows if needed) and returns start = 0.

  • swap_index(i, j): Critical for solver shrinking. Three steps:

    1. Swap row data and LRU positions for rows i and j
    2. Iterate all cached rows and swap column entries at positions i,j
    3. If a row covers the lower index but not the higher, evict it ("give up")

    The column-swap loop in step 2 is essential for correctness — without it, the solver reads stale kernel values after shrinking swaps indices. This was a subtle bug during development that caused silent numerical drift.

qmatrix.rs — Q Matrix Implementations

The QMatrix trait bridges the Kernel and Solver, abstracting how kernel values are computed and cached for different SVM formulations:

pub trait QMatrix {
    fn get_q(&mut self, i: usize, len: usize) -> &[Qfloat];
    fn get_qd(&self) -> &[f64];
    fn swap_index(&mut self, i: usize, j: usize);
}

Why &mut self on get_q: The C++ uses const methods with mutable cache fields. Rust doesn't allow this pattern with &self — the cache needs mutation. The Solver therefore owns Box<dyn QMatrix> and copies Q row data into its own buffers before the borrow ends, avoiding lifetime conflicts.
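
A minimal sketch of that copy-out pattern (module path and names assumed):

use libsvm_rs::qmatrix::QMatrix; // path assumed from the layout above

fn fetch_row(q: &mut dyn QMatrix, i: usize, len: usize, buf: &mut Vec<f32>) {
    buf.clear();
    buf.extend_from_slice(q.get_q(i, len)); // the &mut borrow of `q` ends here
    // `buf` is solver-owned and can be used alongside further calls into `q`
}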

SvcQ — Classification Q Matrix

For C-SVC and nu-SVC:

  • Q[i][j] = y[i] * y[j] * K(x[i], x[j]) stored as Qfloat (f32)
  • QD[i] = K(x[i], x[i]) — always 1.0 for RBF kernel
  • swap_index swaps cache rows, kernel data references, label signs, and diagonal entries

OneClassQ — One-Class Q Matrix

For one-class SVM:

  • Q[i][j] = K(x[i], x[j]) — no label scaling (all samples are treated as positive)
  • Same cache and swap logic as SvcQ, minus the label array

SvrQ — Regression Q Matrix

For epsilon-SVR and nu-SVR. The dual formulation has 2l variables (l = number of training samples):

  • Variables 0..l correspond to α_i with sign[k] = +1, index[k] = k
  • Variables l..2l correspond to α*_i with sign[k+l] = -1, index[k+l] = k

The kernel cache stores only l rows of actual kernel values (since K(x[i], x[j]) doesn't depend on whether we're looking at α or α*). get_q fetches the real kernel row, then reorders and applies signs:

buf[j] = sign[i] * sign[j] * data[index[j]]

Double buffer: Two buffers are needed because the solver requests Q_i and Q_j simultaneously during the alpha update step. If both wrote to the same buffer, the second get_q call would overwrite the first. swap_index only swaps sign, index, and qd — the kernel cache maps to real data indices via index[].

solver.rs — SMO Solver

The computational core of the library, translating Solver and Solver_NU from svm.cpp (lines 362–1265, ~900 lines of C++).

Solver variant: SolverVariant::Standard vs SolverVariant::Nu. 95% of the code is shared — only 4 methods differ:

  • select_working_set: single I_up/I_low partition (Standard) vs. separate pos/neg class selection (Nu)
  • calculate_rho: average over free variables (Standard) vs. (r1 - r2) / 2 from class-separated free variables (Nu)
  • do_shrinking: single Gmax bound (Standard) vs. separate pos/neg Gmax bounds (Nu)
  • be_shrunk: check against the global bound (Standard) vs. the class-specific bound (Nu)

Key constants:

  • TAU = 1e-12: floor for quad_coef to prevent division by zero in working set selection
  • INF = f64::INFINITY: initial gradient bounds
  • EPS (from parameter): KKT violation tolerance (default 0.001)

Initialization

  1. Copy input arrays (alpha, y, p) into owned vectors
  2. Compute alpha_status for each variable: LowerBound (α=0), UpperBound (α=C), or Free
  3. Initialize gradient: G[i] = p[i] + Σ_j(alpha[j] * Q[i][j]) for all non-lower-bound j
  4. Initialize G_bar: G_bar[i] = Σ_j(C_j * Q[i][j]) for all upper-bound j

G_bar accelerates gradient reconstruction during unshrinking — when a variable transitions from upper-bound to free, G_bar provides the full gradient contribution without re-scanning all variables.
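
A compact sketch of initialization steps 3 and 4, using the QMatrix trait from above (other names are illustrative; the alpha comparisons stand in for alpha_status checks):

use libsvm_rs::qmatrix::QMatrix; // path assumed

fn init_gradient(q: &mut dyn QMatrix, alpha: &[f64], p: &[f64], c: &[f64],
                 g: &mut [f64], g_bar: &mut [f64]) {
    let l = alpha.len();
    g.copy_from_slice(p);                 // G starts at the linear term p
    g_bar.iter_mut().for_each(|v| *v = 0.0);
    for j in 0..l {
        if alpha[j] == 0.0 { continue; }  // lower-bound variables contribute nothing
        let q_j = q.get_q(j, l).to_vec(); // copy the row out of the &mut borrow
        for i in 0..l {
            g[i] += alpha[j] * q_j[i] as f64;
        }
        if alpha[j] == c[j] {             // upper-bound variables also feed G_bar
            for i in 0..l {
                g_bar[i] += c[j] * q_j[i] as f64;
            }
        }
    }
}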

Main Loop

while iter < max_iter:
    if counter expires: do_shrinking()
    (i, j) = select_working_set()       // returns None if optimal
    if None:
        reconstruct_gradient()           // unshrink all variables
        active_size = l                  // restore full problem
        (i, j) = select_working_set()    // retry with all variables
        if None: break                   // truly optimal
    update_alpha_pair(i, j)              // analytic 2-variable solve

  • max_iter: max(10_000_000, 100*l) — same as C++
  • Shrink counter: min(l, 1000) iterations between shrinking passes

Working Set Selection (WSS3)

Implements the WSS3 heuristic from Fan, Chen, and Lin (2005):

Standard variant — two-pass selection:

  1. Find i: maximizes -y_i * G[i] among I_up (variables that can increase)
  2. Find j: among I_low (variables that can decrease), minimizes the second-order approximation of objective decrease: -(grad_diff²) / quad_coef where quad_coef = QD[i] + QD[j] - 2·y_i·y_j·Q_ij

Nu variant: Partitions variables into positive and negative classes. Finds i_p/j_p (best positive pair) and i_n/j_n (best negative pair), then selects the pair with larger violation.

Returns None when the maximum KKT violation Gmax_i + Gmax_j < eps, indicating optimality.
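
An illustrative, simplified version of the second pass (the C++ splits the y = +1 and y = -1 cases; here they fold into one expression, and all names are assumptions):

const TAU: f64 = 1e-12; // floor for quad_coef, as in the constants above

/// Given i from pass 1 (gmax = -y[i]*G[i]), pick the j in I_low that
/// minimizes the second-order estimate of the objective decrease.
fn select_j(i: usize, gmax: f64, g: &[f64], y: &[i8], qd: &[f64], q_i: &[f32],
            in_i_low: impl Fn(usize) -> bool) -> Option<usize> {
    let (mut best_j, mut best_obj) = (None, f64::INFINITY);
    for j in 0..g.len() {
        if j == i || !in_i_low(j) { continue; }
        let grad_diff = gmax + y[j] as f64 * g[j]; // KKT violation of pair (i, j)
        if grad_diff <= 0.0 { continue; }
        let mut quad_coef =
            qd[i] + qd[j] - 2.0 * (y[i] as f64) * (y[j] as f64) * q_i[j] as f64;
        if quad_coef <= 0.0 { quad_coef = TAU; }
        let obj_diff = -(grad_diff * grad_diff) / quad_coef; // estimated decrease
        if obj_diff < best_obj { best_obj = obj_diff; best_j = Some(j); }
    }
    best_j
}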

Alpha Update

Analytic solution of the 2-variable sub-problem with box constraints [0, C]:

  • Different labels (y[i] ≠ y[j]): constraint line alpha[i] - alpha[j] = constant
  • Same labels (y[i] = y[j]): constraint line alpha[i] + alpha[j] = constant

Both branches clip alpha to the feasible box and update gradients for all active variables:

G[k] += Q_i[k] * Δα_i + Q_j[k] * Δα_j    for all k in active set

G_bar is updated incrementally when alpha_status changes (transition to/from upper bound).

Shrinking

Variables firmly at their bounds that are unlikely to change in subsequent iterations get "shrunk" — swapped to the end of the active set so they're excluded from WSS and gradient updates:

  • be_shrunk(i) checks if -y_i * G[i] exceeds the current violation bound by a sufficient margin
  • Active-from-back swap finds the first non-shrinkable variable from the end

Unshrink trigger: When Gmax1 + Gmax2 <= eps * 10 (approaching convergence), the solver reconstructs the full gradient for all variables and resets active_size = l. This ensures the final optimality check considers all variables.

Gradient reconstruction: Uses G_bar to efficiently compute the full gradient contribution from upper-bound variables without re-evaluating kernel values.

train.rs — Training Pipeline

Orchestrates the solver for all SVM types.

Solve Dispatchers

  • solve_c_svc (C-SVC, Standard): p[i] = -1 for all i; after solving, alpha[i] *= y[i] to get signed coefficients
  • solve_nu_svc (nu-SVC, Nu): alpha initialized proportionally from the nu budget; after solving, rescaled by 1/r
  • solve_one_class (one-class, Standard): alpha = [1, 1, ..., frac, 0, ..., 0] where nu*l alphas are set to 1.0
  • solve_epsilon_svr (epsilon-SVR, Standard): 2l variables; p[i] = ε - y_i, p[i+l] = ε + y_i
  • solve_nu_svr (nu-SVR, Nu): 2l variables; alpha initialized uniformly at C·nu·l / 2

Each dispatcher sets up the dual problem (initial alpha values, gradient vector p, bounds C_i), calls Solver::solve(), then extracts the solution (nonzero alphas become support vector coefficients, rho becomes the bias).

Class Grouping

svm_group_classes builds:

  • label[]: unique class labels in first-occurrence order (with -1/+1 swap to match C++ convention)
  • count[]: number of samples per class
  • start[]: cumulative start index for each class in the permuted array
  • perm[]: permutation mapping grouped indices back to original dataset indices

This grouping enables efficient one-vs-one pair extraction without copying data.
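
For example, the binary sub-problem for class pair (i, j) can be read straight out of perm without copying instances (a sketch with assumed names; instances are borrowed and relabeled to ±1):

use libsvm_rs::SvmNode; // root re-export assumed

fn pair_subproblem<'a>(
    i: usize, j: usize,
    start: &[usize], count: &[usize], perm: &[usize],
    instances: &'a [Vec<SvmNode>],
) -> (Vec<&'a [SvmNode]>, Vec<f64>) {
    let mut xs = Vec::with_capacity(count[i] + count[j]);
    let mut ys = Vec::with_capacity(count[i] + count[j]);
    for &k in &perm[start[i]..start[i] + count[i]] {
        xs.push(instances[k].as_slice());
        ys.push(1.0);  // class i becomes +1
    }
    for &k in &perm[start[j]..start[j] + count[j]] {
        xs.push(instances[k].as_slice());
        ys.push(-1.0); // class j becomes -1
    }
    (xs, ys)
}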

svm_train — Main Entry Point

Two branches:

  1. Regression / one-class: Single call to svm_train_one(), extract nonzero alphas as support vectors.

  2. Classification (one-vs-one):

    • Group classes via svm_group_classes
    • Compute weighted C per class (base C × class weight)
    • For each of k*(k-1)/2 class pairs (i,j):
      • Build sub-problem with only classes i and j
      • Train binary classifier
      • Mark nonzero support vectors
    • Assemble sv_coef matrix: sv_coef[j-1][nz_start[i]..] = class_i coefficients

Gamma auto-detection: If gamma == 0 (default), sets gamma = 1/max_feature_index where max_feature_index is the highest feature index seen in the training data.

Probability training: When param.probability == true, training invokes cross-validation internally:

  • Binary classification: 5-fold CV to collect decision values, then fits a sigmoid (Platt scaling) via sigmoid_train
  • Multiclass: pairwise probability estimation via svm_binary_svc_probability
  • Regression: 5-fold CV to estimate sigma for Laplace correction
  • One-class: 5-fold CV to estimate the positive density fraction

predict.rs — Prediction

predict_values

Core prediction function for all SVM types:

  • Classification (C-SVC, nu-SVC): Computes k*(k-1)/2 pairwise decision values dec_value[p] = Σ sv_coef · K(x, sv) - rho[p], then runs one-vs-one voting. Returns the class with the most votes (ties broken by label ordering; see the sketch after this list).

  • Regression (epsilon-SVR, nu-SVR): prediction = Σ sv_coef[i] * K(x, sv[i]) - rho[0]

  • One-class: Returns +1 if Σ sv_coef[i] * K(x, sv[i]) - rho[0] > 0, else -1.
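
An illustrative tally for the voting step in the classification case (names are assumptions, not the crate's internals):

fn vote(nr_class: usize, dec_values: &[f64], labels: &[i32]) -> i32 {
    let mut votes = vec![0usize; nr_class];
    let mut p = 0;
    for i in 0..nr_class {
        for j in (i + 1)..nr_class {
            if dec_values[p] > 0.0 { votes[i] += 1; } else { votes[j] += 1; }
            p += 1;
        }
    }
    let mut best = 0;          // keep the first maximum so ties break
    for i in 1..nr_class {     // toward the earlier label
        if votes[i] > votes[best] { best = i; }
    }
    labels[best]
}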

predict_probability

Wraps predict_values with probability calibration:

  • Binary classification: Applies Platt sigmoid P(y=1|f) = 1 / (1 + exp(A·f + B)) using probA/probB from the model.
  • Multiclass: Computes pairwise probabilities via Platt sigmoid, then solves the coupled probability system via multiclass_probability() (iterative optimization matching LIBSVM's algorithm).
  • Regression: Returns Laplace-corrected probability P = 1 / (2·sigma) · exp(-|y-f|/sigma).
  • One-class: Returns a density-based probability mark.

probability.rs — Probability Estimation

sigmoid_train / sigmoid_predict

Fits a sigmoid P(y=1|f) = 1 / (1 + exp(A·f + B)) to decision values using the algorithm from Platt (2000) with improvements from Lin, Lin, and Weng (2007). Uses Newton's method with backtracking line search to minimize the negative log-likelihood.
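
Applying the fitted sigmoid at prediction time needs one numerical guard; a sketch of the overflow-safe form recommended by Lin, Lin, and Weng (2007):

fn sigmoid_predict(f: f64, a: f64, b: f64) -> f64 {
    let f_ab = f * a + b;
    // 1 / (1 + exp(f_ab)), evaluated without overflow for large |f_ab|
    if f_ab >= 0.0 {
        (-f_ab).exp() / (1.0 + (-f_ab).exp())
    } else {
        1.0 / (1.0 + f_ab.exp())
    }
}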

multiclass_probability

Given the k*(k-1)/2 pairwise probabilities r[i][j] ≈ P(y = i | y = i or j), solves for class probabilities p[1]..p[k] that best satisfy the coupling relations r[j][i] · p[i] = r[i][j] · p[j], minimizing

Σ_i Σ_{j≠i} (r[j][i] · p[i] - r[i][j] · p[j])²   subject to   Σ_i p[i] = 1, p[i] ≥ 0

Uses an iterative coordinate method (100 max iterations, convergence at 1e-5) matching LIBSVM's multiclass_probability().
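
A simplified Jacobi-style sketch of that iteration (the real code updates coordinates in place with incremental bookkeeping; this version recomputes per sweep and is illustrative only):

fn coupled_probabilities(r: &[Vec<f64>]) -> Vec<f64> {
    let k = r.len();
    // Q[t][t] = Σ_{j≠t} r[j][t]²,  Q[t][j] = -r[j][t]·r[t][j]
    let mut q = vec![vec![0.0; k]; k];
    for t in 0..k {
        for j in 0..k {
            if j == t { continue; }
            q[t][t] += r[j][t] * r[j][t];
            q[t][j] = -r[j][t] * r[t][j];
        }
    }
    let mut p = vec![1.0 / k as f64; k];
    for _ in 0..100 {
        let qp: Vec<f64> = (0..k)
            .map(|t| (0..k).map(|j| q[t][j] * p[j]).sum())
            .collect();
        let pqp: f64 = (0..k).map(|t| p[t] * qp[t]).sum();
        // At the optimum every (Qp)[t] equals p'Qp; stop when close enough
        if (0..k).all(|t| (qp[t] - pqp).abs() < 1e-5) { break; }
        for t in 0..k {
            p[t] += (pqp - qp[t]) / q[t][t]; // coordinate step toward stationarity
        }
        let s: f64 = p.iter().sum();
        p.iter_mut().for_each(|v| *v /= s); // renormalize onto the simplex
    }
    p
}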

svm_binary_svc_probability

Runs internal 5-fold cross-validation, collects decision values for all training instances, then calls sigmoid_train to fit the A and B parameters. Fold shuffling uses a deterministic PRNG that replicates C's rand(), so the folds match LIBSVM's exactly.

cross_validation.rs — Cross-Validation

svm_cross_validation(problem, param, nr_fold) splits data, trains on k-1 folds, predicts on the held-out fold:

  • Classification: Stratified folding — each fold has approximately the same class proportions as the full dataset. Uses the same deterministic shuffling as C LIBSVM.
  • Regression / one-class: Simple sequential splitting.

Returns a Vec<f64> of predicted values aligned with the original problem ordering, so the caller can compute accuracy (classification) or MSE (regression).

io.rs — File I/O

Problem Files

LIBSVM sparse format: label index:value index:value ... per line. Features must have ascending indices. Missing features are implicitly zero (sparse representation).
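
For example, the following line encodes a +1-labeled instance whose features 1, 3, and 8 are nonzero (values here are illustrative):

+1 1:0.708 3:1 8:-0.32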

For precomputed kernels: label 0:sample_id 1:K(x,x1) 2:K(x,x2) ... where index 0 holds the 1-based sample identifier.

Model Files

Two sections:

  1. Header: Key-value pairs (svm_type, kernel_type, nr_class, total_sv, rho, label, nr_sv, optionally probA, probB)
  2. SV section: After the SV marker, one line per support vector: coef1 [coef2 ...] index:value index:value ...

Float formatting uses %.17g-equivalent precision for model files (ensuring round-trip fidelity) and %g-equivalent for problem files. The Gfmt struct replicates C's %g format: dynamically chooses fixed vs scientific notation, strips trailing zeros, handles edge cases like -0 formatting.

Design Decisions

  • SvmNode.index is i32: matches C int for format compatibility; sentinel -1 not stored
  • Qfloat = f32: matches the original and halves cache memory; gradients accumulated in f64
  • Ownership replaces manual memory management: no free_sv flag; SvmModel owns its support vectors
  • Kernel.x is Vec<&[SvmNode]>: swappable references during shrinking without data copies
  • Solver uses an enum variant, not traits: 95% shared code; cleaner than trait objects for 4 method overrides
  • QMatrix::get_q takes &mut self: replaces the C++ const method + mutable cache fields
  • %.17g float formatting: bit-exact model file compatibility with C LIBSVM
  • Zero runtime deps: only thiserror; rayon feature-gated for opt-in parallelism
  • Manual CLI arg parsing: no clap dependency; handles -wi composite flags matching C behavior
  • Deterministic PRNG: replicates C's rand() for identical fold shuffling in CV/probability

Numerical Equivalence Notes

  • Gradient accumulation: Uses f64 for gradients (matching C++), f32 for cached Q values
  • Quad_coef floor: TAU = 1e-12 prevents division by zero in working set selection
  • Alpha comparison: alpha[i].abs() > 0.0 for SV detection (exact zero check, matching C++)
  • Class label cast: labels[i] as i32 matches C's (int)prob->y[i] truncation behavior
  • Float formatting: Gfmt struct produces identical output to C's printf("%g", x) for all tested values
  • Target: numerical equivalence within ~1e-8 relative tolerance, not bitwise identity
  • Known sources of drift: floating-point accumulation order differences between compilers, shrinking heuristic timing, and gradient reconstruction precision

Test Coverage

  • Cache (7): LRU eviction, extend, swap with column updates
  • Kernel (8): all kernel types, struct/standalone agreement, sparse dot product
  • QMatrix (4): SvcQ sign/symmetry, OneClassQ, SvrQ double buffer
  • I/O (8): problem parsing, model roundtrip, C format compatibility
  • Types (8): parameter validation, nu-SVC feasibility checks
  • Predict (3): heart_scale accuracy, C svm-predict output comparison
  • Train (6): C-SVC, multiclass, nu-SVC, one-class, epsilon-SVR, nu-SVR
  • Probability (6): sigmoid fitting, binary/multiclass/regression probability
  • Cross-validation (5): stratified CV, classification accuracy, regression MSE
  • Property (7): proptest-based invariant checks (random params/data)
  • CLI integration (6): train/predict/scale end-to-end via subprocess
  • Differential (250): full Rust-vs-C comparison matrix (via the external suite)
  • Unit/integration total: ~98 tests

Coverage metrics: 93.19% line coverage, 92.86% function coverage (library crate).

Dependencies

  • Runtime: thiserror (error derive macros)
  • Optional: rayon (feature-gated, for future parallel cross-validation)
  • Dev: float-cmp (approximate float comparison), criterion (benchmarks), proptest (property-based testing)

Known Limitations

  1. Not bitwise identical: Numerical results match within ~1e-8 but are not bit-for-bit identical to C LIBSVM due to floating-point accumulation order differences.
  2. No GPU support: All computation is CPU-based.
  3. No incremental/online learning: Full retraining required for new data (same as upstream LIBSVM).
  4. Precomputed kernels require full matrix: The full n×n kernel matrix must be provided in memory.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Run the test suite: cargo test --workspace
  4. Run the differential suite to verify parity: python3 scripts/run_differential_suite.py
  5. Submit a pull request

License

BSD-3-Clause, same family as original LIBSVM. See LICENSE.

References

  • Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27.
  • Platt, J.C. (2000). Probabilities for SV machines. In Advances in Large Margin Classifiers, MIT Press.
  • Lin, H.-T., Lin, C.-J., and Weng, R.C. (2007). A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267–276.
  • Fan, R.-E., Chen, P.-H., and Lin, C.-J. (2005). Working set selection using second order information for training support vector machines. Journal of Machine Learning Research, 6:1889–1918.