rvf-federation
Privacy-preserving federated transfer learning for the RVF format.
= "0.1"
RuVector users independently accumulate learning patterns -- SONA weight trajectories, policy kernel configurations, domain expansion priors, HNSW tuning parameters. Today that learning is siloed. rvf-federation implements the inter-user federation layer defined in ADR-057: it strips PII, injects differential privacy noise, packages transferable learning as RVF segments, and merges incoming learning with formal privacy guarantees.
| rvf-federation | Siloed learning | Manual sharing | |
|---|---|---|---|
| Privacy | 3-stage PII stripping + calibrated DP noise | N/A -- nothing leaves the machine | Trust the sender |
| Knowledge reuse | New users bootstrap from community priors | Every deployment starts cold | Copy-paste config files |
| Integrity | Witness chain + Ed25519/ML-DSA-65 signatures | N/A | No verification |
| Aggregation | FedAvg, FedProx, Byzantine-tolerant averaging | N/A | Manual merge |
| Privacy accounting | RDP composition with formal epsilon budget | N/A | N/A |
Quick Start
use ;
// 1. Build an export from local learning
let priors = TransferPriorSet ;
// 2. Configure differential privacy (epsilon=1.0, delta=1e-5)
let mut dp = gaussian.unwrap;
// 3. Build: PII strip -> DP noise -> assemble manifest
let export = new
.with_policy
.add_priors
.add_string_field
.build
.unwrap;
assert_eq!;
assert!; // PII was stripped
assert!; // DP noise was applied
Key Features
| Feature | What It Does | Why It Matters |
|---|---|---|
| PII stripping | 3-stage pipeline: detect, redact, attest | No personal data leaves the local machine |
| Differential privacy | Gaussian/Laplace noise with RDP accounting | Formal mathematical privacy guarantee per export |
| Gradient clipping | Bound L2 norms before aggregation | Limits any single user's influence on the aggregate |
| FedAvg / FedProx | Federated averaging with optional proximal term | Industry-standard aggregation (McMahan et al. 2017) |
| Byzantine tolerance | Outlier detection by L2-norm z-score | Malicious contributions are excluded automatically |
| Version-aware merging | Dampened confidence for cross-version imports | Older learning still helps, with reduced weight |
| Selective sharing | Allowlist/denylist for segments and domains | Users control exactly what they share |
Architecture
Local Engine Remote
+------------------+ +------------+ +---------+ +----------+
| TransferPriors |--->| |--->| |---->| |
| PolicyKernels | | PII Strip | | DP | | RVF | Registry
| CostCurves | | (3-stage) | | Noise | | Export |----> (GCS)
| LoRA Weights | | | | | | Builder | |
+------------------+ +------------+ +---------+ +----------+ |
v
+------------------+ +------------+ +---------+ +----------+ +--------+
| Merged Learning |<---| Version- |<---| Import |<----| Validate |<-| Import |
| (local engines) | | Aware | | Merger | | (sig + | | (pull) |
| | | Merge | | | | witness) | +--------+
+------------------+ +------------+ +---------+ +----------+
Modules
| Module | Description |
|---|---|
types |
Four new RVF segment payload types (0x33-0x36) plus federation data structures |
error |
15 error variants covering privacy, validation, aggregation, and I/O failures |
pii_strip |
Three-stage PII stripping pipeline with 12 built-in detection rules |
diff_privacy |
Gaussian/Laplace noise engines, gradient clipping, RDP privacy accountant |
federation |
ExportBuilder and ImportMerger implementing the ADR-057 transfer protocol |
aggregate |
FederatedAggregator with FedAvg, FedProx, and Byzantine-tolerant strategies |
policy |
FederationPolicy for selective sharing with allowlists, denylists, and rate limits |
Segment Types
Four new RVF segment types extend the 0x30-0x32 domain expansion range:
| Code | Name | Purpose |
|---|---|---|
0x33 |
FederatedManifest |
Describes the export: contributor pseudonym, timestamp, included segments, privacy budget spent |
0x34 |
DiffPrivacyProof |
Privacy attestation: epsilon/delta, mechanism, sensitivity, clipping norm, noise scale |
0x35 |
RedactionLog |
PII stripping attestation: redaction counts by category, pre-redaction content hash, rules fired |
0x36 |
AggregateWeights |
Federated-averaged LoRA deltas with participation count, round number, confidence scores |
Readers that do not recognize these segment types skip them per the RVF forward-compatibility rule. Existing TransferPrior (0x30), PolicyKernel (0x31), CostCurve (0x32), Witness, and Crypto segments are reused as-is.
PII Stripping Pipeline
PiiStripper runs a three-stage pipeline on every string field before it leaves the local machine.
Stage 1 -- Detection. Twelve built-in regex rules scan for:
- Unix and Windows file paths (
/home/user/...,C:\Users\...) - IPv4 and IPv6 addresses
- Email addresses
- API keys (
sk-...,AKIA...,ghp_..., Bearer tokens) - Environment variable references (
$HOME,%USERPROFILE%) - Usernames (
@handle)
Custom rules can be registered with add_rule().
Stage 2 -- Redaction. Detected PII is replaced with deterministic pseudonyms (<PATH_1>, <IP_2>, <REDACTED_KEY>). The same original value always maps to the same pseudonym within a single export, preserving structural relationships without revealing content.
Stage 3 -- Attestation. A RedactionLog (0x35) segment is generated containing redaction counts by category, the SHAKE-256 hash of the pre-redaction content (proves scanning happened without revealing it), and the rules that fired.
use PiiStripper;
let mut stripper = new;
let fields = vec!;
let = stripper.strip_fields;
assert_eq!;
assert!;
assert!; // clean fields pass through
Differential Privacy
Noise Mechanisms
| Mechanism | Privacy Model | Noise Distribution | Use Case |
|---|---|---|---|
| Gaussian | (epsilon, delta)-DP | N(0, sigma^2) where sigma = S * sqrt(2 ln(1.25/delta)) / epsilon | Default; tighter for large parameter counts |
| Laplace | Pure epsilon-DP | Laplace(0, S/epsilon) | Stronger guarantee; no delta term |
Gradient Clipping
Before noise injection, all parameter vectors are clipped to a configurable L2 norm bound. This limits the sensitivity of the aggregation to any single user's contribution.
Privacy Accountant
PrivacyAccountant tracks cumulative privacy loss using Renyi Differential Privacy (RDP) composition across 16 alpha orders. RDP composition is tighter than naive (epsilon, delta)-DP composition, meaning more exports fit within the same budget.
use PrivacyAccountant;
let mut accountant = new; // budget: eps=10, delta=1e-5
accountant.record_gaussian;
assert!;
assert!;
Federation Strategies
| Strategy | Algorithm | Weighting | When to Use |
|---|---|---|---|
FedAvg |
Federated Averaging (McMahan et al.) | Trajectory count | Default; most scenarios |
FedProx |
Proximal regularization | Trajectory count + mu penalty | Heterogeneous data distributions |
WeightedAverage |
Simple weighted mean | Quality/reputation score | When contributor reputation varies widely |
| Byzantine detection | L2-norm z-score filtering | Outliers > 2 std removed | Always runs before aggregation |
use ;
use Contribution;
let mut agg = new
.with_min_contributions
.with_byzantine_threshold;
agg.add_contribution;
agg.add_contribution;
let result = agg.aggregate.unwrap;
assert_eq!;
assert_eq!;
Performance Benchmarks
Measured on an AMD64 Linux system with Criterion.
| Benchmark | Time |
|---|---|
| PII detect (single string) | 756 ns |
| PII strip (10 fields) | 44 us |
| PII strip (100 fields) | 303 us |
| Gaussian noise (100 params) | 4.7 us |
| Gaussian noise (10k params) | 334 us |
| Gradient clipping (1k params) | 487 ns |
| Privacy accountant (100 rounds) | 1.0 us |
| FedAvg (10 contrib, 100 dim) | 3.9 us |
| FedAvg (100 contrib, 1k dim) | 365 us |
| Byzantine detection (50 contrib) | 12 us |
| Full export pipeline | 1.2 ms |
| Merge 100 priors | 28 us |
Feature Flags
| Flag | Default | What It Enables |
|---|---|---|
std |
Yes | Standard library support (required) |
serde |
No | Derive Serialize/Deserialize on all public types |
[]
= { = "0.1", = ["serde"] }
API Overview
Core Types
| Type | Description |
|---|---|
FederatedManifest |
Export metadata: contributor pseudonym, domain, timestamp, privacy budget spent |
DiffPrivacyProof |
Privacy attestation: epsilon, delta, mechanism, sensitivity, noise scale |
RedactionLog |
PII stripping attestation: entries by category, pre-redaction hash, field count |
AggregateWeights |
Federated-averaged LoRA deltas with round number, participation count, confidences |
BetaParams |
Beta distribution parameters for Thompson Sampling priors (merge, dampen, mean) |
Transfer Types
| Type | Description |
|---|---|
TransferPriorEntry |
Single context bucket prior: bucket ID, arm ID, Beta params, observation count |
TransferPriorSet |
Collection of priors from a trained domain with cost EMA |
PolicyKernelSnapshot |
Snapshot of tunable policy knob values with fitness score |
CostCurveSnapshot |
Ordered (step, cost) points with acceleration factor |
Aggregation Types
| Type | Description |
|---|---|
FederatedAggregator |
Aggregation server: collects contributions, detects outliers, produces AggregateWeights |
AggregationStrategy |
FedAvg, FedProx { mu }, or WeightedAverage |
Contribution |
Single participant's weight vector with quality and trajectory metadata |
Protocol Types
| Type | Description |
|---|---|
ExportBuilder |
Builder pattern: add priors/kernels/weights, PII-strip, DP-noise, produce FederatedExport |
ImportMerger |
Validate imports, merge priors with version-aware dampening, merge weights |
FederatedExport |
Completed export: manifest + redaction log + privacy proof + learning data |
FederationPolicy |
Selective sharing: allowlists, denylists, quality gate, rate limit, privacy budget |
PiiStripper |
Three-stage PII pipeline: detect, redact, attest |
DiffPrivacyEngine |
Noise injection with Gaussian or Laplace mechanism and gradient clipping |
PrivacyAccountant |
RDP-based cumulative privacy loss tracker |
Error Types
FederationError covers 15 variants:
| Variant | Trigger |
|---|---|
PrivacyBudgetExhausted |
Cumulative epsilon exceeds limit |
InvalidEpsilon |
Epsilon <= 0 |
InvalidDelta |
Delta outside (0, 1) |
SegmentValidation |
Malformed segment data |
VersionMismatch |
Incompatible format version |
SignatureVerification |
Ed25519/ML-DSA-65 signature check failed |
WitnessChainBroken |
Witness chain has a gap or tampered entry |
InsufficientObservations |
Prior has too few observations for export |
QualityBelowThreshold |
Trajectory quality below policy minimum |
RateLimited |
Export rate limit exceeded |
PiiLeakDetected |
PII found after stripping (defense-in-depth) |
ByzantineOutlier |
Contribution flagged as adversarial |
InsufficientContributions |
Not enough participants for aggregation round |
Serialization |
Encoding/decoding failure |
Io |
I/O operation failure |
Related Crates
| Crate | Relationship |
|---|---|
rvf-types |
Core RVF segment definitions; rvf-federation defines its own payload types to avoid circular deps |
ruvector-domain-expansion |
Source of TransferPrior, PolicyKernel, CostCurve; federation exports these as RVF segments |
sona |
SONA learning engine; FederatedCoordinator handles intra-deployment aggregation, rvf-federation handles inter-user |
rvf-crypto |
Ed25519 signatures and SHAKE-256 hashing used for witness chains and segment integrity |
Testing
54 tests across all modules:
Benchmarks:
License
MIT OR Apache-2.0
Part of RuVector -- the self-learning vector database.