# Saorsa RSPS
[](https://github.com/dirvine/saorsa-rsps-foundation/actions/workflows/rust.yml)
[](https://docs.rs/saorsa-rsps)
[](https://crates.io/crates/saorsa-rsps)
Root-Scoped Provider Summaries using Golomb Coded Sets (GCS) for efficient DHT lookups and cache management in P2P networks.
## What This Solves
In decentralized networks like IPFS, BitTorrent, and other DHT-based P2P systems, finding content is expensive. Traditional approaches require:
- **Broadcasting provider records** for every piece of content to the entire DHT network, consuming massive bandwidth
- **Flooding the network** with discovery requests when searching for related content
- **Storing individual provider records** for millions of Content IDs (CIDs), overwhelming DHT nodes with storage and lookup overhead
**Saorsa RSPS** solves this by introducing **hierarchical content organization** with **ultra-compact summaries**:
### The Problem: DHT Provider Record Explosion
When you store a large dataset (like a website, software repository, or media collection) in a P2P network, each file chunk gets its own CID. A typical website might have thousands of CIDs, a software repository tens of thousands. Publishing provider records for each CID individually to the DHT:
- Creates **millions of DHT messages** for large content
- **Overwhelms DHT nodes** with storage requirements
- Makes **content discovery slow** due to network-wide searches
- **Wastes bandwidth** with redundant provider advertisements
### The Solution: Root-Scoped Provider Summaries
Instead of advertising individual CIDs, RSPS lets you:
1. **Group related content** under a single "root CID" (like a directory, repository, or collection)
2. **Create a compact summary** using Golomb Coded Sets that represents thousands of CIDs in just a few KB
3. **Advertise only the summary** to the DHT, reducing messages by 1000x or more
4. **Enable fast batch discovery** - one lookup tells you if ANY of thousands of CIDs might be available
### Real-World Use Cases
- **Content Distribution**: Efficiently advertise that you host an entire website/app without flooding the DHT
- **Software Repositories**: Let peers discover if you have specific versions/packages without individual lookups
- **Media Collections**: Advertise entire albums, movie series, or dataset collections as single summaries
- **Version Control**: Organize git-like repositories with hierarchical content discovery
- **Caching Networks**: Smart cache admission - only cache content that's part of advertised collections
### Performance Benefits
- **20-30% more compact** than Bloom filters for the same false positive rate
- **1000x reduction** in DHT provider record messages for large content collections
- **Sub-millisecond membership testing** for thousands of CIDs
- **Minimal memory overhead** - entire summaries fit in L1 cache
- **Network-efficient serialization** - summaries transport in single UDP packets
## Features
- **Golomb Coded Sets**: Space-efficient Content ID (CID) summaries with configurable false positive rates
- **Root-anchored Cache**: Cache admission policies anchored to root CIDs for hierarchical data organization
- **TTL Management**: Sophisticated time-to-live management with hit tracking and witness receipts
- **VRF Pseudonyms**: Verifiable Random Function pseudonyms for witness receipt systems
- **Async/Await Support**: Full async/await support with Tokio
- **High Performance**: Optimized for P2P DHT operations with minimal memory overhead
## Quick Start
Add this to your `Cargo.toml`:
```toml
[dependencies]
saorsa-rsps = "0.1.0"
```
## Example Usage
```rust
use saorsa_rsps::{Rsps, RspsConfig, Cid};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let root_cid = [1u8; 32];
let cids = vec![
[2u8; 32],
[3u8; 32],
[4u8; 32],
];
let config = RspsConfig::default();
// Create RSPS for a root with associated CIDs
let rsps = Rsps::new(root_cid, 1, &cids, &config)?;
// Check if a CID might be under this root
let test_cid = [2u8; 32];
if rsps.contains(&test_cid) {
println!("CID might be under this root");
}
// Get digest for DHT advertisement
let digest = rsps.digest();
println!("RSPS digest: {:?}", digest);
Ok(())
}
```
## Components
### Golomb Coded Sets (GCS)
Efficient probabilistic data structure for representing sets with configurable false positive rates. Optimized for P2P networks where bandwidth and storage efficiency are critical.
### Cache Management
Root-anchored cache policies that organize data hierarchically under root CIDs, with sophisticated TTL management based on hit frequency and witness receipts.
### TTL Engine
Advanced time-to-live management with:
- Base TTL for new entries
- TTL extension per cache hit
- TTL extension per witness receipt
- Temporal bucketing for receipt aggregation
### Witness System
VRF-based witness receipts for distributed verification and reputation systems in P2P networks.
## Architecture
RSPS (Root-Scoped Provider Summaries) organize content hierarchically under root CIDs, enabling efficient DHT lookups for complex data structures. Each RSPS contains:
- **Root CID**: The anchor point for hierarchical organization
- **Epoch**: Version/time identifier for cache invalidation
- **GCS**: Space-efficient summary of CIDs under this root
- **Salt**: Deterministic salt for GCS construction
- **Metadata**: Creation timestamp and configuration
## Using RSPS in a Decentralized Network
This crate is designed for use in DHT-based or gossip-based networks where providers advertise summaries of content clustered under a root CID. A typical flow:
1. **Producer builds RSPS**
- Select a `root_cid` and gather the set of child CIDs.
- Build an `Rsps` with an epoch representing the dataset version.
- Serialize or extract the `digest()` for lightweight advertisement in the DHT.
2. **Advertisement**
- Publish either the RSPS bytes or its `digest` keyed by `root_cid`+`epoch` in your routing layer.
- Peers cache the announcement, possibly with TTL heuristics matching their local policy.
3. **Discovery**
- A client looking for a `cid` first fetches the RSPS for the relevant `root_cid`/`epoch`.
- Use `rsps.contains(&cid)` to check probabilistic membership.
- On positive result, proceed to fetch the content from providers under that root.
4. **Cache integration**
- Register an `Rsps` with `RootAnchoredCache` to gate cache admission: only items in the RSPS are eligible.
- Use the `TtlEngine` to manage lifetimes based on hits and witness receipts.
5. **Epoch rotation**
- When the dataset changes, increment `epoch` and republish a new RSPS.
- Consumers prefer the highest known epoch and drop stale ones per policy.
### False-positive tuning
- `RspsConfig.target_fpr` sets the target false positive rate. This crate uses Golomb–Rice coding internally, selecting a `p = 2^k` consistent with the requested FPR.
- Trade-offs:
- Lower FPR → larger RSPS, more CPU to encode/decode, less cache pollution.
- Higher FPR → smaller RSPS, faster, but occasional extra fetches.
### Why Golomb–Rice coding?
Golomb coding represents non-negative integers as a quotient (unary-coded) and a remainder (binary-coded) with respect to a parameter `p`. When `p` is a power of two (`p = 2^k`), the scheme is called Golomb–Rice coding. We choose Rice coding for RSPS because:
- Simpler and faster decode: the remainder is exactly `k = log2(p)` bits; no truncated-binary logic is needed, which keeps parsing branch-light and cache-friendly.
- Deterministic footprint vs. FPR: we pick `k = ceil(log2(1 / target_fpr))` so the parameter ties directly to the requested false-positive rate.
- Good practical compression for hashed, near-uniform deltas: the sorted hashed values modulo `n * p` produce geometric-like deltas that Rice handles efficiently.
In this crate, we enforce Rice coding by deriving `p` as a power of two and validating `p` during decode. This provides predictable encode/decode behavior and avoids edge cases that arise with general Golomb parameters.
### Serialization and transport
- `GolombCodedSet::to_bytes`/`from_bytes` serialize the GCS; an RSPS can be reconstructed from its components across nodes.
- For transport, include: `root_cid`, `epoch`, `salt`, and `gcs.to_bytes()`.
### Witness receipts
- Nodes can issue witness receipts for successful retrievals under a root to extend TTLs and inform reputation systems.
- Uses production-ready ed25519-dalek v2 for signatures and RFC 9381 ECVRF on ristretto255 for VRF pseudonyms.
- Implements domain separation to prevent cross-protocol attacks.
## Security Notes
### Cryptographic Implementation
- **Ed25519 signatures**: Uses `ed25519-dalek` v2.x, a battle-tested, misuse-resistant implementation
- **VRF pseudonyms**: RFC 9381 compliant ECVRF on ristretto255 via `vrf-r255` crate
- **Domain separation**: All cryptographic operations use distinct domain prefixes:
- VRF inputs: `b"saorsa-rsps:vrf:v1:"`
- Witness signatures: `b"saorsa-rsps:witness:v1:"`
- **Key hygiene**: Secret keys are automatically zeroized on drop
- **Separate key domains**: Ed25519 and VRF keys are kept completely separate
### Security Properties
- **No panics**: All cryptographic operations return `Result` types with proper error handling
- **Strict validation**: Input validation for all key sizes and proof lengths
- **Memory safety**: Built with Rust 2024 edition, `#![forbid(unsafe_code)]` in crypto module
- **Audit trail**: Uses well-audited cryptographic libraries with extensive test coverage
### Implementation Notes
- GCS uses Golomb–Rice coding (power-of-two parameter) to ensure decode correctness and performance
- VRF keys are separate from Ed25519 signature keys (different ciphersuites)
- Domain separation prevents attacks where signatures/proofs from one context are replayed in another
- All cryptographic operations are deterministic for the same inputs
## Performance
Saorsa RSPS is optimized for:
- **Memory Efficiency**: GCS provides space-efficient CID summaries
- **Network Efficiency**: Minimal bandwidth for DHT advertisements
- **Lookup Speed**: Fast probabilistic membership testing
- **Cache Effectiveness**: Smart TTL management with hit tracking
## Safety and Security
- Built with Rust 2024 edition for memory safety
- No `unwrap()` or `expect()` in production code
- Comprehensive error handling with `thiserror`
- Security audit workflow with `cargo-audit`
- Cryptographically secure random number generation
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the AGPL-3.0 license. See [LICENSE](LICENSE) for details.
## Related Projects
- [saorsa-fec](https://github.com/dirvine/saorsa-foundation) - Patent-free erasure coding
- [saorsa-core](https://github.com/maidsafe/p2p) - P2P networking foundation