p2panda_discovery/
lib.rs

1// SPDX-License-Identifier: MIT OR Apache-2.0
2
3//! Traits and implementation of p2panda's confidential discovery protocol.
4//!
5//! ## Motivation
6//!
7//! Discovery can be used to find nodes which share a common interest in a topic. During this
8//! process, transport information is exchanged in order to aid in the establishment of direct
9//! peer-to-peer connections. A topic in p2panda is a secret, randomly-generated hash that plays a
10//! similar role to a shared symmetric key. Topics usually represent identifiers or namespaces for
11//! data and documents associated with a specific group of people (for example a text document,
12//! chat group or image folder). For this reason, a topic should never be leaked to people outside
13//! of the intended group, whether accidentally or purposefully.
14//!
15//! Our discovery protocol implementation is designed to ensure that topics are never leaked to
16//! unintended actors. Nodes will only ever exchange data when both parties have proven their
17//! knowledege of the same topic. This mutual acknowledgement is achieved using a secure multiparty
18//! cryptographic technique known as Private Equality Testing (PET) or Private Set Intersection
19//! (PSI) which prevents unrelated topics being leaked to other parties. Example:
20//!
21//! ```text
22//! Alice's topics: T1, T2, T3, T4
23//! Bob's topics: T1, T4, T5
24//!
25//! Result after confidential protocol exchange:
26//! - Alice will learn from Bob that they are interested in T1 and T4
27//! - Bob will learn from Alice that they are interested in T1 and T4
28//! ```
29//!
30//! Having a confidential discovery protocol allows us to build systems where we confidentially
31//! sync / exchange data with nodes who have proven that they are aware of this topic. Usually
32//! privacy is further improved by keeping an allow list to limit the set of nodes we are
33//! establishing direct connections with. This limits with whom we are even exchanging over the
34//! discovery protocol. This is handled on the networking layer though (see allow lists in
35//! `p2panda-net`) and allows building more private networks of many, explicitly trusted nodes
36//! which within can confidentially exchange topics. An example would be a family, team or
37//! collective having multiple chat groups - and still nobody should ever learn about what other
38//! chat groups exists.
39//!
40//! To exchange data confidentially we require a direct, authenticated E2EE connection with the
41//! remote node and can't move data into DHTs or other globally distributed data types or services.
42//! Since every node itself can also serve as an source of new node information we achieve a fully
43//! decentralised discovery strategy, making rendezvous nodes or DNS-like lookups redundant.
44//!
45//! ## Protocol Design
46//!
47//! Our discovery consists of three systems: A discovery "peer sampling strategy" and the
48//! "protocol" itself which is the actual exchange of messages between two nodes and lastly an
49//! "address book" which is strictly speaking more of a storage backend to persist and query known
50//! node information and associated topics - but important for discovery.
51//!
52//! For our peer-sampling strategy we're using a basic Random Walk approach: Any random bootstrap
53//! node from the address book is selected to initiate depth-first network traversal with frequent
54//! resets to allow exploring the network more "broadly" and prevent getting stuck in cycles. The
55//! protocol itself is a a simple exchange of salted and hashed topic identifiers, allowing us to
56//! confidentially exchange topics beetween two nodes in linear time (growing with the number of
57//! topics).
58//!
59//! Since every topic is a cryptographically secure, randomly generated value, an attacker can not
60//! easily learn the underlying value when trying to break the hashing function, even when the
61//! hashing function in itself was not designed to prevent such attacks.
62//!
63//! Bootstrap nodes are a locally configurable set of nodes which are picked first when the
64//! discovery process starts.
65//!
66//! It's also worth mentioning that the random walk can be parallelized in applications (run
67//! multiple "walkers" at the same time) and that it is an "ambient" process, constantly running in
68//! the background, informing other parts of the stack about confidentially exchanged sets of
69//! topics and nodes.
70//!
71//! ## Inspiration
72//!
73//! To use Private Equality Testing in this "category" of peer-to-peer protocols was (to our
74//! knowledge) first suggested by [Willow](https://willowprotocol.org/) and we believe that
75//! this is the only way towards ["more fine"](https://newdesigncongress.org/en/pub/this-is-fine/)
76//! peer-to-peer systems.
77//!
78//! We've heard first about random walk algorithms used as a discovery technique in peer-to-peer
79//! systems from the
80//! [IPv8](https://py-ipv8.readthedocs.io/en/latest/further-reading/advanced_peer_discovery.html)
81//! project.
82// TODO: Move address book into `p2panda-store` when crate is ready.
83pub mod address_book;
84pub mod psi_hash;
85#[cfg(feature = "random_walk")]
86pub mod random_walk;
87#[cfg(any(test, feature = "test_utils"))]
88pub mod test_utils;
89#[cfg(test)]
90pub mod tests;
91pub mod traits;
92
93pub use traits::{DiscoveryProtocol, DiscoveryResult, DiscoveryStrategy};