tor_dirserver/mirror.rs
//! The Tor directory mirror implementation.
//!
//! # Specifications
//!
//! * [Directory cache operation](https://spec.torproject.org/dir-spec/directory-cache-operation.html).
//!
//! # Rationale
//!
//! The network documents specified in the directory specification form a
//! fundamental part of the Tor protocol, namely the creation and distribution
//! of a canonical list of all relays present in the Tor network. This gives
//! all clients a unified view of the entire Tor network, which is essential
//! for defending against partitioning attacks and other attacks common to
//! distributed networks.
//!
//! These network documents are generated, signed, and served by so-called
//! "directory authorities", a set of 10-ish highly trusted Tor relays that
//! more or less govern the entire Tor network.
//!
//! Here lies the bottleneck: Tor has millions of daily active users but only
//! 10-ish relays responsible for these crucial documents. Having all clients
//! download from those relays would overload them immensely, potentially
//! shutting down the entire Tor network if the traffic becomes so high that
//! the authorities are unable to communicate and coordinate among themselves.
//!
//! Fortunately, all network documents are signed, either directly or
//! indirectly, by well-known keys of the directory authorities. This makes
//! mirroring them trivially possible, because the cryptographic signatures
//! establish authenticity independently of the TLS connection over which a
//! document was received.
//!
//! This is where directory mirrors come in handy. Directory mirrors
//! (previously known as "directory caches") are ordinary relays that mirror
//! all network documents from the authorities and serve them by implementing
//! the respective HTTP GET endpoints.
//!
//! The network documents are usually served through ordinary Tor circuits,
//! by accepting incoming connections through `RELAY_BEGIN_DIR` cells.
//! In the past, some relays additionally opened an optional socket on the
//! ordinary Internet at a dedicated `SocketAddr`, known as the "directory
//! address". Since about 2020, relays no longer do this. However, the
//! functionality persists, as directory authorities continue to advertise
//! their directory address, so this module is written to be fairly agnostic
//! about how it accepts such connections.

use std::{convert::Infallible, path::PathBuf};

use futures::Stream;
use tokio::io::{AsyncRead, AsyncWrite};
use tor_dircommon::{
    authority::AuthorityContacts,
    config::{DirTolerance, DownloadScheduleConfig},
};

mod operation;

/// Core data type of a directory mirror.
///
/// # External Notes
///
/// This structure serves as the entry point to the [`mirror`](crate::mirror)
/// API. It represents an instance that can be launched using [`DirMirror::serve`].
/// Calling this method consumes the instance, as is common for objects
/// representing server-like things, so as not to imply that the instance is
/// a mere configuration template.
///
/// # Internal Notes
///
/// For now, this data structure only holds configuration options, as an ad-hoc
/// replacement for a hypothetical, not yet existing `DirMirrorConfig` structure.
///
/// I assume that in the future, regardless of the configuration, this might also
/// hold other fields, such as access to the database pool. The open question
/// is whether this structure will be passed around with locking mechanisms,
/// or will only be used to extract configuration options initially in the
/// consuming function, which may then apply further wrapping.
#[derive(Debug)]
#[non_exhaustive]
pub struct DirMirror {
    /// The [`PathBuf`] where the [`database`](crate::database) is located.
    path: PathBuf,
    /// The [`AuthorityContacts`] data structure for contacting authorities.
    authorities: AuthorityContacts,
    /// The [`DownloadScheduleConfig`] used for properly retrying downloads.
    schedule: DownloadScheduleConfig,
    /// The [`DirTolerance`] to tolerate clock skews.
    tolerance: DirTolerance,
}

impl DirMirror {
    /// Creates a new [`DirMirror`] with a given set of configuration options.
    ///
    /// # Parameters
    ///
    /// * `path`: The [`PathBuf`] where the database is located.
    /// * `authorities`: The [`AuthorityContacts`] data structure for contacting authorities.
    /// * `schedule`: The [`DownloadScheduleConfig`] used for properly retrying downloads.
    /// * `tolerance`: The [`DirTolerance`] to tolerate clock skews.
    ///
    /// # Notes
    ///
    /// **Beware of [`DirTolerance::default()`]!** Its default values are
    /// intended for clients, not directory mirrors. Tolerances of several days
    /// are not recommended for directory mirrors. Consider using something in
    /// the minute range instead, such as `60s`, which is what ctor uses.[^1]
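    ///
    /// As an illustration of why this matters, the following sketch checks
    /// whether a document with a given validity interval would still be
    /// accepted under a given tolerance. It does not use [`DirTolerance`]
    /// itself: `Tolerance` and `is_acceptable` are hypothetical names, and the
    /// window `[valid_after - pre_valid, valid_until + post_valid]` is an
    /// assumed model of how a tolerance widens the acceptance window. With a
    /// post-validity tolerance of three days, a document that expired two days
    /// ago is still accepted; with `60s`, it is not.
    ///
    /// ```rust
    /// use std::time::{Duration, SystemTime};
    ///
    /// // Hypothetical stand-in for a clock skew tolerance (not `DirTolerance`).
    /// struct Tolerance {
    ///     // How long before `valid_after` a document is already accepted.
    ///     pre_valid: Duration,
    ///     // How long after `valid_until` a document is still accepted.
    ///     post_valid: Duration,
    /// }
    ///
    /// // Returns whether `now` falls into the validity interval widened by `tol`.
    /// fn is_acceptable(
    ///     now: SystemTime,
    ///     valid_after: SystemTime,
    ///     valid_until: SystemTime,
    ///     tol: &Tolerance,
    /// ) -> bool {
    ///     now + tol.pre_valid >= valid_after && now <= valid_until + tol.post_valid
    /// }
    /// ```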
    ///
    /// TODO DIRMIRROR: This is unacceptable for the actual release. We **NEED**
    /// a proper way to configure this, such as a `DirMirrorConfig` struct
    /// that can be deserialized from configuration files. However, this task
    /// is not a trivial one and may be one of the hardest parts of this
    /// entire development, as it would involve a radical change to many
    /// higher-level crates. The reason is that we need a clean way to share
    /// "global" settings, such as the list of authorities, with various
    /// sub-configurations, such as the configuration for the directory mirror.
    /// We must not offer separate configurations of the list of authorities
    /// for those different components: that would result in lots of
    /// boilerplate and potentially incorrect behavior, because these settings
    /// affect so many parts of the Tor protocol that a consistent view must be
    /// assumed in order to avoid surprising behavior.
    ///
    /// [^1]: <https://gitlab.torproject.org/tpo/core/tor/-/blob/0b20710/src/feature/nodelist/networkstatus.c#L1890>.
    pub fn new(
        path: PathBuf,
        authorities: AuthorityContacts,
        schedule: DownloadScheduleConfig,
        tolerance: DirTolerance,
    ) -> Self {
        Self {
            path,
            authorities,
            schedule,
            tolerance,
        }
    }

    /// Consumes the [`DirMirror`] by running endlessly in the current task.
    ///
    /// This method accepts a `listener`, which is a [`Stream`] yielding
    /// [`Result`]s, as a generic way of accepting incoming connections. Think
    /// of `S` as the file descriptor you would call `accept(2)` on in C: each
    /// `Ok(T)` item represents an accepted connection. As outlined in the
    /// module documentation, the idea behind this generic is that a
    /// [`DirMirror`] can handle incoming connections in multiple ways, such as
    /// through an ordinary TCP socket or through a Tor circuit in combination
    /// with a `RELAY_BEGIN_DIR` cell. How this is concretely done is outside
    /// the scope of this crate; instead, we provide the primitives that make
    /// such flexibility possible.
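    ///
    /// # Examples
    ///
    /// The following is a minimal, synchronous sketch of the same idea that
    /// does not use this crate: `serve_blocking` and `MemConn` are
    /// hypothetical names made up for illustration, and the toy request
    /// handling only answers `200 OK` for paths under `/tor/` and
    /// `404 Not Found` otherwise. The blocking counterpart of the `listener`
    /// stream is an iterator such as the one returned by
    /// [`std::net::TcpListener::incoming`], which yields
    /// `std::io::Result<TcpStream>`. Because the accept loop is generic over
    /// the connection type, it can be driven by real sockets and by in-memory
    /// connections alike.
    ///
    /// ```rust
    /// use std::io::{self, BufRead, BufReader, Read, Write};
    ///
    /// // Hypothetical in-memory connection, for illustration only.
    /// struct MemConn {
    ///     input: io::Cursor<Vec<u8>>,
    ///     output: Vec<u8>,
    /// }
    ///
    /// impl Read for MemConn {
    ///     fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
    ///         self.input.read(buf)
    ///     }
    /// }
    ///
    /// impl Write for MemConn {
    ///     fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
    ///         self.output.write(buf)
    ///     }
    ///
    ///     fn flush(&mut self) -> io::Result<()> {
    ///         Ok(())
    ///     }
    /// }
    ///
    /// // Hypothetical blocking accept loop; returns how many connections got a reply.
    /// fn serve_blocking<L, T, E>(listener: L) -> usize
    /// where
    ///     L: IntoIterator<Item = Result<T, E>>,
    ///     T: Read + Write,
    ///     E: std::error::Error,
    /// {
    ///     let mut served = 0;
    ///     for conn in listener {
    ///         // A failed accept only affects that one connection, so skip it.
    ///         let Ok(mut conn) = conn else { continue };
    ///         // Read the request line, e.g. "GET /tor/... HTTP/1.0".
    ///         let mut line = String::new();
    ///         if BufReader::new(&mut conn).read_line(&mut line).is_err() {
    ///             continue;
    ///         }
    ///         let path = line.split_whitespace().nth(1).unwrap_or("");
    ///         // Toy routing for illustration; a real mirror would serve documents here.
    ///         let status = if path.starts_with("/tor/") {
    ///             "200 OK"
    ///         } else {
    ///             "404 Not Found"
    ///         };
    ///         if write!(conn, "HTTP/1.0 {status}\r\n\r\n").is_ok() {
    ///             served += 1;
    ///         }
    ///     }
    ///     served
    /// }
    /// ```
    ///
    /// Driving `serve_blocking` with `listener.incoming()` for a bound
    /// [`std::net::TcpListener`] instead would run the same loop over real TCP
    /// connections; that iterator never ends, much like [`DirMirror::serve`].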
    #[allow(clippy::unused_async)] // TODO
    pub async fn serve<S, T, E>(self, _listener: S) -> Result<(), Infallible>
    where
        S: Stream<Item = Result<T, E>> + Unpin,
        T: AsyncRead + AsyncWrite + Unpin + Send + 'static,
        E: std::error::Error,
    {
        todo!()
    }
}