tor_dirserver/
mirror.rs

//! The Tor directory mirror implementation.
//!
//! # Specifications
//!
//! * [Directory cache operation](https://spec.torproject.org/dir-spec/directory-cache-operation.html).
//!
//! # Rationale
//!
//! The network documents specified in the directory specification form a
//! fundamental part of the Tor protocol, namely the creation and distribution
//! of a canonical list of all relays present in the Tor network.  This gives
//! all clients a unified view of the entire Tor network, which is very
//! important for defending against partitioning attacks and other potential
//! attacks in the domain of distributed networks.
//!
//! These network documents are generated, signed, and served by so-called
//! "directory authorities", a set of 10-ish highly trusted Tor relays more or
//! less governing the entirety of the Tor network.
//!
//! Now here comes the bottleneck: Tor has millions of active daily users but
//! only 10-ish relays responsible for these crucial documents.  Having all
//! clients download from those 10-ish relays would overload them immensely,
//! potentially shutting the entire Tor network down if the amount of traffic
//! to those relays became so high that they were unable to communicate and
//! coordinate among themselves.
//!
//! Fortunately, all network documents are either directly or indirectly signed
//! by well-known keys of directory authorities, which makes mirroring them
//! trivially possible, because authenticity can be established outside the raw
//! TLS connection thanks to cryptographic signatures.
//!
//! This is where directory mirrors come in handy.  Directory mirrors
//! (previously known as "directory caches") are ordinary relays that mirror all
//! network documents from the authorities, by implementing the respective routes
//! for all HTTP GET endpoints from the relays.
//!
//! The network documents are usually served through ordinary Tor circuits,
//! by accepting incoming connections through `RELAY_BEGIN_DIR` cells.
//! In the past, some relays additionally served them through a socket on the
//! ordinary Internet, bound to a dedicated `SocketAddr` known as the
//! "directory address".  Since about 2020, this is no longer done.  However,
//! the functionality continues to persist and this module is written to be
//! fairly agnostic about how it accepts such connections, as directory
//! authorities continue to advertise their directory address.

use std::{convert::Infallible, path::PathBuf};

use futures::Stream;
use tokio::io::{AsyncRead, AsyncWrite};
use tor_dircommon::{
    authority::AuthorityContacts,
    config::{DirTolerance, DownloadScheduleConfig},
};

mod operation;

/// Core data type of a directory mirror.
///
/// # External Notes
///
/// This structure serves as the entrance point to the [`mirror`](crate::mirror)
/// API.  It represents an instance that is launchable using [`DirMirror::serve`].
/// Calling this method consumes the instance, as this is the common behavior
/// for objects representing server-like things, in order to not imply that this
/// instance serves as a mere configuration template.
///
/// # Internal Notes
///
/// For now, this data structure only holds configuration options, as an ad-hoc
/// replacement for a hypothetical `DirMirrorConfig` structure that does not
/// exist yet.
///
/// I assume that in the future, regardless of the configuration, this might also
/// hold other fields such as access to the database pool, etc.  The question
/// is whether this structure will be passed around with locking mechanisms
/// or will just be used as a way to extract configuration options initially
/// in the consuming function, which then applies further wrapping or not.
#[derive(Debug)]
#[non_exhaustive]
pub struct DirMirror {
    /// The [`PathBuf`] where the [`database`](crate::database) is located.
    path: PathBuf,
    /// The [`AuthorityContacts`] data structure for contacting authorities.
    authorities: AuthorityContacts,
    /// The [`DownloadScheduleConfig`] used for properly retrying downloads.
    schedule: DownloadScheduleConfig,
    /// The [`DirTolerance`] to tolerate clock skews.
    tolerance: DirTolerance,
}

impl DirMirror {
    /// Creates a new [`DirMirror`] with a given set of configuration options.
    ///
    /// # Parameters
    ///
    /// * `path`: The [`PathBuf`] where the database is located.
    /// * `authorities`: The [`AuthorityContacts`] data structure for contacting authorities.
    /// * `schedule`: The [`DownloadScheduleConfig`] used for properly retrying downloads.
    /// * `tolerance`: The [`DirTolerance`] to tolerate clock skews.
    ///
    /// # Notes
    ///
    /// **Beware of [`DirTolerance::default()`]!**  Its default values are
    /// intended for clients, not directory mirrors.  Tolerances of several days
    /// are not recommended for directory mirrors.  Consider using something in
    /// the minute range instead, such as `60s`, which is what C Tor uses.[^1]
    ///
    /// TODO DIRMIRROR: This is unacceptable for the actual release.  We **NEED**
    /// a proper way to configure this, such as with a `DirMirrorConfig` struct
    /// that can properly deserialize from configuration files and such.  However,
    /// this task is not a trivial one and may be one of the hardest parts of this
    /// entire development, as it would involve a radical change to many
    /// higher-level crates.  The reason is that we need a clean way to share
    /// "global" settings, such as the list of authorities, with the various
    /// sub-configurations, such as the configuration for the directory mirror.
    /// We must not offer separate configurations of the list of authorities
    /// for those different components: that would result in lots of boilerplate
    /// and potentially wrong execution, given that those resources affect so
    /// many parts of the Tor protocol that a consistent view must be assumed
    /// in order to avoid surprising behavior.
    ///
    /// [^1]: <https://gitlab.torproject.org/tpo/core/tor/-/blob/0b20710/src/feature/nodelist/networkstatus.c#L1890>.
    pub fn new(
        path: PathBuf,
        authorities: AuthorityContacts,
        schedule: DownloadScheduleConfig,
        tolerance: DirTolerance,
    ) -> Self {
        Self {
            path,
            authorities,
            schedule,
            tolerance,
        }
    }

    /// Consumes the [`DirMirror`] by running endlessly in the current task.
    ///
    /// This method accepts a `listener`, which is a [`Stream`] yielding a
    /// [`Result`] in order to model a generic way of accepting incoming
    /// connections.  Think of `S` as the file descriptor you would call
    /// `accept(2)` upon if you were in C.  The idea behind this generic is,
    /// as outlined in the module documentation, that a [`DirMirror`] can
    /// handle incoming connections in multiple ways, such as by serving
    /// through an ordinary TCP socket or through a Tor circuit in combination
    /// with a `RELAY_BEGIN_DIR` cell.  How this is concretely done is outside
    /// the scope of this crate; instead we provide the primitives that make
    /// such flexibility possible.
    #[allow(clippy::unused_async)] // TODO
    pub async fn serve<S, T, E>(self, _listener: S) -> Result<(), Infallible>
    where
        S: Stream<Item = Result<T, E>> + Unpin,
        T: AsyncRead + AsyncWrite + Unpin + Send + 'static,
        E: std::error::Error,
    {
        todo!()
    }
}
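
The generic `listener` taken by `DirMirror::serve` can be illustrated with a synchronous, std-only analogue. The sketch below is not part of this crate and is not how `serve` is implemented (its body is still `todo!()`). It assumes an `Iterator` in place of the `Stream` and `Read + Write` in place of `AsyncRead + AsyncWrite`. The names `handle` and `serve_blocking`, as well as the placeholder response, are hypothetical and exist only for illustration.

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical per-connection handler (an assumption, not crate API):
/// reads one chunk of the request and answers with a fixed placeholder.
fn handle<T: Read + Write>(conn: &mut T) -> io::Result<()> {
    let mut buf = [0u8; 512];
    let _n = conn.read(&mut buf)?;
    conn.write_all(b"HTTP/1.0 404 Not Found\r\n\r\n")?;
    conn.flush()
}

/// Hypothetical blocking analogue of a generic accept loop.
///
/// `I` plays the role of `S` in `DirMirror::serve`: anything that yields
/// `Result<T, E>` items, where `T` is a bidirectional byte stream.
/// Returns the number of served and failed connections.
fn serve_blocking<I, T, E>(listener: I) -> (usize, usize)
where
    I: Iterator<Item = Result<T, E>>,
    T: Read + Write,
    E: std::error::Error,
{
    let mut served = 0;
    let mut failed = 0;
    for incoming in listener {
        match incoming {
            Ok(mut conn) => match handle(&mut conn) {
                Ok(()) => served += 1,
                Err(_) => failed += 1,
            },
            // A failed accept is counted but does not stop the loop.
            Err(_) => failed += 1,
        }
    }
    (served, failed)
}

fn main() {
    // In-memory "connections" stand in for TCP sockets or Tor streams.
    let listener: Vec<Result<Cursor<Vec<u8>>, io::Error>> = vec![
        Ok(Cursor::new(b"GET / HTTP/1.0\r\n\r\n".to_vec())),
        Err(io::Error::new(io::ErrorKind::Other, "accept failed")),
    ];
    let (served, failed) = serve_blocking(listener.into_iter());
    println!("served={served} failed={failed}");
}
```

Because the loop only relies on the trait bounds, the same function accepts `std::net::TcpListener::incoming()` or any other source of connections. This is the kind of flexibility the generic parameters on `serve` are meant to provide.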