#![deny(missing_docs)]
#![deny(unsafe_code)]
#![deny(warnings)]
//! Initialize holochain metrics.
//! This crate should only be used in binaries to initialize the actual
//! metrics collection. Libraries should just use the `opentelemetry_api` crate
//! to report metrics, which take effect only if a collector has been initialized.
//!
//! ## Environment Variables
//!
//! When calling `HolochainMetricsConfig::new(&path).init()`, the actual
//! metrics instance that will be created is largely controlled by
//! the existence of environment variables.
//!
//! Currently, by default, the Null metrics collector will be used, meaning
//! metrics will not be collected, and all metrics operations will be no-ops.
//!
//! If you wish to enable metrics, the current options are:
//!
//! - InfluxDB as a zero-config child-process.
//!   - Enable via environment variable: `HOLOCHAIN_INFLUXIVE_CHILD_SVC=1`
//!   - The binaries `influxd` and `influx` will be downloaded and verified
//!     before automatically being run as a child process, and set up
//!     to be reported to. The InfluxDB UI will be available on a randomly
//!     assigned port (currently only reported in the trace logging).
//! - InfluxDB as a pre-existing system process.
//!   - Enable via environment variable: `HOLOCHAIN_INFLUXIVE_EXTERNAL=1`
//!   - Configure via environment variables:
//!     - `HOLOCHAIN_INFLUXIVE_EXTERNAL_HOST=[my influxdb url]` where a default InfluxDB install will need `http://localhost:8086` and otherwise can be found by running `influx config` in a terminal.
//!     - `HOLOCHAIN_INFLUXIVE_EXTERNAL_BUCKET=[my influxdb bucket name]` but it's simplest to use `influxive` if you plan to import the provided dashboards.
//!     - `HOLOCHAIN_INFLUXIVE_EXTERNAL_TOKEN=[my influxdb auth token]`
//!   - Metrics will be set up to report to this already running InfluxDB.
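//!
//! As a minimal sketch of how a binary might wire this up (assuming the crate
//! is imported as `holochain_metrics` with the `influxive` feature enabled,
//! that a tokio runtime is available, and using a hypothetical data
//! directory): the variables above are read when `new` is called, so they
//! need to be present in the process environment before that point.
//!
//! ```ignore
//! // Hypothetical binary entry point; the tokio runtime and the path
//! // below are assumptions for illustration, not requirements of this crate.
//! #[tokio::main]
//! async fn main() {
//!     // e.g. launched as: HOLOCHAIN_INFLUXIVE_CHILD_SVC=1 ./my-binary
//!     let root = std::path::PathBuf::from("/tmp/my-holochain-data");
//!
//!     // `new` inspects the environment; `init` is async and either sets
//!     // the global meter provider or logs that metrics are not running.
//!     holochain_metrics::HolochainMetricsConfig::new(&root)
//!         .init()
//!         .await;
//! }
//! ```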
//!
//! ## Metric Naming Conventions
//!
//! We will largely attempt to follow the guidelines for metric naming
//! enumerated at
//! [https://opentelemetry.io/docs/specs/otel/metrics/semantic_conventions/](https://opentelemetry.io/docs/specs/otel/metrics/semantic_conventions/),
//! with additional rules made to fit with our particular project.
//! We will also attempt to keep this documentation up-to-date on a best-effort
//! basis to act as an example and registry of metrics available in Holochain
//! and related support dependency crates managed by the organization.
//!
//! Generic naming convention rules:
//!
//! - Dot notation logical module hierarchy. This need not, and perhaps should
//!   not, match the Rust crate/module hierarchy, as we may rearrange crates
//!   and modules while the metric names themselves should remain
//!   consistent.
//!   - Examples:
//!     - `hc.db`
//!     - `hc.workflow.integration`
//!     - `kitsune.gossip`
//!     - `tx5.signal`
//! - A dot notation metric name or context should follow the logical module
//!   name. The thing that can be charted should be the actual metric. Related
//!   context that may want to be filtered for the chart should be attributes.
//!   For example, a "request" may have two separate metrics, "duration", and
//!   "byte.count", which both may have the filtering attribute "remote_id".
//!   - Examples:
//!     - ```
//!         use opentelemetry_api::{Context, KeyValue, metrics::Unit};
//!         let req_dur = opentelemetry_api::global::meter("tx5")
//!             .f64_histogram("tx5.signal.request.duration")
//!             .with_description("tx5 signal server request duration")
//!             .with_unit(Unit::new("s"))
//!             .init();
//!         req_dur.record(&Context::new(), 0.42, &[
//!             KeyValue::new("remote_id", "abcd"),
//!         ]);
//!       ```
//!     - ```
//!         use opentelemetry_api::{Context, KeyValue, metrics::Unit};
//!         let req_size = opentelemetry_api::global::meter("tx5")
//!             .u64_histogram("tx5.signal.request.byte.count")
//!             .with_description("tx5 signal server request byte count")
//!             .with_unit(Unit::new("By"))
//!             .init();
//!         req_size.record(&Context::new(), 42, &[
//!             KeyValue::new("remote_id", "abcd"),
//!         ]);
//!       ```
//!
//! ## Metric Name Registry
//!
//! | Full Metric Name | Type | Unit (optional) | Description | Attributes |
//! | ---------------- | ---- | --------------- | ----------- | ---------- |
//! | `kitsune.peer.send.duration` | `f64_histogram` | `s` | When kitsune sends data to a remote peer. |- `remote_id`: the base64 remote peer id.<br />- `is_error`: if the send failed. |
//! | `kitsune.peer.send.byte.count` | `u64_histogram` | `By` | When kitsune sends data to a remote peer. |- `remote_id`: the base64 remote peer id.<br />- `is_error`: if the send failed. |
//! | `tx5.conn.ice.send` | `u64_observable_counter` | `By` | Bytes sent on ice channel. |- `remote_id`: the base64 remote peer id.<br />- `state_uniq`: endpoint identifier.<br />- `conn_uniq`: connection identifier. |
//! | `tx5.conn.ice.recv` | `u64_observable_counter` | `By` | Bytes received on ice channel. |- `remote_id`: the base64 remote peer id.<br />- `state_uniq`: endpoint identifier.<br />- `conn_uniq`: connection identifier. |
//! | `tx5.conn.data.send` | `u64_observable_counter` | `By` | Bytes sent on data channel. |- `remote_id`: the base64 remote peer id.<br />- `state_uniq`: endpoint identifier.<br />- `conn_uniq`: connection identifier. |
//! | `tx5.conn.data.recv` | `u64_observable_counter` | `By` | Bytes received on data channel. |- `remote_id`: the base64 remote peer id.<br />- `state_uniq`: endpoint identifier.<br />- `conn_uniq`: connection identifier. |
//! | `tx5.conn.data.send.message.count` | `u64_observable_counter` | | Message count sent on data channel. |- `remote_id`: the base64 remote peer id.<br />- `state_uniq`: endpoint identifier.<br />- `conn_uniq`: connection identifier. |
//! | `tx5.conn.data.recv.message.count` | `u64_observable_counter` | | Message count received on data channel. |- `remote_id`: the base64 remote peer id.<br />- `state_uniq`: endpoint identifier.<br />- `conn_uniq`: connection identifier. |
//! | `hc.conductor.p2p_event.duration`  | `f64_histogram` | `s` | The time spent processing a p2p event. |- `dna_hash`: The DNA hash that this event is being sent on behalf of. |
//! | `hc.conductor.post_commit.duration` | `f64_histogram` | `s` | The time spent executing a post commit. |- `dna_hash`: The DNA hash that this post commit is running for.<br />- `agent`: The agent running the post commit. |
//! | `hc.conductor.workflow.duration` | `f64_histogram` | `s` | The time spent running a workflow. |- `workflow`: The name of the workflow.<br />- `dna_hash`: The DNA hash that this workflow is running for.<br />- `agent`: (optional) The agent that this workflow is running for if the workflow is cell bound. |
//! | `hc.cascade.duration` | `f64_histogram` | `s` | The time taken to execute a cascade query. | |
//! | `hc.db.pool.utilization` | `f64_gauge` | | The utilisation of connections in the pool. |- `kind`: The kind of database such as Conductor, Wasm or Dht etc.<br />- `id`: The unique identifier for this database if multiple instances can exist, such as a Dht database. |
//! | `hc.db.connections.use_time` | `f64_histogram` | `s` | The time between borrowing a connection and returning it to the pool. |- `kind`: The kind of database such as Conductor, Wasm or Dht etc.<br />- `id`: The unique identifier for this database if multiple instances can exist, such as a Dht database. |

#[cfg(feature = "influxive")]
const DASH_NETWORK_STATS: &[u8] = include_bytes!("dashboards/networkstats.json");
#[cfg(feature = "influxive")]
const DASH_TX5: &[u8] = include_bytes!("dashboards/tx5.json");
#[cfg(feature = "influxive")]
const DASH_DATABASE: &[u8] = include_bytes!("dashboards/database.json");
#[cfg(feature = "influxive")]
const DASH_CONDUCTOR: &[u8] = include_bytes!("dashboards/conductor.json");

/// Configuration for holochain metrics.
pub enum HolochainMetricsConfig {
    /// Metrics are disabled.
    Disabled,

    #[cfg(feature = "influxive")]
    /// Use influxive to connect to an already running InfluxDB instance.
    /// NOTE: this means we cannot initialize any dashboards.
    InfluxiveExternal {
        /// The writer config for connecting to the external influxdb instance.
        writer_config: influxive::InfluxiveWriterConfig,

        /// The meter provider config for setting up opentelemetry.
        otel_config: influxive::InfluxiveMeterProviderConfig,

        /// The url for the external influxdb instance.
        host: String,

        /// The bucket to write to in this external influxdb instance.
        bucket: String,

        /// The authentication token to use for writing to this external
        /// influxdb instance.
        token: String,
    },

    #[cfg(feature = "influxive")]
    /// Use influxive as a child service to write metrics.
    InfluxiveChildSvc {
        /// The child service config for running the influxd server.
        child_svc_config: Box<influxive::InfluxiveChildSvcConfig>,

        /// The meter provider config for setting up opentelemetry.
        otel_config: influxive::InfluxiveMeterProviderConfig,
    },
}

const E_CHILD_SVC: &str = "HOLOCHAIN_INFLUXIVE_CHILD_SVC";

const E_EXTERNAL: &str = "HOLOCHAIN_INFLUXIVE_EXTERNAL";
const E_EXTERNAL_HOST: &str = "HOLOCHAIN_INFLUXIVE_EXTERNAL_HOST";
const E_EXTERNAL_BUCKET: &str = "HOLOCHAIN_INFLUXIVE_EXTERNAL_BUCKET";
const E_EXTERNAL_TOKEN: &str = "HOLOCHAIN_INFLUXIVE_EXTERNAL_TOKEN";

impl HolochainMetricsConfig {
    /// Initialize a new default metrics config.
    ///
    /// The output of this function is largely controlled by environment
    /// variables, please see the [crate-level documentation](crate) for usage.
    pub fn new(root_path: &std::path::Path) -> Self {
        #[cfg(feature = "influxive")]
        {
            if std::env::var_os(E_CHILD_SVC).is_some() {
                let mut database_path = std::path::PathBuf::from(root_path);
                database_path.push("influxive");
                return Self::InfluxiveChildSvc {
                    child_svc_config: Box::new(
                        influxive::InfluxiveChildSvcConfig::default()
                            .with_database_path(Some(database_path)),
                    ),
                    otel_config: influxive::InfluxiveMeterProviderConfig::default(),
                };
            }

            if std::env::var_os(E_EXTERNAL).is_some() {
                let host = match std::env::var(E_EXTERNAL_HOST) {
                    Ok(host) => host,
                    Err(err) => {
                        tracing::error!(env = %E_EXTERNAL_HOST, ?err, "invalid");
                        return Self::Disabled;
                    }
                };
                let bucket = match std::env::var(E_EXTERNAL_BUCKET) {
                    Ok(bucket) => bucket,
                    Err(err) => {
                        tracing::error!(env = %E_EXTERNAL_BUCKET, ?err, "invalid");
                        return Self::Disabled;
                    }
                };
                let token = match std::env::var(E_EXTERNAL_TOKEN) {
                    Ok(token) => token,
                    Err(err) => {
                        tracing::error!(env = %E_EXTERNAL_TOKEN, ?err, "invalid");
                        return Self::Disabled;
                    }
                };
                return Self::InfluxiveExternal {
                    writer_config: influxive::InfluxiveWriterConfig::default(),
                    otel_config: influxive::InfluxiveMeterProviderConfig::default(),
                    host,
                    bucket,
                    token,
                };
            }
        }

        #[cfg(not(feature = "influxive"))]
        {
            let _root_path = root_path;
        }

        Self::Disabled
    }

    /// Initialize holochain metrics based on this configuration.
    pub async fn init(self) {
        match self {
            Self::Disabled => {
                tracing::info!("Running without metrics");
            }
            #[cfg(feature = "influxive")]
            Self::InfluxiveExternal {
                writer_config,
                otel_config,
                host,
                bucket,
                token,
            } => {
                Self::init_influxive_external(writer_config, otel_config, host, bucket, token);
            }
            #[cfg(feature = "influxive")]
            Self::InfluxiveChildSvc {
                child_svc_config,
                otel_config,
            } => {
                Self::init_influxive_child_svc(*child_svc_config, otel_config).await;
            }
        }
    }

    #[cfg(feature = "influxive")]
    fn init_influxive_external(
        writer_config: influxive::InfluxiveWriterConfig,
        otel_config: influxive::InfluxiveMeterProviderConfig,
        host: String,
        bucket: String,
        token: String,
    ) {
        tracing::info!(?writer_config, %host, %bucket, "initializing holochain_metrics");

        let meter_provider = influxive::influxive_external_meter_provider_token_auth(
            writer_config,
            otel_config,
            host,
            bucket,
            token,
        );

        // set up opentelemetry to use our metrics collector
        opentelemetry_api::global::set_meter_provider(meter_provider);
    }

    #[cfg(feature = "influxive")]
    async fn init_influxive_child_svc(
        child_svc_config: influxive::InfluxiveChildSvcConfig,
        otel_config: influxive::InfluxiveMeterProviderConfig,
    ) {
        tracing::info!(?child_svc_config, "initializing holochain_metrics");

        match influxive::influxive_child_process_meter_provider(child_svc_config, otel_config).await
        {
            Ok((influxive, meter_provider)) => {
                // apply templates
                if let Ok(cur) = influxive.list_dashboards().await {
                    // only initialize dashboards if the db is new
                    if cur.contains("\"dashboards\": []") {
                        if let Err(err) = influxive.apply(DASH_NETWORK_STATS).await {
                            tracing::warn!(?err, "failed to initialize network stats dashboard");
                        }
                        if let Err(err) = influxive.apply(DASH_TX5).await {
                            tracing::warn!(?err, "failed to initialize tx5 dashboard");
                        }
                        if let Err(err) = influxive.apply(DASH_DATABASE).await {
                            tracing::warn!(?err, "failed to initialize database dashboard");
                        }
                        if let Err(err) = influxive.apply(DASH_CONDUCTOR).await {
                            tracing::warn!(?err, "failed to initialize conductor dashboard");
                        }
                    }
                }

                // set up opentelemetry to use our metrics collector
                opentelemetry_api::global::set_meter_provider(meter_provider);

                tracing::info!(host = %influxive.get_host(), "influxive metrics running");
            }
            Err(err) => {
                tracing::warn!(?err, "unable to initialize local metrics");
            }
        }
    }
}