# Crate holochain_metrics
Initialize holochain metrics. This crate should only be used in binaries, to initialize the actual metrics collection. Libraries should just use the `opentelemetry_api` to report metrics; those reports will be collected if any collector has been initialized.
## Environment Variables
When calling `HolochainMetricsConfig::new(&path).init()`, the actual metrics instance that will be created is largely controlled by the existence of environment variables.
Currently, by default, the Null metrics collector will be used, meaning metrics will not be collected and all metrics operations will be no-ops.
If you wish to enable metrics, the current options are:

- InfluxDB as a zero-config child-process.
  - Enable via environment variable: `HOLOCHAIN_INFLUXIVE_CHILD_SVC=1`
  - The binaries `influxd` and `influx` will be downloaded and verified
    before automatically being run as a child process, and set up to be
    reported to. The InfluxDB UI will be available on a randomly assigned
    port (currently only reported in the trace logging).
- InfluxDB as a pre-existing system process.
  - Enable via environment variable: `HOLOCHAIN_INFLUXIVE_EXTERNAL=1`
  - Configure via environment variables:
    - `HOLOCHAIN_INFLUXIVE_EXTERNAL_HOST=[my influxdb url]` - a default
      InfluxDB install will need `http://localhost:8086`; otherwise the url
      can be found by running `influx config` in a terminal.
    - `HOLOCHAIN_INFLUXIVE_EXTERNAL_BUCKET=[my influxdb bucket name]` - it's
      simplest to use `influxive` if you plan to import the provided
      dashboards.
    - `HOLOCHAIN_INFLUXIVE_EXTERNAL_TOKEN=[my influxdb auth token]` - the
      InfluxDB auth token must have permission to write to all buckets.
  - Metrics will be set up to report to this already-running InfluxDB.
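As a minimal sketch of the call mentioned above (assuming a tokio runtime and that `init()` is async; the exact signature may differ between crate versions, so check the `HolochainMetricsConfig` docs for your version), a binary might initialize metrics like this:

```rust
use holochain_metrics::HolochainMetricsConfig;

// Sketch only: assumes a tokio runtime and an async `init()`.
#[tokio::main]
async fn main() {
    // Directory where metrics state (e.g. a child InfluxDB) may live.
    let root_path = std::path::PathBuf::from("./metrics");

    // Which collector is created (Null, child-process InfluxDB, or external
    // InfluxDB) depends on the environment variables described above.
    HolochainMetricsConfig::new(&root_path).init().await;

    // Libraries then report metrics via `opentelemetry_api`; those reports
    // are only collected if a non-Null collector was initialized here.
}
```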
## Metric Naming Conventions
We will largely attempt to follow the guidelines for metric naming enumerated at https://opentelemetry.io/docs/specs/otel/metrics/semantic_conventions/, with additional rules made to fit our particular project. We will also attempt to keep this documentation up-to-date on a best-effort basis, to act as an example and registry of metrics available in Holochain and the related support dependency crates managed by the organization.
Generic naming convention rules:
- Dot notation logical module hierarchy. This need not, and perhaps should
  not, match the Rust crate/module hierarchy: we may rearrange crates and
  modules, but the metric names themselves should remain more consistent.
  - Examples:
    - `hc.db`
    - `hc.workflow.integration`
    - `kitsune.gossip`
    - `tx5.signal`
- A dot notation metric name or context should follow the logical module
name. The thing that can be charted should be the actual metric. Related
context that may want to be filtered for the chart should be attributes.
For example, a “request” may have two separate metrics, “duration”, and
“byte.count”, which both may have the filtering attribute “remote_id”.
  - Examples:

    ```rust
    use opentelemetry_api::{Context, KeyValue, metrics::Unit};

    let req_dur = opentelemetry_api::global::meter("tx5")
        .f64_histogram("tx5.signal.request.duration")
        .with_description("tx5 signal server request duration")
        .with_unit(Unit::new("s"))
        .init();

    req_dur.record(&Context::new(), 0.42, &[
        KeyValue::new("remote_id", "abcd"),
    ]);
    ```

    ```rust
    use opentelemetry_api::{Context, KeyValue, metrics::Unit};

    let req_size = opentelemetry_api::global::meter("tx5")
        .u64_histogram("tx5.signal.request.byte.count")
        .with_description("tx5 signal server request byte count")
        .with_unit(Unit::new("By"))
        .init();

    req_size.record(&Context::new(), 42, &[
        KeyValue::new("remote_id", "abcd"),
    ]);
    ```
## Metric Name Registry
| Full Metric Name | Type | Unit (optional) | Description | Attributes |
|---|---|---|---|---|
| `kitsune.peer.send.duration` | f64_histogram | s | When kitsune sends data to a remote peer. | - `remote_id`: the base64 remote peer id. - `is_error`: if the send failed. |
| `kitsune.peer.send.byte.count` | u64_histogram | By | When kitsune sends data to a remote peer. | - `remote_id`: the base64 remote peer id. - `is_error`: if the send failed. |
| `kitsune.gossip.generate_op_blooms.duration` | f64_histogram | s | The time taken to generate op blooms for gossip. | - `space`: The space (dna_hash representation) that gossip is being performed for. - `batch_size`: The number of ops that were included in the bloom batch for this observation. |
| `kitsune.gossip.generate_op_region_set.duration` | f64_histogram | s | The time taken to generate op region sets for gossip. | - `space`: The space (dna_hash representation) that gossip is being performed for. |
| `tx5.conn.ice.send` | u64_observable_counter | By | Bytes sent on ice channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.ice.recv` | u64_observable_counter | By | Bytes received on ice channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.data.send` | u64_observable_counter | By | Bytes sent on data channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.data.recv` | u64_observable_counter | By | Bytes received on data channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.data.send.message.count` | u64_observable_counter | | Message count sent on data channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.data.recv.message.count` | u64_observable_counter | | Message count received on data channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `hc.conductor.p2p_event.duration` | f64_histogram | s | The time spent processing a p2p event. | - `dna_hash`: The DNA hash that this event is being sent on behalf of. |
| `hc.conductor.post_commit.duration` | f64_histogram | s | The time spent executing a post commit. | - `dna_hash`: The DNA hash that this post commit is running for. - `agent`: The agent running the post commit. |
| `hc.conductor.workflow.duration` | f64_histogram | s | The time spent running a workflow. | - `workflow`: The name of the workflow. - `dna_hash`: The DNA hash that this workflow is running for. - `agent`: (optional) The agent that this workflow is running for, if the workflow is cell bound. |
| `hc.cascade.duration` | f64_histogram | s | The time taken to execute a cascade query. | |
| `hc.db.pool.utilization` | f64_gauge | | The utilisation of connections in the pool. | - `kind`: The kind of database, such as Conductor, Wasm or Dht. - `id`: The unique identifier for this database if multiple instances can exist, such as a Dht database. |
| `hc.db.connections.use_time` | f64_histogram | s | The time between borrowing a connection and returning it to the pool. | - `kind`: The kind of database, such as Conductor, Wasm or Dht. - `id`: The unique identifier for this database if multiple instances can exist, such as a Dht database. |
| `hc.ribosome.wasm.usage` | u64_counter | | The metered usage of a wasm ribosome. | - `dna`: The DNA hash that this wasm is metered for. - `zome`: The zome that this wasm is metered for. - `fn`: The function that this wasm is metered for. - `agent`: The agent that this wasm is metered for (if there is one). |
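As a hedged sketch of reporting one of the registry entries above: the metric name, unit, and attribute names come from the table, while the meter name, duration value, and attribute values are illustrative placeholders only.

```rust
use opentelemetry_api::{Context, KeyValue, metrics::Unit};

// Sketch: report an `hc.conductor.workflow.duration` observation following
// the registry table. The meter name "hc.conductor" and all values below are
// illustrative; real code would measure an actual workflow run.
let workflow_dur = opentelemetry_api::global::meter("hc.conductor")
    .f64_histogram("hc.conductor.workflow.duration")
    .with_description("The time spent running a workflow")
    .with_unit(Unit::new("s"))
    .init();

workflow_dur.record(&Context::new(), 1.25, &[
    KeyValue::new("workflow", "example_workflow"),
    KeyValue::new("dna_hash", "uhC0k..."),
]);
```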
## Enums

- `HolochainMetricsConfig`: Configuration for holochain metrics.