# Crate holochain_metrics
Initialize holochain metrics. This crate should only be used in binaries, to initialize the actual metrics collection. Libraries should just use the `opentelemetry_api` to report metrics; those reports will be collected if any collector has been initialized.
## Environment Variables
When calling `HolochainMetricsConfig::new(&path).init()`, the actual metrics instance that will be created is largely controlled by the existence of environment variables.
Currently, by default, the Null metrics collector will be used, meaning metrics will not be collected and all metrics operations will be no-ops.
If you wish to enable metrics, the current options are:

- InfluxDB as a zero-config child-process.
  - Enable via environment variable: `HOLOCHAIN_INFLUXIVE_CHILD_SVC=1`
  - The binaries `influxd` and `influx` will be downloaded and verified
    before automatically being run as a child process, and set up to be
    reported to. The InfluxDB UI will be available on a randomly assigned
    port (currently only reported in the trace logging).
- InfluxDB as a pre-existing system process.
  - Enable via environment variable: `HOLOCHAIN_INFLUXIVE_EXTERNAL=1`
  - Configure via environment variables:
    - `HOLOCHAIN_INFLUXIVE_EXTERNAL_HOST=[my influxdb url]` - a default
      InfluxDB install will need `http://localhost:8086`; otherwise the url
      can be found by running `influx config` in a terminal.
    - `HOLOCHAIN_INFLUXIVE_EXTERNAL_BUCKET=[my influxdb bucket name]` - it's
      simplest to use `influxive` if you plan to import the provided
      dashboards.
    - `HOLOCHAIN_INFLUXIVE_EXTERNAL_TOKEN=[my influxdb auth token]` - the
      InfluxDB auth token must have permission to write to all buckets.
  - Metrics will be set up to report to this already-running InfluxDB.
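As a minimal sketch of the call mentioned above (assuming a tokio runtime and that `init()` is async; the exact signature may differ between crate versions, so check the `HolochainMetricsConfig` docs for your version), a binary might initialize metrics like this:

```rust
use holochain_metrics::HolochainMetricsConfig;

// Sketch only: assumes a tokio runtime and an async `init()`.
#[tokio::main]
async fn main() {
    // Directory where metrics state (e.g. a child InfluxDB) may live.
    let root_path = std::path::PathBuf::from("./metrics");

    // Which collector is created (Null, child-process InfluxDB, or external
    // InfluxDB) depends on the environment variables described above.
    HolochainMetricsConfig::new(&root_path).init().await;

    // Libraries then report metrics via `opentelemetry_api`; those reports
    // are only collected if a non-Null collector was initialized here.
}
```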
## Metric Naming Conventions
We will largely attempt to follow the guidelines for metric naming enumerated at https://opentelemetry.io/docs/specs/otel/metrics/semantic_conventions/, with additional rules made to fit our particular project. We will also attempt to keep this documentation up-to-date on a best-effort basis, to act as an example and registry of metrics available in Holochain and the related support dependency crates managed by the organization.
Generic naming convention rules:
- Dot notation logical module hierarchy. This need not, and perhaps should
  not, match the Rust crate/module hierarchy: we may rearrange crates and
  modules, but the metric names themselves should remain more consistent.
  - Examples:
    - `hc.db`
    - `hc.workflow.integration`
    - `kitsune.gossip`
    - `tx5.signal`
- A dot notation metric name or context should follow the logical module
name. The thing that can be charted should be the actual metric. Related
context that may want to be filtered for the chart should be attributes.
For example, a “request” may have two separate metrics, “duration”, and
“byte.count”, which both may have the filtering attribute “remote_id”.
  - Examples:

    ```rust
    use opentelemetry_api::{Context, KeyValue, metrics::Unit};

    let req_dur = opentelemetry_api::global::meter("tx5")
        .f64_histogram("tx5.signal.request.duration")
        .with_description("tx5 signal server request duration")
        .with_unit(Unit::new("s"))
        .init();

    req_dur.record(&Context::new(), 0.42, &[
        KeyValue::new("remote_id", "abcd"),
    ]);
    ```

    ```rust
    use opentelemetry_api::{Context, KeyValue, metrics::Unit};

    let req_size = opentelemetry_api::global::meter("tx5")
        .u64_histogram("tx5.signal.request.byte.count")
        .with_description("tx5 signal server request byte count")
        .with_unit(Unit::new("By"))
        .init();

    req_size.record(&Context::new(), 42, &[
        KeyValue::new("remote_id", "abcd"),
    ]);
    ```
## Metric Name Registry
| Full Metric Name | Type | Unit (optional) | Description | Attributes |
|---|---|---|---|---|
| `kitsune.peer.send.duration` | f64_histogram | s | When kitsune sends data to a remote peer. | - `remote_id`: the base64 remote peer id. - `is_error`: if the send failed. |
| `kitsune.peer.send.byte.count` | u64_histogram | By | When kitsune sends data to a remote peer. | - `remote_id`: the base64 remote peer id. - `is_error`: if the send failed. |
| `kitsune.gossip.generate_op_blooms.duration` | f64_histogram | s | The time taken to generate op blooms for gossip. | - `space`: The space (dna_hash representation) that gossip is being performed for. - `batch_size`: The number of ops that were included in the bloom batch for this observation. |
| `kitsune.gossip.generate_op_region_set.duration` | f64_histogram | s | The time taken to generate op region sets for gossip. | - `space`: The space (dna_hash representation) that gossip is being performed for. |
| `tx5.conn.ice.send` | u64_observable_counter | By | Bytes sent on ice channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.ice.recv` | u64_observable_counter | By | Bytes received on ice channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.data.send` | u64_observable_counter | By | Bytes sent on data channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.data.recv` | u64_observable_counter | By | Bytes received on data channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.data.send.message.count` | u64_observable_counter | | Message count sent on data channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `tx5.conn.data.recv.message.count` | u64_observable_counter | | Message count received on data channel. | - `remote_id`: the base64 remote peer id. - `state_uniq`: endpoint identifier. - `conn_uniq`: connection identifier. |
| `hc.conductor.p2p_event.duration` | f64_histogram | s | The time spent processing a p2p event. | - `dna_hash`: The DNA hash that this event is being sent on behalf of. |
| `hc.conductor.post_commit.duration` | f64_histogram | s | The time spent executing a post commit. | - `dna_hash`: The DNA hash that this post commit is running for. - `agent`: The agent running the post commit. |
| `hc.conductor.workflow.duration` | f64_histogram | s | The time spent running a workflow. | - `workflow`: The name of the workflow. - `dna_hash`: The DNA hash that this workflow is running for. - `agent`: (optional) The agent that this workflow is running for, if the workflow is cell bound. |
| `hc.cascade.duration` | f64_histogram | s | The time taken to execute a cascade query. | |
| `hc.db.pool.utilization` | f64_gauge | | The utilisation of connections in the pool. | - `kind`: The kind of database, such as Conductor, Wasm or Dht. - `id`: The unique identifier for this database if multiple instances can exist, such as a Dht database. |
| `hc.db.connections.use_time` | f64_histogram | s | The time between borrowing a connection and returning it to the pool. | - `kind`: The kind of database, such as Conductor, Wasm or Dht. - `id`: The unique identifier for this database if multiple instances can exist, such as a Dht database. |
| `hc.ribosome.wasm.usage` | u64_counter | | The metered usage of a wasm ribosome. | - `dna`: The DNA hash that this wasm is metered for. - `zome`: The zome that this wasm is metered for. - `fn`: The function that this wasm is metered for. - `agent`: The agent that this wasm is metered for (if there is one). |
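As a hedged sketch of reporting one of the registry entries above: the metric name, unit, and attribute names come from the table, while the meter name, duration value, and attribute values are illustrative placeholders only.

```rust
use opentelemetry_api::{Context, KeyValue, metrics::Unit};

// Sketch: report an `hc.conductor.workflow.duration` observation following
// the registry table. The meter name "hc.conductor" and all values below are
// illustrative; real code would measure an actual workflow run.
let workflow_dur = opentelemetry_api::global::meter("hc.conductor")
    .f64_histogram("hc.conductor.workflow.duration")
    .with_description("The time spent running a workflow")
    .with_unit(Unit::new("s"))
    .init();

workflow_dur.record(&Context::new(), 1.25, &[
    KeyValue::new("workflow", "example_workflow"),
    KeyValue::new("dna_hash", "uhC0k..."),
]);
```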
## Enums

- `HolochainMetricsConfig`: Configuration for holochain metrics.