Initialize holochain metrics. This crate should only be used in binaries to initialize the actual metrics collection. Libraries should just use the opentelemetry crate to report metrics if any collector has been initialized.
§Environment Variables
When calling HolochainMetricsConfig::new(&path).init(), the actual
metrics instance that will be created is largely controlled by
the existence of environment variables.
Currently, by default, the Null metrics collector will be used, meaning metrics will not be collected, and all metrics operations will be no-ops.
If you wish to enable metrics, the current options are:
- A file, containing InfluxDB line protocol metrics. These can be pushed to InfluxDB later with Telegraf.
  - Enable and configure via environment variable: HOLOCHAIN_INFLUXIVE_FILE="path/to/influx/file"
- InfluxDB as a zero-config child-process.
  - Enable via environment variable: HOLOCHAIN_INFLUXIVE_CHILD_SVC=1
  - The binaries influxd and influx will be downloaded and verified before automatically being run as a child process, and set up to be reported to. The InfluxDB UI will be available on a randomly assigned port (currently only reported in the trace logging).
- InfluxDB as a pre-existing system process.
  - Enable via environment variable: HOLOCHAIN_INFLUXIVE_EXTERNAL=1
  - Configure via environment variables:
    - HOLOCHAIN_INFLUXIVE_EXTERNAL_HOST=[my influxdb url] where a default InfluxDB install will need http://localhost:8086 and otherwise can be found by running influx config in a terminal.
    - HOLOCHAIN_INFLUXIVE_EXTERNAL_BUCKET=[my influxdb bucket name] but it’s simplest to use influxive if you plan to import the provided dashboards.
    - HOLOCHAIN_INFLUXIVE_EXTERNAL_TOKEN=[my influxdb auth token]
  - The influxdb auth token must have permission to write to all buckets
  - Metrics will be set up to report to this already running InfluxDB.
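The options above can be sketched as a small mode-selection function. This is only an illustration, not the crate's implementation: the `MetricsMode` enum and `select_mode` function are hypothetical names, and the order in which the variables are checked, the exact match on `1`, and the fallback to the Null collector when external settings are incomplete are all assumptions that the text above does not confirm.

```rust
/// Hypothetical description of the collector that would be used.
/// This type is NOT part of holochain_metrics; it only mirrors the list above.
#[derive(Debug, PartialEq, Eq)]
pub enum MetricsMode {
    /// Default: metrics operations are no-ops.
    Null,
    /// InfluxDB line protocol written to a file.
    File { path: String },
    /// InfluxDB run as a zero-config child process.
    ChildSvc,
    /// A pre-existing InfluxDB instance.
    External { host: String, bucket: String, token: String },
}

/// Pick a mode from environment-style lookups.
/// ASSUMPTION: the check order (file, child service, external) is a guess.
pub fn select_mode(get: impl Fn(&str) -> Option<String>) -> MetricsMode {
    if let Some(path) = get("HOLOCHAIN_INFLUXIVE_FILE") {
        return MetricsMode::File { path };
    }
    if get("HOLOCHAIN_INFLUXIVE_CHILD_SVC").as_deref() == Some("1") {
        return MetricsMode::ChildSvc;
    }
    if get("HOLOCHAIN_INFLUXIVE_EXTERNAL").as_deref() == Some("1") {
        let host = get("HOLOCHAIN_INFLUXIVE_EXTERNAL_HOST");
        let bucket = get("HOLOCHAIN_INFLUXIVE_EXTERNAL_BUCKET");
        let token = get("HOLOCHAIN_INFLUXIVE_EXTERNAL_TOKEN");
        // ASSUMPTION: incomplete external settings fall back to Null.
        if let (Some(host), Some(bucket), Some(token)) = (host, bucket, token) {
            return MetricsMode::External { host, bucket, token };
        }
    }
    MetricsMode::Null
}
```

Taking the lookup as a closure keeps the sketch testable without touching the process environment; against the real environment it could be called as `select_mode(|key| std::env::var(key).ok())`.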
All metrics modes automatically stamp a host tag on every emitted metric so that metrics from
different nodes can be distinguished when multiple Holochain instances write to a shared
InfluxDB. The value defaults to the OS hostname. Override it with:
HOLOCHAIN_INFLUXIVE_HOST_TAG=<my-custom-node-name>
To set the interval at which recorded metrics are written to Influx,
set OTEL_METRIC_EXPORT_INTERVAL to a value in milliseconds.
The default is 10 s (10000 ms). A report interval configured in code
overrides this environment variable.
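The precedence described here (code setting first, then the environment variable, then the 10 s default) might look like the following minimal sketch. The function name `export_interval` and the constant are hypothetical, and treating an unparsable value as "use the default" is an assumption rather than documented behavior.

```rust
use std::time::Duration;

/// Default export interval stated in the documentation (10 s).
const DEFAULT_EXPORT_INTERVAL: Duration = Duration::from_secs(10);

/// Resolve the export interval.
/// `code_override` stands for an interval configured in code;
/// `env_value` stands for the raw OTEL_METRIC_EXPORT_INTERVAL value (milliseconds).
pub fn export_interval(code_override: Option<Duration>, env_value: Option<&str>) -> Duration {
    // An interval configured in code takes precedence over the environment.
    if let Some(interval) = code_override {
        return interval;
    }
    env_value
        // ASSUMPTION: values that do not parse as whole milliseconds are ignored.
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or(DEFAULT_EXPORT_INTERVAL)
}
```

With the real environment, the second argument could be supplied as `std::env::var("OTEL_METRIC_EXPORT_INTERVAL").ok().as_deref()`.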
§Metric Naming Conventions
We will largely attempt to follow the guidelines for metric naming enumerated at https://opentelemetry.io/docs/specs/otel/metrics/semantic_conventions/, with additional rules made to fit with our particular project. We will also attempt to keep this documentation up to date on a best-effort basis to act as an example and registry of metrics available in Holochain, and related support dependency crates managed by the organization.
Generic naming convention rules:
- Dot notation logical module hierarchy. This need not, and perhaps should
  not, match the Rust crate/module hierarchy: we may rearrange crates
  and modules, but the metric names themselves should remain more
  consistent.
  - Examples:
    - hc.db
    - hc.workflow.integration
    - hc.ribosome.wasm
- A dot notation metric name or context should follow the logical module
  name. The thing that can be charted should be the actual metric. Related
  context that may want to be filtered for the chart should be attributes.
  For example, a “request” may have two separate metrics, “duration”, and
  “byte.count”, which both may have the filtering attribute “remote_id”.
  - Examples:

    ```rust
    use opentelemetry::KeyValue;
    let req_dur = opentelemetry::global::meter("hc")
        .f64_histogram("hc.holochain_p2p.request.duration")
        .with_description("holochain p2p request duration")
        .with_unit("s")
        .build();
    req_dur.record(0.42, &[KeyValue::new("remote_id", "abcd")]);
    ```

    ```rust
    use opentelemetry::KeyValue;
    let req_size = opentelemetry::global::meter("hc")
        .u64_histogram("hc.holochain_p2p.request.byte.count")
        .with_description("holochain p2p request byte count")
        .with_unit("B")
        .build();
    req_size.record(42, &[
        KeyValue::new("remote_id", "abcd"),
    ]);
    ```
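As a rough aid for reviewing new metric names, the dot-notation rule could be checked with a helper like the one below. It is a sketch under assumptions: the function `follows_naming_convention` is hypothetical, and the specific checks (an `hc` root segment, at least one further segment, lowercase ASCII letters, digits and underscores only) are inferred from the example names in this document rather than stated as rules.

```rust
/// Check a metric name against conventions inferred from the examples
/// (e.g. "hc.db", "hc.holochain_p2p.request.byte.count").
/// ASSUMPTION: these checks are an interpretation, not an official rule set.
pub fn follows_naming_convention(name: &str) -> bool {
    let mut segments = name.split('.');
    // Every example name starts with the "hc" root segment.
    if segments.next() != Some("hc") {
        return false;
    }
    let mut count = 0;
    for segment in segments {
        let valid = segment.starts_with(|c: char| c.is_ascii_lowercase())
            && segment
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
        if !valid {
            return false;
        }
        count += 1;
    }
    // Require at least one logical module segment after the root.
    count >= 1
}
```

For instance, `follows_naming_convention("hc.ribosome.wasm")` returns `true`, while a name such as `"hc..db"` is rejected because of the empty segment.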
§Metric Name Registry
The following metrics are defined and recorded in their respective crates. To look up a metric's type, description, and unit in the source, search for its full name.
| Full Metric Name | Type | Unit (optional) | Description | Attributes |
|---|---|---|---|---|
| hc.db.connections.use_time | f64 histogram | s | The time between borrowing a connection and returning it to the pool | kind: DB type (authored/dht/cache/…), id: DB instance identifier |
| hc.db.write_txn.duration | f64 histogram | s | The time spent executing an exclusive write transaction | kind: DB type (authored/dht/cache/…), id: DB instance identifier |
| hc.keystore.lair_request.duration | f64 histogram | s | Duration of signing and encryption requests to Lair | operation: cryptographic operation (sign/encrypt/…) |
| hc.conductor.workflow.duration | f64 histogram | s | The time spent running a workflow | workflow: workflow process name, dna_hash: DNA identifier, agent: agent public key |
| hc.conductor.workflow.integrated_ops | u64 counter | | The number of integrated operations | |
| hc.conductor.workflow.integration_delay | f64 histogram | s | Time between an op being stored and it being integrated | |
| hc.conductor.workflow.validation_attempts | u64 histogram | | Number of validation attempts required to integrate an op | |
| hc.conductor.post_commit.duration | f64 histogram | s | The time spent executing a post commit | dna_hash: DNA identifier, agent: agent public key |
| hc.conductor.uptime | f64 observable gauge | s | The number of seconds the conductor has been running | |
| hc.conductor.app_ws.dropped_signal | u64 counter | | The number of signals dropped from app ws due to channel overload | |
| hc.ribosome.wasm.usage | u64 counter | | The metered usage of a wasm ribosome | dna_hash: DNA identifier, zome: zome module name, fn: function name, agent: agent public key |
| hc.ribosome.zome_call.duration | f64 histogram | s | The time spent running a zome call | dna_hash: DNA identifier, zome: zome module name, fn: function name |
| hc.ribosome.wasm_call.duration | f64 histogram | s | The time spent running a wasm call | dna_hash: DNA identifier, zome: zome module name, fn: function name, agent: agent public key |
| hc.ribosome.host_fn_call.duration | f64 histogram | s | The time spent executing a host function call | dna_hash: DNA identifier, zome: zome module name, fn: function name, host_fn: host function name |
| hc.ribosome.host_fn.emit_signal | u64 counter | | The number of local signals emitted | cell_id: cell identifier, zome: zome module name |
| hc.ribosome.host_fn.send_remote_signal | u64 counter | | The number of remote signals sent | dna_hash: DNA identifier, zome: zome module name |
| hc.cascade.duration | f64 histogram | s | The time taken to execute a cascade query | zome: originating zome name, fn: originating function name |
| hc.cascade.fetch_error | u64 counter | | Number of errors encountered while fetching data from the network | fetch_type: type of data fetched, zome: originating zome name, fn: originating function name |
| hc.holochain_p2p.request.duration | f64 histogram | s | The time spent sending an outgoing p2p request awaiting the response | dna_hash: DNA identifier, tag: request category tag, error: request failed, zome: originating zome name, fn: originating function name |
| hc.holochain_p2p.handle_request.duration | f64 histogram | s | The time spent handling an incoming p2p request | message_type: p2p message type, dna_hash: DNA identifier |
| hc.holochain_p2p.recv_remote_signal | u64 counter | | The number of remote signals received | dna_hash: DNA identifier |
Enums§
- HolochainMetricsConfig - Configuration for holochain metrics.