Crate holochain_metrics

Initialize holochain metrics. This crate should only be used in binaries to initialize the actual metrics collection. Libraries should just use the opentelemetry crate to report metrics if any collector has been initialized.

§Environment Variables

When calling HolochainMetricsConfig::new(&path).init(), the actual metrics instance that will be created is largely controlled by the existence of environment variables.

Currently, by default, the Null metrics collector will be used, meaning metrics will not be collected, and all metrics operations will be no-ops.

If you wish to enable metrics, the current options are:

  • A file, containing InfluxDB line protocol metrics. These can be pushed to InfluxDB later with Telegraf.
    • Enable and configure via environment variable: HOLOCHAIN_INFLUXIVE_FILE="path/to/influx/file"
  • InfluxDB as a zero-config child-process.
    • Enable via environment variable: HOLOCHAIN_INFLUXIVE_CHILD_SVC=1
    • The binaries influxd and influx will be downloaded, verified, and then run automatically as a child process, and metrics will be reported to that process. The InfluxDB UI will be available on a randomly assigned port (currently only reported in the trace logging).
  • InfluxDB as a pre-existing system process.
    • Enable via environment variable: HOLOCHAIN_INFLUXIVE_EXTERNAL=1
    • Configure via environment variables:
      • HOLOCHAIN_INFLUXIVE_EXTERNAL_HOST=[my influxdb url]. A default InfluxDB install uses http://localhost:8086; otherwise, run influx config in a terminal to find the URL.
      • HOLOCHAIN_INFLUXIVE_EXTERNAL_BUCKET=[my influxdb bucket name]. It is simplest to name the bucket influxive if you plan to import the provided dashboards.
      • HOLOCHAIN_INFLUXIVE_EXTERNAL_TOKEN=[my influxdb auth token]
    • The InfluxDB auth token must have permission to write to all buckets.
    • Metrics will be set up to report to this already running InfluxDB.
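The selection described above can be pictured with a small standalone sketch. This is not the crate's implementation: the MetricsMode enum and select_mode function are hypothetical names, the order in which the variables are checked is an assumption (the documentation does not state a precedence), and falling back to the Null collector when the external settings are incomplete is also an assumption.

```rust
/// Hypothetical enum mirroring the collector options listed above.
#[derive(Debug, PartialEq)]
enum MetricsMode {
    Null,
    InfluxFile(String),
    InfluxChildSvc,
    InfluxExternal { host: String, bucket: String, token: String },
}

/// Choose a mode based on which variables exist.
/// ASSUMPTION: the check order (file, child service, external) is illustrative only.
fn select_mode(get: impl Fn(&str) -> Option<String>) -> MetricsMode {
    if let Some(path) = get("HOLOCHAIN_INFLUXIVE_FILE") {
        return MetricsMode::InfluxFile(path);
    }
    if get("HOLOCHAIN_INFLUXIVE_CHILD_SVC").is_some() {
        return MetricsMode::InfluxChildSvc;
    }
    if get("HOLOCHAIN_INFLUXIVE_EXTERNAL").is_some() {
        // ASSUMPTION: all three settings are required; otherwise fall through to Null.
        if let (Some(host), Some(bucket), Some(token)) = (
            get("HOLOCHAIN_INFLUXIVE_EXTERNAL_HOST"),
            get("HOLOCHAIN_INFLUXIVE_EXTERNAL_BUCKET"),
            get("HOLOCHAIN_INFLUXIVE_EXTERNAL_TOKEN"),
        ) {
            return MetricsMode::InfluxExternal { host, bucket, token };
        }
    }
    // Default: the Null collector, where all metrics operations are no-ops.
    MetricsMode::Null
}

fn main() {
    // Read from the real process environment.
    let mode = select_mode(|key| std::env::var(key).ok());
    // Print only a label so the auth token never reaches the output.
    let label = match mode {
        MetricsMode::Null => "null (no-op)",
        MetricsMode::InfluxFile(_) => "influx line protocol file",
        MetricsMode::InfluxChildSvc => "influxd child process",
        MetricsMode::InfluxExternal { .. } => "external influxdb",
    };
    println!("selected metrics mode: {label}");
}
```

Passing the lookup as a closure keeps the decision logic testable without mutating the process environment.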

All metrics modes automatically stamp a host tag on every emitted metric so that metrics from different nodes can be distinguished when multiple Holochain instances write to a shared InfluxDB. The value defaults to the OS hostname. Override it with:

  • HOLOCHAIN_INFLUXIVE_HOST_TAG=<my-custom-node-name>
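To make the effect of the host tag concrete, here is a rough sketch of how an override could be resolved and how a tagged record might look in InfluxDB line protocol. The function names are hypothetical, treating an empty override as unset is an assumption, and the measurement and field names in the output are placeholders; how the crate actually maps metric names to measurements and fields is not specified here.

```rust
/// Pick the host tag: the explicit override if set, otherwise the OS hostname.
/// ASSUMPTION: an empty override is treated the same as an unset one.
fn resolve_host_tag(override_value: Option<&str>, os_hostname: &str) -> String {
    match override_value {
        Some(v) if !v.is_empty() => v.to_string(),
        _ => os_hostname.to_string(),
    }
}

/// Escape the characters that are special inside a line protocol tag value
/// (comma, equals sign, space) by prefixing each with a backslash.
fn escape_tag_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        if matches!(c, ',' | '=' | ' ') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Format one record as `measurement,host=<tag> value=<float>`.
/// ASSUMPTION: the field name `value` is a placeholder for illustration.
fn line_with_host(measurement: &str, host: &str, value: f64) -> String {
    format!("{measurement},host={} value={value}", escape_tag_value(host))
}

fn main() {
    let override_value = std::env::var("HOLOCHAIN_INFLUXIVE_HOST_TAG").ok();
    // The standard library has no hostname lookup, so a placeholder stands in here.
    let host = resolve_host_tag(override_value.as_deref(), "localhost");
    println!("{}", line_with_host("hc.conductor.uptime", &host, 0.42));
}
```

With two nodes writing to the same bucket, records that differ only in the host tag remain separate series, which is what lets a dashboard filter or group by node.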

To set the interval at which recorded metrics are written to InfluxDB, use OTEL_METRIC_EXPORT_INTERVAL. The value is given in milliseconds, and the default is 10 s (10000). If the report interval is configured in code, that setting overrides the environment variable.
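The precedence just described (code setting, then environment variable, then default) can be sketched as follows. The export_interval function is a hypothetical helper, not part of this crate, and falling back to the default on an unparsable value is an assumption.

```rust
use std::time::Duration;

/// Default export interval stated in the documentation: 10 s.
const DEFAULT_EXPORT_INTERVAL: Duration = Duration::from_secs(10);

/// Resolve the export interval.
/// A value configured in code wins; otherwise the environment value
/// (milliseconds) is used; otherwise the default applies.
/// ASSUMPTION: a value that does not parse as an integer falls back to the default.
fn export_interval(code_override: Option<Duration>, env_value: Option<&str>) -> Duration {
    if let Some(interval) = code_override {
        return interval;
    }
    env_value
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or(DEFAULT_EXPORT_INTERVAL)
}

fn main() {
    let env_value = std::env::var("OTEL_METRIC_EXPORT_INTERVAL").ok();
    let interval = export_interval(None, env_value.as_deref());
    println!("export interval: {} ms", interval.as_millis());
}
```

For example, OTEL_METRIC_EXPORT_INTERVAL=2500 yields a 2.5 s interval unless the code supplies its own value.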

§Metric Naming Conventions

We will largely attempt to follow the guidelines for metric naming enumerated at https://opentelemetry.io/docs/specs/otel/metrics/semantic_conventions/, with additional rules made to fit with our particular project. We will also attempt to keep this documentation up to date on a best-effort basis to act as an example and registry of metrics available in Holochain, and related support dependency crates managed by the organization.

Generic naming convention rules:

  • Dot notation logical module hierarchy. This need not, and perhaps should not, match the Rust crate/module hierarchy: we may rearrange crates and modules, but the metric names themselves should remain stable.
    • Examples:
      • hc.db
      • hc.workflow.integration
      • hc.ribosome.wasm
  • A dot notation metric name or context should follow the logical module name. The thing that can be charted should be the actual metric. Related context that you may want to filter on in the chart should be expressed as attributes. For example, a “request” may have two separate metrics, “duration” and “byte.count”, both of which may carry the filtering attribute “remote_id”.
    • Examples:
      •   use opentelemetry::KeyValue;
          let req_dur = opentelemetry::global::meter("hc")
              .f64_histogram("hc.holochain_p2p.request.duration")
              .with_description("holochain p2p request duration")
              .with_unit("s")
              .build();
          req_dur.record(0.42, &[KeyValue::new("remote_id", "abcd")]);
      •   use opentelemetry::KeyValue;
          let req_size = opentelemetry::global::meter("hc")
              .u64_histogram("hc.holochain_p2p.request.byte.count")
              .with_description("holochain p2p request byte count")
              .with_unit("B")
              .build();
          req_size.record(42, &[
              KeyValue::new("remote_id", "abcd"),
          ]);
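A naming check along these lines could be used in a test or lint to catch names that drift from the convention. This is only a sketch: follows_convention is a hypothetical helper, and the concrete rules it enforces (an `hc` root, at least one module segment plus a metric segment, and segments made of lowercase ASCII letters, digits, and underscores) are assumptions inferred from the example names, not rules stated by the project.

```rust
/// Check a metric name against the assumed convention:
/// `hc.<module>[.<module>...].<metric>` with lowercase ASCII segments.
fn follows_convention(name: &str) -> bool {
    let mut segments = name.split('.');
    // ASSUMPTION: every metric name is rooted at `hc`.
    if segments.next() != Some("hc") {
        return false;
    }
    let mut count = 0;
    for segment in segments {
        let valid = !segment.is_empty()
            && segment
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
        if !valid {
            return false;
        }
        count += 1;
    }
    // ASSUMPTION: at least one module segment and one metric segment follow `hc`.
    count >= 2
}

fn main() {
    for name in ["hc.holochain_p2p.request.duration", "hc.db", "Hc.Db.Duration"] {
        println!("{name}: {}", follows_convention(name));
    }
}
```

Under these assumptions a bare module path such as hc.db is rejected because it names a hierarchy level rather than something that can be charted.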

§Metric Name Registry

The following metrics are defined and recorded in their respective crates. Search the source for a metric's name to look up its type, description, and unit.

| Full Metric Name | Type | Unit (optional) | Description | Attributes |
| --- | --- | --- | --- | --- |
| hc.db.connections.use_time | f64 histogram | s | The time between borrowing a connection and returning it to the pool | kind: DB type (authored/dht/cache/…), id: DB instance identifier |
| hc.db.write_txn.duration | f64 histogram | s | The time spent executing an exclusive write transaction | kind: DB type (authored/dht/cache/…), id: DB instance identifier |
| hc.keystore.lair_request.duration | f64 histogram | s | Duration of signing and encryption requests to Lair | operation: cryptographic operation (sign/encrypt/…) |
| hc.conductor.workflow.duration | f64 histogram | s | The time spent running a workflow | workflow: workflow process name, dna_hash: DNA identifier, agent: agent public key |
| hc.conductor.workflow.integrated_ops | u64 counter | | The number of integrated operations | |
| hc.conductor.workflow.integration_delay | f64 histogram | s | Time between an op being stored and it being integrated | |
| hc.conductor.workflow.validation_attempts | u64 histogram | | Number of validation attempts required to integrate an op | |
| hc.conductor.post_commit.duration | f64 histogram | s | The time spent executing a post commit | dna_hash: DNA identifier, agent: agent public key |
| hc.conductor.uptime | f64 observable gauge | s | The number of seconds the conductor has been running | |
| hc.conductor.app_ws.dropped_signal | u64 counter | | The number of signals dropped from app ws due to channel overload | |
| hc.ribosome.wasm.usage | u64 counter | | The metered usage of a wasm ribosome | dna_hash: DNA identifier, zome: zome module name, fn: function name, agent: agent public key |
| hc.ribosome.zome_call.duration | f64 histogram | s | The time spent running a zome call | dna_hash: DNA identifier, zome: zome module name, fn: function name |
| hc.ribosome.wasm_call.duration | f64 histogram | s | The time spent running a wasm call | dna_hash: DNA identifier, zome: zome module name, fn: function name, agent: agent public key |
| hc.ribosome.host_fn_call.duration | f64 histogram | s | The time spent executing a host function call | dna_hash: DNA identifier, zome: zome module name, fn: function name, host_fn: host function name |
| hc.ribosome.host_fn.emit_signal | u64 counter | | The number of local signals emitted | cell_id: cell identifier, zome: zome module name |
| hc.ribosome.host_fn.send_remote_signal | u64 counter | | The number of remote signals sent | dna_hash: DNA identifier, zome: zome module name |
| hc.cascade.duration | f64 histogram | s | The time taken to execute a cascade query | zome: originating zome name, fn: originating function name |
| hc.cascade.fetch_error | u64 counter | | Number of errors encountered while fetching data from the network | fetch_type: type of data fetched, zome: originating zome name, fn: originating function name |
| hc.holochain_p2p.request.duration | f64 histogram | s | The time spent sending an outgoing p2p request awaiting the response | dna_hash: DNA identifier, tag: request category tag, error: request failed, zome: originating zome name, fn: originating function name |
| hc.holochain_p2p.handle_request.duration | f64 histogram | s | The time spent handling an incoming p2p request | message_type: p2p message type, dna_hash: DNA identifier |
| hc.holochain_p2p.recv_remote_signal | u64 counter | | The number of remote signals received | dna_hash: DNA identifier |

Enums§

HolochainMetricsConfig
Configuration for holochain metrics.