# Observability
`kafkit-client` uses `tracing` for diagnostics and the `metrics` facade for
operational measurements. Applications choose the subscriber, metrics recorder,
and exporters; the client emits spans, events, counters, gauges, and histograms
around public operations, runtime tasks, network connections, metadata
refreshes, and Kafka protocol requests.
## Tracing Surface
The following span names are part of the initial diagnostic surface:
- `admin.*`: public admin operations, such as `admin.create_topics`,
`admin.describe_cluster`, `admin.describe_groups`, and `admin.delete_groups`.
- `producer.*`: public producer operations, such as `producer.send`,
`producer.flush`, `producer.init_transactions`, and transaction boundaries.
- `consumer.*`: public consumer operations, such as `consumer.subscribe`,
`consumer.poll`, `consumer.commit`, `consumer.assignment`, and
`consumer.shutdown`.
- `share_consumer.*`: share consumer connection and subscription operations.
- `consumer_runtime`: the background runtime task for a group consumer.
- `kafka_request`: one encoded Kafka request and decoded response.
- `metadata.refresh`: cluster metadata refresh work.

Important fields include `client_id`, `group_id`, `topic`, `partition`,
`broker_id`, `server`, `security_protocol`, `api_key`, `api_version`,
`correlation_id`, request/response byte counts, assigned partition counts, and
record counts. Producer queue diagnostics also include aggregate pending batch,
record, byte, retrying-batch, and oldest-batch-age fields. High-cardinality
payload data is not emitted by default.
## Recommended Filter
Start with debug-level client spans. Enable trace-level Kafka request detail
only when diagnosing protocol or network behavior:
```sh
RUST_LOG=kafkit_client=debug
```
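If the rest of the application is noisy at debug level, the same variable can carry a global default alongside the client directive. This is a minimal sketch, assuming the application installs a subscriber that reads `RUST_LOG` using `tracing-subscriber`'s `EnvFilter` directive syntax; the client does not install a subscriber itself.

```sh
# Assumption: the application's subscriber honors RUST_LOG with
# EnvFilter-style, comma-separated directives.
# Every other target stays at warn; only the client is raised to debug.
export RUST_LOG="warn,kafkit_client=debug"
```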
For short diagnostic captures:
```sh
RUST_LOG=kafkit_client=trace
```
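A full trace capture can be verbose. If the subscriber is `tracing-subscriber`'s `EnvFilter` (an assumption; other subscribers may not support span directives), trace output can be limited to work inside `kafka_request` spans while the rest of the client stays at debug. Whether this surfaces the detail you need depends on which events the client emits inside that span.

```sh
# Assumption: an EnvFilter-based subscriber; `target[span]=level` is
# EnvFilter directive syntax, not something the client defines.
# Client at debug, trace-level detail only inside `kafka_request` spans.
export RUST_LOG="kafkit_client=debug,kafkit_client[kafka_request]=trace"
```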
## What To Look For
- Stuck producers: `producer.send`, `producer.flush`, `kafka_request`,
connection, retry, producer queue-depth, and metadata events.
- Slow consumers: `consumer.poll`, `consumer_runtime`, heartbeat, fetch,
assignment, and commit events.
- Metadata issues: `metadata.refresh` plus broker connection events.
- Auth or listener issues: connection events with the `security_protocol` and
`server` fields.
- Broker disconnects: failed `kafka_request` spans and reconnect events around
coordinator or leader connections.
## Metrics
The client emits metrics through the process-wide `metrics` recorder. It does
not install a recorder itself, so applications can route the same measurements
to OpenTelemetry, Prometheus, logs, or tests.

Install a `metrics` recorder, for example an OpenTelemetry-compatible one, in
the application before creating clients. Once installed, the client records:
- Broker connection counts and connection latency.
- Kafka request counts, request latency, request bytes, and response bytes,
labelled by API key, API version, client id, and result.
- Producer queue, batch, record, accumulator, request, delivery success, and
delivery failure measurements.
- Consumer poll, fetch, buffered record, offset reset, delivered record, and
commit measurements.
- Share consumer poll, fetch, acknowledgement queue, and acknowledgement request
measurements.

Metric names use the `kafkit.client.*` prefix. Labels intentionally avoid record
keys, values, headers, and other payload data.
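
To check that client metrics are reaching a recorder, it can help to filter an exporter's output down to the client prefix. The sketch below assumes a Prometheus-style text exposition in which the dots in `kafkit.client.*` are rendered as underscores; the helper name `filter_client_metrics` and the scrape address in the usage note are hypothetical.

```sh
# Keep only client metric samples from a Prometheus text exposition on stdin.
# Assumption: the exporter renders the `kafkit.client.` prefix as `kafkit_client_`.
filter_client_metrics() {
  grep '^kafkit_client_'
}
```

With an exporter listening locally, this could be used as `curl -s http://127.0.0.1:9000/metrics | filter_client_metrics`, substituting whatever address the application actually exposes.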