# Troubleshooting
Start with the connection path, then check protocol compatibility, then inspect
the specific producer, consumer, admin, or security workflow.
## Connection Failures
Check:
- The broker is Apache Kafka 4.0 or newer.
- The bootstrap addresses are reachable from the application host.
- Broker `advertised.listeners` point back to hosts the application can reach.
- The configured listener type matches the client settings.
- Firewalls, Docker port mappings, and Kubernetes services expose the listener.
If bootstrap succeeds but later requests fail, advertised listeners are a common
cause. After the first connection, the client connects to the broker addresses
returned in metadata, not to the bootstrap addresses.
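To separate plain network reachability from Kafka-level problems, a TCP probe can help. The following is a minimal sketch that uses only the Rust standard library; `reachable_addrs` is a hypothetical helper, not part of `kafkit_client`, and it only opens a socket without speaking the Kafka protocol.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Resolves `host_port` (for example "broker-1:9092") and tries a plain TCP
/// connect to every resolved address. Returns the addresses that accepted a
/// connection within `timeout`. An `Err` means the input could not be
/// resolved at all (bad syntax or DNS failure).
fn reachable_addrs(host_port: &str, timeout: Duration) -> io::Result<Vec<SocketAddr>> {
    let mut reachable = Vec::new();
    for addr in host_port.to_socket_addrs()? {
        // A successful connect proves only that something is listening there.
        if TcpStream::connect_timeout(&addr, timeout).is_ok() {
            reachable.push(addr);
        }
    }
    Ok(reachable)
}
```

Run it from the application host against each bootstrap address first, then against each host and port the brokers advertise. An address that resolves but returns an empty list points at firewalls, port mappings, or service exposure rather than client configuration.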
## TLS And SASL Failures
Check:
- TLS settings are used only against TLS listeners.
- SASL settings are used only against SASL-enabled listeners.
- The server name matches the broker certificate.
- Custom CA, client certificate, and client key paths are readable.
- SCRAM mechanism matches the broker-side credential.
- Credentials are not accidentally pointed at a plaintext listener.
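The readability check for CA, certificate, and key files can be done before the client is constructed. This is a minimal sketch using only the standard library; `unreadable_paths` is a hypothetical helper, and it checks only that each path opens as a regular file, not that the contents are valid PEM or match the broker.

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

/// Tries to open every path for reading and returns the ones that failed,
/// paired with the underlying error. An empty result means every path could
/// be opened and refers to a regular file.
fn unreadable_paths<P: AsRef<Path>>(paths: &[P]) -> Vec<(PathBuf, io::Error)> {
    let mut failures = Vec::new();
    for path in paths {
        let path = path.as_ref();
        match File::open(path).and_then(|file| file.metadata()) {
            Ok(meta) if meta.is_file() => {}
            // Opened, but it is a directory or another non-file entry.
            Ok(_) => failures.push((
                path.to_path_buf(),
                io::Error::new(io::ErrorKind::InvalidInput, "not a regular file"),
            )),
            // Missing file, permission denied, and similar OS errors.
            Err(err) => failures.push((path.to_path_buf(), err)),
        }
    }
    failures
}
```

Run the check as the same user and inside the same container as the application, since file permissions and volume mounts often differ between environments.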
## Producer Issues
Check:
- Topic metadata exists and partitions are available.
- Explicit partition numbers are valid for the topic.
- Idempotent or transactional producers use `acks = -1`.
- Transactional producers set a stable transactional id.
- The first `begin_transaction().await` can reach the transaction coordinator;
call `init_transactions().await` earlier only if you want to isolate startup
readiness failures.
- Delivery timeout, retry backoff, and request timeout are appropriate for the
cluster latency.
Call `flush().await` to tell records that are still buffered apart from records whose delivery is failing.
## Consumer Issues
Check:
- The broker supports modern consumer groups.
- The topic subscription or regex matches existing topics.
- The group coordinator is reachable.
- `AutoOffsetReset` is set for new groups with no committed offsets.
- Manual assignments reference existing topic partitions.
- Commits happen after processing the returned records.
For empty polls, check whether the group already has committed offsets at the
end of the topic, whether records are being produced to a different topic, and
whether fetch settings are too restrictive.
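The first of these checks reduces to comparing two numbers per partition. The sketch below assumes you have already obtained the committed offset and the log end offset with whatever tooling you use; `PartitionOffsets`, `EmptyPollHint`, and `empty_poll_hint` are illustrative names and are not part of `kafkit_client`.

```rust
/// Offsets for a single partition, gathered outside this snippet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PartitionOffsets {
    /// `None` when the group has never committed for this partition.
    committed: Option<i64>,
    /// Offset that the next produced record would receive.
    log_end: i64,
}

#[derive(Debug, PartialEq, Eq)]
enum EmptyPollHint {
    /// The group is at the end of the partition, so empty polls are expected.
    CaughtUp,
    /// Nothing committed yet, so the offset reset policy decides the start.
    NoCommittedOffset,
    /// This many records are waiting, so look at fetch settings or the topic name.
    RecordsPending(i64),
}

fn empty_poll_hint(offsets: PartitionOffsets) -> EmptyPollHint {
    match offsets.committed {
        None => EmptyPollHint::NoCommittedOffset,
        Some(committed) if committed >= offsets.log_end => EmptyPollHint::CaughtUp,
        Some(committed) => EmptyPollHint::RecordsPending(offsets.log_end - committed),
    }
}
```

If every partition reports `CaughtUp`, the consumer is behaving correctly and the question becomes where new records are being written.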
## Share Consumer Issues
Check:
- The cluster supports share groups.
- The share consumer subscribed to at least one topic.
- Records are acknowledged with `Accept`, `Release`, or `Reject`.
- `commit_sync().await` is called when acknowledgements should be committed.
Share groups are not the same as normal consumer groups; do not use them when
partition ownership and per-partition ordering are required.
## Admin Issues
Check:
- The principal has authorization for the admin operation.
- Topic names and config resource names are correct.
- Replication factors do not exceed available brokers.
- Config keys are valid for the resource type.
- Delete operations are allowed by cluster policy.
## Error Model
Public APIs return `kafkit_client::Result<T>`, whose error type is
`kafkit_client::Error`. Match on the top-level variants when the application
needs category-specific handling:
- `Error::Broker` means Kafka returned a broker error code. The message includes
the operation and resource context when the client has it.
- `Error::Protocol` means a response could not be decoded or the broker does
not support a required API version.
- `Error::Validation` means a public API input was rejected before the client
sent a broker request.
- `Error::Admin`, `Error::Consumer`, and `Error::Producer` cover validation and
lifecycle failures raised before or around broker requests.
- `Error::TransactionState` describes transactional state-machine failures such
as fatal producer state or an abort-required transaction.
- `Error::Internal` is reserved for lower-level IO and protocol plumbing that
does not yet have a more specific public variant. Common input validation,
broker responses, lifecycle, and transaction-state failures should use typed
variants.
Use `error.classification()` for operational decisions. `is_retriable()` means
that a retry after backoff, metadata refresh, or coordinator rediscovery may
succeed. `is_fatal()` means the client instance or transaction should not be
reused. `transaction_abort_required()` means a transactional producer must abort
the current transaction before doing any more work.
Do not retry every failure blindly. Input validation errors, unsupported broker
features, invalid regexes, unsupported assignors, and fenced transactional or
static-member state require caller or deployment changes.
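One way to turn these three signals into a retry policy is sketched below. It is self-contained and does not call `kafkit_client`: `Flags` is a hypothetical stand-in that you would fill from `is_retriable()`, `is_fatal()`, and `transaction_abort_required()`, and `NextStep`, `next_step`, and the backoff numbers are illustrative choices, not library defaults.

```rust
use std::time::Duration;

/// Stand-in for the three classification answers described above.
#[derive(Debug, Clone, Copy)]
struct Flags {
    retriable: bool,
    fatal: bool,
    abort_required: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum NextStep {
    /// Sleep for this long, then try the operation again.
    RetryAfter(Duration),
    /// Abort the current transaction before doing anything else.
    AbortTransaction,
    /// Stop using this client instance or transaction.
    Discard,
    /// Not retriable, or attempts exhausted: return the error to the caller.
    Surface,
}

/// `attempt` is zero-based: 0 means the first failure.
fn next_step(flags: Flags, attempt: u32, max_attempts: u32) -> NextStep {
    // Fatal state wins over everything else, even if also marked retriable.
    if flags.fatal {
        return NextStep::Discard;
    }
    if flags.abort_required {
        return NextStep::AbortTransaction;
    }
    if flags.retriable && attempt < max_attempts {
        // Exponential backoff: 100 ms doubled per attempt, capped at 5 s.
        let millis = 100u64.saturating_mul(1u64 << attempt.min(16)).min(5_000);
        return NextStep::RetryAfter(Duration::from_millis(millis));
    }
    NextStep::Surface
}
```

Checking the fatal and abort-required signals before the retriable one keeps a retry loop from reusing a client or transaction that can no longer make progress.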
## Diagnostics
Enable `tracing` in your application so the client emits connection, producer,
consumer, and admin spans. Include bootstrap servers, listener type, client id,
group id, topic names, and the exact operation in bug reports.