DataFusion Tracing is an extension for Apache DataFusion that helps you monitor and debug queries. It uses tracing
and OpenTelemetry to gather DataFusion metrics, trace execution steps, and preview partial query results.
Note: This is not an official Apache Software Foundation release.
§Overview
When you run queries with DataFusion Tracing enabled, it automatically adds tracing around execution steps, records all native DataFusion metrics such as execution time and output row count, lets you preview partial results for easier debugging, and integrates with OpenTelemetry for distributed tracing. This makes it simpler to understand and improve query performance.
§See it in action
Here’s what DataFusion Tracing can look like in practice:
[Screenshot: Jaeger UI]
[Screenshot: DataDog UI]
§Getting Started
§Installation
Include DataFusion Tracing in your project’s Cargo.toml:
[dependencies]
datafusion = "50.0.0"
datafusion-tracing = "50.0.2"
§Compatibility note
The ellipsis truncation indicator in pretty_format_compact_batch is disabled in this version because it requires comfy-table >= 7.1.4, while Apache Arrow currently pins comfy-table to 7.1.2 to preserve its MSRV. Context: comfy-table 7.2.0 bumped MSRV to Rust 1.85 while Arrow remains at 1.84. See arrow-rs issue #8243 and PR #8244. Arrow used an exact pin rather than ~7.1, which would also preserve MSRV while allowing 7.1.x (including 7.1.4). We will re-enable it once Arrow relaxes the pin to allow >= 7.1.4.
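To make the difference between the two requirement forms concrete, here is a minimal sketch that models only the matching rule, not Cargo's resolver. The `Version` struct and the `v`, `matches_exact`, and `matches_tilde` helpers are hypothetical names introduced for illustration. It assumes the usual Cargo semantics: an exact pin (written `=7.1.2`) accepts one version only, while `~7.1` accepts any 7.1.x patch release.

```rust
// Illustrative model of two Cargo version requirements. This is not Cargo's
// resolver, only the matching rule for the two forms discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

// Hypothetical shorthand constructor.
const fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

/// Exact pin, written `=7.1.2` in Cargo.toml: only that one version matches.
fn matches_exact(candidate: Version, pin: Version) -> bool {
    candidate == pin
}

/// Tilde requirement with major and minor given, written `~7.1`:
/// any patch release of that minor line matches (>=7.1.0, <7.2.0).
fn matches_tilde(candidate: Version, major: u64, minor: u64) -> bool {
    candidate.major == major && candidate.minor == minor
}

fn main() {
    for candidate in [v(7, 1, 2), v(7, 1, 4), v(7, 2, 0)] {
        println!(
            "{}.{}.{}: exact pin = {}, ~7.1 = {}",
            candidate.major,
            candidate.minor,
            candidate.patch,
            matches_exact(candidate, v(7, 1, 2)),
            matches_tilde(candidate, 7, 1),
        );
    }
}
```

Under these rules 7.1.4 is rejected by the exact pin but accepted by `~7.1`, and 7.2.0 (the release that raised the MSRV) is rejected by both.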
§Quick Start Example
use datafusion::{
    arrow::{array::RecordBatch, util::pretty::pretty_format_batches},
    error::Result,
    execution::SessionStateBuilder,
    prelude::*,
};
use datafusion_tracing::{
    instrument_with_info_spans, pretty_format_compact_batch, InstrumentationOptions,
};
use std::sync::Arc;
use tracing::field;

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize tracing subscriber as usual
    // (See examples/otlp.rs for a complete example).

    // Set up tracing options (you can customize these).
    let options = InstrumentationOptions::builder()
        .record_metrics(true)
        .preview_limit(5)
        .preview_fn(Arc::new(|batch: &RecordBatch| {
            pretty_format_compact_batch(batch, 64, 3, 10).map(|fmt| fmt.to_string())
        }))
        .add_custom_field("env", "production")
        .add_custom_field("region", "us-west")
        .build();

    let instrument_rule = instrument_with_info_spans!(
        options: options,
        env = field::Empty,
        region = field::Empty,
    );

    let session_state = SessionStateBuilder::new()
        .with_default_features()
        .with_physical_optimizer_rule(instrument_rule)
        .build();

    let ctx = SessionContext::new_with_state(session_state);

    let results = ctx.sql("SELECT 1").await?.collect().await?;
    println!(
        "Query Results:\n{}",
        pretty_format_batches(results.as_slice())?
    );

    Ok(())
}
A more complete example can be found in the examples directory.
§Optimizer rule ordering (put instrumentation last)
Always register the instrumentation rule last in your physical optimizer chain.
- Many optimizer rules identify nodes using as_any().downcast_ref::<ConcreteExec>(). Since instrumentation wraps each node in a private InstrumentedExec, those downcasts won’t match if instrumentation runs first, causing rules to be skipped or, in code that assumes success, to panic.
- Some rules may rewrite parts of the plan after instrumentation. While InstrumentedExec re-wraps many common mutations, placing the rule last guarantees full, consistent coverage regardless of other rules’ behaviors.
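The downcast problem described in the first point can be reproduced with the standard library alone. The sketch below does not use DataFusion's types: `Plan`, `FilterNode`, `Wrapper`, and `is_filter` are hypothetical stand-ins for an execution plan trait, a concrete node, the instrumentation wrapper, and a rule's type check.

```rust
use std::any::Any;

// Hypothetical stand-in for an execution plan trait (not DataFusion's API).
trait Plan {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical concrete node that an optimizer rule wants to recognize.
struct FilterNode;

impl Plan for FilterNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical wrapper playing the role of the instrumentation node.
struct Wrapper {
    inner: Box<dyn Plan>,
}

impl Plan for Wrapper {
    fn as_any(&self) -> &dyn Any {
        // Exposes the wrapper itself, so the wrapped type is hidden.
        self
    }
}

// A rule-style check: succeeds only if the node *is* a FilterNode.
fn is_filter(plan: &dyn Plan) -> bool {
    plan.as_any().downcast_ref::<FilterNode>().is_some()
}

fn main() {
    let bare = FilterNode;
    let wrapped = Wrapper {
        inner: Box::new(FilterNode),
    };

    println!("bare node recognized:    {}", is_filter(&bare));
    println!("wrapped node recognized: {}", is_filter(&wrapped));
    println!("inner node recognized:   {}", is_filter(wrapped.inner.as_ref()));
}
```

The check succeeds on the bare node and on the wrapper's inner node but fails on the wrapper itself, which is why a rule that runs after wrapping would silently skip the node.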
Why is InstrumentedExec private?
- To prevent downstream code from downcasting to or unwrapping the wrapper, which would be brittle and force long-term compatibility constraints on its internals. The public contract is the optimizer rule, not the concrete node.
How to ensure it is last:
- When chaining:
    builder
        .with_physical_optimizer_rule(rule_a)
        .with_physical_optimizer_rule(rule_b)
        .with_physical_optimizer_rule(instrument_rule)
- Or collect:
    builder.with_physical_optimizer_rules(vec![..., instrument_rule])
Macros§
- instrument_with_debug_spans - Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the debug level.
- instrument_with_error_spans - Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the error level.
- instrument_with_info_spans - Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the info level.
- instrument_with_spans - Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan.
- instrument_with_trace_spans - Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the trace level.
- instrument_with_warn_spans - Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the warn level.
Structs§
- InstrumentationOptions - Configuration options for instrumented execution plans.
Functions§
- pretty_format_compact_batch - Formats a RecordBatch as a neatly aligned ASCII table, constraining the total width to max_width. Columns are dynamically resized or truncated, and columns that cannot fit within the given width may be dropped.