Crate datafusion_tracing


DataFusion Tracing is an extension for Apache DataFusion that helps you monitor and debug queries. It uses tracing and OpenTelemetry to gather DataFusion metrics, trace execution steps, and preview partial query results.

Note: This is not an official Apache Software Foundation release.

§Overview

With DataFusion Tracing enabled, every query is instrumented automatically: execution steps are wrapped in tracing spans, all native DataFusion metrics (such as execution time and output row count) are recorded, partial results can be previewed for easier debugging, and the resulting traces integrate with OpenTelemetry for distributed tracing. Together, these make it simpler to understand and improve query performance.

§See it in action

Here’s what DataFusion Tracing can look like in practice:

[Screenshot: Jaeger UI]

[Screenshot: DataDog UI]

§Getting Started

§Installation

Include DataFusion Tracing in your project’s Cargo.toml:

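[dependencies]
datafusion = "52.0.0"
datafusion-tracing = "52.0.0"
# The Quick Start example below also uses these crates (versions are assumptions):
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
tracing = "0.1"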

§Quick Start Example

use datafusion::{
    arrow::{array::RecordBatch, util::pretty::pretty_format_batches},
    error::Result,
    execution::SessionStateBuilder,
    prelude::*,
};
use datafusion_tracing::{
    instrument_rules_with_info_spans, instrument_with_info_spans,
    pretty_format_compact_batch, InstrumentationOptions, RuleInstrumentationOptions,
};
use std::sync::Arc;
use tracing::field;

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize tracing subscriber as usual
    // (See examples/otlp.rs for a complete example).

    // Set up execution plan tracing options (you can customize these).
    let exec_options = InstrumentationOptions::builder()
        .record_metrics(true)
        .preview_limit(5)
        .preview_fn(Arc::new(|batch: &RecordBatch| {
            pretty_format_compact_batch(batch, 64, 3, 10).map(|fmt| fmt.to_string())
        }))
        .add_custom_field("env", "production")
        .add_custom_field("region", "us-west")
        .build();

    let instrument_rule = instrument_with_info_spans!(
        options: exec_options,
        env = field::Empty,
        region = field::Empty,
    );

    let session_state = SessionStateBuilder::new()
        .with_default_features()
        .with_physical_optimizer_rule(instrument_rule)
        .build();

    // Instrument all rules (analyzer, logical optimizer, physical optimizer).
    // Physical plan creation tracing is enabled automatically when physical_optimizer is set.
    let rule_options = RuleInstrumentationOptions::full().with_plan_diff();
    let session_state = instrument_rules_with_info_spans!(
        options: rule_options,
        state: session_state
    );

    let ctx = SessionContext::new_with_state(session_state);

    // Execute a query - the entire lifecycle is now traced:
    // SQL Parsing -> Logical Plan -> Analyzer Rules -> Optimizer Rules ->
    // Physical Plan Creation -> Physical Optimizer Rules -> Execution
    let results = ctx.sql("SELECT 1").await?.collect().await?;
    println!(
        "Query Results:\n{}",
        pretty_format_batches(results.as_slice())?
    );

    Ok(())
}

A more complete example can be found in the examples directory.

Macros§

instrument_rules_with_debug_spans
Instruments a SessionState with DEBUG-level tracing spans.
instrument_rules_with_error_spans
Instruments a SessionState with ERROR-level tracing spans.
instrument_rules_with_info_spans
Instruments a SessionState with INFO-level tracing spans.
instrument_rules_with_spans
Instruments a SessionState with tracing spans for all rule phases.
instrument_rules_with_trace_spans
Instruments a SessionState with TRACE-level tracing spans.
instrument_rules_with_warn_spans
Instruments a SessionState with WARN-level tracing spans.
instrument_with_debug_spans
Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the debug level.
instrument_with_error_spans
Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the error level.
instrument_with_info_spans
Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the info level.
instrument_with_spans
Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan.
instrument_with_trace_spans
Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the trace level.
instrument_with_warn_spans
Constructs a new instrumentation PhysicalOptimizerRule for a DataFusion ExecutionPlan at the warn level.
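
The list above pairs one generic macro (instrument_with_spans) with level-specific variants. One way to picture that relationship is a wrapper macro that fixes the level and forwards its remaining arguments to the generic form. The sketch below shows only that pattern, using the standard library alone. Level, SketchRule, sketch_with_spans and sketch_with_info_spans are hypothetical stand-ins, and the crate's actual macro definitions may differ.

```rust
// Illustrative sketch only: every name here is hypothetical, not part of
// datafusion_tracing. It mimics "generic macro + level-specific wrapper".

/// Stand-in for `tracing::Level`, so the sketch needs no external crates.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Stand-in for the rule object a real instrumentation macro would build.
#[derive(Debug, PartialEq, Eq)]
pub struct SketchRule {
    pub level: Level,
    pub fields: Vec<&'static str>,
}

/// Generic form: the caller passes the level, then any span field names.
macro_rules! sketch_with_spans {
    ($level:expr $(, $field:ident)* $(,)?) => {
        SketchRule {
            level: $level,
            fields: vec![$(stringify!($field)),*],
        }
    };
}

/// Level-specific wrapper: fixes the level and forwards everything else.
macro_rules! sketch_with_info_spans {
    ($($rest:tt)*) => {
        sketch_with_spans!(Level::Info, $($rest)*)
    };
}
```

Under these assumptions, sketch_with_info_spans!(env, region) builds the same value as sketch_with_spans!(Level::Info, env, region), so the wrapper adds convenience without adding behavior.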

Structs§

InstrumentationOptions
Configuration options for instrumented execution plans.
RuleInstrumentationOptions
Configuration options for instrumented DataFusion rules (Analyzer, Optimizer, Physical Optimizer).

Functions§

pretty_format_compact_batch
Formats a RecordBatch as a neatly aligned ASCII table, constraining the total width to max_width. Columns are dynamically resized or truncated, and columns that cannot fit within the given width may be dropped.
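
To make the width-constrained behavior described above more concrete, here is a minimal standard-library sketch of one possible strategy: shrink the widest column until the table fits, then drop trailing columns if it still does not. It operates on plain string rows rather than a RecordBatch. The names format_compact, fit, render_line, total_width and min_col_width are hypothetical, and the crate's real algorithm may resize or drop columns differently.

```rust
// Illustrative sketch only: NOT the implementation of
// `pretty_format_compact_batch`. All names below are hypothetical.

/// Rendered width of a table: each column is " cell |", plus one leading "|".
fn total_width(widths: &[usize]) -> usize {
    widths.iter().map(|w| w + 3).sum::<usize>() + 1
}

/// Pads `cell` to `width` characters, or truncates it and marks the cut with '~'.
fn fit(cell: &str, width: usize) -> String {
    if cell.chars().count() <= width {
        format!("{cell:<width$}")
    } else {
        let kept: String = cell.chars().take(width.saturating_sub(1)).collect();
        if width == 0 { kept } else { format!("{kept}~") }
    }
}

/// Renders one row; cells beyond the retained columns are skipped.
fn render_line(cells: &[&str], widths: &[usize]) -> String {
    let mut line = String::from("|");
    for (cell, &width) in cells.iter().zip(widths) {
        line.push(' ');
        line.push_str(&fit(cell, width));
        line.push_str(" |");
    }
    line
}

/// Formats `header` and `rows` as an ASCII table that aims to stay within
/// `max_width` characters per line.
pub fn format_compact(
    header: &[&str],
    rows: &[Vec<&str>],
    max_width: usize,
    min_col_width: usize,
) -> String {
    // Natural width of each column: its widest cell, header included.
    let mut widths: Vec<usize> = header.iter().map(|h| h.chars().count()).collect();
    for row in rows {
        for (width, cell) in widths.iter_mut().zip(row) {
            *width = (*width).max(cell.chars().count());
        }
    }

    // Step 1: shrink the widest column, one character at a time,
    // but never below `min_col_width`.
    while total_width(&widths) > max_width {
        let Some((idx, widest)) = widths.iter().copied().enumerate().max_by_key(|&(_, w)| w)
        else {
            break;
        };
        if widest <= min_col_width {
            break;
        }
        widths[idx] -= 1;
    }

    // Step 2: if the table is still too wide, drop trailing columns
    // (always keeping at least one).
    while total_width(&widths) > max_width && widths.len() > 1 {
        widths.pop();
    }

    let mut lines = vec![render_line(header, &widths)];
    lines.extend(rows.iter().map(|row| render_line(row, &widths)));
    lines.join("\n")
}
```

For example, calling format_compact with a three-column table and a max_width of 30 truncates the long last column, while a max_width of 20 drops that column entirely. Shrinking before dropping is a design choice of this sketch: it keeps as many columns visible as possible at the cost of truncated cells.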