
Module span_record


Parquet-compatible span record schema (Sprint 40 - Golden Thread Core)

This module defines the canonical schema for storing OpenTelemetry spans in trueno-db’s Parquet-backed storage. The schema is optimized for:

  • Query performance: Flat structure for predicate pushdown
  • Compression: Columnar layout with RLE/dictionary encoding
  • W3C Trace Context: Native support for traceparent format
  • Causal ordering: Lamport logical clock for happens-before
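As an illustration of the W3C Trace Context point, the sketch below decodes a version-00 `traceparent` header into the fixed-size byte arrays the schema stores. The names `parse_traceparent` and `decode_hex` are hypothetical helpers written for this example, not part of this module's API.

```rust
/// Hypothetical helper: decode exactly `N` bytes from a lowercase hex string.
fn decode_hex<const N: usize>(s: &str) -> Option<[u8; N]> {
    // W3C Trace Context requires lowercase hex; reject anything else up front.
    let valid = s.len() == N * 2
        && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if !valid {
        return None;
    }
    let mut out = [0u8; N];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[i * 2..i * 2 + 2], 16).ok()?;
    }
    Some(out)
}

/// Hypothetical helper: parse `00-<trace-id>-<parent-id>-<flags>` into
/// (trace_id, span_id, flags). Only version `00` is accepted in this sketch.
pub fn parse_traceparent(header: &str) -> Option<([u8; 16], [u8; 8], u8)> {
    let mut parts = header.trim().split('-');
    let version = parts.next()?;
    let trace_id = decode_hex::<16>(parts.next()?)?;
    let span_id = decode_hex::<8>(parts.next()?)?;
    let flags = decode_hex::<1>(parts.next()?)?[0];
    // Version 00 has exactly four fields.
    if version != "00" || parts.next().is_some() {
        return None;
    }
    // All-zero trace and parent IDs are invalid per the specification.
    if trace_id == [0u8; 16] || span_id == [0u8; 8] {
        return None;
    }
    Some((trace_id, span_id, flags))
}
```

Because both IDs decode to fixed-width arrays, they can be written to the `FIXED_LEN_BYTE_ARRAY` columns without further conversion.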

§Design Principles

  1. Flat Structure: Parquet performs best with flat schemas (no deep nesting)
  2. Fixed-size IDs: trace_id (16 bytes), span_id (8 bytes) for efficient indexing
  3. JSON Attributes: Flexible key-value pairs stored as JSON string
  4. Timestamp Precision: Nanosecond-resolution timestamps, so microsecond-scale spans are not lost to rounding
  5. Logical Causality: Lamport clock field for provable ordering
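The Lamport clock in principle 5 can be sketched as follows. This is a minimal, generic Lamport clock, not necessarily how this crate maintains `logical_clock`; the type `LamportClock` and its methods are assumptions made for illustration.

```rust
/// Hypothetical Lamport clock; the value it returns is what a span
/// would store in its `logical_clock` field.
#[derive(Debug, Default)]
pub struct LamportClock {
    counter: u64,
}

impl LamportClock {
    /// Local event (for example, a span starting): advance by one.
    pub fn tick(&mut self) -> u64 {
        self.counter += 1;
        self.counter
    }

    /// Receive event: jump past the sender's timestamp, so the receive
    /// is ordered after the send (happens-before).
    pub fn observe(&mut self, remote: u64) -> u64 {
        self.counter = self.counter.max(remote) + 1;
        self.counter
    }
}
```

If event A happens-before event B, A's timestamp is strictly smaller than B's, which is what makes `ORDER BY logical_clock` a causally consistent ordering within a trace. The converse does not hold: a smaller timestamp alone does not prove causality.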

§Parquet Schema Mapping

SpanRecord (Rust)              →  Parquet Physical Type
├─ trace_id: [u8; 16]          →  FIXED_LEN_BYTE_ARRAY(16)
├─ span_id: [u8; 8]            →  FIXED_LEN_BYTE_ARRAY(8)
├─ parent_span_id: Option<..>  →  FIXED_LEN_BYTE_ARRAY(8), nullable=true
├─ span_name: String           →  BYTE_ARRAY (UTF8)
├─ span_kind: SpanKind         →  INT32 (enum)
├─ start_time_nanos: u64       →  INT64
├─ end_time_nanos: u64         →  INT64
├─ logical_clock: u64          →  INT64 (Lamport timestamp)
├─ duration_nanos: u64         →  INT64 (computed: end - start)
├─ status_code: StatusCode     →  INT32 (enum)
├─ status_message: String      →  BYTE_ARRAY (UTF8)
├─ attributes_json: String     →  BYTE_ARRAY (UTF8) - JSON map
├─ resource_json: String       →  BYTE_ARRAY (UTF8) - JSON map
├─ process_id: u32             →  INT32
└─ thread_id: u64              →  INT64
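Two details of this mapping are worth a sketch: enums are stored as their integer discriminant, and Rust `u64` values land in Parquet's signed 64-bit `INT64` physical type. The code below is illustrative only. The `StatusCode` discriminants follow the OpenTelemetry numbering (with `Error = 2` matching the query examples in this module), and `duration_nanos` and `to_parquet_int64` are hypothetical helpers, not functions confirmed to exist in this crate.

```rust
/// Assumed discriminants, following the OpenTelemetry status codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
pub enum StatusCode {
    Unset = 0,
    Ok = 1,
    Error = 2,
}

/// Hypothetical helper: computed duration column. Saturates to 0 instead
/// of wrapping if a clock anomaly makes `end` earlier than `start`.
pub fn duration_nanos(start_time_nanos: u64, end_time_nanos: u64) -> u64 {
    end_time_nanos.saturating_sub(start_time_nanos)
}

/// Hypothetical helper: checked conversion of a u64 into the signed
/// INT64 physical type. Returns None for values above i64::MAX.
pub fn to_parquet_int64(value: u64) -> Option<i64> {
    i64::try_from(value).ok()
}
```

Nanosecond Unix timestamps stay below `i64::MAX` until the year 2262, so the checked conversion only fails for corrupt or synthetic values.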

§Query Patterns

The schema is optimized for these access patterns:

-- Critical path queries (p95 <20ms for 1M spans)
SELECT * FROM spans WHERE trace_id = ?
SELECT * FROM spans WHERE trace_id = ? ORDER BY logical_clock
SELECT * FROM spans WHERE trace_id = ? AND parent_span_id IS NULL

-- Temporal range queries
SELECT * FROM spans WHERE start_time_nanos BETWEEN ? AND ?

-- Process/thread filtering
SELECT * FROM spans WHERE process_id = ? AND thread_id = ?

-- Status filtering (error analysis)
SELECT * FROM spans WHERE status_code = 2 -- ERROR
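For intuition, the first three critical-path queries correspond to the following in-memory operations. This is a sketch over a hypothetical `Span` struct that carries only the fields those queries touch; it does not use the trueno-db query engine or this module's `SpanRecord`.

```rust
/// Hypothetical subset of a span row, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub trace_id: [u8; 16],
    pub span_id: [u8; 8],
    pub parent_span_id: Option<[u8; 8]>,
    pub logical_clock: u64,
}

/// Equivalent of: WHERE trace_id = ? ORDER BY logical_clock
pub fn trace_in_causal_order(spans: &[Span], trace_id: [u8; 16]) -> Vec<&Span> {
    let mut hits: Vec<&Span> = spans.iter().filter(|s| s.trace_id == trace_id).collect();
    hits.sort_by_key(|s| s.logical_clock);
    hits
}

/// Equivalent of: WHERE trace_id = ? AND parent_span_id IS NULL
pub fn root_span(spans: &[Span], trace_id: [u8; 16]) -> Option<&Span> {
    spans
        .iter()
        .find(|s| s.trace_id == trace_id && s.parent_span_id.is_none())
}
```

In the Parquet-backed store the same filters would be evaluated by predicate pushdown on the flat `trace_id` and `parent_span_id` columns rather than by scanning rows.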

§Peer-Reviewed Foundation

  • Melnik et al. (2010). “Dremel: Interactive Analysis of Web-Scale Datasets.” Google.

    • Finding: Columnar storage with nested encoding enables aggregation queries over trillion-row tables in seconds
    • Application: Parquet schema optimized for predicate pushdown
  • Lamb et al. (2012). “The Vertica Analytic Database.” VLDB.

    • Finding: Column-store compression (RLE, dictionary) achieves 10-50× reduction
    • Application: Fixed-size IDs and enums for optimal compression

§Structs

SpanRecord
Span record compatible with trueno-db Parquet storage

§Enums

SpanKind
Span kind (OpenTelemetry semantic convention)
StatusCode
Span status code (OpenTelemetry semantic convention)