dial9-trace-format 0.4.1

Self-describing binary trace format with schema registry

dial9-trace-format

A binary trace format for tokio runtime telemetry, usable for any structured event stream.

See SPEC.md for the wire format specification.

Design principles

  1. Self-describing. Schemas are embedded in the stream so readers don't need out-of-band type definitions. No code generation or IDL compiler is needed to read a stream.
  2. Relatively compact. Events average ~15 bytes raw and ~3 bytes gzipped on a real-world tokio trace. The format isn't the absolute smallest possible; it trades a few bytes of overhead for the properties below.
  3. Extremely fast to write. The encoder does no allocations per event and uses fixed-width or LEB128 (variable-length integer) fields with no framing overhead beyond a 3-byte header. Benchmarks show ~48M events/s encode throughput on a single core.
  4. Compressible. Schemas are written once; events are pure data with no repeated field names or tags. Timestamps are delta-encoded as 3-byte offsets. This structure compresses well: gzip reduces a typical trace to ~20% of its raw size (and beats a hand-tuned bespoke format by 1.4x after compression).
  5. Simple enough to port. The entire wire format is ~5 frame types, LEB128 integers, and little-endian fixed-width fields. A JavaScript decoder is under 200 lines.

Numbers

On a 42k-event tokio runtime trace (format_comparison example, release mode):

| Format | Raw | Gzipped |
|---|---|---|
| dial9-trace-format | 632 KB (14.8 B/event) | 129 KB (3.0 B/event) |
| Hand-tuned bespoke format | 586 KB (13.7 B/event) | 177 KB (4.1 B/event) |

The self-describing format is ~8% larger raw but ~27% smaller after gzip (equivalently, the bespoke output is ~37% larger). Its regular structure compresses better than the bespoke format's variable-length tag soup.

At 42k events per trace and a 1-second collection interval (86,400 traces per day), a continuously running agent produces roughly 55 GB/day raw or 11 GB/day gzipped.
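The following sketch works through the arithmetic behind the size comparison and the daily-volume estimate. It uses the table values and assumes 1 KB = 1,000 bytes and one trace written per second. The function names are illustrative only.

```rust
/// Daily output in GB for an agent writing one trace every `interval_secs`.
/// Assumes 1 KB = 1,000 bytes and 1 GB = 10^9 bytes.
fn gb_per_day(kb_per_trace: f64, interval_secs: f64) -> f64 {
    let traces_per_day = 86_400.0 / interval_secs;
    kb_per_trace * 1_000.0 * traces_per_day / 1e9
}

/// Size of `a` relative to `b`: positive means `a` is larger.
fn relative_size(a: f64, b: f64) -> f64 {
    a / b - 1.0
}

fn main() {
    // Figures from the 42k-event comparison table (KB per trace).
    let (dial9_raw, dial9_gz) = (632.0, 129.0);
    let (bespoke_raw, bespoke_gz) = (586.0, 177.0);

    // About +7.8% raw and -27.1% gzipped relative to the bespoke format.
    println!("raw vs bespoke:     {:+.1}%", 100.0 * relative_size(dial9_raw, bespoke_raw));
    println!("gzipped vs bespoke: {:+.1}%", 100.0 * relative_size(dial9_gz, bespoke_gz));

    // About 54.6 GB/day raw and 11.1 GB/day gzipped at one trace per second.
    println!("raw:     {:.1} GB/day", gb_per_day(dial9_raw, 1.0));
    println!("gzipped: {:.1} GB/day", gb_per_day(dial9_gz, 1.0));
}
```

Changing the `interval_secs` argument shows how the daily volume scales with a longer collection interval.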

Throughput (criterion, 1M mixed events, single core):

| Operation | Events/s |
|---|---|
| Encode | ~48M |
| Decode (visitor, zero-alloc) | ~30M |
| Decode (zero-copy ref) | ~7M |
| Decode (owned) | ~6M |

Usage

Derive macro

For event types known at compile time, use #[derive(TraceEvent)]:

use dial9_trace_format::{TraceEvent, StackFrames};
use dial9_trace_format::encoder::Encoder;
use dial9_trace_format::decoder::{Decoder, DecodedFrame};

#[derive(TraceEvent)]
struct PollStart {
    #[traceevent(timestamp)]
    timestamp_ns: u64,
    worker_id: u64,
    task_id: u64,
}

#[derive(TraceEvent)]
struct CpuSample {
    #[traceevent(timestamp)]
    timestamp_ns: u64,
    tid: u32,
    frames: StackFrames,
}

// Encode
let mut enc = Encoder::new();
enc.write(&PollStart { timestamp_ns: 1_000_000, worker_id: 0, task_id: 42 });
enc.write(&CpuSample {
    timestamp_ns: 1_050_000, tid: 12345,
    frames: StackFrames(vec![0x5555_1234, 0x5555_0a00]),
});
let bytes = enc.finish();

// Decode
let mut dec = Decoder::new(&bytes).unwrap();
for frame in dec.decode_all() {
    match frame {
        DecodedFrame::Schema(s) => println!("schema: {}", s.name),
        DecodedFrame::Event { type_id, timestamp_ns, values } => {
            let name = &dec.registry().get(type_id).unwrap().name;
            println!("{name} @ {timestamp_ns:?}: {values:?}");
        }
        DecodedFrame::StringPool(entries) => println!("{} pool entries", entries.len()),
    }
}

The #[traceevent(timestamp)] attribute marks a u64 field as the event's timestamp. The timestamp is encoded as a u24 nanosecond delta in the event header rather than as a regular field. Because each delta is an exact integer, timestamps keep full nanosecond precision with no accumulated error. The encoder automatically emits TimestampReset frames when a delta does not fit in 24 bits (2^24 ns, about 16.8 ms).
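The minimal sketch below makes the timestamp scheme concrete with u24 delta encoding and resets. It is not the crate's encoder. `TsEncoder`, `TsOut`, and the choice to emit a zero delta right after a reset are hypothetical, and SPEC.md defines the real frame layout. The sketch only shows why decoding is exact: each timestamp is rebuilt by integer addition from the last reset, so no rounding error can build up.

```rust
/// Largest delta a u24 can hold: 2^24 - 1 ns (about 16.78 ms).
const MAX_DELTA_NS: u64 = (1 << 24) - 1;

/// Hypothetical sketch output; SPEC.md defines the real frame layout.
#[derive(Debug, PartialEq)]
enum TsOut {
    /// New absolute base timestamp in nanoseconds.
    Reset(u64),
    /// 3-byte little-endian delta from the previous timestamp.
    Delta([u8; 3]),
}

struct TsEncoder {
    last: Option<u64>,
}

impl TsEncoder {
    fn new() -> Self {
        TsEncoder { last: None }
    }

    /// Emits a reset when there is no base yet, the gap exceeds 24 bits,
    /// or the clock went backwards; otherwise emits only the delta.
    fn encode(&mut self, ts: u64, out: &mut Vec<TsOut>) {
        let delta = match self.last {
            Some(prev) if ts >= prev && ts - prev <= MAX_DELTA_NS => ts - prev,
            _ => {
                out.push(TsOut::Reset(ts));
                0
            }
        };
        let b = (delta as u32).to_le_bytes();
        out.push(TsOut::Delta([b[0], b[1], b[2]]));
        self.last = Some(ts);
    }
}

/// Rebuilds absolute timestamps by integer addition, so there is no drift.
fn decode(frames: &[TsOut]) -> Vec<u64> {
    let mut current = 0u64;
    let mut out = Vec::new();
    for frame in frames {
        match frame {
            TsOut::Reset(base) => current = *base,
            TsOut::Delta(b) => {
                current += u64::from(u32::from_le_bytes([b[0], b[1], b[2], 0]));
                out.push(current);
            }
        }
    }
    out
}

fn main() {
    let input = [1_000_000u64, 1_050_000, 40_000_000]; // last gap: 38.95 ms
    let mut enc = TsEncoder::new();
    let mut frames = Vec::new();
    for ts in input {
        enc.encode(ts, &mut frames);
    }
    assert_eq!(decode(&frames), input);

    // One reset for the first event and one for the 38.95 ms gap.
    let resets = frames.iter().filter(|f| matches!(f, TsOut::Reset(_))).count();
    println!("{resets} resets in {} frames", frames.len()); // 2 resets in 5 frames
}
```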

Unsigned integer fields use fixed-width little-endian encoding (u8, u16, u32) or LEB128 (u64); i64 is a fixed 8-byte little-endian field. The derive macro handles the mapping automatically.
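For reference, unsigned LEB128 stores 7 bits per byte, least-significant group first, with the high bit set on every byte except the last. The generic sketch below shows that scheme alongside a fixed-width u32 write. The function names are hypothetical and are not part of dial9-trace-format's public API.

```rust
/// Unsigned LEB128: 7 bits per byte, least-significant group first,
/// high bit set on every byte except the last.
fn encode_leb128(mut v: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Returns the value and the number of bytes consumed, or None if the
/// input ends before a terminating byte within the 10-byte limit.
/// (Overflow bits in a 10th byte are not validated in this sketch.)
fn decode_leb128(buf: &[u8]) -> Option<(u64, usize)> {
    let mut v = 0u64;
    for (i, &byte) in buf.iter().take(10).enumerate() {
        v |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((v, i + 1));
        }
    }
    None
}

/// Fixed-width little-endian write, as used for u8/u16/u32 fields.
fn encode_u32_fixed(v: u32, out: &mut Vec<u8>) {
    out.extend_from_slice(&v.to_le_bytes());
}

fn main() {
    let mut buf = Vec::new();
    // 300: low 7 bits are 0x2c (sent with the 0x80 continuation bit), then 300 >> 7 = 0x02.
    encode_leb128(300, &mut buf);
    assert_eq!(buf, [0xac, 0x02]);
    assert_eq!(decode_leb128(&buf), Some((300, 2)));

    let mut fixed = Vec::new();
    encode_u32_fixed(300, &mut fixed);
    assert_eq!(fixed, [0x2c, 0x01, 0x00, 0x00]); // always 4 bytes
}
```

Small values such as worker or task IDs below 128 take a single LEB128 byte. This is why the variable-length encoding suits u64 fields whose typical values are small.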

Fields of type Option<T> are encoded as optional: 1 byte (0x00) when None, or 1 byte (0x01) followed by the inner encoding when Some. On decode, missing fields (not present in the wire schema) default to None. This supports schema evolution across feature flags and reduces wire size for frequently absent values.
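Below is a sketch of the presence-byte scheme for an `Option<u32>` field. It assumes the inner value uses the fixed 4-byte little-endian encoding listed for u32 in the Field types table. The helpers are hypothetical illustrations, and the sketch does not show how the decoder defaults fields that are absent from the wire schema.

```rust
/// Presence-byte encoding for an Option<u32> field:
/// 0x00 for None; 0x01 followed by 4 little-endian bytes for Some.
fn encode_opt_u32(v: Option<u32>, out: &mut Vec<u8>) {
    match v {
        None => out.push(0x00),
        Some(x) => {
            out.push(0x01);
            out.extend_from_slice(&x.to_le_bytes());
        }
    }
}

/// Returns the decoded value and bytes consumed, or None if the input is
/// truncated or the presence byte is neither 0x00 nor 0x01.
fn decode_opt_u32(buf: &[u8]) -> Option<(Option<u32>, usize)> {
    match *buf.first()? {
        0x00 => Some((None, 1)),
        0x01 if buf.len() >= 5 => {
            let x = u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]);
            Some((Some(x), 5))
        }
        _ => None,
    }
}

fn main() {
    let mut buf = Vec::new();
    encode_opt_u32(None, &mut buf); // 1 byte
    encode_opt_u32(Some(7), &mut buf); // 5 bytes
    assert_eq!(buf, [0x00, 0x01, 7, 0, 0, 0]);

    assert_eq!(decode_opt_u32(&buf), Some((None, 1)));
    assert_eq!(decode_opt_u32(&buf[1..]), Some((Some(7), 5)));
}
```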

Manual schema registration

For event types whose fields are determined at runtime (e.g., user-defined metrics, kernel tracepoints), register schemas by name:

use dial9_trace_format::encoder::{Encoder, Schema};
use dial9_trace_format::schema::FieldDef;
use dial9_trace_format::types::{FieldType, FieldValue};

let mut enc = Encoder::new();

// Fields determined at runtime (e.g., from a config file)
let schema = enc.register_schema("CustomMetric", vec![
    FieldDef { name: "name".into(), field_type: FieldType::String },
    FieldDef { name: "value".into(), field_type: FieldType::Varint },
]).unwrap();

// First value is always the timestamp (encoded in the event header)
enc.write_event(&schema, &[
    FieldValue::Varint(1_000_000),       // timestamp_ns
    FieldValue::String("request_count".into()),
    FieldValue::Varint(42),
]).unwrap();

// Schemas are portable — pass the same handle to a different encoder
let mut enc2 = Encoder::new();
enc2.write_event(&schema, &[
    FieldValue::Varint(2_000_000),
    FieldValue::String("error_count".into()),
    FieldValue::Varint(3),
]).unwrap();

String interning

Frequently repeated strings can be interned to avoid encoding them multiple times:

use dial9_trace_format::encoder::Encoder;

let mut enc = Encoder::new();
let id = enc.intern_string("my_function").unwrap();
// Use `id` as a FieldValue::PooledString in PooledString fields of subsequent events
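Conceptually, interning maps each distinct string to a small integer ID. The string bytes are written once, and later events carry only the ID. The hypothetical `StringPool` below sketches that bookkeeping with a `HashMap`. How the crate actually assigns IDs and when it flushes pool frames may differ; the decoder example above matches on `DecodedFrame::StringPool`.

```rust
use std::collections::HashMap;

/// Hypothetical interner: each distinct string gets a u32 ID the first
/// time it is seen; later lookups return the same ID.
#[derive(Default)]
struct StringPool {
    ids: HashMap<String, u32>,
    /// Newly interned entries that would still need to be written out.
    pending: Vec<(u32, String)>,
}

impl StringPool {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.ids.len() as u32;
        self.ids.insert(s.to_owned(), id);
        self.pending.push((id, s.to_owned()));
        id
    }

    /// Hands over the entries a real encoder would emit in a pool frame.
    fn take_pending(&mut self) -> Vec<(u32, String)> {
        std::mem::take(&mut self.pending)
    }
}

fn main() {
    let mut pool = StringPool::default();
    let a = pool.intern("my_function");
    let b = pool.intern("other_function");
    assert_eq!(pool.intern("my_function"), a); // no new entry
    assert_ne!(a, b);

    // Only the two distinct strings are queued for writing.
    assert_eq!(pool.take_pending().len(), 2);
    assert!(pool.take_pending().is_empty());
}
```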

Symbol table

For CPU profile stack frame symbolization, attach symbol data as schema-based events:

use dial9_trace_format::encoder::Encoder;
use dial9_trace_format::schema::FieldDef;
use dial9_trace_format::types::{FieldType, FieldValue};

struct SymEntry; // zero-sized marker type that identifies this schema in the *_for calls
let mut enc = Encoder::new();
let name_id = enc.intern_string("my_function").unwrap();
enc.register_schema_for::<SymEntry>("SymbolTableEntry", vec![
    FieldDef { name: "base_addr".into(), field_type: FieldType::Varint },
    FieldDef { name: "size".into(), field_type: FieldType::Varint },
    FieldDef { name: "symbol_name".into(), field_type: FieldType::PooledString },
]).unwrap();
enc.write_event_for::<SymEntry>(&[
    FieldValue::Varint(0x1000),
    FieldValue::Varint(256),
    FieldValue::PooledString(name_id),
]).unwrap();
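On the reading side, symbolizing a CPU sample means finding the `SymbolTableEntry` whose `[base_addr, base_addr + size)` range contains each stack-frame address. Below is a minimal sketch using a sorted `Vec` and binary search. `Symbol` and `SymbolTable` are hypothetical types. The sketch assumes that entries do not overlap and that frame addresses need no further relocation.

```rust
/// One decoded SymbolTableEntry; field names mirror the schema above.
struct Symbol {
    base_addr: u64,
    size: u64,
    name: String,
}

/// Hypothetical lookup table over non-overlapping symbol ranges.
struct SymbolTable {
    syms: Vec<Symbol>, // sorted by base_addr
}

impl SymbolTable {
    fn new(mut syms: Vec<Symbol>) -> Self {
        syms.sort_by_key(|s| s.base_addr);
        SymbolTable { syms }
    }

    /// Returns the symbol whose [base_addr, base_addr + size) contains `addr`.
    fn resolve(&self, addr: u64) -> Option<&str> {
        // Count of symbols starting at or before `addr`; the last one is the only candidate.
        let n = self.syms.partition_point(|s| s.base_addr <= addr);
        let sym = &self.syms[n.checked_sub(1)?];
        (addr - sym.base_addr < sym.size).then(|| sym.name.as_str())
    }
}

fn main() {
    let sym = |base_addr, size, name: &str| Symbol { base_addr, size, name: name.to_owned() };
    let table = SymbolTable::new(vec![
        sym(0x2000, 0x80, "other_function"),
        sym(0x1000, 256, "my_function"),
    ]);

    // Symbolize a stack, falling back to "??" for unknown addresses.
    let frames = [0x1010u64, 0x2004, 0x3000];
    let names: Vec<&str> = frames.iter().map(|&a| table.resolve(a).unwrap_or("??")).collect();
    assert_eq!(names, ["my_function", "other_function", "??"]);
}
```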

JavaScript reader

A decode-only JS reader is at js/decode.js:

const { TraceDecoder } = require('./js/decode.js');
const fs = require('fs');

const dec = new TraceDecoder(fs.readFileSync('trace.bin'));
dec.decodeHeader();
for (const frame of dec.decodeAll()) {
    console.log(frame);
}

Field types

| Rust type | Wire type | Notes |
|---|---|---|
| u8, u16, u32 | Fixed | 1, 2, or 4 bytes LE |
| u64 | Varint | LEB128, 1–10 bytes |
| i64 | I64 | 8 bytes LE |
| f64 | F64 | 8 bytes LE |
| bool | Bool | 1 byte |
| String | String | u32 length + UTF-8 |
| Vec<u8> | Bytes | u32 length + raw |
| StackFrames | StackFrames | u32 count + u64 LE addresses |
| Vec<(String, String)> | StringMap | u32 count + key/value pairs |

PooledString (u32 pool ID) is available via manual schema registration.
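The length-prefixed rows of the table can be read literally as byte layouts. This sketch encodes `String` and `StackFrames` values that way and decodes `StackFrames` back. The helper names are hypothetical, and SPEC.md remains the authoritative description of the wire format.

```rust
/// Reads a u32 little-endian value at `pos`, if there are enough bytes.
fn read_u32_le(buf: &[u8], pos: usize) -> Option<u32> {
    let b = buf.get(pos..pos.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// String: u32 LE byte length, then the UTF-8 bytes.
fn encode_string(s: &str, out: &mut Vec<u8>) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// StackFrames: u32 LE frame count, then one u64 LE address per frame.
fn encode_stack_frames(addrs: &[u64], out: &mut Vec<u8>) {
    out.extend_from_slice(&(addrs.len() as u32).to_le_bytes());
    for a in addrs {
        out.extend_from_slice(&a.to_le_bytes());
    }
}

/// Decodes a StackFrames field; returns the addresses and bytes consumed.
fn decode_stack_frames(buf: &[u8]) -> Option<(Vec<u64>, usize)> {
    let count = read_u32_le(buf, 0)? as usize;
    let mut pos = 4;
    let mut addrs = Vec::new();
    for _ in 0..count {
        let b = buf.get(pos..pos + 8)?;
        let mut word = [0u8; 8];
        word.copy_from_slice(b);
        addrs.push(u64::from_le_bytes(word));
        pos += 8;
    }
    Some((addrs, pos))
}

fn main() {
    let mut buf = Vec::new();
    encode_string("my_function", &mut buf);
    assert_eq!(buf.len(), 4 + 11);
    assert_eq!(read_u32_le(&buf, 0), Some(11));

    // The two frames from the CpuSample example: 4 + 2 * 8 = 20 bytes.
    let mut frames = Vec::new();
    encode_stack_frames(&[0x5555_1234, 0x5555_0a00], &mut frames);
    assert_eq!(frames.len(), 20);
    assert_eq!(decode_stack_frames(&frames), Some((vec![0x5555_1234, 0x5555_0a00], 20)));
}
```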