otlp2pipeline
Stream OpenTelemetry data to Cloudflare R2 Data Catalog, Amazon S3 Tables, or Azure ADLS Gen2.
What it does
Receives OpenTelemetry data and routes it to object storage via cloud-native pipelines. Data lands in Parquet format using a ClickHouse-inspired schema.
Cloud services handle batching and format conversion, and catalog maintenance (compaction, snapshot expiration) runs automatically. Want simple OTLP->Parquet conversion without the cloud bells and whistles? Check out otlp2parquet.
Why?
- Deploy a managed Iceberg observability backend with one command
- Query with any tool that reads Iceberg: DuckDB, Pandas, Trino, Athena, AI agents
Quickstart
Install the CLI for your favorite cloud provider:
# requires rust toolchain: `curl https://sh.rustup.rs -sSf | sh`
# Create a new project to deploy on AWS (requires AWS CLI)
# or create a new project with Cloudflare (requires the wrangler CLI)
# or create a new project with Azure (requires Azure CLI)
# see what will be created automatically
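The exact commands depend on how the CLI is distributed; here is a minimal sketch, assuming the CLI is published as a crate named `otlp2pipeline` and exposes `--provider` and `--dry-run` flags (only the `init` subcommand is confirmed below; everything else is an assumption, so check `otlp2pipeline --help`):

```bash
# Assumes the CLI is published as a crate of the same name
cargo install otlp2pipeline

# Hypothetical flags -- verify against `otlp2pipeline init --help`
otlp2pipeline init --provider aws my-telemetry         # requires AWS CLI
otlp2pipeline init --provider cloudflare my-telemetry  # requires wrangler
otlp2pipeline init --provider azure my-telemetry       # requires Azure CLI

# Preview what will be created without deploying (flag name assumed)
otlp2pipeline init --provider aws --dry-run my-telemetry
```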
Deploy to Cloudflare
Requires the wrangler CLI.
# 1. `init` a Cloudflare project as described above
# 2. Create R2 API token (Admin Read & Write)
# https://dash.cloudflare.com/?to=/:account/r2/api-token
# 3. Create pipelines
# 3a. Set token for worker to write to pipeline
# Go to https://dash.cloudflare.com/?to=/:account/api-tokens
# Create key with "Workers Pipelines: Edit" permissions
# 3b. Deploy worker defined in wrangler.toml
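A sketch of steps 2-3, assuming a pipeline-creation subcommand and a secret named `PIPELINE_TOKEN` (both assumptions); `wrangler secret put` and `wrangler deploy` are standard wrangler commands:

```bash
# 2. Export the R2 API token created in the dashboard (variable name assumed)
export R2_TOKEN=...

# 3. Create the pipelines (subcommand name is an assumption)
otlp2pipeline pipelines create

# 3a. Store the Workers Pipelines token as a worker secret (secret name assumed)
npx wrangler secret put PIPELINE_TOKEN

# 3b. Deploy the worker defined in wrangler.toml
npx wrangler deploy
```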
Deploy to AWS or Azure
Requires the AWS CLI or Azure CLI configured with appropriate credentials to create resources.
# 1. `init` an AWS/Azure project as described above
# 2. Deploy with authentication turned on
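For illustration, assuming a `deploy` subcommand (the subcommand name is an assumption; `--auth` is the flag referenced under Security):

```bash
cd my-telemetry               # project directory created by `init` (name assumed)
otlp2pipeline deploy --auth   # provision cloud resources with auth enabled
```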
Check status and connect Claude Code, Collectors, or Codex
# Verify successful deployment
# How to stream telemetry from different sources
# Query tables with DuckDB; by default data is available after ~5 minutes
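A sketch of those steps, assuming a `status` subcommand (an assumption) and DuckDB's iceberg extension; the table path and the `ServiceName` column follow the ClickHouse-style schema and are placeholders:

```bash
# Verify successful deployment (subcommand name is an assumption)
otlp2pipeline status

# Query with DuckDB's iceberg extension (table path is a placeholder)
duckdb -c "
INSTALL iceberg; LOAD iceberg;
SELECT ServiceName, count(*) AS events
FROM iceberg_scan('s3://my-table-bucket/otel_logs')
GROUP BY ServiceName;
"
```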
Cloudflare
Worker Architecture
flowchart TB
subgraph Ingest["Ingest + Store"]
OTLP[OTLP] --> W[Worker]
W --> P[Pipelines]
W --> DOs[(Durable Objects)]
P --> R2[(R2 Data Catalog / Iceberg)]
end
subgraph Query["Query"]
CLI[CLI / WebSocket] -->|real-time| W
DuckDB[🦆 DuckDB] --> R2
end
The worker uses Durable Objects for real-time RED metrics; see openapi.yaml for API details.
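For example, an OTLP/HTTP export to the worker could look like this; `/v1/logs` is the standard OTLP/HTTP logs path, and the hostname is a placeholder:

```bash
curl -X POST "https://otlp2pipeline.<account>.workers.dev/v1/logs" \
  -H "Content-Type: application/json" \
  --data @logs.json   # an OTLP/JSON ExportLogsServiceRequest payload
```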
Cloudflare-specific CLI commands
The Cloudflare deployment adds live log/trace tailing and real-time RED metrics.
# List known services
# Stream live logs
# Stream live traces
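A sketch of the commands above; the subcommand and flag names are assumptions, so check `otlp2pipeline --help` for the real ones:

```bash
otlp2pipeline services          # list known services
otlp2pipeline tail --logs       # stream live logs over WebSocket
otlp2pipeline tail --traces     # stream live traces
```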
AWS
Lambda Architecture
flowchart TB
subgraph Ingest["Ingest + Store"]
OTLP[OTLP] --> L[Lambda]
L --> F[Data Firehose]
F --> S3[(S3 Tables / Iceberg)]
end
subgraph Query["Query"]
DuckDB[🦆 DuckDB] --> S3
Athena[Athena / Trino] --> S3
end
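Once data lands, the tables can be listed with the AWS CLI's `s3tables` commands and queried through Athena; the bucket ARN, database, table, and column names below are placeholders:

```bash
# List the Iceberg tables created by the pipeline (ARN is a placeholder)
aws s3tables list-tables \
  --table-bucket-arn arn:aws:s3tables:us-east-1:123456789012:bucket/my-telemetry

# Ad-hoc SQL via Athena (database, table, and column names are assumptions)
aws athena start-query-execution \
  --work-group primary \
  --query-execution-context Database=otel \
  --query-string "SELECT count(*) FROM otel_logs WHERE ServiceName = 'checkout'"
```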
Azure
Fabric users: Eventstreams can replace Stream Analytics for Azure Data Lake ingestion.
Stream Analytics Architecture
flowchart TB
subgraph Ingest["Ingest + Store"]
OTLP[OTLP] --> CA[Container App]
CA --> EH[Event Hub]
EH --> SA[Stream Analytics]
SA --> ADLS[(ADLS Gen2 / Parquet)]
end
subgraph Query["Query"]
DuckDB[🦆 DuckDB] --> ADLS
Synapse[Synapse / Spark] --> ADLS
end
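Because the Azure path writes plain Parquet rather than Iceberg, DuckDB can read it directly with the azure extension; the account, container, and path below are placeholders:

```bash
duckdb -c "
INSTALL azure; LOAD azure;
CREATE SECRET adls (TYPE azure, PROVIDER credential_chain, ACCOUNT_NAME 'myaccount');
SELECT count(*)
FROM 'abfss://telemetry@myaccount.dfs.core.windows.net/logs/*.parquet';
"
```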
Schema
Schemas come from the otlp2records library and are generated at build time. The same schema is used by the otlp2parquet and duckdb-otlp projects.
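One way to inspect the generated columns is to DESCRIBE a deployed table from DuckDB (the table path is a placeholder):

```bash
duckdb -c "
INSTALL iceberg; LOAD iceberg;
DESCRIBE SELECT * FROM iceberg_scan('s3://my-table-bucket/otel_traces');
"
```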
Performance
Compaction and snapshot expiration run automatically where supported.
Security
Authentication
Deployments are unauthenticated by default; pass the --auth flag at deploy time to require authentication (see Quickstart).
Input Validation
- Maximum payload size: 10 MB (after decompression)
- Invalid JSON or timestamps are rejected with 400 errors
- Service names: alphanumeric, hyphens, underscores, dots only (max 128 chars)
- Service registry limit: 10,000 unique services (returns 507 if exceeded)
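As an illustration, a payload whose `service.name` violates the character rules should come back with a 400; the endpoint is a placeholder, and the bearer header assumes `--auth` is enabled (the header scheme is an assumption):

```bash
# "bad name!" contains a space and "!" -- outside the allowed character set
curl -i -X POST "https://<endpoint>/v1/logs" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"bad name!"}}]},"scopeLogs":[]}]}'
```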