otlp2parquet
What if your observability data was just a bunch of Parquet files?
Receive OpenTelemetry logs, metrics, and traces and write them as Parquet files to local disk, cloud storage, or Apache Iceberg. Query with DuckDB, Spark, or anything that reads Parquet.

Quick Start
See Deploy to the Cloud below for running as an AWS Lambda function or a Cloudflare Worker.
Build and run from a source checkout (the `cargo run` line is an assumed invocation; adapt to your setup):

```sh
# requires rust toolchain: `curl https://sh.rustup.rs -sSf | sh`
cargo run --release
```

The server starts on http://localhost:4318. Send a simple OTLP/HTTP log:
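For example, with curl and a minimal OTLP JSON payload (the `service.name` value is arbitrary; it determines the `{service}` segment of the partition path):

```sh
curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{
    "resourceLogs": [{
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "demo" } }
        ]
      },
      "scopeLogs": [{
        "logRecords": [{
          "timeUnixNano": "1700000000000000000",
          "severityText": "INFO",
          "body": { "stringValue": "hello from otlp2parquet" }
        }]
      }]
    }]
  }'
```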
Query it with DuckDB (the `data/` glob and the PascalCase column names below are assumptions; adjust the path to wherever your files land, and see Stable Surface for the schema):

```sh
# see https://duckdb.org/install
duckdb -c "SELECT Timestamp, ServiceName, Body FROM read_parquet('data/logs/**/*.parquet')"
```
Why?
- Keep monitoring data around a long time — Parquet on S3 can be 90% cheaper than large monitoring vendors for long-term analytics.
- Query with good tools — DuckDB, Spark, Athena, Trino, Pandas
- Easy Iceberg — Optional catalog support, including S3 Tables and R2 Data Catalog
- Deploy anywhere — Local binary, Cloudflare Workers (WASM), AWS Lambda
Deploy to the Cloud
Once you've kicked the tires locally, deploy to serverless:
Cloudflare Workers + R2 or R2 Data Catalog with the wrangler CLI:

```sh
# Generates config for Workers
# ...
# Deploy to Cloudflare
wrangler deploy
```
AWS Lambda + S3 or S3 Tables with the AWS CLI (the deploy line below is a sketch; substitute the name of the generated template):

```sh
# Generates a CloudFormation template for Lambda + S3
# ...
# Deploy with CloudFormation
aws cloudformation deploy --template-file <template>.yaml --stack-name otlp2parquet --capabilities CAPABILITY_IAM
```
Send a log (requires IAM SigV4 auth by default):
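A curl sketch of the signed request (needs curl 7.75+ for `--aws-sigv4`; substitute your region and the function URL from the stack outputs):

```sh
curl "https://<function-url>/v1/logs" \
  --aws-sigv4 "aws:amz:us-east-1:lambda" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d @log.json   # the same OTLP JSON payload as in Quick Start
```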
Both commands walk you through setup and generate the config files you need.
Supported Signals
Logs, Metrics, Traces via OTLP/HTTP (protobuf or JSON, gzip compression supported). No gRPC support for now.
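For example, sending a gzip-compressed JSON payload with curl (assuming the Quick Start payload is saved as `log.json`):

```sh
gzip -c log.json | curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -H "Content-Encoding: gzip" \
  --data-binary @-
```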
Stable Surface (v1)
- OTLP/HTTP endpoints: `/v1/logs`, `/v1/metrics`, `/v1/traces` (protobuf or JSON; gzip supported)
- Partition layout: `logs/{service}/year=.../hour=.../{ts}-{uuid}.parquet`, `metrics/{type}/{service}/...`, `traces/{service}/...`
- Storage: filesystem, S3, or R2 with optional Iceberg catalog
- Schemas: ClickHouse-compatible, PascalCase columns; five metric schemas (Gauge, Sum, Histogram, ExponentialHistogram, Summary)
- Error model: HTTP 400 for invalid or oversized input; 5xx for conversion or storage failures
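Because the partition layout above uses Hive-style `key=value` segments, engines can prune partitions at query time. A DuckDB sketch (the `data/` prefix is an assumption):

```sh
# hive_partitioning exposes the year=/hour= path segments as queryable columns
duckdb -c "SELECT count(*) FROM read_parquet('data/logs/**/*.parquet', hive_partitioning = true) WHERE year = 2024"
```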
Best-effort catalog commits: Parquet files are always written to storage first. If you enable an Iceberg catalog (S3 Tables, R2 Data Catalog), catalog registration happens after the write. If catalog registration fails (network error, conflict), the data is still safely stored and a warning is logged—your data is never lost due to catalog issues.
Future work (contributions welcome)
- OpenTelemetry Arrow alignment
- Additional platforms: Azure Functions; Kubernetes manifests
- Iceberg ergonomics: queued commits (SQS/Queues), richer partition configs
Learn More
- Batching: Serverless deployments write one file per request, so many tiny requests mean many tiny Parquet files, which hurt query performance and drive up object-store request costs. Use an OTel Collector upstream to batch (a config sketch follows this list), or enable S3 Tables / R2 Data Catalog for automatic compaction.
- Schema: Uses ClickHouse-compatible column names. Will converge with OTel Arrow (OTAP) when it stabilizes.
- Status: Functional but evolving. API may change.
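For the batching note above, a minimal sketch of an upstream Collector setup (assumes the stock `otelcol` binary; the ports and batch sizes are illustrative):

```sh
# receive OTLP, batch, and forward to otlp2parquet
cat > otelcol.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:14318   # the collector's own listen port
exporters:
  otlphttp:
    endpoint: http://localhost:4318   # where otlp2parquet listens
processors:
  batch:
    send_batch_size: 8192
    timeout: 10s
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
EOF
otelcol --config otelcol.yaml
```

Point your SDKs at the collector (port 14318 here) instead of at otlp2parquet directly.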