otlp2parquet
What if your observability data was just a bunch of Parquet files?
Receive OpenTelemetry logs, metrics, and traces and write them as Parquet files to local disk, cloud storage, or Apache Iceberg. Query with DuckDB, Spark, or anything that reads Parquet.
Quick Start
See Deploy to Cloud for running in an AWS Lambda or Cloudflare Worker.
# requires rust toolchain: `curl https://sh.rustup.rs -sSf | sh`
The server listens on http://localhost:4318, the standard OTLP/HTTP port. Send a simple OTLP/HTTP log:
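A minimal sketch of such a request using only the Python standard library, assuming the standard OTLP/HTTP `/v1/logs` path; the service name and log body are placeholder values:

```python
import json
import time
import urllib.request

# Minimal OTLP/HTTP JSON log payload (structure per the OpenTelemetry
# protocol's JSON encoding; "demo-service" and the body are examples).
payload = {
    "resourceLogs": [{
        "resource": {
            "attributes": [{
                "key": "service.name",
                "value": {"stringValue": "demo-service"},
            }]
        },
        "scopeLogs": [{
            "logRecords": [{
                "timeUnixNano": str(time.time_ns()),
                "severityText": "INFO",
                "body": {"stringValue": "hello from otlp2parquet"},
            }]
        }],
    }]
}

def send_log(endpoint: str = "http://localhost:4318/v1/logs") -> int:
    """POST the payload to the OTLP/HTTP logs endpoint; returns the HTTP status."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With the server running locally, calling `send_log()` submits the record and returns the response status. The same payload shape works for `curl -X POST` with a `Content-Type: application/json` header.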
Query it:
# see https://duckdb.org/install
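Assuming logs landed under a local `data/` directory (the actual output path depends on your configuration), a DuckDB query over the written files might look like this; the column names are illustrative, following the ClickHouse-style schema noted below:

```sql
-- Scan every Parquet log file with DuckDB's built-in Parquet reader.
SELECT Timestamp, SeverityText, Body
FROM read_parquet('data/logs/**/*.parquet')
ORDER BY Timestamp DESC
LIMIT 10;
```

`read_parquet` accepts glob patterns, so no table registration is needed; pointing it at an S3 or R2 URL works the same way once the httpfs extension is loaded.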
Why?
- Keep monitoring data around for a long time — Parquet on S3 can be 90% cheaper than large monitoring vendors for long-term analytics
- Query with good tools — DuckDB, Spark, Athena, Trino, Pandas
- Easy Iceberg — Optional catalog support, including S3 Tables and R2 Data Catalog
- Deploy anywhere — Local binary, Cloudflare Workers, AWS Lambda
Deploy to the Cloud
Once you've kicked the tires locally, deploy to serverless:
Cloudflare Workers + R2 or R2 Data Catalog with wrangler CLI:
# Generates config for workers
# Deploy to Cloudflare
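The generated Workers config is, roughly, a `wrangler.toml` with an R2 bucket binding. The names below are placeholders to show the shape, not the tool's actual output:

```toml
# Hypothetical wrangler.toml — binding and bucket names are examples.
name = "otlp2parquet"
compatibility_date = "2024-09-01"

[[r2_buckets]]
binding = "OTLP_BUCKET"       # name the Worker code refers to
bucket_name = "otlp-parquet"  # R2 bucket that receives the Parquet files
```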
AWS Lambda + S3 or S3 Tables with AWS CLI:
# Generates a Cloudformation template for Lambda + S3
# Deploy with Cloudformation
# Send a log (requires IAM sigv4 auth by default)
Both commands walk you through setup and generate the config files you need.
Supported Signals
Logs, metrics, and traces via OTLP/HTTP (protobuf or JSON; gzip compression supported). OTLP/gRPC is not supported yet.
Future work (contributions welcome)
As of late 2025 this project is a prototype. More development is needed in the following areas:
Schema
- Alignment with OpenTelemetry Arrow project
Additional platform support
- Azure Functions
- Helm chart/manifest for Kubernetes
Apache Iceberg
There are many TODOs related to the Iceberg integration:
- Queueing writes to Iceberg with SQS or Cloudflare Queues
- Partitions
Learn More
- Batching: Serverless deployments write one file per request, so many small requests produce many small Parquet files that degrade query performance and inflate your cloud bill. Use an OTel Collector upstream to batch, or enable S3 Tables / R2 Data Catalog for automatic compaction.
- Schema: Uses ClickHouse-compatible column names. Will converge with OTel Arrow (OTAP) when it stabilizes.
- Status: Functional but evolving. API may change.
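For the batching note above, an upstream OTel Collector pipeline using the standard `batch` processor might look like this; the endpoint is a placeholder for wherever otlp2parquet is listening:

```yaml
receivers:
  otlp:
    protocols:
      http:

processors:
  batch:
    send_batch_size: 8192   # records per exported batch
    timeout: 10s            # flush at least this often

exporters:
  otlphttp:
    endpoint: http://localhost:4318   # otlp2parquet listener

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

With this in front, otlp2parquet sees a few large requests instead of many tiny ones, so each request produces a reasonably sized Parquet file.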