██████╗ ██╗ ██╗██╗ ███████╗███████╗██████╗ ██████╗
██╔══██╗██║ ██║██║ ██╔════╝██╔════╝██╔══██╗██╔══██╗
██████╔╝██║ ██║██║ ███████╗█████╗ ██║ ██║██████╔╝
██╔═══╝ ██║ ██║██║ ╚════██║██╔══╝ ██║ ██║██╔══██╗
██║ ╚██████╔╝███████╗███████║███████╗██████╔╝██████╔╝
╚═╝ ╚═════╝ ╚══════╝╚══════╝╚══════╝╚═════╝ ╚═════╝
High-Performance Time-Series Database for Rust
✨ Feature Highlights
| Feature | Description |
|---|---|
| Pure Rust | Zero C dependencies. Single static binary. Cross-compiles anywhere Rust does. |
| Columnar Storage | Fields stored column-by-column in immutable segments for cache-friendly scans and dramatic compression. |
| Gorilla Float Compression | Facebook's XOR-based float encoding — 8–15× compression on metric data. |
| Delta-of-Delta Timestamps | Regular-interval timestamps encode to ~1 byte per point before the LZ4 layer (10–50× overall). |
| Write-Ahead Log | Append-only WAL with CRC32 checksums. Crash recovery replays unflushed data on startup. |
| Inverted Tag Index | Tag key-value pairs map to sorted posting lists. O(n+m) intersection for compound predicates. |
| Time-Based Partitioning | Hourly partition directories for fast time-range pruning. Drop old data by deleting directories. |
| PulseQL | SQL-like query language: SELECT mean(cpu) FROM metrics WHERE host='a' GROUP BY time(5m). |
| InfluxDB Line Protocol | Compatible ingestion format: existing Telegraf agents, Prometheus remote_write adapters, and IoT collectors that emit line protocol work out of the box. |
| LZ4 Compression | Outer compression layer on encoded columns. ~4GB/s decompression speed. |
| Concurrent Reads | parking_lot::RwLock for minimal contention between writers and readers. |
| Background Compactor | Merges small segments within partitions for fewer files and faster scans. |
| Retention Policies | Auto-drop data older than a configurable duration. Delete a directory, reclaim space. |
| Regex Tag Matching | =~ and !~ operators in PulseQL WHERE clauses for flexible tag filtering. |
| Schema Enforcement | Schema-on-write prevents type conflicts — first write sets the type, mismatches are rejected. |
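The O(n+m) posting-list intersection named in the Inverted Tag Index row can be illustrated with a two-cursor merge. This is a minimal sketch, not PulseDB's actual code: the function name `intersect` and the use of `u64` series IDs are assumptions made for the example.

```rust
use std::cmp::Ordering;

/// Intersect two sorted, deduplicated posting lists of series IDs.
/// Each loop iteration advances at least one cursor, so the cost is O(n + m).
/// (Hypothetical helper; the u64 ID type is an assumption.)
pub fn intersect(a: &[u64], b: &[u64]) -> Vec<u64> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                // Series carries both tag pairs, e.g. host='a' AND region='us-east'.
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}
```

A compound predicate such as `host = 'a' AND region = 'us-east'` would intersect the posting list for each tag pair and scan only the series that remain.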
🏗️ Architecture
   TCP :8086                                                   HTTP :8087
(line protocol)                                                (PulseQL)
       │                                                           │
       ▼                                                           ▼
  ┌─────────┐    ┌─────────────────────────────────────────┐   ┌────────┐
  │ Parser  │───►│             Database Engine             │◄──│ Parser │
  └─────────┘    │                                         │   └────────┘
                 │  WAL ──► MemTable ──► Flush ──► Segment │
                 │                         │     (columnar)│
                 │                         ▼          ▲    │
                 │                    Compactor ──────┘    │
                 │                                         │
                 │  SeriesIndex ◄── InvertedIndex          │
                 └─────────────────────────────────────────┘
Write Path
- Line Protocol Parser — Parse incoming InfluxDB-compatible text
- WAL — Append-only log with CRC32 for durability
- MemTable — In-memory sorted buffer (BTreeMap per series)
- Flush — When memtable exceeds 64MB, freeze and write columnar segments
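To make the WAL step concrete, here is one way a CRC32-checked record could be framed and replayed. It is a sketch under stated assumptions: the record layout (`[len: u32 LE][crc32: u32 LE][payload]`) and the names `encode_record` and `replay` are hypothetical, and the checksum is a small bitwise CRC-32 (IEEE polynomial) written inline so the example needs only the standard library. The Tech Stack lists `crc32fast` for this job.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, else zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Frame one payload as [len][crc32][payload] (assumed layout).
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut rec = Vec::with_capacity(8 + payload.len());
    rec.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    rec.extend_from_slice(&crc32(payload).to_le_bytes());
    rec.extend_from_slice(payload);
    rec
}

/// Replay records from a log image, stopping at the first truncated
/// or corrupt record (e.g. a torn write at the tail after a crash).
pub fn replay(mut log: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    while log.len() >= 8 {
        let len = u32::from_le_bytes([log[0], log[1], log[2], log[3]]) as usize;
        let crc = u32::from_le_bytes([log[4], log[5], log[6], log[7]]);
        if log.len() < 8 + len {
            break; // truncated payload
        }
        let payload = &log[8..8 + len];
        if crc32(payload) != crc {
            break; // checksum mismatch
        }
        records.push(payload.to_vec());
        log = &log[8 + len..];
    }
    records
}
```

Stopping at the first bad record is a design choice for this sketch: everything before it was acknowledged and intact, and anything after a torn write cannot be trusted.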
Read Path
- PulseQL Parser — Parse SQL-like query into an AST
- Planner — Resolve series via tag index, prune segments by time range
- Executor — Decompress and scan only needed columns
- Aggregator — Compute mean, sum, min, max, count with GROUP BY time(interval)
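The aggregation step can be sketched as bucketing points into fixed windows and reducing each bucket. The example below computes `mean` per window only; the function name `mean_by_time` and the `(timestamp_ns, value)` tuple shape are assumptions for illustration, not the executor's real interface.

```rust
use std::collections::BTreeMap;

/// Group (timestamp_ns, value) points into windows of `interval_ns`
/// and return (window_start_ns, mean) pairs in time order.
/// (Hypothetical helper illustrating GROUP BY time(interval).)
pub fn mean_by_time(points: &[(i64, f64)], interval_ns: i64) -> Vec<(i64, f64)> {
    assert!(interval_ns > 0, "interval must be positive");
    // window start -> (running sum, count); BTreeMap keeps windows sorted.
    let mut buckets: BTreeMap<i64, (f64, u64)> = BTreeMap::new();
    for &(ts, value) in points {
        // Floor to the window start; div_euclid also floors negative timestamps.
        let start = ts.div_euclid(interval_ns) * interval_ns;
        let entry = buckets.entry(start).or_insert((0.0, 0));
        entry.0 += value;
        entry.1 += 1;
    }
    buckets
        .into_iter()
        .map(|(start, (sum, n))| (start, sum / n as f64))
        .collect()
}
```

Windows with no points are simply absent from the result here; a `FILL(...)` clause would be applied as a separate pass over the output.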
📊 Compression
PulseDB uses type-aware encodings tuned for time-series patterns, then wraps each column in LZ4:
| Data Type | Encoding | Algorithm | Typical Ratio |
|---|---|---|---|
| Timestamps | Delta-of-delta | delta[i] - delta[i-1] → zigzag → varint | 10–50× |
| Floats | Gorilla XOR | XOR consecutive values → leading zeros + meaningful bits | 8–15× |
| Integers | Delta + zigzag | Delta encode → zigzag → varint | 5–20× |
| Booleans | Bit-packing | 8 values per byte | 8× |
Combined: For typical metric workloads (regular timestamps, slowly changing floats), expect 12–25× total compression over raw storage.
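The timestamp row of the table (delta-of-delta → zigzag → varint) can be sketched in a few functions. This is an illustrative codec under assumptions, not the contents of `encoding/timestamp.rs`: the function names are hypothetical, and the choice to store the first timestamp and first delta as ordinary varints is made here for simplicity.

```rust
/// Map signed to unsigned so small magnitudes get small codes.
fn zigzag(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

fn unzigzag(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}

/// LEB128-style varint: 7 payload bits per byte, high bit = "more follows".
fn put_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v as u8) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

fn get_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut v = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None; // malformed: too many continuation bytes
        }
        v |= ((byte & 0x7F) as u64) << shift;
        if byte & 0x80 == 0 {
            return Some(v);
        }
        shift += 7;
    }
}

/// Encode timestamps as: first value, then delta-of-delta for every later point.
pub fn encode_timestamps(ts: &[i64]) -> Vec<u8> {
    let mut out = Vec::new();
    let (mut prev, mut prev_delta) = (0i64, 0i64);
    for (i, &t) in ts.iter().enumerate() {
        if i == 0 {
            put_varint(zigzag(t), &mut out);
        } else {
            let delta = t.wrapping_sub(prev);
            put_varint(zigzag(delta.wrapping_sub(prev_delta)), &mut out);
            prev_delta = delta;
        }
        prev = t;
    }
    out
}

pub fn decode_timestamps(buf: &[u8]) -> Option<Vec<i64>> {
    let mut out = Vec::new();
    let (mut prev, mut prev_delta, mut pos) = (0i64, 0i64, 0usize);
    while pos < buf.len() {
        let v = unzigzag(get_varint(buf, &mut pos)?);
        if out.is_empty() {
            prev = v;
        } else {
            prev_delta = prev_delta.wrapping_add(v);
            prev = prev.wrapping_add(prev_delta);
        }
        out.push(prev);
    }
    Some(out)
}
```

With a perfectly regular interval, every delta-of-delta after the second point is zero, which encodes as a single `0x00` byte. Long runs of identical bytes are also exactly what the outer LZ4 layer compresses well.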
📐 Data Model
cpu,host=server01,region=us-east usage_idle=98.2,usage_system=1.3 1672531200000000000
│   │                            │                                │
│   └─ tags (indexed)            └─ fields (values)               └─ timestamp (ns)
measurement
- Measurement — Logical grouping (like a table)
- Tags — Indexed string key-value pairs for filtering and grouping
- Fields — The actual data: f64, i64, u64, bool
- Timestamp — Nanosecond Unix epoch
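The field types listed above, combined with the schema-on-write rule from the feature list (first write sets the type, mismatches are rejected), might look like this in Rust. The type and method names (`FieldValue`, `FieldType`, `Schema::check`) are assumptions for this sketch and may not match the real `model/` module.

```rust
use std::collections::HashMap;

/// One field value, mirroring the f64 / i64 / u64 / bool types in the data model.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldValue {
    Float(f64),
    Int(i64),
    UInt(u64),
    Bool(bool),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldType {
    Float,
    Int,
    UInt,
    Bool,
}

impl FieldValue {
    pub fn field_type(&self) -> FieldType {
        match self {
            FieldValue::Float(_) => FieldType::Float,
            FieldValue::Int(_) => FieldType::Int,
            FieldValue::UInt(_) => FieldType::UInt,
            FieldValue::Bool(_) => FieldType::Bool,
        }
    }
}

/// Hypothetical in-memory schema: (measurement, field) -> type fixed by the first write.
#[derive(Default)]
pub struct Schema {
    types: HashMap<(String, String), FieldType>,
}

impl Schema {
    /// Record the type on first sight; reject later writes of a different type.
    pub fn check(&mut self, measurement: &str, field: &str, value: FieldValue) -> Result<(), String> {
        let key = (measurement.to_string(), field.to_string());
        let incoming = value.field_type();
        let known = *self.types.entry(key).or_insert(incoming);
        if known == incoming {
            Ok(())
        } else {
            Err(format!("field {measurement}.{field} is {known:?}, got {incoming:?}"))
        }
    }
}
```

Keying on the measurement and field name together means the same field name can carry different types in different measurements.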
📦 Installation
From Source
# Binary is at ./target/release/pulsedb
From crates.io
🚀 Quick Start
Start the Server
# Start with defaults (data in ./pulsedb_data, TCP :8086, HTTP :8087)
# Custom configuration
Write Data (Line Protocol)
Send data over TCP using InfluxDB line protocol:
# Single point
# Batch write
Or via HTTP:
Query Data (PulseQL)
# Interactive REPL
# HTTP API
📝 Query Language — PulseQL
SQL-like, purpose-built for time-series:
-- Aggregation with time bucketing
SELECT mean(usage_idle), max(usage_system)
FROM cpu
WHERE host = 'server01' AND time > now - 1h
GROUP BY time(5m)
-- Multi-tag filter with regex
SELECT sum(bytes_in)
FROM network
WHERE region = 'us-east' AND host =~ /web-\d+/
GROUP BY time(1m), host
-- Raw data retrieval
SELECT *
FROM temperature
WHERE sensor_id = 'T-42'
AND time BETWEEN '2024-01-01' AND '2024-01-02'
ORDER BY time DESC
LIMIT 1000
-- Downsampling with fill
SELECT mean(value) AS avg_temp, min(value), max(value)
FROM temperature
GROUP BY time(1h), location
FILL(linear)
Aggregation Functions
count · sum · mean / avg · min · max · first · last · stddev · percentile(field, N)
Operators
= · != · > · < · >= · <= · =~ (regex) · !~ · IN · AND · OR · BETWEEN
Duration Syntax
1ns · 100us · 5ms · 10s · 5m · 1h · 7d · 2w
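A duration literal in this syntax is an integer followed by a unit suffix, so converting it to nanoseconds is a short lookup. The sketch below covers exactly the units listed above; the function name `parse_duration_ns` is hypothetical, and it deliberately rejects anything the list does not show (fractions, signs, compound forms like `1h30m`).

```rust
/// Parse a duration literal such as "5m" or "100us" into nanoseconds.
/// Returns None for unknown units, missing digits, or overflow.
/// (Hypothetical helper; accepts only the units listed in the README.)
pub fn parse_duration_ns(s: &str) -> Option<i64> {
    // Split at the first non-digit: "100us" -> ("100", "us").
    let split = s.find(|c: char| !c.is_ascii_digit())?;
    let (digits, unit) = s.split_at(split);
    let n: i64 = digits.parse().ok()?;
    let per_unit: i64 = match unit {
        "ns" => 1,
        "us" => 1_000,
        "ms" => 1_000_000,
        "s" => 1_000_000_000,
        "m" => 60 * 1_000_000_000,
        "h" => 3_600 * 1_000_000_000,
        "d" => 86_400 * 1_000_000_000,
        "w" => 7 * 86_400 * 1_000_000_000,
        _ => return None,
    };
    n.checked_mul(per_unit)
}
```

For example, `parse_duration_ns("5m")` yields `Some(300_000_000_000)`, which is the window width a `GROUP BY time(5m)` clause would use.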
🔌 Wire Protocol
Ingestion — TCP :8086
InfluxDB-compatible line protocol. Works with Telegraf, Prometheus remote_write adapters, and any tool that speaks line protocol.
<measurement>,<tag1>=<val1> <field1>=<fval1>,<field2>=<fval2> <timestamp_ns>
Field type suffixes: 1.0 (float), 1i (integer), 1u (unsigned), t/f (boolean), "hello" (string).
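The line format and type suffixes above can be parsed with plain string splitting. This is a deliberately simplified sketch: it assumes no escaped commas, spaces, or quotes, requires the timestamp to be present, and uses hypothetical names (`Point`, `parse_line`, `parse_field_value`). A production parser for InfluxDB line protocol has to handle escaping and optional timestamps.

```rust
#[derive(Debug, PartialEq)]
pub enum FieldValue {
    Float(f64),
    Int(i64),
    UInt(u64),
    Bool(bool),
    Str(String),
}

#[derive(Debug, PartialEq)]
pub struct Point {
    pub measurement: String,
    pub tags: Vec<(String, String)>,
    pub fields: Vec<(String, FieldValue)>,
    pub timestamp_ns: i64,
}

/// Type a raw field value by its suffix: 1i, 1u, t/f, "text", else float.
pub fn parse_field_value(raw: &str) -> Option<FieldValue> {
    if let Some(inner) = raw.strip_prefix('"').and_then(|r| r.strip_suffix('"')) {
        return Some(FieldValue::Str(inner.to_string()));
    }
    match raw {
        "t" => return Some(FieldValue::Bool(true)),
        "f" => return Some(FieldValue::Bool(false)),
        _ => {}
    }
    if let Some(n) = raw.strip_suffix('i') {
        return n.parse::<i64>().ok().map(FieldValue::Int);
    }
    if let Some(n) = raw.strip_suffix('u') {
        return n.parse::<u64>().ok().map(FieldValue::UInt);
    }
    raw.parse::<f64>().ok().map(FieldValue::Float)
}

/// Parse "<measurement>,<tags> <fields> <timestamp_ns>" (no escaping supported).
pub fn parse_line(line: &str) -> Option<Point> {
    let mut parts = line.split_whitespace();
    let (series, fields, ts) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None; // unexpected trailing section
    }
    let mut series_parts = series.split(',');
    let measurement = series_parts.next()?.to_string();
    let tags = series_parts
        .map(|kv| kv.split_once('=').map(|(k, v)| (k.to_string(), v.to_string())))
        .collect::<Option<Vec<_>>>()?;
    let fields = fields
        .split(',')
        .map(|kv| {
            let (k, v) = kv.split_once('=')?;
            Some((k.to_string(), parse_field_value(v)?))
        })
        .collect::<Option<Vec<_>>>()?;
    Some(Point { measurement, tags, fields, timestamp_ns: ts.parse().ok()? })
}
```

Checking for quotes and the bare `t`/`f` literals before the numeric suffixes keeps a string such as `"5i"` from being mistaken for an integer.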
Query — HTTP :8087
| Endpoint | Method | Description |
|---|---|---|
| /query | POST | Execute PulseQL query, return JSON |
| /write | POST | Ingest line protocol over HTTP |
| /health | GET | Liveness check |
| /status | GET | Engine statistics (series count, throughput, disk usage) |
⚙️ Configuration
PulseDB is configured via CLI flags (config file support coming):
| Flag | Default | Description |
|---|---|---|
| --data-dir | ./pulsedb_data | Root directory for all data |
| --tcp-port | 8086 | Line protocol ingestion port |
| --http-port | 8087 | HTTP query API port |
| --wal-fsync | batch | WAL fsync policy: every / batch / none |
| --memtable-size | 64MB | Flush threshold for in-memory buffer |
| --segment-duration | 3600 | Partition duration in seconds (1 hour) |
| --retention | ∞ | Auto-drop data older than duration (e.g., 30d) |
| --log-level | info | Logging: trace / debug / info / warn / error |
Data Directory Layout
pulsedb_data/
├── wal/
│ └── wal.log # Write-ahead log
├── partitions/
│ ├── 2024-01-15T14/ # Hourly partition
│ │ ├── cpu_host=server01.seg # Compressed columnar segment
│ │ └── mem_host=server01.seg
│ └── 2024-01-15T15/
│ └── ...
├── index/
│ ├── series.idx # Series key → ID mapping
│ └── tags.idx # Tag inverted index
└── meta/
└── measurements.json # Schema (field names + types)
🎯 Performance Targets
| Metric | Target |
|---|---|
| Write throughput | ≥ 1M points/sec (batch) |
| Single-point write latency | < 10μs (WAL + memtable) |
| Time-range query (1h, 1 series) | < 1ms |
| Time-range query (1h, 1K series) | < 50ms |
| Aggregation (24h, GROUP BY 5m) | < 10ms |
| Compression ratio (float metrics) | ≥ 10× |
| Memory (1M active series) | < 2GB |
| Segment flush (1M points) | < 100ms |
🏛️ Tech Stack
| Layer | Crate | Purpose |
|---|---|---|
| Async Runtime | tokio | TCP/HTTP server, background tasks |
| Compression | lz4_flex | Fast outer compression layer |
| Checksums | crc32fast | WAL and segment integrity |
| Concurrency | parking_lot | Low-overhead RwLock |
| CLI | clap (derive) | Command-line argument parsing |
| Serialization | serde, serde_json | Config, WAL payload, HTTP responses |
| Time | chrono | Partition key formatting |
| Hashing | xxhash-rust (xxh3) | Fast non-crypto hashing |
| Memory Mapping | memmap2 | Zero-copy segment reads |
| Logging | tracing, tracing-subscriber | Structured logging |
| Errors | thiserror, anyhow | Error handling |
Module Structure
src/
├── main.rs # CLI entry point, server bootstrap
├── model/ # DataPoint, FieldValue, Tags, SeriesKey, SeriesId
├── encoding/ # Compression codecs
│ ├── timestamp.rs # Delta-of-delta + zigzag + varint
│ ├── float.rs # Gorilla XOR (Facebook paper)
│ ├── integer.rs # Delta + zigzag + varint
│ └── boolean.rs # Bit-packing
├── engine/ # Core database engine
│ ├── database.rs # Write path coordinator
│ ├── wal.rs # Write-ahead log
│ ├── memtable.rs # In-memory sorted buffer
│ └── config.rs # Engine configuration
├── storage/ # On-disk storage
│ ├── segment.rs # Columnar segment reader/writer
│ ├── partition.rs # Hourly time partitions
│ ├── cache.rs # Segment metadata cache
│ └── compactor.rs # Background segment merging
├── index/ # Series & tag indexing
│ ├── series.rs # Key → ID mapping
│ └── inverted.rs # Tag inverted index (posting lists)
├── query/ # Query engine (PulseQL parser, planner, executor)
├── server/ # TCP + HTTP network layer
└── cli/ # CLI commands (server, query, import, status)
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/my-feature)
- Make your changes with tests (cargo test)
- Ensure formatting (cargo fmt) and lints pass (cargo clippy)
- Open a pull request
Building & Testing
🗺️ Roadmap
- Core data model (DataPoint, FieldValue, Tags, SeriesKey)
- Compression codecs (delta-of-delta, Gorilla XOR, delta+zigzag, bit-pack)
- Write-ahead log with CRC32 crash recovery
- MemTable with freeze/rotate
- Columnar segment writer/reader with LZ4
- Time-based partitioning
- Series index + tag inverted index
- Segment flush integration (memtable → disk)
- Line protocol parser
- PulseQL query engine (lexer, parser, planner, executor)
- Aggregation functions (count, sum, mean, min, max, GROUP BY)
- TCP ingestion server
- HTTP query API
- CLI (server, query, import, status)
- Background compactor
- Retention policies
- Regex tag matching (=~ and !~ operators)
- Schema enforcement (type-mismatch rejection)
- Criterion benchmarks (ingestion, query, compression)
- Flamegraph profiling + hot-path optimization
- GitHub Actions CI
📄 License
MIT — see LICENSE for details.