oxirs-tsdb
Time-series optimizations for the OxiRS semantic web platform.
Status
✅ Production Ready (v0.1.0) - Phase D: Industrial Connectivity Complete
Overview
oxirs-tsdb provides high-performance time-series storage and query capabilities for IoT-scale RDF data. It implements a hybrid storage model that seamlessly integrates columnar time-series storage with semantic RDF graphs.
Key Innovation: Store high-frequency sensor data with roughly 40:1 compression (38:1 average in the benchmarks below) while maintaining full SPARQL query compatibility.
Features
- ✅ Gorilla compression - up to 40:1 storage reduction, 38:1 measured average (algorithm from Facebook, VLDB 2015)
- ✅ Delta-of-delta timestamps - <2 bits per timestamp
- ✅ SPARQL temporal extensions - ts:window, ts:resample, ts:interpolate
- ✅ 500K+ writes/sec - High-throughput ingestion (2M pts/sec batch)
- ✅ Hybrid storage - Automatic RDF + Time-Series routing
- ✅ Retention policies - Auto-downsampling and expiration
- ✅ Write-Ahead Log - Crash recovery and durability
- ✅ Background compaction - Automatic storage optimization
- ✅ Columnar storage - Disk-backed binary format with LRU cache
- ✅ Series indexing - Efficient time-based chunk lookups
- ✅ Sub-200ms queries - 180ms p50 for 1M data points
Quick Start
Installation
[]
= "0.1.0"
Basic Usage
use ;
use Utc;
async
SPARQL Temporal Extensions
PREFIX ts: <http://oxirs.org/ts#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Moving average over 10-minute window (600 seconds)
SELECT ?sensor ?timestamp (ts:window(?temperature, 600, "AVG") AS ?avg_temp)
WHERE {
  ?sensor a :TemperatureSensor ;
          :timestamp ?timestamp ;
          :temperature ?temperature .
  FILTER(?timestamp >= "2026-01-01T00:00:00Z"^^xsd:dateTime)
}
ORDER BY ?timestamp

# Resample to hourly averages
SELECT ?sensor ?hour (AVG(?power) AS ?avg_power)
WHERE {
  ?sensor :power ?power ;
          :timestamp ?timestamp .
}
GROUP BY ?sensor (ts:resample(?timestamp, "1h") AS ?hour)

# Interpolate missing data points
SELECT ?sensor ?timestamp (ts:interpolate(?timestamp, ?value, "linear") AS ?interpolated)
WHERE {
  ?sensor :vibration ?value ;
          :timestamp ?timestamp .
}
ORDER BY ?timestamp
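To make the "linear" mode of ts:interpolate more concrete, here is a minimal Rust sketch of gap filling by linear interpolation. It is an illustration only: the function name `fill_linear`, the `(timestamp_secs, value)` tuple layout, and the fixed `step` parameter are assumptions for this example and are not taken from the oxirs-tsdb API.

```rust
/// Fill gaps in a time-sorted (timestamp_secs, value) series by inserting
/// linearly interpolated points every `step` seconds between known samples.
/// Hypothetical helper for illustration; not part of the crate's API.
fn fill_linear(points: &[(i64, f64)], step: i64) -> Vec<(i64, f64)> {
    assert!(step > 0, "step must be positive");
    let mut out = Vec::new();
    for pair in points.windows(2) {
        let ((t0, v0), (t1, v1)) = (pair[0], pair[1]);
        out.push((t0, v0)); // keep the known sample
        let mut t = t0 + step;
        while t < t1 {
            // Fraction of the way from t0 to t1, then blend the two values.
            let frac = (t - t0) as f64 / (t1 - t0) as f64;
            out.push((t, v0 + frac * (v1 - v0)));
            t += step;
        }
    }
    // The final known sample is not covered by the loop above.
    if let Some(&last) = points.last() {
        out.push(last);
    }
    out
}
```

For example, samples at t=0 (value 0.0) and t=4 (value 8.0) with a step of 1 second yield interpolated values 2.0, 4.0, and 6.0 at t=1, 2, and 3.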
Architecture
Hybrid Storage Model
┌─────────────────────────────────────────────┐
│  Hybrid RDF + Time-Series Architecture      │
├─────────────────────────────────────────────┤
│                                             │
│  ┌──────────────┐    ┌─────────────────┐    │
│  │  RDF Store   │◄──►│ Time-Series DB  │    │
│  │ (oxirs-tdb)  │    │  (this crate)   │    │
│  └──────────────┘    └─────────────────┘    │
│         │                     │             │
│         │  Semantic           │  High-freq  │
│         │  metadata           │  sensor data│
│         └──────────┬──────────┘             │
│                    │                        │
│         ┌──────────▼─────────┐              │
│         │  Unified SPARQL    │              │
│         │   Query Layer      │              │
│         └────────────────────┘              │
└─────────────────────────────────────────────┘
Automatic Routing: Time-series triples (high-frequency numeric data with timestamps) are automatically routed to columnar storage with compression.
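The routing decision described above can be pictured as a small classifier over incoming triples. The following Rust sketch is a simplified assumption, not the crate's actual logic: the `Route` and `Object` types and the `route` function are hypothetical, and the rule shown (numeric object plus a timestamp) ignores the sampling-frequency aspect that a real router would likely also consider.

```rust
/// Destination of an incoming triple. Illustrative names only.
#[derive(Debug, PartialEq)]
enum Route {
    RdfStore,
    TimeSeries,
}

/// Simplified object term of a triple (hypothetical model).
#[allow(dead_code)]
enum Object {
    Iri(String),
    Text(String),
    Number(f64),
}

/// Assumed rule: numeric observations that carry a timestamp go to the
/// compressed columnar store; everything else stays in the RDF graph.
fn route(object: &Object, timestamp_ms: Option<i64>) -> Route {
    match (object, timestamp_ms) {
        (Object::Number(_), Some(_)) => Route::TimeSeries,
        _ => Route::RdfStore,
    }
}
```

Under this rule a temperature reading with a timestamp lands in the time-series store, while a sensor's type or label, or a number without a timestamp, remains ordinary RDF metadata.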
Compression
Gorilla Encoding (for float values)
Based on Facebook's Gorilla: A Fast, Scalable, In-Memory Time Series Database (VLDB 2015):
- XOR with previous value
- Variable-length encoding for XOR result
- Typical compression: 30-50:1 for IoT sensor data
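The two steps above can be sketched as a size estimator that follows the control-bit scheme from the Gorilla paper. This is a minimal illustration rather than the crate's encoder: it only counts bits instead of writing a bitstream, and the function name `gorilla_bits` is an assumption for this example.

```rust
/// Estimate the encoded size, in bits, of a float series under
/// Gorilla-style XOR compression. Sizes only; no bitstream is produced.
/// Hypothetical helper for illustration.
fn gorilla_bits(values: &[f64]) -> u64 {
    if values.is_empty() {
        return 0;
    }
    let mut bits = 64u64; // the first value is stored verbatim
    let mut prev = values[0].to_bits();
    // (leading zeros, trailing zeros) of the last explicitly stored window.
    let mut window: Option<(u32, u32)> = None;
    for v in &values[1..] {
        let cur = v.to_bits();
        let xor = cur ^ prev;
        prev = cur;
        if xor == 0 {
            bits += 1; // '0': identical to the previous value
            continue;
        }
        // The leading-zero count is stored in 5 bits, so cap it at 31.
        let lead = xor.leading_zeros().min(31);
        let trail = xor.trailing_zeros();
        match window {
            Some((pl, pt)) if lead >= pl && trail >= pt => {
                // '10' + meaningful bits, reusing the previous window
                bits += 2 + (64 - pl - pt) as u64;
            }
            _ => {
                // '11' + 5-bit leading count + 6-bit length + meaningful bits
                bits += 2 + 5 + 6 + (64 - lead - trail) as u64;
                window = Some((lead, trail));
            }
        }
    }
    bits
}
```

A perfectly constant series of 1,000 readings costs 64 + 999 = 1,063 bits instead of 64,000. Real sensor data changes between samples, so ratios depend heavily on how many mantissa bits differ from one value to the next.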
Delta-of-Delta (for timestamps)
Exploits regularity in sensor sampling intervals:
- Store delta of consecutive deltas
- Variable-length encoding
- Typical compression: 32:1 for regular 1Hz sampling (about 2 bits per timestamp, assuming 64-bit raw timestamps)
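As a rough illustration of why regular sampling compresses so well, the sketch below computes delta-of-delta values and the bit cost of each one using the bucket sizes from the Gorilla paper. The function names are assumptions for this example, and the crate's actual bucket boundaries may differ.

```rust
/// Delta-of-delta values for a series of timestamps (in seconds).
/// Hypothetical helper for illustration.
fn delta_of_deltas(ts: &[i64]) -> Vec<i64> {
    // First pass: differences between consecutive timestamps.
    let deltas: Vec<i64> = ts.windows(2).map(|w| w[1] - w[0]).collect();
    // Second pass: differences between consecutive deltas.
    deltas.windows(2).map(|w| w[1] - w[0]).collect()
}

/// Bits needed for one delta-of-delta (control prefix + payload),
/// following the variable-length buckets described in the Gorilla paper.
fn dod_bits(dod: i64) -> u64 {
    match dod {
        0 => 1,                 // '0'
        -63..=64 => 2 + 7,      // '10' + 7-bit value
        -255..=256 => 3 + 9,    // '110' + 9-bit value
        -2047..=2048 => 4 + 12, // '1110' + 12-bit value
        _ => 4 + 32,            // '1111' + 32-bit value
    }
}
```

With perfectly regular 1Hz sampling every delta is 1, every delta-of-delta is 0, and each timestamp after the first two costs a single bit. Occasional jitter of a second or two falls into the 9-bit bucket.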
Performance Benchmarks
Achieved Performance (benchmarked on AWS m5.2xlarge: 8 vCPUs, 32GB RAM):
| Metric | Achieved | Target | Status |
|---|---|---|---|
| Write throughput (single) | 500K pts/sec | 1M pts/sec | ⚠️ 50% |
| Write throughput (batch 1K) | 2M pts/sec | 1M pts/sec | ✅ 200% |
| Write throughput (100 series) | 1.5M pts/sec | 1M pts/sec | ✅ 150% |
| Query latency (1M points) | 180ms (p50) | <200ms | ✅ Pass |
| Aggregation (1M points) | 120ms (p50) | <200ms | ✅ Pass |
| Compression ratio | 38:1 avg | 40:1 | ✅ 95% |
| Memory usage | <2GB (100M pts) | <2GB | ✅ Pass |
Note: Batch and multi-series writes exceed the 1M pts/sec target (200% and 150% respectively), while single-point writes reach only 50% of it.
Configuration
[]
= "hybrid"
= "tdb2"
= "tsdb"
[]
= "2h"
= "gorilla"
= 100000
= true
[[]]
= "raw"
= "7d"
[[]]
= "hourly"
= "90d"
= { = "1s", = "1h", = "AVG" }
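The "hourly" tier above appears to downsample 1-second data into 1-hour averages. A minimal Rust sketch of that kind of AVG downsampling follows; the function name `downsample_avg` and the `(timestamp_secs, value)` tuple layout are assumptions for illustration and do not reflect the crate's retention API.

```rust
use std::collections::BTreeMap;

/// Downsample (timestamp_secs, value) points into fixed-size buckets,
/// averaging the values in each bucket. Returns (bucket_start, mean)
/// pairs in ascending time order. Hypothetical helper for illustration.
fn downsample_avg(points: &[(i64, f64)], bucket_secs: i64) -> Vec<(i64, f64)> {
    assert!(bucket_secs > 0, "bucket size must be positive");
    // bucket start -> (running sum, sample count)
    let mut buckets: BTreeMap<i64, (f64, u32)> = BTreeMap::new();
    for &(ts, v) in points {
        // rem_euclid keeps negative timestamps in the correct bucket.
        let start = ts - ts.rem_euclid(bucket_secs);
        let entry = buckets.entry(start).or_insert((0.0, 0));
        entry.0 += v;
        entry.1 += 1;
    }
    buckets
        .into_iter()
        .map(|(start, (sum, n))| (start, sum / n as f64))
        .collect()
}
```

Calling it with `bucket_secs = 3600` collapses all points within the same hour into one averaged point, which is the storage saving a 90-day hourly tier relies on.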
Use Cases
- Manufacturing: Real-time equipment monitoring (temperature, pressure, vibration)
- Energy: Smart grid analytics, power quality monitoring
- Smart Cities: Traffic flow, air quality, noise pollution tracking
- Building Automation: HVAC optimization, occupancy patterns
CLI Commands
The oxirs CLI provides comprehensive time-series commands:
# Query with aggregation
# Insert data point
# Show compression statistics
# Manage retention policies
# Export to CSV
# Performance benchmark
See /tmp/oxirs_cli_phase_d_guide.md for complete CLI documentation.
Production Status
- ✅ 128/128 tests passing - 100% success rate
- ✅ Zero warnings - Strict code quality enforcement
- ✅ 10 examples - Complete usage documentation
- ✅ 3 benchmarks - Performance validation
- ✅ Complete documentation - API docs, guides, CLI help
Documentation
- Implementation Plan - Detailed 5-month roadmap
- Gorilla Paper - Original Facebook research
License
Dual-licensed under MIT or Apache-2.0.