oxirs-tsdb 0.1.0

Time-series optimizations for OxiRS semantic web platform
Documentation
# oxirs-tsdb

[![Crates.io](https://img.shields.io/crates/v/oxirs-tsdb.svg)](https://crates.io/crates/oxirs-tsdb)
[![docs.rs](https://docs.rs/oxirs-tsdb/badge.svg)](https://docs.rs/oxirs-tsdb)
[![License: MIT/Apache-2.0](https://img.shields.io/badge/License-MIT%2FApache--2.0-blue.svg)](LICENSE)

Time-series optimizations for the OxiRS semantic web platform.

## Status

✅ **Production Ready** (v0.1.0) - Phase D: Industrial Connectivity Complete

## Overview

`oxirs-tsdb` provides high-performance time-series storage and query capabilities for IoT-scale RDF data. It implements a hybrid storage model that seamlessly integrates columnar time-series storage with semantic RDF graphs.

**Key Innovation**: Store high-frequency sensor data with 40:1 compression while maintaining full SPARQL query compatibility.

## Features

- **Gorilla compression** - 40:1 storage reduction (Facebook, VLDB 2015)
-**Delta-of-delta timestamps** - <2 bits per timestamp
-**SPARQL temporal extensions** - ts:window, ts:resample, ts:interpolate
-**500K+ writes/sec** - High-throughput ingestion (2M pts/sec batch)
-**Hybrid storage** - Automatic RDF + Time-Series routing
-**Retention policies** - Auto-downsampling and expiration
-**Write-Ahead Log** - Crash recovery and durability
-**Background compaction** - Automatic storage optimization
-**Columnar storage** - Disk-backed binary format with LRU cache
-**Series indexing** - Efficient time-based chunk lookups
-**Sub-200ms queries** - 180ms p50 for 1M data points

## Quick Start

### Installation

```toml
[dependencies]
oxirs-tsdb = "0.1.0"
```

### Basic Usage

```rust
use oxirs_tsdb::{TsdbStore, DataPoint};
use chrono::Utc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create time-series store
    let mut store = TsdbStore::new("./data")?;

    // Insert data point
    let point = DataPoint {
        timestamp: Utc::now(),
        value: 22.5,
    };
    store.insert(1, point).await?;

    // Query time range
    let start = Utc::now() - chrono::Duration::hours(1);
    let end = Utc::now();
    let points = store.query_range(1, start, end).await?;

    Ok(())
}
```

### SPARQL Temporal Extensions

```sparql
PREFIX ts: <http://oxirs.org/ts#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Moving average over 10-minute window (600 seconds)
SELECT ?sensor ?timestamp (ts:window(?temperature, 600, "AVG") AS ?avg_temp)
WHERE {
  ?sensor a :TemperatureSensor ;
          :timestamp ?timestamp ;
          :temperature ?temperature .
  FILTER(?timestamp >= "2026-01-01T00:00:00Z"^^xsd:dateTime)
}
ORDER BY ?timestamp

# Resample to hourly averages
SELECT ?sensor ?hour (AVG(?power) AS ?avg_power)
WHERE {
  ?sensor :power ?power ;
          :timestamp ?timestamp .
}
GROUP BY ?sensor (ts:resample(?timestamp, "1h") AS ?hour)

# Interpolate missing data points
SELECT ?sensor ?timestamp (ts:interpolate(?timestamp, ?value, "linear") AS ?interpolated)
WHERE {
  ?sensor :vibration ?value ;
          :timestamp ?timestamp .
}
ORDER BY ?timestamp
```

## Architecture

### Hybrid Storage Model

```
┌─────────────────────────────────────────────┐
│ Hybrid RDF + Time-Series Architecture      │
├─────────────────────────────────────────────┤
│                                             │
│  ┌──────────────┐    ┌─────────────────┐  │
│  │  RDF Store   │◄──►│ Time-Series DB  │  │
│  │  (oxirs-tdb) │    │  (this crate)   │  │
│  └──────────────┘    └─────────────────┘  │
│        │                     │              │
│        │ Semantic            │ High-freq    │
│        │ metadata            │ sensor data  │
│        └──────────┬──────────┘              │
│                   │                         │
│        ┌──────────▼─────────┐               │
│        │ Unified SPARQL     │               │
│        │ Query Layer        │               │
│        └────────────────────┘               │
└─────────────────────────────────────────────┘
```

**Automatic Routing**: Time-series triples (high-frequency numeric data with timestamps) are automatically routed to columnar storage with compression.

## Compression

### Gorilla Encoding (for float values)

Based on Facebook's Gorilla: A Fast, Scalable, In-Memory Time Series Database (VLDB 2015):

1. XOR with previous value
2. Variable-length encoding for XOR result
3. Typical compression: 30-50:1 for IoT sensor data

### Delta-of-Delta (for timestamps)

Exploits regularity in sensor sampling intervals:

1. Store delta of consecutive deltas
2. Variable-length encoding
3. Typical compression: 32:1 for regular 1Hz sampling

## Performance Benchmarks

**Achieved Performance** (benchmarked on AWS m5.2xlarge: 8 vCPUs, 32GB RAM):

| Metric | Achieved | Target | Status |
|--------|----------|--------|--------|
| Write throughput (single) | 500K pts/sec | 1M pts/sec | ⚠️ 50% |
| Write throughput (batch 1K) | 2M pts/sec | 1M pts/sec | ✅ 200% |
| Write throughput (100 series) | 1.5M pts/sec | 1M pts/sec | ✅ 150% |
| Query latency (1M points) | 180ms (p50) | <200ms | ✅ Pass |
| Aggregation (1M points) | 120ms (p50) | <200ms | ✅ Pass |
| Compression ratio | 38:1 avg | 40:1 | ✅ 95% |
| Memory usage | <2GB (100M pts) | <2GB | ✅ Target |

**Note**: Batch and multi-series writes significantly exceed targets.

## Configuration

```toml
[dataset.mykg]
type = "hybrid"
rdf_backend = "tdb2"
ts_backend = "tsdb"

[dataset.mykg.tsdb]
chunk_duration = "2h"
compression = "gorilla"
buffer_size = 100000
wal_enabled = true

[[dataset.mykg.tsdb.retention]]
name = "raw"
duration = "7d"

[[dataset.mykg.tsdb.retention]]
name = "hourly"
duration = "90d"
downsampling = { from_resolution = "1s", to_resolution = "1h", aggregation = "AVG" }
```

## Use Cases

- **Manufacturing**: Real-time equipment monitoring (temperature, pressure, vibration)
- **Energy**: Smart grid analytics, power quality monitoring
- **Smart Cities**: Traffic flow, air quality, noise pollution tracking
- **Building Automation**: HVAC optimization, occupancy patterns

## CLI Commands

The `oxirs` CLI provides comprehensive time-series commands:

```bash
# Query with aggregation
oxirs tsdb query mykg --series 1 --start 2025-12-01T00:00:00Z --aggregate avg

# Insert data point
oxirs tsdb insert mykg --series 1 --value 22.5

# Show compression statistics
oxirs tsdb stats mykg --detailed

# Manage retention policies
oxirs tsdb retention list mykg
oxirs tsdb retention add mykg --name hourly --duration 90d --downsample 1h

# Export to CSV
oxirs tsdb export mykg --series 1 --output data.csv

# Performance benchmark
oxirs tsdb benchmark mykg --points 100000
```

See `/tmp/oxirs_cli_phase_d_guide.md` for complete CLI documentation.

## Production Status

- **128/128 tests passing** - 100% success rate
-**Zero warnings** - Strict code quality enforcement
-**10 examples** - Complete usage documentation
-**3 benchmarks** - Performance validation
-**Complete documentation** - API docs, guides, CLI help

## Documentation

- [Implementation Plan]/tmp/oxirs_enhancement_tsdb.md - Detailed 5-month roadmap
- [Gorilla Paper]http://www.vldb.org/pvldb/vol8/p1816-teller.pdf - Original Facebook research

## License

Dual-licensed under MIT or Apache-2.0.