timeseries-table-core 0.2.2

Core engine for a Rust-native log-structured time-series table format (metadata log + coverage bitmaps).

Core engine for a log-structured time-series table format. It owns table metadata, append rules, coverage math, storage IO, and the TimeSeriesTable API that higher-level integrations build on.

This crate does not implement a query engine. It exposes metadata and scan streams that other layers (DataFusion, Polars, custom code) can consume.

Layers (module layout)

  • metadata: pure metadata model + validation (logical schema, table metadata, segment types). No IO.
  • transaction_log: append-only metadata log APIs (OCC) + table state materialization.
  • table: user-facing TimeSeriesTable API (create/open/append/scan).
  • storage: local backend + table-root IO helpers.
  • coverage: coverage math and gap analysis.
  • formats: format-specific helpers (currently formats::parquet).

During the refactor, older module paths remain available as compatibility re-exports (for example, transaction_log, time_series_table, helpers).

Responsibilities

  • Transaction log + metadata: versioned commits, optimistic concurrency, table schema.
  • Segment metadata: min/max timestamps, row counts, file format, coverage sidecars.
  • Coverage math: RoaringBitmap overlap checks and gap analysis in bucket space.
  • Storage access: local filesystem backend and atomic IO helpers.
  • User API: create/open/append/scan plus coverage/gap queries.

On-disk layout (local backend)

<table_root>/
  _timeseries_log/
    CURRENT                  # latest committed version
    0000000001.json          # commit files (LogAction list)
    ...
  _coverage/
    segments/<id>.roar       # per-segment RoaringBitmap coverage
    table/<ver>-<id>.roar    # table snapshot coverage
  data/...                   # Parquet segments (convention)
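
The layout above can be turned into path helpers. This is a minimal sketch, not the crate's own code: the function names are hypothetical, and the ten-digit zero padding is inferred only from the `0000000001.json` example shown in the tree.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: file name of the commit for `version`.
/// The ten-digit zero padding is inferred from `0000000001.json`.
pub fn commit_file_name(version: u64) -> String {
    format!("{:010}.json", version)
}

/// Hypothetical helper: full path of a commit file under `_timeseries_log/`.
pub fn commit_path(table_root: &Path, version: u64) -> PathBuf {
    table_root
        .join("_timeseries_log")
        .join(commit_file_name(version))
}

/// Hypothetical helper: path of a per-segment coverage sidecar.
pub fn segment_coverage_path(table_root: &Path, segment_id: &str) -> PathBuf {
    table_root
        .join("_coverage")
        .join("segments")
        .join(format!("{segment_id}.roar"))
}
```

Zero-padded names keep commit files in version order under a plain lexicographic directory listing.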

Transaction log and OCC

  • Commits are append-only JSON files under _timeseries_log/.
  • CURRENT stores the latest committed version.
  • Commit flow is optimistic:
    1. Read current version N.
    2. Build a commit with expected_version = N.
    3. Write version N+1 only if CURRENT is still N.
  • On conflict, the caller reloads and retries.
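
The commit flow above can be illustrated with an in-memory stand-in for the log. This is a sketch under stated assumptions rather than the crate's API: `MemLog`, `VersionConflict`, and the use of plain strings as actions are all hypothetical, and an empty log is assumed to be at version 0 so that the first commit becomes version 1.

```rust
use std::collections::BTreeMap;

/// Hypothetical error: the log moved on since the caller last read it.
#[derive(Debug, PartialEq)]
pub struct VersionConflict {
    pub expected: u64,
    pub found: u64,
}

/// In-memory stand-in for `_timeseries_log/`: `current` plays the role of
/// the CURRENT file, `commits` the numbered commit files.
#[derive(Default)]
pub struct MemLog {
    current: u64,
    commits: BTreeMap<u64, Vec<String>>,
}

impl MemLog {
    /// Step 1: read the current version N.
    pub fn current_version(&self) -> u64 {
        self.current
    }

    /// Steps 2 and 3: write version N+1 only if CURRENT is still N.
    pub fn commit(
        &mut self,
        expected_version: u64,
        actions: Vec<String>,
    ) -> Result<u64, VersionConflict> {
        if self.current != expected_version {
            return Err(VersionConflict {
                expected: expected_version,
                found: self.current,
            });
        }
        let next = expected_version + 1;
        self.commits.insert(next, actions);
        self.current = next;
        Ok(next)
    }

    /// Actions recorded for a committed version, if any.
    pub fn actions_at(&self, version: u64) -> Option<&Vec<String>> {
        self.commits.get(&version)
    }
}
```

A writer that still holds a stale N gets `VersionConflict`; it then re-reads `current_version()` and retries, which is the reload-and-retry step described above.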

Log actions:

  • AddSegment: adds a new segment descriptor (and coverage_path if enabled).
  • UpdateTableMeta: updates table-level metadata (schema adoption, entity pinning).
  • UpdateTableCoverage: points to the latest table coverage snapshot.
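
To show how these three actions could fold into table state, here is a simplified sketch. The variant names come from the list above; every field name, `StateSketch`, and `replay` are hypothetical and much smaller than whatever the real types carry.

```rust
/// Hypothetical, simplified mirror of the three log actions.
#[derive(Debug, Clone, PartialEq)]
pub enum LogAction {
    AddSegment {
        segment_id: String,
        coverage_path: Option<String>,
    },
    UpdateTableMeta {
        description: String,
    },
    UpdateTableCoverage {
        snapshot_path: String,
    },
}

/// Hypothetical materialized state produced by replaying the log.
#[derive(Debug, Default, PartialEq)]
pub struct StateSketch {
    pub version: u64,
    pub segment_ids: Vec<String>,
    pub table_meta: Option<String>,
    pub coverage_snapshot: Option<String>,
}

/// Replay commits in order; each commit is a list of actions and bumps
/// the version by one. Later actions overwrite earlier pointers.
pub fn replay(commits: &[Vec<LogAction>]) -> StateSketch {
    let mut state = StateSketch::default();
    for actions in commits {
        state.version += 1;
        for action in actions {
            match action {
                LogAction::AddSegment { segment_id, .. } => {
                    state.segment_ids.push(segment_id.clone());
                }
                LogAction::UpdateTableMeta { description } => {
                    state.table_meta = Some(description.clone());
                }
                LogAction::UpdateTableCoverage { snapshot_path } => {
                    state.coverage_snapshot = Some(snapshot_path.clone());
                }
            }
        }
    }
    state
}
```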

Table lifecycle

Create

TimeSeriesTable::create validates TableMeta, writes the initial UpdateTableMeta commit, then rebuilds TableState.

Open

TimeSeriesTable::open rebuilds TableState from the log and extracts TimeIndexSpec. Opening a table whose log is empty returns TableError::EmptyTable.

Append (Parquet)

  1. Read Parquet bytes from storage.
  2. Extract metadata and derive a LogicalSchema.
  3. If this is the first segment, adopt its schema as the table schema; otherwise require an exact match with the table schema.
  4. Validate entity identity (if TimeIndexSpec.entity_columns is set).
  5. Compute coverage, reject overlaps, and write coverage sidecars.
  6. Commit AddSegment + optional UpdateTableMeta + UpdateTableCoverage.
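
Step 3 (adopt on first segment, otherwise exact match) can be sketched as below. `Column`, `SchemaCheck`, and `check_schema` are hypothetical stand-ins for the crate's LogicalSchema handling, and treating a different column order as a mismatch is an assumption about what "exact" means.

```rust
/// Hypothetical, minimal column description.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    pub data_type: String,
}

/// Outcome of the schema step for an incoming segment.
#[derive(Debug, PartialEq)]
pub enum SchemaCheck {
    /// No table schema yet: the segment's schema becomes the table schema.
    Adopt(Vec<Column>),
    /// The segment's schema equals the table schema.
    Matches,
}

/// `canonical` is `None` until the first segment has been appended.
/// Assumption: equality is positional, so reordered columns do not match.
pub fn check_schema(
    canonical: Option<&[Column]>,
    incoming: &[Column],
) -> Result<SchemaCheck, String> {
    match canonical {
        None => Ok(SchemaCheck::Adopt(incoming.to_vec())),
        Some(c) if c == incoming => Ok(SchemaCheck::Matches),
        Some(c) => Err(format!(
            "schema mismatch: table has {} column(s), segment has {}",
            c.len(),
            incoming.len()
        )),
    }
}
```

Because this check returns before anything is written, a mismatch never reaches the commit step.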

Coverage and gaps

  • Bucket IDs: timestamps are mapped to discrete bucket IDs using TimeBucket.
  • Overlap checks: a new segment is rejected if any of its bucket IDs is already present in the table coverage.
  • Snapshots: table coverage snapshots are the union of all segment coverages.
  • Recovery: if the snapshot sidecar is missing/corrupt, it is rebuilt from segment coverage sidecars when possible.
  • Read-side metrics: coverage ratio, missing runs, max gap length, and "last fully covered window" are computed against the current snapshot.
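
The bucket mapping and read-side metrics can be sketched with the standard library alone. This is illustrative only: `BTreeSet<u32>` stands in for a RoaringBitmap, buckets are assumed to be fixed-width in seconds (the real TimeBucket may be richer), and all function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Map a timestamp in seconds to a bucket ID for a fixed bucket width.
/// Returns `None` for a non-positive width or an ID outside `u32`.
pub fn bucket_id(ts_secs: i64, bucket_secs: i64) -> Option<u32> {
    if bucket_secs <= 0 {
        return None;
    }
    u32::try_from(ts_secs.div_euclid(bucket_secs)).ok()
}

/// Fraction of buckets in `[start, end)` that are covered.
pub fn coverage_ratio(coverage: &BTreeSet<u32>, start: u32, end: u32) -> f64 {
    if end <= start {
        return 0.0;
    }
    let covered = coverage.range(start..end).count();
    covered as f64 / (end - start) as f64
}

/// Maximal runs of missing buckets in `[start, end)`, as inclusive pairs.
pub fn missing_runs(coverage: &BTreeSet<u32>, start: u32, end: u32) -> Vec<(u32, u32)> {
    let mut runs = Vec::new();
    let mut run_start: Option<u32> = None;
    for b in start..end {
        if coverage.contains(&b) {
            // A covered bucket closes any open gap.
            if let Some(s) = run_start.take() {
                runs.push((s, b - 1));
            }
        } else if run_start.is_none() {
            run_start = Some(b);
        }
    }
    // A gap still open at the end of the window runs to `end - 1`.
    if let Some(s) = run_start {
        runs.push((s, end - 1));
    }
    runs
}

/// Length in buckets of the longest gap in `[start, end)`, or 0 if none.
pub fn max_gap_len(coverage: &BTreeSet<u32>, start: u32, end: u32) -> u32 {
    missing_runs(coverage, start, end)
        .iter()
        .map(|(s, e)| e - s + 1)
        .max()
        .unwrap_or(0)
}
```

Working in bucket space means all of these metrics are set operations on integers and never need to touch the Parquet data.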

Coverage sidecar lifecycle (v0.1)

  1. Write per-segment coverage to _coverage/segments/<id>.roar.
  2. Union with the current snapshot to build the next snapshot.
  3. Write snapshot to _coverage/table/<ver>-<id>.roar.
  4. Commit both references in a single log entry.
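
Step 2, together with the overlap rule, amounts to "reject if the sets intersect, otherwise union". A minimal sketch, again using `BTreeSet<u32>` as a stand-in for a RoaringBitmap; `Overlap` and `next_snapshot` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Hypothetical overlap report: the first shared bucket and how many
/// buckets are shared in total.
#[derive(Debug, PartialEq)]
pub struct Overlap {
    pub first_bucket: u32,
    pub count: usize,
}

/// Build the next table snapshot from the current snapshot and a new
/// segment's coverage, refusing any segment that shares a bucket.
pub fn next_snapshot(
    current: &BTreeSet<u32>,
    segment: &BTreeSet<u32>,
) -> Result<BTreeSet<u32>, Overlap> {
    let mut shared = current.intersection(segment);
    if let Some(&first) = shared.next() {
        return Err(Overlap {
            first_bucket: first,
            // One bucket was already consumed by `next()` above.
            count: 1 + shared.count(),
        });
    }
    Ok(current.union(segment).copied().collect())
}
```

In this sketch the snapshot is only computed after the overlap check passes, so a rejected segment leaves the current snapshot untouched.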

Range scans

  1. Select segments whose closed range [ts_min, ts_max] intersects the half-open query range [ts_start, ts_end).
  2. Read each segment, build a Parquet RecordBatch reader, and filter by time.
  3. Stream filtered batches in chronological order as TimeSeriesScan.
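
Segment selection in step 1 is an interval test between a closed segment range and a half-open query range. The sketch below uses a hypothetical `SegmentRange` type with integer timestamps, and it assumes that an empty or inverted query range counts as invalid; the crate's exact InvalidRange condition is not specified here.

```rust
/// Hypothetical, minimal view of a segment's time bounds (inclusive).
#[derive(Debug, Clone, PartialEq)]
pub struct SegmentRange {
    pub id: String,
    pub ts_min: i64,
    pub ts_max: i64,
}

/// Segments whose `[ts_min, ts_max]` intersects `[ts_start, ts_end)`,
/// ordered by `ts_min`.
pub fn select_segments(
    segments: &[SegmentRange],
    ts_start: i64,
    ts_end: i64,
) -> Result<Vec<&SegmentRange>, String> {
    // Assumption: an empty or inverted range is treated as invalid.
    if ts_start >= ts_end {
        return Err(format!("invalid range: [{ts_start}, {ts_end})"));
    }
    let mut hits: Vec<&SegmentRange> = segments
        .iter()
        // Closed vs. half-open: strict `<` on the query end, `>=` on the start.
        .filter(|s| s.ts_min < ts_end && s.ts_max >= ts_start)
        .collect();
    hits.sort_by_key(|s| s.ts_min);
    Ok(hits)
}

/// Row-level filter applied after a segment has been selected.
pub fn in_range(ts: i64, ts_start: i64, ts_end: i64) -> bool {
    ts_start <= ts && ts < ts_end
}
```

Selection only prunes whole segments; a selected segment can still hold rows outside the query range, which is why the row-level filter in step 2 is needed.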

Schema rules (v0.1)

  • No schema evolution: all segments must match the canonical schema exactly.
  • Time column must exist and have a supported timestamp type.
  • If entity columns are configured, each segment must be single-entity and match the pinned table identity.
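
The entity rule can be sketched as a check over the entity value of each row. This is a simplified illustration: `EntityError` and `check_entity` are hypothetical, the entity identity is flattened to a single string per row, and rejecting an empty segment is an assumption of the sketch.

```rust
/// Hypothetical errors for the entity rule.
#[derive(Debug, PartialEq)]
pub enum EntityError {
    EmptySegment,
    MultipleEntities { first: String, other: String },
    PinnedMismatch { pinned: String, found: String },
}

/// Returns the segment's single entity identity if it is consistent with
/// the pinned table identity (`None` means nothing is pinned yet).
pub fn check_entity(
    pinned: Option<&str>,
    row_entities: &[&str],
) -> Result<String, EntityError> {
    let first = *row_entities.first().ok_or(EntityError::EmptySegment)?;
    // Single-entity rule: every row must carry the same identity.
    if let Some(other) = row_entities.iter().find(|e| **e != first) {
        return Err(EntityError::MultipleEntities {
            first: first.to_string(),
            other: other.to_string(),
        });
    }
    // Pinning rule: once an identity is pinned, later segments must match it.
    match pinned {
        Some(p) if p != first => Err(EntityError::PinnedMismatch {
            pinned: p.to_string(),
            found: first.to_string(),
        }),
        _ => Ok(first.to_string()),
    }
}
```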

Error behavior (high level)

  • If segments exist but the table coverage snapshot is missing and cannot be rebuilt, the operation returns an explicit error.
  • Overlaps surface a dedicated overlap error with context.
  • Invalid time ranges return TableError::InvalidRange.
  • Schema mismatches fail before any commit is attempted.

Extension points

  • Storage: extend TableLocation and storage for new backends.
  • Segment formats: extend FileFormat and SegmentMeta::new_validated.
  • Schema evolution: not supported in v0.1; the schema rules that would need to change live in helpers::schema.