timeseries-table-datafusion
This crate lets you query a timeseries-table-core table using DataFusion SQL.
It focuses on time-series workloads and includes segment-level pruning:
when a query has a time predicate, we determine which data segments cannot
possibly match (based on each segment's ts_min/ts_max), and skip them
before execution.
The goal is simple: keep SQL queries fast without changing your data.
Getting Started
Installation
[]
= "0.1"
= "51"
= { = "1", = ["rt-multi-thread"] }
Rust API Example
use *;
use TimeSeriesTable;
use TsTableProvider;
use Arc;
async
How Pruning Works
When you include time predicates in your query (e.g., WHERE ts >= '...' AND ts < '...'),
the TsTableProvider extracts those filters and compares them against each segment's
metadata (ts_min, ts_max). Segments that fall entirely outside the query range
are skipped—no I/O is performed for them.
This happens automatically; you don't need to do anything special.
Current Limitations
- Read-only: This integration is for querying only. Use
timeseries-table-coreor the CLI for writes. - Best-effort filter extraction: Complex predicates may not be fully recognized (see below). Unrecognized predicates fall back to scanning all segments—correctness is preserved.
- No custom execution plan: We use DataFusion's default
ParquetExecafter pruning. Future versions may add custom operators.
Pruning Reference
Below is a detailed reference of which SQL predicates enable segment pruning.
What works well today
Basic time comparisons
We recognize direct comparisons between the timestamp column and a literal:
ts < '2024-01-01T00:00:00Z'ts >= '2024-01-01T00:00:00Z'ts = '2024-01-01T00:00:00Z'ts != '2024-01-01T00:00:00Z'
The literal can be an RFC3339 string or a timestamp literal. Date-only strings
like '2024-01-01' are treated as midnight UTC.
BETWEEN / IN / NOT / AND / OR
These are supported as long as they ultimately boil down to time comparisons:
ts BETWEEN '...' AND '...'ts IN ('...', '...')ts NOT IN ('...')NOT (ts < '...')- Combined with
AND/OR
When a predicate can’t be understood, we do not prune (safe fallback).
Interval arithmetic on ts
We support SQL interval arithmetic as long as it’s based on interval literals:
ts + INTERVAL '1 day' < '...'ts - INTERVAL '2 hours' <= '...'ts + INTERVAL '1 day' + INTERVAL '1 hour' < '...'INTERVAL '1 day' + ts < '...'
Intervals can include common units (seconds, millis, micros, nanos, minutes, hours, days, months, years). Mixed intervals are handled by combining months, days, and sub-day nanos.
to_timestamp* scalar functions
We recognize DataFusion’s timestamp helper functions when they use literals:
to_timestamp(...)to_timestamp_seconds(...)to_timestamp_millis(...)to_timestamp_micros(...)to_timestamp_nanos(...)
Both string literals and numeric epoch literals are supported.
Example:
WHERE ts < to_timestamp_millis(1704672000123)
to_unixtime(ts)
We also recognize to_unixtime(ts) when it is compared to a numeric literal:
to_unixtime(ts) >= 17046720001704672000 < to_unixtime(ts)
This uses seconds since epoch. String literals are not supported here.
to_date(ts)
We recognize to_date(ts) when it is compared to a date literal:
to_date(ts) = '2024-01-01'to_date(ts) < '2024-01-01'to_date(ts) >= '2024-01-01'
The date literal should be YYYY-MM-DD. The comparison is expanded into a
timestamp range for that whole day. If the timestamp column has a timezone,
day boundaries are computed in that timezone.
date_trunc(precision, ts)
We recognize date_trunc when it is compared to a timestamp literal:
date_trunc('hour', ts) = '2024-01-01T10:00:00Z'date_trunc('minute', ts) > '2024-01-01T10:30:00Z'date_trunc('day', ts) <= '2024-01-01T00:00:00Z'
Supported precisions: second, minute, hour, day.
Behavior notes:
- Non-aligned literals (e.g.
10:30for hour) are handled by moving the comparison to the next bucket boundary. - If the timestamp column has an Olson timezone, hour/day boundaries are computed in that timezone (DST-aware). Fine granularity (second/minute) uses UTC arithmetic, matching DataFusion’s fast path.
date_bin(interval, ts[, origin])
We recognize date_bin when all arguments are literals:
date_bin(interval '15 minutes', ts) = '2024-01-01T10:30:00Z'date_bin(interval '1 day', ts, '2024-01-01T03:00:00Z') > '2024-01-02T03:00:00Z'
Supported interval forms:
- Day/time intervals (days, hours, minutes, seconds, millis, micros, nanos)
- Month intervals (months only; no mixed month+day+nanos)
Behavior notes:
- Default origin is the Unix epoch (
1970-01-01T00:00:00Z) when omitted. - Binning uses UTC arithmetic (timezone is not applied to bin boundaries).
What does NOT prune (yet)
These are intentionally treated as “unknown” to avoid incorrect pruning:
ts + 1 < ...(numeric, not interval)ts + other_column < ...to_timestamp(...) + ts < ...interval - ts(non-commutative, ambiguous for our matcher)to_unixtime(ts) < '1704672000'(string literal)to_date(ts) = 'not-a-date'(invalid date literal)
Queries still run correctly; they just won’t be pruned.
Related
timeseries-table-cli— Command-line tool for managing and querying tables (no Rust code required)timeseries-table-core— Core library for table creation, appends, and coverage tracking