tsink 0.10.2

A lightweight embedded time-series database with a straightforward API
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
# PromQL implementation

tsink ships a native PromQL parser and evaluator with no external query layer.
The implementation lives entirely in `src/promql/` and is exposed through the
public `tsink::promql` module.

---

## Architecture

The PromQL pipeline has three stages:

```
query string → Lexer → Parser → Evaluator → PromqlValue
```

| Module | File | Responsibility |
|---|---|---|
| Lexer | `src/promql/lexer.rs` | Tokenise raw input into a flat `Vec<Token>` |
| Parser | `src/promql/parser.rs` | Turn tokens into an `Expr` AST (Pratt / precedence-climbing) |
| Evaluator | `src/promql/eval/` | Walk the AST and resolve values against the storage engine |
| Types | `src/promql/types.rs` | `PromqlValue`, `Sample`, `Series`, histogram helpers |
| Errors | `src/promql/error.rs` | `PromqlError` enum |

---

## Public API

### Parsing

```rust
use tsink::promql::{parse, ast::Expr};

let expr: Expr = parse("rate(http_requests_total[5m])")?;
```

### Query engine

```rust
use std::sync::Arc;
use tsink::promql::{Engine, PromqlValue};
use tsink::{Storage, TimestampPrecision};

// Build an engine from any Arc<dyn Storage>.
let engine = Engine::with_precision(storage, TimestampPrecision::Milliseconds);

// Instant query — evaluate at a single timestamp.
let result: PromqlValue = engine.instant_query("up", eval_time)?;

// Range query — evaluate over [start, end] at each step.
let result: PromqlValue = engine.range_query("rate(errors_total[1m])", start, end, step)?;
```

`Engine::new` defaults to `TimestampPrecision::Nanoseconds`.
Both methods parse the expression internally before evaluation.

---

## Value types

`PromqlValue` mirrors the four PromQL expression result types.

| Variant | Contents | When returned |
|---|---|---|
| `Scalar(f64, i64)` | A single float and its evaluation timestamp | Literals, `scalar()`, arithmetic on two scalars |
| `InstantVector(Vec<Sample>)` | Zero or more labelled samples at one timestamp | Vector selectors, most functions |
| `RangeVector(Vec<Series>)` | Labelled time series with multiple samples | Matrix selectors, range queries |
| `String(String, i64)` | A string value and its evaluation timestamp | String literals |

### `Sample`

```rust
pub struct Sample {
    pub metric: String,
    pub labels: Vec<Label>,
    pub timestamp: i64,
    pub value: f64,
    pub histogram: Option<Box<NativeHistogram>>,
}
```

### `Series`

```rust
pub struct Series {
    pub metric: String,
    pub labels: Vec<Label>,
    pub samples: Vec<(i64, f64)>,       // (timestamp, float value)
    pub histograms: Vec<(i64, Box<NativeHistogram>)>,
}
```

---

## Lexer

The lexer is a single-pass byte scanner.  It produces all tokens in one call
(`Lexer::new(input).tokenize()`), returning `Vec<Token>` or a `PromqlError::Parse`.

### Comments

`#` starts a line comment; everything until the next newline is discarded.

### Identifiers

Identifiers follow the usual `[a-zA-Z_][a-zA-Z0-9_]*` alphabet.  Colons (`:`)
are also accepted inside identifiers to support recording-rule naming
conventions such as `job:http_requests:rate5m`.

### Keywords (case-insensitive)

`by`, `without`, `offset`, `bool`, `and`, `or`, `unless`, `on`, `ignoring`,
`group_left`, `group_right`, `atan2`, `inf`, `nan`

### Duration literals

A duration is a sequence of one or more `<integer><unit>` segments.

| Suffix | Unit |
|---|---|
| `ms` | milliseconds |
| `s` | seconds |
| `m` | minutes |
| `h` | hours |
| `d` | days (24 h) |
| `w` | weeks (7 d) |
| `y` | years (365 d) |

Segments can be combined: `1h30m`, `5m30s`, `2d12h`.
Durations are stored internally as milliseconds (`i64`).

### String literals

Double-quoted strings with `\"`, `\\`, `\n`, `\r`, `\t` escape sequences.

### Number literals

Decimal integers and floats.  The special identifiers `inf` and `nan` are
recognised as numeric tokens equivalent to `f64::INFINITY` and `f64::NAN`.

---

## Parser

The parser implements precedence-climbing (Pratt) parsing for expressions.
The entry point is `parser::parse(input)`.

### Expression grammar (summary)

```
expr        := unary ( binary_op modifiers expr )*
unary       := ('+' | '-')? primary postfix*
primary     := number | string | inf | nan
             | '(' expr ')'
             | '{' matchers '}'
             | ident [ aggregation | call | vector_selector ]
postfix     := '[' duration (':' duration?)? ']'   -- matrix selector or subquery
             | 'offset' signed_duration
             | '@' (number | 'start()' | 'end()')
```

### Operator precedence

Lower number = binds tighter.

| Precedence | Operators |
|---|---|
| 1 (highest) | `^` (right-associative) |
| 2 | `*`, `/`, `%`, `atan2` |
| 3 | `+`, `-` |
| 4 | `==`, `!=`, `<`, `>`, `<=`, `>=` |
| 5 | `and`, `unless` |
| 6 (lowest) | `or` |

### Label matchers

```
{label="value", label2!="value", label3=~"regex", label4!~"regex"}
```

| Operator | Semantics |
|---|---|
| `=` | Exact equality |
| `!=` | Exact inequality |
| `=~` | Regex match (anchored) |
| `!~` | Regex non-match (anchored) |

Regex matching compiles the pattern with the standard `regex` crate.

### Vector selector

```
metric_name
metric_name{labels}
{labels}
```

### Matrix selector

```
metric_name[5m]
metric_name{label="value"}[1h]
```

### Subquery

```
expr[range:step]
expr[range:]          # omit step → use query step or default (1m)
```

### @modifier and offset

```
metric @ 1700000000         # pin to Unix timestamp
metric @ start()            # pin to range query start
metric @ end()              # pin to range query end
metric offset 5m            # shift evaluation back 5 minutes
metric[10m] offset 1h       # combine range + offset
```

### Aggregations

```
sum(expr)
sum by  (label1, label2) (expr)
sum without (label1)     (expr)
```

The grouping clause can be placed either before or after the argument list.

### Binary operator modifiers

```
a + on(job)             b       # match only on "job"
a + ignoring(instance)  b       # ignore "instance" when matching
a * on(job) group_left  b       # many-to-one: keep left-side labels
a * on(job) group_right b       # one-to-many: keep right-side labels
a * on(job) group_left(region) b  # also copy "region" from right
a == bool   b                   # return 0/1 instead of filtering
```

`group_left` and `group_right` cannot be combined with set operators
(`and`, `or`, `unless`).

---

## Evaluator

The evaluator is in `src/promql/eval/` and is split across several files:

| File | Contents |
|---|---|
| `mod.rs` | `Engine`, instant and range query entry points, prefetch cache, `@` resolution |
| `selector.rs` | Instant vector and matrix selector evaluation |
| `functions.rs` | All built-in function implementations |
| `aggregation.rs` | Aggregation operator implementations |
| `binary.rs` | Binary operator evaluation and vector matching |
| `subquery.rs` | Subquery evaluation |
| `time.rs` | Duration/timestamp utilities (`duration_to_units`, `step_times`) |

### Default parameters

| Parameter | Default |
|---|---|
| Lookback delta | 5 minutes |
| Subquery step | 1 minute |

The lookback delta controls how far back an instant vector selector looks for
the most recent sample.

### Range query prefetch

For range queries the engine checks whether any selector uses a dynamic `@`
modifier or is wrapped in a subquery.  When no dynamic time is involved it
pre-fetches all required metric data from storage in a single pass before
iterating over steps.  This significantly reduces storage I/O for wide time
ranges.  Subqueries and `@` modifiers disable prefetch for accuracy.

---

## Aggregation operators

All aggregation operators accept an optional `by (labels)` or
`without (labels)` grouping clause.

| Operator | Parameter | Description |
|---|---|---|
| `sum` || Sum of values |
| `avg` || Average of values |
| `min` || Minimum value |
| `max` || Maximum value |
| `count` || Number of series |
| `group` || 1 for each group (existence aggregation) |
| `stddev` || Population standard deviation |
| `stdvar` || Population variance |
| `count_values` | `label` (string) | Count series per distinct value; adds a `label` dimension |
| `quantile` | `φ` (scalar) | φ-quantile across the group |
| `topk` | `k` (scalar) | Top k series by value |
| `bottomk` | `k` (scalar) | Bottom k series by value |
| `limitk` | `k` (scalar) | Deterministically select k series (hash-stable) |
| `limit_ratio` | `ratio` (scalar) | Deterministically select a ratio of series |

`sum`, `avg`, `min`, `max`, `count`, `group`, `stddev`, and `stdvar` support
native histograms for `sum`.  `count_values`, `quantile`, `topk`, and
`bottomk` require float samples.

---

## Functions

### Counter and gauge range functions

| Function | Input | Description |
|---|---|---|
| `rate(v[d])` | range vector | Per-second rate of counter increase (extrapolated to fit `d`) |
| `irate(v[d])` | range vector | Per-second instant rate using the last two samples |
| `increase(v[d])` | range vector | Total counter increase over `d` (extrapolated) |
| `delta(v[d])` | range vector | Value change over `d` (extrapolated, for gauges) |
| `idelta(v[d])` | range vector | Instant delta between the last two samples |
| `changes(v[d])` | range vector | Number of value changes within `d` |
| `resets(v[d])` | range vector | Number of counter resets within `d` |

`rate` and `increase` support native histogram series and produce a histogram
result.  The other range functions require float samples.

**Extrapolation**: `rate`, `increase`, and `delta` use the same boundary
extrapolation algorithm as Prometheus — the sampled interval is extended toward
the range boundaries when the gap is within 110% of the average sample interval.

### Over-time aggregations

All take a range vector and return an instant vector.

| Function | Description |
|---|---|
| `avg_over_time(v[d])` | Average of samples in window |
| `sum_over_time(v[d])` | Sum |
| `min_over_time(v[d])` | Minimum |
| `max_over_time(v[d])` | Maximum |
| `count_over_time(v[d])` | Count of samples |
| `last_over_time(v[d])` | Most recent sample |
| `present_over_time(v[d])` | 1 if any sample exists |
| `stddev_over_time(v[d])` | Standard deviation |
| `stdvar_over_time(v[d])` | Variance |
| `mad_over_time(v[d])` | Median absolute deviation |
| `quantile_over_time(φ, v[d])` | φ-quantile of samples |

### Histogram functions

| Function | Description |
|---|---|
| `histogram_quantile(φ, v)` | φ-quantile from classic (bucket-based) or native histograms |
| `histogram_avg(v)` | Average from native histograms |
| `histogram_count(v)` | Observation count from native histograms |
| `histogram_sum(v)` | Sum of observations from native histograms |
| `histogram_stddev(v)` | Standard deviation from native histograms |
| `histogram_stdvar(v)` | Variance from native histograms |
| `histogram_fraction(lower, upper, v)` | Fraction of observations in `(lower, upper]` from native histograms |

### Regression and prediction

| Function | Description |
|---|---|
| `deriv(v[d])` | Estimated per-second derivative by linear regression |
| `predict_linear(v[d], t)` | Predicted value `t` seconds from now using linear regression |
| `double_exponential_smoothing(v[d], sf, tf)` | Double exponential smoothing; `sf` = smoothing factor, `tf` = trend factor; also callable as `holt_winters` |

### Math functions

| Function | Description |
|---|---|
| `abs(v)` | Absolute value |
| `ceil(v)` | Ceiling |
| `floor(v)` | Floor |
| `round(v)` | Round to nearest integer |
| `round(v, to_nearest)` | Round to nearest multiple of `to_nearest` |
| `sqrt(v)` | Square root |
| `exp(v)` | e^v |
| `ln(v)` | Natural logarithm |
| `log2(v)` | Base-2 logarithm |
| `log10(v)` | Base-10 logarithm |
| `sgn(v)` | Sign (−1, 0, or 1) |
| `clamp(v, min, max)` | Clamp value to `[min, max]` |
| `clamp_min(v, min)` | Lower-clamp |
| `clamp_max(v, max)` | Upper-clamp |

### Trigonometry

| Function | Function |
|---|---|
| `cos(v)` | `acos(v)` |
| `cosh(v)` | `acosh(v)` |
| `sin(v)` | `asin(v)` |
| `sinh(v)` | `asinh(v)` |
| `tan(v)` | `atan(v)` |
| `tanh(v)` | `atanh(v)` |
| `deg(v)` — radians to degrees | `rad(v)` — degrees to radians |
| `pi()` — π as a scalar | |

### Date and time

When called with no argument these functions use the eval-time timestamp.
When called with an instant vector they use each sample's timestamp.

| Function | Description |
|---|---|
| `time()` | Current evaluation time in seconds since epoch (scalar) |
| `timestamp(v)` | Timestamp of each sample in seconds since epoch |
| `minute(v?)` | Minute of the hour (0–59) |
| `hour(v?)` | Hour of the day (0–23) |
| `day_of_week(v?)` | Day of the week (0=Sunday–6=Saturday) |
| `day_of_month(v?)` | Day of the month (1–31) |
| `day_of_year(v?)` | Day of the year (1–366) |
| `days_in_month(v?)` | Number of days in the month (28–31) |
| `month(v?)` | Month (1–12) |
| `year(v?)` | Year |

### Label manipulation

| Function | Description |
|---|---|
| `label_replace(v, dst, repl, src, regex)` | Rewrite label `src` into `dst` using a capture-aware `regex` and `repl` |
| `label_join(v, dst, sep, src1, src2, ...)` | Concatenate source labels into `dst` with `sep` as separator |
| `drop_common_labels(v)` | Remove labels that are identical across all series in the vector |

### Type coercion

| Function | Description |
|---|---|
| `scalar(v)` | Convert a single-element instant vector to a scalar; `NaN` if more than one element |
| `vector(s)` | Convert a scalar to a single-element instant vector with no labels |

### Sorting

| Function | Description |
|---|---|
| `sort(v)` | Sort by value ascending |
| `sort_desc(v)` | Sort by value descending |
| `sort_by_label(v, l1, ...)` | Sort by the specified label names, ascending |
| `sort_by_label_desc(v, l1, ...)` | Sort by the specified label names, descending |

### Absence detection

| Function | Description |
|---|---|
| `absent(v)` | Returns `{} 1` when the instant vector is empty; nothing otherwise |
| `absent_over_time(v[d])` | Returns `{} 1` when the range vector is empty; nothing otherwise |

### Miscellaneous

| Function | Description |
|---|---|
| `info(v)` | Experimental: fetches info-metric labels and merges them into each series |
| `count_scalar(v)` | Returns the element count of a vector as a scalar |

---

## Supported features vs. standard PromQL

| Feature | Supported |
|---|---|
| Instant and range queries | Yes |
| All arithmetic and set operators | Yes |
| `bool` comparison modifier | Yes |
| Vector matching (`on` / `ignoring` / `group_left` / `group_right`) | Yes |
| `offset` modifier | Yes |
| `@` modifier with literal timestamp, `start()`, `end()` | Yes |
| Subqueries `expr[range:step]` | Yes |
| Native histograms | Yes (float samples only for most functions; `rate` and `increase` supported) |
| Stale NaN markers (Prometheus compatibility) | Yes |
| `limitk` / `limit_ratio` aggregations (VictoriaMetrics extension) | Yes |
| `mad_over_time` | Yes |
| `double_exponential_smoothing` / `holt_winters` | Yes |
| `sort_by_label` / `sort_by_label_desc` | Yes |
| `info` | Yes (experimental) |
| UTF-8 / non-ASCII metric names | No — identifiers are ASCII only |
| Backtick string literals | No |

---

## Error types

```rust
pub enum PromqlError {
    Parse(String),                                  // invalid syntax
    UnexpectedToken { expected: String, found: String },
    UnknownFunction(String),                        // unrecognised function name
    ArgumentCount { func, expected, got },          // wrong arity
    Type(String),                                   // type mismatch at evaluation
    Eval(String),                                   // runtime evaluation error
    Regex(String),                                  // invalid regex in matcher
    Storage(TsinkError),                            // underlying storage error
}
```

`PromqlError::Storage` is constructed automatically from `TsinkError` via a
`From` impl, so storage errors surface transparently through the query result.

---

## Examples

```rust
use std::sync::Arc;
use tsink::{StorageBuilder, TimestampPrecision};
use tsink::promql::Engine;

let storage = Arc::new(
    StorageBuilder::new()
        .with_timestamp_precision(TimestampPrecision::Seconds)
        .build()?
);
let engine = Engine::with_precision(Arc::clone(&storage), TimestampPrecision::Seconds);

// Instant queries
let v = engine.instant_query("up", 1_700_000_000)?;
let v = engine.instant_query(r#"http_requests_total{method="GET"}"#, now)?;
let v = engine.instant_query("rate(http_requests_total[5m])", now)?;
let v = engine.instant_query("sum by (job) (rate(errors_total[1m]))", now)?;

// Range query (returns PromqlValue::RangeVector)
let v = engine.range_query(
    "rate(http_requests_total[5m])",
    start,   // inclusive
    end,     // inclusive
    step,    // interval between evaluation points
)?;

// Parse only (no storage required)
let expr = tsink::promql::parse("histogram_quantile(0.99, rate(latency_seconds_bucket[5m]))")?;
```