# PromQL implementation
tsink ships a native PromQL parser and evaluator with no external query layer.
The implementation lives entirely in `src/promql/` and is exposed through the
public `tsink::promql` module.
---
## Architecture
The PromQL pipeline has three stages:
```
query string → Lexer → Parser → Evaluator → PromqlValue
```
| Component | Path | Responsibility |
|---|---|---|
| Lexer | `src/promql/lexer.rs` | Tokenise raw input into a flat `Vec<Token>` |
| Parser | `src/promql/parser.rs` | Turn tokens into an `Expr` AST (Pratt / precedence-climbing) |
| Evaluator | `src/promql/eval/` | Walk the AST and resolve values against the storage engine |
| Types | `src/promql/types.rs` | `PromqlValue`, `Sample`, `Series`, histogram helpers |
| Errors | `src/promql/error.rs` | `PromqlError` enum |
---
## Public API
### Parsing
```rust
use tsink::promql::{parse, ast::Expr};
let expr: Expr = parse("rate(http_requests_total[5m])")?;
```
### Query engine
```rust
use std::sync::Arc;
use tsink::promql::{Engine, PromqlValue};
use tsink::{Storage, TimestampPrecision};
// Build an engine from any Arc<dyn Storage>.
let engine = Engine::with_precision(storage, TimestampPrecision::Milliseconds);
// Instant query — evaluate at a single timestamp.
let result: PromqlValue = engine.instant_query("up", eval_time)?;
// Range query — evaluate over [start, end] at each step.
let result: PromqlValue = engine.range_query("rate(errors_total[1m])", start, end, step)?;
```
`Engine::new` defaults to `TimestampPrecision::Nanoseconds`.
Both methods parse the expression internally before evaluation.
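
A range query evaluates the expression once per step between `start` and `end`. The sketch below shows one way the evaluation timestamps could be enumerated, assuming both bounds are inclusive and `step` is positive. `evaluation_times` is a hypothetical helper written for illustration, not tsink's own `step_times`.

```rust
/// Hypothetical helper: list the evaluation timestamps of a range query.
/// Assumes `start` and `end` are inclusive and share the engine's timestamp unit.
fn evaluation_times(start: i64, end: i64, step: i64) -> Vec<i64> {
    // A non-positive step would never terminate, so yield nothing instead.
    if step <= 0 || start > end {
        return Vec::new();
    }
    let mut times = Vec::new();
    let mut t = start;
    while t <= end {
        times.push(t);
        // Stop cleanly if the next step would overflow i64.
        match t.checked_add(step) {
            Some(next) => t = next,
            None => break,
        }
    }
    times
}
```

With `start = 0`, `end = 50`, and `step = 20`, the last evaluation point is 40 because 60 would fall outside the range.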
---
## Value types
`PromqlValue` mirrors the four PromQL expression result types.
| Variant | Description | Produced by |
|---|---|---|
| `Scalar(f64, i64)` | A single float and its evaluation timestamp | Literals, `scalar()`, arithmetic on two scalars |
| `InstantVector(Vec<Sample>)` | Zero or more labelled samples at one timestamp | Vector selectors, most functions |
| `RangeVector(Vec<Series>)` | Labelled time series with multiple samples | Matrix selectors, range queries |
| `String(String, i64)` | A string value and its evaluation timestamp | String literals |
### `Sample`
```rust
pub struct Sample {
pub metric: String,
pub labels: Vec<Label>,
pub timestamp: i64,
pub value: f64,
pub histogram: Option<Box<NativeHistogram>>,
}
```
### `Series`
```rust
pub struct Series {
pub metric: String,
pub labels: Vec<Label>,
pub samples: Vec<(i64, f64)>, // (timestamp, float value)
pub histograms: Vec<(i64, Box<NativeHistogram>)>,
}
```
---
## Lexer
The lexer is a single-pass byte scanner. It produces all tokens in one call
(`Lexer::new(input).tokenize()`), returning `Vec<Token>` or a `PromqlError::Parse`.
### Comments
`#` starts a line comment; everything until the next newline is discarded.
### Identifiers
Identifiers follow the usual `[a-zA-Z_][a-zA-Z0-9_]*` alphabet. Colons (`:`)
are also accepted inside identifiers to support recording-rule naming
conventions such as `job:http_requests:rate5m`.
### Keywords (case-insensitive)
`by`, `without`, `offset`, `bool`, `and`, `or`, `unless`, `on`, `ignoring`,
`group_left`, `group_right`, `atan2`, `inf`, `nan`
### Duration literals
A duration is a sequence of one or more `<integer><unit>` segments.
| Unit | Meaning |
|---|---|
| `ms` | milliseconds |
| `s` | seconds |
| `m` | minutes |
| `h` | hours |
| `d` | days (24 h) |
| `w` | weeks (7 d) |
| `y` | years (365 d) |
Segments can be combined: `1h30m`, `5m30s`, `2d12h`.
Durations are stored internally as milliseconds (`i64`).
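
To make the segment rule concrete, here is a minimal sketch that converts a duration literal into milliseconds using the unit table above. `parse_duration_ms` is a hypothetical name and the error handling (returning `None`) is an assumption; the real lexer reports a `PromqlError::Parse` and may differ in detail.

```rust
/// Hypothetical sketch: convert a PromQL duration such as "1h30m" to milliseconds.
/// Returns `None` for empty input, unknown units, missing digits, or overflow.
fn parse_duration_ms(input: &str) -> Option<i64> {
    let bytes = input.as_bytes();
    if bytes.is_empty() {
        return None;
    }
    let mut pos = 0;
    let mut total: i64 = 0;
    while pos < bytes.len() {
        // Read the integer part of the segment.
        let digits_start = pos;
        while pos < bytes.len() && bytes[pos].is_ascii_digit() {
            pos += 1;
        }
        if pos == digits_start {
            return None;
        }
        let value: i64 = input[digits_start..pos].parse().ok()?;
        // Read the unit; "ms" must be checked before "m".
        let rest = &input[pos..];
        let (unit_ms, unit_len) = if rest.starts_with("ms") {
            (1, 2)
        } else {
            match *rest.as_bytes().first()? {
                b's' => (1_000, 1),
                b'm' => (60_000, 1),
                b'h' => (3_600_000, 1),
                b'd' => (86_400_000, 1),
                b'w' => (604_800_000, 1),
                b'y' => (31_536_000_000, 1),
                _ => return None,
            }
        };
        pos += unit_len;
        // Accumulate with overflow checks.
        total = total.checked_add(value.checked_mul(unit_ms)?)?;
    }
    Some(total)
}
```

For example, `1h30m` is 3,600,000 ms plus 1,800,000 ms, so the function returns `Some(5_400_000)`.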
### String literals
Double-quoted strings with `\"`, `\\`, `\n`, `\r`, `\t` escape sequences.
### Number literals
Decimal integers and floats. The special identifiers `inf` and `nan` are
recognised as numeric tokens equivalent to `f64::INFINITY` and `f64::NAN`.
---
## Parser
The parser implements precedence-climbing (Pratt) parsing for expressions.
The entry point is `parser::parse(input)`.
### Expression grammar (summary)
```
expr := unary ( binary_op modifiers expr )*
| '(' expr ')'
| '{' matchers '}'
| ident [ aggregation | call | vector_selector ]
postfix := '[' duration (':' duration?)? ']' -- matrix selector or subquery
| 'offset' signed_duration
| '@' (number | 'start()' | 'end()')
```
### Operator precedence
Lower number = binds tighter.
| Precedence | Operators |
|---|---|
| 1 (highest) | `^` (right-associative) |
| 2 | `*`, `/`, `%`, `atan2` |
| 3 | `+`, `-` |
| 4 | `==`, `!=`, `<`, `>`, `<=`, `>=` |
| 5 | `and`, `unless` |
| 6 (lowest) | `or` |
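
The following sketch illustrates precedence climbing on a reduced operator set (arithmetic only, numbers as operands). It is not tsink's parser: the `Tok` type, the function names, and the choice to evaluate directly instead of building an `Expr` are all assumptions made to keep the example short. Note that the code uses binding powers where a higher number binds tighter, the inverse of the level numbers in the table.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    Num(f64),
    Op(char), // one of + - * / % ^
}

/// Returns (binding power, right_associative); higher power binds tighter.
fn binding_power(op: char) -> Option<(u8, bool)> {
    match op {
        '^' => Some((3, true)),
        '*' | '/' | '%' => Some((2, false)),
        '+' | '-' => Some((1, false)),
        _ => None,
    }
}

fn apply(op: char, lhs: f64, rhs: f64) -> f64 {
    match op {
        '^' => lhs.powf(rhs),
        '*' => lhs * rhs,
        '/' => lhs / rhs,
        '%' => lhs % rhs,
        '+' => lhs + rhs,
        _ => lhs - rhs,
    }
}

/// Evaluate tokens starting at `*pos`, consuming only operators whose
/// binding power is at least `min_power`.
fn eval_expr(tokens: &[Tok], pos: &mut usize, min_power: u8) -> Option<f64> {
    let mut lhs = match tokens.get(*pos)? {
        Tok::Num(n) => *n,
        Tok::Op(_) => return None,
    };
    *pos += 1;
    while let Some(Tok::Op(op)) = tokens.get(*pos) {
        let (power, right_assoc) = binding_power(*op)?;
        if power < min_power {
            break;
        }
        *pos += 1;
        // Left-associative operators need a strictly tighter right-hand side;
        // right-associative ones accept the same power again.
        let next_min = if right_assoc { power } else { power + 1 };
        let rhs = eval_expr(tokens, pos, next_min)?;
        lhs = apply(*op, lhs, rhs);
    }
    Some(lhs)
}
```

Under these rules `1 + 2 * 3` groups as `1 + (2 * 3)`, and `2 ^ 3 ^ 2` groups as `2 ^ (3 ^ 2)` because `^` is right-associative.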
### Label matchers
```
{label="value", label2!="value", label3=~"regex", label4!~"regex"}
```
| Operator | Meaning |
|---|---|
| `=` | Exact equality |
| `!=` | Exact inequality |
| `=~` | Regex match (anchored) |
| `!~` | Regex non-match (anchored) |
Regex matching compiles the pattern with the standard `regex` crate.
### Vector selector
```
metric_name
metric_name{labels}
{labels}
```
### Matrix selector
```
metric_name[5m]
metric_name{label="value"}[1h]
```
### Subquery
```
expr[range:step]
expr[range:] # omit step → use query step or default (1m)
```
### @modifier and offset
```
metric @ 1700000000 # pin to Unix timestamp
metric @ start() # pin to range query start
metric @ end() # pin to range query end
metric offset 5m # shift evaluation back 5 minutes
metric[10m] offset 1h # combine range + offset
```
### Aggregations
```
sum(expr)
sum by (label1, label2) (expr)
sum without (label1) (expr)
```
The grouping clause can be placed either before or after the argument list.
### Binary operator modifiers
```
a + on(job) b # match only on "job"
a + ignoring(instance) b # ignore "instance" when matching
a * on(job) group_left b # many-to-one: keep left-side labels
a * on(job) group_right b # one-to-many: keep right-side labels
a * on(job) group_left(region) b # also copy "region" from right
a == bool b # return 0/1 instead of filtering
```
`group_left` and `group_right` cannot be combined with set operators
(`and`, `or`, `unless`).
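
As a rough illustration of how `on(...)` and `ignoring(...)` select matching partners, the sketch below performs one-to-one addition between two vectors. Everything here is hypothetical: the `Labels` alias, `match_key`, and `add_matched` are illustrative names, `group_left` / `group_right` are not modelled, and duplicate keys on the right-hand side simply overwrite each other instead of raising an error.

```rust
use std::collections::{BTreeMap, HashMap};

type Labels = BTreeMap<String, String>;

/// Build the matching key for a sample: the labels listed in `names` for
/// `on(...)` (on = true), or all other labels for `ignoring(...)` (on = false).
fn match_key(labels: &Labels, names: &[&str], on: bool) -> Vec<(String, String)> {
    labels
        .iter()
        .filter(|(name, _)| names.contains(&name.as_str()) == on)
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect()
}

/// One-to-one `lhs + on(names) rhs` or `lhs + ignoring(names) rhs`.
fn add_matched(
    lhs: &[(Labels, f64)],
    rhs: &[(Labels, f64)],
    names: &[&str],
    on: bool,
) -> Vec<(Vec<(String, String)>, f64)> {
    // Index the right-hand side by its matching key.
    let right: HashMap<_, _> = rhs
        .iter()
        .map(|(labels, value)| (match_key(labels, names, on), *value))
        .collect();
    lhs.iter()
        .filter_map(|(labels, value)| {
            let key = match_key(labels, names, on);
            // Samples without a partner on the other side are dropped.
            right.get(&key).map(|r| (key, value + r))
        })
        .collect()
}
```

Two samples that differ only in `instance` pair up under both `on(job)` and `ignoring(instance)`, since both modifiers reduce them to the same key.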
---
## Evaluator
The evaluator is in `src/promql/eval/` and is split across several files:
| File | Contents |
|---|---|
| `mod.rs` | `Engine`, instant and range query entry points, prefetch cache, `@` resolution |
| `selector.rs` | Instant vector and matrix selector evaluation |
| `functions.rs` | All built-in function implementations |
| `aggregation.rs` | Aggregation operator implementations |
| `binary.rs` | Binary operator evaluation and vector matching |
| `subquery.rs` | Subquery evaluation |
| `time.rs` | Duration/timestamp utilities (`duration_to_units`, `step_times`) |
### Default parameters
| Parameter | Default |
|---|---|
| Lookback delta | 5 minutes |
| Subquery step | 1 minute |
The lookback delta controls how far back an instant vector selector looks for
the most recent sample.
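
The lookback rule can be sketched as follows. This is an illustration under stated assumptions, not tsink's selector code: `latest_within_lookback` is a hypothetical name, samples are assumed to be sorted by timestamp, the window boundary is treated as inclusive, and stale markers are not modelled.

```rust
/// Hypothetical sketch: pick the sample an instant selector would return at
/// `eval_time`. All times share one unit (milliseconds in the test values).
fn latest_within_lookback(
    samples: &[(i64, f64)],
    eval_time: i64,
    lookback: i64,
) -> Option<(i64, f64)> {
    samples
        .iter()
        .rev()
        // Walk backwards to the newest sample at or before the evaluation time.
        .find(|(ts, _)| *ts <= eval_time)
        // Discard the candidate when it is older than the lookback window.
        .filter(|(ts, _)| eval_time - *ts <= lookback)
        .copied()
}
```

With a 5-minute lookback (300,000 ms), a sample that is 348,000 ms old is ignored and the selector yields nothing for that series.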
### Range query prefetch
For range queries the engine checks whether any selector uses a dynamic `@`
modifier or is wrapped in a subquery. When no dynamic time is involved it
pre-fetches all required metric data from storage in a single pass before
iterating over steps. This significantly reduces storage I/O for wide time
ranges. Subqueries and `@` modifiers disable prefetch for accuracy.
---
## Aggregation operators
All aggregation operators accept an optional `by (labels)` or
`without (labels)` grouping clause.
| Operator | Parameter | Description |
|---|---|---|
| `sum` | — | Sum of values |
| `avg` | — | Average of values |
| `min` | — | Minimum value |
| `max` | — | Maximum value |
| `count` | — | Number of series |
| `group` | — | 1 for each group (existence aggregation) |
| `stddev` | — | Population standard deviation |
| `stdvar` | — | Population variance |
| `count_values` | `label` (string) | Count series per distinct value; adds a `label` dimension |
| `quantile` | `φ` (scalar) | φ-quantile across the group |
| `topk` | `k` (scalar) | Top k series by value |
| `bottomk` | `k` (scalar) | Bottom k series by value |
| `limitk` | `k` (scalar) | Deterministically select k series (hash-stable) |
| `limit_ratio` | `ratio` (scalar) | Deterministically select a ratio of series |
Of the parameterless operators (`sum`, `avg`, `min`, `max`, `count`, `group`,
`stddev`, and `stdvar`), `sum` also supports native histograms.
`count_values`, `quantile`, `topk`, and `bottomk` require float samples.
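
The difference between `by` and `without` comes down to which labels form the group key. The sketch below shows this for `sum` over float samples; the `Labels` alias, `Grouping`, `group_key`, and `sum_grouped` are hypothetical names for illustration and do not reflect the types used in `aggregation.rs`.

```rust
use std::collections::BTreeMap;

type Labels = BTreeMap<String, String>;

/// How an aggregation groups its input series.
enum Grouping<'a> {
    By(&'a [&'a str]),
    Without(&'a [&'a str]),
}

/// Keep only the listed labels for `by`, or drop the listed labels for `without`.
fn group_key(labels: &Labels, grouping: &Grouping) -> Labels {
    labels
        .iter()
        .filter(|(name, _)| match grouping {
            Grouping::By(keep) => keep.contains(&name.as_str()),
            Grouping::Without(skip) => !skip.contains(&name.as_str()),
        })
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect()
}

/// `sum by (...)` / `sum without (...)` over float samples.
fn sum_grouped(samples: &[(Labels, f64)], grouping: &Grouping) -> BTreeMap<Labels, f64> {
    let mut groups = BTreeMap::new();
    for (labels, value) in samples {
        // Samples that share a group key are accumulated into one output series.
        *groups.entry(group_key(labels, grouping)).or_insert(0.0) += value;
    }
    groups
}
```

For series labelled only with `job` and `instance`, `sum by (job)` and `sum without (instance)` produce the same groups.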
---
## Functions
### Counter and gauge range functions
| Function | Argument | Description |
|---|---|---|
| `rate(v[d])` | range vector | Per-second rate of counter increase (extrapolated to fit `d`) |
| `irate(v[d])` | range vector | Per-second instant rate using the last two samples |
| `increase(v[d])` | range vector | Total counter increase over `d` (extrapolated) |
| `delta(v[d])` | range vector | Value change over `d` (extrapolated, for gauges) |
| `idelta(v[d])` | range vector | Instant delta between the last two samples |
| `changes(v[d])` | range vector | Number of value changes within `d` |
| `resets(v[d])` | range vector | Number of counter resets within `d` |
`rate` and `increase` support native histogram series and produce a histogram
result. The other range functions require float samples.
**Extrapolation**: `rate`, `increase`, and `delta` use the same boundary
extrapolation algorithm as Prometheus — the sampled interval is extended toward
the range boundaries when the gap is within 110% of the average sample interval.
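
A simplified sketch of this extrapolation for float counters is shown below. It follows the general shape of the Prometheus algorithm (reset correction, the 110% threshold, half-interval fallback, and a clamp at the counter's zero point), but the function names are hypothetical, timestamps are taken as seconds in `f64` for brevity, and tsink's actual code in `functions.rs` may differ in detail.

```rust
/// Simplified sketch of extrapolated `increase` for a float counter.
/// `samples` are (timestamp_seconds, value) pairs sorted by time and lying
/// inside the window [range_start, range_end].
fn extrapolated_increase(samples: &[(f64, f64)], range_start: f64, range_end: f64) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    let (first_t, first_v) = samples[0];
    let (last_t, last_v) = samples[samples.len() - 1];
    let sampled_interval = last_t - first_t;
    if sampled_interval <= 0.0 {
        return None;
    }
    // Raw increase, adding back the value lost at every counter reset.
    let mut increase = last_v - first_v;
    for pair in samples.windows(2) {
        if pair[1].1 < pair[0].1 {
            increase += pair[0].1;
        }
    }
    let avg_interval = sampled_interval / (samples.len() - 1) as f64;
    let threshold = avg_interval * 1.1;
    let mut to_start = first_t - range_start;
    let mut to_end = range_end - last_t;
    // A gap beyond the threshold suggests the series starts or ends inside
    // the window, so extend by only half an average interval.
    if to_start >= threshold {
        to_start = avg_interval / 2.0;
    }
    if to_end >= threshold {
        to_end = avg_interval / 2.0;
    }
    // A counter cannot be negative: do not extrapolate past its zero point.
    if increase > 0.0 && first_v >= 0.0 {
        let to_zero = sampled_interval * (first_v / increase);
        if to_zero < to_start {
            to_start = to_zero;
        }
    }
    let factor = (sampled_interval + to_start + to_end) / sampled_interval;
    Some(increase * factor)
}

/// Per-second rate: the extrapolated increase divided by the window length.
fn extrapolated_rate(samples: &[(f64, f64)], range_start: f64, range_end: f64) -> Option<f64> {
    extrapolated_increase(samples, range_start, range_end)
        .map(|inc| inc / (range_end - range_start))
}
```

For five samples at 10 s intervals covering 40 s of a 60 s window, both boundary gaps (10 s) are under the 11 s threshold, so the raw increase is scaled by 60/40.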
### Over-time aggregations
All take a range vector and return an instant vector.
| Function | Description |
|---|---|
| `avg_over_time(v[d])` | Average of samples in window |
| `sum_over_time(v[d])` | Sum |
| `min_over_time(v[d])` | Minimum |
| `max_over_time(v[d])` | Maximum |
| `count_over_time(v[d])` | Count of samples |
| `last_over_time(v[d])` | Most recent sample |
| `present_over_time(v[d])` | 1 if any sample exists |
| `stddev_over_time(v[d])` | Standard deviation |
| `stdvar_over_time(v[d])` | Variance |
| `mad_over_time(v[d])` | Median absolute deviation |
| `quantile_over_time(φ, v[d])` | φ-quantile of samples |
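
Two of the less familiar entries, `quantile_over_time` and `mad_over_time`, can be sketched over the float values of a single window as below. The sketch assumes linear interpolation between neighbouring ranks, as Prometheus does for quantiles; the function names are hypothetical and tsink's handling of edge cases may differ.

```rust
/// φ-quantile with linear interpolation between neighbouring ranks.
fn quantile(phi: f64, values: &[f64]) -> f64 {
    if values.is_empty() || phi.is_nan() {
        return f64::NAN;
    }
    if phi < 0.0 {
        return f64::NEG_INFINITY;
    }
    if phi > 1.0 {
        return f64::INFINITY;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Fractional rank into the sorted values.
    let rank = phi * (sorted.len() - 1) as f64;
    let lower = rank.floor() as usize;
    let upper = (lower + 1).min(sorted.len() - 1);
    let weight = rank - rank.floor();
    sorted[lower] * (1.0 - weight) + sorted[upper] * weight
}

/// Median absolute deviation: the median of |x - median(x)|.
fn median_abs_deviation(values: &[f64]) -> f64 {
    let median = quantile(0.5, values);
    let deviations: Vec<f64> = values.iter().map(|v| (v - median).abs()).collect();
    quantile(0.5, &deviations)
}
```

The median absolute deviation is robust to outliers: for the values 1, 2, 3, 4, 100 the median is 3 and the deviations are 2, 1, 0, 1, 97, whose median is 1.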
### Histogram functions
| Function | Description |
|---|---|
| `histogram_quantile(φ, v)` | φ-quantile from classic (bucket-based) or native histograms |
| `histogram_avg(v)` | Average from native histograms |
| `histogram_count(v)` | Observation count from native histograms |
| `histogram_sum(v)` | Sum of observations from native histograms |
| `histogram_stddev(v)` | Standard deviation from native histograms |
| `histogram_stdvar(v)` | Variance from native histograms |
| `histogram_fraction(lower, upper, v)` | Fraction of observations in `(lower, upper]` from native histograms |
### Regression and prediction
| Function | Description |
|---|---|
| `deriv(v[d])` | Estimated per-second derivative by linear regression |
| `predict_linear(v[d], t)` | Predicted value `t` seconds from now using linear regression |
| `double_exponential_smoothing(v[d], sf, tf)` | Double exponential smoothing; `sf` = smoothing factor, `tf` = trend factor; also callable as `holt_winters` |
### Math functions
| Function | Description |
|---|---|
| `abs(v)` | Absolute value |
| `ceil(v)` | Ceiling |
| `floor(v)` | Floor |
| `round(v)` | Round to nearest integer |
| `round(v, to_nearest)` | Round to nearest multiple of `to_nearest` |
| `sqrt(v)` | Square root |
| `exp(v)` | e^v |
| `ln(v)` | Natural logarithm |
| `log2(v)` | Base-2 logarithm |
| `log10(v)` | Base-10 logarithm |
| `sgn(v)` | Sign (−1, 0, or 1) |
| `clamp(v, min, max)` | Clamp value to `[min, max]` |
| `clamp_min(v, min)` | Lower-clamp |
| `clamp_max(v, max)` | Upper-clamp |
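
For `round(v, to_nearest)`, Prometheus resolves ties by rounding up. One way to compute this is the floor-based formula sketched below; whether tsink uses exactly this arithmetic is an assumption, and `round_to_nearest` is a hypothetical name.

```rust
/// Round `value` to the nearest multiple of `to_nearest`, with ties rounding up.
fn round_to_nearest(value: f64, to_nearest: f64) -> f64 {
    // Working with the inverse keeps fractional multiples such as 0.5 accurate.
    let inverse = 1.0 / to_nearest;
    (value * inverse + 0.5).floor() / inverse
}
```

With `to_nearest = 0.5`, the value 7.3 becomes 7.5; with `to_nearest = 1`, the tie 2.5 rounds up to 3 and the tie -2.5 rounds up to -2.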
### Trigonometry
| Function | Inverse / counterpart |
|---|---|
| `cos(v)` | `acos(v)` |
| `cosh(v)` | `acosh(v)` |
| `sin(v)` | `asin(v)` |
| `sinh(v)` | `asinh(v)` |
| `tan(v)` | `atan(v)` |
| `tanh(v)` | `atanh(v)` |
| `deg(v)` — radians to degrees | `rad(v)` — degrees to radians |
| `pi()` — π as a scalar | |
### Date and time
When called with no argument these functions use the eval-time timestamp.
When called with an instant vector they use each sample's timestamp.
| Function | Description |
|---|---|
| `time()` | Current evaluation time in seconds since epoch (scalar) |
| `timestamp(v)` | Timestamp of each sample in seconds since epoch |
| `minute(v?)` | Minute of the hour (0–59) |
| `hour(v?)` | Hour of the day (0–23) |
| `day_of_week(v?)` | Day of the week (0=Sunday–6=Saturday) |
| `day_of_month(v?)` | Day of the month (1–31) |
| `day_of_year(v?)` | Day of the year (1–366) |
| `days_in_month(v?)` | Number of days in the month (28–31) |
| `month(v?)` | Month (1–12) |
| `year(v?)` | Year |
### Label manipulation
| Function | Description |
|---|---|
| `label_replace(v, dst, repl, src, regex)` | Rewrite label `src` into `dst` using a capture-aware `regex` and `repl` |
| `label_join(v, dst, sep, src1, src2, ...)` | Concatenate source labels into `dst` with `sep` as separator |
| `drop_common_labels(v)` | Remove labels that are identical across all series in the vector |
### Type coercion
| Function | Description |
|---|---|
| `scalar(v)` | Convert a single-element instant vector to a scalar; `NaN` if more than one element |
| `vector(s)` | Convert a scalar to a single-element instant vector with no labels |
### Sorting
| Function | Description |
|---|---|
| `sort(v)` | Sort by value ascending |
| `sort_desc(v)` | Sort by value descending |
| `sort_by_label(v, l1, ...)` | Sort by the specified label names, ascending |
| `sort_by_label_desc(v, l1, ...)` | Sort by the specified label names, descending |
### Absence detection
| Function | Description |
|---|---|
| `absent(v)` | Returns `{} 1` when the instant vector is empty; nothing otherwise |
| `absent_over_time(v[d])` | Returns `{} 1` when the range vector is empty; nothing otherwise |
### Miscellaneous
| Function | Description |
|---|---|
| `info(v)` | Experimental: fetches info-metric labels and merges them into each series |
| `count_scalar(v)` | Returns the element count of a vector as a scalar |
---
## Supported features vs. standard PromQL
| Feature | Supported |
|---|---|
| Instant and range queries | Yes |
| All arithmetic and set operators | Yes |
| `bool` comparison modifier | Yes |
| Vector matching (`on` / `ignoring` / `group_left` / `group_right`) | Yes |
| `offset` modifier | Yes |
| `@` modifier with literal timestamp, `start()`, `end()` | Yes |
| Subqueries `expr[range:step]` | Yes |
| Native histograms | Yes (most functions accept float samples only; `rate` and `increase` also accept native histograms) |
| Stale NaN markers (Prometheus compatibility) | Yes |
| `limitk` / `limit_ratio` aggregations (experimental in Prometheus) | Yes |
| `mad_over_time` | Yes |
| `double_exponential_smoothing` / `holt_winters` | Yes |
| `sort_by_label` / `sort_by_label_desc` | Yes |
| `info` | Yes (experimental) |
| UTF-8 / non-ASCII metric names | No — identifiers are ASCII only |
| Backtick string literals | No |
---
## Error types
```rust
pub enum PromqlError {
Parse(String), // invalid syntax
UnexpectedToken { expected: String, found: String },
UnknownFunction(String), // unrecognised function name
ArgumentCount { func, expected, got }, // wrong arity
Type(String), // type mismatch at evaluation
Eval(String), // runtime evaluation error
Regex(String), // invalid regex in matcher
Storage(TsinkError), // underlying storage error
}
```
`PromqlError::Storage` is constructed automatically from `TsinkError` via a
`From` impl, so storage errors surface transparently through the query result.
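
The effect of the `From` impl is that the `?` operator converts storage errors without any explicit mapping. The self-contained sketch below demonstrates the pattern with stand-in types: the local `TsinkError` and `PromqlError` definitions, `read_series`, and `evaluate` are hypothetical and only mirror the shape described above.

```rust
// Stand-in for tsink's storage error; the real type has its own definition.
#[derive(Debug)]
struct TsinkError(String);

// Reduced stand-in for the query error enum, keeping two variants.
#[derive(Debug)]
enum PromqlError {
    Eval(String),
    Storage(TsinkError),
}

impl From<TsinkError> for PromqlError {
    fn from(err: TsinkError) -> Self {
        PromqlError::Storage(err)
    }
}

// Hypothetical storage call that always fails, to exercise the conversion.
fn read_series(metric: &str) -> Result<Vec<f64>, TsinkError> {
    Err(TsinkError(format!("series {} unavailable", metric)))
}

fn evaluate(metric: &str) -> Result<f64, PromqlError> {
    // `?` converts TsinkError into PromqlError::Storage through the From impl.
    let values = read_series(metric)?;
    values
        .last()
        .copied()
        .ok_or_else(|| PromqlError::Eval("no samples".to_string()))
}
```

Callers can then match on `PromqlError::Storage(_)` to distinguish storage failures from parse or evaluation errors.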
---
## Examples
```rust
use std::sync::Arc;
use tsink::{StorageBuilder, TimestampPrecision};
use tsink::promql::Engine;
let storage = Arc::new(
StorageBuilder::new()
.with_timestamp_precision(TimestampPrecision::Seconds)
.build()?
);
let engine = Engine::with_precision(Arc::clone(&storage), TimestampPrecision::Seconds);
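// Illustrative timestamps in seconds, matching the engine precision above.
let now = 1_700_000_000;
let (start, end, step) = (now - 3_600, now, 60);
// Instant queries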
let v = engine.instant_query("up", 1_700_000_000)?;
let v = engine.instant_query(r#"http_requests_total{method="GET"}"#, now)?;
let v = engine.instant_query("rate(http_requests_total[5m])", now)?;
let v = engine.instant_query("sum by (job) (rate(errors_total[1m]))", now)?;
// Range query (returns PromqlValue::RangeVector)
let v = engine.range_query(
"rate(http_requests_total[5m])",
start, // inclusive
end, // inclusive
step, // interval between evaluation points
)?;
// Parse only (no storage required)
let expr = tsink::promql::parse("histogram_quantile(0.99, rate(latency_seconds_bucket[5m]))")?;
```