# tsink 0.10.2

A lightweight embedded time-series database with a straightforward API.
# Data Types & Native Histograms

This document describes every type that can be stored in a tsink data point, how mixed types are handled, how timestamps are represented, the limits that apply to metrics and labels, and how custom user types are embedded inside the `bytes` lane.

---

## Table of Contents

1. [Core Data Model](#core-data-model)
2. [Value Types](#value-types)
   - [float64](#float64)
   - [int64](#int64)
   - [uint64](#uint64)
   - [bool](#bool)
   - [bytes](#bytes)
   - [string](#string)
   - [Native Histogram](#native-histogram)
3. [Value Lanes](#value-lanes)
4. [Encoding Codecs by Type](#encoding-codecs-by-type)
5. [Timestamp Precision](#timestamp-precision)
6. [Metrics and Labels](#metrics-and-labels)
7. [Type Coercions](#type-coercions)
8. [Custom Types via the Codec Trait](#custom-types-via-the-codec-trait)
9. [Aggregation](#aggregation)
10. [Python Bindings Type Mapping](#python-bindings-type-mapping)

---

## Core Data Model

The smallest unit of storage is a **`DataPoint`**, which pairs a typed value with an `i64` timestamp. Multiple data points for the same named metric are grouped into a **`Row`**:

```rust
pub struct DataPoint {
    pub value: Value,
    pub timestamp: i64,
}

pub struct Row {
    // metric name
    // labels (key-value pairs)
    // data_point
}
```

A **series** is identified by a metric name together with the full, sorted set of label key-value pairs. Two rows with the same metric name but different labels belong to different series.

---

## Value Types

All sample payloads are represented by the `Value` enum:

```rust
pub enum Value {
    F64(f64),
    I64(i64),
    U64(u64),
    Bool(bool),
    Bytes(Vec<u8>),
    String(String),
    Histogram(Box<NativeHistogram>),
}
```

Every variant in a chunk must belong to the same **value family** (see [Value Lanes](#value-lanes)). Mixing different numeric variants, or mixing a numeric variant with a blob variant, in the same write batch returns a `ValueTypeMismatch` error.

### float64

`Value::F64(f64)` is the default numeric type and the only type directly consumed by the PromQL evaluator. Use it for gauges, counters, latency histograms expressed as raw floats, and any metric that needs PromQL processing.

NaN is a valid stored value and is preserved across encoding and decoding. NaN equality is defined as `NaN == NaN` for storage index purposes (deduplication), which diverges from IEEE 754.
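The NaN-equals-NaN behavior can be implemented by comparing bit patterns rather than using IEEE `==`. The following is an illustrative sketch of that idea, not the actual tsink source:

```rust
/// Bitwise sample equality: NaN compares equal to an identical NaN bit
/// pattern, unlike IEEE 754 `==`. Sketch only; the real comparison in
/// tsink may differ in detail.
fn stored_eq(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}
```

Note that a bitwise comparison also distinguishes `0.0` from `-0.0`, which IEEE `==` treats as equal.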

### int64

`Value::I64(i64)` covers signed counters, monotonic event IDs, and any domain where integer semantics matter (no rounding at extreme values). The codec is ZigZag-encoded delta bitpack, which is efficient for slowly changing integers.

When queried via `as_f64()`, the conversion succeeds only if the integer can be represented exactly as a 64-bit float (i.e., the absolute value fits in 53 bits). Large values like `i64::MAX` return `None` rather than silently rounding.

### uint64

`Value::U64(u64)` covers unsigned accumulators and bitmask values. The codec is delta-bitpack (no ZigZag needed for non-negative deltas). The same exactness rule as `i64` applies when converting to `f64`.

### bool

`Value::Bool(bool)` is stored with a bitpack codec — one bit per sample. It is not coercible to `f64`; use a `0.0`/`1.0` float if you need PromQL arithmetic on boolean flags.
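The one-bit-per-sample idea behind the bitpack codec can be sketched as follows. The function name and LSB-first layout are illustrative assumptions; the actual on-disk format may differ:

```rust
/// Pack booleans one bit per sample, LSB-first within each byte.
/// Illustrative sketch of the bitpack idea, not the tsink codec itself.
fn pack_bools(vals: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (vals.len() + 7) / 8];
    for (i, &b) in vals.iter().enumerate() {
        if b {
            out[i / 8] |= 1 << (i % 8); // set the sample's bit
        }
    }
    out
}
```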

### bytes

`Value::Bytes(Vec<u8>)` is the escape hatch for any payload that does not fit into a numeric type — serialized Protobuf, MessagePack, JSON, or any custom binary encoding. No interpretation is done on the byte payload by the engine.

`string` and `bytes` share the same blob value lane and the same on-disk codec (bytes delta block). From the engine's perspective they are interchangeable; the distinction is only at the API boundary.

Custom Rust types can be embedded as `bytes` via the `Codec` trait — see [Custom Types via the Codec Trait](#custom-types-via-the-codec-trait).

### string

`Value::String(String)` stores UTF-8 text. Like `bytes`, it is stored on the blob lane using the bytes delta block codec and does not participate in numeric aggregation.

### Native Histogram

`Value::Histogram(Box<NativeHistogram>)` stores a complete Prometheus-compatible native histogram sample. Histograms use the blob lane and are serialized with the bytes delta block codec.

#### NativeHistogram Structure

```rust
pub struct NativeHistogram {
    // Total sample count. Either an integer or a float (for weighted observations).
    pub count: Option<HistogramCount>,
    // Sum of all observed values.
    pub sum: f64,
    // Exponential bucket schema (-4..=8, or -53 for custom buckets).
    pub schema: i32,
    // Half-width of the zero bucket.
    pub zero_threshold: f64,
    // Count of samples in the zero bucket.
    pub zero_count: Option<HistogramCount>,

    // Negative-side sparse buckets.
    pub negative_spans: Vec<HistogramBucketSpan>,
    // Delta-encoded bucket counts (integer mode).
    pub negative_deltas: Vec<i64>,
    // Absolute bucket counts (float mode).
    pub negative_counts: Vec<f64>,

    // Positive-side sparse buckets.
    pub positive_spans: Vec<HistogramBucketSpan>,
    pub positive_deltas: Vec<i64>,
    pub positive_counts: Vec<f64>,

    // Reset hint communicated to downstream consumers.
    pub reset_hint: HistogramResetHint,
    // Explicit bucket boundaries for custom schema (schema = -53).
    pub custom_values: Vec<f64>,
}
```

**`HistogramCount`** — count and zero count can be either integer or float to accommodate both classical integer counting and weighted/scaled histograms:

```rust
pub enum HistogramCount {
    Int(u64),
    Float(f64),
}
```

**`HistogramBucketSpan`** — describes a contiguous run of populated buckets in the sparse representation:

```rust
pub struct HistogramBucketSpan {
    pub offset: i32,  // gap in bucket index from the previous span's end
    pub length: u32,  // number of consecutive populated buckets
}
```

**`HistogramResetHint`** — indicates whether the histogram was reset before this sample:

| Variant | Meaning |
|---|---|
| `Unknown` | Reset status is not known |
| `Yes` | A reset definitely occurred |
| `No` | No reset occurred |
| `Gauge` | This is a gauge histogram (not accumulated) |

#### Sparse bucket encoding

Buckets are stored sparsely as a sequence of (span, deltas/counts) pairs. The `negative_spans`/`positive_spans` arrays describe which bucket slots are populated; `negative_deltas`/`positive_deltas` give the delta-encoded integer counts for each slot; `negative_counts`/`positive_counts` give the absolute float counts (used instead of deltas when any count is non-integer).

Either the `*_deltas` fields or the `*_counts` fields are populated for a given sample — not both. When float counts are present, `*_deltas` should be empty, and vice versa.
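The span/delta decoding described above can be sketched like this. The `Span` struct mirrors `HistogramBucketSpan`; the function name and the treatment of the first span's offset are illustrative assumptions:

```rust
#[derive(Clone, Copy)]
struct Span {
    offset: i32, // gap in bucket index from the previous span's end
    length: u32, // number of consecutive populated buckets
}

/// Expand sparse (span, delta) pairs into absolute (bucket index, count)
/// pairs. Sketch of the decoding described above, not the tsink source.
fn expand(spans: &[Span], deltas: &[i64]) -> Vec<(i32, i64)> {
    let mut out = Vec::new();
    let mut idx = 0i32;
    let mut count = 0i64;
    let mut d = deltas.iter();
    for s in spans {
        idx += s.offset; // skip the gap before this span
        for _ in 0..s.length {
            count += d.next().copied().unwrap_or(0); // delta-decode
            out.push((idx, count));
            idx += 1;
        }
    }
    out
}
```

For example, two spans `{offset: 0, length: 2}` and `{offset: 2, length: 1}` with deltas `[3, -1, 1]` decode to counts 3, 2, and 3 in buckets 0, 1, and 4.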

#### NaN semantics

For purposes of stored equality (used by deduplication and the WAL), NaN values inside a histogram are considered equal to other NaN values of the same sign.

---

## Value Lanes

Internally, every series is assigned to one of two mutually exclusive **value lanes** based on the type of its first ingested sample:

| Lane | Types | On-disk directory |
|---|---|---|
| `Numeric` | `f64`, `i64`, `u64`, `bool` | `lane_numeric/` |
| `Blob` | `bytes`, `string`, `NativeHistogram` | `lane_blob/` |

The lane is derived at ingest time and persisted in the series registry. Once a series is assigned to a lane, all subsequent writes must use a compatible value type. Writing a numeric type to a blob-lane series, or vice versa, returns a `ValueTypeMismatch` error.

Keeping numeric and blob data physically separate allows their compaction jobs to run independently and avoids mixing integer/float codecs with variable-length blob codecs in the same segment file.

---

## Encoding Codecs by Type

tsink selects the most compact codec automatically for each chunk at flush time. The codec choice is stored in the chunk header and used verbatim during reads — no re-encoding occurs on read.

**Timestamp codecs** are chosen independently from value codecs:

| Codec | When selected |
|---|---|
| `FixedStepRle` | All timestamps are evenly spaced (constant scrape interval) |
| `DeltaOfDeltaBitpack` | Timestamps have a slowly drifting interval |
| `DeltaVarint` | Irregular timestamps; always applicable as a fallback |
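The `FixedStepRle` eligibility check reduces to "all consecutive deltas are equal". A sketch of such a selection check (illustrative only, not tsink's actual heuristic):

```rust
/// True when all timestamps are evenly spaced, i.e. `FixedStepRle`
/// would apply. Illustrative sketch, not the tsink selection logic.
fn is_fixed_step(ts: &[i64]) -> bool {
    if ts.len() < 2 {
        return true; // trivially constant step
    }
    let step = ts[1] - ts[0];
    ts.windows(2).all(|w| w[1] - w[0] == step)
}
```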

**Value codecs** by type:

| Type | Codec | Notes |
|---|---|---|
| `f64` | Gorilla XOR | Facebook Gorilla XOR-based float compression |
| `i64` | ZigZag delta bitpack | Maps signed deltas to unsigned, then bitpacks |
| `u64` | Delta bitpack | Non-negative deltas bitpacked directly |
| `bool` | Bit-pack | 1 bit per sample |
| `bytes` / `string` / `NativeHistogram` | Bytes delta block | Variable-length records with delta compression |
| Any type | Constant RLE | Applied when all values in a chunk are identical; takes priority over type-specific codecs |

The engine evaluates all applicable candidates for a given chunk and selects the one producing the smallest payload.
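The ZigZag and delta steps used by the integer codecs can be sketched as follows. This is only the pre-pass; the real codec also bitpacks the resulting values:

```rust
/// ZigZag maps signed integers to unsigned so that small magnitudes of
/// either sign become small unsigned values: 0 -> 0, -1 -> 1, 1 -> 2, ...
fn zigzag(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

fn unzigzag(v: u64) -> i64 {
    ((v >> 1) as i64) ^ -((v & 1) as i64)
}

/// Delta + ZigZag pre-pass for an i64 sample stream, starting from an
/// implicit previous value of 0. Illustrative sketch only.
fn delta_zigzag(values: &[i64]) -> Vec<u64> {
    let mut prev = 0i64;
    values
        .iter()
        .map(|&v| {
            let d = v - prev; // delta against the previous sample
            prev = v;
            zigzag(d)
        })
        .collect()
}
```

For example, `zigzag(-3)` yields `5`, so small negative deltas stay small after the mapping and bitpack into few bits.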

---

## Timestamp Precision

The `TimestampPrecision` configuration setting tells the engine how to interpret the `i64` timestamp in each `DataPoint`:

| Variant | Unit | Maximum date |
|---|---|---|
| `Nanoseconds` | 1 ns | ~2262 |
| `Microseconds` | 1 µs | ~294246 |
| `Milliseconds` | 1 ms | ~292278994 |
| `Seconds` | 1 s | ~292277026596 |

Precision is configured once on `StorageBuilder` and applies to all timestamps written to that storage instance. Mixing precisions within a single instance is not supported — timestamps from different precisions are not automatically renormalized.

The `DataPoint::new` constructor accepts the timestamp as a raw `i64`; callers are responsible for ensuring the value matches the configured precision.
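A caller-side helper for producing timestamps in the configured precision might look like this (assuming a storage instance configured with `Milliseconds`; tsink itself performs no conversion):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Current wall-clock time as an i64 in milliseconds since the Unix
/// epoch, matching a storage instance configured with `Milliseconds`.
/// Caller-side sketch; tsink does not convert timestamps for you.
fn now_millis() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .as_millis() as i64
}
```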

---

## Metrics and Labels

### Metric name

A metric name is an arbitrary UTF-8 string. The only restriction is length:

| Limit | Value |
|---|---|
| Maximum metric name length | 65,535 bytes (`u16::MAX`) |

An empty metric name is rejected at ingest time.

### Labels

A label is a UTF-8 key-value pair. Names must be non-empty; values may be empty. Length limits:

| Field | Limit |
|---|---|
| Label name | 256 bytes |
| Label value | 16,384 bytes (16 KiB) |

Duplicate label names within a single row are rejected. Labels are normalized to lexicographic order by name before computing the series identity, so `{a="1", b="2"}` and `{b="2", a="1"}` refer to the same series.

### Series identity

The engine assigns each unique (metric, sorted-labels) combination a 64-bit `SeriesId`. The identity is computed as a stable FNV-1a hash over a canonical binary encoding of the metric name and sorted label pairs. This hash is used internally; callers always identify series by metric name and labels.
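The identity computation can be sketched as a 64-bit FNV-1a hash over sorted label pairs. The NUL-separated field layout below is an illustrative assumption; tsink's actual canonical binary encoding may differ:

```rust
/// 64-bit FNV-1a over a canonical byte encoding of (metric, sorted
/// labels). The NUL-separated layout is an illustrative assumption,
/// not tsink's actual canonical encoding.
fn series_id(metric: &str, mut labels: Vec<(&str, &str)>) -> u64 {
    labels.sort_by(|a, b| a.0.cmp(b.0)); // lexicographic by label name
    let mut h: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
    let mut mix = |bytes: &[u8]| {
        for &b in bytes {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3); // FNV-1a prime
        }
    };
    mix(metric.as_bytes());
    for (k, v) in &labels {
        mix(&[0u8]); // field separator
        mix(k.as_bytes());
        mix(&[0u8]);
        mix(v.as_bytes());
    }
    h
}
```

Because the labels are sorted before hashing, `{a="1", b="2"}` and `{b="2", a="1"}` produce the same identity.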

---

## Type Coercions

The `Value` type exposes explicit, lossless conversion accessors:

| Method | Returns | Applies to |
|---|---|---|
| `as_f64()` | `Option<f64>` | `F64`, `I64` (if fits in 53 bits), `U64` (if fits in 53 bits) |
| `as_i64()` | `Option<i64>` | `I64`, `U64` (if fits in `i64`) |
| `as_u64()` | `Option<u64>` | `U64`, `I64` (if non-negative and fits in `u64`) |
| `as_bool()` | `Option<bool>` | `Bool` only |
| `as_bytes()` | `Option<&[u8]>` | `Bytes` only |
| `as_str()` | `Option<&str>` | `String` only |
| `as_histogram()` | `Option<&NativeHistogram>` | `Histogram` only |

**`as_f64()` precision note**: an `i64` or `u64` value is only converted if its absolute value can be represented exactly with 53 mantissa bits (the precision of `f64`). Values like `i64::MAX` (63 significant bits) return `None`. This prevents silent precision loss in numeric pipelines.
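The exactness rule can be sketched with a simple magnitude check. Treating 2^53 as the cutoff is an assumption based on the 53-bit rule described above; the function name is illustrative, not the tsink API:

```rust
/// Largest magnitude an f64 represents exactly for every integer: 2^53.
const MAX_EXACT: u64 = 1 << 53;

/// Convert an i64 to f64 only when no precision is lost, per the 53-bit
/// rule described above. Illustrative sketch, not the tsink source.
fn as_f64_exact(v: i64) -> Option<f64> {
    if v.unsigned_abs() <= MAX_EXACT {
        Some(v as f64)
    } else {
        None // e.g. i64::MAX would silently round if cast directly
    }
}
```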

The PromQL evaluator calls `as_f64()` on every sample. Series whose values cannot be losslessly represented as `f64` — including `bool`, `bytes`, `string`, and oversized integers — are excluded from PromQL evaluation.

---

## Custom Types via the Codec Trait

Any Rust type can be stored on the blob lane as `bytes` by implementing the `Codec` trait:

```rust
pub trait Codec: Send + Sync {
    type Item: Clone + Send + Sync + 'static;

    fn encode(&self, value: &Self::Item) -> Result<Vec<u8>>;
    fn decode(&self, bytes: &[u8]) -> Result<Self::Item>;
}
```

Encoding and decoding a point:

```rust
// Write
let encoded = Value::encode_with(&my_value, &my_codec)?;  // → Value::Bytes(...)

// Read
let decoded: MyType = point.value.decode_with(&my_codec)?;
```

### Custom aggregation

To aggregate custom-typed series at query time, pair a `Codec` with an `Aggregator`:

```rust
pub trait Aggregator<T>: Send + Sync {
    fn aggregate(&self, values: &[T]) -> Option<T>;
}
```

The `CodecAggregator<C, A>` adapter bridges a `Codec` and a typed `Aggregator` into the `BytesAggregation` interface expected by `QueryOptions::custom_aggregation`:

```rust
let agg = Arc::new(CodecAggregator::new(MyCodec, MyAggregator));

let options = QueryOptions::new(start, end)
    .with_custom_aggregation(agg);
```

`CodecAggregator` decodes each `Value::Bytes` point using the codec, passes all decoded values to the aggregator, encodes the result back to bytes, and attaches the bucket start timestamp to the aggregate point.
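A concrete `Codec`/`Aggregator` pair might look like the sketch below. The `MinMax` payload type, its 16-byte layout, and the stand-in `Result` alias are illustrative assumptions; the two traits are restated from above so the sketch compiles standalone:

```rust
use std::convert::TryInto;

type Result<T> = std::result::Result<T, String>; // stand-in error type

// Traits restated from above so this sketch is self-contained.
pub trait Codec: Send + Sync {
    type Item: Clone + Send + Sync + 'static;
    fn encode(&self, value: &Self::Item) -> Result<Vec<u8>>;
    fn decode(&self, bytes: &[u8]) -> Result<Self::Item>;
}
pub trait Aggregator<T>: Send + Sync {
    fn aggregate(&self, values: &[T]) -> Option<T>;
}

/// Example payload: a (min, max) pair stored as 16 little-endian bytes.
#[derive(Clone, Debug, PartialEq)]
pub struct MinMax {
    pub min: f64,
    pub max: f64,
}

pub struct MinMaxCodec;
impl Codec for MinMaxCodec {
    type Item = MinMax;
    fn encode(&self, v: &MinMax) -> Result<Vec<u8>> {
        let mut out = Vec::with_capacity(16);
        out.extend_from_slice(&v.min.to_le_bytes());
        out.extend_from_slice(&v.max.to_le_bytes());
        Ok(out)
    }
    fn decode(&self, b: &[u8]) -> Result<MinMax> {
        if b.len() != 16 {
            return Err("expected 16 bytes".into());
        }
        Ok(MinMax {
            min: f64::from_le_bytes(b[..8].try_into().unwrap()),
            max: f64::from_le_bytes(b[8..].try_into().unwrap()),
        })
    }
}

/// Merge by widening the range across all points in a bucket.
pub struct MinMaxMerge;
impl Aggregator<MinMax> for MinMaxMerge {
    fn aggregate(&self, values: &[MinMax]) -> Option<MinMax> {
        values.iter().cloned().reduce(|a, b| MinMax {
            min: a.min.min(b.min),
            max: a.max.max(b.max),
        })
    }
}
```

This pair would then be bridged with `CodecAggregator::new(MinMaxCodec, MinMaxMerge)` as shown above.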

---

## Aggregation

When using the built-in `Aggregation` enum with `QueryOptions`, the engine aggregates numeric values (`f64`, and integers coercible to `f64`) over the query time range or per-bucket when downsampling is enabled:

| Variant | Operation |
|---|---|
| `None` | No aggregation — raw samples returned |
| `Sum` | Sum of all values |
| `Min` | Minimum value |
| `Max` | Maximum value |
| `Avg` | Mean |
| `First` | Earliest sample in the window |
| `Last` | Latest sample in the window |
| `Count` | Number of samples |
| `Median` | Median (50th percentile) |
| `Range` | `max - min` |
| `Variance` | Population variance |
| `StdDev` | Population standard deviation |

Built-in aggregation operates on the `f64` projection of a value (`Value::as_f64()`). `Bytes`, `String`, and `NativeHistogram` series are not aggregated by the built-in variants; use `custom_aggregation` for those types.
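The `Variance` built-in is the population variance: the mean of squared deviations, dividing by N rather than N−1. A sketch of that computation (illustrative, not the tsink source):

```rust
/// Population variance over a bucket's f64 projections: mean squared
/// deviation, dividing by N (not N-1). Sketch, not the tsink source.
fn variance(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None; // no samples in the window
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    Some(values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n)
}
```

`StdDev` is then the square root of this value.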

---

## Python Bindings Type Mapping

The UniFFI Python bindings expose the same model names as the Rust API. The mapping is direct:

| Python type | Rust equivalent |
|---|---|
| `Value` (enum) | `Value` |
| `DataPoint` | `DataPoint` |
| `Row` | `Row` |
| `Label` | `Label` |
| `NativeHistogram` | `NativeHistogram` |
| `HistogramBucketSpan` | `HistogramBucketSpan` |
| `HistogramCount` (enum) | `HistogramCount` |
| `HistogramResetHint` (enum) | `HistogramResetHint` |

`Value` is a tagged-union enum with named fields per variant:

```python
from tsink import Value, DataPoint, NativeHistogram, HistogramBucketSpan

# float64
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.F64(v=1.5))

# int64
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.I64(v=-7))

# uint64
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.U64(v=42))

# bool
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Bool(v=True))

# bytes
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Bytes(v=b"\x01\x02"))

# string
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Str(v="hello"))

# native histogram
hist = NativeHistogram(
    count=HistogramCount.Int(v=10),
    sum=5.5,
    schema=1,
    zero_threshold=0.0,
    zero_count=HistogramCount.Int(v=0),
    negative_spans=[],
    negative_deltas=[],
    negative_counts=[],
    positive_spans=[HistogramBucketSpan(offset=0, length=2)],
    positive_deltas=[3, 2],
    positive_counts=[],
    reset_hint=HistogramResetHint.NO,
    custom_values=[],
)
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Histogram(v=hist))
```

Note that in the Python bindings the `string` variant is named `Str` (not `String`) to avoid conflicting with the Python built-in.