narrowdb 0.3.2

A lightweight columnar database engine for log and time-series data
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
# NarrowDB Documentation

## Table of Contents

- [Installation]#installation
- [CLI Usage]#cli-usage
- [Library Usage]#library-usage
- [TCP Server]#tcp-server
- [SQL Reference]#sql-reference
- [Data Types]#data-types
- [Columnar Batch Ingestion]#columnar-batch-ingestion
- [Configuration]#configuration
- [Architecture]#architecture

---

## Installation

### From crates.io

```toml
[dependencies]
narrowdb = "0.3.1"
```

### From source

```bash
git clone https://github.com/lassejlv/narrowdb.git
cd narrowdb
cargo build --release
```

The binary is at `target/release/narrowdb`.

---

## CLI Usage

```bash
# Execute SQL against a database file
narrowdb exec <db-file> <sql>

# Run built-in benchmark
narrowdb bench <db-file> [rows]
```

### Examples

```bash
# Create a table
narrowdb exec logs.db "CREATE TABLE logs (ts TIMESTAMP, level TEXT, service TEXT, status INT, duration REAL);"

# Insert rows
narrowdb exec logs.db "INSERT INTO logs VALUES (1, 'info', 'api', 200, 12.0), (2, 'error', 'api', 500, 120.0);"

# Query with filters, aggregation, and ordering
narrowdb exec logs.db "SELECT service, COUNT(*) AS errors FROM logs WHERE level = 'error' GROUP BY service ORDER BY errors DESC LIMIT 5;"

# Arithmetic expressions in SELECT
narrowdb exec logs.db "SELECT duration * 1.1 AS padded FROM logs WHERE status >= 500;"

# Table-less expressions
narrowdb exec logs.db "SELECT 2 + 3 * 4;"

# Benchmark with 5 million rows
narrowdb bench logs.db 5000000
```

---

## Library Usage

```rust
use narrowdb::{NarrowDb, DbOptions, Value};

fn main() -> anyhow::Result<()> {
    let mut db = NarrowDb::open("my.db", DbOptions::default())?;

    db.execute_sql("CREATE TABLE logs (ts TIMESTAMP, level TEXT, service TEXT, status INT);")?;

    db.execute_sql("INSERT INTO logs VALUES (1, 'info', 'api', 200), (2, 'error', 'api', 500);")?;

    let results = db.execute_sql(
        "SELECT service, COUNT(*) AS total FROM logs GROUP BY service;"
    )?;

    for result in results {
        println!("Columns: {:?}", result.columns);
        for row in &result.rows {
            println!("{:?}", row);
        }
    }

    Ok(())
}
```

### Row-by-row insertion

```rust
use narrowdb::{NarrowDb, DbOptions, Value};

let mut db = NarrowDb::open("my.db", DbOptions::default())?;

db.insert_row("logs", vec![
    Value::Int64(1),
    Value::String("info".into()),
    Value::String("api".into()),
    Value::Int64(200),
])?;
```

### Streaming row ingestion

Use `insert_rows_iter` when your upstream source is naturally row-oriented but you do not want to allocate one giant `Vec<Vec<Value>>` first.

```rust
use narrowdb::{NarrowDb, DbOptions, Value};

let db = NarrowDb::open("my.db", DbOptions::default())?;

let inserted = db.insert_rows_iter(
    "logs",
    (0..1_000_000).map(|i| {
        vec![
            Value::Int64(1_700_000_000_000 + i),
            Value::String(if i % 2 == 0 { "api" } else { "worker" }.into()),
            Value::Int64(if i % 10 == 0 { 500 } else { 200 }),
        ]
    }),
)?;

assert_eq!(inserted, 1_000_000);
```

### Columnar batch insertion (high throughput)

```rust
use narrowdb::{ColumnarBatchBuilder, DbOptions, NarrowDb};

let db = NarrowDb::open("my.db", DbOptions::default())?;

let batch = ColumnarBatchBuilder::new()
    .timestamps([1, 2, 3])
    .strings(["info", "error", "info"])
    .strings(["api", "api", "worker"])
    .int64([200, 500, 200])
    .build()?;

db.insert_columnar_batch("logs", batch)?;
```

### Choosing an ingestion path

- Use `insert_row` for tiny writes and tests.
- Use `insert_rows_iter` when the source is row-oriented and you want bounded memory usage.
- Use `insert_columnar_batch` when you can naturally produce whole columns or want the best ingest throughput.
- Use `row_group_size` to tune flush behavior for your workload, but keep the batch path as the default for serious volume.

### Timestamp-heavy workflows

`TIMESTAMP` values are stored as 64-bit integers. NarrowDB does not currently provide built-in datetime functions, so the best pattern is to ingest:

- a raw event timestamp column, typically epoch milliseconds
- one or more precomputed bucket columns such as `minute_bucket`, `hour_bucket`, or `day_bucket`

Example schema:

```sql
CREATE TABLE metrics (
    ts TIMESTAMP,
    minute_bucket TIMESTAMP,
    day_bucket TIMESTAMP,
    service TEXT,
    requests INT,
    errors INT
);
```

This lets you express common rollups and retention-oriented scans with simple filters and grouping:

```sql
SELECT minute_bucket, service, SUM(errors) AS total_errors
FROM metrics
WHERE day_bucket = 1699920000000
GROUP BY minute_bucket, service
ORDER BY minute_bucket ASC, service ASC;
```

### Retention and partition-like ergonomics

NarrowDB does not currently implement built-in partitions or retention policies.

The recommended pattern today is:

- precompute a `day_bucket` or `hour_bucket` column for query pruning
- rotate tables by period when operationally convenient, for example `logs_2026_04_19`
- drop or archive whole old tables as your retention mechanism

That approach fits the current storage model better than trying to simulate in-place row eviction.

### Flushing

Data is automatically flushed to disk when the pending batch reaches `row_group_size`. To force a flush:

```rust
db.flush_table("logs")?;  // Flush one table
db.flush_all()?;           // Flush all tables
```

Pending rows are also flushed automatically before any SELECT query.

---

## TCP Server

The server crate (`crates/server`) exposes a PostgreSQL wire protocol interface.
It supports simple queries, extended queries, and parameterized prepared statements. MD5 auth should be treated as trusted-network only.

### Running

```bash
cargo run -p narrowdb-server -- ./logs.narrowdb \
    --listen 127.0.0.1:5433 \
    --query-threads 4 \
    --user narrowdb \
    --password secret
```

Or via environment variables:

```bash
NARROWDB_LISTEN=0.0.0.0:5433 \
NARROWDB_QUERY_THREADS=4 \
NARROWDB_USER=narrowdb \
NARROWDB_PASSWORD=secret \
narrowdb-server ./logs.narrowdb
```

### Connecting

Any PostgreSQL client works:

```bash
PGPASSWORD=secret psql "host=127.0.0.1 port=5433 user=narrowdb dbname=logs"
```

```sql
CREATE TABLE logs (ts TIMESTAMP, service TEXT, status INT);
INSERT INTO logs VALUES (1, 'api', 200);
SELECT * FROM logs;
```

---

## SQL Reference

### CREATE TABLE

```sql
CREATE TABLE table_name (
    column1 TYPE,
    column2 TYPE,
    ...
);
```

### ALTER TABLE

```sql
ALTER TABLE table_name RENAME TO new_table_name;
ALTER TABLE table_name RENAME COLUMN old_name TO new_name;
```

`ALTER TABLE` currently supports exactly one operation per statement.

### DROP TABLE

```sql
DROP TABLE table_name;
DROP TABLE IF EXISTS table_name;
```

### SHOW TABLES

```sql
SHOW TABLES;
```

### DESCRIBE

```sql
DESCRIBE table_name;
DESC table_name;
```

### INSERT

```sql
INSERT INTO table_name VALUES
    (val1, val2, ...),
    (val1, val2, ...);
```

### SELECT

```sql
SELECT projections
FROM table_name
[WHERE filters]
[GROUP BY columns]
[ORDER BY column [ASC|DESC], ...]
[LIMIT n];
```

### WHERE filters

Supported filter forms:

- Comparison filters: `=`, `<>`, `<`, `<=`, `>`, `>=`
- Boolean combinations: `AND`, `OR`, nested with parentheses
- Membership filters: `IN (...)`, `NOT IN (...)`
- Null filters: `IS NULL`, `IS NOT NULL`

### Projections

| Syntax | Example |
|--------|---------|
| Column reference | `SELECT service FROM logs` |
| Wildcard | `SELECT * FROM logs` |
| Aggregate function | `SELECT COUNT(*) FROM logs` |
| Arithmetic expression | `SELECT duration * 1.1 AS padded FROM logs` |
| Column alias | `SELECT service AS svc FROM logs` |

### Aggregate Functions

| Function | Description |
|----------|-------------|
| `COUNT(*)` | Count all rows |
| `COUNT(column)` | Count non-null values |
| `SUM(column)` | Sum of numeric column |
| `AVG(column)` | Average of numeric column |
| `MIN(column)` | Minimum value |
| `MAX(column)` | Maximum value |

### WHERE Filters

Filters are AND-chained column-to-literal comparisons:

```sql
WHERE status >= 500 AND level = 'error' AND ts >= 1700000000
```

| Operator | Example |
|----------|---------|
| `=` | `WHERE level = 'error'` |
| `!=`, `<>` | `WHERE level != 'info'` |
| `<`, `<=`, `>`, `>=` | `WHERE duration >= 100.0` |
| `IS NULL` | `WHERE service IS NULL` |
| `IS NOT NULL` | `WHERE service IS NOT NULL` |

### Table-less SELECT

Arithmetic without a table:

```sql
SELECT 1 + 2;
SELECT 5 * (3 - 2) AS result;
SELECT 10 % 3;
```

---

## Data Types

| SQL Names | Internal Type | Rust Equivalent |
|-----------|---------------|-----------------|
| `INT`, `INTEGER`, `BIGINT` | Int64 | `i64` |
| `REAL`, `FLOAT`, `DOUBLE` | Float64 | `f64` |
| `BOOL`, `BOOLEAN` | Bool | `bool` |
| `TEXT`, `STRING`, `VARCHAR`, `CHAR`, `JSON` | String | `String` |
| `TIMESTAMP`, `DATETIME` | Timestamp | `i64` (epoch) |

Null values are supported for all types.

---

## Columnar Batch Ingestion

For high-throughput ingestion (7M+ rows/sec), use `ColumnarBatch` instead of row-by-row insertion. Each column is passed as a typed vector:

| Column Type | Rust Type |
|-------------|-----------|
| `BatchColumn::Int64` | `Vec<i64>` |
| `BatchColumn::Float64` | `Vec<f64>` |
| `BatchColumn::Bool` | `Vec<bool>` |
| `BatchColumn::String` | `Vec<String>` |
| `BatchColumn::Timestamp` | `Vec<i64>` |

All columns in a batch must have the same number of rows. Column order must match the table schema.

---

## Configuration

### DbOptions

| Option | Default | Description |
|--------|---------|-------------|
| `row_group_size` | 16,384 | Rows per row group. Larger values improve compression and query throughput. Benchmark default is 32,768. |
| `sync_on_flush` | true | Call fsync after each row group flush. Disable for faster ingestion when durability isn't critical. |
| `query_threads` | auto | Number of query worker threads. `None` uses all available CPU cores. |

### Server Options

| Flag | Env Var | Default | Description |
|------|---------|---------|-------------|
| `--listen` | `NARROWDB_LISTEN` | `127.0.0.1:5433` | TCP listen address |
| `--user` | `NARROWDB_USER` | `narrowdb` | Authentication username |
| `--password` | `NARROWDB_PASSWORD` | `narrowdb` | Authentication password |
| `--row-group-size` | `NARROWDB_ROW_GROUP_SIZE` | `16384` | Rows per row group |
| `--sync-on-flush` | `NARROWDB_SYNC_ON_FLUSH` | `true` | Fsync on flush |
| `--query-threads` | `NARROWDB_QUERY_THREADS` | auto | Number of query worker threads |

---

## Architecture

### Storage Format

NarrowDB uses a log-structured file format (magic: `NRWDB007`). Data is organized into **row groups** — columnar chunks of `row_group_size` rows each.

Each row group contains:
- Per-column min/max statistics (zone maps) for query-time pruning
- Per-column null bitmaps
- Compressed column data (LZ4 block compression)

### Column Encoding

| Type | Encoding |
|------|----------|
| Int64 / Timestamp | Delta encoding (base + u8/u16/u32 offsets depending on range) + LZ4 |
| Float64 | Raw IEEE 754 + LZ4 |
| Bool | Packed bit-vector + LZ4 |
| String (high cardinality) | Plain bytes + LZ4 |
| String (low cardinality) | Dictionary encoding (auto-detected) + LZ4 |

Dictionary encoding is applied automatically when a string column has unique values <= 50% of the row count in a row group.

### Query Engine

- **Zone map pruning** — row groups are skipped entirely when min/max stats prove no rows can match the filter
- **Projection pushdown** — only columns referenced by the query are decompressed
- **Vectorized bitmap filters** — filters produce packed u64 bitmaps per column, enabling compiler auto-vectorization
- **Dictionary-aware filtering** — equality filters on dictionary columns compare integer codes instead of strings
- **Parallel row group scanning** — rayon `par_iter` across row groups for all query types when >= 4 row groups
- **Array-indexed GROUP BY** — single-column dictionary GROUP BY uses direct array indexing instead of hash maps