kimberlite-query 0.7.0

SQL query layer for Kimberlite projections
Documentation

kmb-query: SQL query layer for Kimberlite projections

This crate provides a minimal SQL query engine for compliance lookups against the projection store.

SQL Subset

Supported SQL features:

  • SELECT with column list or *

  • FROM single table or JOIN (INNER, LEFT, RIGHT, FULL OUTER, CROSS, with ON or USING(...) clauses)

  • WHERE with comparison predicates (=, <, >, <=, >=, IN)

  • IN (SELECT), NOT IN (SELECT), EXISTS, NOT EXISTS — both uncorrelated (pre-execute fast path) and correlated (semi-join decorrelation or nested-loop fallback; see docs/reference/sql/correlated-subqueries.md)

  • ORDER BY (ascending/descending)

  • LIMIT and OFFSET — literal or $N parameter

  • GROUP BY with aggregates (COUNT, SUM, AVG, MIN, MAX)

  • Aggregate FILTER (WHERE ...) clauses, independent per aggregate

  • HAVING with aggregate filtering

  • UNION / UNION ALL / INTERSECT / INTERSECT ALL / EXCEPT / EXCEPT ALL

  • DISTINCT

  • JSON operators ->, ->>, @> in WHERE clauses

  • CASE (searched and simple form)

  • WITH (Common Table Expressions / CTEs), including WITH RECURSIVE via iterative fixed-point evaluation (depth cap 1000)

  • Subqueries in FROM and JOIN (SELECT * FROM (SELECT ...) AS t)

  • Window functions (OVER, PARTITION BY, ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, FIRST_VALUE, LAST_VALUE)

  • ALTER TABLE (ADD COLUMN, DROP COLUMN) — parser only; kernel execution pending

  • Parameterized queries ($1, $2, ...) in WHERE, LIMIT, OFFSET, and DML values

  • Scalar functions in SELECT projection and WHERE predicates: UPPER, LOWER, LENGTH, TRIM, CONCAT, ||, ABS, ROUND, CEIL/CEILING, FLOOR, COALESCE, NULLIF, CAST

  • ILIKE, NOT LIKE, NOT ILIKE pattern matching

  • NOT IN (list), NOT BETWEEN low AND high

Not yet supported:

  • Scalar subquery WHERE col = (SELECT ...), ANY, ALL, SOME
  • Clock-dependent functions (NOW(), CURRENT_DATE, EXTRACT, DATE_TRUNC) — deferred pending a clock-threading decision
  • MOD, POWER, SQRT, SUBSTRING — deferred

Usage

use kimberlite_query::{QueryEngine, Schema, SchemaBuilder, ColumnDef, DataType, Value};
use kimberlite_store::{BTreeStore, TableId};

// Define schema
let schema = SchemaBuilder::new()
    .table(
        "users",
        TableId::new(1),
        vec![
            ColumnDef::new("id", DataType::BigInt).not_null(),
            ColumnDef::new("name", DataType::Text).not_null(),
        ],
        vec!["id".into()],
    )
    .build();

// Create engine
let engine = QueryEngine::new(schema);

// Execute query
let mut store = BTreeStore::open("data/projections")?;
let result = engine.query(&mut store, "SELECT * FROM users WHERE id = $1", &[Value::BigInt(42)])?;

Point-in-Time Queries

For compliance, you can query at a specific log position:

let result = engine.query_at(
    &mut store,
    "SELECT * FROM users WHERE id = 1",
    &[],
    Offset::new(1000),  // Query state as of log position 1000
)?;

Scalar expressions (v0.5.1)

The parser accepts scalar functions in SELECT projection and WHERE predicates. Each of these queries parses cleanly and produces a ParsedStatement::Select with either a ScalarCmp predicate or entries in scalar_projections:

use kimberlite_query::{parse_statement, ParsedStatement, Predicate};

// WHERE col NOT IN (list) — mirror of IN, v0.5.1.
let s = parse_statement("SELECT id FROM t WHERE x NOT IN (1, 2, 3)").unwrap();
let ParsedStatement::Select(sel) = s else { panic!() };
assert!(matches!(sel.predicates[0], Predicate::NotIn(_, _)));

// WHERE UPPER(name) = 'ALICE' — scalar LHS routes to ScalarCmp.
let s = parse_statement("SELECT id FROM t WHERE UPPER(name) = 'ALICE'").unwrap();
let ParsedStatement::Select(sel) = s else { panic!() };
assert!(matches!(sel.predicates[0], Predicate::ScalarCmp { .. }));

// SELECT col AS alias — alias preserved end-to-end.
let s = parse_statement("SELECT name AS display FROM t").unwrap();
let ParsedStatement::Select(sel) = s else { panic!() };
let aliases = sel.column_aliases.as_ref().unwrap();
assert_eq!(aliases[0].as_deref(), Some("display"));

// SELECT CAST(x AS INTEGER) — lands in scalar_projections with a
// synthesised output column name.
let s = parse_statement("SELECT CAST(x AS INTEGER) FROM t").unwrap();
let ParsedStatement::Select(sel) = s else { panic!() };
assert_eq!(sel.scalar_projections.len(), 1);
assert_eq!(sel.scalar_projections[0].output_name.as_str(), "cast");