Tetration
STILL IN DEVELOPMENT — layout v1 and query JSON may change before 1.0.
HDF5-shaped persistence (many large arrays in one durable file) and Zarr-shaped chunking (regular grid, per-chunk compression, parallel I/O), combined in a single mmap-friendly `.tet` file rather than a directory of shard blobs.
What it does today (v1)
- On-disk layout: superblock, dataset directory, chunk index, raw or zstd payloads (docs/layout_v1.md).
- Mmap + read planning: logical slices → chunk coordinates → ReadPlan.
- JSON query + execute: flat query documents, streaming reductions, tier-C stats, spill export (docs/query_engine.md).
- Import: `tet convert` from HDF5, NetCDF, Zarr v3 directory stores.
- CLI: `tet info`, `tet query`, `tet qhist`, `tet convert`.
Dtypes on disk and in query execution: f32, f64, i32, i64.
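
The "logical slices → chunk coordinates" step listed above can be illustrated with a small standalone sketch. It is not the crate's planner: `chunk_range` and `intersecting_chunks` are hypothetical names, and the only assumption taken from this README is a regular chunk grid. For each axis it finds the chunk indices a half-open element range touches, then takes the Cartesian product across axes.

```rust
use std::ops::Range;

/// Chunk indices along one axis touched by a half-open element range.
/// `chunk_len` is the chunk extent along that axis (regular grid assumed).
fn chunk_range(sel: &Range<u64>, chunk_len: u64) -> Range<u64> {
    assert!(chunk_len > 0, "chunk extent must be non-zero");
    if sel.start >= sel.end {
        return 0..0; // an empty selection touches no chunks
    }
    let first = sel.start / chunk_len;
    let last = (sel.end - 1) / chunk_len; // inclusive index of the last chunk
    first..last + 1
}

/// Every chunk coordinate that intersects an N-dimensional selection:
/// the Cartesian product of the per-axis chunk ranges.
fn intersecting_chunks(sel: &[Range<u64>], chunk_shape: &[u64]) -> Vec<Vec<u64>> {
    assert_eq!(sel.len(), chunk_shape.len(), "rank mismatch");
    let per_axis: Vec<Range<u64>> = sel
        .iter()
        .zip(chunk_shape)
        .map(|(s, &c)| chunk_range(s, c))
        .collect();

    let mut coords: Vec<Vec<u64>> = vec![Vec::new()];
    for axis in &per_axis {
        let mut next = Vec::new();
        for prefix in &coords {
            for i in axis.clone() {
                let mut c = prefix.clone();
                c.push(i);
                next.push(c);
            }
        }
        coords = next;
    }
    coords
}

fn main() {
    // Rows 100..300 and columns 0..50 of an array chunked as 128 x 64.
    let chunks = intersecting_chunks(&[100..300, 0..50], &[128, 64]);
    println!("{:?}", chunks);
}
```

A real read plan would also carry the byte offsets and the sub-region to copy out of each chunk; this sketch stops at the coordinates.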
Quick start
Requires Rust 1.95+ for cargo install; the tet binary installs to ~/.cargo/bin (or Homebrew’s prefix when using brew).
macOS — Homebrew (recommended)
One-time tap (this repo ships Formula/tetration.rb; pulls in HDF5 and NetCDF for tet convert):
Upgrade later: brew upgrade tetration.
From a local clone (no tap): brew install --build-from-source Formula/tetration.rb
cargo install
Default features need system HDF5 and NetCDF dev libraries (.h5 / .nc convert; Zarr v3 is Rust + bundled zstd):
| Platform | Typical packages |
|---|---|
| Debian / Ubuntu | libhdf5-dev, libnetcdf-dev, pkg-config, build-essential |
| macOS (Homebrew) | brew install hdf5 netcdf pkg-config |
| Windows | OpenSSL + NetCDF/HDF5 (e.g. vcpkg or conda-forge); see .github/scripts/ for CI hints |
Without HDF5/NetCDF libs: cargo install tetration --no-default-features — tet info / tet query on .tet files and Zarr import still work.
Build from source
# or: alias tet="$PWD/target/release/tet"
First commands
Daily driver: plan + execute with readable stdout:
Query JSON is flat (e.g. "mean": [], "spill": "slice.bin"); nested "operation" objects are rejected. Details: query document.
tet commands
Full flag lists: tet -h and tet <command> -h (always match the installed binary).
| Command | Alias | Role |
|---|---|---|
| `tet info <path.tet>` | - | Summarize a file (default: dataset table) |
| `tet query [QUERY]` | `q` | Validate JSON; optional catalog + execute against `-t` |
| `tet qhist [list\|run]` | `hist` | Recent queries (platform cache; not the .tet footer) |
| `tet convert <in> <out.tet>` | - | HDF5 / NetCDF / Zarr v3 → .tet |
tet info
| Flag | Effect |
|---|---|
| (default) | Dataset catalog table |
| `--json` | Full pretty JSON (superblock, catalog, chunks, history) |
| `-q`, `--quiet` | One-line summary |
| `--all` | All text sections |
| `--layout` / `--execution` / `--datasets` / `--chunks` / `--history` | One section each (`--history` = convert footer; not qhist) |
| `-n`, `--limit N` | Max chunk rows with `--chunks` or `--all` (default 32; 0 = all) |
| `--dataset`, `--grep` | Case-insensitive filters on dataset name (and dtype for `--grep`) |
tet query
QUERY: path to .json, inline JSON, - for stdin, or omit to read stdin.
| Flag | Effect |
|---|---|
| `-t`, `--tet PATH` | Attach catalog / read plan (required for `-x`) |
| `-x`, `--execute` | Decode tiles, run operation, attach execution |
| `--format` | `full` (default), `json`, `stats`, `plan`, `quiet` |
| `-q`, `--quiet` | Shorthand for `--format quiet` (one-line stdout) |
| `--preview N` | Cap preview sample values when executing (`--preview-f32` alias; default 64 for `full`/`json`, 0 for `quiet`/`stats`) |
| `--spill-allow DIR` | Extra spill roots (repeatable; needs `-x` and `-t`) |
tet qhist
Stored under the platform cache (query_history.jsonl), not in the .tet file. Env: TET_NO_QUERY_HISTORY, TET_QUERY_HISTORY_FILE, TET_QUERY_HISTORY_MAX. Details: GETTING_STARTED.md — qhist.
| Subcommand / flag | Effect |
|---|---|
| `list` (default) | Compact table of recent queries |
| `run N` | Re-run saved row (1 = newest in filtered view); honors today's `--format` / `-q`; `-t` / `-x` / `--plan` override |
| `--clear` | Remove the history file |
| `list --all`, `--dataset`, `--tet`, `--mode`, `--grep`, `--json` | Filters / full JSON export on `list` |
tet convert
| Input | Sniff / extensions |
|---|---|
| HDF5 | .h5, .hdf5, .hdf, .he2, .he5, or file signature |
| NetCDF | .nc, .netcdf, .nc4, .nc3, .cdf, or signature |
| Zarr v3 | Directory with root zarr.json |
| Flag | Effect |
|---|---|
| `--jobs N` | Parallel chunk read workers (0 = host available_parallelism, capped at 64) |
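
The `--jobs` rule in the table above can be sketched in a few lines of standalone Rust. This is an illustration, not the CLI's code: `resolve_jobs` and `MAX_AUTO_JOBS` are hypothetical names, and passing an explicit non-zero value through unchanged is an assumption, since the table only states the cap for the `0` case.

```rust
use std::thread;

/// Cap applied to the auto-detected worker count (64, per the flag table).
const MAX_AUTO_JOBS: usize = 64;

/// Resolve a `--jobs`-style value into a worker count.
/// 0 means "ask the host", capped at MAX_AUTO_JOBS.
/// Assumption: explicit non-zero values are used as given.
fn resolve_jobs(requested: usize) -> usize {
    if requested == 0 {
        let detected = thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1); // fall back to one worker if detection fails
        detected.min(MAX_AUTO_JOBS)
    } else {
        requested
    }
}

fn main() {
    println!("auto: {} workers", resolve_jobs(0));
    println!("explicit: {} workers", resolve_jobs(4));
}
```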
More examples and roadmap: GETTING_STARTED.md.
Documentation map
| Doc | Contents |
|---|---|
| GETTING_STARTED.md | Phased checklist, verification, CLI history, what's next |
| docs/layout_v1.md | Wire layout, superblock, chunk index, footer history |
| docs/query_engine.md | Planning, execution strategies, spill allowlist, JSON security |
| fixtures/README.md | Test tensors, convert fixtures, local bench sizes |
Design stance (short)
Partial I/O is the default case — mmap payload regions, touch only chunks that intersect the selection, parallel decode across disjoint tiles. Full-array loads into RAM are not required for planning or tier-A/B aggregates.
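
To make the "no full-array load" point concrete, here is a minimal sketch of a streaming mean that sees one decoded chunk at a time. It is not the crate's executor: `MeanState` and its methods are hypothetical, and the exact definition of the aggregate tiers lives in docs/query_engine.md. The `merge` method shows why disjoint tiles can be reduced in parallel and combined afterwards.

```rust
/// Running sum and count for a mean. Only this small state stays in memory,
/// never the whole array.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct MeanState {
    sum: f64,
    count: u64,
}

impl MeanState {
    /// Fold one decoded chunk into the running state.
    fn update(&mut self, chunk: &[f32]) {
        for &v in chunk {
            self.sum += f64::from(v);
            self.count += 1;
        }
    }

    /// Combine two partial states, e.g. from workers on disjoint tiles.
    fn merge(self, other: MeanState) -> MeanState {
        MeanState {
            sum: self.sum + other.sum,
            count: self.count + other.count,
        }
    }

    /// Mean of everything seen so far, or None if no values were seen.
    fn mean(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

fn main() {
    // Stand-in for chunks decoded one at a time from a mapped file.
    let chunks: Vec<Vec<f32>> = vec![vec![1.0, 2.0], vec![3.0], vec![4.0, 5.0, 6.0]];
    let mut state = MeanState::default();
    for chunk in &chunks {
        state.update(chunk);
    }
    println!("{:?}", state.mean());
}
```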
JSON is the control plane, not the storage encoding: hosts validate input, cap size, and enforce spill path policy (security notes).
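
As a rough illustration of what "cap size" and "enforce spill path policy" could look like on the host side, the sketch below uses only the standard library. Everything in it is hypothetical: the names `check_query_size` and `spill_path_allowed`, the 64 KiB limit, and the lexical allowlist check are assumptions, not the crate's actual policy (see docs/query_engine.md for that).

```rust
use std::path::{Component, Path};

/// Hypothetical upper bound on an incoming query document, in bytes.
const MAX_QUERY_BYTES: usize = 64 * 1024;

/// Reject oversized query documents before any parsing happens.
fn check_query_size(doc: &str) -> Result<(), String> {
    if doc.len() > MAX_QUERY_BYTES {
        Err(format!("query is {} bytes, limit is {}", doc.len(), MAX_QUERY_BYTES))
    } else {
        Ok(())
    }
}

/// Lexical allowlist check: the target must sit under one of the allowed
/// roots and must not contain `..`. Symlinks are NOT resolved here; a
/// stricter host would canonicalize the parent directory first.
fn spill_path_allowed(target: &Path, roots: &[&Path]) -> bool {
    let has_parent_dir = target
        .components()
        .any(|c| matches!(c, Component::ParentDir));
    if has_parent_dir {
        return false;
    }
    // Path::starts_with compares whole components, so "/tmp/spillover"
    // does not count as being under "/tmp/spill".
    roots.iter().any(|root| target.starts_with(root))
}

fn main() {
    let roots = [Path::new("/tmp/spill")];
    let target = Path::new("/tmp/spill/slice.bin");
    println!("size ok: {}", check_query_size("{}").is_ok());
    println!("spill allowed: {}", spill_path_allowed(target, &roots));
}
```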
Non-goals (v1): SQL-on-files, arbitrary codec plugins, GPU codecs in the file format. GPU use is “materialize on CPU (or spill), then copy to device” in bindings—see Phase 9 in GETTING_STARTED.md. Python wheels and a narrow C ABI are planned (Phase 10); the layout spec is the portable floor.
Library use
In Cargo.toml:

    [dependencies]
    tetration = "0.1"

In Rust code:

    use tetration::prelude::*;
    // or: tetration::layout::mmap_file_read, tetration::query::{parse_query_json, …}
Embedders get the full QueryResponse; the CLI uses format_query_response for stdout modes. Today: low-level writers (tetration::catalog) and query plan/execute (tetration::prelude + tetration::query); Phase 7 adds documented create + in-process execute workflows — see GETTING_STARTED.md — Phase 7.