kermit-ds 0.1.1

Data structures used in Kermit
Documentation

kermit-ds

Relation data structures for the Kermit workspace. Provides two trie-based implementations that store tuples of usize keys:

  • TreeTrie — a pointer-based trie where each node owns its sorted children. Simple and direct; preferable for small relations or pedagogical use.
  • ColumnTrie — a column-oriented (flattened) trie that stores each depth in parallel data/interval arrays. More compact and cache-friendly on large relations.

Both implement Relation and TrieIterable, so they're interchangeable in the join algorithms in kermit-algos.

Surface

  • Core traitsRelation, Projectable, RelationFileExt, HeapSize.
  • MetadataRelationHeader, ModelType, RelationError.
  • Data structuresTreeTrie, ColumnTrie, plus the IndexStructure CLI enum.

File loading

Any Relation automatically gains from_csv and from_parquet via the blanket RelationFileExt impl. CSV files must be usize-valued with a header row; Parquet files must be Int64-valued. Both extract the relation name from the file stem.

use kermit_ds::{RelationFileExt, TreeTrie};

let edges = TreeTrie::from_csv("edges.csv")?;

Extending

To add a new data structure:

  1. Implement Relation + TrieIterable + HeapSize in this crate.
  2. Add a variant to IndexStructure in src/ds/mod.rs.
  3. Wire it into the CLI dispatch (run_ds_bench / run_benchmark in ../kermit/src/main.rs).

See ARCHITECTURE.md for the design rationale behind the trie layouts.