virtual-frame — Deterministic data pipeline toolkit for LLM training.
Bitmask-filtered virtual views, NFA regex, Kahan summation, NLP primitives, CSV ingestion, and a deterministic RNG. Python bindings via PyO3.
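The following minimal Rust sketch illustrates a bitmask-filtered virtual view. It assumes a single `f64` column and uses a plain `Vec<bool>` as the mask for readability. The names `View`, `filter`, `visible`, and `count` are hypothetical and are not this crate's API. The sketch shows that chaining filters only narrows the mask, while the borrowed column data is never copied.

```rust
/// Hypothetical sketch: a read-only view over one borrowed column plus a row mask.
struct View<'a> {
    data: &'a [f64], // shared column data, never copied
    mask: Vec<bool>, // mask[i] == true means row i is visible
}

impl<'a> View<'a> {
    /// Start with every row visible.
    fn new(data: &'a [f64]) -> Self {
        View { data, mask: vec![true; data.len()] }
    }

    /// A row stays visible only if it was visible before AND satisfies `pred`.
    /// `&&` short-circuits, so `pred` is never called on rows already hidden.
    fn filter(&self, pred: impl Fn(f64) -> bool) -> View<'a> {
        let mask: Vec<bool> = self
            .mask
            .iter()
            .zip(self.data.iter())
            .map(|(keep, x)| *keep && pred(*x))
            .collect();
        View { data: self.data, mask }
    }

    /// Materialize only the visible rows.
    fn visible(&self) -> Vec<f64> {
        self.data
            .iter()
            .zip(self.mask.iter())
            .filter_map(|(x, keep)| if *keep { Some(*x) } else { None })
            .collect()
    }

    /// Number of visible rows.
    fn count(&self) -> usize {
        self.mask.iter().filter(|keep| **keep).count()
    }
}
```

For example, filtering a five-row column with `x > 2.0` and then `x < 10.0` leaves the original data untouched and materializes only the rows that pass both predicates. A production engine would more likely pack the mask into machine words, as the bitmask module describes.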
Modules
- bitmask: Packed bitmask with one bit per row, stored in 64-bit words.
- column: Columnar storage using typed vectors, one per column.
- csv: CSV ingestion via CsvConfig, CsvReader, and StreamingCsvProcessor.
- dataframe: DataFrame, columnar storage with named columns.
- expr: Expression system providing predicates for filter and computed columns for mutate.
- kahan: Kahan compensated summation with bit-identical results regardless of platform.
- nlp: NLP primitives for string distance, n-grams, and tokenization.
- regex_engine: NFA-based regex engine that is zero-dependency, deterministic, and linear-time.
- rng: Deterministic RNG based on SplitMix64.
- tidyview: TidyView, the virtual frame engine.