Module ingest

Handler for the ingest CLI subcommand.

Bulk-ingests every file under a directory that matches a glob pattern. Each matched file is persisted as a separate memory through the same validation, chunking, embedding and persistence pipeline as remember. Unlike remember, the pipeline runs in-process, so the ONNX model is loaded only once per invocation.

This design comes from the v1.0.32 Onda 4B (finding A2) refactor. It replaced a pipeline that forked and spawned one process per file, where every file paid the ~17 s ONNX cold-start cost, with an in-process loop that reuses a warm embedder: the daemon when one is available, otherwise a single in-process Embedder::new.
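The load-once idea can be illustrated with a small sketch. Everything in it is hypothetical: the Embed trait, DaemonClient, LocalEmbedder, acquire_embedder, ingest_all and the MODEL_LOADS counter stand in for the real daemon client and Embedder::new, whose signatures are not shown here. The point is only that the embedder is acquired once, before the loop, and then reused for every file.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter standing in for the expensive ONNX model load.
static MODEL_LOADS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical abstraction over "something that can embed text".
trait Embed {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Hypothetical client for an already-running daemon (its model is already warm).
struct DaemonClient;

impl Embed for DaemonClient {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32] // placeholder "embedding"
    }
}

/// Hypothetical in-process embedder; constructing it simulates the cold start.
struct LocalEmbedder;

impl LocalEmbedder {
    fn new() -> Self {
        MODEL_LOADS.fetch_add(1, Ordering::SeqCst);
        LocalEmbedder
    }
}

impl Embed for LocalEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32] // placeholder "embedding"
    }
}

/// Prefer the daemon when reachable; otherwise load the model once in-process.
fn acquire_embedder(daemon_available: bool) -> Box<dyn Embed> {
    if daemon_available {
        Box::new(DaemonClient)
    } else {
        Box::new(LocalEmbedder::new())
    }
}

/// Embed every file with a single embedder instead of one process (and one load) per file.
fn ingest_all(files: &[&str], daemon_available: bool) -> Vec<Vec<f32>> {
    let embedder = acquire_embedder(daemon_available);
    files.iter().map(|f| embedder.embed(f)).collect()
}

fn main() {
    let files = ["a.md", "notes.txt", "readme.md"];
    let out = ingest_all(&files, false);
    println!(
        "embedded {} files, model loads: {}",
        out.len(),
        MODEL_LOADS.load(Ordering::SeqCst)
    );
}
```

With N matched files, the fork-per-file design pays the cold start N times, while this shape pays it at most once (and not at all when the daemon is up).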

Memory names are derived from file basenames (kebab-case, lowercase, ASCII alphanumerics plus hyphens). Output is newline-delimited JSON (NDJSON): one object per processed file, whether it succeeded or failed, followed by a final summary object. The format is designed for streaming consumption by agents.
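A basename-to-name rule of that shape might look like the sketch below. The helper derive_memory_name is hypothetical, and two details are assumptions rather than documented behavior: the file extension is dropped, and every run of characters that is not an ASCII alphanumeric collapses into one hyphen (leading and trailing hyphens are trimmed). CamelCase is lowercased but not split.

```rust
use std::path::Path;

/// Hypothetical helper: turn a file path into a kebab-case memory name.
/// Assumptions: the extension is dropped, and any run of non-ASCII-alphanumeric
/// characters becomes a single hyphen, never at the start or end.
fn derive_memory_name(path: &Path) -> Option<String> {
    let stem = path.file_stem()?.to_str()?;
    let mut name = String::with_capacity(stem.len());
    let mut pending_hyphen = false;
    for ch in stem.chars() {
        if ch.is_ascii_alphanumeric() {
            // Emit a deferred separator only between two alphanumeric runs.
            if pending_hyphen && !name.is_empty() {
                name.push('-');
            }
            pending_hyphen = false;
            name.push(ch.to_ascii_lowercase());
        } else {
            pending_hyphen = true;
        }
    }
    // A basename with no usable characters yields no name.
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}

fn main() {
    for p in ["docs/My Notes_v2.md", "README.md", "__.txt"] {
        println!("{p} -> {:?}", derive_memory_name(Path::new(p)));
    }
}
```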

Two-phase pipeline (v1.0.39)

Phase A runs on a rayon thread pool sized by --ingest-parallelism. For each file it reads, chunks, embeds and runs NER, then stores the result in a pre-sized Vec<Mutex<Option<Result<StagedFile>>>>, in the slot whose index matches the file's submission order.
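The Phase A fan-out into index-addressed slots can be sketched as follows. To stay dependency-free, this sketch swaps rayon for std::thread::scope with a shared atomic work counter; with rayon, the pool would typically come from ThreadPoolBuilder::new().num_threads(n).build(). StagedFile's fields, the stage function and phase_a are hypothetical, and the "embed + NER" step is reduced to whitespace chunking.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for the per-file output of Phase A.
#[derive(Debug)]
struct StagedFile {
    name: String,
    chunks: Vec<String>,
}

/// Hypothetical per-file work: "read", chunk, and (pretend to) embed/NER.
fn stage(input: &str) -> Result<StagedFile, String> {
    if input.is_empty() {
        return Err("empty file".to_string());
    }
    let chunks = input.split_whitespace().map(str::to_string).collect();
    Ok(StagedFile { name: input.to_string(), chunks })
}

type Slot = Mutex<Option<Result<StagedFile, String>>>;

/// Phase A: process inputs on `parallelism` workers, writing each result
/// into the slot that matches its submission index.
fn phase_a(inputs: &[&str], parallelism: usize) -> Vec<Slot> {
    // Pre-size one empty slot per input so workers never resize the Vec.
    let slots: Vec<Slot> = (0..inputs.len()).map(|_| Mutex::new(None)).collect();
    let next = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..parallelism.max(1) {
            s.spawn(|| loop {
                // Each worker claims the next unprocessed index.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= inputs.len() {
                    break;
                }
                let result = stage(inputs[i]);
                *slots[i].lock().unwrap() = Some(result);
            });
        }
    }); // all workers are joined here
    slots
}

fn main() {
    let inputs = ["alpha beta", "", "gamma"];
    for (i, slot) in phase_a(&inputs, 2).into_iter().enumerate() {
        println!("{i}: {:?}", slot.into_inner().unwrap());
    }
}
```

Completion order across workers is arbitrary, but because each result lands in slot i, the Vec ends up in submission order regardless of which thread finished first.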

Phase B runs sequentially on the main thread, in index order: it takes each StagedFile out of its slot and writes it to SQLite. Because the SQLite Connection is not Sync, it never crosses a thread boundary. As a result, NDJSON output order matches input order.
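A sequential drain of that kind might look like the sketch below. FakeConnection is a hypothetical stand-in for the SQLite connection; its RefCell makes it !Sync, mirroring the constraint described above. StagedFile's fields, insert_memory, phase_b and the NDJSON field names (index, status, name, chunks, error, summary, ok, failed) are illustrative assumptions, not the command's actual output schema. JSON is built by hand with a minimal escaper to avoid a serde dependency.

```rust
use std::cell::RefCell;
use std::fmt::Write as _;
use std::sync::Mutex;

/// Hypothetical staged output of Phase A.
struct StagedFile {
    name: String,
    chunks: usize,
}

/// Stand-in for the SQLite connection. The RefCell makes it !Sync, so the
/// compiler refuses to share it across threads.
struct FakeConnection {
    rows: RefCell<Vec<String>>,
}

impl FakeConnection {
    fn insert_memory(&self, staged: &StagedFile) -> Result<(), String> {
        self.rows.borrow_mut().push(staged.name.clone());
        Ok(())
    }
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Phase B: walk slots in index order on the current thread, persist each
/// staged file, and emit one NDJSON line per file plus a final summary line.
fn phase_b(slots: Vec<Mutex<Option<Result<StagedFile, String>>>>, conn: &FakeConnection) -> Vec<String> {
    let mut lines = Vec::with_capacity(slots.len() + 1);
    let (mut ok, mut failed) = (0usize, 0usize);
    for (index, slot) in slots.into_iter().enumerate() {
        let outcome = slot
            .into_inner()
            .unwrap()
            .unwrap_or_else(|| Err("not staged".to_string()));
        // Staging errors and persistence errors both become "error" lines.
        let result = outcome.and_then(|staged| conn.insert_memory(&staged).map(|()| staged));
        match result {
            Ok(staged) => {
                ok += 1;
                lines.push(format!(
                    r#"{{"index":{},"status":"ok","name":{},"chunks":{}}}"#,
                    index, json_str(&staged.name), staged.chunks
                ));
            }
            Err(e) => {
                failed += 1;
                lines.push(format!(r#"{{"index":{},"status":"error","error":{}}}"#, index, json_str(&e)));
            }
        }
    }
    lines.push(format!(r#"{{"summary":true,"ok":{},"failed":{}}}"#, ok, failed));
    lines
}

fn main() {
    let conn = FakeConnection { rows: RefCell::new(Vec::new()) };
    let slots = vec![
        Mutex::new(Some(Ok(StagedFile { name: "first".into(), chunks: 3 }))),
        Mutex::new(Some(Err("embedding failed".into()))),
    ];
    for line in phase_b(slots, &conn) {
        println!("{line}");
    }
}
```

Keeping all writes on one thread trades write parallelism for a simple ownership story: the expensive work (embedding, NER) is already parallel in Phase A, and SQLite serializes writers anyway.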

Structs

IngestArgs

Functions

run