Module ingest


Handler for the ingest CLI subcommand.

Bulk-ingests every file under a directory that matches a glob pattern. Each matched file is persisted as a separate memory using the same validation, chunking, embedding and persistence pipeline as remember, but executed in-process so the ONNX model is loaded only once per invocation. This design dates from the v1.0.32 Onda 4B refactor (finding A2), which replaced a fork-spawn-per-file pipeline, where every file paid the ~17s ONNX cold-start cost, with an in-process loop that reuses a warm embedder: the daemon when available, an in-process Embedder::new otherwise.
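The benefit of the in-process loop can be illustrated with a small sketch. Everything here is hypothetical: `ToyEmbedder`, `ingest_all` and the `loads` counter are stand-ins invented for illustration and do not reflect the real Embedder API or the daemon path. The only point shown is that the expensive construction happens once and is then reused for every file.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the real embedder (not the crate's actual type).
struct ToyEmbedder {
    dim: usize,
}

impl ToyEmbedder {
    /// `loads` counts how often the expensive "model load" runs.
    fn new(loads: &Cell<u32>) -> Self {
        loads.set(loads.get() + 1); // the real cold start would be paid here
        ToyEmbedder { dim: 4 }
    }

    /// Toy embedding: byte values folded into `dim` buckets.
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0; self.dim];
        for (i, b) in text.bytes().enumerate() {
            v[i % self.dim] += b as f32;
        }
        v
    }
}

/// Builds the embedder once, then reuses it for every input.
fn ingest_all(files: &[&str], loads: &Cell<u32>) -> Vec<Vec<f32>> {
    let embedder = ToyEmbedder::new(loads); // paid once per invocation
    files.iter().map(|text| embedder.embed(text)).collect()
}
```

A fork-spawn-per-file design would correspond to calling `ToyEmbedder::new` inside the loop, so the load counter would grow with the number of files instead of staying at one.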

Memory names are derived from file basenames (kebab-case, lowercase, ASCII alphanumerics + hyphens). Output is line-delimited JSON: one object per processed file (success or error), followed by a final summary object. Designed for streaming consumption by agents.
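As a rough illustration of the naming rule, the sketch below derives a lowercase kebab-case name from a path. `derive_memory_name` is a hypothetical helper, not the module's actual function, and it makes assumptions the text above does not confirm: the file extension is dropped, every run of characters outside ASCII alphanumerics collapses to a single hyphen, and leading or trailing hyphens are trimmed.

```rust
use std::path::Path;

/// Hypothetical helper: derive a kebab-case memory name from a file path.
/// Returns None when the stem is not valid UTF-8 or yields an empty name.
fn derive_memory_name(path: &Path) -> Option<String> {
    // Assumption: the extension is dropped ("Design Notes.md" -> "Design Notes").
    let stem = path.file_stem()?.to_str()?;
    let mut name = String::with_capacity(stem.len());
    for ch in stem.chars() {
        if ch.is_ascii_alphanumeric() {
            name.push(ch.to_ascii_lowercase());
        } else if !name.is_empty() && !name.ends_with('-') {
            // Collapse any run of other characters into a single hyphen.
            name.push('-');
        }
    }
    // Drop a trailing hyphen left by a non-alphanumeric suffix.
    while name.ends_with('-') {
        name.pop();
    }
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}
```

Under these assumptions, `docs/Design Notes_v2.md` maps to `design-notes-v2`, and a file whose stem contains no ASCII alphanumerics yields no name at all. How the real handler treats such files, or name collisions between files, is not specified here.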

§Two-phase pipeline (v1.0.39)

Phase A runs on a rayon thread pool (size = --ingest-parallelism). For each file it reads, chunks, embeds and runs NER, then stores the result in a pre-sized Vec<Mutex<Option<Result<StagedFile>>>> indexed by submission order.

Phase B runs sequentially on the main thread, by index: it pulls each StagedFile and writes it to SQLite. Because Connection is not Sync, it never crosses thread boundaries. NDJSON output order equals input order.
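The two phases can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: it uses `std::thread::scope` with an atomic work counter in place of the rayon pool, `StagedFile` is a hypothetical stand-in with invented fields, `stage` replaces the real read + chunk + embed + NER work, and Phase B formats strings instead of writing to SQLite.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for the staged result of Phase A.
struct StagedFile {
    name: String,
    chunks: usize,
}

/// Placeholder for the per-file Phase A work.
fn stage(input: &str) -> Result<StagedFile, String> {
    if input.is_empty() {
        return Err("empty input".to_string());
    }
    Ok(StagedFile { name: input.to_lowercase(), chunks: input.len() })
}

fn run_pipeline(inputs: &[&str], parallelism: usize) -> Vec<String> {
    // Pre-sized slots, one per input, indexed by submission order.
    let slots: Vec<Mutex<Option<Result<StagedFile, String>>>> =
        inputs.iter().map(|_| Mutex::new(None)).collect();
    let next = AtomicUsize::new(0);

    // Phase A: workers claim indices and fill the matching slot.
    thread::scope(|s| {
        for _ in 0..parallelism.max(1) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= inputs.len() {
                    break;
                }
                *slots[i].lock().unwrap() = Some(stage(inputs[i]));
            });
        }
    });

    // Phase B: one thread drains the slots by index, so the output order
    // equals the input order regardless of which worker finished first.
    slots
        .into_iter()
        .enumerate()
        .map(|(i, slot)| {
            match slot.into_inner().unwrap().expect("slot filled in Phase A") {
                Ok(f) => format!("{i}: ok {} ({} chunks)", f.name, f.chunks),
                Err(e) => format!("{i}: error {e}"),
            }
        })
        .collect()
}
```

Keeping a per-file failure inside its slot rather than aborting lets Phase B report an error for that file and continue with the rest, which matches the one-object-per-processed-file output described above.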

Structs§

IngestArgs

Functions§

run
