Handler for the ingest CLI subcommand.
Bulk-ingests every file under a directory that matches a glob pattern.
Each matched file is persisted as a separate memory through the same
validation, chunking, embedding, and persistence pipeline as remember,
but executed in-process so the ONNX model is loaded only once per
invocation. This is the v1.0.32 Onda 4B (finding A2) refactor, which
replaced the fork-spawn-per-file pipeline (where every file paid the
~17s ONNX cold-start cost) with an in-process loop that reuses the warm
embedder (the daemon when available, an in-process Embedder::new otherwise).
Memory names are derived from file basenames (kebab-case: lowercase, ASCII alphanumerics and hyphens only).

Output is line-delimited JSON: one object per processed file (success or error), followed by a final summary object. Designed for streaming consumption by agents.
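As a minimal sketch of the basename-to-memory-name derivation described above: the function name `derive_memory_name` and the exact normalization rules (collapsing runs of non-alphanumerics to a single hyphen, trimming leading/trailing hyphens) are assumptions beyond the documented "kebab-case, lowercase, ASCII alphanumerics + hyphens" contract.

```rust
// Hypothetical sketch: lowercase the basename, keep ASCII alphanumerics,
// collapse everything else to single hyphens, and trim edge hyphens.
fn derive_memory_name(basename: &str) -> String {
    let mut out = String::new();
    let mut last_hyphen = true; // suppress a leading hyphen
    for c in basename.chars() {
        if c.is_ascii_alphanumeric() {
            out.push(c.to_ascii_lowercase());
            last_hyphen = false;
        } else if !last_hyphen {
            out.push('-');
            last_hyphen = true;
        }
    }
    while out.ends_with('-') {
        out.pop(); // trim trailing hyphens
    }
    out
}

fn main() {
    assert_eq!(derive_memory_name("My Notes_2024.md"), "my-notes-2024-md");
    assert_eq!(derive_memory_name("README"), "readme");
    println!("{}", derive_memory_name("API design FAQ.txt"));
}
```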
§Two-phase pipeline (v1.0.39)
Phase A runs on a rayon thread pool (size = --ingest-parallelism):
each file is read, chunked, embedded, and NER-tagged, and the result is
stored in a pre-sized Vec<Mutex<Option<Result<StagedFile>>>> indexed by
submission order. Phase B runs on the main thread, sequentially by
index: it pulls each StagedFile and writes it to SQLite. Connection is
not Sync, so it never crosses a thread boundary. NDJSON output order
equals input order.