mnem-ingest
Ingest pipeline for mnem: converts external source artifacts (Markdown, plain text, and - in later sub-waves - PDFs and chat conversations) into content-addressed memory graphs.
Scope
This crate ships in five sub-waves (B5a – B5e). The current release (B5a) provides the pure, deterministic half of the pipeline:
- Markdown parser (
md::parse_markdown) - CommonMark + GFM with heading hierarchy preserved and code blocks kept atomic. - Text parser (
text::parse_text) - single-section pass-through. - Chunkers (
chunk::chunk):Paragraph- splits on blank lines.Recursive- token-budgeted sliding window with overlap.
The repository-commit path (Ingester::run) is stubbed until B5c.
Quick start
use ;
let md = "# Title\n\nFirst para.\n\n## Sub\n\nSecond para.";
let sections = parse_markdown.unwrap;
let chunks = chunk;
for c in chunks
License
Apache-2.0. See ../../LICENSE.