//! Deterministic document extraction.
//!
//! A faithful port of the **parse / regex / rules** core of OpenHuman's
//! `memory/ingestion` pipeline. It takes raw unstructured text and recovers
//! structured knowledge with no model calls:
//!
//! 1. **Chunking**: split the document into manageable pieces ([`chunking`],
//!    [`text`]).
//! 2. **Structured extraction**: regex rules for email headers, prefixed
//!    fields, and explicit graph facts ([`regex`], [`parse_lines`],
//!    [`parse_relations`]).
//! 3. **Heuristic extraction**: recipient / spatial relations over extraction
//!    units ([`parse_units`]).
//! 4. **Aggregation**: alias resolution, dedup, thresholding ([`alias`],
//!    [`aggregate`], [`rules`]).
//!
//! ## Ownership boundary
//!
//! TinyCortex does **not** own the namespace document/graph store, so the
//! `impl UnifiedMemory` glue (document upsert + graph relation writes) and the
//! live-ingestion singleton runner state are intentionally **not** ported. The
//! public [`extract_document`] entry point performs extraction only and returns
//! a fully populated [`MemoryIngestionResult`]; persistence is the host's job.
pub use extract_document;
pub use ;