Document ingestion for Engram (RML-928)
Provides document parsing, chunking, and ingestion into the memory store. Supported formats:
- Markdown (.md): Uses pulldown-cmark for parsing, extracts sections
- PDF (.pdf): Uses pdf-extract for text extraction by page
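The two supported formats above suggest a dispatch on file extension before parsing. The following is a minimal sketch of that idea, not the crate's actual implementation: `SketchFormat` and `detect_format` are hypothetical names, and treating extensions case-insensitively is an assumption.

```rust
use std::path::Path;

/// Hypothetical stand-in for a document format enum (not the crate's own type).
#[derive(Debug, PartialEq, Eq)]
enum SketchFormat {
    Markdown,
    Pdf,
}

/// Guess the format from the file extension.
/// Returns `None` when the extension is missing, not UTF-8, or unsupported.
/// Assumption: extensions are matched case-insensitively.
fn detect_format(path: &Path) -> Option<SketchFormat> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "md" => Some(SketchFormat::Markdown),
        "pdf" => Some(SketchFormat::Pdf),
        _ => None,
    }
}
```

Returning `Option` rather than panicking lets a caller turn an unsupported file into an ordinary error before any parsing work starts.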
§Usage
use engram::intelligence::document_ingest::{DocumentIngestor, IngestConfig};
use engram::Storage;
let storage = Storage::open_in_memory()?;
let ingestor = DocumentIngestor::new(&storage);
let result = ingestor.ingest_file("docs/handbook.pdf", IngestConfig::default())?;
println!("Ingested {} chunks", result.chunks_created);
Structs§
- DocumentChunk - A chunk ready for ingestion
- DocumentIngestor - Document ingestor
- DocumentSection - A section extracted from a document
- IngestConfig - Configuration for document ingestion
- IngestResult - Result of document ingestion
Enums§
- DocumentFormat - Document format
Constants§
- DEFAULT_CHUNK_SIZE - Default chunk size in characters
- DEFAULT_MAX_FILE_SIZE - Maximum file size in bytes (10 MB default)
- DEFAULT_OVERLAP - Default overlap between chunks in characters
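To illustrate how a chunk size and an overlap, both measured in characters, interact, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption about the general technique, not the crate's actual algorithm: `chunk_with_overlap` is a hypothetical helper, and the real ingestor may also respect section or page boundaries.

```rust
/// Split `text` into chunks of at most `chunk_size` characters.
/// Each chunk after the first begins `chunk_size - overlap` characters after
/// the start of the previous one, so neighbouring chunks share `overlap` characters.
/// Hypothetical helper for illustration only.
fn chunk_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    // Work on characters, not bytes, so multi-byte UTF-8 text is never split mid-character.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        let chunk: String = chars[start..end].iter().collect();
        chunks.push(chunk);
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```

For example, `chunk_with_overlap("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`: each chunk repeats the final character of the one before it, which helps keep context that straddles a chunk boundary.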