Module document_ingest

Document ingestion for Engram (RML-928)

Provides document parsing, chunking, and ingestion into the memory store. Supported formats:

  • Markdown (.md): Uses pulldown-cmark for parsing, extracts sections
  • PDF (.pdf): Uses pdf-extract for text extraction by page
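
One plausible way to choose between the two supported formats is to look at the file extension. The sketch below is illustrative only: `SketchFormat` and `detect_format` are hypothetical names, not part of this module, and the real `DocumentFormat` enum may use different variants or a different detection strategy.

```rust
use std::path::Path;

/// Hypothetical stand-in for the module's `DocumentFormat` enum;
/// the real variant names may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchFormat {
    Markdown,
    Pdf,
}

/// Map a file extension to a format, ignoring ASCII case.
/// Returns `None` when the extension is missing or unsupported.
fn detect_format(path: &Path) -> Option<SketchFormat> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "md" => Some(SketchFormat::Markdown),
        "pdf" => Some(SketchFormat::Pdf),
        _ => None,
    }
}
```

Returning `Option` rather than panicking lets a caller turn an unsupported extension into its own error before any parsing starts.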

§Usage

use engram::intelligence::document_ingest::{DocumentIngestor, IngestConfig};
use engram::Storage;

let storage = Storage::open_in_memory()?;
let ingestor = DocumentIngestor::new(&storage);

let result = ingestor.ingest_file("docs/handbook.pdf", IngestConfig::default())?;
println!("Ingested {} chunks", result.chunks_created);

Structs§

DocumentChunk
A chunk ready for ingestion
DocumentIngestor
Parses, chunks, and ingests documents into the memory store
DocumentSection
A section extracted from a document
IngestConfig
Configuration for document ingestion
IngestResult
Result of document ingestion

Enums§

DocumentFormat
Document format

Constants§

DEFAULT_CHUNK_SIZE
Default chunk size in characters
DEFAULT_MAX_FILE_SIZE
Default maximum file size in bytes (10 MB)
DEFAULT_OVERLAP
Default overlap between chunks in characters
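
To illustrate how a chunk size and an overlap, both measured in characters, can interact, here is a minimal sliding-window sketch. It is an assumption about the general technique, not the module's actual chunker: `chunk_text` is a hypothetical helper, and the sizes used with it are illustrative rather than the real values of `DEFAULT_CHUNK_SIZE` or `DEFAULT_OVERLAP`.

```rust
/// Split `text` into chunks of at most `chunk_size` characters.
/// Each chunk after the first starts `chunk_size - overlap` characters
/// after the previous one, so consecutive chunks share `overlap` characters.
/// Hypothetical helper; not part of the `document_ingest` API.
fn chunk_text(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    // Work on chars, not bytes, so multi-byte UTF-8 text is never split mid-character.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;

    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the final chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```

For example, `chunk_text("abcdefghij", 4, 1)` yields `"abcd"`, `"defg"`, and `"ghij"`: each chunk repeats the last character of the one before it. A real chunker would likely also respect section or page boundaries, which this sketch ignores.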