Module document

infiniloom_engine

Document ingestion module for converting human-readable documents into LLM-optimized structured formats.

This module provides:

Type system: Document, Section, ContentBlock for representing document structure
Parsers: Format-specific parsers (Markdown, HTML, plain text, CSV, DOCX, PDF)
Distillation: Content compression pipeline that removes filler and optimizes for LLM attention
Output: Document-specific formatters for Claude (XML), GPT (Markdown), and agents (JSON)
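
To make the type system concrete, here is a minimal sketch of how a `Document` / `Section` / `ContentBlock` hierarchy could nest. The three type names come from the list above; every field, every enum variant, and the helper functions `section_text` and `document_text` are assumptions made for illustration. The actual definitions live in the `types` module and may look quite different.

```rust
// Hypothetical stand-ins for illustration only; not the crate's real types.
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Paragraph(String),
    Code { language: Option<String>, source: String },
}

#[derive(Debug, Clone, PartialEq)]
struct Section {
    title: String,
    level: u8,
    blocks: Vec<ContentBlock>,
    children: Vec<Section>,
}

#[derive(Debug, Clone, PartialEq)]
struct Document {
    title: Option<String>,
    sections: Vec<Section>,
}

/// Append a section's title and block text, then recurse into its
/// children (depth-first), one line per title or block.
fn section_text(section: &Section, out: &mut String) {
    out.push_str(&section.title);
    out.push('\n');
    for block in &section.blocks {
        match block {
            ContentBlock::Paragraph(text) => out.push_str(text),
            ContentBlock::Code { source, .. } => out.push_str(source),
        }
        out.push('\n');
    }
    for child in &section.children {
        section_text(child, out);
    }
}

/// Flatten a whole document into plain text (assumed helper).
fn document_text(doc: &Document) -> String {
    let mut out = String::new();
    for section in &doc.sections {
        section_text(section, &mut out);
    }
    out
}
```

A flattened string like this is the kind of input a token counter such as `count_document_tokens` would plausibly operate on, though the crate's own traversal is not documented on this page.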

Re-exports

pub use types::*;

Modules

chunking: Document chunking for multi-turn LLM conversations.
distillation: Content distillation pipeline for LLM attention and token optimization.
output: Document-specific output formatters for LLM consumption.
parsers: Format-specific document parsers.
pii: PII (Personally Identifiable Information) detection for documents.
types: Core type definitions for document ingestion.

Structs

ParseOptions: Options for document parsing.

Functions

count_document_tokens: Count tokens for a document’s full text content across all model families.
count_output_tokens: Count tokens for formatted output text across all model families.
parse_content: Parse document content from a string with a known format.
parse_document: Parse a document from a file path, auto-detecting the format.
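
This page does not say how `parse_document` auto-detects the format. One common approach is to map the file extension to a format, sketched below under that assumption. The `DocFormat` enum and the `detect_format` function are hypothetical names, not part of the crate's documented API; only the six formats themselves are taken from the parser list above.

```rust
use std::path::Path;

// Hypothetical enum covering the six formats named in the parser list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocFormat {
    Markdown,
    Html,
    PlainText,
    Csv,
    Docx,
    Pdf,
}

/// Guess a format from the file extension, case-insensitively.
/// Returns `None` when the extension is missing or unrecognized.
fn detect_format(path: &Path) -> Option<DocFormat> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "md" | "markdown" => Some(DocFormat::Markdown),
        "html" | "htm" => Some(DocFormat::Html),
        "txt" => Some(DocFormat::PlainText),
        "csv" => Some(DocFormat::Csv),
        "docx" => Some(DocFormat::Docx),
        "pdf" => Some(DocFormat::Pdf),
        _ => None,
    }
}
```

Returning `Option` rather than defaulting to plain text lets the caller decide what to do with unknown files. When the format is already known, `parse_content` avoids the detection step entirely.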