Table of Contents (TOC) processing module.
This module extracts and verifies document structure from a PDF's table of contents. The pipeline has five stages:
- Detection — Find TOC in document (regex + LLM fallback)
- Parsing — Convert TOC text to structured entries (LLM)
- Assignment — Map TOC pages to physical pages
- Verification — Sample verification of page assignments
- Repair — Fix incorrect assignments
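The assignment stage can be pictured as estimating a constant offset between the page numbers printed in the TOC and the physical page indices in the file (front matter usually shifts them). The following is a minimal sketch of that idea, not this module's implementation: `Anchor`, `estimate_offset`, and `to_physical` are hypothetical names, and the majority-vote strategy is an assumption chosen for illustration.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Hypothetical anchor: a TOC entry whose physical page was located directly,
/// for example by finding its title in the page text.
#[derive(Debug, Clone, Copy)]
pub struct Anchor {
    /// Page number as printed in the TOC.
    pub toc_page: usize,
    /// 1-based index of the page in the PDF file.
    pub physical_page: usize,
}

/// Estimate the offset (physical minus printed) by majority vote over anchors.
/// Returns the winning offset and the fraction of anchors that agree with it.
pub fn estimate_offset(anchors: &[Anchor]) -> Option<(i64, f64)> {
    if anchors.is_empty() {
        return None;
    }
    let mut votes: HashMap<i64, usize> = HashMap::new();
    for a in anchors {
        let diff = a.physical_page as i64 - a.toc_page as i64;
        *votes.entry(diff).or_insert(0) += 1;
    }
    // Highest vote count wins; ties go to the smaller absolute offset,
    // then to the larger value, so the result is deterministic.
    let (offset, count) = votes
        .into_iter()
        .max_by_key(|&(diff, count)| (count, Reverse(diff.abs()), diff))?;
    Some((offset, count as f64 / anchors.len() as f64))
}

/// Apply an offset to a printed page number, rejecting pages outside the document.
pub fn to_physical(toc_page: usize, offset: i64, page_count: usize) -> Option<usize> {
    let physical = toc_page as i64 + offset;
    if physical >= 1 && physical as usize <= page_count {
        Some(physical as usize)
    } else {
        None
    }
}
```

The agreement fraction gives a rough confidence signal: a low value suggests the document does not have a single uniform offset, which is the kind of case the verification and repair stages exist to catch.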
§Architecture
PDF Pages
    │
    ▼
┌─────────────────────────────────────────────────┐
│                  TocProcessor                   │
│                                                 │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐          │
│  │Detector │─▶│ Parser  │─▶│Assigner │          │
│  └─────────┘  └─────────┘  └────┬────┘          │
│                                 │               │
│                                 ▼               │
│                          ┌─────────────┐        │
│                          │  Verifier   │        │
│                          └──────┬──────┘        │
│                                 │               │
│                                 ▼               │
│                          ┌─────────────┐        │
│                          │  Repairer   │        │
│                          └─────────────┘        │
└─────────────────────────────────────────────────┘
    │
    ▼
Vec<TocEntry>
§Example
use vectorless::parser::toc::TocProcessor;
use vectorless::parser::pdf::{PdfParser, PdfPage};
// The statements below use `?` and `.await`, so they must run inside
// an async function that returns a Result.
// Parse PDF
let pdf_parser = PdfParser::new();
let result = pdf_parser.parse_file("document.pdf".as_ref())?;
// Extract TOC
let processor = TocProcessor::new();
let entries = processor.process(&result.pages).await?;
// Use entries
for entry in &entries {
    println!("{} - Page {:?}", entry.title, entry.physical_page);
}
Structs§
- IndexRepairer: Index repairer - fixes incorrect page assignments.
- IndexVerifier: Index verifier - verifies that TOC entries point to correct pages.
- PageAssigner: Page assigner - assigns physical page numbers to TOC entries.
- PageAssignerConfig: Page assigner configuration.
- PageOffset: Page offset calculation result.
- RepairerConfig: Repairer configuration.
- TocDetection: Result of TOC detection.
- TocDetector: TOC detector - finds table of contents in PDF documents.
- TocDetectorConfig: TOC detector configuration.
- TocEntry: A single TOC entry.
- TocParser: TOC parser - converts raw TOC text to structured entries.
- TocParserConfig: TOC parser configuration.
- TocProcessor: TOC processor - orchestrates the complete TOC extraction pipeline.
- TocProcessorConfig: TOC processor configuration.
- VerificationError: Verification error for a single entry.
- VerificationReport: Result of TOC verification.
- VerifierConfig: Verifier configuration.
Enums§
- ErrorType: Type of verification error.
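To make the verification types above more concrete, here is a minimal sketch of sample-based verification: check a subset of entries and record why each failing one failed. It is an illustration under stated assumptions, not this module's API. `Entry`, `Mismatch`, and `verify_sample` are hypothetical names, the mismatch kinds are invented for the example and are not the variants of `ErrorType`, and a plain substring match stands in for whatever check the real verifier performs.

```rust
/// Hypothetical, simplified stand-in for a TOC entry with an assigned page.
pub struct Entry {
    pub title: String,
    /// 1-based physical page, if one was assigned.
    pub physical_page: Option<usize>,
}

/// Hypothetical reasons a sampled entry can fail verification.
#[derive(Debug, PartialEq)]
pub enum Mismatch {
    Unassigned,
    OutOfRange,
    TitleNotFound,
}

/// Lowercase and collapse whitespace so layout differences
/// do not cause false mismatches.
fn normalize(s: &str) -> String {
    s.split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Check every `step`-th entry (a `step` of 0 is treated as 1) and return
/// `(entry index, mismatch)` for each sampled entry that fails.
pub fn verify_sample(
    entries: &[Entry],
    pages: &[String],
    step: usize,
) -> Vec<(usize, Mismatch)> {
    let mut errors = Vec::new();
    for (i, entry) in entries.iter().enumerate().step_by(step.max(1)) {
        let outcome = match entry.physical_page {
            None => Some(Mismatch::Unassigned),
            Some(p) if p == 0 || p > pages.len() => Some(Mismatch::OutOfRange),
            Some(p) => {
                // The title should appear somewhere in the text of its page.
                if normalize(&pages[p - 1]).contains(&normalize(&entry.title)) {
                    None
                } else {
                    Some(Mismatch::TitleNotFound)
                }
            }
        };
        if let Some(mismatch) = outcome {
            errors.push((i, mismatch));
        }
    }
    errors
}
```

Sampling trades coverage for cost: with a larger `step`, fewer pages are inspected, so a clean result is weaker evidence that the remaining assignments are correct.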