
Module toc


Table of Contents (TOC) processing module.

This module extracts a document's structure from the table of contents of a PDF and verifies it, in five stages:

  • Detection — Find TOC in document (regex + LLM fallback)
  • Parsing — Convert TOC text to structured entries (LLM)
  • Assignment — Map TOC pages to physical pages
  • Verification — Sample verification of page assignments
  • Repair — Fix incorrect assignments
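
To make the Assignment stage concrete, here is a minimal sketch of one common way to map printed page numbers to physical pages: estimate a constant offset from a few known pairs and apply it to every entry. The `Entry` struct and the `estimate_offset` and `assign_pages` functions are hypothetical names invented for this illustration. They are not the crate's `TocEntry`, `PageOffset`, or `PageAssigner`, whose actual fields and methods are not shown on this page.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a TOC entry (not the crate's `TocEntry`).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    title: String,
    /// Page number as printed in the table of contents.
    toc_page: u32,
    /// 1-based position of the page inside the PDF file, once assigned.
    physical_page: Option<u32>,
}

/// Estimate the offset as the most frequent `physical - printed` difference
/// among sampled `(printed, physical)` pairs. Returns `None` for no samples.
/// Ties are broken in favour of the smaller offset so the result is stable.
fn estimate_offset(samples: &[(u32, u32)]) -> Option<i64> {
    let mut counts: HashMap<i64, usize> = HashMap::new();
    for &(printed, physical) in samples {
        *counts.entry(physical as i64 - printed as i64).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)))
        .map(|(offset, _)| offset)
}

/// Apply the offset to every entry. Results that fall outside
/// `1..=page_count` are left unassigned (`None`).
fn assign_pages(entries: &mut [Entry], offset: i64, page_count: u32) {
    for entry in entries.iter_mut() {
        let page = entry.toc_page as i64 + offset;
        entry.physical_page = if page >= 1 && page <= page_count as i64 {
            Some(page as u32)
        } else {
            None
        };
    }
}
```

Taking the most frequent difference rather than the first one means a single mismatched sample does not skew the result. For example, the samples `(1, 13)`, `(5, 17)`, `(9, 20)` give an offset of 12, because two of the three pairs agree on it.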

§Architecture

PDF Pages
    │
    ▼
┌─────────────────────────────────────────────────┐
│              TocProcessor                        │
│                                                  │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐         │
│  │Detector │─▶│ Parser  │─▶│Assigner │         │
│  └─────────┘  └─────────┘  └────┬────┘         │
│                                │                │
│                                ▼                │
│                         ┌─────────────┐         │
│                         │  Verifier   │         │
│                         └──────┬──────┘         │
│                                │                │
│                                ▼                │
│                         ┌─────────────┐         │
│                         │  Repairer   │         │
│                         └─────────────┘         │
└─────────────────────────────────────────────────┘
    │
    ▼
Vec<TocEntry>
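
The last two boxes in the diagram, Verifier and Repairer, can be sketched as a check-then-search loop. This is an illustration under stated assumptions, not the crate's implementation: `LocatedEntry`, `title_on_page`, `verify`, `repair`, and `verify_and_repair` are hypothetical names, pages are modelled as plain strings with 0-based indices, and a simple case-insensitive substring match stands in for whatever matching `IndexVerifier` and `IndexRepairer` actually perform.

```rust
/// Hypothetical entry: a title plus a 0-based index into `pages`.
#[derive(Debug, Clone, PartialEq)]
struct LocatedEntry {
    title: String,
    page: usize,
}

/// Case-insensitive check that a page's text mentions the title.
fn title_on_page(title: &str, page_text: &str) -> bool {
    page_text.to_lowercase().contains(&title.to_lowercase())
}

/// Verifier stage: return the indices of entries whose assigned page does
/// not contain the title. Out-of-range pages count as failures.
fn verify(entries: &[LocatedEntry], pages: &[String]) -> Vec<usize> {
    entries
        .iter()
        .enumerate()
        .filter(|(_, e)| !pages.get(e.page).map_or(false, |text| title_on_page(&e.title, text)))
        .map(|(i, _)| i)
        .collect()
}

/// Repairer stage: search within `window` pages of the current guess,
/// nearest first. Updates the entry and returns `true` on success.
fn repair(entry: &mut LocatedEntry, pages: &[String], window: usize) -> bool {
    for dist in 1..=window {
        // At each distance, try the page after the guess, then the page before.
        let candidates = [entry.page.checked_add(dist), entry.page.checked_sub(dist)];
        for cand in candidates.iter().flatten() {
            if pages.get(*cand).map_or(false, |text| title_on_page(&entry.title, text)) {
                entry.page = *cand;
                return true;
            }
        }
    }
    false
}

/// Verify all entries, then try to repair each failure.
/// Returns the indices of entries that could not be repaired.
fn verify_and_repair(entries: &mut [LocatedEntry], pages: &[String], window: usize) -> Vec<usize> {
    verify(entries, pages)
        .into_iter()
        .filter(|&i| !repair(&mut entries[i], pages, window))
        .collect()
}
```

Searching nearest pages first keeps a repaired entry as close as possible to its original assignment, which matters when the same title text also appears elsewhere in the document, for instance in running headers.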

§Example

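use vectorless::parser::pdf::PdfParser;
use vectorless::parser::toc::TocProcessor;

// `process` is async and both calls use `?`, so the snippet needs an enclosing
// async function. The wrapper name and the boxed error type are assumptions
// made for this example.
async fn print_toc() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the PDF into pages.
    let pdf_parser = PdfParser::new();
    let result = pdf_parser.parse_file("document.pdf".as_ref())?;

    // Run the TOC pipeline over the parsed pages.
    let processor = TocProcessor::new();
    let entries = processor.process(&result.pages).await?;

    // Print each title with its assigned physical page.
    for entry in &entries {
        println!("{} - Page {:?}", entry.title, entry.physical_page);
    }
    Ok(())
}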

Structs§

IndexRepairer
Index repairer - fixes incorrect page assignments.
IndexVerifier
Index verifier - verifies that TOC entries point to correct pages.
PageAssigner
Page assigner - assigns physical page numbers to TOC entries.
PageAssignerConfig
Page assigner configuration.
PageOffset
Page offset calculation result.
RepairerConfig
Repairer configuration.
TocDetection
Result of TOC detection.
TocDetector
TOC detector - finds table of contents in PDF documents.
TocDetectorConfig
TOC detector configuration.
TocEntry
A single TOC entry.
TocParser
TOC parser - converts raw TOC text to structured entries.
TocParserConfig
TOC parser configuration.
TocProcessor
TOC processor - orchestrates the complete TOC extraction pipeline.
TocProcessorConfig
TOC processor configuration.
VerificationError
Verification error for a single entry.
VerificationReport
Result of TOC verification.
VerifierConfig
Verifier configuration.

Enums§

ErrorType
Type of verification error.