1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
//! Structured data extraction from PDF text.
//!
//! This module provides advanced text analysis capabilities to automatically detect
//! and extract structured data patterns from PDF documents:
//!
//! - **Table Detection**: Automatically identifies tables by analyzing text alignment
//! and spatial positioning. Uses clustering algorithms to detect rows and columns.
//! - **Key-Value Pairs**: Extracts form fields and labeled data using pattern matching
//! for colon-separated, spatially-aligned, and tabular formats.
//! - **Multi-Column Layouts**: Detects column boundaries in multi-column text layouts
//! like newspapers or academic papers.
//!
//! # Architecture
//!
//! ```text
//! structured/
//! ├── types.rs - Core data types (Table, KeyValuePair, Config)
//! ├── detector.rs - Main detection engine with builder pattern
//! ├── table.rs - Table detection algorithm (clustering)
//! ├── keyvalue.rs - Key-value pair detection (pattern matching)
//! └── layout.rs - Multi-column layout detection
//! ```
//!
//! # Examples
//!
//! ```rust,no_run
//! use oxidize_pdf::text::structured::{StructuredDataDetector, StructuredDataConfig};
//! use oxidize_pdf::text::extraction::TextFragment;
//!
//! let config = StructuredDataConfig::default();
//! let detector = StructuredDataDetector::new(config);
//!
//! let fragments: Vec<TextFragment> = vec![]; // from PDF extraction
//! let result = detector.detect(&fragments)?;
//!
//! println!("Found {} tables", result.tables.len());
//! println!("Found {} key-value pairs", result.key_value_pairs.len());
//! # Ok::<(), Box<dyn std::error::Error>>(())
//! ```
pub use ;
pub use StructuredDataDetector;