XY-Cut++
High-performance document reading order detection for complex layouts. Implements the XY-Cut++ algorithm with a hierarchical mask mechanism for accurate layout ordering in multi-column documents, newspapers, and academic papers.
Paper: XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism
Authors: Shuai Liu, Youmeng Li, Jizeng Wei (Tianjin University)
Features
- State-of-the-art accuracy: 98.8% overall BLEU-4 on the DocBench-100 benchmark
- Fast: 514 FPS average (1.06× faster than geometric-only methods)
- Zero-copy design: Efficient memory usage with trait-based abstractions
- Safe Rust: 100% safe code with no unsafe blocks
- Complex layout support: Handles multi-column, nested, and cross-page elements
- Semantic-aware: Uses shallow semantic labels (titles, figures, tables) to improve ordering
Quick Start
Add to your Cargo.toml:
[dependencies]
xycut-plus-plus = "0.1"
Basic Example
use xycut_plus_plus::{BoundingBox, SemanticLabel, XYCutConfig, XYCutPlusPlus};
// 1. Implement BoundingBox for your element type
// 2. Create elements from your layout detection
let elements = vec![/* ... */];
// 3. Compute reading order
let xycut = XYCutPlusPlus::new(/* ... */);
let page_bounds = (/* ... */); // (x_min, y_min, x_max, y_max)
let ordered_ids = xycut.compute_order(/* ... */);
// ordered_ids = [0, 1, ...] in correct reading order
for id in ordered_ids { /* ... */ }
Algorithm Overview
XY-Cut++ extends the classic XY-Cut algorithm with three key innovations:
- Pre-Mask Processing (Equations 1-3): Identifies and temporarily masks high-dynamic-range elements (titles, figures, tables) to prevent segmentation errors
- Multi-Granularity Segmentation (Equations 4-5): Adaptively switches between horizontal-first and vertical-first cutting based on the content density ratio τd
- Cross-Modal Matching (Equations 7-10): Reintegrates masked elements using geometry-semantic fusion with a four-component distance metric
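For orientation, the classic XY-Cut baseline that these steps extend can be sketched as follows. This is a minimal illustration and not the crate's implementation: `Rect`, `widest_gap`, `xy_cut`, and the `min_gap` parameter are hypothetical names, and the sketch always tries a horizontal cut before a vertical one, without the density-driven axis selection described above.

```rust
/// Hypothetical element box for this sketch: (x0, y0) is the top-left corner,
/// (x1, y1) the bottom-right corner.
#[derive(Clone, Copy, Debug)]
pub struct Rect {
    pub id: usize,
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

/// Project the boxes onto one axis as (start, end) spans and return the middle
/// of the widest uncovered gap that is at least `min_gap` wide, if any.
fn widest_gap(mut spans: Vec<(f32, f32)>, min_gap: f32) -> Option<f32> {
    spans.sort_by(|a, b| a.0.total_cmp(&b.0));
    let mut covered_to = spans.first()?.1;
    let mut best: Option<(f32, f32)> = None; // (gap width, cut position)
    for &(start, end) in &spans[1..] {
        let gap = start - covered_to;
        if gap >= min_gap && best.map_or(true, |(width, _)| gap > width) {
            best = Some((gap, covered_to + gap / 2.0));
        }
        covered_to = covered_to.max(end);
    }
    best.map(|(_, cut)| cut)
}

/// Classic recursive XY-Cut: split top/bottom at a horizontal whitespace band
/// if one exists, otherwise left/right at a vertical band, otherwise fall back
/// to a plain top-to-bottom, left-to-right sort. Ids are appended to `out`.
pub fn xy_cut(boxes: &[Rect], min_gap: f32, out: &mut Vec<usize>) {
    if boxes.len() <= 1 {
        out.extend(boxes.iter().map(|b| b.id));
        return;
    }
    // Horizontal cut: project onto the y axis.
    let y_spans: Vec<(f32, f32)> = boxes.iter().map(|b| (b.y0, b.y1)).collect();
    if let Some(cut) = widest_gap(y_spans, min_gap) {
        let (top, bottom): (Vec<Rect>, Vec<Rect>) =
            boxes.iter().copied().partition(|b| b.y1 <= cut);
        if !top.is_empty() && !bottom.is_empty() {
            xy_cut(&top, min_gap, out);
            xy_cut(&bottom, min_gap, out);
            return;
        }
    }
    // Vertical cut: project onto the x axis.
    let x_spans: Vec<(f32, f32)> = boxes.iter().map(|b| (b.x0, b.x1)).collect();
    if let Some(cut) = widest_gap(x_spans, min_gap) {
        let (left, right): (Vec<Rect>, Vec<Rect>) =
            boxes.iter().copied().partition(|b| b.x1 <= cut);
        if !left.is_empty() && !right.is_empty() {
            xy_cut(&left, min_gap, out);
            xy_cut(&right, min_gap, out);
            return;
        }
    }
    // No usable whitespace band on either axis.
    let mut rest = boxes.to_vec();
    rest.sort_by(|a, b| a.y0.total_cmp(&b.y0).then(a.x0.total_cmp(&b.x0)));
    out.extend(rest.iter().map(|b| b.id));
}
```

On a page with a full-width title above two columns whose paragraphs are vertically staggered, this sketch first separates the title, then splits the columns, then orders each column top to bottom. A fixed horizontal-first rule fails when column paragraphs happen to line up, which is the kind of case the adaptive axis selection is meant to handle.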
┌───────────────────────────────────────────┐
│   Layout Detection (PP-DocLayout, etc.)   │
└─────────────────────┬─────────────────────┘
                      │
        ┌─────────────▼─────────────┐
        │ Pre-Mask Processing       │ (Eq 1-3)
        │ • Adaptive threshold      │
        │ • Cross-layout detection  │
        └─────────────┬─────────────┘
                      │
     ┌────────────────▼────────────────┐
     │ Multi-Granularity Segmentation  │ (Eq 4-5)
     │ • Density-driven axis selection │
     │ • Recursive XY/YX-Cut           │
     └────────────────┬────────────────┘
                      │
          ┌───────────▼───────────┐
          │ Cross-Modal Matching  │ (Eq 7-10)
          │ • Semantic filtering  │
          │ • Distance metric     │
          └───────────┬───────────┘
                      │
              ┌───────▼───────┐
              │ Reading Order │
              └───────────────┘
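The mask-then-reinsert flow in the diagram can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: `Label`, `Element`, `pre_mask`, and `reinsert` are hypothetical names rather than the crate's types, masking is decided by label alone instead of the adaptive threshold of Eq 1-3, and reinsertion uses only the vertical position, which is only sensible for single-column content and stands in for the distance metric of Eq 7-10.

```rust
/// Hypothetical semantic label for this sketch (not the crate's SemanticLabel).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Label {
    Text,
    Title,
    Figure,
    Table,
}

/// Hypothetical element: an id, the y coordinate of its top edge, and a label.
#[derive(Clone, Copy, Debug)]
pub struct Element {
    pub id: usize,
    pub y_top: f32,
    pub label: Label,
}

/// Step 1 (simplified): split the elements into (masked, regular).
/// Titles, figures, and tables are masked; plain text is kept for segmentation.
pub fn pre_mask(elements: &[Element]) -> (Vec<Element>, Vec<Element>) {
    elements.iter().copied().partition(|e| e.label != Label::Text)
}

/// Step 3 (simplified): given the regular elements already in reading order,
/// put each masked element in front of the first ordered element whose top
/// edge is at or below it, or at the end if there is none.
pub fn reinsert(mut ordered: Vec<Element>, mut masked: Vec<Element>) -> Vec<usize> {
    masked.sort_by(|a, b| a.y_top.total_cmp(&b.y_top));
    for m in masked {
        let pos = ordered
            .iter()
            .position(|e| e.y_top >= m.y_top)
            .unwrap_or(ordered.len());
        ordered.insert(pos, m);
    }
    ordered.iter().map(|e| e.id).collect()
}
```

Between the two steps, the regular elements would be ordered by the segmentation stage; the sketch leaves that to the caller.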
Performance
DocBench-100 Benchmark (30 complex + 70 regular layouts):
| Method | Complex BLEU-4 | Regular BLEU-4 | Overall | FPS |
|---|---|---|---|---|
| XY-Cut | 74.9% | 81.8% | 79.7% | 685 |
| LayoutReader | 65.6% | 84.4% | 78.8% | 17 |
| MinerU | 70.1% | 94.6% | 87.3% | 10 |
| XY-Cut++ | 98.6% | 98.9% | 98.8% | 781 |
OmniDocBench (larger-scale evaluation):
| Layout Type | XY-Cut++ BLEU-4 | ARD ↓ | Tau ↑ |
|---|---|---|---|
| Single-column | 99.3% | 0.004 | 0.996 |
| Double-column | 95.1% | 0.027 | 0.974 |
| Three-column | 96.7% | 0.033 | 0.984 |
| Complex | 90.1% | 0.064 | 0.942 |
See paper Section 4.3 for full results.
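To make the Tau column concrete, here is a small sketch of Kendall's tau between a predicted and a ground-truth reading order. It assumes that Tau in the table refers to the standard Kendall rank correlation over permutations without ties; the exact evaluation protocol is defined in the paper, and `kendall_tau` is a hypothetical helper, not part of the crate.

```rust
/// Kendall's tau between two reading orders.
/// Both slices must be permutations of the ids `0..n`.
/// Returns 1.0 for identical orders and -1.0 for exactly reversed orders.
pub fn kendall_tau(predicted: &[usize], truth: &[usize]) -> f64 {
    let n = predicted.len();
    assert_eq!(n, truth.len(), "orders must have the same length");
    if n < 2 {
        return 1.0;
    }
    // rank[id] = position of `id` in the ground-truth order.
    let mut rank = vec![0usize; n];
    for (pos, &id) in truth.iter().enumerate() {
        rank[id] = pos;
    }
    let mut concordant: i64 = 0;
    let mut discordant: i64 = 0;
    for i in 0..n {
        for j in (i + 1)..n {
            // The pair is concordant if the prediction keeps the true relative order.
            if rank[predicted[i]] < rank[predicted[j]] {
                concordant += 1;
            } else {
                discordant += 1;
            }
        }
    }
    let pairs = (n * (n - 1) / 2) as f64;
    (concordant - discordant) as f64 / pairs
}
```

For example, swapping two neighbouring elements in a four-element page flips one of the six pairs, so tau drops from 1 to (5 - 1) / 6.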
Configuration
Customize behavior with XYCutConfig:
use xycut_plus_plus::{XYCutConfig, XYCutPlusPlus};
let config = XYCutConfig { /* ... */ };
let xycut = XYCutPlusPlus::new(config);
Tuning Guidelines:
- min_cut_threshold: Increase (20-30) for documents with tight spacing; decrease (5-10) for loose layouts
- histogram_resolution_scale: Higher values (1.0) give finer granularity at the cost of speed
- same_row_tolerance: Match to your document's line spacing (typically 5-15px)
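To show why a row tolerance matters, the sketch below groups elements into rows when their top edges lie within a tolerance of each other, then reads each row left to right. How the crate applies same_row_tolerance internally is not described here, so treat this as one plausible reading; `Item` and `order_with_row_tolerance` are hypothetical names.

```rust
/// Hypothetical element for this sketch: id plus top-left corner in pixels.
#[derive(Clone, Copy, Debug)]
pub struct Item {
    pub id: usize,
    pub x: f32,
    pub y: f32,
}

/// Order items row by row. An item joins the current row when its `y` is
/// within `tolerance` of the first item of that row; rows are read top to
/// bottom and items within a row left to right.
pub fn order_with_row_tolerance(items: &[Item], tolerance: f32) -> Vec<usize> {
    let mut sorted = items.to_vec();
    sorted.sort_by(|a, b| a.y.total_cmp(&b.y));

    let mut rows: Vec<Vec<Item>> = Vec::new();
    for item in sorted {
        let same_row = rows
            .last()
            .map_or(false, |row| (item.y - row[0].y).abs() <= tolerance);
        if same_row {
            if let Some(row) = rows.last_mut() {
                row.push(item);
            }
        } else {
            rows.push(vec![item]);
        }
    }

    let mut order = Vec::with_capacity(items.len());
    for mut row in rows {
        row.sort_by(|a, b| a.x.total_cmp(&b.x));
        order.extend(row.iter().map(|i| i.id));
    }
    order
}
```

With two boxes whose top edges differ by 4px, a tolerance of 8 treats them as one row and reads the left box first, while a tolerance of 2 treats them as separate rows and reads the higher box first.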
Use Cases
Perfect for:
- 📄 Academic paper parsing (multi-column PDFs)
- 📰 Newspaper digitization (complex layouts)
- 📚 Book/textbook conversion (varied structures)
- 🔍 RAG preprocessing (reading order matters!)
- 🤖 LLM data preparation (structured documents)
Integration Examples:
- With pdfium-render: Extract pages → detect layout → order elements → OCR
- With tesseract: Pre-order regions before OCR for better context
- With vector DBs: Maintain document structure in embeddings
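As an illustration of the downstream step in these pipelines, the sketch below stitches OCR text back together in reading order before chunking or embedding. It assumes the ordered ids are plain `usize` values and that the caller keeps a map from id to recognized text; `assemble_text` is a hypothetical helper, not part of the crate.

```rust
use std::collections::HashMap;

/// Join the text of each region in reading order, separated by blank lines.
/// Ids without text (for example figures) are skipped.
pub fn assemble_text(ordered_ids: &[usize], text_by_id: &HashMap<usize, String>) -> String {
    ordered_ids
        .iter()
        .filter_map(|id| text_by_id.get(id))
        .map(String::as_str)
        .collect::<Vec<&str>>()
        .join("\n\n")
}
```

Keeping this step separate from OCR means the same recognized text can be re-assembled if the layout ordering is re-run with a different configuration.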
API Documentation
Full API documentation available at docs.rs/xycut-plus-plus.
Key Types:
- XYCutPlusPlus: Main algorithm struct
- XYCutConfig: Configuration parameters
- BoundingBox: Trait for layout elements (must implement)
- SemanticLabel: Element type classification
Citation
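If you use this implementation in research, please cite:
Shuai Liu, Youmeng Li, Jizeng Wei. XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism. arXiv:2504.10258.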
Contributing
Contributions welcome! Please:
- Open an issue before major changes
- Follow existing code style (run cargo fmt)
- Add tests for new features
- Update documentation as needed
Run the tests and formatting checks with cargo test and cargo fmt before submitting.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Acknowledgments
Original algorithm by Shuai Liu, Youmeng Li, and Jizeng Wei at Tianjin University.
The Rust implementation maintains 100% fidelity to the published paper (arXiv:2504.10258).