superbook-pdf - High-quality PDF converter for scanned books
A complete Rust implementation for converting scanned book PDFs into high-quality digital books with AI enhancement.
§Features
- PDF Reading (pdf_reader) - Extract metadata, pages, and images from PDFs
- PDF Writing (pdf_writer) - Generate PDFs from images with optional OCR layer
- Image Extraction (image_extract) - Extract page images using ImageMagick
- AI Enhancement (realesrgan) - Upscale images using RealESRGAN
- Deskew Correction (deskew) - Detect and correct page skew
- Margin Detection (margin) - Detect and trim page margins
- Page Number Detection (page_number) - OCR-based page number recognition
- AI Bridge (ai_bridge) - Python subprocess bridge for AI tools
- YomiToku OCR (yomitoku) - Japanese AI-OCR for searchable PDFs
§Quick Start
§Reading a PDF
use superbook_pdf::{LopdfReader, PdfWriterOptions, PrintPdfWriter};
// Read a PDF
let reader = LopdfReader::new("input.pdf").unwrap();
println!("Pages: {}", reader.info.page_count);
§Using Builder Patterns
All option structs support fluent builder patterns:
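For readers new to the pattern, the following is a minimal, self-contained sketch of how a fluent builder of this kind can be written. `WriterOptions`, `WriterOptionsBuilder`, the default values, and the quality clamping are hypothetical stand-ins chosen for illustration; they are not the crate's actual definitions.

```rust
/// Hypothetical options struct. The field names mirror the setter calls used
/// in this section, but the types and defaults are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct WriterOptions {
    pub dpi: u32,
    pub jpeg_quality: u8,
}

impl Default for WriterOptions {
    fn default() -> Self {
        // Assumed defaults, picked only for this sketch.
        Self { dpi: 300, jpeg_quality: 90 }
    }
}

impl WriterOptions {
    /// Entry point for the fluent builder.
    pub fn builder() -> WriterOptionsBuilder {
        WriterOptionsBuilder::default()
    }

    /// A preset expressed through the builder (values are assumptions).
    pub fn high_quality() -> Self {
        Self::builder().dpi(600).jpeg_quality(95).build()
    }
}

/// Builder that starts from the defaults and overrides one field per call.
#[derive(Debug, Default)]
pub struct WriterOptionsBuilder {
    options: WriterOptions,
}

impl WriterOptionsBuilder {
    /// Each setter takes `self` by value and returns it, which is what
    /// allows the calls to be chained.
    pub fn dpi(mut self, dpi: u32) -> Self {
        self.options.dpi = dpi;
        self
    }

    /// Clamp to 1..=100 so an out-of-range value cannot reach the encoder.
    pub fn jpeg_quality(mut self, quality: u8) -> Self {
        self.options.jpeg_quality = quality.clamp(1, 100);
        self
    }

    /// Consume the builder and hand back the finished options.
    pub fn build(self) -> WriterOptions {
        self.options
    }
}
```

Taking `self` by value in each setter keeps the chain free of borrows, at the cost of requiring the builder to be consumed by `build()`. The crate's own option builders are called in the same style: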
use superbook_pdf::{PdfWriterOptions, DeskewOptions, RealEsrganOptions};
// PDF Writer options
let pdf_opts = PdfWriterOptions::builder()
.dpi(600)
.jpeg_quality(95)
.build();
// Or use presets
let high_quality = PdfWriterOptions::high_quality();
let compact = PdfWriterOptions::compact();
// Deskew options
let deskew_opts = DeskewOptions::builder()
.max_angle(15.0)
.build();
// RealESRGAN options
let upscale_opts = RealEsrganOptions::builder()
.scale(4)
.tile_size(256)
.build();
§Architecture
The library is organized into independent modules that can be used separately:
PDF Input -> Image Extraction -> Deskew -> Margin Detection
|
AI Upscaling (RealESRGAN)
|
Page Number Detection -> OCR -> PDF Output
§Error Handling
Each module has its own error type that can be matched for specific handling:
- PdfReaderError - PDF reading errors
- PdfWriterError - PDF writing errors
- ExtractError - Image extraction errors
- DeskewError - Deskew processing errors
- MarginError - Margin detection errors
- PageNumberError - Page number detection errors
- AiBridgeError - AI tool communication errors
- RealEsrganError - RealESRGAN upscaling errors
- YomiTokuError - YomiToku OCR errors
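Because every stage reports its own error type, code that chains several stages usually wants a single umbrella error it can return with `?`. The sketch below shows one common way to do that with `From` conversions. All names in it (`ReaderError`, `DeskewFailure`, `ConvertError`, and the two stub stage functions) are hypothetical stand-ins, not the crate's real types or functions, and the 15-degree limit is an assumed value.

```rust
use std::fmt;

/// Hypothetical stand-in for a per-module reader error.
#[derive(Debug)]
pub enum ReaderError {
    FileNotFound(String),
}

/// Hypothetical stand-in for a per-module deskew error.
#[derive(Debug)]
pub enum DeskewFailure {
    AngleOutOfRange(f64),
}

/// Hypothetical application-level error wrapping each module error.
#[derive(Debug)]
pub enum ConvertError {
    Reader(ReaderError),
    Deskew(DeskewFailure),
}

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConvertError::Reader(ReaderError::FileNotFound(p)) => {
                write!(f, "cannot read PDF: file not found: {}", p)
            }
            ConvertError::Deskew(DeskewFailure::AngleOutOfRange(a)) => {
                write!(f, "deskew failed: angle {} out of range", a)
            }
        }
    }
}

impl std::error::Error for ConvertError {}

// The `From` impls are what let `?` convert module errors automatically.
impl From<ReaderError> for ConvertError {
    fn from(e: ReaderError) -> Self {
        ConvertError::Reader(e)
    }
}

impl From<DeskewFailure> for ConvertError {
    fn from(e: DeskewFailure) -> Self {
        ConvertError::Deskew(e)
    }
}

/// Stub stage: pretends any path ending in ".pdf" has three pages.
fn read_page_count(path: &str) -> Result<usize, ReaderError> {
    if path.ends_with(".pdf") {
        Ok(3)
    } else {
        Err(ReaderError::FileNotFound(path.to_string()))
    }
}

/// Stub stage: accepts skew angles of at most 15 degrees in magnitude.
fn check_skew(angle: f64) -> Result<f64, DeskewFailure> {
    if angle.abs() <= 15.0 {
        Ok(angle)
    } else {
        Err(DeskewFailure::AngleOutOfRange(angle))
    }
}

/// Chains both stages; each `?` maps the stage error into `ConvertError`.
pub fn convert(path: &str, angle: f64) -> Result<usize, ConvertError> {
    let pages = read_page_count(path)?;
    check_skew(angle)?;
    Ok(pages)
}
```

Callers can still match on the wrapped variant, for example `Err(ConvertError::Reader(ReaderError::FileNotFound(_)))`, so the per-module detail is not lost by wrapping.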
§CLI Exit Codes
Use ExitCode for type-safe exit code handling:
use superbook_pdf::ExitCode;
let code = ExitCode::Success;
assert_eq!(code.code(), 0);
assert_eq!(code.description(), "Success");
§Error Handling Example
use superbook_pdf::{PdfReaderError, MarginError, DeskewError};
use std::path::PathBuf;
fn handle_pdf_error(err: PdfReaderError) -> String {
match err {
PdfReaderError::FileNotFound(path) => format!("File not found: {}", path.display()),
PdfReaderError::InvalidFormat(msg) => format!("Invalid PDF: {}", msg),
PdfReaderError::EncryptedPdf => "Encrypted PDFs are not supported".to_string(),
_ => format!("Other error: {}", err),
}
}
let err = PdfReaderError::FileNotFound(PathBuf::from("/test.pdf"));
assert!(handle_pdf_error(err).contains("/test.pdf"));
§License
AGPL-3.0
Re-exports§
pub use ai_bridge::AiBridgeConfig;
pub use ai_bridge::AiBridgeConfigBuilder;
pub use ai_bridge::AiBridgeError;
pub use ai_bridge::AiTool;
pub use ai_bridge::SubprocessBridge;
pub use cli::create_page_progress_bar;
pub use cli::create_progress_bar;
pub use cli::create_spinner;
pub use cli::CacheInfoArgs;
pub use cli::Cli;
pub use cli::Commands;
pub use cli::ConvertArgs;
pub use cli::ExitCode;
pub use cli::ReprocessArgs;
pub use config::AdvancedConfig;
pub use config::CliOverrides;
pub use config::Config;
pub use config::ConfigError;
pub use config::GeneralConfig;
pub use config::OcrConfig;
pub use config::OutputConfig;
pub use config::ProcessingConfig;
pub use deskew::DeskewAlgorithm;
pub use deskew::DeskewError;
pub use deskew::DeskewOptions;
pub use deskew::DeskewOptionsBuilder;
pub use deskew::DeskewResult;
pub use deskew::ImageProcDeskewer;
pub use deskew::QualityMode;
pub use deskew::SkewDetection;
pub use image_extract::ColorSpace;
pub use image_extract::ExtractError;
pub use image_extract::ExtractOptions;
pub use image_extract::ExtractOptionsBuilder;
pub use image_extract::ExtractedPage;
pub use image_extract::ImageFormat;
pub use image_extract::LopdfExtractor;
pub use image_extract::MagickExtractor;
pub use margin::ContentDetectionMode;
pub use margin::ContentRect;
pub use margin::GroupCropAnalyzer;
pub use margin::GroupCropRegion;
pub use margin::ImageMarginDetector;
pub use margin::MarginDetection;
pub use margin::MarginError;
pub use margin::MarginOptions;
pub use margin::MarginOptionsBuilder;
pub use margin::Margins;
pub use margin::PageBoundingBox;
pub use margin::TrimResult;
pub use margin::UnifiedCropRegions;
pub use margin::UnifiedMargins;
pub use page_number::calc_group_reference_position;
pub use page_number::calc_overlap_center;
pub use page_number::find_page_number_with_fallback;
pub use page_number::find_page_numbers_batch;
pub use page_number::BookOffsetAnalysis;
pub use page_number::DetectedPageNumber;
pub use page_number::FallbackMatchStats;
pub use page_number::MatchStage;
pub use page_number::OffsetCorrection;
pub use page_number::PageNumberAnalysis;
pub use page_number::PageNumberCandidate;
pub use page_number::PageNumberError;
pub use page_number::PageNumberMatch;
pub use page_number::PageNumberOptions;
pub use page_number::PageNumberOptionsBuilder;
pub use page_number::PageNumberPosition;
pub use page_number::PageNumberRect;
pub use page_number::PageOffsetAnalyzer;
pub use page_number::PageOffsetResult;
pub use page_number::Point;
pub use page_number::Rectangle;
pub use page_number::TesseractPageDetector;
pub use pdf_reader::LopdfReader;
pub use pdf_reader::PdfDocument;
pub use pdf_reader::PdfMetadata;
pub use pdf_reader::PdfPage;
pub use pdf_reader::PdfReaderError;
pub use pdf_writer::PdfWriterError;
pub use pdf_writer::PdfWriterOptions;
pub use pdf_writer::PdfWriterOptionsBuilder;
pub use pdf_writer::PrintPdfWriter;
pub use realesrgan::RealEsrgan;
pub use realesrgan::RealEsrganError;
pub use realesrgan::RealEsrganOptions;
pub use realesrgan::RealEsrganOptionsBuilder;
pub use reprocess::PageStatus;
pub use reprocess::ReprocessError;
pub use reprocess::ReprocessOptions;
pub use reprocess::ReprocessResult;
pub use reprocess::ReprocessState;
pub use util::clamp;
pub use util::ensure_dir_writable;
pub use util::ensure_file_exists;
pub use util::format_duration;
pub use util::format_file_size;
pub use util::load_image;
pub use util::mm_to_pixels;
pub use util::mm_to_points;
pub use util::percentage;
pub use util::pixels_to_mm;
pub use util::points_to_mm;
pub use yomitoku::BatchOcrResult;
pub use yomitoku::OcrResult;
pub use yomitoku::TextBlock;
pub use yomitoku::TextDirection;
pub use yomitoku::YomiToku;
pub use yomitoku::YomiTokuError;
pub use yomitoku::YomiTokuOptions;
pub use yomitoku::YomiTokuOptionsBuilder;
pub use color_stats::ColorAnalyzer;
pub use color_stats::ColorStats;
pub use color_stats::ColorStatsError;
pub use color_stats::GlobalColorParam;
pub use finalize::FinalizeError;
pub use finalize::FinalizeOptions;
pub use finalize::FinalizeOptionsBuilder;
pub use finalize::FinalizeResult;
pub use finalize::PageFinalizer;
pub use normalize::ImageNormalizer;
pub use normalize::NormalizeError;
pub use normalize::NormalizeOptions;
pub use normalize::NormalizeOptionsBuilder;
pub use normalize::NormalizeResult;
pub use normalize::PaddingMode;
pub use normalize::PaperColor;
pub use normalize::Resampler;
pub use vertical_detect::detect_book_vertical_writing;
pub use vertical_detect::detect_vertical_probability;
pub use vertical_detect::BookVerticalResult;
pub use vertical_detect::VerticalDetectError;
pub use vertical_detect::VerticalDetectOptions;
pub use vertical_detect::VerticalDetectResult;
pub use parallel::parallel_map;
pub use parallel::parallel_process;
pub use parallel::ParallelError;
pub use parallel::ParallelOptions;
pub use parallel::ParallelProcessor;
pub use parallel::ParallelResult;
pub use progress::build_progress_bar;
pub use progress::OutputMode;
pub use progress::ProcessingStage;
pub use progress::ProgressTracker;
pub use cache::should_skip_processing;
pub use cache::CacheDigest;
pub use cache::ProcessingCache;
pub use cache::ProcessingResult;
pub use cache::CACHE_EXTENSION;
pub use cache::CACHE_VERSION;
pub use pipeline::calculate_optimal_chunk_size;
pub use pipeline::process_in_chunks;
pub use pipeline::PdfPipeline;
pub use pipeline::PipelineConfig;
pub use pipeline::PipelineError;
pub use pipeline::PipelineResult;
pub use pipeline::ProcessingContext;
pub use pipeline::ProgressCallback;
pub use pipeline::SilentProgress;
Modules§
- ai_bridge - AI Tools Bridge module
- cache - Processing cache module for smart re-processing skip
- cli - CLI interface module
- color_stats - Color Statistics and Global Color Adjustment module
- config - Configuration file support for superbook-pdf
- deskew - Deskew (Skew Correction) module
- exit_codes - Exit codes for CLI (deprecated: prefer using ExitCode enum)
- finalize - Final Output Processing module
- image_extract - Image Extraction module
- margin - Margin Detection & Trimming module
- normalize - Resolution Normalization module
- page_number - Page Number Detection module
- parallel - Parallel processing utilities for image pipeline
- pdf_reader - PDF Reader module
- pdf_writer - PDF Writer module
- pipeline - Pipeline processing module
- progress - Progress tracking module for PDF processing.
- realesrgan - RealESRGAN Integration module
- reprocess - Partial Reprocessing module
- util - Common utilities for superbook-pdf
- vertical_detect - Vertical text detection for Japanese books
- yomitoku - YomiToku Japanese AI-OCR module