Crate superbook_pdf

superbook-pdf - High-quality PDF converter for scanned books

A complete Rust implementation for converting scanned book PDFs into high-quality digital books with AI enhancement.

§Features

  • PDF Reading (pdf_reader) - Extract metadata, pages, and images from PDFs
  • PDF Writing (pdf_writer) - Generate PDFs from images with optional OCR layer
  • Image Extraction (image_extract) - Extract page images using ImageMagick
  • AI Enhancement (realesrgan) - Upscale images using RealESRGAN
  • Deskew Correction (deskew) - Detect and correct page skew
  • Margin Detection (margin) - Detect and trim page margins
  • Page Number Detection (page_number) - OCR-based page number recognition
  • AI Bridge (ai_bridge) - Python subprocess bridge for AI tools
  • YomiToku OCR (yomitoku) - Japanese AI-OCR for searchable PDFs

§Quick Start

§Reading a PDF

use superbook_pdf::LopdfReader;

// Read a PDF
let reader = LopdfReader::new("input.pdf").unwrap();
println!("Pages: {}", reader.info.page_count);

§Using Builder Patterns

All option structs support fluent builder patterns:

use superbook_pdf::{PdfWriterOptions, DeskewOptions, RealEsrganOptions};

// PDF Writer options
let pdf_opts = PdfWriterOptions::builder()
    .dpi(600)
    .jpeg_quality(95)
    .build();

// Or use presets
let high_quality = PdfWriterOptions::high_quality();
let compact = PdfWriterOptions::compact();

// Deskew options
let deskew_opts = DeskewOptions::builder()
    .max_angle(15.0)
    .build();

// RealESRGAN options
let upscale_opts = RealEsrganOptions::builder()
    .scale(4)
    .tile_size(256)
    .build();
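
As a rough illustration of how a fluent builder like the ones above can be put together, here is a minimal, self-contained sketch. The types SketchWriterOptions and SketchWriterOptionsBuilder, the default values, and the clamping of jpeg_quality are all assumptions made for this example; they are not the crate's actual definitions.

```rust
/// Hypothetical stand-in for an options struct; not the crate's real type.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchWriterOptions {
    pub dpi: u32,
    pub jpeg_quality: u8,
}

impl Default for SketchWriterOptions {
    fn default() -> Self {
        // Assumed defaults, chosen only for this illustration.
        Self { dpi: 300, jpeg_quality: 90 }
    }
}

impl SketchWriterOptions {
    /// Entry point of the fluent chain: start from the defaults.
    pub fn builder() -> SketchWriterOptionsBuilder {
        SketchWriterOptionsBuilder { opts: Self::default() }
    }
}

/// Hypothetical builder that owns a partially configured options value.
#[derive(Debug, Clone)]
pub struct SketchWriterOptionsBuilder {
    opts: SketchWriterOptions,
}

impl SketchWriterOptionsBuilder {
    /// Each setter takes `self` by value and returns it, which enables chaining.
    pub fn dpi(mut self, dpi: u32) -> Self {
        self.opts.dpi = dpi;
        self
    }

    /// Clamps to 100 (an assumption of this sketch, not documented behavior).
    pub fn jpeg_quality(mut self, quality: u8) -> Self {
        self.opts.jpeg_quality = quality.min(100);
        self
    }

    /// Finishes the chain and hands back the configured options.
    pub fn build(self) -> SketchWriterOptions {
        self.opts
    }
}
```

Calling `SketchWriterOptions::builder().dpi(600).jpeg_quality(95).build()` mirrors the shape of the PdfWriterOptions call shown earlier; any setter that is not called leaves the default in place.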

§Architecture

The library is organized into independent modules that can be used separately:

PDF Input -> Image Extraction -> Deskew -> Margin Detection
                                   |
                           AI Upscaling (RealESRGAN)
                                   |
                        Page Number Detection -> OCR -> PDF Output
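
One way to picture "independent modules that can be used separately" is a stage list from which a caller leaves out the steps it does not need. The sketch below is only an illustration: SketchStage, STAGE_ORDER, and plan are hypothetical names, and the strictly linear order is an assumption read off the diagram rather than the crate's actual pipeline logic.

```rust
/// Stage names follow the architecture diagram; the enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchStage {
    ImageExtraction,
    Deskew,
    MarginDetection,
    AiUpscaling,
    PageNumberDetection,
    Ocr,
    PdfOutput,
}

/// Assumed linear ordering of the stages shown in the diagram.
pub const STAGE_ORDER: [SketchStage; 7] = [
    SketchStage::ImageExtraction,
    SketchStage::Deskew,
    SketchStage::MarginDetection,
    SketchStage::AiUpscaling,
    SketchStage::PageNumberDetection,
    SketchStage::Ocr,
    SketchStage::PdfOutput,
];

/// Returns the stages to run, in diagram order, leaving out any the caller disabled.
pub fn plan(skip: &[SketchStage]) -> Vec<SketchStage> {
    STAGE_ORDER
        .iter()
        .copied()
        .filter(|stage| !skip.contains(stage))
        .collect()
}
```

For example, `plan(&[SketchStage::AiUpscaling, SketchStage::Ocr])` yields the remaining five stages in their original order, which corresponds to running the conversion without upscaling or a searchable text layer.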

§Error Handling

Each module has its own error type that can be matched for specific handling, as the Error Handling Example below shows.

§CLI Exit Codes

Use ExitCode for type-safe exit code handling:

use superbook_pdf::ExitCode;

let code = ExitCode::Success;
assert_eq!(code.code(), 0);
assert_eq!(code.description(), "Success");
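
To show what "type-safe exit code handling" can look like in practice, here is a small stand-in enum. Only the Success variant, its code 0, and its description "Success" come from the example above; SketchExitCode, the GeneralError variant, its value 1, the i32 return type, and the exit_code_for helper are assumptions made for this sketch.

```rust
/// Hypothetical stand-in for an exit code enum; not the crate's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchExitCode {
    Success,
    /// Assumed variant, added only to give the sketch a failure case.
    GeneralError,
}

impl SketchExitCode {
    /// Numeric value handed to the operating system.
    pub fn code(self) -> i32 {
        match self {
            SketchExitCode::Success => 0,
            SketchExitCode::GeneralError => 1,
        }
    }

    /// Human-readable description of the exit status.
    pub fn description(self) -> &'static str {
        match self {
            SketchExitCode::Success => "Success",
            SketchExitCode::GeneralError => "General error",
        }
    }
}

/// Maps any `Result` onto an exit code, so callers never handle raw integers.
pub fn exit_code_for<T, E>(result: &Result<T, E>) -> SketchExitCode {
    if result.is_ok() {
        SketchExitCode::Success
    } else {
        SketchExitCode::GeneralError
    }
}
```

Because the match arms are exhaustive, adding a variant later forces every mapping to be updated at compile time, which is the main benefit over bare integer constants.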

§Error Handling Example

use superbook_pdf::{PdfReaderError, MarginError, DeskewError};
use std::path::PathBuf;

fn handle_pdf_error(err: PdfReaderError) -> String {
    match err {
        PdfReaderError::FileNotFound(path) => format!("File not found: {}", path.display()),
        PdfReaderError::InvalidFormat(msg) => format!("Invalid PDF: {}", msg),
        PdfReaderError::EncryptedPdf => "Encrypted PDFs are not supported".to_string(),
        _ => format!("Other error: {}", err),
    }
}

let err = PdfReaderError::FileNotFound(PathBuf::from("/test.pdf"));
assert!(handle_pdf_error(err).contains("/test.pdf"));
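
When several modules each return their own error type, a common Rust approach is to wrap them in one application-level enum and rely on From conversions so that the ? operator works across module boundaries. The sketch below uses stand-in types (SketchReaderError, SketchDeskewError, SketchAppError) and hypothetical helper functions; it does not reproduce the crate's real error definitions.

```rust
use std::fmt;
use std::path::PathBuf;

/// Stand-in for a per-module reader error (hypothetical).
#[derive(Debug)]
pub enum SketchReaderError {
    FileNotFound(PathBuf),
    EncryptedPdf,
}

/// Stand-in for a per-module deskew error (hypothetical).
#[derive(Debug)]
pub enum SketchDeskewError {
    AngleOutOfRange(f64),
}

/// Application-level error that wraps each module's error type.
#[derive(Debug)]
pub enum SketchAppError {
    Reader(SketchReaderError),
    Deskew(SketchDeskewError),
}

impl From<SketchReaderError> for SketchAppError {
    fn from(err: SketchReaderError) -> Self {
        SketchAppError::Reader(err)
    }
}

impl From<SketchDeskewError> for SketchAppError {
    fn from(err: SketchDeskewError) -> Self {
        SketchAppError::Deskew(err)
    }
}

impl fmt::Display for SketchAppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchAppError::Reader(SketchReaderError::FileNotFound(p)) => {
                write!(f, "reader failed: file not found: {}", p.display())
            }
            SketchAppError::Reader(SketchReaderError::EncryptedPdf) => {
                write!(f, "reader failed: encrypted PDF")
            }
            SketchAppError::Deskew(SketchDeskewError::AngleOutOfRange(a)) => {
                write!(f, "deskew failed: angle {} out of range", a)
            }
        }
    }
}

/// Hypothetical reader step: an empty path counts as "not found".
fn open(path: &str) -> Result<usize, SketchReaderError> {
    if path.is_empty() {
        Err(SketchReaderError::FileNotFound(PathBuf::from(path)))
    } else {
        Ok(1)
    }
}

/// Hypothetical deskew step: rejects angles beyond the allowed maximum.
fn check_angle(angle: f64, max: f64) -> Result<f64, SketchDeskewError> {
    if angle.abs() > max {
        Err(SketchDeskewError::AngleOutOfRange(angle))
    } else {
        Ok(angle)
    }
}

/// `?` converts each module error into `SketchAppError` through the `From` impls.
pub fn process(path: &str, angle: f64) -> Result<usize, SketchAppError> {
    let pages = open(path)?;
    check_angle(angle, 15.0)?;
    Ok(pages)
}
```

Callers can still match on the inner variant, as handle_pdf_error does above, while code that only needs to report the failure can format the wrapper with `{}`.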

§License

AGPL-3.0

Re-exports§

pub use ai_bridge::AiBridgeConfig;
pub use ai_bridge::AiBridgeConfigBuilder;
pub use ai_bridge::AiBridgeError;
pub use ai_bridge::AiTool;
pub use ai_bridge::SubprocessBridge;
pub use cli::create_page_progress_bar;
pub use cli::create_progress_bar;
pub use cli::create_spinner;
pub use cli::CacheInfoArgs;
pub use cli::Cli;
pub use cli::Commands;
pub use cli::ConvertArgs;
pub use cli::ExitCode;
pub use cli::ReprocessArgs;
pub use config::AdvancedConfig;
pub use config::CliOverrides;
pub use config::Config;
pub use config::ConfigError;
pub use config::GeneralConfig;
pub use config::OcrConfig;
pub use config::OutputConfig;
pub use config::ProcessingConfig;
pub use deskew::DeskewAlgorithm;
pub use deskew::DeskewError;
pub use deskew::DeskewOptions;
pub use deskew::DeskewOptionsBuilder;
pub use deskew::DeskewResult;
pub use deskew::ImageProcDeskewer;
pub use deskew::QualityMode;
pub use deskew::SkewDetection;
pub use image_extract::ColorSpace;
pub use image_extract::ExtractError;
pub use image_extract::ExtractOptions;
pub use image_extract::ExtractOptionsBuilder;
pub use image_extract::ExtractedPage;
pub use image_extract::ImageFormat;
pub use image_extract::LopdfExtractor;
pub use image_extract::MagickExtractor;
pub use margin::ContentDetectionMode;
pub use margin::ContentRect;
pub use margin::GroupCropAnalyzer;
pub use margin::GroupCropRegion;
pub use margin::ImageMarginDetector;
pub use margin::MarginDetection;
pub use margin::MarginError;
pub use margin::MarginOptions;
pub use margin::MarginOptionsBuilder;
pub use margin::Margins;
pub use margin::PageBoundingBox;
pub use margin::TrimResult;
pub use margin::UnifiedCropRegions;
pub use margin::UnifiedMargins;
pub use page_number::calc_group_reference_position;
pub use page_number::calc_overlap_center;
pub use page_number::find_page_number_with_fallback;
pub use page_number::find_page_numbers_batch;
pub use page_number::BookOffsetAnalysis;
pub use page_number::DetectedPageNumber;
pub use page_number::FallbackMatchStats;
pub use page_number::MatchStage;
pub use page_number::OffsetCorrection;
pub use page_number::PageNumberAnalysis;
pub use page_number::PageNumberCandidate;
pub use page_number::PageNumberError;
pub use page_number::PageNumberMatch;
pub use page_number::PageNumberOptions;
pub use page_number::PageNumberOptionsBuilder;
pub use page_number::PageNumberPosition;
pub use page_number::PageNumberRect;
pub use page_number::PageOffsetAnalyzer;
pub use page_number::PageOffsetResult;
pub use page_number::Point;
pub use page_number::Rectangle;
pub use page_number::TesseractPageDetector;
pub use pdf_reader::LopdfReader;
pub use pdf_reader::PdfDocument;
pub use pdf_reader::PdfMetadata;
pub use pdf_reader::PdfPage;
pub use pdf_reader::PdfReaderError;
pub use pdf_writer::PdfWriterError;
pub use pdf_writer::PdfWriterOptions;
pub use pdf_writer::PdfWriterOptionsBuilder;
pub use pdf_writer::PrintPdfWriter;
pub use realesrgan::RealEsrgan;
pub use realesrgan::RealEsrganError;
pub use realesrgan::RealEsrganOptions;
pub use realesrgan::RealEsrganOptionsBuilder;
pub use reprocess::PageStatus;
pub use reprocess::ReprocessError;
pub use reprocess::ReprocessOptions;
pub use reprocess::ReprocessResult;
pub use reprocess::ReprocessState;
pub use util::clamp;
pub use util::ensure_dir_writable;
pub use util::ensure_file_exists;
pub use util::format_duration;
pub use util::format_file_size;
pub use util::load_image;
pub use util::mm_to_pixels;
pub use util::mm_to_points;
pub use util::percentage;
pub use util::pixels_to_mm;
pub use util::points_to_mm;
pub use yomitoku::BatchOcrResult;
pub use yomitoku::OcrResult;
pub use yomitoku::TextBlock;
pub use yomitoku::TextDirection;
pub use yomitoku::YomiToku;
pub use yomitoku::YomiTokuError;
pub use yomitoku::YomiTokuOptions;
pub use yomitoku::YomiTokuOptionsBuilder;
pub use color_stats::ColorAnalyzer;
pub use color_stats::ColorStats;
pub use color_stats::ColorStatsError;
pub use color_stats::GlobalColorParam;
pub use finalize::FinalizeError;
pub use finalize::FinalizeOptions;
pub use finalize::FinalizeOptionsBuilder;
pub use finalize::FinalizeResult;
pub use finalize::PageFinalizer;
pub use normalize::ImageNormalizer;
pub use normalize::NormalizeError;
pub use normalize::NormalizeOptions;
pub use normalize::NormalizeOptionsBuilder;
pub use normalize::NormalizeResult;
pub use normalize::PaddingMode;
pub use normalize::PaperColor;
pub use normalize::Resampler;
pub use vertical_detect::detect_book_vertical_writing;
pub use vertical_detect::detect_vertical_probability;
pub use vertical_detect::BookVerticalResult;
pub use vertical_detect::VerticalDetectError;
pub use vertical_detect::VerticalDetectOptions;
pub use vertical_detect::VerticalDetectResult;
pub use parallel::parallel_map;
pub use parallel::parallel_process;
pub use parallel::ParallelError;
pub use parallel::ParallelOptions;
pub use parallel::ParallelProcessor;
pub use parallel::ParallelResult;
pub use progress::build_progress_bar;
pub use progress::OutputMode;
pub use progress::ProcessingStage;
pub use progress::ProgressTracker;
pub use cache::should_skip_processing;
pub use cache::CacheDigest;
pub use cache::ProcessingCache;
pub use cache::ProcessingResult;
pub use cache::CACHE_EXTENSION;
pub use cache::CACHE_VERSION;
pub use pipeline::calculate_optimal_chunk_size;
pub use pipeline::process_in_chunks;
pub use pipeline::PdfPipeline;
pub use pipeline::PipelineConfig;
pub use pipeline::PipelineError;
pub use pipeline::PipelineResult;
pub use pipeline::ProcessingContext;
pub use pipeline::ProgressCallback;
pub use pipeline::SilentProgress;

Modules§

ai_bridge - AI Tools Bridge module
cache - Processing cache module for smart re-processing skip
cli - CLI interface module
color_stats - Color Statistics and Global Color Adjustment module
config - Configuration file support for superbook-pdf
deskew - Deskew (Skew Correction) module
exit_codes - Exit codes for CLI (deprecated: prefer using ExitCode enum)
finalize - Final Output Processing module
image_extract - Image Extraction module
margin - Margin Detection & Trimming module
normalize - Resolution Normalization module
page_number - Page Number Detection module
parallel - Parallel processing utilities for image pipeline
pdf_reader - PDF Reader module
pdf_writer - PDF Writer module
pipeline - Pipeline processing module
progress - Progress tracking module for PDF processing
realesrgan - RealESRGAN Integration module
reprocess - Partial Reprocessing module
util - Common utilities for superbook-pdf
vertical_detect - Vertical text detection for Japanese books
yomitoku - YomiToku Japanese AI-OCR module